Filtering Phrases

Available since 3.18.0

The phrase API that existed before 3.18.0 has been removed and will not be automatically migrated to this new version of phrases.

1. Filtering Phrases

The Blacklist Phrase tool is a powerful addition to CleanSpeak that allows text to be matched on Blacklist Tags embedded in a regex. Phrases match in much the same way as a normal Blacklist entry and use the same set of rules in the application configuration as a Blacklist entry.

1.1. Creating a Phrase by Example

Consider the case where you want to prevent users from trying to sell social media likes.

First, you will need a good base set of words you want blocked when viewed together. These words are added as standard Blacklist entries with the associated tags on them. A good starting point would be the table below.

Tag	Words
Purchase	Buy, Purchase, Get
Company	Facebook, Twitter, YouTube
Social-Like	Likes, Thumbs Up, Retweet

A Blacklist Phrase can then be created from the tags above. The phrase uses a regex to find any regions of text in which those tags appear. To match someone saying "Buy facebook likes", the phrase will have a regex of %Purchase%\s+%Company%\s+%Social-Like%. This matches Purchase, Company, and Social-Like entries separated by one or more whitespace characters. The whole region of text in the input is matched, just as with a plain regex, but the phrase also handles variations using the power of the Blacklist entries behind each tag. For example, Buuyy FaaceeBok Likees and many other variations can still be matched by the phrase.
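To make the tag substitution concrete, here is a minimal Python sketch of the idea, not CleanSpeak's implementation. It assumes a hypothetical in-memory tag table (TAG_WORDS) and simply expands each %TagName% into an alternation of that tag's words. The real filter matches Blacklist entries, so it also catches misspellings and variations that this sketch does not.

```python
import re

# Hypothetical tag table copied from the example above. In CleanSpeak these
# are Blacklist entries, which also match misspellings and other variations.
TAG_WORDS = {
    "Purchase": ["Buy", "Purchase", "Get"],
    "Company": ["Facebook", "Twitter", "YouTube"],
    "Social-Like": ["Likes", "Thumbs Up", "Retweet"],
}

# A tag reference is a name wrapped in percent signs, e.g. %Purchase%.
TAG_REFERENCE = re.compile(r"%([^%]+)%")


def expand_phrase(pattern):
    """Replace each %TagName% with an alternation of that tag's words."""
    def to_alternation(match):
        words = TAG_WORDS[match.group(1)]
        return "(?:" + "|".join(re.escape(word) for word in words) + ")"

    # Phrases ignore case by default, so the sketch compiles with IGNORECASE.
    return re.compile(TAG_REFERENCE.sub(to_alternation, pattern), re.IGNORECASE)


phrase = expand_phrase(r"%Purchase%\s+%Company%\s+%Social-Like%")
print(bool(phrase.search("Buy facebook likes")))
print(bool(phrase.search("I like facebook")))
```

The first input contains all three tags in order and matches; the second does not, because no Purchase word precedes the company name.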

1.2. How to create a Phrase from the UI

You can create a Phrase by going to your management interface and navigating to Filters → Blacklist → Phrases. The Phrases button is found in the drop-down menu in the top right of the Blacklist configuration page. Click Add (the plus button in the top right) and begin creating a phrase by clicking in the box and typing a regex. Blacklist entries can be matched on their tags by typing %TagName%, where TagName is a tag that includes only the Blacklist entries that make sense in that position of the regex. You can also click the pills below the pattern field to quickly add a tag to the pattern.

Phrases also have a severity, a locale, and tags, which classify how they should be treated by an application and whether they should be ignored by the minimum severity setting on the filter. You can then click Save (top right) and the phrase will be created.

1.3. How to create a Phrase from the API

Below is an example of creating a phrase using the API.

URI

POST /filter/blacklist/phrase

Example Request JSON

{
  "phrase": {
    "locale": "en",
    "pattern": "%Purchase%.+%Company%",
    "severity": "mild",
    "tags": [ "Phrase" ]
  }
}

Table 1. Request Body
phrase.locale [String] required	The locale of the pattern/phrase. See Locales.
phrase.pattern [String] required	A regex pattern that can optionally include blacklist tags to match on blacklist entry matches. Tags are of the form `%TagName%` in a regex.
phrase.severity [String] required	The severity of the phrase. A severity can be any of the following: `mild` `medium` `high` `severe`
phrase.tags [Array<String>] required	A list of blacklist tag names that this phrase should be associated with. At least one tag is required.

Table 2. Response Body
phrase.id [Integer]	The Id of the phrase.
phrase.locale [String]	The locale of the phrase. See Locales.
phrase.pattern [String]	A regex pattern that can optionally include blacklist tags to match on blacklist entry matches. Tags are of the form `%TagName%` in a regex.
phrase.severity [String]	The severity of the phrase. A severity can be any of the following: `mild` `medium` `high` `severe`
phrase.status [String]	The status of the phrase in the filter. The API always returns `ACTIVE` entries as the APIs only operate on the active filter state.
phrase.tags [Array<String>]	A list of blacklist tag names that this phrase should be associated with.

Example Response JSON

{
  "phrase": {
    "id": 1,
    "locale": "en",
    "pattern": "%Purchase%.+%Company%",
    "severity": "mild",
    "status": "ACTIVE",
    "tags": [ "Phrase" ]
  }
}
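As a rough illustration of the request above, the following Python sketch builds (but does not send) the POST using only the standard library. The host name and API key are placeholders, and sending the key in an Authorization header on this endpoint is an assumption carried over from the filter request; build_create_phrase_request is a hypothetical helper, not part of any CleanSpeak client.

```python
import json
import urllib.request

# Assumptions: placeholder host and API key; substitute your own values.
BASE_URL = "https://cleanspeak.example.com"
API_KEY = "your-api-key"

SEVERITIES = ("mild", "medium", "high", "severe")


def build_create_phrase_request(pattern, locale, severity, tags):
    """Build (but do not send) the POST request that creates a phrase."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if not tags:
        raise ValueError("at least one tag is required")
    body = {
        "phrase": {
            "locale": locale,
            "pattern": pattern,
            "severity": severity,
            "tags": list(tags),
        }
    }
    return urllib.request.Request(
        BASE_URL + "/filter/blacklist/phrase",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API key is passed as on the filter endpoint.
        headers={"Authorization": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_phrase_request("%Purchase%.+%Company%", "en", "mild", ["Phrase"])
# To send it: urllib.request.urlopen(request)
```

Validating the severity and the tag list locally mirrors the required fields in the request body table, so an obviously malformed phrase is rejected before it reaches the server.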

1.4. How to configure an Application to use the Phrases

Phrases share the application configuration of Blacklist entries. To handle a phrase match automatically, create a rule that either has no locale or has the locale of the phrase, and that contains at least one tag from the phrase. The most severe rule matching a phrase is then applied during moderation.
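The selection logic described above can be sketched in Python as follows. Everything here is hypothetical: the Rule class, the action names, and the ACTION_RANK ordering are illustrative assumptions, not CleanSpeak's configuration model. Only the two matching conditions (locale and shared tag) and the "most severe wins" step come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ranking of rule actions from least to most severe.
ACTION_RANK = {"replace": 1, "queue": 2, "reject": 3}


@dataclass(frozen=True)
class Rule:
    action: str                   # hypothetical action name
    tags: frozenset               # tags the rule applies to
    locale: Optional[str] = None  # None means the rule has no locale


def select_rule(rules, phrase_locale, phrase_tags):
    """Return the most severe rule that applies to the phrase, or None."""
    candidates = [
        rule for rule in rules
        # The rule has no locale, or has the locale of the phrase ...
        if (rule.locale is None or rule.locale == phrase_locale)
        # ... and shares at least one tag with the phrase.
        and not rule.tags.isdisjoint(phrase_tags)
    ]
    return max(candidates, key=lambda rule: ACTION_RANK[rule.action], default=None)


rules = [
    Rule("replace", frozenset({"Phrase"})),
    Rule("reject", frozenset({"Phrase"}), locale="fr"),
    Rule("queue", frozenset({"Phrase", "Spam"}), locale="en"),
    Rule("reject", frozenset({"PII"})),
]
chosen = select_rule(rules, "en", ["Phrase"])
print(chosen.action if chosen else None)
```

In this sample the French rule is skipped because of its locale and the PII rule because it shares no tag with the phrase, which leaves the locale-free "replace" rule and the English "queue" rule as candidates.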

1.5. Example Filter Request

URI

POST /content/item/filter/

Table 3. Headers
Authorization [String]	The API Key of your application is required to make a request to the webservice. You can find this under Settings → API Keys in the CleanSpeak Management Interface.

Request Body

{
    "content": "buy facebook likes"
}

Response

{
    "matches": [
        {
            "blacklistResult": "phrase",
            "length": 18,
            "locale": "en",
            "matched": "buy facebook likes",
            "quality": 1,
            "root": "%Purchase%\\s+%Company%\\s+%Social-Like%",
            "severity": "mild",
            "start": 0,
            "tags": [
                "Phrase"
            ],
            "type": "blacklist"
        },
        {
            "blacklistResult": "basic",
            "length": 3,
            "locale": "en",
            "matched": "buy",
            "quality": 1,
            "root": "buy",
            "severity": "none",
            "start": 0,
            "tags": [
                "Purchase"
            ],
            "type": "blacklist"
        },
        {
            "blacklistResult": "basic",
            "length": 8,
            "locale": "en",
            "matched": "facebook",
            "quality": 1,
            "root": "facebook",
            "severity": "medium",
            "start": 4,
            "tags": [
                "Company",
                "PII"
            ],
            "type": "blacklist"
        },
        {
            "blacklistResult": "basic",
            "length": 5,
            "locale": "en",
            "matched": "likes",
            "quality": 1,
            "root": "like",
            "severity": "none",
            "start": 13,
            "tags": [
                "Social-Like"
            ],
            "type": "blacklist"
        }
    ],
    "metaMatches": [],
    "replacement": "******************"
}
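A response like this contains both the phrase match and the individual Blacklist matches it was built from. The Python sketch below shows one way to pick out only the phrase matches, using a trimmed copy of the example response. The helper names are hypothetical, and treating start and length as character offsets into the submitted content is an assumption that is consistent with the example values.

```python
import json

# Trimmed copy of the example response (two of the four matches).
RESPONSE_JSON = """
{
  "matches": [
    {"blacklistResult": "phrase", "length": 18, "matched": "buy facebook likes",
     "severity": "mild", "start": 0, "tags": ["Phrase"], "type": "blacklist"},
    {"blacklistResult": "basic", "length": 3, "matched": "buy",
     "severity": "none", "start": 0, "tags": ["Purchase"], "type": "blacklist"}
  ],
  "metaMatches": []
}
"""


def phrase_matches(response):
    """Return only the matches that were produced by a Blacklist Phrase."""
    return [
        match for match in response.get("matches", [])
        if match.get("type") == "blacklist"
        and match.get("blacklistResult") == "phrase"
    ]


def matched_span(content, match):
    """Slice the matched region out of the original content."""
    return content[match["start"]:match["start"] + match["length"]]


content = "buy facebook likes"
response = json.loads(RESPONSE_JSON)
for match in phrase_matches(response):
    print(match["severity"], match["tags"], repr(matched_span(content, match)))
```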

1.6. Hints

All regexes have Unicode and Ignore Case turned on by default. This matches the behavior of all of the other filters in CleanSpeak.
To match case-sensitive text, add the regex flag (?-i), which disables case insensitivity starting at the character after the flag. To turn it back on, use (?i), which enables case insensitivity starting at the character after the flag.
All Blacklist matches that contain tags used in the phrase will be converted to tag groups and can only be matched by a wildcard or a tag group.
All Blacklist matches that do not contain any tags used in the phrase will be left as their original text and can only be matched by standard regex mechanisms.
Every % signifies the start or end of a tag. To use a literal percent sign, escape it as \% (this includes percent signs inside a tag name).
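The percent-escaping hint can be illustrated with a small Python sketch. Both helpers are hypothetical and only approximate how a phrase pattern might be scanned: escape_percent produces the \% form for literal text, and tag_names lists the tags referenced by unescaped % pairs. The sketch does not handle an escaped backslash directly before a percent sign.

```python
import re

# A tag reference: an unescaped %, then a name made of escaped characters
# (such as \%) or any characters other than % and backslash, then a closing %.
TAG_REFERENCE = re.compile(r"(?<!\\)%((?:\\.|[^%\\])+)%")


def escape_percent(literal):
    """Escape literal percent signs so they are not read as tag delimiters."""
    return literal.replace("%", "\\%")


def tag_names(pattern):
    """List the tag names referenced in a phrase pattern, unescaping \\% to %."""
    return [
        match.group(1).replace("\\%", "%")
        for match in TAG_REFERENCE.finditer(pattern)
    ]


# Builds the pattern %Purchase%\s+100\%\s+%50\%-Off% where "100%" is literal
# text and "50%-Off" is a (hypothetical) tag name that contains a percent sign.
pattern = (
    "%Purchase%" + r"\s+" + escape_percent("100%")
    + r"\s+" + "%" + escape_percent("50%-Off") + "%"
)
print(pattern)
print(tag_names(pattern))
```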