Available since 3.18.0
The phrase API that existed before 3.18.0 has been removed, and existing phrases will not be automatically migrated to this new version of phrases.
1. Filtering Phrases
The Blacklist Phrase tool is a powerful addition to CleanSpeak that matches text against a regex with Blacklist tags embedded in it. Phrases match in the same way as a normal Blacklist entry and use the same set of rules in the application configuration as a Blacklist entry.
1.1. Creating a Phrase by Example
Consider the case where you want to prevent users from trying to sell social media likes.
First, you will need a good base set of words that you want blocked when they appear together. Add these words as standard Blacklist entries, each with its associated tag. A good starting point would be the table below.
Tag | Words
Purchase | buy
Company | facebook
Social-Like | like
A Blacklist Phrase can then be created from these tags. The phrase uses a regex to find any region of text in which those tags appear. To match someone saying "Buy facebook likes", the phrase uses the regex %Purchase%\s+%Company%\s+%Social-Like%. This matches a Purchase word, a Company word, and a Social-Like word, separated by one or more whitespace characters. The whole region of text in the input is matched just like a regex, but because each tagged word is matched through its Blacklist entries, the phrase also handles variations. For example, Buuyy FaaceeBok Likees and many other variations can still be easily matched by the phrase.
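To make the idea concrete, the following Python sketch approximates a phrase by replacing each %Tag% with an alternation of the words that carry that tag, then running an ordinary case-insensitive regex search. It is a simplification for illustration only, not how CleanSpeak evaluates phrases: the BLACKLIST dictionary and the expand_phrase function are hypothetical, and because the sketch matches only the literal words listed, it does not catch variations such as Buuyy FaaceeBok Likees the way real Blacklist entries do.

```python
import re

# Hypothetical sample Blacklist: tag name -> words carrying that tag.
# Real Blacklist entries also match variations; this sketch matches only these literal words.
BLACKLIST = {
    "Purchase": ["buy", "purchase"],
    "Company": ["facebook", "instagram"],
    "Social-Like": ["like", "likes"],
}

# A tag reference: text between two % signs (escaping is ignored in this sketch).
TAG = re.compile(r"%([^%]+)%")


def expand_phrase(pattern: str, blacklist: dict[str, list[str]]) -> re.Pattern:
    """Replace each %Tag% with a non-capturing alternation of that tag's words."""

    def to_group(m: re.Match) -> str:
        # Longest words first so "likes" is preferred over "like".
        # Raises KeyError if the pattern names a tag missing from the blacklist.
        words = sorted(blacklist[m.group(1)], key=len, reverse=True)
        return "(?:" + "|".join(re.escape(w) for w in words) + ")"

    # Case-insensitive, mirroring the default described in the Hints section.
    return re.compile(TAG.sub(to_group, pattern), re.IGNORECASE)


phrase = expand_phrase(r"%Purchase%\s+%Company%\s+%Social-Like%", BLACKLIST)
m = phrase.search("Buy facebook likes")
print(m.group(0) if m else None)  # Buy facebook likes
```

Sorting each tag's words longest first matters because regex alternation takes the first alternative that succeeds; without it, the match would stop at "like" and miss the final "s".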
1.2. How to create a Phrase from the UI
You can create a Phrase from your management interface. In the pattern, each tag is written as %TagName%, where TagName is the tag you are interested in matching. Use a tag that includes only the Blacklist entries that make sense in that position of the regex. You can also click the pills below the pattern field to quickly add a tag to the pattern.
Phrases also have a severity, a locale, and tags, which classify how an application should treat them and whether they should be ignored by the minimum severity setting on the filter. Then click Save (top right) to create the phrase.
1.3. How to create a Phrase from the API
Below is an example of creating a phrase using the API.
POST /filter/blacklist/phrase
{
"phrase": {
"locale": "en",
"pattern": "%Purchase%.+%Company%",
"severity": "mild",
"tags": [ "Phrase" ]
}
}
Request Body
phrase.locale [String] required
The locale of the pattern/phrase. See Locales.
phrase.pattern [String] required
A regex pattern that can optionally include Blacklist tags to match on Blacklist entry matches. Tags are of the form %TagName%.
phrase.severity [String] required
The severity of the phrase (for example, none, mild, or medium).
phrase.tags [Array<String>] required
A list of Blacklist tag names that this phrase should be associated with. At least one tag is required.
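The request above can also be assembled programmatically. The following minimal Python sketch uses only the standard library and builds the request without sending it. The base URL is a placeholder, the function name build_phrase_request is hypothetical, and sending the API key in the Authorization header is an assumption carried over from how the filter endpoint is authenticated.

```python
import json
import urllib.request


def build_phrase_request(base_url: str, api_key: str, locale: str, pattern: str,
                         severity: str, tags: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST /filter/blacklist/phrase request.

    Hypothetical helper: the Authorization header usage is assumed, not documented
    for this endpoint.
    """
    # Client-side checks mirroring the fields marked "required".
    for name, value in (("locale", locale), ("pattern", pattern), ("severity", severity)):
        if not value:
            raise ValueError(f"phrase.{name} is required")
    if not tags:
        raise ValueError("phrase.tags requires at least one tag")

    body = {"phrase": {"locale": locale, "pattern": pattern,
                       "severity": severity, "tags": list(tags)}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/filter/blacklist/phrase",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_phrase_request("https://cleanspeak.example.com", "my-api-key",
                           "en", "%Purchase%.+%Company%", "mild", ["Phrase"])
```

Passing the returned object to urllib.request.urlopen sends it. Validating the required fields up front means an obviously incomplete phrase fails before any network call.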
Response Body
phrase.id [Integer]
The Id of the phrase.
phrase.locale [String]
The locale of the phrase. See Locales.
phrase.pattern [String]
A regex pattern that can optionally include Blacklist tags to match on Blacklist entry matches. Tags are of the form %TagName%.
phrase.severity [String]
The severity of the phrase (for example, none, mild, or medium).
phrase.status [String]
The status of the phrase in the filter. The API always returns ACTIVE.
phrase.tags [Array<String>]
A list of Blacklist tag names that this phrase is associated with.
{
"phrase": {
"id": 1,
"locale": "en",
"pattern": "%Purchase%.+%Company%",
"severity": "mild",
"status": "ACTIVE",
"tags": [ "Phrase" ]
}
}
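A client can map the response body onto a typed object. The sketch below assumes the response field names listed above (id, locale, pattern, severity, status, tags); the Phrase class and the parse_phrase_response function are hypothetical helpers, not part of any CleanSpeak client library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """Hypothetical client-side view of the documented response fields."""
    id: int
    locale: str
    pattern: str
    severity: str
    status: str
    tags: tuple[str, ...]


def parse_phrase_response(body: str) -> Phrase:
    """Convert a {"phrase": {...}} response body into a Phrase."""
    p = json.loads(body)["phrase"]
    return Phrase(
        id=p["id"],
        locale=p["locale"],
        pattern=p["pattern"],
        severity=p["severity"],
        status=p["status"],
        tags=tuple(p["tags"]),  # immutable, so the frozen dataclass stays hashable
    )
```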
1.4. How to configure an Application to use the Phrases
Phrase application settings share the configuration of the Blacklist entries. To handle a phrase match automatically, create a rule that has either no locale or the locale of the phrase, and that contains at least one tag from the phrase. The most severe rule matching a phrase is then applied during moderation.
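The rule-selection logic described above can be sketched as follows. Everything here is hypothetical: the Rule class, its fields, and the rule_for_phrase function are illustrative names, and the numeric ranking in SEVERITY_RANK is an assumed ordering of severities used only so that "most severe" can be computed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed ordering of severities, lowest to highest (illustrative only).
SEVERITY_RANK = {"none": 0, "mild": 1, "medium": 2, "high": 3, "severe": 4}


@dataclass
class Rule:
    name: str                       # hypothetical label for the rule
    severity: str
    tags: set[str] = field(default_factory=set)
    locale: Optional[str] = None    # None means "no locale": applies to every locale


def rule_for_phrase(rules, phrase_locale, phrase_tags):
    """Return the most severe rule that applies to a phrase match, or None."""
    candidates = [
        r for r in rules
        # The rule has no locale or the phrase's locale, and shares at least one tag.
        if (r.locale is None or r.locale == phrase_locale) and r.tags & set(phrase_tags)
    ]
    return max(candidates, key=lambda r: SEVERITY_RANK[r.severity], default=None)


rules = [
    Rule("flag-any-locale", "mild", {"Phrase"}),
    Rule("reject-en", "medium", {"Phrase"}, "en"),
    Rule("reject-fr", "severe", {"Phrase"}, "fr"),  # different locale: ignored for "en"
    Rule("pii", "high", {"PII"}),                   # no shared tag: ignored
]
print(rule_for_phrase(rules, "en", ["Phrase"]).name)  # reject-en
```

Note that a rule without a locale still competes with locale-specific rules; the locale only decides whether a rule is eligible, and severity decides which eligible rule wins.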
1.5. Example Filter Request
POST /content/item/filter/
Request Headers
Authorization [String]
The API key of your application is required to make a request to the webservice. You can find it in the CleanSpeak Management Interface.
{
"content": "buy facebook likes"
}
{
"matches": [
{
"blacklistResult": "phrase",
"length": 18,
"locale": "en",
"matched": "buy facebook likes",
"quality": 1,
"root": "%Purchase%\\s+%Company%\\s+%Social-Like%",
"severity": "mild",
"start": 0,
"tags": [
"Phrase"
],
"type": "blacklist"
},
{
"blacklistResult": "basic",
"length": 3,
"locale": "en",
"matched": "buy",
"quality": 1,
"root": "buy",
"severity": "none",
"start": 0,
"tags": [
"Purchase"
],
"type": "blacklist"
},
{
"blacklistResult": "basic",
"length": 8,
"locale": "en",
"matched": "facebook",
"quality": 1,
"root": "facebook",
"severity": "medium",
"start": 4,
"tags": [
"Company",
"PII"
],
"type": "blacklist"
},
{
"blacklistResult": "basic",
"length": 5,
"locale": "en",
"matched": "likes",
"quality": 1,
"root": "like",
"severity": "none",
"start": 13,
"tags": [
"Social-Like"
],
"type": "blacklist"
}
],
"metaMatches": [],
"replacement": "******************"
}
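When handling this response, a client often wants only the phrase matches, because the individual words that make up the phrase are also reported as basic matches. The following Python sketch keeps matches whose blacklistResult is phrase and uses start and length to slice the matched region from the original content. The phrase_matches function is a hypothetical helper, and the sample is a trimmed copy of the response above.

```python
import json


def phrase_matches(content: str, response_body: str):
    """Yield (text, start, end, severity, tags) for each phrase match."""
    for m in json.loads(response_body).get("matches", []):
        if m.get("type") == "blacklist" and m.get("blacklistResult") == "phrase":
            start, end = m["start"], m["start"] + m["length"]
            yield content[start:end], start, end, m["severity"], m["tags"]


content = "buy facebook likes"
# Trimmed copy of the response above: one phrase match and one basic match.
response_body = json.dumps({
    "matches": [
        {"blacklistResult": "phrase", "length": 18, "locale": "en",
         "matched": "buy facebook likes", "quality": 1,
         "root": r"%Purchase%\s+%Company%\s+%Social-Like%",
         "severity": "mild", "start": 0, "tags": ["Phrase"], "type": "blacklist"},
        {"blacklistResult": "basic", "length": 3, "locale": "en", "matched": "buy",
         "quality": 1, "root": "buy", "severity": "none", "start": 0,
         "tags": ["Purchase"], "type": "blacklist"},
    ],
    "metaMatches": [],
    "replacement": "******************",
})

for text, start, end, severity, tags in phrase_matches(content, response_body):
    print(text, start, end, severity, tags)  # buy facebook likes 0 18 mild ['Phrase']
```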
1.6. Hints
- All regexes have Unicode and case-insensitive matching turned on by default. This matches the behavior of all the other filters in CleanSpeak.
- To match case-sensitive text, add the regex flag (?-i) to disable case insensitivity starting at the character after that flag. To turn it back on, use (?i), which enables case insensitivity starting at the character after that flag.
- All Blacklist matches that contain tags used in the phrase are converted to tag groups and can only be matched by a wildcard or a tag group.
- All Blacklist matches that do not contain any tags used in the phrase are left as their original text and can only be matched by standard regex mechanisms.
- Every % signifies the start or end of a tag. To use a literal percent sign, escape it as \% (this includes inside a tag name!).
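The escaping rule for % can be illustrated with a small tokenizer that splits a pattern into tag references and plain regex text. This is a hypothetical Python sketch of the documented syntax, not CleanSpeak's parser: split_pattern is an illustrative name, and keeping \% in plain regex text relies on \% being a valid regex escape for a literal percent sign.

```python
def split_pattern(pattern: str) -> list[tuple[str, str]]:
    """Split a phrase pattern into ("tag", name) and ("regex", text) parts.

    An unescaped % opens or closes a tag. The escape \\% stands for a literal
    percent sign: inside a tag name it becomes %, and in regex text it is kept
    as \\% so the regex engine matches a literal percent sign.
    """
    parts: list[tuple[str, str]] = []
    buf: list[str] = []
    in_tag = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "%":
                buf.append("%" if in_tag else "\\%")
            else:
                buf.append(ch + nxt)  # other escapes such as \s or \\ pass through
            i += 2
            continue
        if ch == "%":
            if in_tag:
                parts.append(("tag", "".join(buf)))
            elif buf:
                parts.append(("regex", "".join(buf)))
            buf = []
            in_tag = not in_tag
        else:
            buf.append(ch)
        i += 1
    if in_tag:
        raise ValueError("unterminated tag: missing closing %")
    if buf:
        parts.append(("regex", "".join(buf)))
    return parts


print(split_pattern(r"%Purchase%\s+%Company%"))
# [('tag', 'Purchase'), ('regex', '\\s+'), ('tag', 'Company')]
```

Consuming each backslash together with the character after it keeps a sequence such as \\% (a literal backslash followed by a tag delimiter) from being misread as an escaped percent sign.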