CleanSpeak 3.0 Release & Updates

Mike King

CleanSpeak 3.0 Features

We are pleased to announce the release of CleanSpeak 3.0, the largest and most significant CleanSpeak release yet. It is packed with features and improvements that make it even faster and easier to use, including support for Google login. Let’s dig into some of the other new features.

BBCode

CleanSpeak now fully supports BBCode. You can send BBCode to CleanSpeak and it will properly filter and analyze it for profanity and other unwanted content, including BBCode attributes that may themselves contain profanity. CleanSpeak renders BBCode in queues and search results and can be configured to recognize the custom BBCode tags from your forum. Finally, you can configure how CleanSpeak filters and renders BBCode.
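
To illustrate why attributes matter, here is a minimal Python sketch (not CleanSpeak’s implementation) that splits a BBCode string into the plain-text runs and attribute values a filter would need to check. The simplified tag grammar, the `extract_filterable_text` function and the `DEFAULT_TAGS` set are assumptions made for illustration; passing extra tag names stands in for configuring a forum’s custom tags.

```python
import re

# Hypothetical, simplified BBCode grammar: [name], [name=value] or [/name].
TAG = re.compile(r"\[(/?)([A-Za-z0-9*]+)(?:=([^\]]*))?\]")

# Assumed default tag set; a forum would add its own custom tags.
DEFAULT_TAGS = frozenset({"b", "i", "u", "url", "quote", "img"})


def extract_filterable_text(bbcode: str, known_tags=DEFAULT_TAGS) -> list[str]:
    """Return the text runs and attribute values that should be filtered."""
    segments = []
    pos = 0  # end of the last recognized tag
    for match in TAG.finditer(bbcode):
        if match.group(2).lower() not in known_tags:
            continue  # unrecognized tag: leave it in place as ordinary text
        text = bbcode[pos:match.start()]
        if text.strip():
            segments.append(text)
        attribute = match.group(3)
        if attribute:
            segments.append(attribute)  # attributes can carry profanity too
        pos = match.end()
    tail = bbcode[pos:]
    if tail.strip():
        segments.append(tail)
    return segments
```

Each returned segment would then be passed through the profanity filter on its own. Treating unknown tags as text is a conservative choice in this sketch: a bracketed word that is not recognized as markup still gets checked.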

Profanity Filtering Techniques: Embedded Words

Brian Pontarelli

The sixth in a series of posts about the finer points of profanity filtering…

Embedded

Embedded words occur when a dictionary word or proper name contains profanity:

  1. Don’t assume profanity filters are inaccurate
  2. Harry Lipshitz has a hard time creating accounts on web sites
  3. This has been documented as the Scunthorpe problem
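
The Scunthorpe problem comes from naive substring matching. The Python sketch below contrasts that approach with a dictionary-aware check that skips words found on an allowlist of legitimate words. The tiny `BLACKLIST` and `ALLOWLIST` sets and both function names are hypothetical; this is not a description of how CleanSpeak handles embedded words.

```python
import re

# Hypothetical, tiny word lists for illustration only; a real filter
# would rely on large curated dictionaries.
BLACKLIST = {"ass", "shit", "cunt"}
ALLOWLIST = {"assume", "lipshitz", "scunthorpe", "class", "passage"}


def naive_matches(text: str) -> list[str]:
    """Plain substring search: flags a blacklisted term anywhere in the text."""
    lowered = text.lower()
    return sorted(term for term in BLACKLIST if term in lowered)


def dictionary_aware_matches(text: str) -> list[str]:
    """Flag words containing a blacklisted term unless the whole word is allowlisted."""
    hits = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in ALLOWLIST:
            continue  # legitimate word or name that merely embeds a term
        hits.extend(term for term in BLACKLIST if term in word)
    return sorted(hits)
```

The naive version flags all three example sentences, while the dictionary-aware version flags none of them but still catches an unlisted word such as "dumbass". The allowlist has to be large for this to work: with the tiny set above, an innocent word such as "harassment" would still be flagged.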

Profanity Filtering Techniques: Separators

Brian Pontarelli

Profanity Filter Separators

The fifth in a series of posts about the finer points of profanity filtering…

One of the more sophisticated attacks that users employ against profanity filters is inserting separators, such as spaces or periods, between the letters of a word so that the word remains easy to read.

The following examples show that inserting non-alphabetic characters between the letters of a word does not stop the reader from identifying the word correctly:

  1. s…….m…..u…..r……f
  2. s m u r f
  3. s….m u r….f
  4. I’m going to smash it (false positive!)

It might be difficult to see the profanity in #4, but if you read the last four letters together and ignore the space, you’ll see it.

Filters that handle separators naively, for example by removing every space before matching, will incorrectly flag this sentence as inflammatory and generate a false positive. Therefore, a filter must understand both how word separators behave within sentences and how they can be used in an attack.
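
A minimal Python sketch of this trade-off, assuming a toy blacklist in which "smurf" stands in for real profanity as in the examples: `naive_strip` deletes every separator before matching, which catches examples 1 to 3 but also flags example 4, while `separator_aware` only glues together runs of single-letter tokens. All names are hypothetical and the heuristic is deliberately simple; it would miss a split such as "sm ur f".

```python
import re

# "smurf" stands in for real profanity, as in the examples above.
BLACKLIST = {"smurf", "shit"}


def naive_strip(text: str) -> list[str]:
    """Remove every non-letter, then search: catches spaced-out words but over-matches."""
    squashed = re.sub(r"[^a-z]", "", text.lower())
    return sorted(term for term in BLACKLIST if term in squashed)


def join_single_letters(text: str) -> list[str]:
    """Join runs of one-letter tokens ('s m u r f' -> 'smurf'); keep normal words intact."""
    tokens = re.findall(r"[a-z]+", text.lower())
    words, run = [], []
    for token in tokens:
        if len(token) == 1:
            run.append(token)
            continue
        if run:
            words.append("".join(run))
            run = []
        words.append(token)
    if run:
        words.append("".join(run))
    return words


def separator_aware(text: str) -> list[str]:
    """Match whole (re-joined) words against the blacklist."""
    return sorted(word for word in join_single_letters(text) if word in BLACKLIST)
```

Because "smash" and "it" are ordinary multi-letter words, `separator_aware` never merges them, so the false positive in example 4 disappears while the spaced-out "smurf" variants are still caught.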

Profanity Filtering Techniques: Repeat Characters

Brian Pontarelli

Repeat Characters

The fourth in a series of posts about the finer points of profanity filtering…

Repeating characters is another common filter attack: the user simply repeats some of the characters in a word. This straightforward tactic still fools many profanity filters, most of which are not designed to ignore multiple instances of the same character:

  • heeeeeeeeeeellllllllllllooooooooooooo

CleanSpeak’s Profanity Filter detects this type of filter attack and automatically identifies the intended words, no matter how many times their characters are repeated.
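
Collapsing every run to a single letter is not enough, because many real words contain double letters ("hello" would become "helo"). One way to handle this, sketched below in Python under the assumption that a dictionary of known words is available, is to try every spelling in which each run is shortened to one or two copies and keep the first one found in the dictionary. The function names and the two-copy limit are assumptions, not CleanSpeak’s documented behavior.

```python
import itertools
import re


def run_variants(word: str, max_keep: int = 2):
    """Yield every spelling where each run of a repeated letter is cut to 1..max_keep copies."""
    # Split the word into (character, run length) pairs, e.g. "heel" -> h1, e2, l1.
    runs = [(m.group(1), len(m.group(0))) for m in re.finditer(r"(.)\1*", word.lower())]
    choices = [range(1, min(length, max_keep) + 1) for _, length in runs]
    for counts in itertools.product(*choices):
        yield "".join(ch * k for (ch, _), k in zip(runs, counts))


def normalize_repeats(word: str, dictionary: set[str]):
    """Return the first dictionary word reachable by shortening runs, or None."""
    for candidate in run_variants(word):
        if candidate in dictionary:
            return candidate
    return None
```

The number of candidates doubles with each repeated run, which stays small for ordinary words; a production filter would likely need a cheaper strategy for long inputs.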

Profanity Filtering Techniques: Phonetic Replacements

Brian Pontarelli

The third in a series of posts about the finer points of profanity filtering…

Inversoft Phonetics

Phonetic replacement is the process of replacing characters with other alphabetic characters (or removing unnecessary characters) while still retaining the phonetic structure of the word. This tactic is often used to attack filters that do not understand phonetics:

  1. Teech me guitar
  2. Attak the main castle gate

Example #1 is a simple character swap: the “a” in “teach” has been replaced with an “e”, which retains the word’s phonetic structure and still allows the reader to infer the original word.

Example #2, on the other hand, is an example of character collapsing: the “ck” in the word “attack” has been collapsed to a single “k”.

In some cases, characters can’t be collapsed without changing the meaning of the word. For example, “been” can’t be collapsed to “ben”. Therefore, a filter can’t simply ignore repeated characters that are phonetically the same; it has to understand whether the word can be collapsed.
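
As a rough illustration of checking whether a rewrite is safe, the Python sketch below applies one hypothetical phonetic rewrite at a time (`ee` to `ea`, `k` to `ck`, `ee` to `e`) and accepts a rewrite only if it produces a dictionary word. It also leaves tokens that are already valid words alone, so "been" is not collapsed to "ben". The rule list, the dictionary and the function names are assumptions for illustration; this is not Inversoft Phonetics.

```python
# Hypothetical rewrite rules: each pair swaps one spelling for a similar-sounding one.
RULES = [("ee", "ea"), ("k", "ck"), ("ee", "e")]


def single_rewrites(word: str):
    """Yield every spelling reachable by applying one rule at one position."""
    for old, new in RULES:
        start = word.find(old)
        while start != -1:
            yield word[:start] + new + word[start + len(old):]
            start = word.find(old, start + 1)


def phonetic_match(word: str, dictionary: set[str], keep_known_words: bool = True):
    """Map a misspelled token to a dictionary word via one phonetic rewrite, or None."""
    word = word.lower()
    if keep_known_words and word in dictionary:
        return word  # already a real word, e.g. "been": never collapse it
    for candidate in single_rewrites(word):
        if candidate in dictionary:
            return candidate
    return None
```

With `keep_known_words=False`, "been" is rewritten to "ben", which is exactly the mistake described above. The flag also shows the cost of the guard: a token such as "ben" is never rewritten, even when a user meant "been".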
