Profanity Filtering 101: Phonetics
- By Brian Pontarelli
- CleanSpeak
- August 29, 2013
The third in a series of posts about the finer points of profanity filtering...
Phonetic replacement means swapping characters in a word for other alphabetic characters, or removing redundant ones, while keeping the word's pronunciation intact. Users often rely on this tactic to get past filters that do not understand phonetics:
- Teech me guitar
- Attak the main castle gate
Example #1 is a simple character swap: the “a” in “Teach” becomes an “e”. The word still sounds the same, so the reader can easily infer the original.
Example #2, on the other hand, shows character collapsing: the “ck” in “Attack” has been collapsed to a single “k”.
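To make the two tactics concrete, here is a minimal Python sketch that reduces a word to a rough phonetic key before comparing it. The rule list and the names `RULES`, `phonetic_key`, and `sounds_like` are made up for illustration and cover only the two examples above. This is not how CleanSpeak is implemented.

```python
import re

# Hypothetical rewrite rules, just enough for the two examples above.
# Each rule maps a spelling to one canonical, similar-sounding spelling.
RULES = [
    (r"ck", "k"),   # collapse: "attack" -> "attak"
    (r"ea", "ee"),  # swap: "teach" -> "teech"
]

def phonetic_key(word):
    """Lowercase the word and apply every rewrite rule in order."""
    key = word.lower()
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    return key

def sounds_like(candidate, target):
    """True when both words reduce to the same phonetic key."""
    return phonetic_key(candidate) == phonetic_key(target)

print(sounds_like("Teech", "teach"))   # True
print(sounds_like("Attak", "attack"))  # True
print(sounds_like("guitar", "teach"))  # False
```

A real filter would need a far larger rule set, and the rules would have to be applied with care.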
In some cases, characters can't be collapsed without changing the meaning of the word. For example, “been” can't be collapsed to “ben”. A filter therefore can't simply ignore repeated characters that are phonetically the same. It has to understand whether a given word can be collapsed.
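One way to picture this is a per-word flag that says whether collapsing is allowed. The sketch below is only an illustration under that assumption. The `Entry` class, its `collapsible` field, and the sample words are hypothetical, and the source does not say how CleanSpeak decides this.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    word: str          # the word the filter is looking for
    collapsible: bool  # hypothetical flag: may repeated letters be merged?

def collapse(text):
    """Merge each run of a repeated character into a single character."""
    return re.sub(r"(.)\1+", r"\1", text.lower())

def matches(entry, token):
    """Exact match always counts; collapsed match only if the entry allows it."""
    token = token.lower()
    if token == entry.word:
        return True
    if entry.collapsible:
        return collapse(token) == collapse(entry.word)
    return False

# Blind collapsing turns "been" into "ben", which is the problem described above.
print(collapse("been"))                         # ben
print(matches(Entry("been", False), "ben"))     # False
print(matches(Entry("attack", True), "atack"))  # True
```

Keeping the decision on each entry means one word can tolerate collapsing while another cannot, at the cost of maintaining that flag for every word.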
This type of attack is common because most filters have no knowledge of phonetics. CleanSpeak's Profanity Filter correctly identifies the majority of these phonetic attacks. It understands that many characters can be swapped and that, in some words, multiple characters can be collapsed. It also knows which words can't be collapsed without changing their meaning.
Further Reading:
Profanity Filtering 101: The Grawlix
Profanity Filtering 101: Character Replacements & Leet Speak