Brian Pontarelli on Artificial Intelligence Systems
You can’t trust an artificial intelligence system to consistently protect your online community from inappropriate content. All artificial intelligence systems suffer from two critical flaws:
- Artificial intelligence systems constantly require costly training and retraining
- Retraining the artificial intelligence system leads to inconsistent performance
Every such system on the market today suffers from these flaws. In this 2-minute video, Inversoft CEO Brian Pontarelli explains why CleanSpeak is a different and more effective technology.
The sixth in a series of posts about the finer points of profanity filtering...
Embedded words occur when a dictionary word or proper name contains profanity:
- Don't assume profanity filters are inaccurate
- Harry Lipshitz has a hard time creating accounts on web sites
- This has been documented as the Scunthorpe problem
CleanSpeak's sophisticated profanity filter looks for dictionary words that contain profanity and safely ignores them during the filtering process. Poorly written filters often get caught on these simple cases and flag a large number of dictionary words as profanity. CleanSpeak checks against a large set of dictionary words and proper names in real time, over 140,000 in all, to handle this situation correctly and avoid a potentially large number of false positives without hindering performance.
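The idea of skipping dictionary words that merely contain a blocked string can be illustrated with a minimal sketch. This is not CleanSpeak's implementation; the `BLOCKED` and `DICTIONARY` sets and the `find_profanity` function are hypothetical names with tiny placeholder contents, whereas a real filter would load a far larger word list.

```python
import re

# Hypothetical, tiny word lists used only for illustration.
BLOCKED = {"ass"}
DICTIONARY = {"assume", "class", "passage"}

def find_profanity(text):
    """Return blocked strings found in text, ignoring known dictionary words."""
    hits = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in DICTIONARY:
            continue  # the whole token is a legitimate word, so skip it
        for bad in BLOCKED:
            if bad in token:
                hits.append(bad)
    return hits
```

With these lists, `find_profanity("Don't assume profanity filters are inaccurate")` returns an empty list, because "assume" is recognised as a dictionary word before the substring check runs.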
The fifth in a series of posts about the finer points of profanity filtering...
One of the more sophisticated attacks that users employ against profanity filters involves inserting separators, such as spaces or periods, between the other characters of a word so that the word can still easily be read.
The following examples illustrate how the simple process of inserting additional non-alphabetic characters between the characters of the word does not interfere with the reader's ability to identify the word correctly:
- s m u r f
- s....m u r....f
- I'm going to smash it (false positive!)
It might be difficult to see the profanity in #4, but if you look at the last 4 characters on their own, you'll see it.
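One way to catch separator attacks without simply deleting every space is sketched below. It assumes the placeholder word "smurf" from the examples above, and the `separator_pattern` and `contains_separated` helpers are hypothetical, not part of any real filter. The pattern allows any run of non-letters between the letters of the word, but requires that the match not start or end in the middle of a longer run of letters, which is one simple heuristic for avoiding false positives that span two ordinary words.

```python
import re

SEPARATORS = r"[^a-z]*"  # any run of non-letter characters, possibly empty

def separator_pattern(word):
    """Build a regex that matches word with separators between its letters."""
    body = SEPARATORS.join(re.escape(ch) for ch in word)
    # The lookarounds reject matches that begin or end inside a longer word.
    return re.compile(r"(?<![a-z])" + body + r"(?![a-z])")

def contains_separated(text, word="smurf"):
    """Return True if word appears in text, with or without separators."""
    return separator_pattern(word).search(text.lower()) is not None
```

Under these assumptions, both "s m u r f" and "s....m u r....f" are detected, while "his murf" is not, because the "s" belongs to another word. The trade-off is that an attacker who glues the word onto neighbouring letters would slip past this particular heuristic.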
The fourth in a series of posts about the finer points of profanity filtering...
Repeated characters are another commonly used filter attack, which involves simply repeating characters in a word. This straightforward tactic still fools many profanity filters, most of which are not designed to ignore multiple instances of the same character.
CleanSpeak's Profanity Filter is capable of detecting this type of filter attack and will correctly and automatically identify words regardless of the use of repeated characters.
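A common way to handle repeated characters, shown here only as a rough sketch and not as CleanSpeak's method, is to collapse every run of the same character to a single character in both the input and the blocked list before comparing. The placeholder word "smurf" and the names `collapse_repeats` and `has_repeated_attack` are assumptions made for illustration.

```python
import re

BLOCKED = {"smurf"}  # placeholder blocked word used for illustration

def collapse_repeats(word):
    """Collapse each run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", word.lower())

# Collapse the blocked words too, so legitimate double letters still compare equal.
BLOCKED_KEYS = {collapse_repeats(w) for w in BLOCKED}

def has_repeated_attack(text):
    """Return True if any token matches a blocked word once repeats are collapsed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(collapse_repeats(t) in BLOCKED_KEYS for t in tokens)
```

Collapsing is lossy: distinct words such as "good" and "god" reduce to the same key, so a production filter would need additional checks to avoid false positives.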
Profanity Filtering 101: The Grawlix
Profanity Filtering 101: Character Replacements & Leet Speak
Profanity Filtering 101: Phonetics
The third in a series of posts about the finer points of profanity filtering...
Phonetic replacement is the process of replacing characters with other alphabetic characters (or removing unnecessary characters) while still retaining the phonetic structure of the word. This tactic is often used to attack filters that do not understand phonetics:
- Teech me guitar
- Attak the main castle gate
Example #1 is a simple character swap of an "a" to an "e" that still retains the same phonetic structure of the word and allows the reader to infer the original word.
Example #2, on the other hand, shows character collapsing: the "ck" in the word "Attack" has been collapsed to a single "k" character.
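The two examples can be reproduced with a very small phonetic normaliser. This is a sketch under stated assumptions: the `RULES` table and the `phonetic_key` function are hypothetical and cover only a handful of spellings, whereas a real phonetic filter would need a much richer rule set.

```python
import re

# Hypothetical rewrite rules mapping spellings to a shared phonetic form.
RULES = [("ck", "k"), ("ea", "ee"), ("ph", "f")]

def phonetic_key(word):
    """Reduce a word to a rough phonetic key so respellings compare equal."""
    key = word.lower()
    for old, new in RULES:
        key = key.replace(old, new)
    # Collapse doubled letters so "ee" and "tt" do not affect the comparison.
    return re.sub(r"(.)\1+", r"\1", key)
```

With these rules, `phonetic_key("Teech")` equals `phonetic_key("teach")`, and `phonetic_key("Attak")` equals `phonetic_key("attack")`, so a filter comparing keys instead of raw spellings would treat each pair as the same word.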