Multilingual Filtering: 5 Ways to Prevent False Positives
- By Sean Bryant
- Technology, Online Community, CleanSpeak
- April 30, 2013
Communities are no longer restricted by walls or boundaries. People from all over the world can congregate and share their thoughts and opinions with the click of a button. A site owner has an inherent responsibility to protect users and prevent unwanted content. The chat filter is your first line of defense, but when multiple languages find their way into the community, the filter can get confused and create false positives. Filtering multiple languages at the same time can quickly turn your leading advocates into antagonists.
1. Word Collision
Word collisions occur when filtering multiple languages from a central blacklist. A word in English does not necessarily mean the same thing in German or Spanish, and a sequence of letters that is harmless in one language can be profane in another. Filtering words and phrases in multiple languages within one community will therefore create false positives. As an example, the word “pupil” (the center of the eye) is harmless. When an “a” is placed at the end, “pupila,” it becomes derogatory. Be aware of the users in your online community and refine your filter based on the languages most commonly seen.
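One way to picture the fix is to keep a separate blacklist per locale and check a message only against the list for its own locale. The sketch below is a minimal illustration, not how any particular product works. The `BLACKLISTS` table, its placeholder entries, and the `flagged_words` helper are all assumptions made for this example.

```python
import re

# Hypothetical per-locale blacklists. The entries are placeholders for
# illustration only; a real deployment would load maintained word lists.
BLACKLISTS = {
    "en": {"badword"},
    "es": {"pupila"},
}

def flagged_words(message, locale):
    """Return the words in `message` found on the blacklist for `locale` only."""
    words = re.findall(r"\w+", message.lower())
    blocked = BLACKLISTS.get(locale, set())
    return [word for word in words if word in blocked]
```

With this layout, `flagged_words("pupila", "es")` flags the word, while the same message checked against the `"en"` list passes, so an entry added for one language cannot collide with another language's vocabulary.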
2. Phonetics & Signs
How words sound when letters are placed together, and the signs (symbols) some languages use to represent those sounds, can become an issue for your chat filter. Depending on the community, it is important to know which language is most commonly used and to refine the filter around that locale so that it does not create false positives. Other writing systems, such as the Cyrillic alphabet (кириллица) used for Russian, can also confuse the filter. Adding constraints and variations can help prevent unwanted actions the filter may take against characters and signs.
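Before deciding which rules to apply, a filter can check which writing systems a message uses. The sketch below is one rough approach that relies only on Python's standard `unicodedata` module. It assumes the first word of a letter's Unicode character name (for example `LATIN` or `CYRILLIC`) is a good enough stand-in for its script, which holds for common alphabets but is not a full implementation of the Unicode Script property. The `scripts_used` helper is a name invented for this example.

```python
import unicodedata

def scripts_used(text):
    """Return a set of rough script labels for the letters in `text`.

    The label is the first word of the Unicode character name,
    e.g. "LATIN" or "CYRILLIC". This is an approximation.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts
```

A message that mixes scripts inside a single word, such as a Latin word with one Cyrillic look-alike letter, returns more than one label and can be routed to stricter rules or to a moderator.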
3. Spanglish (Mixed Language)
Spanglish occurs when someone mixes English and Spanish within a sentence. Take the word “pinches.” In “I don’t like it when she pinches me,” it is harmless. In the Spanglish sentence “You pinches hombres, get a job,” it is a not-so-kind way of letting people know you don’t care too much for them. Inversoft’s CleanSpeak combats this issue with the flexibility of adding variations and filtering phrases in multiple languages (locales).
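The “pinches” example suggests matching on phrases rather than on single words. The following is a minimal sketch of that idea using plain regular expressions. It is not CleanSpeak's API, and the `BLOCKED_PHRASES` list and `contains_blocked_phrase` helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical phrase rules: the word is flagged only when it is followed
# by a Spanish noun, not when it appears as an ordinary English verb.
BLOCKED_PHRASES = [
    r"\bpinches\s+hombres\b",
]

def contains_blocked_phrase(message):
    """Return True if any blocked phrase pattern appears in `message`."""
    return any(
        re.search(pattern, message, re.IGNORECASE)
        for pattern in BLOCKED_PHRASES
    )
```

Under these rules, “I don’t like it when she pinches me.” passes and “You pinches hombres, get a job.” is flagged.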
4. Language Detection
The easiest way to prevent unwanted content when multiple languages are involved is to implement a language detection filter. Restricting a community to specified languages can prevent false positives from the very start. Make sure to let your users know that the online community they are entering is restricted. When users try to use a different language, or its signs or symbols, you can show a pop-up reminding them of the community guidelines.
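As a rough illustration of a language gate, the sketch below guesses a message's language by counting common function words. This is a deliberately naive stand-in for a real language detector, and it will misjudge short or mixed messages. The `STOPWORDS` table, `guess_language`, and `is_allowed` are assumptions made for this example.

```python
import re

# Tiny, hypothetical stopword samples; a real detector would use far more
# data (or a dedicated language-identification library).
STOPWORDS = {
    "en": {"the", "and", "is", "you", "to", "of", "it"},
    "es": {"el", "la", "y", "es", "de", "que", "los", "un"},
}

def guess_language(message):
    """Return the locale whose stopwords appear most often, or None."""
    words = re.findall(r"[^\W\d_]+", message.lower())
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def is_allowed(message, community_lang="en"):
    """Allow messages that match the community language or cannot be judged."""
    guess = guess_language(message)
    return guess is None or guess == community_lang
```

When `is_allowed` returns `False`, the site can show the reminder about community guidelines instead of silently rejecting the message.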
5. Language Rules
Chat filters come in all shapes and sizes. Some solutions offer extensive blacklists that are very difficult to maintain. What they make up for in numbers, they lack in language rules. The root of a word, its inflections, conjugations, phonetics, and even character substitutions (leet speak) are just a few of the many attributes to consider when looking for the right chat filtering solution. A naive filter will always need attention and must be constantly updated by someone who understands parts of speech and all the related variations. CleanSpeak makes it easy for the site owner by simplifying filter maintenance through automation.
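To see why rules scale better than ever-longer lists, consider normalizing a word before looking it up. The sketch below maps a few common leet-speak substitutions back to letters and collapses repeated characters, so one blacklist entry covers many spellings. The `LEET_MAP` table, the placeholder blacklist, and both helper functions are assumptions for illustration, and the approach ignores inflections and conjugations entirely.

```python
import re

# Hypothetical leet-speak substitutions; real filters handle many more.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

def normalize(word):
    """Lowercase, undo leet substitutions, and collapse repeated characters."""
    word = word.lower().translate(LEET_MAP)
    return re.sub(r"(.)\1+", r"\1", word)

# Placeholder entries, stored in normalized form so lookups line up.
BLOCKED = {normalize(word) for word in {"badword"}}

def is_blocked(word):
    """Return True if the normalized form of `word` is on the blacklist."""
    return normalize(word) in BLOCKED
```

Collapsing repeats also changes legitimate words (“good” becomes “god”), which is one way a rule that is too aggressive creates new false positives.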
There are many things to consider when providing an online community to engage your users. The more aware you are of the potential pitfalls and the tools available to manage them, the better prepared you will be to create a flourishing community.