Crashing Servers: A Stable Profanity Filter Solution
- By Brian Pontarelli
- CleanSpeak
- April 9, 2013
We have been hearing from prospective customers that their profanity filters have been crashing their servers. Sure, there are some good jokes about what people tend to do when servers go down, but in reality, when servers crash, users get angry, and angry users can have a big impact on your business. We have compiled two lists of suggestions that will help you prevent issues with your filter.
Picking a Good Filter
Picking a good profanity filter is important. Not only should you select a filter that has good accuracy and is customizable, but you should also pick a filter that can scale. Here are the 5 things you should look for to ensure your profanity filter won't crash your servers.
1. On-premise
The best way to reduce your risk of crashes is to use a filter that can scale. On-premise filters provide the lowest latency and the highest throughput. This means that even if you have massive spikes in traffic, the filter will keep up.
2. Avoid Regular Expressions
Most naive filters use regular expressions to filter your content. Regular expressions are often slow and can be memory intensive. If you are selecting a profanity filtering technology, be sure to verify that your preferred solution doesn't use regular expressions. Instead, look for a solution that uses a linear rule-based filter.
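To make the idea of a linear filter concrete, here is a minimal sketch that scans a message once and checks each word against a set. The blocklist, the `find_blocked_words` name, and the tokenization rule are all assumptions made for this example; it is not how any particular product works, and a real rule-based filter handles far more (spacing tricks, character substitutions, embedded words).

```python
# Hypothetical blocklist, for illustration only.
BLOCKED = {"darn", "heck"}


def find_blocked_words(message: str, blocked=BLOCKED) -> list[str]:
    """Return the blocked words found in message, in order of appearance.

    The message is read one character at a time and each completed word
    costs one set lookup, so the work grows linearly with message length.
    """
    hits = []
    token = []
    for ch in message + " ":  # the trailing space flushes the last word
        if ch.isalpha():
            token.append(ch.lower())
        elif token:
            word = "".join(token)
            if word in blocked:
                hits.append(word)
            token = []
    return hits
```

Because there is no backtracking, a long or deliberately awkward message costs only as much as its length, which is the property to look for when comparing filters.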
3. Don’t Let Watson Do Your Filtering
Similar to regular expressions, you should avoid filters that use artificial intelligence (AI). In most cases, filters that use AI don't handle spikes or large loads well. They also require expensive hardware and constant tuning to work properly. Instead, you should find a profanity filter that scales well on value-priced hardware and doesn't require a supercomputer to find f-bombs.
4. Load Test
Load testing is a vital component of any project. Verify that the filter you use has been extensively load tested. A good load test should show that the filter can handle hundreds of concurrent client requests and sustain throughput of at least 20,000 requests per second. These numbers should hold over long periods of time without the filter crashing. This will ensure that any spikes you encounter are easily managed by the filter.
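If you want to run a quick check of your own, a small driver like the one below is enough to get started. It is only a sketch: `run_load_test` and its parameters are names invented for this example, and `call` stands in for whatever function sends one request to your filter. A single Python process is unlikely to generate 20,000 requests per second by itself, so treat this as a smoke test rather than a replacement for a dedicated load testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(call, total_requests=1000, concurrency=100):
    """Invoke call(i) total_requests times across `concurrency` threads.

    Returns (requests_per_second, error_count). Any exception raised by
    `call` counts as one error.
    """
    def one_request(i):
        try:
            call(i)
            return 0
        except Exception:
            return 1

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        errors = sum(pool.map(one_request, range(total_requests)))
    elapsed = time.perf_counter() - start
    return total_requests / elapsed, errors
```

Run it for minutes or hours rather than seconds, and watch the error count as well as the throughput.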
5. Standards
Using a filter that requires a proprietary protocol and a custom-built client opens you up to the possibility of crashes. If the protocol implementation or client library has bugs, they could easily crash your server. Instead, use a filter that conforms to standards such as HTTP and REST. These filters are much less likely to fail on you because REST and HTTP are two of the most widely used standards in the world, and both have been thoroughly tested. Most programming languages come with HTTP libraries out of the box, and there are also hundreds of open source libraries for these protocols. As an added bonus, using a filter that is built on top of HTTP and REST can reduce your development costs and integration time.
Prevent Failures
Even if you use an awesome filter, the server it is running on might crash or other problems might cause it to stop working. Here are the 3 ways to ensure that your community isn't impacted on the off chance your filter crashes.
1. Timeouts
Ensure that the part of your code that calls the filter uses a timeout. Specifying a timeout will prevent filter failures from backing your server up and causing cascading failures. Depending on the filter you are using, setting timeouts to around 50ms is a good idea. Although this number might seem high to some people, remember that timeouts are the absolute upper limit for any request.
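Here is a minimal sketch of a filter call with a 50ms timeout, using only the Python standard library. The endpoint URL, the JSON request shape, and the `filter_message` name are assumptions for illustration; substitute whatever your filter's HTTP API actually expects.

```python
import json
import urllib.request

FILTER_URL = "http://localhost:8001/filter"  # hypothetical endpoint
TIMEOUT_SECONDS = 0.05  # 50ms upper limit per request


def filter_message(text, url=FILTER_URL, timeout=TIMEOUT_SECONDS,
                   urlopen=urllib.request.urlopen):
    """Return the filter's decoded JSON response, or None on failure.

    `urlopen` is a parameter only so the call can be swapped out in tests.
    """
    body = json.dumps({"content": text}).encode("utf-8")  # assumed payload
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())
    except (OSError, ValueError):
        # Timeouts and connection errors are OSError subclasses;
        # ValueError covers a response that is not valid JSON.
        return None
```

What to do when the call returns `None` (hold the message, reject it, or let it through) is a policy decision for your community; the important part is that a slow filter can no longer tie up your request threads.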
2. Automatic Throttling
In addition to adding timeouts, you should also include an automatic throttling system. This system should automatically stop sending messages to the filter if a certain number of errors occur in a specific time frame. For example, you might decide that 10 errors in 1 minute indicate a filter problem. In this case, you might want to stop sending requests to the filter for 5 minutes before trying again.
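The example numbers above (10 errors in 1 minute, then a 5 minute pause) can be sketched as a small class. `FilterThrottle` and its method names are invented for this example, and the sketch is not thread-safe; add a lock if several threads share one instance.

```python
import time
from collections import deque


class FilterThrottle:
    """Pause filter calls after too many errors in a sliding time window."""

    def __init__(self, max_errors=10, window_seconds=60, pause_seconds=300,
                 clock=time.monotonic):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self.pause_seconds = pause_seconds
        self.clock = clock  # injectable so tests can control time
        self.errors = deque()  # timestamps of recent errors
        self.paused_until = None

    def allow_request(self):
        """Return True if the filter may be called right now."""
        return self.paused_until is None or self.clock() >= self.paused_until

    def record_error(self):
        """Note one failed filter call and pause if the limit is reached."""
        now = self.clock()
        self.errors.append(now)
        # Forget errors that have fallen out of the window.
        while self.errors and now - self.errors[0] > self.window_seconds:
            self.errors.popleft()
        if len(self.errors) >= self.max_errors:
            self.paused_until = now + self.pause_seconds
            self.errors.clear()
```

Check `allow_request()` before each filter call and call `record_error()` whenever a request fails or times out. Once the pause expires, requests flow again and the error count starts from zero.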
3. Kill Switches
As a last resort, you should build in a global switch that disables chat (or other social features) in your community. If for some reason you see a problem with the filter (or anything else for that matter), you might want to turn off all social features until the problem can be resolved. Also, make this switch work as quickly as possible. You don't want to wait 30 minutes for your kill switch to take effect.
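A kill switch can be as simple as a flag that every chat request checks before doing anything else. The sketch below keeps the flag in memory; `KillSwitch` and `handle_chat_message` are hypothetical names, and in a deployment with several servers the flag would need to live somewhere all of them can read quickly.

```python
import threading


class KillSwitch:
    """A thread-safe on/off flag for social features."""

    def __init__(self):
        self._disabled = threading.Event()

    def disable_chat(self):
        self._disabled.set()

    def enable_chat(self):
        self._disabled.clear()

    def chat_enabled(self):
        return not self._disabled.is_set()


def handle_chat_message(switch, text, deliver):
    """Deliver text via `deliver` unless chat is switched off.

    Returns True if the message was delivered, False if it was dropped.
    """
    if not switch.chat_enabled():
        return False
    deliver(text)
    return True
```

Because the check happens on every message, flipping the switch takes effect on the very next request.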
Summary
Using these suggestions and tips for picking a good filter and integrating it properly can nearly eliminate the possibility of your profanity filter crashing your servers. This will help keep your users happy and help your community thrive and grow.
Further Reading:
Regex performance issues - Profanity filter
CleanSpeak's Profanity Filter Tech Specs
Profanity Filter using a Regular Expression
What's the best profanity filter which supports Java integration?