Best Bayesian Spam Filters - Bayesian Email Filter

In this article we discuss the inner workings of a Bayesian spam filter.

Which spam filter is best for you? There are a number of methods out there for identifying spam. Keyword search, message hashing, real-time blackhole lists (RBLs), forged header detection and Bayesian filters are just a few. Most anti-spam products today employ several of these techniques, with varying degrees of effectiveness. Today we will be discussing “Bayesian filters”: how they work, why they work, and when they don’t work.

A Bayesian filter is a filter that learns from experience. After an email is classified as spam or legitimate, the filter adds statistics about that email to its recognition table for future reference.
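The article does not spell out the formula behind the name, so the following is only the textbook form of Bayes' rule applied to a single word w, not the math of any particular product. Here "legit" stands for legitimate mail, and the probabilities on the right-hand side are the kind of statistics a recognition table would hold:

```latex
P(\text{spam} \mid w) =
  \frac{P(w \mid \text{spam})\,P(\text{spam})}
       {P(w \mid \text{spam})\,P(\text{spam}) + P(w \mid \text{legit})\,P(\text{legit})}
```

A word that turns up often in spam and rarely in legitimate mail pushes this value toward 1; a word that is equally common in both leaves it near the overall share of spam in your mail.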

A grossly oversimplified example: there are 2 messages in your inbox, one spam and one not. Message 1 (the spam) contains a single sentence: Hi cat. Message 2 also contains a single sentence: Hi dog. The filter looks at the total number of words in each message, compares the distinct words found in spam with those found in legitimate mail, and comes up with: Hi (0), Cat (.5) spam, Dog (.5) not spam. The word “Hi” occurs equally in both kinds of message, so it counts for nothing. The word Cat makes up 50% of its message and occurs only in spam, and the word Dog makes up 50% of its message and occurs only in legitimate email.

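Now let’s take a third message for the filter to learn from, Message 3: Hi dog how are you? We have identified this message as legitimate too. Each of its five words makes up .20 of the message, so dog scores .20 here, as do how, are, and you. Averaging dog’s .5 from Message 2 with its .20 from Message 3 gives .35, and Hi stays at 0 because it still appears in both spam and legitimate mail. So now our filter database is Hi (0), Cat (.5) spam, Dog (.35) not spam, how/are/you (.20) not spam.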
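The arithmetic in this toy example can be reproduced with a few lines of Python. This is only a sketch of the simplified scoring described above, not real Bayesian probability; the function name `toy_scores` and the rule "a word seen in both kinds of mail scores 0" are assumptions made to match the numbers in the example.

```python
from collections import defaultdict

def toy_scores(messages):
    """messages: list of (text, label), label being 'spam' or 'not spam'.

    Returns {word: (score, label)} using the simplified arithmetic of the
    cat/dog example. Hypothetical helper, not a real Bayesian calculation.
    """
    shares = defaultdict(list)  # word -> its share of each message it appears in
    labels = defaultdict(set)   # word -> labels of the messages it appears in
    for text, label in messages:
        words = [w.strip(".,!?").lower() for w in text.split()]
        words = [w for w in words if w]
        for word in set(words):
            shares[word].append(words.count(word) / len(words))
            labels[word].add(label)

    table = {}
    for word, seen in labels.items():
        if len(seen) > 1:
            # Seen in both spam and legitimate mail: no evidence either way.
            table[word] = (0.0, "neutral")
        else:
            # Average the word's share over the messages that contain it.
            average = sum(shares[word]) / len(shares[word])
            table[word] = (average, next(iter(seen)))
    return table

inbox = [("Hi cat.", "spam"), ("Hi dog.", "not spam")]
print(toy_scores(inbox))

# Teach the filter a third, legitimate message and look at the table again.
inbox.append(("Hi dog how are you?", "not spam"))
print(toy_scores(inbox))
```

Running it before and after the third message shows the score for dog moving from .5 to .35 while cat is unchanged, which is the "learning" the example is meant to illustrate.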

This adaptive nature is exactly why Bayesian filters are so attractive to some. Even after the spammers have figured out how to get past the keyword searches and most other methods, the new “words” they create by misspelling and scrambling never appear in legitimate emails. Once the filter has learned them from a few spam messages, they count as strong spam indicators.

One of the aspects of a Bayesian filter that separates it from other anti-spam methods is that it is adaptive and based on the user. The filter learns from the mail YOU get. This is extremely helpful in situations where typical spam words are used in your workplace every day. Take a financial institution, for example: the words “mortgage” and “cash” will probably show up constantly, so much so that regular spam blocking techniques would prevent legitimate emails from being delivered. With a Bayesian filter in place, however, the rules are totally different. Because the emails you “taught it with” included those terms, it does not see them as spam indicators. What is more, it will be able to tell the difference between spam coming in with the word “mortgage” in it and internal messages with the word “mortgage” in them. The differences will be in word location, word count and supporting words, all things the Bayesian filter takes into account before it makes its final judgment about the fate of the email.
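To make the per-user effect concrete, here is a minimal sketch of a textbook naive Bayes classifier with add-one smoothing. It is not the algorithm of any particular product, it looks only at word counts (not word location), and the class name `NaiveBayesFilter`, the tiny training sets and the labels "spam"/"ham" are all invented for illustration.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Textbook multinomial naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        stripped = (w.strip(".,!?:;").lower() for w in text.split())
        return [w for w in stripped if w]

    def train(self, text, label):
        # "Teaching" the filter is just adding this message's word counts.
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed prior, so an untrained class never produces log(0).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for word in self.tokenize(text):
                # Add-one smoothing, with one extra slot for unseen words.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores back into P(spam | message).
        diff = min(log_score["ham"] - log_score["spam"], 700.0)  # avoid overflow
        return 1.0 / (1.0 + math.exp(diff))

# A filter taught with a (made-up) financial institution's mail.
bank = NaiveBayesFilter()
bank.train("mortgage application approved, cash transfer scheduled", "ham")
bank.train("quarterly mortgage rates report attached", "ham")
bank.train("free pills, click now", "spam")

# A filter taught with a (made-up) home user's mail.
home = NaiveBayesFilter()
home.train("dinner on friday?", "ham")
home.train("photos from the trip attached", "ham")
home.train("low mortgage rates, get cash now", "spam")

message = "mortgage cash report"
print("bank:", round(bank.spam_probability(message), 3))
print("home:", round(home.spam_probability(message), 3))
```

The same message gets a low spam probability from the filter trained on the bank's mail and a high one from the filter trained on the home user's mail, because each has only ever seen “mortgage” and “cash” on one side of the fence.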

To truly get good results out of a Bayesian filter, you need to “feed” it lots of good input. In my experience, several hundred spam and non-spam messages make it extremely effective. The actual math behind the filter is unique to each product because, although the basic concept is well known, the application requires a liberal dose of creativity. Some of the more advanced conditions that the filter can include in its “dictionary” of values are word location, word presence relative to email size, and whether the subject line also appears in the message body. There are of course many more, and the number will probably increase as long as email spam continues to increase.
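One way such extra conditions could be fed to a filter is to turn them into additional “words” in the dictionary. The sketch below is a guess at how that might look, not a description of any product: the function name `extract_features`, the prefixes `subject:`, `body:`, `dense:` and `meta:`, and the 20% density threshold are all hypothetical.

```python
import re

WORD = re.compile(r"[a-z0-9$'-]+")

def extract_features(subject, body, dense_share=0.2):
    """Turn one email into a list of feature strings (hypothetical scheme)."""
    subject_words = WORD.findall(subject.lower())
    body_words = WORD.findall(body.lower())

    # Word location: the same word counts separately in subject and body.
    features = [f"subject:{w}" for w in subject_words]
    features += [f"body:{w}" for w in body_words]

    # Word presence relative to email size: flag words that make up a
    # large share of the body (threshold chosen arbitrarily here).
    for w in sorted(set(body_words)):
        if body_words.count(w) / len(body_words) >= dense_share:
            features.append(f"dense:{w}")

    # Subject line existence in the message body (whole words only).
    subject_text = " " + " ".join(subject_words) + " "
    body_text = " " + " ".join(body_words) + " "
    if subject_words and subject_text in body_text:
        features.append("meta:subject-in-body")
    return features

print(extract_features("Cheap mortgage",
                       "Cheap mortgage rates. Get a cheap mortgage today!"))
```

Each feature string can then be counted and scored exactly like an ordinary word, so the filter learns for itself whether, say, “mortgage” in the subject line means something different from “mortgage” in the body.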

With this kind of processing applied to each email, it is easy to see that spammers have a very difficult time tricking Bayesian filters. As a matter of fact, the only thing they can do is try to keep their mail as short and neutral as possible. This, to the spammer’s dismay, soon becomes another signature of spam emails because of the adaptive nature of these filters. In brief, Bayesian filters hold a lot of promise when “taught right” (much like our kids), and they have the potential to virtually eliminate spam from your inbox.

I hope this has been instructive for you. Remember to take care and have fun.
