Hi,

> To tell you the truth I'm losing ground lately against spammers. Two
> reasons. The Image spam is getting through and because it poisons the
> bayes I've lost much of the effectiveness of bayes filtering. I'm still
> holding on but I've had people who I hosted for for over a year who
> never had a single spam who are now getting a few. I am also having a
> few more false positives than I used to.


I'm having success here detecting image spam using the OSBF-Lua filter.

From the OSBF-Lua website:

"OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module 
for text classification. It is a port of the OSBF classifier implemented in 
the CRM114 project. This implementation attempts to put focus on the 
classification task itself by using Lua as the scripting language, a powerful 
yet light-weight and fast language, which makes it easier to build and test 
more elaborated filters and training methods.

The OSBF algorithm is a typical Bayesian classifier but enhanced with two 
techniques that I originally developed for the CRM114 project: Orthogonal 
Sparse Bigrams - OSB, for feature extraction, and the Exponential 
Differential Document Count - EDDC (a.k.a Confidence Factor) for automatic 
feature selection. Combined, these two techniques produce a highly accurate 
classifier. OSBF was developed focused on two classes, SPAM and NON-SPAM, so 
the performance for more than two classes may not be the same."
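
For anyone who hasn't met OSB before, here is a minimal Python sketch of the feature extraction idea described in the quote. It is only an illustration, not the OSBF-Lua code (which is a C module): the function name, the whitespace tokenization and the window size of 5 are my assumptions. Each token is paired with each of the preceding tokens inside the window, and the number of skipped tokens between them is kept as part of the feature.

```python
def osb_features(tokens, window=5):
    """Return orthogonal sparse bigrams as (left, gap, right) tuples.

    Hypothetical helper for illustration only. `gap` is the number of
    tokens skipped between `left` and `right`; `window` (assumed 5) is
    the size of the sliding window, so gap ranges from 0 to window - 2.
    """
    features = []
    for i, right in enumerate(tokens):
        # Pair the current token with each earlier token in the window,
        # nearest first.
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break
            features.append((tokens[j], dist - 1, right))
    return features


if __name__ == "__main__":
    for feature in osb_features("buy cheap pills now".split()):
        print(feature)
```

A real classifier would hash these pairs into a fixed-size table and keep per-class counts; that part is left out here.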



OSBF-Lua learns very fast. It only requires Lua 5.1, built with dynamic
loading enabled, to be installed on the Exim server.
See the install doc: http://osbf-lua.luaforge.net/#installation


In exim.conf I added these statements:

In the ## ON CONFIGURATION SETTINGS ## section:

# set OSBF_LUA_DIR to where spamfilter.lua, spamfilter_command.lua etc. were
# installed
OSBF_LUA_DIR=/usr/local/osbf-lua


In the ## TRANSPORTS CONFIGURATION ## section, add transport_filter to the
local_delivery transport:

local_delivery:
   driver = appendfile
   check_string = ""
   create_directory
   delivery_date_add
   directory = ${home}/Maildir/
   directory_mode = 700
   envelope_to_add
   return_path_add
   group = mail
   maildir_format
   maildir_tag = ,S=$message_size
   message_prefix = ""
   message_suffix = ""
   mode = 0600
   quota = ${lookup{$local_part}lsearch*{/etc/mail/quota_usr}{$value}    {4M}}
   quota_size_regex = S=(\d+)$
   quota_warn_threshold = 75%
   transport_filter = OSBF_LUA_DIR/spamfilter.lua --udir $home/osbf-lua


that's it!! :)


Verify your setup by sending a message to yourself with the following in the
subject line: help <your password>

You will receive a message with help on the spam filter.

To verify that the databases were created correctly, send yourself a message
with this subject: stats <your password>

From now on, all messages that you receive will be classified and tagged
according to the score they get:

Tag     Score               Meaning

[--]    score <= -20        almost certainly spam
[-]     -20 < score < 0     probably spam (reinforcement zone)
[+]     0 <= score < 20     probably not spam (reinforcement zone)
[++]    score >= 20         almost certainly not spam

The [++] tag is listed here just for symmetry; it is not used. An empty tag
is used in its place so as not to pollute the messages.
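
To make the boundaries explicit, here is a small Python sketch of the score-to-tag mapping described above. It is not taken from spamfilter.lua; the function name is made up and the threshold of 20 is simply the value listed in this message.

```python
def tag_for_score(score, threshold=20):
    """Map a classifier score to the subject tag described above.

    Hypothetical helper for illustration; `threshold` (20) is the value
    from this message, not read from any OSBF-Lua configuration.
    """
    if score <= -threshold:
        return "[--]"   # almost certainly spam
    if score < 0:
        return "[-]"    # probably spam (reinforcement zone)
    if score < threshold:
        return "[+]"    # probably not spam (reinforcement zone)
    return ""           # almost certainly not spam: empty tag, not "[++]"


if __name__ == "__main__":
    for s in (-35, -20, -5, 0, 12, 20):
        print(s, repr(tag_for_score(s)))
```

Messages that land in the two reinforcement zones are the ones most worth checking and training on.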


If the classification is wrong, you must train the filter by sending the
message back to yourself, replacing the subject with the corresponding
training command:

learn <password> spam
or
learn <password> nonspam
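
To show the shape of these subject-line commands, here is a rough Python sketch of how such a subject could be parsed. This is not how spamfilter.lua does it; the function name and the regular expression are my own, and it covers only the three commands mentioned in this message (help, stats, learn).

```python
import re

# Only the commands mentioned in this message; the real filter may accept more.
_COMMAND_RE = re.compile(r"^(help|stats|learn)\s+(\S+)(?:\s+(spam|nonspam))?\s*$")


def parse_subject_command(subject, password):
    """Return (command, label) for a valid command subject, else None.

    Hypothetical helper for illustration. `label` is "spam" or "nonspam"
    for learn, and None for help and stats.
    """
    match = _COMMAND_RE.match(subject.strip())
    if match is None:
        return None
    command, given_password, label = match.groups()
    if given_password != password:
        return None              # wrong password: treat as ordinary mail
    if command == "learn":
        # learn needs a class label
        return ("learn", label) if label is not None else None
    if label is not None:
        return None              # help/stats take no label
    return (command, None)


if __name__ == "__main__":
    print(parse_subject_command("learn secret spam", "secret"))
    print(parse_subject_command("stats secret", "secret"))
    print(parse_subject_command("Re: lunch tomorrow", "secret"))
```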


After training on a few messages, OSBF-Lua's spam detection accuracy will
improve.
If you have a database of pre-classified messages (nonspam / spam) in an IMAP
folder, you can use the script toer.lua to do the training.


Regards,

Marlon

-- 
## List details at http://www.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
