Hi all,

Maybe this list can give me some feedback on a plugin I wrote a few weeks ago. The plugin is based on parts of the 'NiXSpam' project by the German IT magazine iX. NiXSpam is an elaborate procmail recipe (for more info see http://www.heise.de/ix/nixspam/; the page is in German, though). It uses a neat way of computing hashes from the body of mails to detect highly similar ones, which are probably spam.

One example:
Given a mail that has at least 16 spaces (or tabs) in its body, NiXSpam does the following:
- reduce every run of a repeated [:space:] character to a single one
- remove all characters of the [:graph:] class
- then compute an MD5 hash of what is left and compare it to the hashes already known.
(For procmail code see end of post)
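
For those who don't read procmail, here is a rough Python sketch of the three steps above. It is only my approximation of what the tr/md5sum pipeline does, assuming the C locale (ASCII character classes); the function name and the min_spaces parameter are made up for illustration and are not part of NiXSpam or the plugin.

```python
import hashlib
import re


def nixspam_style_hash(body: bytes, min_spaces: int = 16):
    """Return an MD5 hex digest of the whitespace 'skeleton' of a mail body,
    or None if the body has fewer than min_spaces spaces/tabs."""
    # The procmail recipe only fires for bodies with at least 16 spaces/tabs.
    if body.count(b" ") + body.count(b"\t") < min_spaces:
        return None
    # Like tr -s '[:space:]': squeeze each run of the SAME whitespace
    # character down to one (mixed runs such as " \t" are left alone).
    squeezed = re.sub(rb"([ \t\n\r\x0b\x0c])\1+", rb"\1", body)
    # Like tr -d '[:graph:]': drop all visible ASCII characters (0x21-0x7e).
    skeleton = re.sub(rb"[\x21-\x7e]", b"", squeezed)
    return hashlib.md5(skeleton).hexdigest()
```

Two mails that differ only in their visible characters but share the same whitespace layout get the same digest, which is the point of the exercise.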
Now, in NiXSpam this is a purely local thing: hashes are written to a file, and procmail subsequently just greps that file. However, in March somebody volunteered to feed the hashes computed by the iX mail server into a blacklist DNS server. I subsequently wrote a plugin for SA (http://wiki.apache.org/spamassassin/iXhash) and even managed to set up another DNS server with hashes from our own spam.
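
To illustrate the DNS side, here is a minimal Python sketch of how such a lookup could work. It assumes the hash blacklist follows the usual DNSBL convention (query "<hash>.<zone>", and any answer means "listed"); that is an assumption on my part for this example, and the zone name is a placeholder, not a real server. The function names are hypothetical as well.

```python
import socket


def blacklist_query_name(md5_hex: str, zone: str) -> str:
    """Build the DNS name to look up for a given hash (assumed DNSBL style)."""
    return f"{md5_hex.lower()}.{zone.strip('.')}"


def is_listed(md5_hex: str, zone: str) -> bool:
    """Return True if the name resolves, i.e. the hash is on the list."""
    try:
        socket.gethostbyname(blacklist_query_name(md5_hex, zone))
    except socket.gaierror:
        # No such name (or lookup failure): treat the hash as not listed.
        return False
    return True


# Placeholder zone; substitute the zone of the blacklist you actually use.
print(blacklist_query_name("D41D8CD98F00B204E9800998ECF8427E",
                           "hashbl.example.invalid"))
```

Note that the sketch treats a failed lookup the same as "not listed", so a DNS outage lets mail through rather than blocking it.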

Maybe someone is interested in trying out the plugin and reporting some results? I, for my part, was surprised to find that the spam hitting us is apparently quite different from the spam hitting iX. BTW: the tests against our blacklist and against iX's each hit about 50% of all incoming mails, and I have yet to find a false positive. For me, this works.

Dirk

PS: I think I should mention
- Bert Ungerer (iX)
- Manuel Schmitt, who hosts iX's blacklist
- KungFuHasi, who posted the Perl code computing the hashes @ heise

In procmail, the steps mentioned above read as follows:
:0B
* -15^0
# This checksum requires at least 16 spaces/tabs:
* 1^1 [     ]
{
 :0 bw
 md5hash=|tr -s '[:space:]' \
         |tr -d '[:graph:]' \
         |md5sum
 # Is the hash already in the file?
 :0 Aw
 * ? fgrep -s $md5hash $HASHFILE
 { KNOWN=YES }
}
