On Sat, Sep 26, 2009 at 11:11:05AM -0400, Adam Katz wrote:
> Karsten Bräckelmann wrote:
> > > This is a plain RE rule I once wrote, to limit some rule to really short
> > > messages only.
> > >
> > > rawbody __KB_RAWBODY_200 /^.{0,200}$/s
>
> Warren Togami mused:
> > I suspect meta limiting Adam's IXHASH rules with a minimum size subrule
> > would eliminate many of the IXHASH false positives. I was using his
> > IXHASH plugin for a while, but stopped because I noticed too many FP's
> > on short e-mails. I wonder if his IXHASH plugin is suitable to put into
> > the sandbox for actual statistical testing.
>
> Quick note - iXhash isn't mine. The project is the brainchild of Dirk
> Bonengel, http://dbonengel.users.sourceforge.net/#, who was inspired by
> NiX Spam (by Bert Ungerer). The credits at http://ixhash.sf.net/ don't
> actually mention Dirk (Dirk -- take credit!).

FYI: the current iXhash has many bugs, which I noticed while working on my
own version with SA native DNS lookups.

One of the bigger problems of iXhash is probably historical in nature:
there is no decoding of messages (base64 etc.).

Look at method #1, which is supposed to apply to messages with 20 spaces
and 2 newlines:

    if (($body =~ /(?>\s.+?){20}/g) || ( $body =~ /\n.*\n/ ) ){

Since the two conditions are mistakenly OR'd (||) instead of AND'd (&&),
it is enough that the mail has two newlines. Short base64 messages in
particular are basically hashed from a few newlines and equal signs, so
even completely different contents end up with the same hash.

When I fixed this, hash #1 was for some reason rarely generated for a mail.
The /(?>\s.+?){20}/g clause seemed to match only when there are 20
whitespace characters on the same line, which rarely happens. Changing it
to /(?:\s.+?){20}/s worked, but some foreign mails made the RE hang for
tens of seconds, so I rewrote it in a completely different way.

If someone wants to have a look, here is my unofficial version.
Practically all the FPs I got are gone.

    http://sa.hege.li/iXhash2.pm
    http://sa.hege.li/iXhash2.cf

I've let Dirk know about the bugs; we'll see what the future brings. Maybe
a real iXhash2 that actually does decoding etc. I'm sure there could be
many more enhancements, so I think this is a good time for many eyes to
take a serious look at the REs and methods! These bugs went unnoticed for
quite a long time.
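
To illustrate the OR-versus-AND point without any regex backtracking, here is a minimal sketch. It is written in Python rather than Perl, and it is not the code from iXhash2.pm. The function names, the default thresholds as parameters, and the choice to count every whitespace character (roughly what \s matches, newlines included) are assumptions made for this example only.

```python
def count_whitespace_and_newlines(body: str) -> tuple[int, int]:
    """Return (whitespace count, newline count) in one linear pass each."""
    whitespace = sum(1 for ch in body if ch.isspace())  # assumption: any whitespace counts
    newlines = body.count("\n")
    return whitespace, newlines


def buggy_or_check(body: str, min_ws: int = 20, min_nl: int = 2) -> bool:
    """Mimics the OR'd condition: either threshold alone lets the mail through."""
    whitespace, newlines = count_whitespace_and_newlines(body)
    return whitespace >= min_ws or newlines >= min_nl


def fixed_and_check(body: str, min_ws: int = 20, min_nl: int = 2) -> bool:
    """Both thresholds must hold before the hash would be computed."""
    whitespace, newlines = count_whitespace_and_newlines(body)
    return whitespace >= min_ws and newlines >= min_nl


# Hypothetical sample bodies, made up for this illustration.
short_base64 = "SGVsbG8gd29ybGQ=\nSGVsbG8gYWdhaW4=\n"
prose = "word " * 25 + "\nsecond line here\nthird line\n"

for name, body in (("short base64", short_base64), ("prose", prose)):
    print(name, "OR:", buggy_or_check(body), "AND:", fixed_and_check(body))
```

Counting characters directly runs in time proportional to the message length, so it cannot hang on unusual input the way a nested-quantifier RE can.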