On Fri, 2010-04-16 at 12:20 +0100, Matthew Newton wrote:
> We had a legitimate e-mail hit the JM_SOUGHT_3 yesterday. It also
> hit a few other rules that pushed it over our reject threshold of
> 10, and easily over the 'junk mail folder' level of 5.
>
> I managed to get them to send me the message, and it hits rule
> __SEEK_5ID3LI "Conti nuum Intern ational Publishing" (spaces
> added!) which is the name of their company.

Makes one wonder how that string ends up quite massively in spam traps.

> I know SOUGHT is an auto-generated ruleset; just wondering if
> there is there any way to remove false positives before the set is

Yes. The Seek bits are cross-checked against a ham corpus, so the
easiest way is to inject an artificial ham message with the string in
question to get it off of the next run.

> generated? Otherwise I'll add local rules to compensate against
> this one.

  meta __SEEK_5ID3LI  (0)

The Seek ID is constant, and will be the same even with later Sought
runs, for a given string.

  guenther

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}