yep -- feel free to send me copies of FP (false-positive) messages, or just the strings in them that matched
2010/4/16 Karsten Bräckelmann <guent...@rudersport.de>:
> On Fri, 2010-04-16 at 12:20 +0100, Matthew Newton wrote:
>> We had a legitimate e-mail hit the JM_SOUGHT_3 yesterday. It also
>> hit a few other rules that pushed it over our reject threshold of
>> 10, and easily over the 'junk mail folder' level of 5.
>>
>> I managed to get them to send me the message, and it hits rule
>> __SEEK_5ID3LI "Conti nuum Intern ational Publishing" (spaces
>> added!) which is the name of their company.
>
> Makes one wonder how that string ends up quite massively in spam traps.
>
>> I know SOUGHT is an auto-generated ruleset; just wondering if
>> there is any way to remove false positives before the set is
>
> Yes. The Seek bits are cross-checked against a ham corpus, so the
> easiest way is to inject an artificial ham message with the string in
> question to get it off of the next run.
>
>> generated? Otherwise I'll add local rules to compensate against
>> this one.
>
> meta __SEEK_5ID3LI (0)
>
> The Seek ID is constant, and will be the same even with later Sought
> runs, for a given string.
>
> guenther
>
>
> --
> char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}