On Fri, 2010-08-20 at 17:47 +0200, Karsten Bräckelmann wrote:
> On Fri, 2010-08-20 at 17:12 +0200, Jan P. Kessler wrote:
> > false-positives hitting on the rules JM_SOUGHT_1 and JM_SOUGHT_2.
> > Unfortunately I can not give examples as these messages contain
> > confidential customer data (assurance company). We had more than 100
> > false-positives with these rules in the last 2 days.
>
> I hope you can tell us the __SEEK_* sub-rules triggered, though. That
> would help already. To extract these, either (a) pipe such a message to
> spamassassin -D, and get the sub-rule from the debug output, or (b) add
> a specific header only showing the sub-rules.
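For option (b), here is a minimal Python sketch of pulling the __SEEK_* names out of a message that already carries the extra header. It assumes the header shows up as X-Spam-Subtests (SpamAssassin prefixes add_header names with "X-Spam-"). The function name seek_subrules and the sample sub-rule names are made up for illustration and are not real seek IDs.

```python
import email
import re


def seek_subrules(raw_message, header="X-Spam-Subtests"):
    """Return the sorted, de-duplicated __SEEK_* names found in `header`.

    Assumption: the message was processed with an add_header line that
    writes the comma-separated sub-test list into this header.
    """
    msg = email.message_from_string(raw_message)
    value = msg.get(header, "")
    # Sub-rule names consist of word characters only, so a comma or a
    # folded-header line break ends each match.
    return sorted(set(re.findall(r"__SEEK_\w+", value)))


# Hypothetical sample; the sub-rule names are placeholders.
sample = (
    "X-Spam-Subtests: __HAS_MSGID,__SEEK_EXAMPLE1,\n"
    "\t__SEEK_EXAMPLE2,__MIME_VERSION\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(seek_subrules(sample))  # ['__SEEK_EXAMPLE1', '__SEEK_EXAMPLE2']
```

Only the header is read, so the message body never needs to leave your machine.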
A word of caution: Do note that the seek sub-rules' names are generated
using a hash function, and thus identify the actual string matched! You
might want to check the string in 20_sought.cf before disclosing the
seek ID. I'd be surprised if it contains sensitive data, though -- after
all, it is found massively in spam.

>   spamassassin --cf="add_header all Subtests _SUBTESTS(,)_"
>
> Odds are, the FPs are some sort of stupid disclaimer that sneaked into
> the spam corpus.
>
> Once we know which sub-rule causes the FPs, and preferably get the full,
> original string, we can add the sample to the ham corpus, preventing the
> automated sought process from picking it up.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}