On Fri, 2010-08-20 at 17:12 +0200, Jan P. Kessler wrote:
> we use spamassassin with the sought ruleset since several years at our
> company. After the upgrade from 3.2.5 to 3.3.1 we notice tons of

The SA upgrade is unrelated. The sought rules are the same for both
versions and are frequently regenerated from recent spam. This is merely
a timing coincidence.

> false-positives hitting on the rules JM_SOUGHT_1 and JM_SOUGHT_2.
> Unfortunately I can not give examples as these messages contain
> confidential customer data (assurance company). We had more than 100
> false-positives with these rules in the last 2 days.

I hope you can tell us the __SEEK_* sub-rules triggered, though. That
would help already.

To extract these, either (a) pipe such a message to spamassassin -D and
get the sub-rule from the debug output, or (b) add a specific header
showing only the sub-rules.

  spamassassin --cf="add_header all Subtests _SUBTESTS(,)_"

Odds are, the FPs are some sort of stupid disclaimer that sneaked into
the spam corpus. Once we know which sub-rule causes the FPs, and
preferably get the full, original string, we can add the sample to the
ham corpus, preventing the automated sought process from picking it up.

  guenther

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
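
For option (b), once the extra header is present, the sub-rule names can
be collected with a few lines of script, so that no confidential message
content has to be shared. This is a minimal Python sketch, not part of
SpamAssassin. It assumes the header ends up named X-Spam-Subtests
(add_header normally prepends "X-Spam-"). The function name
seek_subrules and the sub-rule names in the sample are made up for
illustration.

```python
from email import message_from_string


def seek_subrules(raw_message, header="X-Spam-Subtests", prefix="__SEEK_"):
    """Return the sub-rule names starting with `prefix` found in `header`.

    Assumption: the header was added with
        add_header all Subtests _SUBTESTS(,)_
    so its value is a comma-separated list of sub-rule names.
    """
    msg = message_from_string(raw_message)
    hits = []
    # The header may occur more than once and may be folded over lines.
    for value in msg.get_all(header, []):
        for name in str(value).split(","):
            name = name.strip()
            if name.startswith(prefix) and name not in hits:
                hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical sample; real sub-rule names will differ.
    sample = (
        "Subject: test\n"
        "X-Spam-Subtests: __HAS_MSGID,__SEEK_AAAAAA,\n"
        " __SEEK_BBBBBB\n"
        "\n"
        "body text\n"
    )
    print(seek_subrules(sample))
```

Only the header is read, so the list of names can be posted without
exposing the message body.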