On Fri, 2010-08-20 at 17:47 +0200, Karsten Bräckelmann wrote:
> On Fri, 2010-08-20 at 17:12 +0200, Jan P. Kessler wrote:
> > false-positives hitting on the rules JM_SOUGHT_1 and JM_SOUGHT_2.
> > Unfortunately I cannot give examples, as these messages contain
> > confidential customer data (assurance company). We had more than 100
> > false-positives with these rules in the last 2 days.
> 
> I hope you can tell us the __SEEK_* sub-rules triggered, though. That
> would help already. To extract these, either  (a) pipe such a message to
> spamassassin -D, and get the sub-rule from the debug output, or  (b) add
> a specific header only showing the sub-rules.

A word of caution:  Do note that the seek sub-rules' names are generated
using a hash function, and thus identify the actual string matched!

You might want to check the string in 20_sought.cf, before disclosing
the seek ID. I'd be surprised if it contains sensitive data, though;
after all, it appears in large volumes of spam.
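To check locally which string a __SEEK_ ID stands for, a plain grep for
the name in 20_sought.cf is usually enough. Below is a minimal Python
sketch of the same lookup. It assumes the sought rules use ordinary
SpamAssassin definition lines of the form "body __SEEK_NAME /pattern/".
The function names and the script name are hypothetical, and the
location of 20_sought.cf depends on your installation.

```python
import re
import sys
from pathlib import Path

# Matches a rule definition line such as "body __SEEK_ABC123 /pattern/".
# Assumption: sought sub-rules follow the usual "<type> <NAME> <pattern>"
# SpamAssassin syntax; meta and describe lines are deliberately not matched.
DEF_RE = re.compile(r"^\s*(body|rawbody|header|uri|full)\s+(\S+)\s+(.*\S)\s*$")


def find_rule_definitions(cf_text: str, rule_name: str) -> list[str]:
    """Return the pattern part of every definition of rule_name in cf_text."""
    patterns = []
    for line in cf_text.splitlines():
        m = DEF_RE.match(line)
        if m and m.group(2) == rule_name:
            patterns.append(m.group(3))
    return patterns


def lookup_in_file(cf_path: str, rule_name: str) -> list[str]:
    # cf_path is wherever your copy of 20_sought.cf lives (site-specific).
    text = Path(cf_path).read_text(encoding="utf-8", errors="replace")
    return find_rule_definitions(text, rule_name)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 lookup_seek.py /path/to/20_sought.cf __SEEK_XXXX
    for pattern in lookup_in_file(sys.argv[1], sys.argv[2]):
        print(pattern)
```

If the printed pattern contains nothing sensitive, the seek ID itself is
safe to post.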


>   spamassassin --cf="add_header all Subtests _SUBTESTS(,)_"
> 
> Odds are, the FPs are some sort of stupid disclaimer that sneaked into
> the spam corpus.
> 
> Once we know which sub-rule causes the FPs, and preferably get the full,
> original string, we can add the sample to the ham corpus, preventing the
> automated sought process from picking it up.
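With the add_header approach quoted above, SpamAssassin prefixes the
header name with "X-Spam-", so the sub-rules should appear in an
X-Spam-Subtests header as a comma-separated list. Below is a minimal
Python sketch that pulls only the __SEEK_* entries out of such a
message. The function name is hypothetical, and it assumes the header
was added exactly as in the command above.

```python
import email
from email import policy


def seek_subtests(raw_message: bytes, header: str = "X-Spam-Subtests") -> list[str]:
    """Return the __SEEK_* sub-rule names listed in the given header."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    names = []
    # Check every copy of the header, in case it was added more than once.
    for value in msg.get_all(header, []):
        # Flatten any folding, split on the separator chosen in _SUBTESTS(,)_.
        for name in str(value).replace("\n", " ").split(","):
            name = name.strip()
            if name.startswith("__SEEK_"):
                names.append(name)
    return names
```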

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
