On Thu, 2009-08-20 at 14:07 -0400, Alex wrote:
> A few of these have slipped through on my systems, but for the most
> part, these rules have worked here:
>
> mimeheader AS_090505_CDIS_INLINE Content-Disposition =~ /inline/
> mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/
> mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/
> mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/

All scored the same. Can be written as a single rule.

> meta AS_090508_PNGSPAM (AS_090505_CDIS_INLINE &&
> AS_090508_CTYP_PNG)
> meta AS_090508_JPGSPAM (AS_090505_CDIS_INLINE &&
> AS_090508_CTYP_JPG)
> meta AS_090508_JPEGSPAM (AS_090505_CDIS_INLINE &&
> AS_090508_CTYP_JPEG)

All scored the same. Can be written as a single rule. The processing
overhead gets costlier with each cascaded meta...

> meta LOCAL_BOTNET_JPG (BOTNET && AS_090508_JPGSPAM)
> meta LOCAL_BOTNET_JPEG (BOTNET && AS_090508_JPEGSPAM)

Sic. And it even missed the PNG part.

> The LOCAL_* are mine, adapted to others I found some time ago. I'd be
> interested in people's input on these. Can they be simplified? Do you
> agree with the scoring?

See above on the need for that fine-grained cannonade of single-purpose
special rules all targeting the same thing.

As for the scoring... No. An inline jpeg attachment of your friend's
holiday photos scores three times 0.5 = 1.5.

> How about bayes poisoning? The messages also all have random text,
> mostly spelled correctly, but nonsensical. If they are trained, could
> it adversely affect my bayes db?

Generally, no. A spam advertising body part enhancers also has correctly
spelled words. Training them doesn't "poison" Bayes either. And there
usually are still useful tokens around.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
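
For illustration, a minimal sketch of the "single rule" consolidation suggested in the reply. The rule names (__LOCAL_CDISP_INLINE, __LOCAL_CTYPE_IMG, LOCAL_BOTNET_INLINE_IMG) and the score are hypothetical placeholders, not taken from the original post. It assumes the MIMEHeader plugin is loaded and that BOTNET is supplied by the third-party Botnet plugin, as in the quoted rules.

```
# Sub-rules: the leading "__" keeps them unscored, so a harmless
# inline photo no longer picks up points from each component.
mimeheader __LOCAL_CDISP_INLINE  Content-Disposition =~ /inline/
# One alternation replaces the separate png / jpg / jpeg rules.
mimeheader __LOCAL_CTYPE_IMG     Content-Type =~ /\bimage\/(?:png|jpe?g)\b/i

# A single flat meta instead of cascaded metas; covers PNG as well.
meta       LOCAL_BOTNET_INLINE_IMG  (BOTNET && __LOCAL_CDISP_INLINE && __LOCAL_CTYPE_IMG)
describe   LOCAL_BOTNET_INLINE_IMG  Inline PNG/JPEG image relayed via a botnet-like host
score      LOCAL_BOTNET_INLINE_IMG  1.0   # placeholder value, tune locally
```

As with the quoted rules, each mimeheader test can match on any MIME part, so the meta does not guarantee that the inline disposition and the image type belong to the same part.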