On Thu, 2009-08-20 at 14:07 -0400, Alex wrote:
> A few of these have slipped through on my systems, but for the most
> part, these rules have worked here:
> 
> mimeheader AS_090505_CDIS_INLINE  Content-Disposition =~ /inline/

> mimeheader AS_090508_CTYP_PNG     Content-Type =~ /image\/png/
> mimeheader AS_090508_CTYP_JPG     Content-Type =~ /image\/jpg/
> mimeheader AS_090508_CTYP_JPEG     Content-Type =~ /image\/jpeg/

All scored the same. Can be written as a single rule.

> meta       AS_090508_PNGSPAM      (AS_090505_CDIS_INLINE && AS_090508_CTYP_PNG)
> meta       AS_090508_JPGSPAM      (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPG)
> meta       AS_090508_JPEGSPAM     (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPEG)

All scored the same. Can be written as a single rule. And every cascaded
meta adds processing overhead...

> meta       LOCAL_BOTNET_JPG    (BOTNET && AS_090508_JPGSPAM)
> meta       LOCAL_BOTNET_JPEG    (BOTNET && AS_090508_JPEGSPAM)

Ditto. And it even misses the PNG variant.

> The LOCAL_* are mine, adapted to others I found some time ago. I'd be
> interested in people's input on these. Can they be simplified? Do you
> agree with the scoring?

See above: there is no need for that fine-grained cannonade of
single-purpose special rules all targeting the same thing.
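For illustration, a rough sketch of the consolidated version, assuming BOTNET is defined in your setup as in your original rules. The rule names below are made up. The double-underscore prefix keeps the sub-rules from scoring on their own, so only the final meta contributes points. The /i flag is my addition, since MIME header values are not guaranteed to be lower-case.

  # hypothetical names; __ sub-rules are not scored, only usable in metas
  mimeheader __LOCAL_CDIS_INLINE   Content-Disposition =~ /inline/i
  mimeheader __LOCAL_CTYP_IMAGE    Content-Type =~ /image\/(?:png|jpe?g)\b/i
  meta       __LOCAL_INLINE_IMAGE  (__LOCAL_CDIS_INLINE && __LOCAL_CTYP_IMAGE)
  meta       LOCAL_BOTNET_IMAGE    (BOTNET && __LOCAL_INLINE_IMAGE)

Score LOCAL_BOTNET_IMAGE as you see fit. An inline image on its own then adds nothing to the score.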

As for the scoring... No.  An inline JPEG of your friend's holiday
photos hits three of these rules at 0.5 each, for 1.5 points, without
being spam.


> How about bayes poisoning? The messages also all have random text,
> mostly spelled correctly, but nonsensical. If they are trained, could
> it adversely affect my bayes db?

Generally, no.  Spam advertising body-part enhancers also contains
correctly spelled words, and training it doesn't "poison" Bayes either.
Besides, there are usually still useful tokens in the message.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
