On Mon, 2013-10-28 at 19:30 -0400, Alex wrote:
> > > think I should have an exclusion for messages that contain a
> > > significant attachment.

> After thinking about it, I think I'd like to detect any attachment,
> including those images typically found in signatures.
> 
> >   mimeheader __MIME_IMAGE  Content-Type =~ /^image\/./

"Images typically found in signatures" usually are not attachments. They
are stuck together with the HTML in a multipart/related MIME part, and
addressed internally.

The Content-Type rule above matches any image, whether attached or displayed
inline (sic) in the HTML-formatted body. The latter typically have a
Content-Disposition of "inline" rather than "attachment", which is what the
following rule relies on.

> >   mimeheader __MIME_ATTACH Content-Disposition =~ /^attachment/
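To make the difference between the two mimeheader rules concrete, here is a minimal Python sketch (not SpamAssassin code) that applies the same two regexes to the headers of each MIME part. The sample message, the mime_rule_hits() helper and its output format are hypothetical, chosen only to show an inline signature-style image next to a real attachment:

```python
import re
from email import message_from_string

# Python equivalents of the two mimeheader regexes quoted above
MIME_IMAGE_RE = re.compile(r"^image/.")      # Content-Type =~ /^image\/./
MIME_ATTACH_RE = re.compile(r"^attachment")  # Content-Disposition =~ /^attachment/

# Hypothetical sample: HTML body with an inline (cid:) logo inside
# multipart/related, plus one real attachment.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/related; boundary="inner"

--inner
Content-Type: text/html; charset=us-ascii

<p>Hi</p><img src="cid:logo">
--inner
Content-Type: image/png
Content-ID: <logo>
Content-Disposition: inline

iVBORw0KGgo=
--inner--
--outer
Content-Type: image/jpeg
Content-Disposition: attachment; filename="photo.jpg"

/9j/4AAQ
--outer--
"""


def mime_rule_hits(raw: str) -> dict[str, list[str]]:
    """Map each rule name to the content types of the MIME parts it matches."""
    hits: dict[str, list[str]] = {"__MIME_IMAGE": [], "__MIME_ATTACH": []}
    for part in message_from_string(raw).walk():
        # Raw header values, as a mimeheader rule would see them
        ctype = part.get("Content-Type", "")
        disp = part.get("Content-Disposition", "")
        if MIME_IMAGE_RE.search(ctype):
            hits["__MIME_IMAGE"].append(part.get_content_type())
        if MIME_ATTACH_RE.search(disp):
            hits["__MIME_ATTACH"].append(part.get_content_type())
    return hits


print(mime_rule_hits(SAMPLE))
```

The image rule fires on both the inline logo and the attached photo, while the disposition rule fires only on the attached photo.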

> I'll start with the mimeheader. Just use it in my meta?

Given your original request (quoted first sentence)... Yes.

Which flavor to use depends on what you ultimately want: images or attachments.


> >> I'd appreciate it if someone could help me review my rules and show me
> >> where they're going wrong. Some of it is adapted from John's work back
> >> in April, I think.

> > As I understand it, this weird-at-first-sight stuff is designed to match a
> > (raw)body with <= 200 chars, and prevent FPing on just slightly
> > exceeding the chunk size, no?
> 
> I think so. I was hoping John had time to chime in here, as he
> explained it once to me, but it was never fully clear to me.

I'd be curious to hear that, too. :)


> > However, since the chunk size is 1-2 kB, __RB_LE_200 cannot match more
> > than once. Even worse, it may match the last chunk with a total size
> > more than 200 bytes. The last constraint in the meta prevents this FP,
> > not the 'equals 1' test.
> 
> Chunk size is the buffer size, the amount SA processes at a time?

Yes. For rawbody rules, the entire raw body gets split up into chunks of
1-2 kB. The rawbody rules are matched against the chunks individually.
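A rough Python simulation of why per-chunk matching matters for the "<= 200" test. Assumptions, all hypothetical: chunks are a fixed 1024 characters (SpamAssassin's real chunks are roughly 1-2 kB and not reproduced here), RB_SHORT_CHUNK stands in for the original __RB_LE_200 whose exact definition is not quoted in this thread, and the counts are simply "number of chunks each regex matched" in this model:

```python
import re

# Assumption for illustration: fixed 1024-character chunks
CHUNK_SIZE = 1024

# Mirrors the posted rule: rawbody __RB_GT_200 /^.{201}/s
RB_GT_200 = re.compile(r"^.{201}", re.DOTALL)
# Hypothetical "chunk of at most 200 chars" test
RB_SHORT_CHUNK = re.compile(r"\A.{1,200}\Z", re.DOTALL)


def chunks(body: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split the raw body into consecutive chunks of at most `size` chars."""
    return [body[i:i + size] for i in range(0, len(body), size)]


def count_hits(body: str) -> tuple[int, int]:
    """Return (chunks matched by the short test, chunks matched by __RB_GT_200)."""
    parts = chunks(body)
    short = sum(1 for c in parts if RB_SHORT_CHUNK.search(c))
    longer = sum(1 for c in parts if RB_GT_200.search(c))
    return short, longer


def body_le_200(body: str) -> bool:
    """Models: meta __BODY_LE_200 (__RB_LE_200 == 1) && !__RB_GT_200"""
    short, longer = count_hits(body)
    return short == 1 and longer == 0


for n in (150, 1100):
    print(n, count_hits("x" * n), body_le_200("x" * n))
```

For a 1100-character body the short 76-character tail chunk still matches once, so "== 1" alone would be a false positive; it is the !__RB_GT_200 term, hit by the first full chunk, that rejects it.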


> Okay, I've modified the rule:
> 
> rawbody __RB_GT_200 /^.{201}/s
> meta __BODY_LE_200 (__RB_LE_200 == 1) && !__RB_GT_200

That one is useless after turning __RB_LE_200 into a meta. Oh, and you
really don't have to include my # comment. It was just meant to
emphasize the point of simple logic.

> meta __RB_LE_200  !__RB_GT_200    # less or equal IFF not greater
> mimeheader __MIME_IMAGE  Content-Type =~ /^image\/./
> mimeheader __MIME_ATTACH Content-Disposition =~ /^attachment/
> meta        LOC_SHORT   (__BODY_LE_200 && __HAS_HTTP_URI && (__MIME_IMAGE || __MIME_ATTACH) && (!(BAYES_00 || USER_IN_WHITELIST || KHOP_RCVD_TRUST)))

Your original request was to EXCLUDE messages with attachments. The
logic goes like this:

  ORIGINAL_CONSTRAINTS  && ! MIME_ATTACHMENT

That modified rule however *requires* an attachment or image. Go grab a
large, black coffee...
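The inversion is easy to see if the meta is written out as plain boolean logic. This is a minimal Python model, not SpamAssassin code: a message is represented by the hypothetical set of rule names that hit, and "ORIGINAL_CONSTRAINTS" is taken to be the non-MIME part of the posted meta:

```python
# Hypothetical model: a message is the set of rule names that hit on it.
EXCLUDE = {"BAYES_00", "USER_IN_WHITELIST", "KHOP_RCVD_TRUST"}


def original_constraints(hits: set[str]) -> bool:
    """The non-MIME part of the posted LOC_SHORT meta."""
    return ("__BODY_LE_200" in hits
            and "__HAS_HTTP_URI" in hits
            and not (hits & EXCLUDE))


def loc_short_as_posted(hits: set[str]) -> bool:
    """Posted logic: *requires* an image or an attachment."""
    return original_constraints(hits) and bool(hits & {"__MIME_IMAGE", "__MIME_ATTACH"})


def loc_short_exclusion(hits: set[str]) -> bool:
    """Intended logic: ORIGINAL_CONSTRAINTS && !MIME_ATTACHMENT."""
    return original_constraints(hits) and "__MIME_ATTACH" not in hits


plain = {"__BODY_LE_200", "__HAS_HTTP_URI"}
with_attachment = plain | {"__MIME_ATTACH"}
print(loc_short_as_posted(plain), loc_short_exclusion(plain))                      # False True
print(loc_short_as_posted(with_attachment), loc_short_exclusion(with_attachment))  # True False
```

Swapping __MIME_ATTACH for __MIME_IMAGE (or excluding both) in loc_short_exclusion() is the "which flavor" choice discussed earlier.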


> I seem to remember it being necessary to specify a beginning bound for
> the __RB_GT_200 rule, but it now seems to work without that, as you've
> specified.

A boundary is not necessary, but anchoring at the very beginning of the
string with /^/s might speed up the RE, if only insignificantly.
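A small Python check that anchoring does not change what the "more than 200 chars" test matches. Python's re stands in for Perl here, and the gt_200() helper is hypothetical. On a non-matching string the unanchored form may retry at every starting offset, while the anchored one only needs offset 0; how much that matters depends on the engine, and this sketch does not measure it:

```python
import re

ANCHORED = re.compile(r"^.{201}", re.DOTALL)   # /^.{201}/s
UNANCHORED = re.compile(r".{201}", re.DOTALL)  # /.{201}/s


def gt_200(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern finds 201 characters (newlines included) in text."""
    return pattern.search(text) is not None


sample = "ab\n" * 200  # 600 chars, including newlines
for n in (0, 200, 201, 600):
    text = sample[:n]
    print(n, gt_200(ANCHORED, text), gt_200(UNANCHORED, text))
```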


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
