On Mon, 2013-10-28 at 19:30 -0400, Alex wrote:
> > > think I should have an exclusion for messages that contain a
> > > significant attachment.
> After thinking about it, I think I'd like to detect any attachment,
> including those images typically found in signatures.
>
> >   mimeheader __MIME_IMAGE   Content-Type =~ /^image\/./

"Images typically found in signatures" usually are not attachments. They
are stuck together with the HTML in a multipart/related MIME part, and
addressed internally.

The content-type rule above matches any image, attached or displayed
inline (sic) in the HTML formatted body. Hence, the latter typically
have a Content-Disposition of "inline".

> >   mimeheader __MIME_ATTACH  Content-Disposition =~ /^attachment/

> I'll start with the mimeheader. Just use it in my meta?

Given your original request (quoted first sentence)... Yes.

Which flavor depends on what you ultimately want. Images or attachments.


> >> I'd appreciate it if someone could help me review my rules and show me
> >> where they're going wrong. Some of it is adapted from John's work back
> >> in April, I think.

> > I understand this on first sight weird stuff is designed to match a
> > (raw)body with <= 200 chars, and prevent FPing on just slightly
> > exceeding the chunk size, no?
>
> I think so. I was hoping John had time to chime in here, as he
> explained it once to me, but it was never fully clear to me.

I'd be curious to hear that, too. :)

> > However, since the chunk size is 1-2 kB, __RB_LE_200 cannot match more
> > than once. Even worse, it may match the last chunk with a total size
> > more than 200 bytes. The last constraint in the meta prevents this FP,
> > not the 'equals 1' test.
>
> Chunk size is buffer size, the amount SA processes at a time?

Yes. For rawbody rules, the entire raw body gets split up into chunks of
1-2 kB. The rawbody rules are matched against the chunks individually.

> Okay, I've modified the rule:
>
> rawbody    __RB_GT_200    /^.{201}/s
> meta       __BODY_LE_200  (__RB_LE_200 == 1) && !__RB_GT_200

That one is useless after turning __RB_LE_200 into a meta.

Oh, and you really don't have to include my # comment.
It was just meant to emphasize the point of simple logic.

> meta       __RB_LE_200    !__RB_GT_200  # less or equal IFF not greater
> mimeheader __MIME_IMAGE   Content-Type =~ /^image\/./
> mimeheader __MIME_ATTACH  Content-Disposition =~ /^attachment/
> meta LOC_SHORT (__BODY_LE_200 && __HAS_HTTP_URI &&
> (__MIME_IMAGE || __MIME_ATTACH) && (!(BAYES_00 || USER_IN_WHITELIST ||
> KHOP_RCVD_TRUST)))

Your original request was to EXCLUDE messages with attachments. The
logic goes like this:

  ORIGINAL_CONSTRAINTS && ! MIME_ATTACHMENT

That modified rule however *requires* an attachment or image. Go grab a
large, black coffee...

> I seem to remember it being necessary to specify a beginning bound for
> the __RB_GT_200 rule, but it now seems to work without that, as you've
> specified.

A boundary is not necessary, but anchoring at the very beginning of the
string /^/s might insignificantly speed up the RE.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
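
P.S. A minimal sketch of how the pieces discussed in this message could
fit together, assuming the goal is still the original one: a short body
that contains a URI and carries NO attachment. It assumes __HAS_HTTP_URI
is defined elsewhere in your rules, drops the now redundant
__BODY_LE_200, and excludes on __MIME_ATTACH only; use __MIME_IMAGE
instead (or in addition) if images are what you want to exclude. Score
and describe lines are left out. Untested, so lint it before use.

```
# Matches any rawbody chunk longer than 200 characters.
rawbody    __RB_GT_200    /^.{201}/s

# Less or equal IFF not greater.
meta       __RB_LE_200    !__RB_GT_200

# A MIME part explicitly marked as an attachment.
mimeheader __MIME_ATTACH  Content-Disposition =~ /^attachment/

# ORIGINAL_CONSTRAINTS && ! MIME_ATTACHMENT
meta       LOC_SHORT      (__RB_LE_200 && __HAS_HTTP_URI && !__MIME_ATTACH && !(BAYES_00 || USER_IN_WHITELIST || KHOP_RCVD_TRUST))
```

The only logical change against the quoted LOC_SHORT is the negation:
the attachment test now excludes a message instead of being required.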