On Sat, 2014-05-10 at 20:19 -0400, Alex wrote:
> [...] not sure if something's changed, or the rule never worked as I
> expected, but it's having problems, and I hoped someone could help.

Something changed indeed -- you broke the __RB_GT_200 sub-rule.

> body __RB_GT_200 /^.{201}/s

This is supposed to be a rawbody rule. I know, because I've discussed and
partly developed the rule(set) in question with you before, back in Oct
2013. And the RB prefix is a hint as well. ;)

  http://markmail.org/message/ebrm6snglxipj6wx

Body rules are applied against the rendered, normalized textual parts of
a message, one paragraph (delimited by double newline) at a time. Rawbody
rules are applied against the raw (merely decoded) textual parts (split
up into 1-2 kB chunks).

Thus, the above RE used in a rawbody rule translates into the desired
"contains more than 200 chars", whereas used in a body rule it means "at
least one paragraph of more than 200 chars".

The difference in meaning can be seen in your sample: The textual parts
(with or without HTML markup) are clearly more than 200 chars. However,
given the number of double newlines (paragraphs!), no single paragraph
exceeds the limit. So that's what broke the rule(set) and causes FPs.

Any chance for a fix? If you really want to count chars in the body,
rendered and normalized without HTML markup, the simple RE-based solution
of counting does not work. Instead, you need a more complex variant of
counting:

  body   __BODY_COUNT_CHAR  /./
  tflags __BODY_COUNT_CHAR  multiple maxhits=201  # maxhits is SA 3.4 only

  meta   __BODY_LE_200      __BODY_COUNT_CHAR <= 200

Beware, that is just briefly tested and modified afterward. Use with care
and test before going into production.

The __BODY_LE_200 sub-rule is meant as a replacement for its rawbody
counterpart __RB_LE_200. The body sub-rule counts chars in (all) textual
parts, rendered and normalized without HTML markup. The meta rule is used
to get a single threshold rule.

I did not test how much counting chars like that, abusing tflags multiple
at the single-char level, might affect runtime. That concern applies in
particular without the maxhits argument, which is only available since
SA 3.4.
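
To make the paragraph-versus-chunk difference concrete, here is a small Python sketch. It is not SpamAssassin code, only a rough simulation under stated assumptions: "body" input is modelled as paragraphs split on blank lines, "rawbody" input as fixed 2048-byte chunks, and the helper names (body_paragraphs, rawbody_chunks, count_chars_capped) are made up for illustration. Real rendering and chunking in SA is more involved.

```python
import re

# Same pattern as __RB_GT_200; Perl's /s modifier corresponds to re.DOTALL.
GT_200 = re.compile(r"^.{201}", re.DOTALL)

def body_paragraphs(text):
    """Assumed stand-in for 'body' input: paragraphs split on blank lines."""
    return [p for p in re.split(r"\n\s*\n", text) if p]

def rawbody_chunks(text, size=2048):
    """Assumed stand-in for 'rawbody' input: fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def hits_any(pattern, pieces):
    """A rule fires if the pattern matches at least one piece."""
    return any(pattern.search(p) for p in pieces)

def count_chars_capped(pieces, maxhits=201):
    """Mimics /./ with 'tflags multiple maxhits=201': count hits up to a cap."""
    total = 0
    for p in pieces:
        total += len(re.findall(r".", p))
        if total >= maxhits:
            return maxhits
    return total

# Five 60-char paragraphs: about 300 chars overall, none longer than 200.
sample = "\n\n".join(["x" * 60] * 5)

as_body_rule = hits_any(GT_200, body_paragraphs(sample))     # no paragraph is long enough
as_rawbody_rule = hits_any(GT_200, rawbody_chunks(sample))   # the single chunk is
body_le_200 = count_chars_capped(body_paragraphs(sample)) <= 200
```

With this sample the pattern misses when applied per paragraph and matches when applied per chunk, which is the FP scenario described above. The capped counter reaches the threshold across paragraphs, so the simulated __BODY_LE_200 is false, as intended for a message with more than 200 chars of text.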

> meta __BODY_LE_200 (__RB_LE_200 == 1) && !__RB_GT_200
> meta __RB_LE_200 !__RB_GT_200 # less or equal IFF not greater

The simple rule __RB_LE_200 supersedes the __BODY_LE_200 rule. As I
pointed out before, please drop the latter and its broken logic.

Generally, I suggest re-reading the thread of Oct 2013, since I explained
quite a few issues contributing to this recurring question.

> meta LOC_SHORT ((__BODY_LE_200 && __HAS_HTTP_URI) && (!(__MIME_IMAGE
> || __MIME_ATTACH )))

Also use __RB_LE_200 here.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}