On Wed, 2014-05-21 at 17:32 -0700, Ian Zimmerman wrote:
> > The test message does not have that string. Maybe it uses DOS
> > flavor "\r\n". Or what appears to be a bunch of linebreaks
> > actually has spaces mixed in.
>
> Well, no. I looked at the message (the same data I fed to s.a. --debug)
> with hexdump -C. It definitely has 10 consecutive 0a's.
>
> For rawbody rules, is really _the whole_ body fed to the matcher at once?

Well, no. Rawbody rules are applied to the raw, textual body parts,
merely decoded, with HTML and linebreaks left intact, and split up into
1-2 kByte chunks. It *is* possible that the sub-string you're trying to
match is unfortunately placed and is being split across two chunks.

To have a closer look at the occurrences of consecutive newlines and
their respective lengths, you can use this rule for testing:

  rawbody __BLANKS  /\n{2,}/
  tflags  __BLANKS  multiple

The -D debug output will show all matches. The number of directly
following "[...]" continuation lines per hit equals the number of
consecutive newline chars matched.

Unlike the resulting rule, this debugging variant needs an "or more"
quantifier. Adjust the minimum to filter out short matches, while still
being able to easily find the longest occurrence.

Modifying your sample, or stripping it down to a minimal test case, will
show whether this is just an unfortunate edge case. Either way, having a
sample would speed up this ping-pong style debugging. And I am curious. ;)
Mind putting your sample up on a pastebin?

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
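
PS: To illustrate the chunk-splitting effect described above, here is a
rough Python simulation. It is only a sketch under stated assumptions:
the fixed 1024-byte chunk size and the helper names split_chunks() and
newline_runs() are made up for this example, and SpamAssassin's actual
splitting logic is not reproduced here. It merely shows how a run of 10
newlines that straddles a chunk boundary is seen as two shorter runs, so
an exact /\n{10}/ never matches any single chunk.

```python
import re

def split_chunks(text, size):
    """Cut text into fixed-size pieces (hypothetical stand-in for rawbody chunking)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def newline_runs(chunks, minimum=2):
    """Lengths of all newline runs of at least `minimum`, matched per chunk."""
    pattern = re.compile(r"\n{%d,}" % minimum)
    return [len(m.group()) for chunk in chunks for m in pattern.finditer(chunk)]

# Assumed sample: 10 consecutive newlines starting at offset 1020,
# so the run crosses the (assumed) 1024-byte chunk boundary.
body = "a" * 1020 + "\n" * 10 + "b" * 1018

whole = newline_runs([body])                      # matcher sees the whole body
chunked = newline_runs(split_chunks(body, 1024))  # matcher sees one chunk at a time

# The exact-count rule only hits when the run sits inside a single chunk.
exact_whole = bool(re.search(r"\n{10}", body))
exact_chunked = any(re.search(r"\n{10}", c) for c in split_chunks(body, 1024))

print("runs, whole body:", whole)
print("runs, chunked:   ", chunked)
print("exact match, whole body:", exact_whole)
print("exact match, chunked:   ", exact_chunked)
```

With the "or more" debugging rule, the split shows up as two shorter
hits next to each other instead of one long one, which is the pattern to
look for in the -D output.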