On Mon, 2013-10-28 at 21:42 -0400, Alex wrote:
> > "The 'raw body' of a message is the raw data inside all textual parts.
> > [...] HTML tags and line breaks will still be present."
> >
> > If you don't want to match e.g. HTML tags, use a body rule instead.
> I knew this, but guess I assumed the "content-type text/html" was a
> boundary that was not considered as part of the text/plain that is
> processed, in the same way it's not with body rules, if that's clear.

The operative term here is "textual parts". Plural, and unlike your
assumption, not limited to plain text in the case of rawbody rules.

This includes not only text/plain and text/html, but every textual MIME
part, if there is more than one. IIRC that even includes text/* parts
with a Content-Disposition of attachment.

The fundamental difference between rawbody and body rules is that
rawbody is the concatenation of all textual parts as-is, raw, preserving
HTML and line breaks.

Body rules, however, are applied to a rendered, normalized version of
these textual parts. Most notably, rendering removes (raw) line breaks
and uses a traditional plain-text paragraph concept: paragraphs are
delimited by newlines. Normalization means consecutive whitespace is
condensed.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
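
To make the rawbody vs. body distinction described above concrete, here
is a rough Python sketch. It is NOT SpamAssassin's code, and its real
rendering is more involved. The names rawbody_view, body_view,
textual_parts and SAMPLE are made up for illustration, and treating a
blank line as the paragraph break is an assumption of this sketch.

```python
import re
from email import message_from_string
from html.parser import HTMLParser

# Hypothetical two-part message: one text/plain and one text/html part.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/plain

Hello   plain
world

--b
Content-Type: text/html

<p>Hello <b>html</b>
world</p>
--b--
"""


class _TextOnly(HTMLParser):
    """Collect character data and drop the tags (a crude 'rendering')."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def textual_parts(msg):
    """Yield (subtype, text) for every text/* MIME part, not just text/plain."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        yield part.get_content_subtype(), payload.decode(charset, errors="replace")


def rawbody_view(msg):
    """Approximate what rawbody rules see: all textual parts, as-is."""
    return "".join(text for _, text in textual_parts(msg))


def body_view(msg):
    """Approximate what body rules see: rendered, whitespace-normalized text."""
    paragraphs = []
    for subtype, text in textual_parts(msg):
        if subtype == "html":
            parser = _TextOnly()
            parser.feed(text)
            parser.close()
            text = "".join(parser.chunks)
        # Assumption: a blank line separates paragraphs in this sketch.
        for para in re.split(r"\n\s*\n", text):
            para = " ".join(para.split())  # condense consecutive whitespace
            if para:
                paragraphs.append(para)
    return "\n".join(paragraphs)


if __name__ == "__main__":
    msg = message_from_string(SAMPLE)
    print(repr(rawbody_view(msg)))
    print(repr(body_view(msg)))
```

In this toy model a pattern like <b>html</b> can only match the rawbody
view, while a phrase that spans a raw line break ("plain world") can
only match the body view.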