On Mon, 2013-10-28 at 21:42 -0400, Alex wrote:
> >  "The 'raw body' of a message is the raw data inside all textual parts.
> >   [...] HTML tags and line breaks will still be present."
> >
> > If you don't want to match e.g. HTML tags, use a body rule instead.

> I knew this, but guess I assumed the "content-type text/html" was a
> boundary that was not considered as part of the text/plain that is
> processed, in the same way it's not with body rules, if that's clear.

The operative term here is "textual parts". Plural, and unlike your
assumption, not limited to plain text in the case of rawbody rules.

This includes not only text/plain and text/html, but all textual MIME
parts, in case there is more than one of a kind. IIRC that even
includes text/* parts with Content-Disposition attached.

The fundamental difference between rawbody and body rules is that
rawbody rules are applied to the concatenation of all textual parts
as-is, raw, preserving HTML and line breaks.
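
To illustrate the "all textual parts, as-is" idea, here is a minimal
sketch using Python's standard email module. It is not SpamAssassin's
code; the sample message and the helper name raw_textual_parts are made
up for illustration only.

```python
from email import message_from_string

# Hypothetical two-part message: one text/plain part, one text/html part.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/plain

Hello
world
--b
Content-Type: text/html

<p>Hello<br>
world</p>
--b--
"""

def raw_textual_parts(msg):
    """Concatenate every text/* part, keeping tags and line breaks."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)  # undo transfer encoding only
            charset = part.get_content_charset() or "ascii"
            chunks.append(payload.decode(charset, "replace"))
    return "".join(chunks)

raw = raw_textual_parts(message_from_string(SAMPLE))
```

A pattern looking for markup, such as <br>, can match against this
string, because neither the HTML part nor its tags were dropped.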

Body rules, however, are applied to a rendered, normalized version of
these textual parts. Most notably, rendering removes (raw) line breaks
and uses a traditional plain-text paragraph concept. Paragraphs are
delimited by newlines. Normalization means consecutive whitespace is
condensed.
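
As a rough sketch of what "rendered and normalized" means, the Python
below strips HTML tags and condenses whitespace. It only approximates
the idea; SpamAssassin's actual renderer is more involved, and this
version folds everything into a single paragraph. The names _TextOnly
and render are hypothetical.

```python
import re
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collect character data only, discarding all tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def render(raw, is_html=False):
    """Approximate a rendered view: no tags, whitespace runs condensed."""
    if is_html:
        parser = _TextOnly()
        parser.feed(raw)
        parser.close()
        raw = " ".join(parser.chunks)
    # Raw line breaks are whitespace too, so they vanish here as well.
    return re.sub(r"\s+", " ", raw).strip()

raw_html = "<p>Hello,<br>\n   <b>world</b></p>"
rendered = render(raw_html, is_html=True)

# The raw text still has the tag; the rendered text does not.
tag_in_raw = re.search(r"<br>", raw_html) is not None
tag_in_rendered = re.search(r"<br>", rendered) is not None
```

Here rendered ends up as "Hello, world", so a pattern for <br> matches
the raw input but not the rendered text.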


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
