On 9 Jun 2016, at 0:53, Henrik K wrote:
> Garbage text/plain is known problem..
text/html too. From GMail.
Last week I had a *perfectly legitimate* message with a 151KB logical
single line of HTML (QP encoded of course) freeze up a server scaled for
10k users. It did so slowly, over a day: a spamd child took ~20 minutes
to scan it, and because of the way the system is plumbed and how GMail
times out at EOD, the message was re-queued for scanning while GMail
kept coming back to retry it. Then the sender and recipient figured out
it hadn't arrived, so it was re-sent, and eventually all the spamd
children were scanning copies of the message, plus there were spares in
the queue, and it all got ugly. SA could not use its own timeout because
the 20 minutes was spent checking it against a single (local, rawbody)
rule (matching idiosyncratic spammer HTML) with an incautious '.*' such
that the regex matching code would be of n! (or maybe n^n?) order, where
n=[number of 'div' open tags in the line]. I'm a bit surprised it ever
finished...
However, it did, and SA timed out right after that rule failed to match
after ~20 minutes (longer, once the busy spamd children outnumbered the
cores...). As it turned out, there were actually 3 rules (out of over a
hundred local rules) that broke when fed that message. I found 2 more
that could hypothetically have trouble with plain text in the
crlf-per-paragraph mis-format used by older versions of Outlook and
newer versions of Apple Mail. I ended up replacing every occurrence of
'.*' and '.+', plus many uses of '(pattern)*', in the local rules. It is *Perl*
after all, not classic grep...
In short, the lesson is: IF you create local rules that match against
idiosyncratic but loose styles of HTML, spacing, or punctuation, be
EXTREMELY cautious with '*' and especially with the classic lazy '.*'
combination. It's especially a risk with 'rawbody' and multiline 'full'
rules, but given the junk tools people use to generate "valid" email,
rules of any of the body types that are too fuzzy could run into
problems with lines of unusual length.
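
For anyone who wants to see the effect without sacrificing a mail server,
here is a minimal sketch. It uses Python's 're' module as a stand-in for
Perl's regex engine (both are backtracking matchers), so the numbers will
differ from what spamd sees. The patterns, the 'color:#fffffe' marker, and
the names LOOSE, TIGHT, make_div_line and timed_search are all made up for
illustration; they are not the actual local rules.

```python
import re
import time

# Hypothetical "loose" rule body: three unbounded '.*' in a row, in the
# spirit of the rule described above (the real rule is not shown).
LOOSE = re.compile(r"<div.*<div.*<div.*color:#fffffe")

# Hypothetical tightened version: each gap is a bounded run of characters
# that cannot cross a tag boundary, so a failed match gives up quickly.
TIGHT = re.compile(
    r"<div[^<>]{0,80}>\s{0,10}"
    r"<div[^<>]{0,80}>\s{0,10}"
    r"<div[^<>]{0,80}color:#fffffe"
)


def make_div_line(n):
    """Build one logical line holding n opening 'div' tags and no newline."""
    return '<div class="a">' * n


def timed_search(pattern, text):
    """Return (matched, elapsed_seconds) for a single search."""
    start = time.perf_counter()
    matched = pattern.search(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Neither pattern matches these lines; the interesting part is how long
    # each one takes to fail as the number of 'div' tags grows.
    for n in (10, 20, 40):
        line = make_div_line(n)
        _, loose_s = timed_search(LOOSE, line)
        _, tight_s = timed_search(TIGHT, line)
        print(f"n={n:3d}  loose={loose_s:.4f}s  tight={tight_s:.4f}s")
```

On a line that never matches, the loose pattern has to try every way of
placing its three '.*' gaps among the 'div' tags before giving up, so its
failure time grows much faster than the line length. The tightened pattern
is deliberately narrower (it only matches three directly nested tags), which
is the trade-off: you give up some fuzziness to get a rule that fails fast.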