On 9 Jun 2016, at 0:53, Henrik K wrote:

Garbage text/plain is known problem..

text/html too. From GMail.

Last week I had a *perfectly legitimate* message with a 151KB logical single line of HTML (QP encoded, of course) freeze up a server scaled for 10k users. It did so slowly, over a day: it took a spamd child ~20 minutes to scan the message, and because of the way the system is plumbed and how GMail times out at EOD, the message was re-queued for scanning and GMail kept coming back to retry it. Then the sender and recipient figured out it hadn't arrived, so it was re-sent, and so eventually all the spamd children were scanning copies of the message, plus there were spares in the queue, and it all got ugly.

SA could not use its own timeout because the 20 minutes was spent checking the message against a single (local, rawbody) rule (matching idiosyncratic spammer HTML) with an incautious '.*', such that the regex matching code would be of n! (or maybe n^n ?) order, where n=[number of 'div' open tags in the line]. I'm a bit surprised it ever finished... However, it did, and SA timed out right after that rule failed to match after ~20 minutes (longer, once the busy spamd children outnumbered the cores...)

As it turned out, there were actually 3 rules (out of over a hundred local rules) that broke when fed that message. I found 2 more that could hypothetically have trouble with plain text in the crlf-per-paragraph mis-format used by older versions of Outlook and newer versions of Apple Mail. I ended up replacing every occurrence of '.*' and '.+' and many uses of '(pattern)*' in the local rules. It is *Perl* after all, not classic grep...

In short, the lesson is: IF you create local rules that match against idiosyncratic but loose styles of HTML, spacing, or punctuation, be EXTREMELY cautious with '*' and especially with the classic lazy '.*' combination. It's especially a risk with 'rawbody' and multiline 'full' rules, but given the junk tools people use to generate "valid" email, rules of any of the body types that are too fuzzy could run into the problem of lines of unusual length.
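
To make the failure mode concrete, here is a rough sketch in Python rather than Perl (both use backtracking regex engines, so the shape of the problem is similar, though the timings will differ). Everything in it is made up for illustration: the two patterns, the `color="#fefefe"` marker, and the helper names are assumptions, not the actual local rules. Note too that the bounded pattern is stricter than the loose one (it wants the tags back to back), which is the sort of trade-off you accept when tightening a rule.

```python
import re
import time

# Hypothetical loose rule: every '.*' may stretch across the whole line, so a
# failed match is retried for many combinations of '<div' positions.
LOOSE = re.compile(r'<div.*<div.*<div.*<div.*color="#fefefe"')

# Same intent with bounded gaps: each tag body is at most 80 characters that
# are neither '<' nor '>', so a failed attempt is abandoned after little work.
TIGHT = re.compile(r'(?:<div[^<>]{0,80}>){3}<div[^<>]{0,80}color="#fefefe"')


def make_line(n_divs):
    """Build one long logical line with n_divs harmless div elements."""
    return '<div class="x">text</div>' * n_divs


def scan_seconds(pattern, text):
    """Return (matched, seconds) for a single search of text."""
    start = time.perf_counter()
    matched = pattern.search(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the loose pattern's cost climbs steeply with the number
    # of div tags on the line, the bounded one stays roughly linear.
    for n in (10, 20, 30):
        line = make_line(n)
        print(n, scan_seconds(LOOSE, line), scan_seconds(TIGHT, line))
```

Neither pattern matches the generated line, and that is the point: the expensive case is the rule that *fails* to match, because the engine has to exhaust every way of placing the '.*' gaps before giving up.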
