John Hardin wrote:
> rawbody __TWO_WORD_LINES /^\S\+\s\+\S\+$/
> tflags __TWO_WORD_LINES multiple
> meta STACKED_TEXT (__TWO_WORD_LINES > 10)
>
> Likely somewhat FP-prone...
I think it's quite FP-prone; think about emailed system logs, lists, invoices, etc.

Your example used lots of real words, so I'd trust Bayes to catch it. It also had a URL, so the URIBLs can detect it too. Finally, IIRC, some of the fuzzy checksum mechanisms rely on patterns that pay close attention to paragraph structure like that (or at least one such method was mentioned as well-loved at the last MIT Spam Conference). So make sure you're using Razor2, Pyzor, iXhash, and, if permissible, DCC (though I'm not sure which of those use this method... iXhash certainly does not).
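To illustrate the FP concern, here is a rough Python approximation of what the quoted rules compute. It is only a sketch, not SpamAssassin's actual matching: it assumes the regex was meant as `^\S+\s+\S+$` (in Perl, `\+` matches a literal plus sign) and that each line of the body is tested separately. `count_two_word_lines` and `stacked_text` are hypothetical helper names.

```python
import re

# Assumption: the intended pattern is "exactly two whitespace-separated tokens
# on a line". The quoted /^\S\+\s\+\S\+$/ would, in Perl, require literal '+'.
TWO_WORD_LINE = re.compile(r"^\S+\s+\S+$")

# Mirrors the (__TWO_WORD_LINES > 10) condition in the meta rule.
THRESHOLD = 10


def count_two_word_lines(body: str) -> int:
    """Count lines made of exactly two whitespace-separated tokens
    (roughly what 'tflags multiple' tallies, assuming per-line matching)."""
    return sum(1 for line in body.splitlines() if TWO_WORD_LINE.match(line))


def stacked_text(body: str) -> bool:
    """Hypothetical stand-in for the STACKED_TEXT meta rule."""
    return count_two_word_lines(body) > THRESHOLD


# A harmless itemized invoice: every line is "item price".
invoice = "\n".join(f"Widget{i} ${i}.00" for i in range(1, 13))
print(count_two_word_lines(invoice))  # 12
print(stacked_text(invoice))          # True
```

A plain invoice with twelve "item price" lines is already enough to cross the threshold, which is the kind of legitimate mail I'd expect to trip it.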