On 10/18/2011 8:53 AM, Daniel McDonald wrote:
> One of my users submitted a spam for analysis, and I was amazed at the
> efforts this troglodyte expended to poison bayes.
> Is it worth the effort to try to find huge html comments hiding junk
> like this?
>
> Maybe something like
>
> Rawbody OBFU_HTML_LONG_COMMENT /\<--.{1024,}?--\>/
> Describe OBFU_HTML_LONG_COMMENT contains a ridiculously long html comment

It may be worthwhile trying to find overly long comments, but
unfortunately, it's not quite as easy as that. The problem is making
sure the beginning and ending markers are part of the same comment.
Your example would be tripped up if there were a small comment at the
beginning of the message and another small comment at the end. Since
.{1024,}? is free to match straight across the first comment's closing
marker, it would count the characters between the beginning of the
first comment and the end of the second one.

As for "Bayes poisoning", I'm not sure there is any such thing. Any
random text that a spammer dumps into his emails is unlikely to match
the pattern of your normal emails. So just feed it to Bayes and let it
do its job. Bayes works amazingly well if trained properly. :)

--
Bowie
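
P.S. The point about keeping both markers inside one comment can be
sketched with a negative lookahead. This is a minimal, untested sketch,
not a vetted rule. It assumes the comment opener is the standard <!--,
that the whole comment ends up in a single piece of the raw body that
SpamAssassin hands to the rule (a very long comment may be split across
pieces and then not match), and the score is an arbitrary placeholder.

```
# Sketch only: untested, and the score below is an arbitrary placeholder.
# (?:(?!-->).) consumes one character at a time, but never one that starts
# a closing "-->", so the 1024+ characters cannot run past the end of the
# comment that the opening "<!--" began. The /s modifier lets "." match
# newlines inside the comment.
rawbody  OBFU_HTML_LONG_COMMENT  /<!--(?:(?!-->).){1024,}-->/s
describe OBFU_HTML_LONG_COMMENT  Contains a ridiculously long HTML comment
score    OBFU_HTML_LONG_COMMENT  0.5
```

Written this way, two short comments at opposite ends of a message
should no longer trigger the rule, because the lookahead stops the scan
at the first comment's closing -->.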