On 10/18/2011 8:53 AM, Daniel McDonald wrote:
> One of my users submitted a spam for analysis, and I was amazed at the
> efforts this troglodyte expended to poison bayes.
> Is it worth the effort to try to find huge html comments hiding junk
> like this?
>
> Maybe something like
>
> Rawbody OBFU_HTML_LONG_COMMENT /\<--.{1024,}?--\>/
> Describe OBFU_HTML_LONG_COMMENT contains a ridiculously long html comment

It may be worthwhile to look for overly long comments, but
unfortunately, it's not quite as easy as that.  The problem is making
sure the beginning and ending markers are part of the same comment.
Your example would be tripped up if there were a small comment at the
beginning of the message and another small comment at the end.  The
non-greedy quantifier doesn't prevent this; the match would simply
count characters between the beginning of the first comment and the
end of the second one.
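
To illustrate the regex logic only, here is a rough Python sketch, not
a tested SpamAssassin rule.  It assumes the intent is to match standard
"<!-- ... -->" comments, and the names NAIVE, SAME_COMMENT, two_short
and one_long are made up for the example.  The second pattern uses a
negative lookahead so the counted characters can never include a
closing marker:

```python
import re

# Naive pattern, assuming standard "<!--" / "-->" comment markers.
# The lazy quantifier can still run past the first "-->" to reach a
# later one if that is the only way to collect 1024 characters.
NAIVE = re.compile(r"<!--.{1024,}?-->", re.DOTALL)

# Tempered pattern: every consumed character must not start a "-->",
# so the match cannot leave the comment it began in.
SAME_COMMENT = re.compile(r"<!--(?:(?!-->).){1024,}-->", re.DOTALL)

# Two short comments separated by a lot of ordinary text.
two_short = "<!-- a -->" + "x" * 2000 + "<!-- b -->"

# One genuinely long comment (1500 characters of filler).
one_long = "<!--" + "junk " * 300 + "-->"

naive_hits_two_short = NAIVE.search(two_short) is not None
tempered_hits_two_short = SAME_COMMENT.search(two_short) is not None
tempered_hits_one_long = SAME_COMMENT.search(one_long) is not None
```

The naive pattern reports a hit on the two short comments, while the
tempered one only fires on the single long comment.  How either pattern
behaves inside a rawbody rule would still need to be checked against
real messages.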

As for "Bayes poisoning", I'm not sure there is any such thing.  Any
random text that a spammer dumps into his emails is unlikely to match
the pattern of your normal emails.  So just feed it to Bayes and let it
do its job.  Bayes works amazingly well if trained properly.  :)

-- 
Bowie
