Re: what to sa-learn, poisoning

Loren Wilton Thu, 28 Jul 2005 13:41:36 -0700

> I assume everyone else sees spam sneak through that contains a "spammy"
> subject (usually mentioning drugs with some mis-spellings/obfu), an
> attached image that apparently has the actual spam "message" in it, then
> some text that is very hammy in its content.


I tend not to see a lot of these get through, since I have lots of SARE
rules that check for obfu stuff.  That, and the URIBLs tend to catch
virtually all of that stuff.


> I've been assuming that this is what people refer to as "bayes poison" and
> I do not feed sa-learn with these.
>
> Is this correct, or would information in the headers still prove valuable
> to bayes?

One has to be careful about the concept "hammy in its content".  While the
words are certainly intended to be bayes poisoning, in the vast majority of
cases what the spammers pick is not at all typical of what shows up in a
real user's ham, and as a result the extra words end up being beautiful
Bayes spam catchers.  In addition to that, bayes will of course suck good
stuff out of the headers to mark the message as spam.
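
To make that concrete, here is a toy sketch in Python.  It is not how
SpamAssassin's Bayes actually computes or combines token probabilities; it
is just a simple per-token estimate with add-one smoothing, and the sample
messages and the function names (train, spam_probability) are made up for
illustration.  The point is only that filler words which never show up in
your ham end up scoring as spam indicators once you train on them.

```python
from collections import Counter


def train(messages):
    """Count, for each token, how many messages it appears in."""
    counts = Counter()
    for text in messages:
        # Use a set so a token counts once per message.
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Toy per-token spam probability with add-one smoothing.

    This is an illustrative estimate, not SpamAssassin's formula.
    """
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)


# Hypothetical ham: ordinary mail a real user might receive.
ham = [
    "meeting moved to thursday please update the agenda",
    "the build server is down again please restart it",
    "lunch on thursday works for me",
]

# Hypothetical spam: a short pitch padded with "hammy" filler words.
spam = [
    "cheap meds wistful gazebo parliament heron",
    "cheap meds gazebo umbrella heron tapestry",
    "meds online heron parliament gazebo",
]

spam_counts = train(spam)
ham_counts = train(ham)

for token in ("gazebo", "heron", "thursday"):
    p = spam_probability(token, spam_counts, ham_counts, len(spam), len(ham))
    print(f"{token}: {p:.2f}")
```

In this toy corpus the filler words ("gazebo", "heron") come out well above
0.5 because they only ever appear in spam, while a word that really does
occur in the ham ("thursday") comes out well below it.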

I think a (very) few people have reported that this sort of thing seemed
successful in poisoning their bayes db.  Most people seem to report that if
anything, that sort of stuff really helps bayes get things right most of the
time.  How likely this is to muck up your database may depend on how large a
group of clients you have, and how diverse they are.  If you normally have
problems with bayes going off track this might make things worse (although
it could make it better).  If bayes is doing moderately well for you, I'd
personally expect that feeding these to bayes would probably help.

        Loren
