[SAtalk] Habeas mark and auto-learning as ham

David A. Carter Mon, 12 Jan 2004 15:29:19 -0800

Hi:

A lot of mail has shown up in the group debating the soundness of Habeas's
watermarking scheme. Whether that debate is on topic, I'll leave as an
exercise for others. For the record, I think Habeas's idea is sound enough,
provided they follow through with it. But this is not what concerns me.


What does concern me is how SpamAssassin should deal with Habeas marks,
which clearly *is* on-topic. Specifically, should SpamAssassin auto-learn
Habeas-marked messages as ham, as it does today? In an earlier thread, Theo
said it should:

> Well, this is less a question of "should it be autolearned" and more
> of a "how good is the Habeas system"...  In the perfect world, it's
> not forgable/misused and you would always accept it as a sign of ham,
> and therefore autolearning is desired.
>
> Since we don't live in the perfect world, the question is: can the
> Habeas folks act fast/complete enough so that forging/misusing the mark
> is completely minimized?  If they can, then there's not a huge issue --
> yeah, some spam will get through, but they'll quickly be squashed and
> there you go.  If they can't, then their whole business plan fails as
> people start ignoring the mark, and again no problem since the SA rules
> would go away.

I disagree; I think it is still a question of "should it be auto-learned?" I
think auto-learning Habeas-marked email as ham represents an exploitable
vulnerability in SpamAssassin. Spammers can send a large amount of
Habeas-marked spam (maybe not even real spam that actually sells something,
maybe just email with a large amount of spammy words/phrases like "[EMAIL PROTECTED]",
etc.) from untraceable throwaway accounts. This spam gets auto-learned as ham
because of the Habeas mark. The spammers can then send real, traceable spam
WITHOUT the Habeas mark, and it will pass SA's checks because Bayes now
thinks it is ham. We have already seen the effect of this
vulnerability in action over the past two days.
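To make the mechanism concrete, here is a toy sketch in Python. It is NOT SpamAssassin's Bayes code: the ToyBayes class, its smoothing formula and the sample messages are all hypothetical, chosen only to show how a flood of spammy mail learned as ham drags down the spam probability of later, unmarked spam that reuses the same words.

```python
import math
from collections import Counter


class ToyBayes:
    """Hypothetical, heavily simplified token classifier.

    Not SpamAssassin's Bayes implementation; it only illustrates how
    learning spammy text as ham skews later verdicts.
    """

    def __init__(self):
        self.spam_counts = Counter()  # messages containing token, learned as spam
        self.ham_counts = Counter()   # messages containing token, learned as ham
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        tokens = set(text.lower().split())  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1

    def token_prob(self, tok):
        # Per-token spam probability from message frequencies,
        # smoothed toward 0.5 for rarely seen tokens.
        s = self.spam_counts[tok] / max(self.nspam, 1)
        h = self.ham_counts[tok] / max(self.nham, 1)
        if s + h == 0:
            return 0.5  # never seen: neutral
        n = self.spam_counts[tok] + self.ham_counts[tok]
        return (0.5 + n * (s / (s + h))) / (1 + n)

    def spam_prob(self, text):
        # Naive-Bayes combination of the per-token probabilities.
        probs = [self.token_prob(t) for t in set(text.lower().split())]
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_ham - log_spam))


def demo():
    bayes = ToyBayes()
    # Honest training data: two kinds of spam, one kind of ham.
    for _ in range(20):
        bayes.learn("buy cheap pills now", is_spam=True)
        bayes.learn("win free money today", is_spam=True)
        bayes.learn("lunch meeting agenda monday", is_spam=False)

    real_spam = "buy cheap pills now online"  # later spam, sent WITHOUT a Habeas mark
    before = bayes.spam_prob(real_spam)

    # Attack: throwaway Habeas-marked mail full of the same spammy words,
    # auto-learned as ham because of the mark.
    for _ in range(200):
        bayes.learn("buy cheap pills now", is_spam=False)

    after = bayes.spam_prob(real_spam)
    return before, after


before, after = demo()
print(f"spam probability before poisoning: {before:.3f}")
print(f"spam probability after poisoning:  {after:.3f}")
```

In this toy setup the unmarked spam goes from near-certain spam to below 0.5 after the poisoning round; the real Bayes engine is more sophisticated, but the direction of the effect is the same idea.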

I do agree the Habeas folks will need to act quickly and completely so the
effect of forgeries is minimized. However, this doesn't mean SpamAssassin
needs to be a sitting duck for such forgeries. I think if you just stop
Bayes from auto-learning Habeas-marked mail as ham, you'd take away the
vulnerability, and the downside would be almost nil.

Consider: with the current scoring, if an email has a Habeas mark on it, it
doesn't really need to be added to the Bayes database, since the Habeas mark
will pull the score down low enough to mark it as ham in all but the most
extreme cases. So there is little to gain from adding those particular
messages to the ham database, excellent ham examples though they may be.
On the flip side, the damage from auto-learning forged Habeas mail as ham
is huge. From my perspective, I'd be willing to live with the FNs (false
negatives) from the forged Habeas mail itself if it didn't also mess up my
Bayes database. As it is, I have had to set my Habeas scores to 0.0 to
avoid this.
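A minimal sketch of the proposed change, assuming a simplified auto-learn step: learn as ham only when the score is at or below a ham threshold, and skip learning entirely when a Habeas rule fired. The rule name HABEAS_SWE and the 0.1 / 12.0 thresholds are assumptions for illustration, loosely modelled on the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings; real SpamAssassin auto-learning has more conditions than this.

```python
from dataclasses import dataclass, field

# Assumed thresholds for illustration only; check your own configuration.
HAM_LEARN_THRESHOLD = 0.1
SPAM_LEARN_THRESHOLD = 12.0

# Hypothetical set of rule names that indicate a Habeas mark.
HABEAS_RULES = frozenset({"HABEAS_SWE"})


@dataclass
class ScanResult:
    score: float
    rule_hits: set = field(default_factory=set)


def autolearn_verdict(result, skip_habeas=True):
    """Return "spam", "ham", or None (do not learn) for a scanned message."""
    if result.score >= SPAM_LEARN_THRESHOLD:
        return "spam"
    if result.score <= HAM_LEARN_THRESHOLD:
        if skip_habeas and result.rule_hits & HABEAS_RULES:
            # Proposed change: a Habeas mark can still lower the score,
            # but it never feeds the message into the ham database.
            return None
        return "ham"
    return None


# A forged Habeas-marked spam that scored low only because of the mark.
forged = ScanResult(score=-5.0, rule_hits={"HABEAS_SWE"})
print(autolearn_verdict(forged))                     # prints None: not learned
print(autolearn_verdict(forged, skip_habeas=False))  # prints ham: current behaviour
```

Compared with zeroing the Habeas score, a check like this would keep the mark's effect on the final verdict while keeping forged marks out of the Bayes database.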

Anyway, what do others think about this?

DaC



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
