At 18:41 27/08/2003 -0700, Justin Mason wrote:

Simon Byrnand writes:
>Just got a sales pitch today by phone followed up by email from these guys:
>
>http://www.death2spam.net.nz/
>
>Anybody else heard of them ? Their system claims to be based on Bayesian
>filtering and claims to be a lot more effective than 6 other spam filtering
>programs (including SpamAssassin) according to a pretty little bar graph.
>
>I have some serious doubts about some of their claims though, as that
>particular graph was showing "Out of the Box effectiveness" and we all know
>that a pure Bayesian filter has no out of the box effectiveness until it's
>been trained ;-)

Which is what it looks like, BTW.  Judging by the techie details and
credits, I'd say it's a Spambayes-style Robinson classifier -- just like
SpamAssassin's learner.  (Except SpamAssassin has a few tweaks SB doesn't
yet include ;)   Richard Jowsey has cropped up on that list too:

http://mail.python.org/pipermail/spambayes/2003-April/004301.html

By the way, in case anyone thinks I was slamming them, I wasn't. I'm sure their system is a competent implementation of a Bayesian classifier; however, the claims that were made over the phone and in an email forwarded to me were definitely exaggerated...


Unless you do a LOT of training, I don't see how a purely Bayesian classifier can perform as well as something as multifaceted as SpamAssassin. (I wonder how well a purely Bayesian classifier does on the HTML image-only spams, for example.)
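
To illustrate the point about training, here is a minimal sketch of a Robinson-style token score with geometric-mean combining. It is an assumption for illustration only, not the code used by death2spam, Spambayes or SpamAssassin; the function names and the constants s=1.0 and x=0.5 are hypothetical choices. With no training data every token falls back to the neutral prior, so the combined score stays at 0.5.

```python
from math import prod  # Python 3.8+

def token_prob(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token (Robinson-style).

    s is the weight given to the prior x; both are assumed values.
    """
    spam_freq = spam_count / n_spam if n_spam else 0.0
    ham_freq = ham_count / n_ham if n_ham else 0.0
    total = spam_freq + ham_freq
    p = spam_freq / total if total else x   # raw estimate, or prior if unseen
    n = spam_count + ham_count              # how much evidence we have
    return (s * x + n * p) / (s + n)

def combine(probs):
    """Geometric-mean combining of token probabilities, scaled to 0..1."""
    if not probs:
        return 0.5
    k = len(probs)
    P = 1.0 - prod(1.0 - f for f in probs) ** (1.0 / k)
    Q = 1.0 - prod(probs) ** (1.0 / k)
    return (1.0 + (P - Q) / (P + Q)) / 2.0

# Untrained: no messages learned, so every token scores the prior.
untrained = [token_prob(0, 0, 0, 0) for _ in range(3)]
print(combine(untrained))

# Trained: tokens seen in 20 of 100 spams and in none of 100 hams.
trained = [token_prob(20, 0, 100, 100) for _ in range(3)]
print(combine(trained))
```

The untrained case sits exactly on the fence, which is why an "out of the box" figure for a pure Bayesian filter is hard to take at face value.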

They also said that "SpamAssassin claims to have a proper Bayesian filter but doesn't" (funny comment in light of Justin's remark ;) and a few other things...

Personally I've found the Bayes in 2.60-rc2 to be a *vast* improvement over that in 2.55. I copied a working bayes database to another machine, upgraded it (and the database) to 2.60 and ran the two versions in parallel for a couple of email addresses, and in nearly all cases there is a very pronounced improvement.

The BAYES scores are now very definitely at each end or in the middle of the range (typically BAYES_00, BAYES_44 or BAYES_99), and a lot of spams that got uncertain bayes scores on the old version are now hitting hard on BAYES_99 with no extra training.

Likewise ham is almost always getting BAYES_00. I don't think I've seen any cases yet where BAYES has swung the wrong way, as the old bayes sometimes did with Nigerian spams, for example...

Now that the action of BAYES is a lot more positive, and the scoreset 3 score for BAYES_99 is 5.4 points, it seems to me that SpamAssassin should in theory be able to catch some kinds of spam purely using BAYES, which was never possible before with a score of 3.0 for BAYES_99.
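
The arithmetic behind that can be sketched as follows. The 5.4 and 3.0 scores are the ones quoted above; the threshold of 5.0 is an assumption (the stock required_hits value, which a site may have changed), and is_spam is a hypothetical helper, not a SpamAssassin API.

```python
REQUIRED_HITS = 5.0  # assumed threshold; sites can configure a different value

def is_spam(rule_scores, required=REQUIRED_HITS):
    """Return True if the summed scores of the rules that hit reach the threshold."""
    return sum(rule_scores.values()) >= required

# A message that hits nothing except the Bayes rule:
print(is_spam({"BAYES_99": 5.4}))  # new score: 5.4 >= 5.0
print(is_spam({"BAYES_99": 3.0}))  # old score: 3.0 < 5.0, needs other rules to hit
```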

I'm curious to know roughly what the tweaks in bayes were between 2.55 and 2.60. Was it a change to a new algorithm or just a bit of tuning?

Regards,
Simon



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
