On Thu, 21 Jan 2016 13:45:08 +0100
Christian Laußat wrote:

> On 21.01.2016 13:19, Reindl Harald wrote:
> > not entirely when "Currently, SA's bayes tokens are single words" from
> > https://mail-archives.apache.org/mod_mbox/spamassassin-dev/201211.mbox/%3c509d55a8.30...@gmail.com%3E
> > is still true
> > 
> > please review that response below and consider 2/4 word tokens
> > *additionally* in the SA-tokenizer and it will beat out the "new
> > magic" easily with a well trained bayes in all cases
> 
> Bogofilter has an option to specify how many tokens to put into
> bayes. Here is an analysis of how effective this was:
> http://www.bogofilter.org/pipermail/bogofilter-dev/2006q3/003349.html
> 
> In my opinion it's not worth the effort. You'll blow up your database
> for a slightly better matching rate.

The FNs dropped from 287 to 69, which I'd call a four-fold improvement.

The FPs rose from 0 to 1, but that mail was ham quoting a full spam, so
arguably it just did a better job in detecting the embedded spam.
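
To make the trade-off concrete, here is a minimal sketch of what adding
two-word tokens on top of single words looks like. This is not the
SpamAssassin or Bogofilter tokenizer; the function name word_ngrams and
the sample text are made up for illustration, and real tokenizers also
handle headers, punctuation and so on.

```python
def word_ngrams(text, max_n=2):
    """Return single words plus every run of 2..max_n adjacent words.

    Hypothetical helper for illustration only, not an SA or Bogofilter API.
    """
    words = text.lower().split()
    tokens = []
    for n in range(1, max_n + 1):
        # Slide a window of n words across the message.
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens


sample = "buy cheap pills online now"
single = word_ngrams(sample, max_n=1)  # single-word tokens only
pairs = word_ngrams(sample, max_n=2)   # single words plus word pairs

print(len(single), len(pairs))
```

A message of k words yields k single-word tokens and k-1 additional
pairs, so the per-message token count roughly doubles. That is the
database growth Christian is worried about, and the price paid for the
drop in FNs.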
