On Thu, 21 Jan 2016 13:45:08 +0100
Christian Laußat wrote:

> On 21.01.2016 13:19, Reindl Harald wrote:
> > not entirely when "Currently, SA's bayes tokens are single words" from
> > https://mail-archives.apache.org/mod_mbox/spamassassin-dev/201211.mbox/%3c509d55a8.30...@gmail.com%3E
> > is still true
> >
> > please review that response below and consider 2/4 word tokens
> > *additionally* in the SA-tokenizer and it will beat out the "new
> > magic" easily with a well trained bayes in all cases
>
> Bogofilter has an option to specify how many tokens to put into
> bayes. Here is an analysis of how effective this was:
> http://www.bogofilter.org/pipermail/bogofilter-dev/2006q3/003349.html
>
> In my opinion it's not worth the effort. You'll blow up your database
> for little better matching rate.

The FNs dropped from 287 to 69, which I'd call a four-fold improvement. The FPs rose from 0 to 1, but that mail was ham quoting a full spam, so arguably it just did a better job of detecting the embedded spam.
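
To make the trade-off concrete, here is a rough sketch of what "multi-word tokens in addition to single words" means, plus the arithmetic behind the four-fold figure. This is not the SpamAssassin or bogofilter tokenizer; the function name multiword_tokens, the regex, and the sample text are all made up for illustration.

```python
import re

def multiword_tokens(text, max_words=2):
    """Return single-word tokens plus joined runs of up to max_words
    adjacent words (hypothetical helper, not SA's or bogofilter's code)."""
    words = re.findall(r"[a-z0-9$'-]+", text.lower())
    tokens = []
    for n in range(1, max_words + 1):
        # Slide a window of n words over the message.
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens

sample = "buy cheap pills online now"  # made-up sample text
single = multiword_tokens(sample, max_words=1)
pairs = multiword_tokens(sample, max_words=2)
print(len(single), len(pairs))  # 5 tokens vs. 9 tokens for the same text

# False-negative counts quoted above.
fn_before, fn_after = 287, 69
print(round(fn_before / fn_after, 2))        # improvement factor
print(round(1 - fn_after / fn_before, 2))    # fraction of FNs eliminated
```

The token count per message grows with max_words, and the number of distinct tokens stored grows faster still, since word pairs repeat far less often than single words. That is the database growth being objected to. The 287 to 69 drop works out to a factor of about 4.16, or roughly 76% fewer false negatives.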