On 1/20/2016 3:20 PM, Dianne Skoll wrote:
> On Wed, 20 Jan 2016 12:11:02 -0800
> Marc Perkel <supp...@junkemailfilter.com> wrote:
>
>> Again - it's not about matching as Bayes does. It's about not
>> matching.
>
> It's not about not matching. It's about a preprocessing step that
> discards tokens that don't have extreme probabilities.
>
> I think your method works as well as it does because you're using up
> to four-word phrases as tokens. The rest of the method is nonsense, but
> the four-word phrase tokens are the magic ingredient; they'd make Bayes work
> awesomely also.
>
> Regards,
> Dianne.

Differential analysis time? Add comparisons against 4-word Bayes on the
whole message, and against 4-word Bayes restricted to the same subset of
the message that this new method looks at.
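
For anyone who wants to try that comparison, here is a rough sketch of what the two Bayes baselines might look like. It is not Marc's method and not SpamAssassin's Bayes implementation; every name in it (phrase_tokens, train, token_spam_prob, score, and the "extreme" cutoff) is made up for illustration, and the smoothing and the log-odds scoring are my own assumptions. Tokens are all phrases of one to four words, and the optional cutoff plays the role of the preprocessing step that discards tokens without extreme probabilities.

```python
import math
from collections import Counter


def phrase_tokens(text, max_n=4):
    """Return the set of all 1- to max_n-word phrases found in text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def train(spam_msgs, ham_msgs, max_n=4):
    """Count, per token, how many spam and how many ham messages contain it."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(phrase_tokens(msg, max_n))
    for msg in ham_msgs:
        ham_counts.update(phrase_tokens(msg, max_n))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)


def token_spam_prob(tok, model):
    """Estimate P(spam | token), assuming equal priors and add-one smoothing."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_tok_spam = (spam_counts[tok] + 1) / (n_spam + 2)
    p_tok_ham = (ham_counts[tok] + 1) / (n_ham + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)


def score(text, model, max_n=4, extreme=None):
    """Sum of per-token log-odds; positive leans spam, negative leans ham.

    If extreme is set (hypothetical knob, e.g. 0.1), tokens whose spam
    probability lies strictly between extreme and 1 - extreme are discarded
    before scoring. With extreme=None every token is used.
    """
    total = 0.0
    for tok in phrase_tokens(text, max_n):
        p = token_spam_prob(tok, model)
        if extreme is not None and extreme < p < 1 - extreme:
            continue  # not an extreme token: drop it
        total += math.log(p / (1 - p))
    return total
```

Scoring the same test corpus once with extreme=None and once with a cutoff gives the two Bayes columns; the new method's results would have to be added alongside from its own implementation.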