On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel wrote:

 
> Again - it's not about matching as Bayes does. It's about not
> matching.
> 
> In the subject line of the message the phrase "method for blocking
> spam" makes the message ham. Spammers never use the phrase "method
> for blocking spam". No other tests needed. My system result 100% ham.
> To bayes it's just some words.

It is to Bayes, but most statistical filters do use phrases as
tokens.

> What makes it ham is what doesn't match, not what does.
 

Right, but it's not about the count of phrases that don't match anything.
You use phrases that occur in spam but not in ham, and vice versa, so it
is about matching too.

What you are doing is equivalent to a statistical filter doing
multiword tokenization, dropping the tokens that appear in both spam and
ham, and then simply counting the spammy and hammy tokens to produce a
result. A filter like bogofilter can do exactly this if you turn on
multiword tokenization and configure it with some very sub-optimal
parameters.
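
To make the equivalence concrete, here is a rough Python sketch of the
scheme as I understand it from your description. It is not your code and
it is not bogofilter's implementation; the function names, the
4-word phrase limit, and the "unsure" tie result are all my own
assumptions for illustration.

```python
def phrases(text, max_len=4):
    """Yield every run of 1..max_len consecutive words (multiword tokens)."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def train(spam_msgs, ham_msgs, max_len=4):
    """Return (spam_only, ham_only): phrases seen in one corpus but not the other."""
    spam = {p for m in spam_msgs for p in phrases(m, max_len)}
    ham = {p for m in ham_msgs for p in phrases(m, max_len)}
    both = spam & ham  # tokens seen in both corpora are dropped
    return spam - both, ham - both


def classify(text, spam_only, ham_only, max_len=4):
    """Count distinct spam-only and ham-only phrases; the larger count wins."""
    seen = set(phrases(text, max_len))
    spammy = len(seen & spam_only)
    hammy = len(seen & ham_only)
    if spammy == hammy:
        return "unsure"  # assumption: a tie (including 0 vs 0) is left undecided
    return "spam" if spammy > hammy else "ham"
```

Note that every decision here comes from a phrase that matched one of
the two sets, which is the sense in which it is about matching too. A
statistical filter would weight each token by how often it was seen
rather than giving every token one vote.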
