Re: My new method for blocking spam - REVEALED!

Reindl Harald Wed, 20 Jan 2016 12:15:20 -0800


On 20.01.2016 at 21:11, Marc Perkel wrote:


On 01/20/16 12:05, RW wrote:

On 01/20/16 10:26, Shawn Bakhtiar wrote:

Sorry.. how is this different than Naive Bayes filtering??

On Wed, 20 Jan 2016 10:52:58 -0800
Marc Perkel wrote:

Yes - you missed something. It is about intersecting one corpus and
NOT intersecting the other.

This is about what doesn't match - not what does.


What you are doing is a special case of an ordinary Bayesian filter. If
you remove Robinson's correction for low-count tokens, or adjust the
Robinson parameters so it has no effect, you end up with tokens that
only occur in spam having a probability of 1, tokens that only occur
in ham having a probability of 0 and tokens that occur in both having a
probability in-between. If you set a cut-off of 0.499999... you leave
only the pure tokens behind. And because all the probabilities are 0 or
1 the chi-squared test reduces to comparing the number of spammy and
hammy tokens just as you are doing.
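
A minimal sketch of the reduction described above, not anyone's actual filter. It assumes whitespace tokenization, non-empty corpora, and reads the 0.499999 cut-off as a minimum distance from 0.5; the names token_probs, classify and min_dev are made up for illustration. The chi-squared step is replaced by a plain count, which is what the paragraph says it reduces to once every remaining probability is 0 or 1.

```python
from collections import Counter


def token_probs(spam_msgs, ham_msgs):
    """Unsmoothed per-token spam probability (no Robinson correction)."""
    # Count how many messages in each corpus contain each token.
    spam_counts = Counter(t for m in spam_msgs for t in set(m.lower().split()))
    ham_counts = Counter(t for m in ham_msgs for t in set(m.lower().split()))
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = spam_counts[tok] / len(spam_msgs)  # fraction of spam containing tok
        h = ham_counts[tok] / len(ham_msgs)    # fraction of ham containing tok
        probs[tok] = s / (s + h)               # 1.0 if spam-only, 0.0 if ham-only
    return probs


def classify(message, probs, min_dev=0.499999):
    """Drop tokens within min_dev of 0.5, then compare spammy and hammy counts."""
    kept = [probs[t] for t in set(message.lower().split())
            if t in probs and abs(probs[t] - 0.5) > min_dev]
    spammy = sum(1 for p in kept if p > 0.5)
    hammy = sum(1 for p in kept if p < 0.5)
    if spammy > hammy:
        label = "spam"
    elif hammy > spammy:
        label = "ham"
    else:
        label = "unsure"
    return label, spammy, hammy
```

Tokens that occur in both corpora land strictly between 0 and 1 and are discarded by the cut-off, so only the "pure" tokens vote.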

Your multi-word tokenization is exactly the same as in Bogofilter and
most of what you are doing can be done in Bogofilter with a few lines
in the configuration file.

Any value in your scheme must be in the selection of what you
tokenize. The rest is likely holding it back.


Again - it's not about matching as Bayes does. It's about not matching.

In the subject line of the message the phrase "method for blocking spam"
makes the message ham. Spammers never use the phrase "method for
blocking spam". No other tests needed. My system's result: 100% ham. To
Bayes it's just some words.

What makes it ham is what doesn't match, not what does
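
The idea as stated in this thread (phrases found in one corpus and absent from the other) can be sketched as a set difference over word n-grams. This is an illustration under assumptions, not Marc's actual implementation: the function names, the lowercase whitespace tokenizer, and the maximum phrase length of 4 words are all hypothetical.

```python
def ngrams(text, max_n=4):
    """All word n-grams of length 1..max_n in text (hypothetical tokenizer)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def build_exclusive_sets(spam_msgs, ham_msgs, max_n=4):
    """Return (phrases seen only in spam, phrases seen only in ham)."""
    spam = set().union(*(ngrams(m, max_n) for m in spam_msgs))
    ham = set().union(*(ngrams(m, max_n) for m in ham_msgs))
    return spam - ham, ham - spam


def score(message, spam_only, ham_only, max_n=4):
    """Count how many of the message's phrases fall in each exclusive set."""
    grams = ngrams(message, max_n)
    return len(grams & spam_only), len(grams & ham_only)
```

A phrase is "ham-only" here purely relative to the spam corpus collected so far; it leaves that set as soon as one spam message containing it is added.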

"Spammers never use the phrase" is pure bullshit - sorry, there is no nicer way to put it!

