On Mon, 12 Oct 2009 10:49:06 -0700
Ted Mittelstaedt <t...@ipinc.net> wrote:


> I think if you sit down and start trying to define examples
> and run them through large databases of spam and ham you
> will find that it doesn't work the way you think it does.  That
> is what I was talking about when I said that statistical
> mathematics has parts that are non-intuitive.

I think what you are saying is that you tried it and it didn't work for
you. That doesn't mean it can't be made to work; the basic
principle is sound.

One way I think it might be done is to tokenize large corpora
of ham and spam (mainly fraud), and look for token combinations that are
very strong spam indicators. For example, I suspect the simple
two-token combination lottery+barrister is a pretty reliable
indicator.
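A minimal sketch of that kind of corpus mining, assuming plain-text message bodies, a naive lower-case word tokenizer, and hypothetical thresholds (`min_spam`, `max_ham_rate`). It counts how many spam and ham messages contain each unordered token pair and keeps pairs that are common in spam but absent (or rare) in ham. It also reports how often each token shows up in ham on its own, which is the point: "lottery" or "barrister" alone may appear in legitimate mail while the pair does not. Pair counting is quadratic in distinct tokens per message, so this is illustrative rather than something to run on a large corpus as-is.

```python
from collections import Counter
from itertools import combinations
import re

# Naive tokenizer: lower-case alphabetic runs (an assumption, not a real filter's tokenizer)
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(message: str) -> set[str]:
    """Return the set of tokens, so each token counts once per message."""
    return set(TOKEN_RE.findall(message.lower()))


def token_doc_freq(corpus: list[str]) -> Counter:
    """Number of messages containing each single token."""
    counts: Counter = Counter()
    for message in corpus:
        counts.update(tokenize(message))
    return counts


def pair_doc_freq(corpus: list[str]) -> Counter:
    """Number of messages containing both tokens of each unordered pair."""
    counts: Counter = Counter()
    for message in corpus:
        for pair in combinations(sorted(tokenize(message)), 2):
            counts[pair] += 1
    return counts


def strong_spam_pairs(spam: list[str], ham: list[str],
                      min_spam: int = 2, max_ham_rate: float = 0.0):
    """Return (pair, spam_rate, pair_ham_rate, worst_single_token_ham_rate)."""
    spam_pairs = pair_doc_freq(spam)
    ham_pairs = pair_doc_freq(ham)
    ham_tokens = token_doc_freq(ham)
    n_ham = max(len(ham), 1)
    results = []
    for (a, b), n_spam in spam_pairs.items():
        if n_spam < min_spam:
            continue
        pair_ham_rate = ham_pairs[(a, b)] / n_ham
        if pair_ham_rate > max_ham_rate:
            continue
        # How "hammy" each token is by itself, for comparison with the pair
        solo_ham_rate = max(ham_tokens[a], ham_tokens[b]) / n_ham
        results.append(((a, b), n_spam / len(spam), pair_ham_rate, solo_ham_rate))
    # Most frequent spam pairs first
    return sorted(results, key=lambda r: -r[1])
```

With three toy fraud messages that all mention both words, and two ham messages that each mention only one of them, `strong_spam_pairs(spam, ham)` returns the single pair `("barrister", "lottery")` with a spam rate of 1.0, a pair ham rate of 0.0, and a single-token ham rate of 0.5.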

Meta-rules would be an inefficient way of implementing it though.

> The reason you probably think that "meta" rules work better
> is because you have created meta rules that are, in reality,
> a grouping of useless rules with a useful rule.  Thus, giving
> the illusion that "a rule that isn't scoring individually"
> actually is scoring when in a meta rule.

Sometimes meta-rules just make more sense: "paypal" and "yahoo" in the
same From header is worth scoring, but "paypal" or "yahoo" alone isn't.
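In SpamAssassin this would normally be a `meta` rule over two `header` sub-rules whose names start with `__`, so that neither sub-rule scores on its own. The Python below only mirrors that boolean logic as a sketch; the sub-rule names and the score of 2.0 are hypothetical, not tuned values.

```python
import re

# Hypothetical sub-rules: each is a bare regex test on the From header
# and contributes no score by itself (like "__"-prefixed rules).
SUBRULES = {
    "__FROM_PAYPAL": re.compile(r"paypal", re.IGNORECASE),
    "__FROM_YAHOO": re.compile(r"yahoo", re.IGNORECASE),
}

META_SCORE = 2.0  # hypothetical score for the combined rule


def meta_from_paypal_yahoo(from_header: str) -> float:
    """Score only when both sub-rules match the same From header."""
    hits = {name: bool(rx.search(from_header)) for name, rx in SUBRULES.items()}
    if hits["__FROM_PAYPAL"] and hits["__FROM_YAHOO"]:
        return META_SCORE
    return 0.0
```

A header such as `PayPal Security <service@paypal-alerts.yahoo.com>` scores 2.0, while `service@paypal.com` or `someone@yahoo.com` on its own scores nothing.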
