decoder wrote:
LuKreme wrote:

This is an excellent idea, but it also needs rule hits on ham, right?

You're right if you're saying that the method would work better if there were more ham rules. From what I have seen in my experiments, however, the results are already very precise with the current SA ruleset. Still, any rule that adds information to the feature set might improve performance further, especially on spam that SA fails to recognize. On ham and spam that SA itself classifies correctly, the algorithm performs nearly as well as SA.




What I'm thinking, once this gets working, is to write what I'll call "informational rules". By themselves these would be 0-point rules and might at best be only slight indicators of spam vs. ham, but combined with other rules they would enhance the ability to form accurate metarules. And perhaps tokens can come from things other than just rules: the countries the message has passed through, individual word rules that we stopped using a long time ago, or marketing phrases.
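
To make the "informational rules" idea concrete, here is a rough Python sketch of how one might look for rule combinations that say more together than either rule does alone. It is only an illustration under my own assumptions, not the method being discussed in this thread: the rule names (INFO_RED_TEXT and so on), the function pair_spam_ratios, and the tiny hand-made corpus are all hypothetical. It simply counts how often each pair of rule hits co-occurs in spam versus ham.

```python
from collections import Counter
from itertools import combinations


def pair_spam_ratios(messages, min_count=2):
    """Return {(rule_a, rule_b): fraction of co-occurrences that were spam}.

    messages: iterable of (set_of_rule_names, is_spam) tuples.
    Pairs seen fewer than min_count times are skipped as too rare to judge.
    """
    spam_counts = Counter()
    ham_counts = Counter()
    for rules, is_spam in messages:
        target = spam_counts if is_spam else ham_counts
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(rules), 2):
            target[pair] += 1

    ratios = {}
    for pair in set(spam_counts) | set(ham_counts):
        spam_hits, ham_hits = spam_counts[pair], ham_counts[pair]
        if spam_hits + ham_hits >= min_count:
            ratios[pair] = spam_hits / (spam_hits + ham_hits)
    return ratios


# Hypothetical corpus: every rule here would score 0 points on its own.
messages = [
    ({"INFO_RED_TEXT", "INFO_PHRASE_ACT_NOW"}, True),
    ({"INFO_RED_TEXT", "INFO_PHRASE_ACT_NOW"}, True),
    ({"INFO_RED_TEXT"}, False),
    ({"INFO_PHRASE_ACT_NOW"}, False),
    ({"INFO_RED_TEXT", "INFO_RELAY_XX"}, False),
    ({"INFO_RED_TEXT", "INFO_RELAY_XX"}, False),
]

ratios = pair_spam_ratios(messages)
```

In this made-up data each rule on its own hits both ham and spam, but the pair of red text plus the marketing phrase only ever shows up in spam. A pair like that would be a candidate metarule; a real run would of course need a large hand-sorted corpus and some care about overfitting.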

I remember that when Bayes first came out, we discovered that RED text was a stronger indicator of spam than words like viagra. I'm hopeful that this is going to give us a breakthrough like that, where we find that interesting combinations change the way we see spam filtering.

I'm looking forward to seeing what comes of this.
