Marc Perkel wrote:
LuKreme wrote:
On Mar 3, 2009, at 10:06, John Wilcock <j...@tradoc.fr> wrote:

On 03/03/2009 17:42, Matus UHLAR - fantomas wrote:
I have already been thinking about the possibility of combining every pair of rules
and doing a masscheck over them. Then, optionally, repeating that again,
skipping duplicates. Finally, gathering all rules that scored >= 0.5 or <= -0.5
- we could have an interesting ruleset here.
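The pairwise-combination idea above could be prototyped without a full masscheck run. The sketch below is hypothetical (the thread does not specify an implementation): it enumerates every pair of rules that fired together on a labeled corpus, assigns each pair a Laplace-smoothed log-odds score as a stand-in for a real masscheck-derived score, and keeps only pairs scoring >= 0.5 or <= -0.5, as suggested.

```python
from itertools import combinations
from collections import Counter
from math import log

def score_rule_pairs(messages, threshold=0.5):
    """messages: list of (hit_rule_set, is_spam) tuples from a labeled corpus.

    Returns a dict of rule pairs whose smoothed log-odds score is
    >= threshold or <= -threshold (hypothetical stand-in for a masscheck score).
    """
    spam_hits, ham_hits = Counter(), Counter()
    n_spam = sum(1 for _, is_spam in messages if is_spam)
    n_ham = len(messages) - n_spam

    # Count how often each pair of rules co-fires in spam vs. ham.
    for hits, is_spam in messages:
        for pair in combinations(sorted(hits), 2):
            (spam_hits if is_spam else ham_hits)[pair] += 1

    scored = {}
    for pair in set(spam_hits) | set(ham_hits):
        # Laplace smoothing avoids division by zero for one-sided pairs.
        p_spam = (spam_hits[pair] + 1) / (n_spam + 2)
        p_ham = (ham_hits[pair] + 1) / (n_ham + 2)
        score = log(p_spam / p_ham)
        if score >= threshold or score <= -threshold:
            scored[pair] = score
    return scored
```

Skipping duplicates when repeating the step (as Matus suggests) would just mean treating an already-kept pair as a single synthetic rule on the next pass.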

But that's going to be a HUGE ruleset.

Not to mention that different combinations will suit different sites.

I wonder about the feasibility of a second Bayesian database, using the same learning mechanism as the current system, but keeping track of rule combinations instead of keywords.

It sounds like a really good idea to me, and also like the most reasonable way to manage self-learning meta rules.

It seems to me that the consensus is that it's worth a try. I don't know whether it will work or not, but I think there's a good chance this could be a significant advance in how well SA works.

I had exactly the same idea as Marc quite a while ago, but haven't tried it (yet) because I didn't have a big corpus of false positives/negatives to test on. Such a system mainly makes sense if it actually improves performance, i.e. minimizes false positives and negatives, so one would need to show that it indeed does.

Apart from that, it should be simple to "learn" meta rules using machine learning algorithms (e.g. Bayes, or even something more complex like an SVM), and it should also be reasonably fast once one has the model.
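As a minimal sketch of what "learning meta rules" with Bayes might look like (hypothetical; the thread proposes the idea but no code), one can treat the set of rule names that fired on a message as the feature vector, instead of the message's words, and run an ordinary Bernoulli-style naive Bayes over those features:

```python
from collections import defaultdict
from math import log

class RuleHitBayes:
    """Naive Bayes over SpamAssassin rule hits instead of tokens
    (hypothetical sketch of the 'second Bayesian database' idea)."""

    def __init__(self):
        self.spam_count = 0
        self.ham_count = 0
        self.spam_rule = defaultdict(int)  # rule name -> spam messages it hit
        self.ham_rule = defaultdict(int)   # rule name -> ham messages it hit

    def learn(self, hit_rules, is_spam):
        """Train on one message: the set of rules that fired, plus its label."""
        if is_spam:
            self.spam_count += 1
            counter = self.spam_rule
        else:
            self.ham_count += 1
            counter = self.ham_rule
        for rule in hit_rules:
            counter[rule] += 1

    def spam_log_odds(self, hit_rules):
        """Positive result means the rule combination looks spammy."""
        score = log((self.spam_count + 1) / (self.ham_count + 1))
        for rule in hit_rules:
            # Laplace-smoothed per-rule likelihoods.
            p_spam = (self.spam_rule[rule] + 1) / (self.spam_count + 2)
            p_ham = (self.ham_rule[rule] + 1) / (self.ham_count + 2)
            score += log(p_spam / p_ham)
        return score
```

The naive independence assumption is exactly what combination learning is meant to overcome, so in practice one would feed in pair (or higher-order) features rather than single rule names; the class itself stays the same.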


Chris

