Christopher Eykamp said the following on 03/12/02 17:29:
> Would it make sense to do a Bayesian analysis using not only individual words but also the SpamAssassin regex tests, in order to detect phrases and patterns that would be missed by a naive word-by-word analysis? And if that worked, would it then not make sense to discard the standard SA scoring system altogether?

I looked into this and it didn't work too well (though others are welcome to try again). The reason, I figured, was that Bayesian probability needs a good balance of spam and non-spam rules, and SA just doesn't have that: it has a boatload of spam rules but very few non-spam rules (and with good reason, since non-spam rules are basically ways the spammer can sneak through the net). So the Bayes stuff weights too heavily towards spam with the SA rules added in.
I'd welcome further tests on it, though. It might need some of my SA3 work in order to work correctly, because that allows much finer-grained control over when rules get run.
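
To illustrate the imbalance, here is a minimal sketch of a naive Bayes combiner fed with both word tokens and rule-hit tokens. It is not SpamAssassin's actual implementation: the function names, the rule names (RULE_A and so on) and all the training counts are made up for the example, and it assumes independent tokens and equal priors.

```python
import math

def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Per-token spam probability from training counts (add-one smoothing)."""
    p_tok_given_spam = (spam_hits + 1) / (n_spam + 2)
    p_tok_given_ham = (ham_hits + 1) / (n_ham + 2)
    return p_tok_given_spam / (p_tok_given_spam + p_tok_given_ham)

def combine(probs):
    """Naive Bayes combination, assuming independent tokens and equal priors."""
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical corpus size: 1000 spam and 1000 non-spam messages.
N_SPAM, N_HAM = 1000, 1000

# A legitimate message containing four words that lean non-spam.
# Each entry is (times seen in spam, times seen in non-spam); counts are invented.
word_counts = {
    "meeting": (20, 200),
    "patch": (20, 200),
    "regards": (20, 200),
    "attached": (20, 200),
}

# The same message happens to trip three spam rules (hypothetical names).
# Rules fire mostly on spam, and no non-spam rules exist to offset them.
rule_counts = {
    "RULE_A": (300, 10),
    "RULE_B": (300, 10),
    "RULE_C": (300, 10),
}

word_probs = [token_spam_prob(s, h, N_SPAM, N_HAM) for s, h in word_counts.values()]
rule_probs = [token_spam_prob(s, h, N_SPAM, N_HAM) for s, h in rule_counts.values()]

words_only = combine(word_probs)
with_rules = combine(word_probs + rule_probs)

print(f"words only:      {words_only:.4f}")
print(f"words and rules: {with_rules:.4f}")
```

With these invented counts the word tokens alone leave the message well on the non-spam side, while the three rule tokens are enough to push it over 0.5, because nothing on the rule side ever counts as evidence for non-spam.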
Matt.
_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk