Bug?

The bayes code in 2.50 doesn't get invoked from spamd because there is no hook from handle_user to [re]open the bayes databases. I have to think this is an oversight, but I thought I'd better ask.

* Should spamd do this?

The learn code is a bit slow, and if the authors are open to code submissions I'll submit something to help.

Having experimented heavily with Graham-inspired code, I have to say it's really fast and works really well (for me): tolerable false negatives but NO false positives (but then my corpora have grown to 10000 spam and 10000 ham). It also fills two important gaps for me:
- fixes the persistent false positives I get (e.g. all ietf-announce traffic scores 6.6 consistently)
- has a comprehensible end-user interface (just provide examples)

* Conclusion: Bayes -- good.

Regarding the future, I am hoping that where this is going is the ability to support multiple layers of Bayes databases: a system-wide one (probably large corpus, hence large database), PLUS per-user ones (probably small corpus, but important to the user).

* Any intention here among the main developers? Open to new code?

Regarding Bayes scoring, if Graham's formula becomes an option, you might as well have only two possible scores, as the output is extremely bimodal. Robinson's formula spreads the probability distribution more widely. Extrema of +/-4.0 score seem really high, but I haven't seen any bad results.

You'd have to treat the Robinson and Graham formulas as different tests altogether for GA score determination. If the training corpus is loaded up with a lot of cases that SA gets wrong, then the GA may train the Bayes scores to the right levels. If the training corpus is too easy, then it will score Bayes too low, because it'll seem that Bayes has little to contribute. I'll bet that leaving Bayes out of the GA is the right answer for now, but it would be interesting to see what it comes up with.

* That's my two cents.

P.S. Food for thought. I have noticed some recent spam including unusual random vocabulary words in the message -- the spammers must be watching and experimenting too?
A lot of this, and simple vocabulary based Bayes will weaken (especially if spammers get their extreme-words from a default distribution!) -- I think the next ply will include phrases or other more semantic features -- but there are space tradeoffs.

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
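
To illustrate the scoring point above (Graham's output being bimodal, Robinson's being more spread out), here is a minimal Python sketch. It assumes the textbook forms of the two combining rules: Graham's naive-Bayes product, and Robinson's geometric-mean P/Q/S combination. It is not taken from the SpamAssassin 2.50 source, and the function names, the clamping range, and the per-token probabilities are made up purely for illustration.

```python
import math


def clamp(p, lo=0.01, hi=0.99):
    """Keep a token probability away from 0 and 1 (assumed range, avoids division by zero)."""
    return max(lo, min(hi, p))


def graham_combine(probs):
    """Graham-style combining: prod(p) / (prod(p) + prod(1 - p))."""
    probs = [clamp(p) for p in probs]
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def robinson_combine(probs):
    """Robinson-style combining using geometric means.

    P = 1 - (prod(1 - p))^(1/n), Q = 1 - (prod(p))^(1/n),
    S = (P - Q) / (P + Q), rescaled from [-1, 1] to [0, 1].
    """
    probs = [clamp(p) for p in probs]
    n = len(probs)
    big_p = 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / n)
    big_q = 1.0 - math.prod(probs) ** (1.0 / n)
    s = (big_p - big_q) / (big_p + big_q)
    return (1.0 + s) / 2.0


# Hypothetical message: ten mildly spammy tokens and five mildly hammy ones.
example_tokens = [0.7] * 10 + [0.4] * 5

if __name__ == "__main__":
    print("Graham:  ", round(graham_combine(example_tokens), 4))
    print("Robinson:", round(robinson_combine(example_tokens), 4))
```

With these made-up inputs the Graham result lands very close to 1 (about 0.998), while the Robinson result stays near the middle (about 0.60). That is the behaviour described above: the same mildly spammy evidence is pushed to an extreme by one formula and left as a moderate score by the other, which is why the two would need separate GA scores.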