At 10:37 PM (-0800) 1/8/2004 (Thursday), Robert Menschel wrote:

Running these against my corpus, I find
WORDWORD  -- 4212s/14h of 87289 corpus (70035s/17254h)
WORDWORD2 -- 4205s/12h of 87289 corpus (70035s/17254h)
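
As a rough reading aid for the two lines above: assuming "s" and "h" are the counts of spam and ham messages a rule hit, and the parenthesised figures are the spam and ham totals of the corpus (they do add up to 87289), the per-class hit rates can be worked out as below. This is only a sketch; the function name and the "spam share of hits" figure are illustrative and not part of any SpamAssassin tool.

```python
import re

# Matches lines such as:
#   WORDWORD  -- 4212s/14h of 87289 corpus (70035s/17254h)
# Assumption: "s" = spam hits, "h" = ham hits, and the parenthesised
# pair holds the spam and ham totals of the corpus.
LINE_RE = re.compile(
    r"(?P<rule>\S+)\s+--\s+(?P<s>\d+)s/(?P<h>\d+)h of (?P<total>\d+) corpus "
    r"\((?P<ts>\d+)s/(?P<th>\d+)h\)"
)


def hit_rates(line):
    """Parse one result line and return hit rates per message class."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised line: %r" % line)
    s, h = int(m.group("s")), int(m.group("h"))
    total_spam, total_ham = int(m.group("ts")), int(m.group("th"))
    return {
        "rule": m.group("rule"),
        "total": int(m.group("total")),
        # Fraction of all spam in the corpus that the rule hit.
        "spam_rate": s / total_spam,
        # Fraction of all ham in the corpus that the rule hit (false positives).
        "ham_rate": h / total_ham,
        # Of the messages the rule hit, the fraction that were spam.
        "spam_share_of_hits": s / (s + h),
    }


if __name__ == "__main__":
    for text in (
        "WORDWORD  -- 4212s/14h of 87289 corpus (70035s/17254h)",
        "WORDWORD2 -- 4205s/12h of 87289 corpus (70035s/17254h)",
    ):
        r = hit_rates(text)
        print("%-10s spam %.3f%%  ham %.3f%%  spam share of hits %.4f" % (
            r["rule"], 100 * r["spam_rate"], 100 * r["ham_rate"],
            r["spam_share_of_hits"]))
```

Read this way, a rule is useful when its spam rate is high and its ham rate is close to zero; the raw counts alone do not show that, because the corpus holds far more spam than ham.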

Probably some stupid questions, but I'm having trouble finding documentation to explain proper Bayes Feeding Techniques:


Do I have to keep feeding Bayes ham as I feed it spam? If I have to keep feeding it ham, what ratio of ham to spam should I be feeding it? Does the ratio matter beyond the initial feeding to kick Bayes into action?

When picking ham to feed it, what kinds of things should I consider or avoid when trying to find enough ham? Do the messages have to come from off-site, or can they mostly be internal mail within the same domain or between domains hosted on the same server? What are the potential problems and benefits of using mailing list messages as ham?

Finally, if I am writing my own custom rules, how do I determine what score to give them? I see mentions of "running against the corpus" like the one above, but how do you DO that, and once you do, what exactly is it TELLING you?

TIA

--JR







_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
