Hello Pedro,

Friday, January 9, 2004, 11:58:55 AM, you wrote:

>> Probably some stupid questions, but I'm having trouble finding
>> documentation to explain proper Bayes Feeding Techniques:
>>
>> Do I have to keep feeding Bayes ham as I feed it spam?  

PS> For best results, feed all human-identified HAM and SPAM.

Agreed, with the exception of this list and any personal email that
discusses the technicalities of spam. For example, if you discuss spam
topics such as the Banned CD, that email should NOT be learned as ham,
even though to you it is ham.
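
One way to automate that exclusion is to screen messages before they are
handed to sa-learn. The following is only a sketch: it assumes list mail
carries a List-Id header, and EXCLUDED_LISTS, SPAM_TOPIC_WORDS and
ok_to_learn_as_ham are made-up names for illustration, not part of
SpamAssassin.

```python
from email import message_from_string

# Hypothetical settings; tune these to your own mail.
EXCLUDED_LISTS = ("spamassassin-talk",)
SPAM_TOPIC_WORDS = ("banned cd",)


def ok_to_learn_as_ham(raw_message: str) -> bool:
    """Return False for mail that discusses spam and should be skipped."""
    msg = message_from_string(raw_message)

    # Skip anything that arrived through an excluded mailing list.
    list_id = str(msg.get("List-Id") or "").lower()
    if any(name in list_id for name in EXCLUDED_LISTS):
        return False

    # Skip personal mail whose subject mentions a known spam topic.
    subject = str(msg.get("Subject") or "").lower()
    return not any(word in subject for word in SPAM_TOPIC_WORDS)
```

A subject-only keyword check is crude. It will miss spam discussions with
an innocent subject line, so treat it as a first pass, not a guarantee.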

>> If I have to keep
>> feeding it ham, what ratio of ham/spam should I be feeding it?  Does the
>> ratio matter beyond the initial feeding to kick Bayes into action?

PS> In general, you don't have to worry too much about the ratio, unless your
PS> HAM/SPAM ratio is extremely weird.  Just feed as much as you can.

Agreed. ham/spam = 1/20 is not too bad. I'm not so sure about 1/1000.
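
To put those two figures into something you can check mechanically, here
is a rough sketch. The function name and the wording of the verdicts are
my own invention, the middle band between 1/20 and 1/1000 is a guess, and
applying the same limits when ham outnumbers spam is an assumption.

```python
def ham_spam_verdict(nham: int, nspam: int) -> str:
    """Classify a learned ham/spam balance using the rough 1/20 and 1/1000 figures."""
    small, big = sorted((nham, nspam))
    if small == 0:
        return "one-sided: feed some of both"
    if big <= 20 * small:
        # Up to 1/20 either way: not too bad.
        return "fine"
    if big < 1000 * small:
        # Between 1/20 and 1/1000: untested middle ground.
        return "lopsided: feed more of the rarer kind if you can"
    # 1/1000 or worse: doubtful.
    return "extreme: results are doubtful"
```

The two counts can be read from the nham and nspam lines that
"sa-learn --dump magic" prints.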

>> When picking ham to feed it, what kinds of things should I consider/avoid
>> when trying to find enough ham?  Do the messages have to come from
>> off-site, or can they mostly be internal mail between the same domain or
>> between domains hosted on the same server?  What are the potential
>> problems/benefits of using mailing list messages as ham?

PS> If you use SA to filter both internal and external mail, then yes, learn from
PS> both.  But if you only filter external mail through SA, then learning from
PS> internal mail MAY be counter productive.

The only thing I exclude is SA-Talk and similar emails. Everything else
goes through sa-learn, including email I receive through my email client
from accounts that don't have SA capabilities.
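
For mail saved out of an email client, a small wrapper around sa-learn can
do the feeding. This is a sketch, not a recommended setup: the function
names are hypothetical, the paths you pass in are your own, and it relies
only on the --ham, --spam and --mbox options of sa-learn.

```python
import subprocess


def sa_learn_command(kind: str, path: str, mbox: bool = False) -> list:
    """Build an sa-learn command line for a mail folder or an mbox file."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        # Tell sa-learn the path is a single mbox file, not a directory.
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


def learn(kind: str, path: str, mbox: bool = False, dry_run: bool = True) -> list:
    """Run sa-learn, or only return the command when dry_run is True."""
    cmd = sa_learn_command(kind, path, mbox)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Leaving dry_run on by default lets you see which command would run before
anything touches the Bayes database.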

>> Finally, if I am writing my own custom rules, how do I determine what score
>> to give them?  I see mentions of "running against the corpus" like the one
>> above, but how do you DO that, and once you do what exactly is it TELLING
>> you?

PS> dunno, something about mass-check?

My ideas, at least as of a month ago, are documented at
http://www.exit0.us/index.php/RM_RuleScoring
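
As a rough illustration of what "running against the corpus" tells you,
the sketch below counts how often a pattern hits in a ham corpus and in a
spam corpus. It is not mass-check and not necessarily the method described
on that page. The hit_rates name is made up, and a plain regular expression
stands in for a real rule.

```python
import re


def hit_rates(pattern: str, ham_texts: list, spam_texts: list):
    """Return (spam %, ham %, S/O) for a regex tested against two corpora."""
    if not ham_texts or not spam_texts:
        raise ValueError("both corpora must contain at least one message")
    rule = re.compile(pattern, re.IGNORECASE)

    ham_hits = sum(1 for text in ham_texts if rule.search(text))
    spam_hits = sum(1 for text in spam_texts if rule.search(text))

    ham_pct = 100.0 * ham_hits / len(ham_texts)
    spam_pct = 100.0 * spam_hits / len(spam_texts)

    # S/O: share of the rule's hit rate that comes from spam.
    # Near 1.0 means it hits almost only spam; near 0.0, almost only ham.
    total = ham_pct + spam_pct
    s_o = spam_pct / total if total else None
    return spam_pct, ham_pct, s_o
```

A rule that hits a lot of spam and almost no ham can carry a higher score.
One that also hits ham needs a low score or a rewrite.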

Bob Menschel





_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
