Hello Scott,

Tuesday, December 2, 2003, 2:36:00 PM, you wrote:

SR> 2. My spam directory is up to 564 MB. Should I be worried about size
SR> at all? Eventual corruption? Is there a way to perform periodic DB
SR> maintenance? Is it ok to delete spam received say a month ago, if it
SR> has been learned by the Bayes DB, either through autolearn or by
SR> manually running sa-learn?

Yes, and no. It depends on your plans and your capabilities.

If you have no plans or use for historic spam, then it's OK to delete
it immediately after running sa-learn (assuming you've confirmed there
are no false positives (ham) in your spam collection).

However, there are (at least) three uses for historic spam:

1) If something should happen to your Bayes database, it's handy to
have 500-1000 spam to retrain it with. (The most recent spam, in this
case.)

2) If you develop your own rules, or want to validate others' rules
that you adopt, it's useful to be able to check those rules against a
corpus. The best way is through SA's mass-check capabilities, though
there are other options. This is what I do, and currently I maintain
309 Meg of spam and 100 Meg of ham. My ham extends back a couple of
years. I plan to let my spam grow until I have a full four months of
it (16 more days to go). I'll then keep the spam corpus steady at
around four months' worth.

3) If you have the ability to contribute to the GA run which scores
each new version's ruleset, then having a corpus to run those new
rules against helps generate the scores for the distribution ruleset,
benefiting all SA users.

Bob Menschel

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
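
For anyone who wants to hold a spam corpus at a fixed window, as in
point 2 above, here is a minimal sketch in Python. It is not part of
SpamAssassin. It assumes the corpus is a directory with one message
per file, and that each file's modification time is a fair stand-in
for when the message arrived. The function names and the 120-day
figure (roughly four months) are illustrative choices.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def old_messages(corpus_dir, max_age_days, now=None):
    """Return the files in corpus_dir last modified more than max_age_days ago.

    Assumption: one message per file, and mtime approximates arrival time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(corpus_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def prune(corpus_dir, max_age_days, dry_run=True, now=None):
    """List the messages outside the window; delete them unless dry_run is set."""
    victims = old_messages(corpus_dir, max_age_days, now)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Calling prune(corpus_dir, 120) only reports what would go, since
dry_run defaults to True. Pass dry_run=False once the list looks
right, and only after those messages have been through sa-learn and
checked for stray ham.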