Hello Scott,

Tuesday, December 2, 2003, 2:36:00 PM, you wrote:

SR> 2. My spam directory is up to 564 MB.  Should I be worried about size
SR> at all?  Eventual corruption?  Is there a way to perform periodic DB
SR> maintenance?  Is it ok to delete spam received say a month ago, if it
SR> has been learned by the Bayes DB, either through autolearn or by
SR> manually running sa-learn?

Yes, and no. It depends on your plans and your capabilities.

If you have no plans or use for historic spam, then it's OK to delete it
immediately after running sa-learn (assuming you've confirmed there are
no false positives, i.e. ham, in your spam collection).

However, there are (at least) three uses for historic spam:

1) If something should happen to your Bayes database, it's handy to have
500-1000 spam to retrain it with. (The most recent spam, in this case.)

2) If you develop your own rules, or want to validate others' rules that
you adopt, it's useful to be able to check those rules against a corpus.
Best way is through SA's mass-check capabilities, though there are other
options.

This is what I do, and currently I maintain 309 Meg of spam and 100 Meg
of ham. My ham extends back a couple of years. I plan to let my spam grow
until I have a full four months of it (16 more days to go). I'll then
keep the spam corpus steady at around four months' worth.

3) If you have the ability to contribute to the GA run which scores each
new version's ruleset, then having a corpus to run those new rules
against helps generate the scores for the distribution ruleset,
benefiting all SA users.
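
If you want to keep a corpus at a steady size, as described under 2),
while always holding back enough recent spam to retrain Bayes as in 1),
something like the following Python sketch might do. It is only an
illustration, not something SA ships: the function names, the 120-day
window (roughly four months), and the 1000-message reserve are my own
assumptions, and it assumes one message per file with the file's
modification time standing in for the date the spam was received.

```python
import time
from pathlib import Path


def select_for_pruning(spam_dir, max_age_days=120, keep_newest=1000, now=None):
    """Return spam files older than max_age_days, never touching the
    keep_newest most recent messages (the Bayes retraining reserve).

    Assumes one message per file and that mtime approximates receipt time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    files = [p for p in Path(spam_dir).iterdir() if p.is_file()]
    # Newest first, so the slice below skips the reserve.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    candidates = files[keep_newest:]
    return [p for p in candidates if p.stat().st_mtime < cutoff]


def prune(spam_dir, dry_run=True, **kwargs):
    """Delete (or, with dry_run, just list) the files chosen above."""
    doomed = select_for_pruning(spam_dir, **kwargs)
    for p in doomed:
        if dry_run:
            print("would delete", p)
        else:
            p.unlink()
    return doomed
```

Run it with dry_run=True first and look over the list before deleting
anything, and only prune messages that have already been through
sa-learn and checked for ham.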

Bob Menschel




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
