Folks, this is now ready to go. Theo and I are already submitting results; more folks should too.

Basically, the idea is that, if you've got a corpus of mail classified into spam and nonspam, you run a script which checks out a tagged version every day, runs mass-checks, and submits the logs to the corpus rsync server. We then take those logs and coalesce them into one set of freqs for everyone's corpora, allowing rule development to be QA'ed automatically, and on a large scale. Very nice.

How it works is detailed in masses/CORPUS_SUBMIT_NIGHTLY. Check out a version of SpamAssassin into a dir (I use /home/jm/ftp/mcver), then just set up a cron to do something like this at or after 0900 GMT:

    cd /home/jm/ftp/mcver
    cvs -z3 update -dP -r CURRENT_CORPORA_SUBMIT_VERSION
    cd masses
    [run your mass-check-all-mail-dirs script]
    RSYNC_PASSWORD=yourpasswordfromcraig
    export RSYNC_PASSWORD
    rsync -CPcvuzb nonspam.log \
        [EMAIL PROTECTED]::corpus/nonspam-you.log
    rsync -CPcvuzb spam.log \
        [EMAIL PROTECTED]::corpus/spam-you.log

and that's it.

Then, on an hourly basis, I have a cron set up on spamassassin.taint.org to generate a file in the website with the latest freqs, using the combined submitted results for that day. The results then go up in this dir:

    http://spamassassin.taint.org/qa/freqs/

--j.
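
For anyone who would rather keep the nightly steps in one file, here is a rough sketch of a wrapper script that cron could call. It is only an illustration of the steps described above, not the procedure from masses/CORPUS_SUBMIT_NIGHTLY. The script name, the function names and the variables CHECKOUT_DIR, SUBMITTER, RSYNC_TARGET and MASS_CHECK_CMD are all made up for this example. The corpus server address and your mass-check script are not filled in, so you have to supply them yourself. The crontab line in the comments assumes the machine's clock runs on GMT.

```shell
#!/bin/sh
# Sketch of a nightly wrapper; every variable and function name is hypothetical.
# Example crontab entry (assumes the system clock is on GMT/UTC):
#   0 9 * * * /path/to/nightly-submit.sh --run

CHECKOUT_DIR=${CHECKOUT_DIR:-/home/jm/ftp/mcver}  # example dir from the mail; use your own
SUBMITTER=${SUBMITTER:-you}                       # the "you" part of the remote log names
RSYNC_TARGET=${RSYNC_TARGET:-}                    # user@host of the corpus server (not filled in here)
MASS_CHECK_CMD=${MASS_CHECK_CMD:-}                # path to your own mass-check-all-mail-dirs script

# Build the remote file name, e.g. "spam" + "you" gives "spam-you.log".
remote_log_name() {
    printf '%s-%s.log\n' "$1" "$2"
}

# Upload one local log (spam.log or nonspam.log) with the rsync flags from the mail.
submit_log() {
    rsync -CPcvuzb "$1.log" "${RSYNC_TARGET}::corpus/$(remote_log_name "$1" "$SUBMITTER")"
}

# Update the checkout, run the mass-check, then submit both logs.
nightly_submit() {
    if [ -z "$RSYNC_TARGET" ] || [ -z "$MASS_CHECK_CMD" ] || [ -z "${RSYNC_PASSWORD:-}" ]; then
        echo "set RSYNC_TARGET, MASS_CHECK_CMD and RSYNC_PASSWORD first" >&2
        return 1
    fi
    export RSYNC_PASSWORD
    cd "$CHECKOUT_DIR" || return 1
    cvs -z3 update -dP -r CURRENT_CORPORA_SUBMIT_VERSION || return 1
    cd masses || return 1
    "$MASS_CHECK_CMD" || return 1
    submit_log nonspam && submit_log spam
}

# Only do real work when asked to, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    nightly_submit
fi
```

Nothing happens unless the script is called with --run, and it refuses to start until the server address, the mass-check script and the password are set, so a half-configured cron entry fails loudly instead of submitting partial logs.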