Folks,

This is now ready to go.  Theo and I are already submitting results;
more folks should too.

Basically, the idea is that, if you've got a corpus of mail classified
into spam and nonspam, you run a script which checks out a tagged version
every day, runs mass-checks, and submits the logs to the corpus rsync
server.

We then take those logs and coalesce them into one set of freqs for
everyone's corpora, allowing rule development to be QA'ed automatically,
and on a large scale.  Very nice.
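
To give a rough feel for what "coalescing into freqs" means, here is a minimal sketch that counts how often each rule hits across several submitted logs. It is not the script that runs on the server. It assumes each non-comment log line carries a comma-separated list of rule names in its fourth whitespace-separated field; that layout is an assumption, so check it against your own mass-check output. The function name count_hits is made up for illustration.

```shell
#!/bin/sh
# Sketch only: tally rule hits across one or more mass-check logs.
# ASSUMPTION: field 4 of each non-comment line is a comma-separated
# list of rule names. Verify this against your own logs first.
set -eu

# Usage: count_hits LOGFILE...
# Prints "COUNT RULE" lines, most frequent rule first.
count_hits() {
    awk '
        /^#/ { next }                       # skip comment lines
        NF >= 4 {
            n = split($4, rules, ",")       # break the rule list apart
            for (i = 1; i <= n; i++)
                if (rules[i] != "") hits[rules[i]]++
        }
        END { for (r in hits) print hits[r], r }
    ' "$@" | sort -k1,1nr -k2,2
}
```

Something like `count_hits spam-*.log` would then give one combined tally for all submitters' spam logs; the real freqs also compare spam against nonspam hit rates, which this sketch leaves out.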

How it works is detailed in masses/CORPUS_SUBMIT_NIGHTLY; check out a
version of SpamAssassin into a dir (I use /home/jm/ftp/mcver), then just
set up a cron job to do something like this at or after 0900 GMT:

    cd /home/jm/ftp/mcver
    cvs -z3 update -dP -r CURRENT_CORPORA_SUBMIT_VERSION
    cd masses

    [run your mass-check-all-mail-dirs script]

    RSYNC_PASSWORD=yourpasswordfromcraig
    export RSYNC_PASSWORD
    rsync -CPcvuzb nonspam.log \
            [EMAIL PROTECTED]::corpus/nonspam-you.log
    rsync -CPcvuzb spam.log \
            [EMAIL PROTECTED]::corpus/spam-you.log

and that's it.  Then, on an hourly basis, I have a cron job set up on
spamassassin.taint.org that generates a file on the website with the
latest freqs, using the combined submitted results for that day.  The
results then go up in this dir:

    http://spamassassin.taint.org/qa/freqs/
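
If it helps, the cron steps above could be wrapped in a single script along these lines. This is a sketch under assumptions, not the contents of masses/CORPUS_SUBMIT_NIGHTLY: SUBMIT_NAME, RSYNC_TARGET, MASS_CHECK_CMD and DRY_RUN are hypothetical names, the rsync address is a placeholder (use the one you were given), and RSYNC_PASSWORD is expected to be exported in the environment already. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: the nightly steps from the message in one script.
# Names marked "hypothetical" are placeholders, not part of SpamAssassin.
# RSYNC_PASSWORD must already be exported for a real (non-dry) run.
set -eu

CHECKOUT_DIR=${CHECKOUT_DIR:-/home/jm/ftp/mcver}          # checkout dir used in the message
SUBMIT_NAME=${SUBMIT_NAME:-you}                           # hypothetical: the "you" in spam-you.log
RSYNC_TARGET=${RSYNC_TARGET:-user@rsync.example.invalid}  # hypothetical placeholder address
MASS_CHECK_CMD=${MASS_CHECK_CMD:-./my-mass-check-all}     # hypothetical: your own mass-check script
DRY_RUN=${DRY_RUN:-1}                                     # 1 = print commands only, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Upload one log; $1 is "spam" or "nonspam".
submit_log() {
    run rsync -CPcvuzb "$1.log" "${RSYNC_TARGET}::corpus/$1-${SUBMIT_NAME}.log"
}

# Update the tagged checkout, run the mass-checks, submit both logs.
nightly() {
    run cd "$CHECKOUT_DIR"
    run cvs -z3 update -dP -r CURRENT_CORPORA_SUBMIT_VERSION
    run cd masses
    run sh -c "$MASS_CHECK_CMD"
    submit_log nonspam
    submit_log spam
}

nightly
```

Run it once as-is to eyeball the commands, then set DRY_RUN=0 in the cron environment. Because of `set -e`, a failed cvs update or mass-check stops the script before anything is uploaded.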

--j.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
