Let the mass-checking begin!

The mass-check results are used as input for the genetic algorithm (GA)
that generates SpamAssassin rule scores.  Basically, people who have both
an email corpus and the ability to run mass-check submit their mass-check
data to the SA developers, and one of them (Theo is doing 2.50) runs the
GA to generate new optimized scores.

This is the process for the first mass-check for 2.50.  Since we already
tried once to start the mass-check and postponed it to make some changes,
this notice is dubbed "REV2" and the CVS tag name has changed.

Unlike previous mass-checks for SpamAssassin, more than one mass-check is
going to be required this time.  SpamAssassin 2.50 will have multiple sets
of scores.  The one that is used depends on the configuration you're using
(specifically, whether network and Bayes tests are turned on or not).

- There will be a single mass-check from a first CVS revision this week,
  lasting until Friday 23:59 GMT.

- There will be two more mass-checks (with different options) from a second
  CVS revision, starting several days after that.

This first mass-check runs without Bayes, but with network checks.  The
process is a wee bit complicated and it takes a while to run with network
checks on, so we're giving everyone just over 4 days to finish.  The
ground rules are below.

If you have questions or problems, please post to spamassassin-talk.

------------------------------------------------------------------------

Here are the ground rules and procedure (culled from recent messages and
such).  The CVS tag to be used for this mass-check is named:

  CORPORA_SUBMIT_VERSION_2_5_0_CHECK1

Use that tag name instead of the usual CURRENT_CORPORA_SUBMIT_VERSION.

Everyone has until Fri Jan 17 23:59:00 GMT 2003 to upload their mass-check
results.  Earlier is better, of course.  :-)

1. your corpus must follow the basic content policy described in
   masses/CORPUS_POLICY

2. use the process described in masses/CORPUS_SUBMIT for making sure
   your ham and spam are clean

3. we will use the procedure in masses/CORPUS_SUBMIT_NIGHTLY for this
   mass-check with the following modifications:

   a. Running the test:
      - check out using the CORPORA_SUBMIT_VERSION_2_5_0_CHECK1 revision
      - rm masses/spamassassin/bayes* before every mass-check run
      - use a single mass-check command so everything will be sorted by date
      - do *not* hand-train Bayes

   b. Options for mass-check:

      SAFE COMMAND TO USE: "mass-check --net --all <targets>"

      - required options for *this* run: --net
      - recommended options: --all
      - optional options: -j, --mid
      - do *not* use these options: -o, -n, --head, --tail, --mbox,
                                    --file, or --dir

      Note that the target mail folder specification used on the
      mass-check command line has changed since 2.43.  See the top of the
      mass-check file for the format.

      Regarding -j, it is not recommended to go above -j 4 when network
      checks are on (or above the number of processors in your system when
      network checks are off).  Also note that -j will not work if your
      system does not have Unix domain sockets.  If it doesn't work, don't
      use it.

   c. User preferences:

      For *this* run, masses/spamassassin/user_prefs should contain the
      following single line:

------- start of cut text --------------
auto_learn 0
------- end ----------------------------

   d. Which network tests have to be working:

      You need to have working DNS tests.  Razor2 support is also highly
      desirable, so try to have that working too.

   e. Submitting your results:

      Upload them via rsync with the names "ham-nobayes-net-username.log"
      and "spam-nobayes-net-username.log" (substituting your own username).
      Also make sure the tag name CORPORA_SUBMIT_VERSION_2_5_0_CHECK1
      appears at the top of each.

      Everyone will need to have a login/password from Craig Hughes
      <[EMAIL PROTECTED]> to rsync since anonymous uploads won't
      work.
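
For anyone who wants to script the checklist, here is a rough sketch of
steps a, b, c, and e as shell helper functions.  It is not part of the
SpamAssassin tree: the function names are made up for illustration, the
"top of the file" check assumes the tag shows up within the first five
lines, and the option check assumes a forbidden option may arrive with a
value attached (for example "--head=100").  It does not run mass-check or
rsync for you.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for the procedure above.
# None of these functions exist in the SpamAssassin distribution.

TAG="CORPORA_SUBMIT_VERSION_2_5_0_CHECK1"

# prepare_run MASSES_DIR
# Steps a and c: remove old Bayes databases and write the one-line
# user_prefs file required for this run.
prepare_run() {
    masses_dir=$1
    mkdir -p "$masses_dir/spamassassin"
    rm -f "$masses_dir"/spamassassin/bayes*
    printf 'auto_learn 0\n' > "$masses_dir/spamassassin/user_prefs"
}

# check_options OPT...
# Step b: fail if a forbidden option is present or --net is missing.
# Assumption: forbidden long options may carry a value, hence the globs.
check_options() {
    have_net=no
    for opt in "$@"; do
        case $opt in
            -o|-n|--head*|--tail*|--mbox*|--file*|--dir*)
                echo "do not use this option for this run: $opt" >&2
                return 1 ;;
            --net)
                have_net=yes ;;
        esac
    done
    if [ "$have_net" != yes ]; then
        echo "--net is required for this run" >&2
        return 1
    fi
}

# log_names USERNAME
# Step e: print the two log file names expected for upload.
log_names() {
    printf 'ham-nobayes-net-%s.log\n' "$1"
    printf 'spam-nobayes-net-%s.log\n' "$1"
}

# has_tag FILE
# Step e: succeed if the tag name appears near the top of FILE.
# Assumption: "top" means the first five lines.
has_tag() {
    head -n 5 "$1" | grep -q "$TAG"
}
```

A typical use would be to call prepare_run on your masses directory, pass
your intended mass-check options to check_options, and run has_tag on each
log named by log_names before uploading.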

If I missed anything important in this procedure, please reply and let
me know, but assume the procedure has not changed until I post a new
"NOTICE" with "(REV3)" in the subject.  That way, there will be less
confusion about what to do.

Daniel

-- 
Daniel Quinlan                      Linux, open source, and
http://www.pathname.com/~quinlan/    anti-spam consulting

