Let the mass-checking begin! The mass-check results are used as input for the genetic algorithm (GA) that generates SpamAssassin rule scores. In short, people who have both an email corpus and the ability to run mass-check submit their mass-check data to the SA developers, and one of the developers (Theo, for 2.50) runs the GA to generate new, optimized scores.
This is the process for the first mass-check for 2.50. Since we already tried once to start the mass-check and then postponed it to make some changes, this notice is dubbed "REV2" and the CVS tag name has changed.

Unlike previous SpamAssassin mass-checks, more than one mass-check will be required this time. SpamAssassin 2.50 will have multiple sets of scores, and the one that is used depends on your configuration (specifically, whether network and Bayes tests are turned on).

 - There will be a single mass-check from a first CVS revision this week, lasting until Friday 23:59 GMT.
 - There will be two more mass-checks (with different options) from a second CVS revision, starting several days after that.

This first mass-check is run without Bayes, but with network checks. The process is a wee bit complicated and takes a while with network checks on, so we're giving everyone just over 4 days to finish.

The ground rules are below. If you have questions or problems, please post to spamassassin-talk.

------------------------------------------------------------------------

Here are the ground rules and procedure (culled from recent messages and such).

The CVS tag to be used for this mass-check is named:

  CORPORA_SUBMIT_VERSION_2_5_0_CHECK1

Use that tag name instead of the usual CURRENT_CORPORA_SUBMIT_VERSION.

Everyone has until Fri Jan 17 23:59:00 GMT 2003 to upload their mass-check results. Earlier is better, of course. :-)

 1. Your corpus must follow the basic content policy described in masses/CORPUS_POLICY.

 2. Use the process described in masses/CORPUS_SUBMIT to make sure your ham and spam are clean.

 3. We will use the procedure in masses/CORPUS_SUBMIT_NIGHTLY for this mass-check, with the following modifications:

    a. Running the test:

       - check out using the CORPORA_SUBMIT_VERSION_2_5_0_CHECK1 revision
       - rm masses/spamassassin/bayes* before every mass-check run
       - use a single mass-check command so everything will be sorted by date
       - do *not* hand-train Bayes

    b. Options for mass-check:

       SAFE COMMAND TO USE: "mass-check --net --all <targets>"

       - required options for *this* run: --net
       - recommended options: --all
       - optional options: -j, --mid
       - do *not* use these options: -o, -n, --head, --tail, --mbox, --file, or --dir

       Note that the target mail folder specification used on the mass-check command line has changed since 2.43. See the top of the mass-check file for the format.

       Regarding -j, it is not recommended to go above -j 4 when network checks are on (or above the number of processors in your system when network checks are off). Also note that -j will not work if your system does not have Unix domain sockets; if it doesn't work, don't use it.

    c. User preferences:

       For *this* run, masses/spamassassin/user_prefs should contain the following single line:

------- start of cut text --------------
auto_learn 0
------- end ----------------------------

    d. Which network tests have to be working:

       You need to have working DNS tests. Razor2 support is also highly desired, so try to have that working too.

    e. Submitting your results:

       Upload them via rsync with the names "ham-nobayes-net-username.log" and "spam-nobayes-net-username.log". Also make sure the tag name CORPORA_SUBMIT_VERSION_2_5_0_CHECK1 appears at the top of each. Everyone will need a login/password from Craig Hughes <[EMAIL PROTECTED]> to rsync, since anonymous uploads won't work.

If I missed anything important in this procedure, please reply and let me know, but assume the procedure has not changed until I post a new "NOTICE" with "(REV3)" in the subject. That way, there will be less confusion about what to do.
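Before uploading, it may help to sanity-check both the planned command line and the finished logs against the rules above. The following is a minimal Python sketch, not part of SpamAssassin: the rule values are transcribed by hand from this notice, while the helper names, the five-line definition of "the top" of a log (HEADER_LINES), and the username pattern are assumptions. It only understands the separate "-j N" form of the jobs option.

```python
"""Hypothetical pre-flight checks for the 2.50 CHECK1 mass-check.

Not part of SpamAssassin. Rules are transcribed from the notice;
HEADER_LINES and the username pattern in LOG_NAME are assumptions.
"""
import itertools
import os
import re
import sys

TAG = "CORPORA_SUBMIT_VERSION_2_5_0_CHECK1"
REQUIRED = {"--net"}  # required for *this* run
FORBIDDEN = {"-o", "-n", "--head", "--tail", "--mbox", "--file", "--dir"}
MAX_JOBS_WITH_NET = 4  # "not recommended to go above -j 4" with network checks
HEADER_LINES = 5  # assumption: how many leading lines count as "the top"
LOG_NAME = re.compile(r"(ham|spam)-nobayes-net-[^/\s]+\.log")


def check_args(args, cpus=None):
    """Return a list of problems with a planned mass-check argument list."""
    problems = [f"missing required option {o}" for o in sorted(REQUIRED - set(args))]
    for arg in args:
        name = arg.split("=", 1)[0]  # treat "--head=100" like "--head"
        if name in FORBIDDEN:
            problems.append(f"option {name} must not be used for this run")
    if "-j" in args:
        i = args.index("-j")
        try:
            jobs = int(args[i + 1])  # assumption: only "-j N" is handled
        except (IndexError, ValueError):
            return problems + ["-j needs a numeric argument"]
        # With network checks: cap at 4. Without: cap at the processor count.
        limit = MAX_JOBS_WITH_NET if "--net" in args else (cpus or os.cpu_count() or 1)
        if jobs > limit:
            problems.append(f"-j {jobs} exceeds the recommended limit of {limit}")
    return problems


def check_log_name(path):
    """Check the upload name, e.g. ham-nobayes-net-username.log."""
    base = os.path.basename(path)
    return [] if LOG_NAME.fullmatch(base) else [f"unexpected log name {base!r}"]


def check_log_header(lines):
    """Check that TAG appears within the first HEADER_LINES lines."""
    head = list(itertools.islice(lines, HEADER_LINES))
    if any(TAG in line for line in head):
        return []
    return [f"{TAG} not found in the first {HEADER_LINES} lines"]


def check_log_file(path):
    """Run the name and header checks against a log file on disk."""
    problems = check_log_name(path)
    with open(path, encoding="utf-8", errors="replace") as fh:
        problems += check_log_header(fh)
    return problems


if __name__ == "__main__":
    # Usage: python precheck.py ham-nobayes-net-you.log spam-nobayes-net-you.log
    for p in sys.argv[1:]:
        for problem in check_log_file(p) or ["ok"]:
            print(f"{p}: {problem}")
```

For example, check_args(["--net", "--all", "-j", "4", ...]) with your targets appended returns an empty list, while adding --mbox or raising -j to 8 produces a complaint. Keeping the checks as plain functions that return lists of problems makes it easy to reuse them for the later Bayes runs by adjusting the constants.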
Daniel

-- 
Daniel Quinlan                      Linux, open source, and
http://www.pathname.com/~quinlan/   anti-spam consulting

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk