On 02/02/2010 12:07 PM, Adam Katz wrote:
That is quite different from our masscheck stats. Today's results at
http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:
  SPAM%    HAM%    S/O   RANK  SCORE  NAME
 9.8564  0.0042  1.000   0.94   0.01  T_JM_SOUGHT_3
 8.1587  0.0068  0.999   0.93   0.01  T_JM_SOUGHT_2
11.6464  0.0289  0.998   0.89   0.01  T_JM_SOUGHT_1
      0       0  0.500   0.48   0.00  JM_SOUGHT_FRAUD_1
      0       0  0.500   0.48   0.00  JM_SOUGHT_FRAUD_2
      0       0  0.500   0.48   0.00  JM_SOUGHT_FRAUD_3
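
For readers unfamiliar with the S/O column: the figures quoted above are
consistent with S/O = SPAM% / (SPAM% + HAM%), with 0.500 reported when a
rule hits nothing. The sketch below reproduces the column under that
assumption. The function name s_over_o is made up for illustration and
is not taken from the masscheck tooling.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Fraction of a rule's hits that are spam, from per-corpus hit percentages.

    Assumption: a rule with no hits at all is reported as 0.5 (no evidence
    either way), which matches the JM_SOUGHT_FRAUD_* rows in the quoted table.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


# (name, SPAM%, HAM%) copied from the quoted ruleqa output
rows = [
    ("T_JM_SOUGHT_3", 9.8564, 0.0042),
    ("T_JM_SOUGHT_2", 8.1587, 0.0068),
    ("T_JM_SOUGHT_1", 11.6464, 0.0289),
    ("JM_SOUGHT_FRAUD_1", 0.0, 0.0),
]

for name, spam_pct, ham_pct in rows:
    print(f"{name:<20} {s_over_o(spam_pct, ham_pct):.3f}")
```

Because the inputs are percentages of each corpus rather than raw counts,
the ratio does not directly depend on how many spam versus ham messages
were submitted, but it still depends on what kind of spam is in the
corpus.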
FWIW, the nightly masscheck is often very unbalanced, especially on the
spam side: sometimes we have only 50k spam, sometimes over 500k. Some
spam corpora contain a disproportionate amount of high-scoring spam-trap
mail. I personally filter out a large percentage of high-scoring mail at
random in an attempt to balance my spam corpus. Ultimately, though, we
need more masscheck participants to get better results.
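
A minimal sketch of that kind of random thinning, for illustration only.
The threshold of 20 points, the 25% keep rate, and the function name
thin_high_scoring are all hypothetical; this is not the actual filter
used on any masscheck corpus.

```python
import random


def thin_high_scoring(messages, threshold=20.0, keep_fraction=0.25, seed=None):
    """Keep every message scoring below `threshold`, plus a random
    `keep_fraction` of the messages scoring at or above it.

    `messages` is an iterable of (path, score) pairs. The default
    threshold and keep_fraction are made-up illustrative values.
    """
    rng = random.Random(seed)  # seedable so a run can be reproduced
    kept = []
    for path, score in messages:
        # rng.random() is in [0.0, 1.0), so keep_fraction=1.0 keeps everything
        if score < threshold or rng.random() < keep_fraction:
            kept.append((path, score))
    return kept


# Hypothetical corpus: (file name, SpamAssassin score)
corpus = [("spam/0001", 5.2), ("spam/0002", 48.7), ("spam/0003", 61.3),
          ("spam/0004", 7.9), ("spam/0005", 35.0)]

balanced = thin_high_scoring(corpus, seed=1)
print(len(balanced), "of", len(corpus), "messages kept")
```

Sampling at random, rather than dropping everything above the threshold,
keeps some high-scoring trap mail in the corpus while reducing its weight.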
Warren