On 02/02/2010 12:07 PM, Adam Katz wrote:
That is quite different from our masscheck stats.  Today's results at
http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:

    SPAM%     HAM%     S/O    RANK   SCORE  NAME
   9.8564   0.0042   1.000    0.94    0.01  T_JM_SOUGHT_3
   8.1587   0.0068   0.999    0.93    0.01  T_JM_SOUGHT_2
  11.6464   0.0289   0.998    0.89    0.01  T_JM_SOUGHT_1
        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_1
        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_2
        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_3


FWIW, the nightly masscheck is often very unbalanced, especially on the spam side. Sometimes we have only 50k spam, sometimes over 500k. Some spam corpora contain a disproportionate amount of high-scoring spam-trap mail. I personally filter out a large percentage of high-scoring mail at random in an attempt to balance my spam corpus. But ultimately we need more masscheck participants to get better results.
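
As a rough illustration of that kind of random filtering, here is a minimal Python sketch. It is not the actual script used for any corpus: the function name, the `(message_id, score)` input shape, the score threshold of 20.0, and the keep fraction of 0.2 are all assumptions chosen for the example.

```python
import random


def downsample_high_scoring(messages, threshold=20.0, keep_fraction=0.2, seed=None):
    """Randomly thin out high-scoring spam to rebalance a corpus.

    messages      -- iterable of (message_id, score) pairs (hypothetical shape)
    threshold     -- scores at or above this count as "high scoring" (assumed value)
    keep_fraction -- probability of keeping each high-scoring message (assumed value)
    seed          -- optional seed so a run can be reproduced
    """
    rng = random.Random(seed)
    kept = []
    for msg_id, score in messages:
        # Low-scoring mail is always kept; high-scoring mail is kept
        # only with probability keep_fraction.
        if score < threshold or rng.random() < keep_fraction:
            kept.append((msg_id, score))
    return kept


if __name__ == "__main__":
    corpus = [("a", 5.0), ("b", 35.0), ("c", 12.0), ("d", 50.0)]
    print(downsample_high_scoring(corpus, seed=1))
```

Sampling per message keeps the filter simple and streamable, at the cost of the kept count varying a little from run to run.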

Warren
