On Tue, Feb 2, 2010 at 18:21, Warren Togami <wtog...@redhat.com> wrote:
> On 02/02/2010 12:07 PM, Adam Katz wrote:
>>
>> That is quite different from our masscheck stats.  Today's results at
>> http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:
>>
>>    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>   9.8564   0.0042   1.000    0.94    0.01  T_JM_SOUGHT_3
>>   8.1587   0.0068   0.999    0.93    0.01  T_JM_SOUGHT_2
>>  11.6464   0.0289   0.998    0.89    0.01  T_JM_SOUGHT_1
>>        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_1
>>        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_2
>>        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_3
>>
>
> FWIW the nightly masscheck is often very unbalanced especially on the spam
> side.  Sometimes we have only 50k spam, sometimes over 500k spam. Some spam
> corpora contain a disproportionate amount of high scoring spam trap mail.  I
> personally randomly filter out a large percentage of high scoring mail in an
> attempt to balance my spam corpus.  But ultimately we need more masscheck
> participants to have better results.

The corpus quality for that masscheck doesn't look too bad, though:

http://ruleqa.spamassassin.org/20100201-r905213-n/T_JM_SOUGHT_1/detail?s_corpus=1#corpus
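
For anyone reading the quoted table, here is a rough sketch of how the S/O column relates to the SPAM% and HAM% columns. It assumes S/O is computed as spam% / (spam% + ham%), and that a rule with no hits at all is reported as 0.500. Both assumptions are consistent with the rows quoted above. The function name so_ratio is made up for this example and is not part of the masscheck tooling.

```python
def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Spam-over-overall ratio from hit percentages (assumed formula)."""
    total = spam_pct + ham_pct
    if total == 0:
        # Assumption: a rule that hit nothing is treated as neutral.
        return 0.5
    return spam_pct / total


# (name, SPAM%, HAM%) copied from the quoted ruleqa table.
rows = [
    ("T_JM_SOUGHT_3", 9.8564, 0.0042),
    ("T_JM_SOUGHT_2", 8.1587, 0.0068),
    ("T_JM_SOUGHT_1", 11.6464, 0.0289),
    ("JM_SOUGHT_FRAUD_1", 0.0, 0.0),
]

for name, spam_pct, ham_pct in rows:
    print(f"{name:<20} {so_ratio(spam_pct, ham_pct):.3f}")
```

Since the inputs are percentages of each corpus rather than raw hit counts, the ratio itself should not shift just because one night has 50k spam and another 500k. What changes is which spam is in the mix.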

-- 
--j.
