On Wed, 6 Aug 2003, Daniel Carrera yowled:
> On Wed, Aug 06, 2003 at 09:10:36PM +0300, Harri Pesonen wrote:
>> 
>>    This has probably been asked a zillion times, but why so low scores?
> 
> I think that it's just to pick safe defaults.  Bayes is only reliable 
> after it's been well-trained.

The GA probably chose those scores because stuff that hit high BAYES
scores also tended to hit so many other rules that it wasn't necessary
to give the BAYES rules big scores to push those messages above 5.0.
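To make the arithmetic concrete, here is a minimal Python sketch of additive scoring. It assumes each rule a message hits adds its score to a running total, and that the message counts as spam once that total reaches the threshold. BAYES_90's 4.03 comes from the statistics table; HYPOTHETICAL_RULE_A and HYPOTHETICAL_RULE_B and their scores are made up for illustration.

```python
# Minimal sketch of additive rule scoring (assumption: a message is spam once
# the sum of its rule scores reaches the threshold).
SPAM_THRESHOLD = 5.0

# BAYES_90 is taken from the statistics table; the other two rules and their
# scores are hypothetical placeholders.
SCORES = {
    "BAYES_90": 4.03,
    "HYPOTHETICAL_RULE_A": 1.2,
    "HYPOTHETICAL_RULE_B": 0.9,
}


def total_score(hits, scores):
    """Sum the scores of every rule the message hit; unknown rules add 0."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores, threshold=SPAM_THRESHOLD):
    return total_score(hits, scores) >= threshold


# BAYES_90 alone stays under 5.0, but a typical spam also hits other rules.
print(is_spam(["BAYES_90"], SCORES))  # False
print(is_spam(["BAYES_90", "HYPOTHETICAL_RULE_A", "HYPOTHETICAL_RULE_B"], SCORES))  # True
```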

Bear in mind that the GA does *not* aim for `maximise spam score and
minimise nonspam score'. It aims for `maximise %age of spam with score
above 5.0' and `maximise %age of nonspam with score below 5.0', giving
strong preference to the latter.
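A sketch of that kind of objective in Python, under stated assumptions: each candidate score set is judged only by which side of 5.0 each message lands on, and HAM_WEIGHT is a hypothetical stand-in for the GA's strong preference for keeping nonspam under the line, not the weight the real GA uses.

```python
THRESHOLD = 5.0
HAM_WEIGHT = 10.0  # hypothetical: how much more a misfiled ham hurts than a missed spam


def fitness(spam_totals, ham_totals, threshold=THRESHOLD, ham_weight=HAM_WEIGHT):
    """Reward landing on the correct side of the threshold; magnitude is ignored."""
    spam_caught = sum(1 for s in spam_totals if s >= threshold) / len(spam_totals)
    ham_passed = sum(1 for s in ham_totals if s < threshold) / len(ham_totals)
    return spam_caught + ham_weight * ham_passed


# Candidate A: misses one spam, keeps every ham under 5.0.
a = fitness([6.0, 7.0, 4.5, 8.0], [0.5, 1.0, 2.0, 4.5])
# Candidate B: every total 1.0 higher, so it catches all spam but pushes one ham over.
b = fitness([7.0, 8.0, 5.5, 9.0], [1.5, 2.0, 3.0, 5.5])
print(a, b)  # 10.75 8.5
```

Candidate A wins even though B scores spam higher and catches more of it.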

Looking at the soratios:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 12.424  45.4324   0.0287    0.999   1.00    4.03  BAYES_90
 28.841   0.0107  39.6665    0.000   0.99   -5.40  BAYES_01
 11.173   0.0015  15.3686    0.000   0.95   -5.30  BAYES_10
  4.145  15.1681   0.0052    1.000   0.95    5.30  BAYES_80
  9.062   0.0015  12.4644    0.000   0.95   -5.30  BAYES_00
  2.299   8.4186   0.0006    1.000   0.94    5.20  BAYES_99
  5.878   0.0077   8.0825    0.001   0.94   -4.70  BAYES_20
  2.991  10.8226   0.0500    0.995   0.93    2.59  BAYES_70
  4.375   0.1164   5.9740    0.019   0.88   -1.07  BAYES_30
  2.405   8.0740   0.2766    0.967   0.84    2.00  BAYES_60
  0.000   0.0000   0.0000    0.500   0.00    0.00  BAYES_56
  0.000   0.0000   0.0000    0.500   0.00    0.00  BAYES_50
  0.000   0.0000   0.0000    0.500   0.00    0.00  BAYES_44
  0.000   0.0000   0.0000    0.500   0.00    0.00  BAYES_40
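The S/O column in this excerpt is consistent with SPAM% / (SPAM% + HAM%), with 0.5 for rules that never hit; that formula is inferred from the numbers shown here rather than taken from the SpamAssassin source. A minimal Python check against a few rows of the table above:

```python
def so_ratio(spam_pct, ham_pct):
    """Spam/overall ratio from per-corpus hit percentages.

    Uses percentages rather than raw counts, so differing spam and ham
    corpus sizes do not skew it. A rule that never fired gets a neutral 0.5.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


# (SPAM%, HAM%, S/O as printed) for a few rows of the table.
ROWS = {
    "BAYES_90": (45.4324, 0.0287, 0.999),
    "BAYES_30": (0.1164, 5.9740, 0.019),
    "BAYES_60": (8.0740, 0.2766, 0.967),
    "BAYES_50": (0.0, 0.0, 0.500),
}

for name, (spam, ham, printed) in ROWS.items():
    print(name, round(so_ratio(spam, ham), 3), printed)
```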

Note that BAYES_90 and BAYES_99 actually hit some nonspam (0.0287% and
0.0006% respectively); BAYES_90 even hit more nonspam than BAYES_80 did
(although BAYES_80 catches much less spam than BAYES_90 does). Therefore,
the GA was driven to push the high-confidence Bayes scores down because
they were occasionally wrong for legitimate email, and giving BAYES_90 a
ludicrously high score was pushing that legit email into the spam range.
The GA works *hard* to prevent that.
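To see why a handful of misfires matters, here is a small Python sketch with hypothetical numbers: each entry in HAM_OTHER_TOTALS is the score a legitimate message collected from rules other than BAYES_90, and the function counts how many of those messages reach 5.0 once BAYES_90's score is added. The totals are invented for illustration; only the 4.03 comes from the table.

```python
THRESHOLD = 5.0

# Hypothetical: scores from all *other* rules for legit mail that hit BAYES_90.
HAM_OTHER_TOTALS = [0.0, 0.8, 1.5, -0.5]


def false_positives(bayes_score, other_totals, threshold=THRESHOLD):
    """Count legit messages pushed to or over the threshold by the Bayes score."""
    return sum(1 for other in other_totals if other + bayes_score >= threshold)


for candidate in (3.0, 4.03, 5.0):
    print(candidate, false_positives(candidate, HAM_OTHER_TOTALS))
# 3.0 0
# 4.03 1
# 5.0 3
```

Raising the BAYES scores locally moves you along this curve: more spam caught, but more legit mail at risk, unless your own Bayes database misfires less often than the one behind these statistics.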

>>    I have noticed that SA has missed a couple of mails, score about 4.8,
>>    even though Bayes gave them 90% or 99% probability.
> 
> I noticed that too.  After I found that my SA was well-trained enough to 
> have a very high accuracy I raised the values for BAYES.

Look at rules/STATISTICS-set2.txt (set2 being for the bayes and no-net
run). Bayes isn't as brilliant as you think; it does occasionally make
a mistake, and the GA pushed its score down accordingly.
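If you want to pull those numbers out yourself, a minimal Python sketch follows. It assumes the file's data lines use the same seven whitespace-separated columns as the excerpt above (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME) and simply skips anything that doesn't parse; RuleStat, parse_stats and rules_hitting_ham are names made up for this example.

```python
from typing import Iterable, List, NamedTuple


class RuleStat(NamedTuple):
    name: str
    spam_pct: float
    ham_pct: float
    score: float


def parse_stats(lines: Iterable[str]) -> List[RuleStat]:
    """Parse STATISTICS-style rows; skip headers and anything malformed."""
    stats = []
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue
        try:
            _overall, spam, ham, _so, _rank, score = map(float, fields[:6])
        except ValueError:
            continue  # e.g. the "OVERALL% SPAM% ..." header line
        stats.append(RuleStat(fields[6], spam, ham, score))
    return stats


def rules_hitting_ham(stats: Iterable[RuleStat], min_spam_pct: float = 1.0) -> List[str]:
    """Rules that catch a useful share of spam yet also hit some nonspam."""
    return [s.name for s in stats if s.spam_pct >= min_spam_pct and s.ham_pct > 0]


SAMPLE = """\
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 12.424  45.4324   0.0287    0.999   1.00    4.03  BAYES_90
 28.841   0.0107  39.6665    0.000   0.99   -5.40  BAYES_01
  2.299   8.4186   0.0006    1.000   0.94    5.20  BAYES_99
  0.000   0.0000   0.0000    0.500   0.00    0.00  BAYES_56
""".splitlines()

print(rules_hitting_ham(parse_stats(SAMPLE)))  # ['BAYES_90', 'BAYES_99']
```

Against the real file you would pass it an open file object instead of SAMPLE. Run on just the table quoted earlier, it would list every rule from BAYES_60 up to BAYES_99.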

-- 
`That sound you hear is configure wailing, "MY PRECIOUSSSSSSSS!" as it
 overwrites Multilib with Primary.' --- Phil Edwards


_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
