Hi all,
Bayes seems to be missing quite a lot of spam. I'm getting results like
these quite often:


<snip>

Email:    63252  Autolearn: 26740  AvgScore:  14.53  AvgScanTime:  1.69 sec
Spam:     51232  Autolearn: 23252  AvgScore:  21.08  AvgScanTime:  1.68 sec
Ham:      12020  Autolearn:  3488  AvgScore: -13.40  AvgScanTime:  1.72 sec


TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1    HTML_MESSAGE                    36720    70.25   71.67   64.18
   2    BAYES_99                        35269    56.74   68.84    5.17
   3    URIBL_SBL                       32502    54.28   63.44   15.22
   4    URIBL_JP_SURBL                  31805    50.70   62.08    2.20
   5    URIBL_SC_SURBL                  27524    43.83   53.72    1.65
   6    URIBL_OB_SURBL                  22908    36.27   44.71    0.29
   7    RCVD_IN_BL_SPAMCOP_NET          22082    35.55   43.10    3.35
   8    URIBL_AB_SURBL                  21789    34.63   42.53    0.96
   9    AWL                             19280    43.57   37.63   68.89
  10    RCVD_IN_XBL                     17122    27.09   33.42    0.12
  11    FORGED_RCVD_HELO                15386    28.34   30.03   21.12
  12    RCVD_IN_SORBS_DUL               13501    21.49   26.35    0.74
  13    RCVD_IN_NJABL_DUL               10934    17.37   21.34    0.43
  14    BODY_GAPPY_TEXT                 10888    22.04   21.25   25.40
  15    URIBL_WS_SURBL                  10615    16.80   20.72    0.08
  16    NO_REAL_NAME                     8883    22.63   17.34   45.18
  17    MIME_HTML_ONLY                   8226    16.09   16.06   16.21
  18    MSGID_FROM_MTA_ID                7667    13.04   14.97    4.83
  19    BAYES_00                         7445    23.53   14.53   61.87
  20    SUBJ_SPAMWORD                    7012    11.56   13.69    2.49
----------------------------------------------------------------------



To me, it looks like BAYES_00 is hitting far too much spam.
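The COUNT column in that table appears to be the spam hits (for example, 71.67% of 51232 spam is about 36720 for HTML_MESSAGE), so the %OFSPAM and %OFHAM columns can be turned back into message counts. This is a minimal sketch of that arithmetic; `pct_to_count` is a hypothetical helper, not part of SpamAssassin.

```shell
# Hypothetical helper: convert a rule's "% of class" figure back into a
# message count. Adding 0.5 before printf's %d truncation rounds to nearest.
# Usage: pct_to_count TOTAL_MESSAGES_IN_CLASS PERCENT
pct_to_count() {
  awk -v total="$1" -v pct="$2" 'BEGIN { printf "%d\n", total * pct / 100 + 0.5 }'
}

# BAYES_00 row from the report: 14.53% of 51232 spam, 61.87% of 12020 ham
echo "BAYES_00 spam hits: $(pct_to_count 51232 14.53)"
echo "BAYES_00 ham hits:  $(pct_to_count 12020 61.87)"
```

For the figures quoted, this gives roughly 7444 spam and 7437 ham messages, so in this sample BAYES_00 fired about as often on spam as on ham.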

<snip>

~ $ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    2110713          0  non-token data: nspam
0.000          0     156758          0  non-token data: nham
0.000          0    1608693          0  non-token data: ntokens
0.000          0 1153323145          0  non-token data: oldest atime
0.000          0 1153446556          0  non-token data: newest atime
0.000          0 1153446557          0  non-token data: last journal sync atime
0.000          0 1153367234          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0    1204872          0  non-token data: last expire reduction count
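The atime lines in that dump are Unix epoch seconds, which are hard to read at a glance. This is a minimal sketch for decoding them, assuming GNU `date -d @N` with a fallback to BSD `date -r N`; `atime_span_hours` is a hypothetical helper.

```shell
# Hypothetical helper: whole hours between two epoch timestamps.
# Usage: atime_span_hours OLDEST NEWEST
atime_span_hours() {
  echo $(( ($2 - $1) / 3600 ))
}

oldest=1153323145   # "oldest atime" from the dump
newest=1153446556   # "newest atime" from the dump

# Print both as UTC dates (GNU syntax first, BSD syntax as fallback)
date -u -d "@$oldest" 2>/dev/null || date -u -r "$oldest"
date -u -d "@$newest" 2>/dev/null || date -u -r "$newest"
echo "token atime span: $(atime_span_hours "$oldest" "$newest") hours"
```

Here the oldest and newest token access times are only about 34 hours apart.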


I have fed a large amount of mail into Bayes, and I'm quite certain
that it was fed correctly. All of the misses I have checked have hit
BAYES_00.
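The nspam and nham lines in the `sa-learn --dump magic` output are the number of messages learned as spam and as ham. This is a minimal sketch that pulls them out and prints the spam:ham ratio. `bayes_training_summary` is a hypothetical helper, and the field positions assume the column layout shown in the dump quoted here.

```shell
# Hypothetical helper: summarise training counts from
# `sa-learn --dump magic` output read on stdin.
# On these magic lines the value is the third whitespace-separated
# field and the label starts at field 5 ("non-token data: nspam").
bayes_training_summary() {
  awk '
    $5 == "non-token" && $7 == "nspam" { nspam = $3 }
    $5 == "non-token" && $7 == "nham"  { nham  = $3 }
    END {
      ratio = (nham > 0) ? nspam / nham : 0   # avoid dividing by zero
      printf "nspam=%d nham=%d ratio=%.1f\n", nspam, nham, ratio
    }
  '
}

# Usage: sa-learn --dump magic | bayes_training_summary
```

For the dump above this reports about 13.5 learned spam messages per learned ham message.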

Any ideas why this is happening? I have toyed with the idea of lowering
the BAYES_00 score. Would anyone care to explain whether this would be
a bad idea, and why?



Methinks you don't have enough mail trained in Bayes. Take a look at my numbers for hit count, then see how many spam and ham tokens I have in my Bayes database.

If more training doesn't correct the scoring, you could lower the score for BAYES_00, but mine is untouched.
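If you do go that route, the override is a one-line `score` directive in the site config. The site config is read after the stock rule files, so a score set there replaces the default. This is a minimal sketch. The `/etc/mail/spamassassin/local.cf` path and the -1.0 value are assumptions for illustration, not recommendations from this thread, and `set_rule_score` is a hypothetical helper.

```shell
# Hypothetical helper: append a score override for one rule to a
# SpamAssassin config file.
# Usage: set_rule_score CONFIG_FILE RULE_NAME SCORE
set_rule_score() {
  cf="$1"; rule="$2"; value="$3"
  printf 'score %s %s\n' "$rule" "$value" >> "$cf"
}

# Example (assumed path and illustrative value; adjust for your install):
#   set_rule_score /etc/mail/spamassassin/local.cf BAYES_00 -1.0
#   spamassassin --lint    # check the config parses
#   then restart spamd so it picks up the change (method varies by system)
```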


Regards,
            Leigh

Leigh Sharpe
Network Systems Engineer
Pacific Wireless
Ph +61 3 9584 8966
Mob 0408 009 502
email [EMAIL PROTECTED]
web www.pacificwireless.com.au





-Gary
