From: Arvid Ephraim Picciani <a...@exys.org>
   Date: Tue, 31 Mar 2009 12:33:49 +0200
   
   > What do you mean "its impossible to train bayes"?
   
   i was assuming the random text at the end is what causes my bayes db to 
   behave randomly.
   
Random text that occurs only in spam rapidly becomes a spam sign.  Random
spam text that also occurs in ham requires a period of adjustment for
Bayes, but eventually Bayes figures it out.
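
As a rough illustration of that adjustment period, here is a minimal Python sketch. It is not SpamAssassin's code: the function name token_spamminess and the smoothing constants s and x are assumptions made up for this example. It only shows how a per-token estimate can move as more messages are learned.

```python
def token_spamminess(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Estimate how spammy one token is from learned message counts.

    spam_hits / ham_hits: learned spam / ham messages containing the token.
    n_spam / n_ham: total learned spam / ham messages.
    s, x: smoothing strength and neutral prior (illustrative values only).
    """
    spam_rate = spam_hits / n_spam if n_spam else 0.0
    ham_rate = ham_hits / n_ham if n_ham else 0.0
    if spam_rate + ham_rate == 0:
        return x  # token never seen: stay neutral
    raw = spam_rate / (spam_rate + ham_rate)
    n = spam_hits + ham_hits
    # Pull the raw estimate toward the neutral prior when evidence is thin.
    return (s * x + n * raw) / (s + n)


# A random word seen only in spam becomes a stronger spam sign
# with each spam message learned.
only_spam = [token_spamminess(k, 0, 100, 100) for k in (1, 3, 10)]

# The same word then starts turning up in learned ham as well:
# its estimate drifts back toward neutral.
adjusting = [token_spamminess(10, h, 100, 100) for h in (0, 2, 10)]
```

With these made-up counts, only_spam rises with every learned spam, while adjusting falls back to 0.5 once the word has been seen in ham as often as in spam.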

   > Bayes really can be trained to deal with this message.
   > For example, I get BAYES_95:
   
   well i get 00
   
An occasional spam getting a low bayes score is ok, but lots
of spam getting BAYES_00 is a problem.

Train Bayes with more spam messages and correct any incorrectly learned
messages.

   > After I learn this message the probability increases to BAYES_99
   
   yes, for that specific message.  what exactly is the point of learning 
   specific messages when the next one will be different anyway.

Perhaps you are missing the point of Bayes.  I got BAYES_95 on the
message before training on it.  My SpamAssassin hadn't seen the
message before, but it had trained on similar spams.
Bayes breaks the message up into tokens, and some of the tokens from
this or any spam message will be repeated in other spam messages.
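
To make the token idea concrete, here is a toy classifier in Python. It is a sketch under stated assumptions, not how SpamAssassin is implemented: the TinyBayes class, the crude word tokenizer, the add-one smoothing and the training messages are all invented for illustration, and SpamAssassin's real tokenizer and probability combining are more elaborate.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer: lowercase words, each counted once per message.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class TinyBayes:
    """Toy token-based classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def score(self, text):
        # Sum per-token log-odds; tokens never learned carry no evidence.
        log_odds = 0.0
        for tok in tokenize(text):
            if tok not in self.spam_tokens and tok not in self.ham_tokens:
                continue
            p_spam = (self.spam_tokens[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills visit our blog entry today qzxv", is_spam=True)
bayes.learn("cheap watches visit this blog entry now wkjp", is_spam=True)
bayes.learn("meeting notes attached, see you today", is_spam=False)
bayes.learn("can you review the patch today", is_spam=False)

# Never learned, and ends in different random text, but it shares
# tokens with the spam that was learned.
unseen_spam_score = bayes.score("cheap offer visit my blog entry mbnq")
unseen_ham_score = bayes.score("see you at the meeting today")
```

The unseen spam scores high because "cheap", "visit", "blog" and "entry" were learned from other spam, while its random trailing word was never learned and adds nothing either way.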

   >   % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
   >   X-Spam-Bayes: bayes=1.0000, N=50(47-2+29), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:00000000)
   
   interestingly i dont have that header.
   i'll check docs.

The X-Spam-Bayes header was added with
  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

-jeff
