From: Arvid Ephraim Picciani <a...@exys.org>
Date: Tue, 31 Mar 2009 12:33:49 +0200

> What do you mean "its impossible to train bayes"?

i was assuming the random text at the end is what couses my bayes db to behave randomly.

Random text that occurs only in spam rapidly becomes a spam sign. Random spam text that also occurs in ham requires a period of adjustment for Bayes, but eventually Bayes figures it out.
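To illustrate the point about random text, here is a toy sketch of how a per-token spam probability can follow from training counts. It is not SpamAssassin's actual Bayes code, which is more elaborate; the function names, the whitespace tokenizer, and the sample messages are all made up for this example.

```python
from collections import Counter

def learn(counts, message):
    # Count each distinct token once per message (naive whitespace tokenizer,
    # an assumption for this sketch).
    counts.update(set(message.lower().split()))

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    # Fraction of spam and ham messages that contain the token.
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)

# Hypothetical training corpus.
spam = ["cheap pills dawn sort", "cheap pills zxqv", "pills dawn offer"]
ham = ["meeting notes sort", "lunch at dawn"]

spam_counts, ham_counts = Counter(), Counter()
for m in spam:
    learn(spam_counts, m)
for m in ham:
    learn(ham_counts, m)

# "pills" occurs only in spam, so it becomes a strong spam sign.
p_pills = token_spam_probability("pills", spam_counts, ham_counts, len(spam), len(ham))
# "dawn" occurs in both, so it stays close to neutral until more mail is learned.
p_dawn = token_spam_probability("dawn", spam_counts, ham_counts, len(spam), len(ham))
# A token never seen in training carries no weight.
p_new = token_spam_probability("qwxyz", spam_counts, ham_counts, len(spam), len(ham))

print(p_pills, p_dawn, p_new)
```

In this sketch a token seen only in spam goes straight to the top of the scale, while a token shared with ham hovers near the middle and only shifts as more messages of either kind are learned.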
> Bayes really can be trained to deal with this message.
> For example, I get BAYES_95:

well i get 00

An occasional spam getting a low Bayes score is ok, but lots of spam getting BAYES_00 is a problem. Train Bayes with more spam messages and correct any incorrectly learned messages.

> After I learn this message the probability increases to BAYES_99

yes, for that specific message. what exactly is the point of learning specific messages when the next one will be different anyway.

Perhaps you are missing the point of Bayes. I got BAYES_95 on the message before training on it. My SpamAssassin hadn't seen the message before, but it had trained on similar spams. Bayes breaks the message up into various tokens, and some of the tokens from this or any spam message will be repeated in other spam messages.

> % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
> X-Spam-Bayes: bayes=1.0000, N=50(47-2+29), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:00000000)

interestingly i dont have that header. i'll check docs.

The X-Spam-Bayes header was added with

  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

-jeff