Re: [SAtalk] Re: Why wil "sa-learn" not learn?

Simon Byrnand Mon, 14 Jul 2003 18:49:23 -0700

At 00:29 14/07/03 -0500, Fuzzy Fox wrote:

> Simon Byrnand <[EMAIL PROTECTED]> wrote:
> >
> > * Are you learning a proportionate amount of ham as well ? [...]
> > * Bayes doesn't "learn" particular messages, it learns the statistics
> >   of the words used in the messages.  [...]
>
> None of this addresses the original posted concern:  When you feed one
> or more messages to sa-learn, it should report:  "Learned from X messages",
> where X is some number greater than zero.

Well, all reference to this "original concern" had been deleted from the quoting in the message that I read and replied to. It's a very busy mailing list, and I don't read every message before replying to one.

> If sa-learn reports "Learned from 0 messages," it means that it didn't
> put anything into the Bayes database.  The main reason that I have seen
> for this, is that SA thinks that it has already seen the message before,
> meaning that the Message-ID in the mail is already stored in the bayes
> database.

That's the usual cause, but it's not the only one, as far as I know. Other criteria can also stop a message from being learnt. For example, I believe (and the developers may have to correct me here) that if the message has too little material to generate a worthwhile number of tokens, it will not be learnt.
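
To make the two skip conditions above concrete (a Message-ID that has already been seen, and too little material to tokenize), here is a toy sketch in Python. It is not SpamAssassin's code: `ToyBayesStore`, the `MIN_TOKENS` threshold and the tokenizer are all invented for illustration, and the real criteria may differ.

```python
import re
from email import message_from_string

MIN_TOKENS = 5  # assumed threshold, purely illustrative


class ToyBayesStore:
    """Toy learner that can decline to learn a message (hypothetical)."""

    def __init__(self):
        self.seen_ids = set()    # Message-IDs already learned
        self.spam_counts = {}    # token -> number of spam messages containing it

    def learn_spam(self, raw):
        """Return True if the message was learned, False if it was skipped."""
        msg = message_from_string(raw)
        msgid = (msg.get("Message-ID") or "").strip()

        # Skip condition 1: this Message-ID is already in the store.
        if msgid and msgid in self.seen_ids:
            return False

        # Only single-part text bodies are handled in this sketch.
        payload = msg.get_payload()
        body = payload if isinstance(payload, str) else ""
        tokens = set(re.findall(r"[a-z0-9]{3,}", body.lower()))

        # Skip condition 2: too few tokens to be worth learning.
        if len(tokens) < MIN_TOKENS:
            return False

        for token in tokens:
            self.spam_counts[token] = self.spam_counts.get(token, 0) + 1
        if msgid:
            self.seen_ids.add(msgid)
        return True


def learn_all(store, raw_messages):
    """Return how many messages were actually learned."""
    return sum(1 for raw in raw_messages if store.learn_spam(raw))
```

With a store like this, feeding in a message whose Message-ID is already recorded, or one with a near-empty body, yields a learned count of zero, which is the sort of result the "Learned from 0 messages" report describes.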

> However, the original poster already stated that he used "sa-learn --forget"
> to try to remove any traces of the message, and yet sa-learn STILL
> continues to say "Learned from 0 messages", meaning that it does not
> want to learn from the particular message.

So there must be something else stopping it. I thought I remembered someone (Theo?) asking him to try running it in debugging mode, which should give a reason why the message wasn't learnt, but I don't remember seeing a response to this.

> I had several spams not too long ago in which all of them used the same
> Message-ID.  SA refused to learn from any of them except the first one.
> What can be done to combat that?  Of course, duplicate Message-IDs are
> a violation of RFCs, but spammers don't care.  :)

I've wondered that before too, but I've been afraid to voice it on the list in case spammers pick up on it. But now that you've mentioned it... :)

What happens if all spammers start sharing a handful of Message-IDs? Will that make Bayes unable to learn from their messages? Good question indeed.
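
One conceivable mitigation, and this is speculation rather than anything SpamAssassin is known to do, would be to key the "already seen" check on a digest of the message content as well as the Message-ID, so that two different spams sharing an ID are treated as distinct. A rough Python sketch, where `dedup_key` is a hypothetical helper:

```python
import hashlib
from email import message_from_string


def dedup_key(raw):
    """Hypothetical 'seen' key: SHA-1 over the Message-ID plus the body."""
    msg = message_from_string(raw)
    msgid = (msg.get("Message-ID") or "").strip()

    # Only single-part text bodies are handled in this sketch.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""

    digest = hashlib.sha1()
    digest.update(msgid.encode("utf-8", "replace"))
    digest.update(b"\0")  # separator so the ID and body cannot run together
    digest.update(body.encode("utf-8", "replace"))
    return digest.hexdigest()
```

The trade-off is that near-identical copies of one spam with trivial variations would each count as new and be learned separately, so this only addresses the shared-ID problem.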

Regards,
Simon

