If I have followed the discussion correctly so far, the explanation for manual-learn not being distinguished from auto-learn is this: no matter which learning mode caused a token to appear in the database, as long as ongoing mail traffic keeps "hitting" that token, it will not expire anyway.
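To make that concrete, the expiry bookkeeping amounts to something like the sketch below (illustrative Python with made-up names, not the actual SpamAssassin code): every token carries a last-seen time that gets refreshed whenever scanned mail hits it, and expiry looks only at that time, never at how the token was learned.

    # Illustrative sketch only -- not SpamAssassin's code, just the idea that
    # expiry keys off a token's last-seen time, not its learn mode.
    import time

    EXPIRY_AGE = 30 * 24 * 3600   # e.g. drop tokens unreferenced for ~30 days
    tokens = {}                   # token -> {"spam": n, "ham": n, "atime": t}

    def learn(msg_tokens, is_spam, mode):
        # mode ("auto" or "manual") is not recorded anywhere expiry cares about
        now = time.time()
        for t in msg_tokens:
            rec = tokens.setdefault(t, {"spam": 0, "ham": 0, "atime": now})
            rec["spam" if is_spam else "ham"] += 1
            rec["atime"] = now

    def scan(msg_tokens):
        # every scanned message refreshes the atime of the tokens it contains
        now = time.time()
        for t in msg_tokens:
            if t in tokens:
                tokens[t]["atime"] = now

    def expire():
        # a token goes away only because nothing has referenced it lately
        cutoff = time.time() - EXPIRY_AGE
        for t in [t for t, rec in tokens.items() if rec["atime"] < cutoff]:
            del tokens[t]

A manually trained token therefore survives exactly as long as an auto-learned one would: as long as mail keeps hitting it.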
In other words, tokens don't expire because of where or how they came to be listed; they expire because no more incoming mail traffic references them. If you manually train a message that is the ONLY instance of that particular spam to slip through your other filter, and your Bayes never sees another message that matches the tokens it generated, then those tokens are irrelevant regardless of learn mode.

>>> Wes <[EMAIL PROTECTED]> 11/30/07 11:56 AM >>>
>
> The whole reason bayes works is the fact that there's a *LOT* of tokens
> that are repeated over and over and over again for any given kind of
> mail. So the set of tokens acted on by one message are 95% the same as
> the ones in another, provided the general type of email is the same (and
> by general type, I'm thinking all email fits into maybe 20 types, I'm
> talking really broad categories like "conversation" "newsletter" "spam"
> "nonspam ad", etc..)

Guess I need to read up on Bayes some more. I was thinking more along the lines of separate databases for auto and manual learning that are combined for a result, giving more weight to manual learning. Maybe that just isn't reasonable, though.

I can't see (at least here) that manual learning would get any kind of significant volume. Someone's only going to send in a message for manual learning if it is a leaked spam or a false positive, and then only if they bother to do it. I'd be surprised if the manual learning volume was 1 in 10,000 of the messages going through the auto-learning.

Wes
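For what it's worth, the "separate databases combined with more weight on manual learning" idea above could be sketched roughly like this (again purely illustrative Python with made-up names and an arbitrary weight, not anything SpamAssassin actually does):

    # Rough illustration of the two-database idea: keep auto-learned and
    # manually-learned counts separately and give the manual counts extra
    # weight when working out how spammy a token is.
    MANUAL_WEIGHT = 10   # arbitrary: one manual training counts as ten auto ones

    def token_spam_prob(token, auto_db, manual_db):
        # each db maps token -> (spam_count, ham_count)
        a_spam, a_ham = auto_db.get(token, (0, 0))
        m_spam, m_ham = manual_db.get(token, (0, 0))
        spam = a_spam + MANUAL_WEIGHT * m_spam
        ham = a_ham + MANUAL_WEIGHT * m_ham
        if spam + ham == 0:
            return 0.5   # never seen this token: no opinion
        return spam / (spam + ham)

The weight is the only interesting knob; everything else is just adding the two sets of counts together before the usual per-token probability calculation.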