On 2/20/2015 12:35 PM, Kevin Miller wrote:
> When a fresh spam flood comes in, sometimes 50 or more of my users will get hit
> with the same message - just a different user in the To: line.  When one trains
> the bayes database, is there a significant difference between training on all
> 50+ or just grabbing a few of the messages and training on them?  Will bayes be
> more convinced of the spaminess of a particular message if it sees dozens
> rather than a couple?

Yes, there will be a difference. Training on the exact same message more than once has no additional effect, but if your 50 copies are all slightly different, train on all of them.
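
To illustrate the idea, here is a toy sketch in Python. It is not SpamAssassin's code: the ToyBayes class, the whitespace tokenizer, and the use of a SHA-1 of the whole message to spot repeats are all assumptions made for the example, and the real database may identify previously learned messages differently. It only shows why 50 slightly different copies add up while a repeat of an identical copy does not.

```python
import hashlib
from collections import Counter


class ToyBayes:
    """Toy token-count store (hypothetical, for illustration only)."""

    def __init__(self):
        self.seen = set()             # digests of messages already learned
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.nspam = 0                # number of spam messages learned

    def learn_spam(self, message: str) -> bool:
        """Learn one message as spam; return False if it was already learned."""
        digest = hashlib.sha1(message.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False              # identical copy: nothing changes
        self.seen.add(digest)
        self.nspam += 1
        # Count each token once per message (assumed naive tokenizer).
        self.spam_tokens.update(set(message.lower().split()))
        return True


db = ToyBayes()
body = "cheap pills click here now"
# Fifty copies that differ only in the recipient line.
copies = [f"To: user{i}@example.com\n{body}" for i in range(50)]

learned = sum(db.learn_spam(m) for m in copies)    # first pass over the flood
relearned = sum(db.learn_spam(m) for m in copies)  # second pass, identical input

print(learned, relearned, db.spam_tokens["pills"])
```

In this toy, the first pass learns every copy and the second pass learns none, so the count behind a token like "pills" reflects the number of distinct messages rather than the number of training runs.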

In general, train as much as you can manage. Ideally, you would train bayes on every message that passes through your server. The more data bayes has, the better it works.

--
Bowie
