That all makes sense. Thanks...

...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4500
Registered Linux User No: 307357
> -----Original Message-----
> From: Dave Warren [mailto:da...@hireahit.com]
> Sent: Friday, February 20, 2015 11:30 AM
> To: users@spamassassin.apache.org
> Subject: Re: Quick question about training...
>
> On 2015-02-20 09:44, Bowie Bailey wrote:
> > On 2/20/2015 12:35 PM, Kevin Miller wrote:
> >> When a fresh spam flood comes in, sometimes 50 or more of my users
> >> will get hit with the same message - just a different user in the To:
> >> line. When one trains the bayes database, is there a significant
> >> difference between training on all 50+ or just grabbing a few of the
> >> messages and training on them? Will bayes be more convinced of the
> >> spaminess of a particular message if it sees dozens rather than a
> >> couple?
> >
> > Yes, there will be a difference. Training the exact same message
> > multiple times will not do anything, but if you have 50 copies of the
> > message that are all slightly different, train them all.
> >
> > In general, train as much as you can manage. Ideally, you would train
> > bayes on every message that passes through your server. The more data
> > bayes has, the better it works.
>
> And I'd suggest the same for non-spam, train duplicative ham even if it
> happens to be similarly addressed to different users. More data is
> (nearly) always better for bayesian learning systems.
>
> --
> Dave Warren
> http://www.hireahit.com/
> http://ca.linkedin.com/in/davejwarren
>
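To illustrate the point about exact duplicates versus slightly different copies, here is a toy sketch in Python. It is not SpamAssassin's implementation: the class name `ToyBayesTrainer`, the SHA-1 fingerprint of the full message text, and the simple tokenizer are all hypothetical, chosen only for illustration. It shows the general idea that a learner which remembers the messages it has already learned gains nothing from an identical copy, while each copy that differs (here, only in the To: line) counts as another training message.

```python
import hashlib
import re
from collections import Counter


class ToyBayesTrainer:
    """Hypothetical toy learner: counts tokens per class, skips already-seen messages."""

    def __init__(self):
        self.seen = set()  # fingerprints of messages already learned
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    @staticmethod
    def fingerprint(message: str) -> str:
        # Assumption: message identity is a hash of the full text.
        # SpamAssassin derives its own message ID in a different way.
        return hashlib.sha1(message.encode("utf-8")).hexdigest()

    @staticmethod
    def tokenize(message: str) -> set:
        # Very crude tokenizer; each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$']+", message.lower()))

    def learn(self, message: str, label: str) -> bool:
        """Learn a message as 'spam' or 'ham'; return False if it was already seen."""
        fp = self.fingerprint(message)
        if fp in self.seen:
            return False  # exact duplicate: the counts do not change
        # Toy simplification: no handling for relearning a message as the other class.
        self.seen.add(fp)
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokenize(message))
        return True


trainer = ToyBayesTrainer()
template = "Subject: Cheap meds\nTo: {user}\n\nBuy cheap meds now"

# The exact same message three times: only the first one is learned.
results_same = [trainer.learn(template.format(user="alice@example.com"), "spam")
                for _ in range(3)]

# Copies that differ only in the To: line: each one is learned.
results_diff = [trainer.learn(template.format(user=u), "spam")
                for u in ("bob@example.com", "carol@example.com")]

print(results_same, results_diff, trainer.message_counts["spam"])
```

Running it prints `[True, False, False] [True, True] 3`: three distinct spam messages were learned, so a token common to all of them such as `meds` now has a spam count of 3, which is the sense in which training every slightly different copy gives bayes more evidence.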