That all makes sense.  Thanks...

...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4500
Registered Linux User No: 307357 


> -----Original Message-----
> From: Dave Warren [mailto:da...@hireahit.com]
> Sent: Friday, February 20, 2015 11:30 AM
> To: users@spamassassin.apache.org
> Subject: Re: Quick question about training...
> 
> On 2015-02-20 09:44, Bowie Bailey wrote:
> > On 2/20/2015 12:35 PM, Kevin Miller wrote:
> >> When a fresh spam flood comes in, sometimes 50 or more of my users
> >> will get hit with the same message - just a different user in the To:
> >> line.  When one trains the bayes database, is there a significant
> >> difference between training on all 50+ or just grabbing a few of the
> >> messages and training on them?  Will bayes be more convinced of the
> >> spaminess of a particular message if it sees dozens rather than a
> >> couple?
> >
> > > Yes, there will be a difference.  Training on the exact same message
> > > more than once has no effect, but if you have 50 copies of the
> > > message that are all slightly different, train on them all.
> >
> > In general, train as much as you can manage.  Ideally, you would train
> > bayes on every message that passes through your server.  The more data
> > bayes has, the better it works.
> 
> And I'd suggest the same for non-spam: train on duplicate ham even if
> the copies differ only in which user they are addressed to. More data
> is (nearly) always better for Bayesian learning systems.
> 
> --
> Dave Warren
> http://www.hireahit.com/
> http://ca.linkedin.com/in/davejwarren
> 
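
To make the point in the quoted replies concrete, here is a minimal sketch in Python. It is a toy model, not SpamAssassin's actual implementation: the ToyBayes class, the make_copy helper, and the idea of recognising a repeat by hashing the whole message are all assumptions made for illustration. It shows that re-learning an identical message changes nothing, while 50 copies that differ only in the To: line each add to the token counts.

```python
import hashlib
from collections import Counter


class ToyBayes:
    """Toy token counter for illustration; NOT SpamAssassin's real code."""

    def __init__(self):
        self.seen = set()             # digests of messages already learned
        self.spam_tokens = Counter()  # token -> spam messages containing it
        self.nspam = 0                # distinct spam messages learned

    def learn_spam(self, message: str) -> bool:
        """Learn one message as spam; return False if it was already learned."""
        digest = hashlib.sha1(message.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False  # exact repeat: nothing new to learn
        self.seen.add(digest)
        self.nspam += 1
        # Count each token once per message, however often it appears.
        self.spam_tokens.update(set(message.lower().split()))
        return True


def make_copy(user: str) -> str:
    """Hypothetical flood message that differs only in the To: line."""
    return f"To: {user}@example.com\nSubject: Cheap pills\n\nBuy cheap pills now"


bayes = ToyBayes()

# Training the exact same message three times: only the first one counts.
first = make_copy("user0")
results_same = [bayes.learn_spam(first) for _ in range(3)]

# Training 49 more copies, each addressed to a different user: all count.
results_varied = [bayes.learn_spam(make_copy(f"user{i}")) for i in range(1, 50)]

print(results_same)                 # [True, False, False]
print(bayes.nspam)                  # 50
print(bayes.spam_tokens["pills"])   # 50
```

In this toy model the shared tokens end up with a count of 50 while each recipient address is seen only once, which is roughly why training on the whole flood gives the classifier more evidence than training on two or three samples. On a real server the training would be done with sa-learn (for example, sa-learn --spam pointed at a folder of collected messages) rather than with anything like the code above.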
