On Wed, 23 Jul 2003, Joe Julian wrote:

> I have a list of specific trusted addresses in my whitelist, but it
> still won't autolearn from them. Why not? Their scores are quite
> negative, way below -2, but it still won't autolearn from them. It looks
> like it's ignoring the whitelist when checking whether or not it should
> autolearn. What can I do to change that?
Um, you probably -don't- want to change that; there's a good reason for that logic.

Think about what you whitelist and why you whitelist those sites. It's usually because those sources send out 'ham' that "looks spammish". (If it didn't "look spammish" you wouldn't need to whitelist it.) If the messages contain lots of "spammish" content and you autolearn them as 'ham', then your bayes database will contain "spammish" tokens with 'ham' scores, and that defeats the whole purpose of the facility.

For example, suppose you're subscribed to a mailing list that discusses spam fighting techniques (hmm, do we know one of those ;). On that list it's not unusual for people to post example spam messages while discussing "why did this one get through?", so traditional scoring mechanisms would mark those messages as 'spam', thus necessitating a whitelisting of that list. However, if you autolearn those posts, you'll add lots of "spammish" tokens to your ham list.

When you get good 'ham' from those sources, just feed it to "sa-learn --ham" and you're done. If you always get 'ham' from those sources, you don't need the whitelist.

The flip side of this is to beware of feeding 'hammish' spam to "sa-learn --spam". I fouled up my bayes by feeding lots of 'Nigerian' spam into "sa-learn --spam". For the most part 'Nigerian' spam looks like business mail, so I ended up with lots of business-like tokens that had strong spam scores. I then started seeing all kinds of strictly 'ham' mail end up with 99% bayes scores. I had to dump the database and start from scratch.

So be sure that anything you "learn" is clearly representative of the type of message (spam or ham) that you want to recognize in the future. (And remember that bayes looks at individual words or small phrases.)
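To make the token-pollution argument concrete, here is a minimal toy sketch in Python. It is not SpamAssassin's actual Bayes implementation (which tokenizes and combines probabilities differently); the `ToyBayes` class, its method names, and the sample messages are all hypothetical and exist only for illustration. It compares the per-token spam probability of a "spammish" word before and after whitelisted ham that quotes spam has been learned as ham.

```python
from collections import Counter


class ToyBayes:
    """Toy per-token spam estimator (illustrative only, not SpamAssassin's algorithm)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def token_spam_prob(self, token):
        # Fraction of spam messages containing the token, relative to the
        # same fraction for ham. Unseen tokens are treated as neutral (0.5).
        s = self.spam_tokens[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham_tokens[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)


def train_baseline(db):
    # Hypothetical training corpus: obvious spam and obvious ham.
    for _ in range(3):
        db.learn("cheap viagra offer", is_spam=True)
        db.learn("meeting agenda attached", is_spam=False)


# Database trained only on representative messages.
clean = ToyBayes()
train_baseline(clean)

# Same corpus, plus whitelisted list posts that quote spam, learned as ham.
polluted = ToyBayes()
train_baseline(polluted)
for _ in range(3):
    polluted.learn("why did this cheap viagra offer get through", is_spam=False)

print("clean   :", clean.token_spam_prob("viagra"))
print("polluted:", polluted.token_spam_prob("viagra"))
```

In this toy model the spam probability of "viagra" drops once the quoted spam is learned as ham, which is the effect described above. The same mechanism runs in the other direction when business-like 'Nigerian' spam is learned as spam.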
--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk