Hi All, Thanks for being so patient with me :)
Is it true, then, that if Bayesian filtering is active, each user has
their own customized database in their home directory (~/.spamassassin),
and that using sa-learn essentially modifies this particular database?

The problem here is that the documentation makes no unambiguous
statements about this. Saying that SA does Bayesian filtering and that
the user can train SA does nothing to reveal these details. I wish
someone would add lines like:

A. Bayesian filtering only uses a database located in the user's home
   directory
B. Administrators can implement site-wide Bayesian filtering and train
   SA using sa-learn. In this case the SA database is stored in
   directory /A/B/C

These are not unreasonable statements and are very obvious guesses as to
how SA might be working. If they are not correct, then the proper
statements and explanations really need to be added to the
documentation.

Regards,
PH P.

----- Original Message -----
From: "Martin Radford" <[EMAIL PROTECTED]>
To: "pjh" <[EMAIL PROTECTED]>
Cc: "Martin Radford" <[EMAIL PROTECTED]>; "pjh" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, December 31, 2003 6:02 AM
Subject: Re: [SAtalk] user vs local prefs

> At Wed Dec 31 00:17:54 2003, pjh wrote:
>
> > My main problem is that I'm struggling to build a conceptual model
> > in my head of how SA is working and where responsibility is
> > delegated between administrator settings and user settings.
>
> Some of the confusion might be caused by what you're interpreting as
> user settings and the way I (and I suspect others) interpret them.
>
> To me, a user setting is something the user puts in user_prefs.  For
> example, a user might decide that he wants to receive mails about
> mortgages and wants to reduce the scores for that set of rules -- if
> so, he can add entries to user_prefs to override the default scores
> for those rules.
>
> Bayesian learning is not what I'd call "user settings", simply because
> it's not something the user has to enable in a default installation.
>
> > I am seeing the filtering work - but I'm confused because I don't
> > know if it's largely the result of some clever default mode of
> > installation, or if it's because of the 200 or so ham and spam
> > messages I've entered, or both.
>
> OK, to clarify:
>
> Bayesian classification is enabled by default.
> Automatic learning is enabled by default.
> Automatic learning works on messages scoring < 0.1 or > 12.0 (in SA
> 2.61, other versions may have different thresholds).
>
> > The documentation is maddeningly vague on these issues.
>
> That's from the man page for Mail::SpamAssassin::Conf, under the
> heading "LEARNING OPTIONS".
>
> > I'm frustrated because I'm not getting unambiguous answers to my
> > questions :-)  Again, if I (as an end user) didn't use sa-learn at
> > all, would Bayesian filtering occur on my incoming email
> > (presumably because of a default or generic mode of operation - in
> > which case my original question of how can I differentiate between
> > user and generic effectiveness applies!)?
>
> Yes, it will work.  It will take much longer to kick in because it
> will only learn from spam that scores over 12.0 and ham that scores
> under 0.1.  It's not likely to work as well as if you manually train
> it, because it's only learning from a limited subset of the mail you
> receive.
>
> As to how you'd tell the difference between what you'd get via
> autolearning as opposed to by manual training, well, you'll need to
> run the same set of mail through two different setups and see how each
> message scored.
>
> > The reason why I dwell on this is because, as an administrator, I
> > can envision where some Bayesian filtering might occur by default,
> > but that the sa-learn tool might allow the user to fine-tune it for
> > themselves.
>
> I'd recommend that users do at least some manual learning.
> In addition, it's pretty much a requirement that they correct erroneous
> autolearning (where a spam gets auto-learned as a ham or vice-versa)
> -- this is called "mistake-based training".
>
> > Or, I could envision a scenario where it won't occur for the end
> > user unless they actively use sa-learn - in which case I assume that
> > the basic pattern matching searches on key phrases (e.g. Adult, Teen
> > etc.) is sufficient to provide some useful filtering.
>
> There are many environments where it's not possible to use Bayesian
> learning -- it's not available at my workplace, for example.  In many
> environments, I'd argue that users are not sufficiently sophisticated
> to use Bayes properly (i.e. by doing their own training and/or by
> correcting erroneous auto-learning), with the end result that the
> Bayes scores get completely screwed up (e.g. spam getting very low
> Bayes scores and ham getting high scores).  If that happens, you're
> worse off than if Bayes wasn't there at all.
>
> Martin
> --
> Martin Radford              |   "Only wimps use tape backup: _real_
> [EMAIL PROTECTED] | men just upload their important stuff  -o)
> Registered Linux user #9257 |  on ftp and let the rest of the world /\\
> - see http://counter.li.org |       mirror it ;)"  - Linus Torvalds _\_V
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: IBM Linux Tutorials.
> Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
> Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
> Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
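
For anyone trying to picture the autolearning rule Martin describes (learn as ham below 0.1, learn as spam above 12.0, in SA 2.61), here is a minimal Python sketch of just those two cut-offs. It is an illustration only and not SpamAssassin code: the function name autolearn_decision and the two constants are made up for this example, and the real autolearner may apply further conditions that are ignored here.

```python
# Illustration only: NOT SpamAssassin's implementation.
# The two thresholds are the ones quoted in this thread for SA 2.61;
# other versions may use different values.
HAM_AUTOLEARN_BELOW = 0.1    # hypothetical constant name
SPAM_AUTOLEARN_ABOVE = 12.0  # hypothetical constant name


def autolearn_decision(score: float) -> str:
    """Return 'ham', 'spam', or 'none' for a message's total score."""
    if score < HAM_AUTOLEARN_BELOW:
        return "ham"    # very low score: learned automatically as ham
    if score > SPAM_AUTOLEARN_ABOVE:
        return "spam"   # very high score: learned automatically as spam
    return "none"       # everything in between is not auto-learned


if __name__ == "__main__":
    for s in (-1.2, 0.1, 5.0, 12.0, 15.3):
        print(s, autolearn_decision(s))
```

Everything that lands in the "none" band is mail the Bayes database never sees unless someone runs sa-learn on it by hand, which is why autolearning alone trains on only a limited subset of the mail received.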