At Wed Dec 31 00:17:54 2003, pjh wrote:

> My main problem is that I'm struggling to build a conceptual model
> in my head of how SA is working and where responsibility is
> delegated between administrator settings and user settings.

Some of the confusion may come from a difference between what you mean
by "user settings" and what I (and, I suspect, others) mean by it.

To me, a user setting is something the user puts in user_prefs.  For
example, a user might decide that he wants to receive mails about
mortgages and wants to reduce the scores for that set of rules -- if
so, he can add entries to user_prefs to override the default scores
for those rules.

Bayesian learning is not what I'd call "user settings", simply because
it's not something the user has to enable in a default installation.

> I am seeing the filtering work - but I'm confused because I don't
> know if it's largely the result of some clever default mode of
> installation, or if it's because of the 200 or so ham and spam
> messages I've entered, or both.

OK, to clarify:

Bayesian classification is enabled by default.
Automatic learning is enabled by default.
Automatic learning works on messages scoring < 0.1 (learned as ham)
or > 12.0 (learned as spam).  These are the thresholds in SA 2.61;
other versions may differ.
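
To make the thresholds concrete, here is a minimal Python sketch of the
decision as described above.  The function and constant names are made
up for illustration and are not SpamAssassin's own; the real
autolearner may apply further checks that this sketch ignores.

```python
from typing import Optional

# Thresholds quoted in this message for SA 2.61; other versions may differ.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score: float) -> Optional[str]:
    """Return "ham", "spam", or None (not learned) for a message score."""
    if score < HAM_THRESHOLD:
        return "ham"
    if score > SPAM_THRESHOLD:
        return "spam"
    # Everything in between is classified but never auto-learned.
    return None


if __name__ == "__main__":
    for s in (-1.5, 0.1, 5.0, 12.0, 17.3):
        print(s, autolearn_decision(s))
```

Note how wide the middle band is: any message scoring between the two
thresholds teaches the Bayes database nothing unless someone trains on
it by hand.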

> The documentation is maddeningly vague on these issues.

The three points above come from the man page for
Mail::SpamAssassin::Conf, under the heading "LEARNING OPTIONS".

> I'm frustrated because I'm not getting unambiguous answers to my
> questions :-) Again, if I (as an end user) didn't use sa-learn at
> all, would Bayesian filtering occur on my incoming email
> (presumably because of a default or generic mode of operation - in
> which case my original question of how can I differentiate between
> user and generic effectiveness applies!)?

Yes, it will work.  It will take much longer to kick in because it
will only learn from spam that scores over 12.0 and ham that scores
under 0.1.  It's not likely to work as well as if you manually train
it, because it's only learning from a limited subset of the mail you
receive. 

As to how you'd tell the difference between what you'd get via
autolearning as opposed to by manual training, well, you'll need to
run the same set of mail through two different setups (one relying on
autolearning alone, one manually trained) and compare how each message
scores.
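
Once you have the scores from both setups, the comparison itself is
simple.  Here is a rough Python sketch, assuming you have already
collected the scores into two dictionaries keyed by some message
identifier.  The function name, the identifiers and the 5.0 spam
threshold are all assumptions for illustration; use whatever threshold
your installation is configured with.

```python
from typing import Dict, List, Tuple


def compare_setups(
    auto: Dict[str, float],
    manual: Dict[str, float],
    threshold: float = 5.0,  # assumed spam threshold for this sketch
) -> List[Tuple[str, float, float, bool]]:
    """Compare scores for the messages both setups have seen.

    Returns (message id, autolearn-only score, manually-trained score,
    verdict changed?) for each message, sorted by message id.
    """
    rows = []
    for msg_id in sorted(auto.keys() & manual.keys()):
        a, m = auto[msg_id], manual[msg_id]
        # The verdict changes when exactly one setup calls it spam.
        verdict_changed = (a >= threshold) != (m >= threshold)
        rows.append((msg_id, a, m, verdict_changed))
    return rows


if __name__ == "__main__":
    # Invented scores, purely to show the shape of the output.
    auto_scores = {"msg1": 3.2, "msg2": 7.9}
    manual_scores = {"msg1": 6.1, "msg2": 8.4, "msg3": 1.0}
    for row in compare_setups(auto_scores, manual_scores):
        print(row)
```

The rows where the verdict changed are the interesting ones: those are
the messages where the training method made a practical difference.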

> The reason why I dwell on this is because, as an administrator, I
> can envision where some Bayesian filtering might occur by default,
> but that the sa-learn tool might allow the user to fine-tune it for
> themselves.

I'd recommend that users do at least some manual learning.  In
addition, it's pretty much a requirement that they correct erroneous
autolearning (where a spam gets auto-learned as a ham or vice-versa)
-- this is called "mistake-based training".
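
As a sketch of what correcting erroneous autolearning amounts to, the
Python below works out which sa-learn commands would be needed for a
batch of messages.  It only builds the command lines and does not run
them.  The Learned record and the file paths are hypothetical; the
--spam and --ham switches are the ones sa-learn uses for manual
training, but check the sa-learn man page for your version to see how
it treats a message that was previously learned as the other type.

```python
from typing import Iterable, List, NamedTuple


class Learned(NamedTuple):
    path: str         # file holding one message (hypothetical layout)
    autolearned: str  # "ham" or "spam", as autolearning classified it
    actual: str       # "ham" or "spam", as the user judges it


def correction_commands(messages: Iterable[Learned]) -> List[List[str]]:
    """Build one sa-learn command per message that was mis-learned."""
    commands = []
    for msg in messages:
        if msg.actual not in ("ham", "spam"):
            raise ValueError(f"unknown class for {msg.path}: {msg.actual}")
        if msg.autolearned != msg.actual:
            # Re-train the message as what it really is.
            commands.append(["sa-learn", f"--{msg.actual}", msg.path])
    return commands


if __name__ == "__main__":
    batch = [
        Learned("a.eml", "ham", "spam"),   # spam wrongly learned as ham
        Learned("b.eml", "spam", "spam"),  # learned correctly, no action
        Learned("c.eml", "spam", "ham"),   # ham wrongly learned as spam
    ]
    for cmd in correction_commands(batch):
        print(" ".join(cmd))
```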

> Or, I could envision a scenario where it won't occur for the end
> user unless they actively use sa-learn - in which case I assume that
> the basic pattern matching searches on key phrases (e.g. Adult, Teen
> etc.) is sufficient to provide some useful filtering.

There are many environments where it's not possible to use Bayesian
learning -- it's not available at my workplace, for example.  In many
environments, I'd argue that users are not sufficiently sophisticated
to use Bayes properly (i.e. by doing their own training and/or by
correcting erroneous auto-learning), with the end result that the
Bayes scores get completely screwed up (e.g. spam getting very low
Bayes scores and ham getting high scores).  If that happens, you're
worse off than if Bayes wasn't there at all.

Martin
-- 
Martin Radford              |   "Only wimps use tape backup: _real_ 
[EMAIL PROTECTED] | men just upload their important stuff  -o)
Registered Linux user #9257 |  on ftp and let the rest of the world  /\\
- see http://counter.li.org |       mirror it ;)"  - Linus Torvalds _\_V


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
