Machines don't do everything better than humans.  In particular, they're very
bad at making educated guesses in the absence of actual data.  So rules which
are triggered infrequently in the corpus used by the GA are not evolved, but
just set by hand.  This includes all the network-related rules, since
mass-check does not cover those rules.  It also includes rules which have been
found to create lots of false positives if used at the GA-evolved scores.  It
includes the whitelist/blacklist scores, since those are somewhat absolute, and
in particular it's deemed (well, by me; others have different views) important
that they're polar opposites of each other.

If you browse the 50_scores.cf file, the comments ahead of each section more or
less explain why the scores in that section were not set by the GA.  The
section at the end is simply those rules where there wasn't enough data in the
corpus to justify allowing the GA to modify the scores.  That doesn't mean
those rules don't show up at all; just that they don't show up more than a
small handful of times.

As to your false-pos/false-neg problem, I would encourage turning on
autowhitelisting (the -a option to spamd or spamassassin) to reduce false
positives, and turning on all network tests, including razor and DCC, to reduce
false negatives.
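
To make the selection described above a bit more concrete, here is a rough
Python sketch of the idea: rules with too few corpus hits, and network rules
that mass-check doesn't cover, keep hand-set scores, and the rest go to the GA.
This is not the actual SpamAssassin tooling.  The function name, the rule
names, the hit counts and the cutoff of 10 are all made up for illustration;
the real threshold isn't stated here.

```python
# Hypothetical sketch only: not the real mass-check/GA code.
MIN_HITS = 10  # assumed cutoff; the actual threshold is not given in this post


def split_rules(hit_counts, network_rules, min_hits=MIN_HITS):
    """Return (evolved, hand_set): rule names the optimizer may score,
    and rule names whose scores stay hand-set."""
    evolved, hand_set = [], []
    for rule, hits in sorted(hit_counts.items()):
        # Network rules are not covered by the corpus run, and rare rules
        # give the optimizer too little data to work with.
        if rule in network_rules or hits < min_hits:
            hand_set.append(rule)
        else:
            evolved.append(rule)
    return evolved, hand_set


# Made-up hit counts, for illustration only.
hits = {"RULE_COMMON": 1200, "RULE_RARE": 3, "RULE_NETWORK": 0}
evolved, hand_set = split_rules(hits, network_rules={"RULE_NETWORK"})
print("evolved by GA:", evolved)
print("set by hand:  ", hand_set)
```

A rule ending up in the hand-set list doesn't mean it never fires; it only
means the corpus can't say much about what its score ought to be.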

C

Kingsley G. Morse Jr. wrote:

KGMJ> I installed SA 2.20 a few days ago and it's
KGMJ> mis-categorizing more emails than I'd like. I'll
KGMJ> *guess* that it's missing 10% of spams and
KGMJ> mislabelling 1% of my legitimate email as spam.
KGMJ>
KGMJ> The obvious explanation is that I'm doing something
KGMJ> wrong, like not using razor or spamd.
KGMJ>
KGMJ> However, I noticed that only some scores in
KGMJ> /etc/spamassassin/50_scores.cf were optimized by the
KGMJ> genetic algorithm.
KGMJ>
KGMJ> It seems to me that SA would work more coherently if
KGMJ> ALL its rules' scores were optimized in the
KGMJ> evolutionary cauldron.
KGMJ>
KGMJ> Am I missing something?



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
