Alex Woick wrote:
> ...very nice analysis of rule trimmed...

Thank you very much for taking the time to look so closely at that
rule.  I still think it is not behaving as it was originally intended
and as such is scoring too heavily.  I filed a bug on this issue so
that it would not get lost.

  http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5716

> Since these rules were assigned such a high score, only very few ham 
> from the score-generating corpus (if any) seem to contain this 
> misspelling.

Very likely the case.  I think the typical email has mostly correctly
spelled normal words with a splatter of text strings that are not in
any dictionary.

> If I understand this process correctly, the scores are not determined
> manually but by a lengthy automatic analysis of a big message corpus
> that tries to minimize scores for known ham and maximize scores for
> known spam as a whole.

Correct.  It is machine scored.

  http://wiki.apache.org/spamassassin/HowScoresAreAssigned

> What you can do:
> - lower the score for these rules manually

Already done.  I reduced those to 0.5 each so that the combined score
for a single misspelling would be only 1.0 points.

> - and perhaps give the SA developers your FP to include in their
> corpus.

Sure.  But this case is also very easy to create on the fly.

Thanks
Bob
