On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

Says you. ;)

  1.047   1.4619   0.0792    0.949   0.58    0.89  SUBJECT_ENCODED_TWICE

So in the results used to generate scores, that rule is ~94.9% accurate,
and hits ~1.46% of all spam.  In a recent nightly mass-check run:

  1.153   1.4173   0.1151    0.925   0.73    0.89  SUBJECT_ENCODED_TWICE

So more ham seems to use encoding twice in the subject, and a little
less spam uses it.  Based on this, my guess is the generated score would
go down.
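
As a rough cross-check of the two result lines above, here is a minimal
sketch in Python. It assumes the columns are OVERALL%, SPAM%, HAM%, S/O,
RANK, SCORE, NAME (only the SPAM% and S/O readings are confirmed by the
text above; the rest of the layout is an assumption), and that S/O is
simply SPAM% / (SPAM% + HAM%). The names RuleFreq, parse_freq_line and
spam_ratio are made up for illustration and are not part of any
SpamAssassin tool.

```python
from typing import NamedTuple


class RuleFreq(NamedTuple):
    # Assumed column order of one result line (hypothetical field names).
    overall_pct: float
    spam_pct: float
    ham_pct: float
    s_o: float
    rank: float
    score: float
    name: str


def parse_freq_line(line: str) -> RuleFreq:
    """Split one whitespace-separated result line into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    *numbers, name = fields
    return RuleFreq(*(float(n) for n in numbers), name)


def spam_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of the rule's hits that are spam, assuming S/O = spam / (spam + ham)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit nothing")
    return spam_pct / total


# The two lines quoted in this message.
scoring = parse_freq_line("  1.047   1.4619   0.0792    0.949   0.58    0.89  SUBJECT_ENCODED_TWICE")
nightly = parse_freq_line("  1.153   1.4173   0.1151    0.925   0.73    0.89  SUBJECT_ENCODED_TWICE")

for label, row in (("scoring run", scoring), ("nightly run", nightly)):
    ratio = spam_ratio(row.spam_pct, row.ham_pct)
    print(f"{label}: spam {row.spam_pct}%, ham {row.ham_pct}%, ratio {ratio:.3f}")
```

Under those assumptions the recomputed ratio agrees with the S/O column
in both lines to three decimals, which is consistent with reading the
drop from 0.949 to 0.925 as coming mostly from the higher ham hit rate.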

The thing to remember about rules is that they don't necessarily look
for RFC non-compliance, and they don't necessarily avoid RFC-compliant
mail either.
They look for features that hit spam and try to avoid hitting ham.
The key there is that rule development occurs with the results people
make available.  If the people generating results don't receive ham
mails that, for instance, use multiple encodings in a Subject header,
the results won't indicate that it occurs in ham very much.

-- 
Randomly Generated Tagline:
"I protect home plate like a mormon girl on prom night."
         - Mimi on the Drew Carey show
