On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

Says you. ;)

  1.047   1.4619   0.0792    0.949   0.58    0.89  SUBJECT_ENCODED_TWICE

So in the results used to generate scores, that rule is ~94.9% accurate,
and hits ~1.46% of all spam. In a recent nightly mass-check run:

  1.153   1.4173   0.1151    0.925   0.73    0.89  SUBJECT_ENCODED_TWICE

So more ham seems to use encoding twice in the subject, and a little less
spam uses it. Based on this, my guess is the generated score would go down.

The thing to remember about rules is that they neither necessarily look for
RFC non-compliance, nor do they avoid RFC compliant mails. They look for
features that hit spam and try to avoid hitting ham.

The key there is that rule development occurs with the results people make
available. If the people generating results don't receive ham mails that,
for instance, use multiple encodings in a Subject header, the results won't
indicate that it occurs in ham very much.

-- 
Randomly Generated Tagline:
"I protect home plate like a mormon girl on prom night."
        - Mimi on the Drew Carey show
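
As a sanity check on the two result lines above, here is a minimal Python sketch. It assumes (the post does not label the columns) that the second and third numbers are the percentage of spam and of ham the rule hit, and that the fourth is the ratio spam% / (spam% + ham%). The function name spam_over_total is made up for illustration. Under that reading, the quoted 0.949 and 0.925 figures can be reproduced from the other two columns:

```python
def spam_over_total(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hit rate that falls on spam: spam% / (spam% + ham%).

    Assumed interpretation of the fourth column; not stated in the post.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit neither spam nor ham")
    return spam_pct / total


# (spam%, ham%) pairs copied from the two SUBJECT_ENCODED_TWICE lines
runs = {
    "score-generation results": (1.4619, 0.0792),
    "nightly mass-check": (1.4173, 0.1151),
}

for label, (spam_pct, ham_pct) in runs.items():
    print(f"{label}: {spam_over_total(spam_pct, ham_pct):.3f}")
```

The ratio depends only on whose mail is in the corpus: if contributors with more multi-encoded ham Subjects submitted results, ham% would rise and the ratio would drop, which is the point made above.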