Re: Concerned with scores for from rfc-ignorant.org

John Rudd Thu, 12 Oct 2006 15:54:59 -0700

Kurt Fitzner wrote:

John D. Hardin wrote:
 > But if the stated purpose of the BL is "this domain does not have a
 > working postmaster address" then it's unreasonable to ask them to
 > exclude a domain that does not have a working postmaster address, no
 > matter how large or popular that domain is.


My concern is the score attached to those rules by SpamAssassin.  The
purpose of SpamAssassin is to detect spam with as few false positives as
possible.  Attaching a score of 3.2 to every outgoing mail from
yahoo.com is counterproductive.  I would even go so far as to claim
that those rules are adding more spam points to ham mail than any other
rule.

The purpose of SpamAssassin is not to punish domains without working
postmaster addresses.  It is not to act as RFC cops. It is to detect
spam.  Let's not lose sight of the goal because some BL list has gone on
a crusade to police compliance to RFC's that have lost relevance.

As far as SpamAssassin is concerned, the rule is only to detect spam,
and if that is the case, then size and popularity of the domain does
matter - the ham to spam ratio from that domain matters, and the volume
of false positives definitely matters.  Note to all:  the rule is broken.

No. The size of the domain does not matter. The volume of the domain
does not matter. The popularity of the domain does not matter.

What matters is, when looking at the spam corpus vs the ham corpus, does
applying that score value to messages which come from/through a host
listed in RFCI help to differentiate spam from ham. The specific hosts,
and their characteristics, don't matter in determining the value of
_that_ rule. Nor should they.

The essential questions are: "did the message come from/through a host
in that RBL?" and "given _all_ messages that come from _all_ hosts in
that RBL, how accurate is that characteristic as a predictor of any
random message being spam?" Notice, a specific host isn't part of
either question.
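To make the second question concrete, here is a minimal Python sketch of how one might measure a rule against a hand-classified corpus. It is only an illustration, not SpamAssassin's actual scoring code: the `Message` type, the `relay_listed` flag, the `rule_stats` helper, and the toy corpus are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    is_spam: bool       # hand-classified label from the corpus (assumed)
    relay_listed: bool  # True if the message came from/through a listed host


def rule_stats(corpus):
    """Return (spam_hit_rate, ham_hit_rate, precision) for the rule.

    Only the label and the rule hit are inspected; which host or
    domain triggered the hit never enters the calculation.
    """
    spam = [m for m in corpus if m.is_spam]
    ham = [m for m in corpus if not m.is_spam]
    spam_hits = sum(m.relay_listed for m in spam)
    ham_hits = sum(m.relay_listed for m in ham)
    spam_rate = spam_hits / len(spam) if spam else 0.0
    ham_rate = ham_hits / len(ham) if ham else 0.0
    total_hits = spam_hits + ham_hits
    # Fraction of all rule hits that landed on spam.
    precision = spam_hits / total_hits if total_hits else 0.0
    return spam_rate, ham_rate, precision


# Toy corpus with invented counts: 10 spam (6 hit the rule), 10 ham (1 hit).
corpus = (
    [Message(True, True)] * 6
    + [Message(True, False)] * 4
    + [Message(False, True)] * 1
    + [Message(False, False)] * 9
)
spam_rate, ham_rate, precision = rule_stats(corpus)
print(spam_rate, ham_rate, precision)
```

A rule is a useful predictor when the spam hit rate is well above the ham hit rate across the whole corpus. Adding more ham from listed hosts to the corpus raises the ham hit rate, which is how counter examples would pull the measured accuracy down.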

You're right that the purpose of SpamAssassin is not to punish domains
that violate RFCs. It is also not the purpose of SpamAssassin to
reward or give exemptions to domains that are large/popular/etc. It is
the purpose of SpamAssassin to identify spam, and in doing so it
develops rules and then weights those rules according to their accuracy
against the corpus. That rule has a 3.2 value because the 3.2 value is
accurate at differentiating spam vs ham in the corpus. Therefore, the
score is appropriate.

If you're complaining that the rule isn't actually weighted correctly
_across_all_messages_from_all_hosts_ (not just messages from your pet
domain(s)), then see about giving more counter examples to the team that
performs that part of the determination, so that they can be part of the
corpus which sets the scores.
