Kurt Fitzner wrote:
John D. Hardin wrote:
> But if the stated purpose of the BL is "this domain does not have a
> working postmaster address" then it's unreasonable to ask them to
> exclude a domain that does not have a working postmaster address, no
> matter how large or popular that domain is.
My concern is the score attached to those rules by SpamAssassin. The
purpose of SpamAssassin is to detect spam with as few false positives as
possible. Attaching a score of 3.2 to every outgoing mail from
yahoo.com is counterproductive. I would even go so far as to claim
that those rules are adding more spam points to ham mail than any other
rule.
The purpose of SpamAssassin is not to punish domains without working
postmaster addresses. It is not to act as RFC cops. It is to detect
spam. Let's not lose sight of the goal because some BL has gone on
a crusade to police compliance with RFCs that have lost relevance.
As far as SpamAssassin is concerned, the rule is only to detect spam,
and if that is the case, then size and popularity of the domain do
matter - the ham to spam ratio from that domain matters, and the volume
of false positives definitely matters. Note to all: the rule is broken.
No. The size of the domain does not matter. The volume of the domain
does not matter. The popularity of the domain does not matter.
What matters is whether, across the spam corpus and the ham corpus,
applying that score to messages that come from or through a host
listed in RFCI helps to differentiate spam from ham. The specific hosts,
and their characteristics, don't matter in determining the value of
_that_ rule. Nor should they.
The essential questions are: "did the message come from/through a host
in that RBL?" and "given _all_ messages that come from _all_ hosts in
that RBL, how accurate is that characteristic as a predictor of any
random message being spam?" Notice, a specific host isn't part of
either question.
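
To make the second question concrete, here is a rough sketch of how one
might measure it over a hand-classified corpus. This is not SpamAssassin's
actual scoring code; the Message class, the relay_domain field, the
rule_accuracy function, and the example domains are all hypothetical names
made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    relay_domain: str  # hypothetical field: domain of a host the mail passed through
    is_spam: bool      # hand-classified label from the corpus


def rule_accuracy(corpus, listed_domains):
    """Count spam and ham hits for a 'relay is listed' rule.

    Returns (spam_hits, ham_hits, precision), where precision is the
    fraction of all messages hitting the rule that are spam, or None
    if no message hits the rule at all.
    """
    spam_hits = sum(
        1 for m in corpus if m.is_spam and m.relay_domain in listed_domains
    )
    ham_hits = sum(
        1 for m in corpus if not m.is_spam and m.relay_domain in listed_domains
    )
    total = spam_hits + ham_hits
    precision = spam_hits / total if total else None
    return spam_hits, ham_hits, precision


# Toy corpus (invented data): two listed domains, one unlisted.
corpus = [
    Message("a.example", True),
    Message("a.example", True),
    Message("a.example", True),
    Message("a.example", False),
    Message("b.example", True),
    Message("c.example", False),
]
listed = {"a.example", "b.example"}

spam_hits, ham_hits, precision = rule_accuracy(corpus, listed)
```

Note that the function only ever asks "is the host listed?" and never
looks at which host it was. Adding more ham from listed hosts to the
corpus lowers the measured precision, which is the lever available to
anyone who thinks the rule is over-weighted.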
You're right that the purpose of SpamAssassin is not to punish domains
that violate RFCs. It is also not the purpose of SpamAssassin to
reward or give exemptions to domains that are large/popular/etc. It is
the purpose of SpamAssassin to identify spam, and in doing so it
develops rules and then weights those rules according to their accuracy
against the corpus. That rule has a 3.2 value because that value
accurately differentiates spam from ham in the corpus. Therefore, the
score is appropriate.
If you're complaining that the rule isn't actually weighted correctly
_across_all_messages_from_all_hosts_ (not just messages from your pet
domain(s)), then see about submitting more counterexamples to the team
that performs that part of the determination, so that those messages
can be part of the corpus which sets the scores.