Re: Defining what the default welcomelist means

Greg Troxel Sun, 14 Apr 2024 05:00:52 -0700

Bill Cole <sausers-20150...@billmail.scconsult.com> writes:

> On 2024-04-12 at 18:56:15 UTC-0400 (Fri, 12 Apr 2024 18:56:15 -0400)
> Greg Troxel <g...@lexort.com>
>
>> Bill Cole <billc...@apache.org> writes:
>>
>>> 1. We serve our users: receivers, not senders. Senders claiming FPs
>>> need the support of a corroborating would-be receiver.
>>
>> Agreed.  Or maybe we take requests to add only from receivers.
>
> Effectively, yes. Senders won't refrain from requesting to be welcomed
> by default just because we say we don't accept those requests. Only
> receivers can corroborate the existence of any FP problem which would
> be solved by a default welcomelist entry, and this isn't a 'just find
> one example' sort of issue.


They won't refrain from writing, but it's fair to not let them open bugs
or have bugs open in the tracker.  And to tell them

  1) clean up your mail

  2) we only take requests for defwl from actual receivers, so we're
  done with this conversation.  use of sock puppets is not ok.

That's what I meant by "not take requests from".

>>> 2. If senders have FPs on objectively legitimate mail, their first and
>>> most important step is to identify WHY SpamAssassin thinks it is
>>> spam, and address that. Do you need the invisible text? Is the message
>>> embedded in a remotely-fetched image? The sea of "&zwnj" entities in
>>> your messages' HTML serves what purpose exactly? If there's a real FP
>>> problem with some rule that regularly is proved out by RuleQA, open a
>>> bug.
>>
>> Sure, but if you serve receivers, often people will have misfiling and
>> the sender is opaque, even if not spam and dkim.  So saying the sender
>> should fix is misaligned with serving receivers.  Yes, they *should*,
>> but people shouldn't send html mail either :-)
>
> I don't see this as misaligned, but rather a way of saying that def_w*
> entries come behind site-local receiver mitigations and
> receiver/sender collaboration on fixing the shabby mail.

What I was trying to express is that senders, even zero-spam senders,
are often enormous, opaque, and intractable.  So while I agree in
theory, I guess the real question is whether we want to say to a
receiver:

  your non-spam mail is spammy, and we aren't going to add a defwl
  because first you need to get e.g. Bank of America to stop sending
  html mail.

or

  your non-spam mail is spammy and it's ok to add a defwl

I have occasionally complained to BigCorp and it has never been useful.
Sure, one can get the branch manager to reverse a fee, but I mean one
cannot get them to change their practices.

> One reason I opened this topic is that many existing listings were
> nothing like last resorts to solve concrete problems but seem to be
> more prophylactically applied. I.e. to assure that generally (and
> vaguely) 'good' senders will get their mail through despite using
> pointless antipatterns that are predominantly used by spammers. Maybe
> there's a need for that, but it should not be part of SA proper.

This is a slippery slope.  We're trying to make correct classification
decisions for users.  I can definitely see both sides.

But I don't mean generally/vaguely.  I mean senders that are zero-spam
and likely important to receivers, in the bank/airline notification (and
similar) class.  Meaning something with real-world consequences that is
timely.  Not newsletters.

>> I see all spam classification as probabilistic and there is risk of FP.
>> If a domain emits *only ham* and is dkim signed, and we believe that
>> receivers want it, I think it makes sense to have it in.
>
> I see no point in that if there is no *evidence* of actual FPs. I
> don't think the default rules should try to game local incidents of
> Bayes or AWL dis-learning that ends up hitting banking
> notifications. Or (at the risk of being misinterpreted...) by the use
> of 3rd-party rules like the KAM channel that are much tougher on the
> bad HTML practices of corporate email composers.

FWIW, I have given up on the KAM rules.  The scores are insanely high
for things that appear in ham, and I was having too-frequent
misclassification.  Some of the scores were triggering on things which
are not even objectively spammy, e.g. a watch rule on a technical
discussion of clocks where it was on topic and I was subscribed.

Because of the probabilistic nature, I see it as sensible to defwl
things like bank notifications (that are 100% non-spam and dkim) to
reduce the odds that future rules will cause problems.  This is partly
from my KAM ruleset experience where I wake up to misfiled mail because
there is a new overly aggressive rule.  Much less likely in SA proper,
but still.

>> I am extremely skeptical of anything that smells of email marketing
>> here.  I would expect only places sending transactional mail and alerts
>> to established customers.
>
> I share the skepticism, but I have been working with business
> customers and their love of other businesspeople's email marketing
> (and random non-work-related email...) for long enough that I have
> stopped arguing with the nature of email that people eagerly desire in
> their mailboxes. I care that it is contextually safe, legal, and
> solidly consensual. There are marketers who stay inside the lines.

If it's really 100% ok, fine.  I just said that I'm skeptical and thus
require more convincing from an ESP than from bank alerts, to overcome
a presumption of "email marketing is rarely ok".

> It's easy to write a rule that will identify mail from a specific
> sender pattern, passing DKIM and/or SPF. The general
> welcomelist/blocklist mechanism exists because it's easier and more
> manageable than having rules for each pattern, but testing has to be
> done with specific rules.
>
> If you want to see an example, look for the bugs in the past couple of
> years opened against the various abused TLDs that we have in a
> suspicious domains list used in various rules. The test has been to
> create rules examining specific TLDs, e.g. the xyz, best, online,
> site, fun, pro, and btc TLDs all have test rules in the current
> default ruleset. (At the moment it looks like xyz needs to not be
> listed.)

So you mean

   we can add a defwl line if it passes the "this really isn't spam and
   we [have evidence of FP|concern of future FP]"

and

   if it's a *rule*, not a defwl, then the bar is vastly higher

and that makes sense to me.

>> It might also make sense for each welcomelist rule to have a score.
>
> Do you mean unique rules per domain? No, that's got a scaling problem.

I mean

  defwl foo.org -4
  defwl bar.org -2

and I get it that you object.

However, I think it would be good if one could express that, for users
to configure, even if doctrine says that the default ruleset doesn't do
that.  I realize that's out of scope for this discussion.
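
To make the idea concrete, here is a minimal sketch in Python of what
per-entry scores would mean.  This is not how SpamAssassin works today:
the table, the function name, and the "only when DKIM passes" condition
are all hypothetical, and the domains and scores are just the ones from
my example above.

```python
# Hypothetical per-entry welcomelist scores (domain -> score adjustment).
# Stock SpamAssassin assigns one score per welcomelist rule, not per entry.
DEFWL_SCORES = {
    "foo.org": -4.0,
    "bar.org": -2.0,
}


def defwl_score(sender_domain: str, dkim_pass: bool) -> float:
    """Return the score adjustment for a sender domain.

    The adjustment applies only when the message authenticated via DKIM
    (an assumption of this sketch); unlisted domains get no adjustment.
    """
    if not dkim_pass:
        return 0.0
    return DEFWL_SCORES.get(sender_domain.lower(), 0.0)


if __name__ == "__main__":
    for domain in ("foo.org", "bar.org", "example.com"):
        print(domain, defwl_score(domain, dkim_pass=True))
```

The point is only that the adjustment becomes a property of the entry
rather than of the rule that matched it.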

> Anyone who feels it justified can (as I do) reduce the power of the default 
> welcomelist:
>
> score USER_IN_DEF_DKIM_WL -2
> score USER_IN_DEF_SPF_WL -2

Thanks, useful to know.

> By default those each score -7.5 so a doubly-confirmed message gets
> the same insane -15 as a legacy listing (def_whitelist_from_rcvd) that
> doesn't require authentication. No such listings still exist in the
> default rules.
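
For anyone following the arithmetic, a rough sketch in Python of how the
two settings play out.  The rule names and the -7.5 and -2 scores are
from the text above, and 5.0 is the stock required_score; the 10.0
points from other rules is a made-up number purely for illustration,
and the helper names are mine.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Scores as described above: stock defaults vs. the reduced local override.
DEFAULT_SCORES = {"USER_IN_DEF_DKIM_WL": -7.5, "USER_IN_DEF_SPF_WL": -7.5}
REDUCED_SCORES = {"USER_IN_DEF_DKIM_WL": -2.0, "USER_IN_DEF_SPF_WL": -2.0}


def total_score(other_points: float, hits: list[str], scores: dict[str, float]) -> float:
    """Add the welcomelist rule scores for the rules that hit to the other points."""
    return other_points + sum(scores[name] for name in hits)


def is_spam(total: float) -> bool:
    """A message is tagged as spam when its total reaches the required score."""
    return total >= REQUIRED_SCORE


if __name__ == "__main__":
    both = ["USER_IN_DEF_DKIM_WL", "USER_IN_DEF_SPF_WL"]
    other = 10.0  # illustrative only: points from all other rules
    for label, scores in (("default", DEFAULT_SCORES), ("reduced", REDUCED_SCORES)):
        total = total_score(other, both, scores)
        print(label, total, "spam" if is_spam(total) else "ham")
```

With both rules hitting, the defaults contribute -15 and the reduced
scores contribute -4, so a message that otherwise looks quite spammy is
rescued under the defaults but not under the override.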


I am slightly skeptical of SPF vs DKIM, and I wonder how much mail there
is that

  belongs on the defwl
  does not have dkim

I'd be inclined to

  drop defwl spf rules if there is a dkim rule

  score USER_IN_DEF_SPF_WL -2.5
  (in published rules)

but these are tiny nits, not really relevant to your major point.
