Re: Defining what the default welcomelist means

Bill Cole Sat, 13 Apr 2024 10:44:12 -0700

On 2024-04-12 at 18:56:15 UTC-0400 (Fri, 12 Apr 2024 18:56:15 -0400)
Greg Troxel <g...@lexort.com>
is rumored to have said:

> I see it very slightly differently, but mostly agree
>
> Bill Cole <billc...@apache.org> writes:
>
>> 1. We serve our users: receivers, not senders. Senders claiming FPs
>> need the support of a corroborating would-be receiver.
>
> Agreed.  Or maybe we take requests to add only from receivers.

Effectively, yes. Senders won't refrain from requesting to be welcomed by 
default just because we say we don't accept those requests. Only receivers can 
corroborate the existence of any FP problem which would be solved by a default 
welcomelist entry, and this isn't a 'just find one example' sort of issue.

>> 2. If senders have FPs on objectively legitimate mail, their first and
>> most important step is to identify WHY SpamAssassin thinks it is
>> spam and address that. Do you need the invisible text? Is the message
>> embedded in a remotely-fetched image? The sea of "&zwnj" entities in
>> your messages' HTML serves what purpose exactly? If there's a real FP
>> problem with some rule that regularly is proved out by RuleQA, open a
>> bug.
>
> Sure, but if you serve receivers, often people will have misfiling and
> the sender is opaque, even if not spam and dkim.  So saying the sender
> should fix is misaligned with serving receivers.  Yes, they *should*,
> but people shouldn't send html mail either :-)

I don't see this as misaligned, but rather a way of saying that def_w* entries 
come behind site-local receiver mitigations and receiver/sender collaboration 
on fixing the shabby mail.

> I agree that requests from senders should be met with "make your mail
> less spammy".

Right. If SA is generating FPs, in nearly all cases this can be fixed without 
resorting to a global welcomelist entry. Solving FP issues is a balance between 
local rule mitigations, sender adjustments to lose spamsign patterns, and tweaks 
to the rules at the project level which validate in RuleQA; def_wl entries 
really should be a last resort.

One reason I opened this topic is that many existing listings were nothing like 
last resorts to solve concrete problems but seem to be more prophylactically 
applied. I.e. to assure that generally (and vaguely) 'good' senders will get 
their mail through despite using pointless antipatterns that are predominantly 
used by spammers. Maybe there's a need for that, but it should not be part of 
SA proper.

>> 3. This is NOT a general-purpose reputation list. It exists to aid SA
>> users who have FPs from SpamAssassin default rules for wanted mail,
>> where we cannot determine any acceptable adjustment to rules which
>> would avoid the problem. It is a "last resort" form of FP mitigation
>> when we cannot find an acceptable general solution that isn't
>> domain-specific to a widely accepted sender domain.
>
> I see all spam classification as probabilistic and there is risk of FP.
> If a domain emits *only ham* and is dkim signed, and we believe that
> receivers want it, I think it makes sense to have it in.

I see no point in that if there is no *evidence* of actual FPs. I don't think 
the default rules should try to game local incidents of Bayes or AWL 
dis-learning that ends up hitting banking notifications, or (at the risk of 
being misinterpreted...) FPs caused by the use of 3rd-party rules like the KAM 
channel that are much tougher on the bad HTML practices of corporate email 
composers.

> I think of things like alerts from banks, airline saying your flight
> time has changed, etc. where FPs are a real problem.

Right. I think we basically have that covered with the legacy entries, which 
are extensive, undocumented, and generally banal.

> I am extremely skeptical of anything that smells of email marketing
> here.  I would expect only places sending transactional mail and alerts
> to established customers.

I share the skepticism, but I have been working with business customers and 
their love of other businesspeople's email marketing (and random 
non-work-related email...) for long enough that I have stopped arguing with the 
nature of email that people eagerly desire in their mailboxes. I care that it 
is contextually safe, legal, and solidly consensual. There are marketers who 
stay inside the lines.

>> 4. We should only add or remove listings based on specific requests
>> backed by transparent evidence. Subversion commit messages are not
>> enough, we need a bug report or a mailing list discussion.
>
> sure

Important because it brings us more in line with the transparency norms that 
all ASF projects are expected to follow and because it reduces the likelihood 
of snowballing conflict to have a record of open discussion of how & why 
decisions are made.

>> 5. Existing entries are presumed valid unless and until they cause a
>> false "ham" classification of spam which can be shared publicly in a
>> useful form.
>
> I guess, or if someone makes an argument that they aren't right.

Defining the validity of "aren't right" arguments is important.

I believe that we are ethically (and perhaps as a result legally) safe as long 
as we are acting on rational judgment grounded in relevant facts and not 
hunches.

>> 6. New entries must pass prolonged RuleQA testing of sender-specific
>> rules before being added to the default welcomelist.
>
> I don't follow this.  Do you mean add 'def_welcomelist_dkim foo@bar' to
> a testing ruleset and see if it's ok?

No. That's not a useful test, as it gets lost in the rest of the list.

> That seems fine if so.  If not, I
> didn't follow you.

It's easy to write a rule that will identify mail from a specific sender 
pattern, passing DKIM and/or SPF. The general welcomelist/blocklist mechanism 
exists because it's easier and more manageable than having rules for each 
pattern, but testing has to be done with specific rules.
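
A minimal sketch of what such a sender-specific test rule could look like, 
assuming a hypothetical sender domain example.com. The rule names 
__HYPO_FROM_EX and T_HYPO_EX_DKIM are made up for illustration and are not 
part of the default ruleset; DKIM_VALID_AU is the stock DKIM plugin rule for 
a valid author-domain signature. The T_ prefix and the token score follow the 
usual convention for rules that are under test rather than live.

```
ifplugin Mail::SpamAssassin::Plugin::DKIM
  # Hypothetical subrule: From: address is in the candidate sender domain
  header   __HYPO_FROM_EX   From:addr =~ /\@example\.com$/i

  # Test rule: that sender AND a valid DKIM signature from the author domain
  meta     T_HYPO_EX_DKIM   __HYPO_FROM_EX && DKIM_VALID_AU
  describe T_HYPO_EX_DKIM   Test: example.com mail with author DKIM pass
  # Token score only, so the rule is counted without changing verdicts
  score    T_HYPO_EX_DKIM   0.01
endif
```

Because the rule hits on exactly one sender pattern, its ham and spam hit 
counts can be read on their own instead of being lost among every other 
welcomelist entry. Running spamassassin --lint after adding something like 
this locally should catch syntax mistakes.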

If you want to see an example, look for the bugs in the past couple of years 
opened against the various abused TLDs that we have in a suspicious domains 
list used in various rules. The test has been to create rules examining 
specific TLDs, e.g. the xyz, best, online, site, fun, pro, and btc TLDs all 
have test rules in the current default ruleset. (At the moment it looks like 
xyz needs to not be listed.)

> It might also make sense for each welcomelist rule to have a score.

Do you mean unique rules per domain? No, that's got a scaling problem.

> Basically to bring the mail down to -2, to give it some headroom.   But
> that might be too complicated compared to benefit.

Anyone who feels it justified can (as I do) reduce the power of the default 
welcomelist:

score USER_IN_DEF_DKIM_WL -2
score USER_IN_DEF_SPF_WL -2

By default those each score -7.5 so a doubly-confirmed message gets the same 
insane -15 as a legacy listing (def_whitelist_from_rcvd) that doesn't require 
authentication. No such listings still exist in the default rules.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
