Re: mailman keeps holding for non-subscribers

Bob Proulx Sun, 12 Apr 2020 23:46:26 -0700

Eric Wong wrote:
> Bob Proulx <b...@proulx.com> wrote:
> > Eric Wong wrote:
> > > OK, so I'm following half the recommendations
> > > 
> > > The ones I'm going against are:
> > > 
> > >   generic_nonmember_action=hold (I want Accept)
> > >   default_member_moderation=yes (I want no)
> > 
> > May I try to convince you otherwise?  Because there are good reasons
> > for the recommended settings.
> 
> Not unless the maximum delay can be minutes.  In other words,
> similar to what greylisting gets without any human interaction.


The initial contact delay is the hill being defended?  A mailing list
may have many interactions over time.  You and I might be discussing
some topic.  Say the topic of mailing list operations. :-)  We may
send many messages back and forth on the mailing list.  This might go
on for years and years over many topics.  Each of those happens fast
and efficiently.  And the continuing problem of spam to the mailing
list is not the problem?  That spam is okay?  But the delay on the
very first contact message is the showstopper?  It's beyond the pale?

How about SMTP time greylisting?  I would gather from this discussion
so far that SMTP greylisting, which works exactly the same way and
creates a delay upon the initial contact, would be a showstopper too
then?  Greylisting at SMTP time would also be beyond the pale?
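
For readers who have not run a greylisting setup, here is a minimal
sketch of the idea in Python.  It is only an illustration, not how
lists.gnu.org or any particular MTA implements it.  The class name, the
300 second delay, and the (client IP, sender, recipient) key are all
assumptions chosen for the example.

```python
import time


class Greylist:
    """Toy SMTP-time greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a new triplet must wait (assumed value)
        self.first_seen = {}         # triplet -> time of the first delivery attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return 'defer' (temporary failure) or 'accept' for this attempt."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender.lower(), recipient.lower())
        # Remember when this triplet was first seen; later attempts reuse that time.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return "defer"           # a real MTA retries later, most spamware does not
        return "accept"
```

The delay applies only to the first contact for a given triplet.  Once
the retry is accepted, later messages from the same sender pass without
waiting, which is the same shape of cost as a one-time moderation hold.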

I am sorry but IMNHO it is the day to day operations that are
much more important to optimize and make efficient.  Because those are
things that happen repeatedly, day after day.  One time startup costs
should not be too onerous, but may have some cost in order to have
benefit.  Like greylisting.  But it is the repeated operations that I
think should be targeted for optimization.  And that is the normal day
to day use of the mailing lists without having them filled with spam.

> > > So, should I remove listhel...@gnu.org from moderators?
> > > I still want automated spam filters such as SpamAssassin, though.
> > 
> > The listhelper anti-spam SpamAssassin et al cancel-bot depends upon
> > the hold actions.  If messages do not get held then it has no ability
> > to filter spam.  That's fundamental to how it works with Mailman.
> 
> That's unfortunate.  I'm not familiar with Mailman, but can't
> the MTA feed the message through spam filters before Mailman
> ever sees it?

It's interesting that you mention that.  Because for years and years
the frontend anti-spam was poor.  Very poor.  And this is not a
reflection upon the current FSF staff who have inherited the present
situation.  But that is the traditional situation.  For a very long
time the frontend anti-spam has been very poor.  And therefore we have
been implementing the anti-spam portion mostly in the Mailman
interface where it is possible for volunteers to interact with the
system.

There has been discussion of how to improve the frontend anti-spam.
At this time the systems are getting OS upgrades.  Those are dearly
needed, and are obviously a first step in improving the system.  The
anti-spam improvements are starting to happen but are still going to
take a while.  As with many things, life and time are what keep
everything from happening all at once.

However given the flow of mail and spam there needs to be a way to
train the learning engines, as we just mentioned in the previous
emails in our thread.  Right now Mailman provides a reasonably
convenient hook location for that training.  It is not as easy to do
without the mailing list manager.  Improving the feedback location in
the flow of email is something to look at doing.  But there is a lot
of associated work that needs to happen first before working on that
aspect of the problem.

> I use mlmmj for legacy mailing list subscribers, that just runs
> off cron with no synchronous relationship with the MTA at all.
> I have replay script which makes it incrementally read mail from
> public-inbox (git).

If we are going to start listing out mailing list management software
that is better than Mailman then we had better get comfortable.  It's
a long list!  I am not a fan of Mailman.  Mailman presents a pretty
low threshold.  I would start with Smartlist which is very capable and
scales well.  Also I have long been a fan of the way ezmlm works, if
only it didn't require qmail.  And at one time I would have said that
Enemies of Carlotta had interesting features for a mailing list.  For
that matter I actually like the venerable old Majordomo.  One of the
very active mailing lists I interact with still to this day uses
Majordomo for it!

But Mailman is an official GNU Project.  There is a benefit to "eating
your own dogfood" as the saying goes.  For that and other reasons
the lists.gnu.org machine is likely to continue to run GNU Mailman
instead of other mailing list manager programs for a while to come.

> 100% agreed.  I've been using an inotify + Maildir-based
> training system since 2008 or so spamc, even pre-public-inbox:
> 
>       https://public-inbox.org/dc-dlvr-spam-flow.html

I looked at the mail flow through the diagram and without having spent
a huge amount of time understanding it the flow looks similar to the
way other sites do this.  As users read mail and determine that a
message is spam or non-spam they divert mail to different places and
based upon those places the learning engines are trained-on-error.
That's great!  I do that too on the end user mailboxes of my non-gnu
systems.
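
To make "trained-on-error" concrete, here is a small Python sketch of
the decision only.  The function name and the boolean inputs are
hypothetical, and the actual learner (SpamAssassin or otherwise) is
left out.  The point is that the engine is fed a message only when the
human verdict disagrees with what the filter already decided.

```python
def training_label(filter_says_spam, human_says_spam):
    """Return 'spam', 'ham', or None: what to teach the learner, if anything."""
    if filter_says_spam == human_says_spam:
        return None                      # the filter was right; nothing to learn
    # The filter was wrong, so train with the human's verdict.
    return "spam" if human_says_spam else "ham"
```

In a Maildir setup the human verdict typically comes from which folder
the reader moved the message into.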

But that isn't really applicable to the way a mailing list works.
Because a mailing list delivers (forwards) mail to other people.  The
delivery of spam to other people's mailboxes is very bad.  And it is
difficult to implement distributed training feedback from the
community.  We cannot un-deliver a spam message after already having
delivered it.

> Spam gets trained upon removal from archives.

Your preferred system (AFAICT) is one of centralized storage without
delivery.  Because there is no delivery it does not deliver spam and
that spam can be removed "quietly behind the scenes" as it were.  That
is what Google does with Gmail too.  And others.

However that is not a mailing list.  It's something different.  It is
more similar to a web forum.  Even if it is also different in many
ways from a web forum.  It feels more similar than it is different.

If I am a subscriber to a mailing list and it passes along spam then I
will receive that spam.  (Where I can filter it out on my end but that
is already too late to prevent the delivery of it.)

Many people would object to the centralized storage based system
because it is centralized and creates an environment where a cabal
could, 1984 style, remove historical messages and rewrite history.
Don't like what someone said?  Simply remove that message from the
storage.  Or without malice there is the possibility of technical
failure.  A storage failure without backup would lose the entire
mailing list history.  These problems are not possible in a
traditional mailing list as those historical messages already were
sent and became part of the historical record.  And they were
distributed among all participants.  Everyone has a copy.

> > > > The resulting process means that as a general statement project
> > > > mailing lists need no explicit maintenance.  If you as a project
> > > > maintainer and also a maintainer of the mailing list do nothing then
> > > > everything happens as needed anyway.  You are however free to be as
> > > > involved in the mailing lists as you want.
> > > 
> > > So if I'm away and unable to administer dtas-...@nongnu.org, and
> > > generic_nonmember_action is "Hold"; does the "human team" at GNU
> > > will eventually accept postings in my absence?
> > 
> > Yes.  Eventually usually means a few hours.
> 
> <snip> yikes, that seems like a lot of human labor :<

No.  It's only a few minutes a day.

While typing this message I switched over to the other window and ran
through the mail queues.  It took less than two minutes before I was
done and flipped back to this message.  Everything was mostly caught
up.  There were only a dozen messages needing review at this moment.
Other listhelpers had been at work.  We interlace randomly.  There was
no heavy spam wave hitting the system needing a custom rule written.
Just the normal routine activity.  A couple of minutes.

Note that I am NOT clicking around in the Mailman web interface.  I am
either in 'mutt' looking at mail from the moderation emails, or
running scripts which are doing things.  There is no mouse activity
involved at all.  That would definitely be tedious.

> > It is your mailing list and this is up to you.  But people tend to be
> > very intolerant of spam on mailing lists.
> 
> It depends on the quantity, I suppose.  vger.kernel.org lets a
> few through and nobody seems to mind.  (I'm just a subscriber
> on vger, not an admin)

And lists.gnu.org has infrequent spam slip through too.  No system is
perfect.  And there are human mistakes at times.  Humans have a
non-zero bit-error-rate after all.  Worse than the automation
actually.

> > For example if people receive their mail at Gmail or Yahoo or
> > wherever, and then spam to the mailing list is received at their
> > mailbox, and they push the Spam button, this teaches Google and Yahoo
> > and so forth that lists.gnu.org is a source of spam and may create
> > problems for normal mailing list delivery.  This has been more of a
> > problem with Yahoo than most other places.  Some spam is of course
> > inevitable but we try to keep it to a minimum.  If it becomes a
> > problem then if not us volunteers then FSF admin will need to become
> > involved.  Getting blacklisted due to spam is a pain to deal with.
> 
> Yes, that is a problem.  It's part of the reason public-inbox is
> slowly moving mailing lists into a "pull" subscriber model over
> NNTP/Atom/HTML (and maybe even POP3).

That's great!  For public-inbox users.  Which is not a mailing list.

Most of what has been said about non-delivery and central storage also
applies to web forums.  And people who like web forums often say they
like it for all of the same reasons.  However I personally really hate
using web forums.  For some of the same reasons!

> > The only thing we really must insist upon is to discard spam and not
> > reject spam.  Most spam uses forged from addresses.  Therefore
> > rejecting spam ala Mailman Reject usually sends a rejection message to
> > an innocent 3rd party who then gets "backscatter" spam.  They validly
> > report lists.gnu.org as a spam source in that case and it gets us in
> > trouble with the DNSBLs.  Therefore please do not Reject random spam
> > messages.
> 
> Right.  One of my concerns with increased reliance on whitelisting
> is that spammers will start using whitelisted addresses themselves.
> SPF might discourage that, though.

It's somewhat of a scary potential avenue for abuse.  One that has
only been infrequently targeted.  SPF, DKIM, and so forth help with
preventing the forgeries.  However many sites do not use those, and
forgeries of their addresses can still be delivered.  I
have been thinking of ways to defend this particular potential abuse
avenue on the mailing lists, because it prickles at me.  Hopefully in
the arms race between user and abuser the user will win.
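
Since forged senders are the reason for the discard-not-reject rule
quoted above, here is that rule as a tiny Python sketch.  The function
and the verdict strings are hypothetical and only illustrate the
policy.  This is not Mailman code.

```python
def held_message_action(verdict):
    """Map a moderator's verdict on a held message to a list action."""
    if verdict == "spam":
        # Silent drop.  A rejection notice would go to the (probably
        # forged) From address and become backscatter.
        return "discard"
    if verdict == "ham":
        return "accept"
    return "hold"                        # unsure: leave it for another look
```

Note that "reject" is never returned for a message judged to be spam.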

> Fwiw, vger.kernel.org just drops HTML, which seems to cut a lot
> of spam, too.  They also do greylisting from what I can discern.

For you and me, if the mail is HTML then I can drop it without any
real loss of signal.  (How do you like that opinionated comment!)
However a LOT of other people believe just as strongly that they want
to send HTML mail.  Just recently on the 'mutt' users mailing list
there was a netizen of long standing who started a discussion asking
how they could use mutt to send HTML email.  I found the statement
rather shocking!  Who would be a mutt user but also be embracing HTML
mail?  But so it is.  And many freemail sites make it impossible to
avoid sending HTML mail.

Simply dropping HTML mail is not a practical solution, regardless of
how much I would wish the world would do so.  For most of the mailing
lists we have Mailman convert the HTML to plain text and that seems to
be the acceptable compromise.
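
As a rough illustration of what converting HTML to plain text means,
here is a standard-library Python sketch.  It is NOT how Mailman does
the conversion.  The class and function names are made up for the
example, and it only drops tags, skips script and style content, and
tidies blank lines.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, ignoring markup and script/style bodies."""

    def __init__(self):
        super().__init__()               # character references are decoded by default
        self.parts = []
        self._skip = 0                   # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("br", "p", "div"):
            self.parts.append("\n")      # treat block-ish tags as line breaks

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one non-empty line per row."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

The subscriber still gets the words of the message, just without the
markup, which is why it works as a compromise.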

Bob
