[MM3-users] Re: Held messages not delivered after approval

Krinetzki, Stephan Mon, 04 Aug 2025 07:56:06 -0700

>The django-admin commands aren't directly related, I'm going to ignore them 
>for now.  The only thing I know for *sure* runs at midnight daily is "mailman 
>digests --send".  On my Debian Linode, the default (which I left alone) is for 
>logrotate's cron job to live in /etc/cron.daily, which is run at 06:25 daily 
>using "run-parts".  (This is quite a common setup on Linux.)  So we need to 
>know where the logrotate job is specified (crontab, cron.d, or cron.daily) and 
>at what time (@daily = midnight) to be sure that the mailman restart is 
>related to the bad and shunt queue files.

The logrotate job is run by a systemd timer (Rocky 9 OS, by the way), and the 
next run is scheduled for:

Tue 2025-08-05 00:00:00 CEST 7h left    Mon 2025-08-04 00:00:00 CEST 16h ago    
 logrotate.timer              logrotate.service

So every day at midnight.

>That is not normal.  Your control process is crashing every 15-20 seconds.  I 
>think it probably is a problem with the digests, not with the restart.  What 
>appears to be happening is that the digest process gets triggered, it creates 
>a message and queues it, then fails to send it so nastily that Mailman 
>restarts (or stops and something like systemd restarts it).  On restart, 
>Mailman finds the digest message (probably in the out queue), tries to send it 
>again, crashes again, and eventually decides that isn't going to work, sends 
>it to bad, and stops crashing.

I saw this, but I have no idea how it happens. Currently there are ~42 mails 
left after 'mailman unshunt', and I think Mailman is looping over them (the 
queue doesn't get shorter). But mail is still being delivered for a lot of lists.

>According to the config you posted earlier, you're sending most channels to 
>separate log files.  Have you checked any of them other than mailman.log and 
>smtp.log?  Also, note that httpd.log and error.log are normally used by 
>Mailman core's gunicorn (ie, the REST API).  I'm not sure what effect 
>directing Mailman's error channel to error.log will have, but I suspect you 
>could end up losing logs or having text from different sources mixed.

So I should update my logging config. Do you have a good example or maybe even 
the dist?

--
Stephan Krinetzki

IT Center
Gruppe: Anwendungsbetrieb und Cloud
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23 
52074 Aachen
Tel: +49 241 80-24866
Fax: +49 241 80-22134
krinet...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social Media Kanäle des IT Centers:
https://blog.rwth-aachen.de/itc/ 
https://www.facebook.com/itcenterrwth 
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

-----Original Message-----
From: Stephen J. Turnbull <st...@turnbull.jp> 
Sent: Monday, August 4, 2025 1:51 PM
To: Krinetzki, Stephan <krinet...@itc.rwth-aachen.de>
Cc: Mark Sapiro <m...@msapiro.net>; mailman-users@mailman3.org
Subject: RE: [MM3-users] Re: Held messages not delivered after approval

Krinetzki, Stephan writes:

 > /opt/mailman/var/queue/bad:
 > -rw-rw----  1 mailman mailman  221723 Aug  1 17:26 
 > 1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv

This one might be spam, but it's weird that it managed to get pickled but can't 
be read.

 > -rw-rw----  1 mailman mailman   32912 Aug  2 00:00 
 > 1754085602.191851+3576cf33232db110fa7761233f67245564553652.psv
 > -rw-rw----  1 mailman mailman     416 Aug  2 00:00 
 > 1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv
 > -rw-rw----  1 mailman mailman 1407649 Aug  2 00:00 
 > 1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv
 > -rw-rw----  1 mailman mailman 1407634 Aug  2 00:02 
 > 1754085729.3529432+1643f907bac39a22a7d71e50b031c4f8a574082c.psv

I have no clue about these four (see below for comments on cron).
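
For correlating these files with the midnight activity: the queue file names in the listing appear to begin with a Unix timestamp, followed by "+" and a hash. The sketch below decodes that prefix, assuming the server is on CEST (UTC+2), as the "+0200" in the log excerpt suggests. The function name is only illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time is CEST (UTC+2), matching the
# "+0200" offset seen in the mailman.log excerpt.
CEST = timezone(timedelta(hours=2), "CEST")


def queue_file_time(filename: str) -> datetime:
    """Decode the leading Unix timestamp of a queue file name.

    Assumes names look like "<epoch seconds>+<hash>.<ext>", as in the
    directory listing quoted above.
    """
    stamp = filename.split("+", 1)[0]
    return datetime.fromtimestamp(float(stamp), tz=CEST)


if __name__ == "__main__":
    names = [
        "1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv",
        "1754085602.191851+3576cf33232db110fa7761233f67245564553652.psv",
        "1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv",
        "1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv",
        "1754085729.3529432+1643f907bac39a22a7d71e50b031c4f8a574082c.psv",
    ]
    for name in names:
        print(queue_file_time(name).strftime("%Y-%m-%d %H:%M:%S %Z"), name[:18])
```

The decoded times agree with the modification times in the listing: the first file dates from 17:26 on August 1, and the other four fall between 00:00:02 and 00:02:09 on August 2.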

 > /opt/mailman/var/queue/out:

Looks normal for your configuration.

 > /opt/mailman/var/queue/shunt:

I don't understand why on August 1st you see shunts at intervals throughout the 
working day, then suddenly on the 2nd they all happen at midnight.

Have you tried "mailman unshunt"?  If not what happens when you do?
If the shunts are happening because of the restart, then they should go through 
on unshunt.  If they don't, there's some other problem.
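
To see whether an unshunt actually drains the queue, it may help to count the files in each queue directory before and after. This is a minimal sketch, assuming the queue root is /opt/mailman/var/queue as in the listing above and that it is run as a user that can read those directories; the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_queue_files(queue_root):
    """Return {queue name: Counter of file suffixes} for each queue directory."""
    counts = {}
    for qdir in sorted(Path(queue_root).iterdir()):
        if qdir.is_dir():
            counts[qdir.name] = Counter(
                p.suffix for p in qdir.iterdir() if p.is_file()
            )
    return counts


if __name__ == "__main__":
    # Path taken from the listing in this thread; adjust for other installs.
    root = Path("/opt/mailman/var/queue")
    if root.is_dir():
        for name, by_suffix in count_queue_files(root).items():
            print(name, dict(by_suffix))
```

If the shunt count is the same a few minutes after "mailman unshunt" as it was before, the messages are probably being shunted again rather than delivered.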

You can also try renaming the .psvs to .pck, and check the metadata in the 
pickle for which queue to move it to.  That's more risky, and you shouldn't try 
it if the output of "mailman qfile" isn't as expected.

 > I don't see here a problem. But the timestamp seems to be related to the 
 > restart of mailman. Can I skip this in the logrotate?

As I mentioned before, there was (and may still be) a bug in Mailman's logging 
such that Mailman fails to reopen the logs, and typically after a couple of 
days you end up with a nameless open file collecting the logs and uselessly 
consuming more and more disk space.  The restart is intended to work around 
this problem.

 > Btw: The crontab is the following:
 > @daily mailman cd /opt/mailman; source 
 > /opt/mailman/mailman-venv/bin/activate; 
 > /opt/mailman/mailman-venv/bin/mailman digests --send > /dev/null 2>&1

The django-admin commands aren't directly related, I'm going to ignore them for 
now.  The only thing I know for *sure* runs at midnight daily is "mailman 
digests --send".  On my Debian Linode, the default (which I left alone) is for 
logrotate's cron job to live in /etc/cron.daily, which is run at 06:25 daily 
using "run-parts".  (This is quite a common setup on Linux.)  So we need to 
know where the logrotate job is specified (crontab, cron.d, or cron.daily) and 
at what time (@daily = midnight) to be sure that the mailman restart is related 
to the bad and shunt queue files.

 > So I checked the mailman.log:
 >
 > [2025-08-01 00:00:02 +0200] [324558] [INFO] Shutting down: Master
 > [2025-08-01 00:00:23 +0200] [567059] [INFO] Shutting down: Master
 > [2025-08-01 00:00:42 +0200] [567206] [INFO] Shutting down: Master
 > [2025-08-01 00:01:01 +0200] [567278] [INFO] Shutting down: Master
 > [2025-08-01 00:01:34 +0200] [567379] [INFO] Shutting down: Master
 > [2025-08-01 00:01:52 +0200] [567516] [INFO] Shutting down: Master
 > [2025-08-01 00:02:11 +0200] [567646] [INFO] Shutting down: Master

That is not normal.  Your control process is crashing every 15-20 seconds.  I 
think it probably is a problem with the digests, not with the restart.  What 
appears to be happening is that the digest process gets triggered, it creates a 
message and queues it, then fails to send it so nastily that Mailman restarts 
(or stops and something like systemd restarts it).  On restart, Mailman finds 
the digest message (probably in the out queue), tries to send it again, crashes 
again, and eventually decides that isn't going to work, sends it to bad, and 
stops crashing.
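
The spacing between those shutdowns can be read straight off the timestamps. A small sketch, using only the seven log lines quoted above (the function name is illustrative):

```python
from datetime import datetime

# The seven lines quoted from mailman.log earlier in this message.
LOG_LINES = """\
[2025-08-01 00:00:02 +0200] [324558] [INFO] Shutting down: Master
[2025-08-01 00:00:23 +0200] [567059] [INFO] Shutting down: Master
[2025-08-01 00:00:42 +0200] [567206] [INFO] Shutting down: Master
[2025-08-01 00:01:01 +0200] [567278] [INFO] Shutting down: Master
[2025-08-01 00:01:34 +0200] [567379] [INFO] Shutting down: Master
[2025-08-01 00:01:52 +0200] [567516] [INFO] Shutting down: Master
[2025-08-01 00:02:11 +0200] [567646] [INFO] Shutting down: Master
"""


def shutdown_gaps(lines):
    """Return the seconds between consecutive 'Shutting down: Master' entries."""
    times = []
    for line in lines:
        if "Shutting down: Master" in line:
            # The timestamp is the first bracketed field on the line.
            stamp = line[line.index("[") + 1 : line.index("]")]
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(shutdown_gaps(LOG_LINES.splitlines()))  # [21, 19, 19, 33, 18, 19]
```

So the gaps are 18 to 33 seconds, mostly around 19. That regularity looks more like a retry loop than like a single scheduled restart.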

There's normally a lot more chatter at startup and shutdown, for example about 
runners being started.  That's probably because you have that redirected to a 
separate log file, or maybe that information doesn't get output with a log 
level of "warn".  Maybe the crash information is in the runner.log.

According to the config you posted earlier, you're sending most channels to 
separate log files.  Have you checked any of them other than mailman.log and 
smtp.log?  Also, note that httpd.log and error.log are normally used by Mailman 
core's gunicorn (ie, the REST API).  I'm not sure what effect directing 
Mailman's error channel to error.log will have, but I suspect you could end up 
losing logs or having text from different sources mixed.

I haven't thought about it carefully, but I would have separate logs for 
bounces, subscriptions, smtp, and nntp because they are quite separate.  
Everything else would go into mailman.log, because that makes it easier to 
trace a single message through the whole process.
Until you know that you don't need it, I would have most channels at the info 
level.  The debug level is almost never useful unless you're a developer trying 
to fix something (vs a troubleshooter trying to diagnose the problem).  The 
logs compress very well (often 70% reduction), so it's generally a good idea to 
include the extra information at info level.  Remember, the real explosion in 
logging is that outgoing mail gets logged up to 43k times per incoming post.  
Of course you can do quite a bit better if you can sacrifice the personalized 
footers, but most sites don't anymore because there are strict rules about 
convenience of unsubscription.
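
As a rough illustration of that layout, the sketch below prints a mailman.cfg fragment that gives four channels their own files at info level and leaves everything else untouched, on the assumption that unlisted channels keep their default of writing to mailman.log. The section names (logging.bounce, logging.subscribe, logging.smtp, logging.fromusenet) and the "path" and "level" keys are my recollection of Mailman core's schema.cfg, and the file names are placeholders, so check them against your installed version before using any of it.

```python
# Assumption: these channel names and the "path"/"level" keys match the
# [logging.*] sections in Mailman core's schema.cfg. The file names are
# placeholders. Verify against your own installation.
SEPARATE_CHANNELS = {
    "bounce": "bounce.log",
    "subscribe": "subscribe.log",
    "smtp": "smtp.log",
    "fromusenet": "nntp.log",  # assumed to be the NNTP gateway channel
}


def render_logging_config(separate, level="info"):
    """Render [logging.<channel>] sections as mailman.cfg-style text."""
    lines = []
    for channel, path in separate.items():
        lines += [f"[logging.{channel}]", f"path: {path}", f"level: {level}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_logging_config(SEPARATE_CHANNELS))
```

Keeping the mapping in one dictionary makes it easy to move a channel back into mailman.log later: delete its entry and regenerate the fragment.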

 > Well... I will stop the restart after the log rotate today.

You can do that if you want, but it's likely that you'll end up losing logs.

 > >And for every one of those shunted messages there should be an 
 > >exception with traceback logged in mailman.log. Those tracebacks 
 > >should be helpful.
 >
 > If there were any. Maybe the "debug" level should be "info". But for 
 > which logs?

Setting the channel to "debug" gives maximum verbosity.  Unhandled exceptions 
are logged at "warn" or "error" level (the highest severities), so they will 
show up whether the channel is set to "debug" or "info".

 > Maybe the restart at night after the logrotate is the issue.

Not with Mailman bouncing up and down pretty much as fast as it can.
The restart can only account for one restart, the other 6 were caused by 
something else.

--
GNU Mailman consultant (installation, migration, customization)
Sirius Open Source    https://www.siriusopensource.com/
Software systems consulting in Europe, North America, and Japan


_______________________________________________
Mailman-users mailing list -- mailman-users@mailman3.org
To unsubscribe send an email to mailman-users-le...@mailman3.org
https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
Archived at: 
https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/37GVW7Y6IATQDXHYYIYWSFEPRLHQXAWI/
