> The django-admin commands aren't directly related, I'm going to ignore them
> for now. The only thing I know for *sure* runs at midnight daily is "mailman
> digests --send". On my Debian Linode, the default (which I left alone) is for
> logrotate's cron job to live in /etc/cron.daily, which is run at 06:25 daily
> using "run-parts". (This is quite a common setup on Linux.) So we need to
> know where the logrotate job is specified (crontab, cron.d, or cron.daily) and
> at what time (@daily = midnight) to be sure that the mailman restart is
> related to the bad and shunt queue files.

The logrotate is executed by a systemd timer (Rocky 9 OS, btw) and is scheduled for:

  Tue 2025-08-05 00:00:00 CEST  7h left  Mon 2025-08-04 00:00:00 CEST  16h ago  logrotate.timer  logrotate.service

So every day at midnight.

> That is not normal. Your control process is crashing every 15-20 seconds. I
> think it probably is a problem with the digests, not with the restart. What
> appears to be happening is that the digest process gets triggered, it creates
> a message and queues it, then fails to send it so nastily that Mailman
> restarts (or stops and something like systemd restarts it). On restart,
> Mailman finds the digest message (probably in the out queue), tries to send it
> again, crashes again, and eventually decides that isn't going to work, sends
> it to bad, and stops crashing.

I saw this, but I have no idea how it happens. Currently there are ~42 mails left after 'mailman unshunt', and I think Mailman loops over them (the queue doesn't get shorter). But mails are delivered for a lot of lists.

> According to the config you posted earlier, you're sending most channels to
> separate log files. Have you checked any of them other than mailman.log and
> smtp.log? Also, note that httpd.log and error.log are normally used by
> Mailman core's gunicorn (ie, the REST API). I'm not sure what effect
> directing Mailman's error channel to error.log will have, but I suspect you
> could end up losing logs or having text from different sources mixed.

So I should update my logging config. Do you have a good example, or maybe even the one from the dist?

-- 
Stephan Krinetzki

IT Center
Gruppe: Anwendungsbetrieb und Cloud
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24866
Fax: +49 241 80-22134
krinet...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social Media Kanäle des IT Centers:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

-----Original Message-----
From: Stephen J. Turnbull <st...@turnbull.jp>
Sent: Monday, August 4, 2025 1:51 PM
To: Krinetzki, Stephan <krinet...@itc.rwth-aachen.de>
Cc: Mark Sapiro <m...@msapiro.net>; mailman-users@mailman3.org
Subject: RE: [MM3-users] Re: Held messages not delivered after approval

Krinetzki, Stephan writes:

> /opt/mailman/var/queue/bad:
> -rw-rw---- 1 mailman mailman  221723 Aug  1 17:26 1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv

This one might be spam, but it's weird that it managed to get pickled but can't be read.

> -rw-rw---- 1 mailman mailman   32912 Aug  2 00:00 1754085602.191851+3576cf33232db110fa7761233f67245564553652.psv
> -rw-rw---- 1 mailman mailman     416 Aug  2 00:00 1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv
> -rw-rw---- 1 mailman mailman 1407649 Aug  2 00:00 1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv
> -rw-rw---- 1 mailman mailman 1407634 Aug  2 00:02 1754085729.3529432+1643f907bac39a22a7d71e50b031c4f8a574082c.psv

I have no clue about these four (see below for comments on cron).

> /opt/mailman/var/queue/out:

Looks normal for your configuration.

> /opt/mailman/var/queue/shunt:

I don't understand why on August 1st you see shunts at intervals throughout the working day, then suddenly on the 2nd they all happen at midnight.

Have you tried "mailman unshunt"? If not, what happens when you do? If the shunts are happening because of the restart, then they should go through on unshunt.
If they don't, there's some other problem. You can also try renaming the .psvs to .pck, and check the metadata in the pickle for which queue to move it to. That's more risky, and you shouldn't try it if the output of "mailman qfile" isn't as expected.

> I don't see here a problem. But the timestamp seems to be related to the
> restart of mailman. Can I skip this in the logrotate?

As I mentioned before, there was (and may still be) a bug in Mailman's logging such that Mailman fails to reopen the logs, and typically after a couple of days you end up with a nameless open file collecting the logs and uselessly consuming more and more disk space. The restart is intended to work around this problem.

> Btw: The crontab is the following:
> @daily mailman cd /opt/mailman; source /opt/mailman/mailman-venv/bin/activate; /opt/mailman/mailman-venv/bin/mailman digests --send > /dev/null 2>&1

The django-admin commands aren't directly related, I'm going to ignore them for now. The only thing I know for *sure* runs at midnight daily is "mailman digests --send". On my Debian Linode, the default (which I left alone) is for logrotate's cron job to live in /etc/cron.daily, which is run at 06:25 daily using "run-parts". (This is quite a common setup on Linux.) So we need to know where the logrotate job is specified (crontab, cron.d, or cron.daily) and at what time (@daily = midnight) to be sure that the mailman restart is related to the bad and shunt queue files.

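One way to check whether the files in the bad queue line up with the midnight jobs is to decode the number at the front of each file name. The following is only a minimal Python sketch: it assumes that the part before the "+" is a Unix epoch timestamp (which is consistent with the ls times quoted above) and that the server clock is on CEST (UTC+2), as the "+0200" in mailman.log suggests. The helper name queue_file_time is made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: server local time is CEST (UTC+2), per the "+0200" in mailman.log.
CEST = timezone(timedelta(hours=2), "CEST")


def queue_file_time(filename: str) -> datetime:
    """Return the time encoded before the '+' in a queue file name.

    Assumes the leading number is a Unix epoch timestamp.
    """
    stamp, _, _digest = filename.partition("+")
    return datetime.fromtimestamp(float(stamp), tz=CEST)


# File names copied from the /opt/mailman/var/queue/bad listing in this thread.
names = [
    "1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv",
    "1754085602.191851+3576cf33232db110fa7761233f67245564553652.psv",
    "1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv",
    "1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv",
    "1754085729.3529432+1643f907bac39a22a7d71e50b031c4f8a574082c.psv",
]

for name in names:
    when = queue_file_time(name)
    print(when.strftime("%Y-%m-%d %H:%M:%S %Z"), name.split("+")[0])
```

If that assumption holds, the last four files were queued within about two minutes after midnight on August 2nd, which is the window in which both the digest job and the logrotate timer run.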
> So i checked the mailman.log:
> > [2025-08-01 00:00:02 +0200] [324558] [INFO] Shutting down: Master
> > [2025-08-01 00:00:23 +0200] [567059] [INFO] Shutting down: Master
> > [2025-08-01 00:00:42 +0200] [567206] [INFO] Shutting down: Master
> > [2025-08-01 00:01:01 +0200] [567278] [INFO] Shutting down: Master
> > [2025-08-01 00:01:34 +0200] [567379] [INFO] Shutting down: Master
> > [2025-08-01 00:01:52 +0200] [567516] [INFO] Shutting down: Master
> > [2025-08-01 00:02:11 +0200] [567646] [INFO] Shutting down: Master

That is not normal. Your control process is crashing every 15-20 seconds. I think it probably is a problem with the digests, not with the restart. What appears to be happening is that the digest process gets triggered, it creates a message and queues it, then fails to send it so nastily that Mailman restarts (or stops and something like systemd restarts it). On restart, Mailman finds the digest message (probably in the out queue), tries to send it again, crashes again, and eventually decides that isn't going to work, sends it to bad, and stops crashing.

There's normally a lot more chatter at startup and shutdown, for example about runners being started. That's probably because you have that redirected to a separate log file, or maybe that information doesn't get output with a log level of "warn". Maybe the crash information is in the runner.log.

According to the config you posted earlier, you're sending most channels to separate log files. Have you checked any of them other than mailman.log and smtp.log? Also, note that httpd.log and error.log are normally used by Mailman core's gunicorn (ie, the REST API). I'm not sure what effect directing Mailman's error channel to error.log will have, but I suspect you could end up losing logs or having text from different sources mixed.

I haven't thought about it carefully, but I would have separate logs for bounces, subscriptions, smtp, and nntp because they are quite separate.
Everything else would go into mailman.log, because that makes it easier to trace a single message through the whole process.

Until you know that you don't need it, I would have most channels at the info level. The debug level is almost never useful unless you're a developer trying to fix something (vs a troubleshooter trying to diagnose the problem). The logs compress very well (often 70% reduction), so it's generally a good idea to include the extra information at info level. Remember, the real explosion in logging is that outgoing mail gets logged up to 43k times per incoming post. Of course you can do quite a bit better if you can sacrifice the personalized footers, but most sites don't anymore because there are strict rules about convenience of unsubscription.

> Well...i will stop the restart after the log rotate today.

You can do that if you want, but it's likely that you'll end up losing logs.

> >And for every one of those shunted messages there should be an
> >exception with traceback logged in mailman.log. Those tracebacks
> >should be helpful.
>
> If there were any. Maybe the "debug" level should be "info". But for
> which logs?

Setting the channel to "debug" gives maximum verbosity, and unhandled exceptions are logged at "warn" or "error" level (maximum severity).

> Maybe the restart at night after the logrotate may be the issue.

Not with Mailman bouncing up and down pretty much as fast as it can. The restart can only account for one restart; the other 6 were caused by something else.

-- 
GNU Mailman consultant (installation, migration, customization)
Sirius Open Source    https://www.siriusopensource.com/
Software systems consulting in Europe, North America, and Japan
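
The spacing of the "Shutting down: Master" lines quoted earlier in this message can be checked mechanically. This is only a rough Python sketch: the log lines are copied from the thread, the helper name shutdown_times is made up for the illustration, and it assumes every such line starts with a bracketed "YYYY-MM-DD HH:MM:SS +ZZZZ" timestamp.

```python
from datetime import datetime

# Lines copied from the mailman.log excerpt quoted in this thread.
log_lines = [
    "[2025-08-01 00:00:02 +0200] [324558] [INFO] Shutting down: Master",
    "[2025-08-01 00:00:23 +0200] [567059] [INFO] Shutting down: Master",
    "[2025-08-01 00:00:42 +0200] [567206] [INFO] Shutting down: Master",
    "[2025-08-01 00:01:01 +0200] [567278] [INFO] Shutting down: Master",
    "[2025-08-01 00:01:34 +0200] [567379] [INFO] Shutting down: Master",
    "[2025-08-01 00:01:52 +0200] [567516] [INFO] Shutting down: Master",
    "[2025-08-01 00:02:11 +0200] [567646] [INFO] Shutting down: Master",
]


def shutdown_times(lines):
    """Collect the timestamps of all 'Shutting down: Master' lines."""
    times = []
    for line in lines:
        if "Shutting down: Master" in line:
            # The timestamp is the text inside the first pair of brackets.
            stamp = line[1:line.index("]")]
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z"))
    return times


times = shutdown_times(log_lines)
# Seconds between each shutdown and the next one.
gaps = [int((later - earlier).total_seconds())
        for earlier, later in zip(times, times[1:])]
print(gaps)
```

To run it against the real file, read the lines from mailman.log instead of the hard-coded list. A single scheduled restart would produce one shutdown line, so a run of short gaps like this points at repeated crashes rather than at logrotate.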