This one was pretty silly. On 2024-7-1, I performed some upgrades on the machine hosting the lists. I updated Debian as well as my installation of the SMTP server Haraka. After doing so, I went to some effort to manually verify that the lists were still working, and even fixed a preexisting minor issue that only affected me. However, the upgrade broke the AWS CLI that I was using for CloudWatch alerts. I was in a hurry, so I left it that way for the time being.
On 2024-7-16, for the first time since the upgrade, Haraka tried to auto-update the Public Suffix List in its own source code directory, which failed because I had intentionally restricted the user running Haraka to read-only access to its source code. I’m not sure whether this auto-update behavior was somehow not happening before (it seems to have been added five years ago [1], and the previously-running version isn’t *that* old), or whether I dealt with this for the previous version and then forgot having done so. In any case, the auto-update failure promptly terminated the whole Haraka process. systemd did not restart it automatically (turns out that unlike launchd, systemd requires explicitly requesting auto-restart), so it stayed down.
I didn’t have a CloudWatch alert for Haraka being down, and even if I had, I wouldn’t have gotten it because my CloudWatch alerts were still down (I’d gotten lazy about fixing them). And as far as I can tell, nobody notified me manually until Random Internet Cat sent me a private Mastodon message on 7-22, and I didn’t even see that message until later because I wasn’t checking Mastodon. There were some ALT messages sent as a result of the downtime that reached my email inbox, but to make things worse, I wasn’t checking my inbox. (I recommend messaging me on Discord if you want to get my attention.) So I didn’t know the lists were down until I happened to check my email and notice the chatter, I think on 7-21.
On 7-22 I fixed the issue (well, more like hacked around it) by giving the user write access to that directory, and started Haraka back up. Then I didn’t get around to writing up what happened until today. Also today, I fixed the CloudWatch alerts, added an alert for Haraka or other processes being down, set Haraka to auto-restart, performed additional upgrades, and enabled TLS support for both incoming and outgoing mail. Hopefully none of that breaks anything.
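For anyone curious what "explicitly requesting auto-restart" looks like in systemd, it is roughly a drop-in like the one below. This is only a sketch: the unit name haraka.service, the drop-in path, and the specific values are assumptions for illustration, not a copy of the actual configuration on the list host.

```ini
# Hypothetical drop-in: /etc/systemd/system/haraka.service.d/restart.conf
# (unit name and path are assumed, not taken from the real setup)
[Unit]
# Give up if the service fails 5 times within 5 minutes, rather than
# restarting forever.
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
# Restart whenever the process exits uncleanly (non-zero exit status,
# killed by a signal, or timeout). Without a Restart= setting, systemd
# defaults to Restart=no and leaves the service down.
Restart=on-failure
# Wait 10 seconds between restart attempts.
RestartSec=10
```

After adding a drop-in like this, `systemctl daemon-reload` makes systemd pick it up. Note that auto-restart alone would only have papered over this particular failure if the update error were transient; a persistent permissions error would just hit the start limit.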
As I’ve said in the past, I’m happy to continue hosting the lists, but you may get better uptime from an alternate service. In particular, I see that Janet Cobb is now trying to host eir own Mailman 3-based list. If that experiment goes well then perhaps the existing lists can be migrated over; it would be nice to have continuity of archives. Incidentally, I experimented with migrating to Mailman 3 all the way back in 2017, but people were asking for some customizations to be made [2] and I didn’t have the energy to keep working on it, so I just abandoned it. Maybe others will have more energy. :)
- omd
[1] https://github.com/haraka/haraka-tld/commit/c507a750c87dcc8bc771f864bbeafc8cb7d8b0f8
[2] https://www.mail-archive.com/agora-discussion@agoranomic.org/msg36383.html