Hi Chris! I think your todo list looks accurate.
On the question of cron jobs, here are the answers as we understand them upstream:

What happens if the user runs multiple cron jobs?

Answer 0: probably nothing. "certbot renew" is designed to be run as often as you like, and is normally a no-op.

Answer 1: with some small probability, the user might have two "certbot renew" commands that are executed at the same time. In that case, it would be fairly common for one or both of those to fail with an error that would produce cron email. The baseline probability of this collision is about one in 5,000 cert renewals if the hour in the user's cron job is uniform-random, and one in 200 cert renewals if they picked the same two hours (noon and midnight) that are baked into Debian's cron job.

Answer 2: with some much smaller probability, two overlapping "certbot renew" commands could experience a race condition in writing cert lineage files in /etc/letsencrypt/archive, or symlinks in /etc/letsencrypt/live. These would cause a configuration problem (certs and privkeys don't match) about 75-80% of the time. I just measured this race condition window on an AWS tiny instance, and estimate that a cert writing race might happen about once every 36,000 cert renewals if the cron job hours line up, or once every 864,000 renewals if users have cron jobs at uniform-random hours. It is however possible that there are other race conditions in some of our plugins (apache, nginx) that are more likely to occur.

We have a few mitigation options:

Mitigation 0: write a patch to add locking to Certbot 0.10.2 / 0.10.3. This would add a new dependency on python-filelock, and we'd have to make a choice about how much field testing we want for this patch before SRUing it.

Mitigation 1: change the cron job, which picks random times in the hours after noon and midnight, to a systemd timer that runs at two uniform-random hours, or to a cron job that uses two hours that are less likely to be chosen by sysadmins.
We can probably use LE server-side data to pick the two least common hours.

Mitigation 2: study the plugin code to ensure that problematic race conditions are really as rare as we think. We could probably tolerate a temporary risk of failure that's one in a million cert renewals on the subset of systems which have two cron processes and where the admin ignored the notice about it. Hard disks fail faster than that.

I think the upstream team favours mitigation 0 :)

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1640978

Title:
  [SRU] Backport letsencrypt 0.9.3

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/python-acme/+bug/1640978/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
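
For anyone wanting to picture what the locking in mitigation 0 might look like, here is a minimal sketch. It is not the actual Certbot patch: it uses the standard library's fcntl.flock (POSIX only) instead of python-filelock, and the try_renew helper and the lock path are hypothetical names chosen for illustration. The idea is that a second overlapping run notices the lock is held and skips instead of racing the first.

```python
import fcntl
import os


def try_renew(lock_path, renew):
    """Run renew() only if no other run currently holds the lock.

    Returns True if renew() ran, False if another run held the lock.
    lock_path and renew are illustrative; this is not Certbot's API.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Exclusive, non-blocking: fail at once if the lock is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        renew()
        return True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A cron wrapper could call try_renew with some agreed lock file and a function that invokes "certbot renew", treating a False result as "another run is in progress, nothing to do". Skipping rather than waiting seems reasonable here because "certbot renew" is normally a no-op anyway.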