By way of introduction, my perspective is primarily that of an ACME 
client developer, so you'll notice my bias toward keeping client 
implementations as simple as possible. However, I am also a web server 
developer (the Caddy Web Server), so I can appreciate the concerns 
of server developers too.



First, thanks to Roland and Jacob for submitting such a well-crafted proposal. 
It is easy to read and understand, and it is mindful of certain complexities and 
unknowns that will need further discussion.



The proposal suggests two problems that it attempts to solve:

1. Notifying subscribers of impending revocations

2. Scheduling regular certificate renewals



I do think both of these can be problems, but I am not sure if this proposal -- 
or any ACME extension, for that matter -- is the best solution to them.





## Impending revocations



In terms of trust, what is the difference between a certificate that you know 
is going to be revoked soon and a certificate that is already revoked? In a 
binary sense, if you know a certificate is going to be revoked, it's as good as 
revoked. Why should you continue to trust a certificate when the CA already 
knows it shouldn't continue to be trusted?



The proposal treats this endpoint as non-confidential, so we can assume the 
CA-suggested renewal windows are public information, just as OCSP responses 
are. Given that some vendors are already shipping their own revocation lists to 
their clients ahead of CRLs, it's quite likely that some relying parties may 
even use the proposed endpoint to get ahead of OCSP and CRLs and apply its 
information toward a trust decision.



Fundamentally, the proposed extension isn't too different from OCSP already: 
it's a (signed? unsigned?) response from the CA that tells you whether the 
certificate is still believed to be trustworthy.



Before going too deep into implementation details, I think the philosophical 
paradox this proposal introduces should be resolved.





## Scheduling certificate renewals



I have written a lot of code that renews certificates. The proposal mentions 
that there are two main ways to schedule certificate renewals: 1) run a 
timer/cron at static intervals, or 2) choose a renewal time based on the 
certificate's actual NotBefore and NotAfter dates. I would add at least a third 
way, which is what Caddy/CertMagic does: 3) scan all managed certificates at 
short, frequent intervals, and if a certificate's lifetime is N% spent, 
initiate a renewal right then. This is similar to (2) mentioned in the 
proposal, but with a subtle difference: it's much simpler in that it doesn't 
require setting a timer or scheduling each certificate individually, but you 
still get the benefits of (2) with none of the downsides of (1). Method (3) also 
does not require sleeping or making reservations, which are difficult to preempt.



The downside that the proposal seems concerned with is "load clustering 
for the issuing CA" -- I read that as "thundering herd"-type problems. This is 
obviously a problem with (1), but for methods (2) and (3):



1. Staggering the start of ACME clients should disperse this load naturally. In 
other words, not all ACME clients will start their poller/scanning routine at 
the same time if they are duration/interval-based. Clients should avoid using 
wall-clock times like "minute 30" or "hour 12" for the same reasons (1) should 
be avoided.



2. As certificate lifetimes get shorter, the herds will thunder no matter how 
staggered they are.



If the problem of load clustering is really the crux of this, then is it 
possible for ACME servers to reply with a Retry-After header on 
existing endpoints if they are getting overwhelmed?





## Optional extension



This extension is very helpful for attentive, responsible clients. But ACME 
clients that are... I'll say "minimally implemented"... may not take 
advantage of this endpoint, and unfortunately, those are the clients that will 
need it the most.





## OCSP stapling sorta works



For the record, a case study: Caddy/CertMagic wasn't impacted by the recent 
Let's Encrypt revocation event because it attempts certificate renewal 
immediately upon discovering a "Revoked" OCSP status. (It staples OCSP to all 
certificates by default, caches the responses to disk, and keeps them refreshed 
about halfway through their lifetime.) When it sees a Revoked status, it does 
not staple that response to the current certificate, which keeps its existing 
Valid response for ~3 more days while CertMagic attempts renewal. After renewal 
succeeds, the certificate is replaced, with a fresh new OCSP staple of course. 
No relying party ever sees a Revoked certificate, even with immediate 
revocation.
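
The decision logic described above can be sketched roughly as follows. To be clear, this is an illustration and not CertMagic's implementation: the `ocspStatus` values, the `decision` type, and `onOCSPRefresh` are hypothetical names, and the responses are treated as opaque bytes so the example stays self-contained. (What I call a "Valid" response corresponds to the `good` status in OCSP terms.)

```go
package main

import "fmt"

// ocspStatus is a simplified stand-in for the status in an OCSP response.
type ocspStatus int

const (
	statusGood ocspStatus = iota
	statusRevoked
	statusUnknown
)

// decision says which response to keep stapling and whether to renew now.
type decision struct {
	Staple []byte
	Renew  bool
}

// onOCSPRefresh decides what to do with a freshly fetched OCSP response.
// current is the response stapled today; fresh is the one just fetched.
func onOCSPRefresh(current, fresh []byte, status ocspStatus) decision {
	switch status {
	case statusGood:
		// Normal case: swap in the newer response.
		return decision{Staple: fresh}
	case statusRevoked:
		// Do not staple the revoked response; keep the cached one
		// and start renewing immediately.
		return decision{Staple: current, Renew: true}
	default:
		// Unknown status: keep what we have and try again later.
		return decision{Staple: current}
	}
}

func main() {
	current := []byte("cached good response")
	fresh := []byte("revoked response")
	d := onOCSPRefresh(current, fresh, statusRevoked)
	fmt.Printf("renew=%v staple=%q\n", d.Renew, d.Staple)
}
```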



The point is, I think existing infrastructure can work for this problem.





## Vision








In my opinion, the burden is on the clients to just be a little more 
fault-tolerant. They should staple OCSP responses. They should do so 
conservatively. They can call `renewCert()` when they see a Revoked response.



Ultimately, revocation is just the means to an end: short certificate lifetimes.