[Acme] Re: Paul Wouters' Discuss on draft-ietf-acme-ari-07: (with DISCUSS and COMMENT)

Aaron Gable Wed, 26 Feb 2025 14:17:10 -0800

Hi Paul,

Apologies for the delay; I had a very busy beginning of the year. I'm now
getting to these in preparation for IETF 122. I have incorporated these
comments into the working copy
<https://github.com/aarongable/draft-acme-ari/pull/94> (from which I will
publish a new version soon), and have responded inline below.

Thanks for your comments!

On Mon, Jan 6, 2025 at 12:36 PM Paul Wouters via Datatracker <
nore...@ietf.org> wrote:

> Paul Wouters has entered the following ballot position for
> draft-ietf-acme-ari-07: Discuss
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> Thanks for this document. It can be a useful extension but I do have some
> issues I would like to discuss / clarify
>
>         Query the renewalInfo resource to get a suggested renewal window.
>         Select a uniform random time within the suggested window.
>         If the selected time is in the past, attempt renewal immediately.
>
> This seems to skew to "now" which would only cause the ACME server more
> load than without this extension (one GET and one actual renewal). Why
> not let the client select a uniform random time between "now" and "end"
> if "now" > "start" ?
>

In general, there are three kinds of ARI response:
- Entirely in the past, meaning the client should attempt renewal
immediately.
- Entirely in the future, meaning the client should schedule renewal within
that window.
- Overlapping the current instant.

If windows of the third kind were treated as "pick any time between now and
the end of the window", then actual renewal times within such windows would
skew strongly towards the end of the window. That's not a desirable
property. Of course, you can't mitigate that fully -- if the first time a
client ever checks ARI is already past the beginning of its suggested
window, there's nothing you could have done to get them to renew during
that missed period. But having the client renew immediately prevents the
distribution from skewing *even further* towards the tail end of the window.

There's a second advantage: simplicity. The algorithm as described has only
one real branch point: whether the randomly-selected timestamp is in the
past or future. All windows can be treated the same by the client, with no
need to special-case the random selection logic based on where start and end
timestamps fall relative to local time.
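
To make the selection logic above concrete, here is a minimal Python sketch.
The function name `pick_renewal_time` and the injectable `rng` parameter are
assumptions made for illustration; they do not come from the draft.

```python
import random
from datetime import datetime, timedelta, timezone


def pick_renewal_time(start, end, now, rng=random):
    """Return (when, renew_now) for a suggested window [start, end].

    Hypothetical helper: pick a uniform random instant inside the window
    first, and only afterwards compare it with the local clock.
    """
    span = (end - start).total_seconds()
    chosen = start + timedelta(seconds=rng.uniform(0, span))
    # The single branch point: is the chosen instant in the past?
    if chosen <= now:
        return now, True      # attempt renewal immediately
    return chosen, False      # schedule renewal for the chosen instant
```

A window entirely in the past always yields "renew now", a window entirely in
the future always yields a scheduled time, and an overlapping window yields
one or the other depending on the random draw, all through the same code path.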

>         it indicates both the earliest time and a target time.
>
> It is really not the "earliest time" because an ACME server isn't going to
> refuse it? I would rewrite this to just say "it indicates the desired
> target time".
>

An ACME server absolutely may refuse it! Neither this draft nor the
original RFC 8555 places restrictions on the server's ability to rate-limit
requests. For example, Let's Encrypt rate-limits requests coming from each
distinct origin IP, regardless of the target endpoint or content of those
requests. A client retrieving ARI info in a tight loop could easily hit
those limits.

That said, I'm happy to update the language used here. I've brought it more
in line with RFC 7231, saying that the header indicates "not just the
minimum but the desired amount of time that the client is asked to wait
before issuing another request".

> This does bring up a point of concern. Clients who do not implement this
> have an advantage on an overloaded server compared to clients who do
> implement this. For example, let's say some new industry certification
> license says "certificates MUST always be valid for at least two
> more weeks".  Wouldn't it make more sense for the server to check the
> "urgency" of the client request and when (too) busy, start rejecting to
> renew those with plenty of lifetime left?
>

I'm not sure I follow. You're proposing a hypothetical in which some
certificates must be replaced at least two weeks before they would expire,
and therefore some renewal requests are more urgent than others? To be
honest, I'm not particularly concerned about this situation -- most ACME
servers are operating in homogeneous PKIs, more subject to the rules of
that PKI's relying parties than of that PKI's subscribers. That said, if an
ACME server is aware of this situation, the fix is simple: suggest an
earlier default renewal window for such certificates. Then all renewal
requests will be pushed towards being equally urgent.

> I am also not sure about the argument of revocation for timing. Either
> the owner of the cert to be revoked knows this and is already in the
> process of replacing the cert/private key, and it wants to get a new
> cert issued "now", or it is completely unaware, and most likely whatever
> caused the leak of the current cert/private key, would also leak this
> renewed one. I don't think an ACME server can help with either cases by
> issuing shorter calls to renew. These would also be certificate specific
> and I understood this unauthenticated extension to be generic and based
> on load, and not on specific individual certificate issues.
>

There are many causes for revocation beyond key compromise.

Perhaps the most impactful, and one of the original impetuses for the
creation of this spec, is CA misissuance. If a CA discovers that some large
population of its certificates have been misissued and need to be revoked,
it needs to communicate that information to its subscribers quickly so that
they can get replacement certificates issued. ARI is that communication
mechanism. If those subscribers are polling ARI, their clients will
automatically replace their certificates without the domain operator even
needing to know that something untoward happened.

Another example is preventing malicious denial of service. Within ACME, a
certificate can be revoked by a subscriber simply by proving that they
control the identifiers within that certificate, whether or not they were
the subscriber which initially requested its issuance. If a malicious actor
gains temporary control of a domain (e.g. via a BGP hijack) they can revoke
your certificate for that domain, causing your legitimate customers to see
browser interstitial warnings. If your client is checking ARI, it will
quickly become aware of this situation and replace the certificate without
your intervention.

>         Clients SHOULD set reasonable limits on their checking interval.
>         For example, values under one minute could be treated as if they
>         were one minute, and values over one day could be treated as if
>         they were one day.
>
> This really does violate the "compliant clients MUST" clause :)
>

I don't follow -- the "conforming clients MUST" clause you comment on below
is in regard to when the client must attempt renewal; this statement is in
regard to when the client should re-check ARI.
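
As a rough sketch of the re-check limits quoted above, a client could clamp
the server-suggested polling interval as follows. The function and constant
names are hypothetical, and the one-minute and one-day bounds are only the
examples given in the draft text, not required values.

```python
MIN_INTERVAL_S = 60             # one minute, the draft's example lower bound
MAX_INTERVAL_S = 24 * 60 * 60   # one day, the draft's example upper bound


def clamp_check_interval(suggested_s):
    """Clamp a server-suggested ARI re-check interval, given in seconds.

    This only governs when the client polls RenewalInfo again; it has no
    effect on when the client attempts renewal.
    """
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, suggested_s))
```

A value already inside the bounds passes through unchanged, so a well-behaved
server's suggestion is honored as given.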

> Security Considerations:
>
>         This document specifies that renewalInfo resources MUST be exposed
>         and accessed via unauthenticated GET requests, a departure from
>         RFC8555's requirement
>
> Where does it specify this, other than right here? This specification
> should be outside the Security Considerations section. What I can find is:
>
>         To request the suggested renewal information for a certificate,
>         the client sends a GET request to a path under the server's
>         renewalInfo URL.
>
> Maybe a sentence can be added there that this GET request is
> unauthenticated, so that an implementer does not accidentally send
> credentials of any kind? Maybe even say that a server MUST reject any
> attempted authorized connections for renewalInfo to ensure such badly
> implemented clients cannot prosper ?
>

Good point, I've added the adjective "unauthenticated" in Section 4.1 where
the GET request is first introduced. I've also rephrased to remove the word
"MUST" from the Security Considerations section, as you're absolutely right
that that section should not include normative requirements. That said,
note that within ACME the alternative would not be "GET with some
authentication headers set"; the alternative would be "RFC 8555 POST-as-GET
with JWS-based authentication". This protocol is explicitly eschewing
POST-as-GET, not eschewing (e.g.) bearer tokens.

> Perhaps also a clarifying sentence can be added along the lines of:
>
>         If an on-path attacker would force ACME clients to postpone renewal
>         indefinitely, a properly implemented client would ignore these
>         when the lifetime of its certificate becomes critically low (eg 7
>         days ?).
>
> I also feel this belongs more in Section 4.3.2 with some concrete advice to
> implementers.
>

I feel that accounting for on-path attackers is outside the scope of this
document: per RFC 8555 Section 6.1, all requests to an ACME server must be
made over HTTPS / TLS. The RFC 8555 Security Considerations section
already discusses the security of the client<->server channel, and this
document does not meaningfully change the scope of threats to that channel.

> As for the last paragraph in the Security Considerations, it seems to
> specify specific server behaviour that belongs in the formal specification
> instead of as security example. If we look at the protocol requirement of
> the server to tell the client "renew now", why not define this by either
> using a timestamp of unix time 0 (eg 1970) or by introducing a third
> keyword along the "start" and "end" in the suggestedWindow property, eg
> "fetch-now": "recommended" ? Using some kind of fake time seems like a
> poor hack for a protocol, as the text in the security considerations
> already admits to (but then tries to band-aid the client)
>
> Again, I feel this belongs in the base document specification and not in
> the Security Consideration section.
>

I refer to my paragraph above about the simplicity of the algorithm: the
client doesn't have to care about the timestamps contained in the message,
it just has to pick a time between them and *then* begin caring about what
time was picked. I feel that including special tombstone values (such as
the unix epoch) is ripe for introducing parsing errors or other edge-case
bugs within clients. Similarly, having two different sets of fields
(start/end vs fetch-now) which only make sense when populated separately is
asking for confusion: what should a client do if a server accidentally
populates all three fields? I believe the simplicity of this protocol is
one of its strengths, and that introducing more fields and more logic
decision points will make both server and client implementation harder, not
easier.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
>         Conforming clients MUST attempt renewal
>
> I find this a bit weasel wording. How about:
>
>         Clients SHOULD attempt renewal
>
> Clearly, a client can have some overriding local policy concern that
> trumps the ACME servers
>

Done.

>         The keyIdentifier field of the certificate's AKI extension has the
>         hexadecimal bytes
>         69:88:5B:6B:87:46:40:41:E1:B3:7B:84:7B:A0:AE:2C:DE:01:C8:D4 as its
>         ASN.1 Octet String value. The base64url encoding of those bytes is
>         aYhba4dGQEHhs3uEe6CuLN4ByNQ=
>
> There seems to be an endian swap in here? Perhaps this text should be
> clarified? The same for the certificate's Serial Number field in the next
> paragraph.
>

Could you be more specific? Do you mean that you believe there's an
endianness swap between the hex bytes and the base64 string? Or between the
values in the Appendix A certificate and the hex bytes here? Multiple
active implementations (including simply `openssl x509 -noout -text -in
appendix_a.pem | grep -A1 "Authority Key Identifier" | tail -n 1 | xxd -r
-p | base64`) agree on the value aYhba4dGQEHhs3uEe6CuLN4ByNQ= being the
correct base64-encoding for the Appendix A keyIdentifier.
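
For anyone who wants to check this without openssl, the following Python
sketch encodes the hex bytes quoted above directly. The variable names are
illustrative only. It also encodes the byte-reversed sequence so that the two
can be compared, which is one way to test for an endianness swap.

```python
import base64

# Hex bytes of the keyIdentifier exactly as quoted from the draft text.
aki_hex = "69:88:5B:6B:87:46:40:41:E1:B3:7B:84:7B:A0:AE:2C:DE:01:C8:D4"

raw = bytes.fromhex(aki_hex.replace(":", ""))
encoded = base64.urlsafe_b64encode(raw).decode("ascii")

# Encoding the reversed byte order gives a different string, so a byte
# swap between the hex and base64 forms would be visible here.
reversed_encoded = base64.urlsafe_b64encode(raw[::-1]).decode("ascii")

print(encoded)
print(encoded != reversed_encoded)
```

The first line printed is aYhba4dGQEHhs3uEe6CuLN4ByNQ=, matching the value in
the draft, with the bytes taken in the order they are written.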

> Maybe instead of:
>
>         GET https://example.com/....
>
> Use:
>
>         GET https://acme-server.example.com/.....
>
> Similar for the explanationURL value.
>

Good point, done.

>         Clients MUST stop checking RenewalInfo after a certificate is
>         expired.
>
> I would say "MUST skip checking RenewalInfo after a certificate is
> expired and immediately request a renewal."
>

I didn't want to conflate these two concepts -- maybe the client doesn't
want to renew the certificate, and has let it expire on purpose. I've taken
this comment and the one below as impetus to rephrase this paragraph, and I
think you'll like the result better.

>         Clients MUST stop checking RenewalInfo after they consider a
>         certificate to be replaced (for instance, after a new certificate
>         for the same identifiers has been received and configured).
>
> I would also avoid the "MUST stop" construct here. Perhaps:
>
>         RenewalInfo MUST NOT be attempted for any certificate that has been
>         replaced (for instance, after a new certificate for the same
>         identifiers has been received and configured)
>

Good point. I haven't quite taken this suggestion verbatim, but have
instead rephrased both of these sentences to: "Clients MUST NOT check a
certificate's RenewalInfo after the certificate has expired. Clients MUST
NOT check a certificate's RenewalInfo after they consider the certificate
to be replaced (for instance, after a new certificate for the same
identifiers has been received and configured)."
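
A client could express the two rephrased requirements as a single guard run
before each poll. This is only a sketch: the function name and its parameters
are hypothetical, and how a client decides that a certificate counts as
"replaced" is left to the implementation.

```python
from datetime import datetime, timezone


def may_check_renewal_info(not_after, replaced, now):
    """Hypothetical guard a client could run before polling RenewalInfo.

    not_after: the certificate's notAfter time (timezone-aware datetime).
    replaced:  True once the client considers the certificate replaced.
    now:       the client's current time (timezone-aware datetime).
    """
    if replaced:
        return False   # considered replaced: MUST NOT check
    if now > not_after:
        return False   # already expired: MUST NOT check
    return True


expiry = datetime(2025, 6, 1, tzinfo=timezone.utc)
before = datetime(2025, 5, 1, tzinfo=timezone.utc)
after = datetime(2025, 7, 1, tzinfo=timezone.utc)
```

The guard deliberately says nothing about renewal itself, so a client that
has let a certificate expire on purpose is not pushed into renewing it.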

Thanks again,
Aaron
