Hi all,

As we’ve been using TLS 1.3 in more scenarios, we’ve encountered some interesting interactions with TCP. We thought we’d document these and send a note here. In general, we’ve found that TLS implementations need to be wary of post-handshake messages and “unexpected” transport writes. This unfortunately also includes some server handshake alerts.
TLS APIs

First, some background on APIs for TLS libraries. TLS is often deployed “transparently” underneath a TCP-based protocol. HTTPS sandwiches TLS between HTTP and TCP, etc. By and large, reads and writes over TLS are one-to-one with reads and writes over TCP, and TLS APIs and callers can subtly rely on this.

Some libraries expose an interface like the POSIX sockets API, including non-blocking behavior. If the transport is blocked on I/O, this is surfaced as an error for the caller to retry later. Importantly, the library cannot drive transport I/O on its own; the caller must drive the operation to completion. Other APIs transform bytes and leave I/O to the application, so any TCP writes triggered by TLS reads, and vice versa, are even more directly part of the API surface. In contrast, sometimes the TLS library can drive I/O itself. For example, a Go TLS implementation can do background work in a goroutine.

Also note that libraries may predate TLS 1.3 but now enable TLS 1.3 by default. Those libraries must ensure callers written against TLS 1.2 work in TLS 1.3.

Post-handshake messages and flow control

TLS 1.2 and TLS 1.3 both have post-handshake messages, but TLS 1.2 only uses them for renegotiation, which is rare and often disabled. TLS 1.3 has post-handshake NewSessionTickets. A server will typically send tickets immediately after the handshake.

We initially treated NewSessionTicket as an extra flight in the server handshake. After receiving the client Finished, the server would write NewSessionTicket and then signal handshake completion. This kept tickets working in unmodified server callers. However, this can lead to a deadlock in some cases. A typical HTTP/1.1 client will first write its request and, only when this is complete, read the response. If the write exceeds the transport buffer, it will not complete, and thus the client will not read, until after the server starts reading. An HTTP/1.1 server caller knows to read first, but only after the handshake completes. If NewSessionTicket also exceeds the transport buffer, this strategy means the server won’t complete the handshake until the client starts reading. Thus the connection deadlocks. (A schematic of this ordering appears at the end of this section.)

While these messages usually fit in transport buffers, we don’t like systems with invisible cliffs, particularly deadlocks. Some TLS implementations embed client certificates in tickets, which can make them large. Additionally, mock transports in tests sometimes use artificially small buffers.

Recommendation: We switched to deferring NewSessionTicket to the first application write by default. Server callers that wish to flush tickets earlier may do so, but they should not block normal I/O on it. TLS implementations which can drive transport I/O themselves may instead be able to write tickets in the background after the handshake. Note, however, the discussion on “Client-write-only protocols” below.

Separately, in case the server does not do this, TLS 1.3 client implementations should eagerly read from the socket after the handshake, even if the caller isn’t expecting application data. However, this is only possible at a layer which can drive I/O itself. We implement this in Chromium’s abstractions over BoringSSL, but cannot do so in BoringSSL itself.

Likewise, while they are unlikely to exceed the transport buffer, TLS libraries should defer KeyUpdate acknowledgements to the next application write, which is possible thanks to the KeyUpdate tweaks <https://mailarchive.ietf.org/arch/msg/tls/cfw4paCGxI7Fj8QNmj6k1I66VII/> made early on.
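Returning to the NewSessionTicket deadlock, here is a minimal Go sketch of the ordering. It is schematic only: plain byte blobs stand in for the NewSessionTicket flight and the HTTP/1.1 request, and net.Pipe stands in for a TCP connection whose buffers are already full. The sizes and names are illustrative, not taken from any real implementation.

package main

import (
    "fmt"
    "io"
    "net"
    "time"
)

func main() {
    // net.Pipe is fully synchronous: every Write blocks until the other
    // side Reads, which models a transport whose buffers are full.
    clientConn, serverConn := net.Pipe()
    done := make(chan string, 2)

    // Server caller: "finish the handshake" by writing a NewSessionTicket
    // stand-in, then read the request, as an HTTP/1.1 server would.
    go func() {
        ticket := make([]byte, 4096) // stand-in for NewSessionTicket
        serverConn.Write(ticket)     // blocks: the client is not reading yet
        io.ReadFull(serverConn, make([]byte, 16384))
        done <- "server"
    }()

    // Client caller: write the whole request, then read the response, as an
    // HTTP/1.1 client would.
    go func() {
        request := make([]byte, 16384) // stand-in for a large request
        clientConn.Write(request)      // blocks: the server is not reading yet
        io.ReadFull(clientConn, make([]byte, 4096))
        done <- "client"
    }()

    select {
    case side := <-done:
        fmt.Println(side, "made progress")
    case <-time.After(2 * time.Second):
        fmt.Println("deadlock: both sides are blocked in write")
    }
}

Deferring the ticket to the server’s first application write, or having the client read while its own write is blocked, breaks the cycle.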
0-RTT and flow control

There is a similar effect with 0-RTT. The client writes the ClientHello and early data. The server responds with ServerHello..Finished. Depending on I/O strategy, implementations may hit a similar deadlock if the client won’t read the ServerHello flight until it has written its early data, but the server won’t read early data until it has written the ServerHello flight.

Some factors make this deadlock less of a concern than NewSessionTicket:

- 0-RTT is new as of TLS 1.3 and should not be enabled by default. 0-RTT clients are already expected to handle extra cases such as 0-RTT rejects and replayability. That means libraries can impose extra requirements or introduce APIs for these I/O patterns.

- The ServerHello flight does not contain a certificate. It is more likely to fit in transport buffers and has more-or-less fixed size.

- For HTTP, RFC 8470 only sends GETs over early data by default, which are smaller than POSTs and more likely to fit in transport buffers.

But this is another invisible cliff in the system.

Recommendation: 0-RTT clients should eagerly read from the connection, even if the application isn’t expecting data yet. This avoids the deadlock and opportunistically confirms the handshake sooner, so more data is sent over 1-RTT. Note this must be done at a layer which can drive transport I/O itself.

Client certificate errors and TCP resets

If a server rejects a client certificate, it should send an alert, so the client can react accordingly. The client may display an error to the user, clear a cache of certificate decisions, or prompt the user to select a different certificate.

TLS 1.2 has a two round-trip handshake:

      ClientHello                  -------->
                                                      ServerHello
                                                     Certificate*
                                               ServerKeyExchange*
                                              CertificateRequest*
                                   <--------      ServerHelloDone
      Certificate*
      ClientKeyExchange
      CertificateVerify*
      [ChangeCipherSpec]
      Finished                     -------->
                                               [ChangeCipherSpec]
                                   <--------             Finished

The server has a handshake flight after the client certificate. If it rejects the certificate, the TLS implementation will write an alert and then report failure, at which point the caller will close the socket and discard the connection. From the client’s perspective, this alert comes instead of ChangeCipherSpec/Finished, during the handshake. It processes the alert and cleanly fails the handshake, before any application data flows.

TLS 1.3 reduces the handshake to one round-trip:

      ClientHello                  -------->
                                                      ServerHello
                                            {EncryptedExtensions}
                                            {CertificateRequest*}
                                                   {Certificate*}
                                             {CertificateVerify*}
                                                       {Finished}
                                   <--------  [Application Data*]
      {Certificate*}
      {CertificateVerify*}
      {Finished}                   -------->

There is no server flight after the client certificate. If the server rejects it, it will again write an alert and report failure. However, now the client receives it in place of the first application data record. This is a behavior change for callers, who must now handle client certificate errors out of read as well as connect. Moreover, in a client-speaks-first protocol, the error now comes after the client has already sent its request. This is not only a behavior change but also makes the alert unreliable over TCP. TCP sees:

1. Client: write(ClientHello);
2. Server: read(ClientHello); write(ServerHello..Finished);
3. Client: read(ServerHello..Finished); write(Certificate..Finished);
4. Server: read(Certificate..Finished); write(bad_certificate); close();
5. Client: write(“GET / ...”); read(???);

Note (4) and (5) happen in parallel. Ideally ??? would be a bad_certificate alert, but it is sometimes a TCP reset.
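The race between (4) and (5) can be reproduced without any TLS at all. In the Go sketch below, a small blob stands in for the bad_certificate alert and a large blob for the client’s request; it is illustrative only, and whether the client sees the blob, a reset, or an error on its own write depends on timing and the TCP stack.

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    defer ln.Close()

    // "Server": write a small alert stand-in, then close without ever
    // reading the client's request, as in step (4).
    go func() {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        conn.Write([]byte("bad_certificate")) // stand-in for the alert record
        conn.Close()                          // unread client data often turns this into a reset
    }()

    conn, err := net.Dial("tcp", ln.Addr().String())
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // "Client": write a request the server never consumes, then read, as in
    // step (5).
    conn.Write(make([]byte, 64*1024))

    conn.SetReadDeadline(time.Now().Add(2 * time.Second))
    n, err := conn.Read(make([]byte, 1024))
    // Depending on timing, this prints the alert bytes or a connection reset.
    fmt.Printf("read %d bytes, err = %v\n", n, err)
}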
I’m not a TCP expert, but I believe this is because the client writes data (“GET / ...”) that the server never consumes. If it arrives at the server’s TCP stack before close(), the socket is closed with unread data. If it arrives after close(), the socket receives data after close(). TCP appears to consider either condition an application protocol error and sends a reset shortly after the alert. If the client consumes the alert before its TCP stack sees the reset, the alert gets through. Otherwise, TCP will not reliably deliver the alert. Receive buffers may be cleared, data isn’t retransmitted, etc. This is particularly pronounced on loopback.

Note, if TCP did not reset, we’d deadlock in other scenarios. Suppose the client request did not fit in transport buffers. The client would not read until that write is flushed, but the server would never ACK it. The client would then never progress to the alert and get stuck. By resetting, TCP interrupts large client writes with some error, albeit the wrong one.

Recommendation: We do not have a good answer here. The deadlock scenario means we cannot hope to reliably deliver alerts unless the client eagerly reads as above. But servers cannot rely on clients doing this, and even that is not sufficient because of the TCP reset. It seems the only fix is for the server to keep the connection alive for some time after the failure, perhaps draining some bytes from the connection, with some limit before giving up and resetting if the client seems to be writing a lot of data without ever reading. This would need to be implemented fairly high up the stack. We have not implemented this.

TLS False Start (RFC 7918) exposes many of the same issues, but, in TLS 1.3, this flow is not optional. Clients cannot even choose to pay a round-trip to restore the TLS 1.2 flow because, in the successful case, there is nothing to wait for. One could imagine an extension that adds an optional server flight, but a round-trip to fix an error condition is an unsatisfying trade-off. Clients could also consider TCP resets to be potential client certificate errors. This is also unsatisfying, as TCP resets are unauthenticated and may have other causes.

Client-write-only protocols

Edge cases may have other unexpected writes. Consider a protocol where the server never writes, and thus the client never reads. TLS 1.3 introduces server NewSessionTicket messages, so we again trigger the deadlock above. If the client further shuts down the read half of the connection, the NewSessionTicket message will also trigger the TCP reset behavior above.

Recommendation: Don’t do this. If you must, either the client must read anyway to pick up the ticket, or the server must not send tickets. A TLS server library should probably default to deferring tickets to the first application write, which achieves the latter. Note this means such protocols don’t get resumption.

Half-RTT data

We haven’t done much with half-RTT data outside of 0-RTT connections, but half-RTT may risk similar issues. Half-RTT data in a client certificate connection is sent before the server learns the client’s identity. That means the connection is writable before all its properties are established, which is an awkward API. One might think to avoid this state by configuring half-RTT data ahead of time for the library to write during the handshake, immediately after ServerHello..Finished. A half-RTT HTTP/2 SETTINGS frame doesn’t need a streaming API, and this avoids exposing the incomplete state to the caller.
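Such an API might look something like the sketch below. This is purely hypothetical; no existing library is assumed to offer such an option, and the names are made up.

package halfrtt

// ServerConfig sketches a hypothetical server configuration in which half-RTT
// data is fixed up front, so the TLS library can write it immediately after
// its ServerHello..Finished flight without ever exposing a half-established
// connection to the caller.
type ServerConfig struct {
    // HalfRTTData, if non-empty, is encrypted and written to the client
    // during the handshake, before the client's Certificate..Finished has
    // been read.
    HalfRTTData []byte
}

// WithHTTP2Settings returns a config whose half-RTT data is a pre-encoded
// HTTP/2 SETTINGS frame. Such a frame needs no streaming API and does not
// depend on the client's identity.
func WithHTTP2Settings(settingsFrame []byte) *ServerConfig {
    return &ServerConfig{HalfRTTData: settingsFrame}
}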
However, this also risks flow control issues, depending on sizes and I/O patterns. If both the half-RTT data and the client Certificate..Finished are too large, this design has another flow control deadlock: the server will not read the client Certificate..Finished until it has written the half-RTT data, and the client will not read the half-RTT data until it has written its flight.

Recommendation: Sadly, it seems half-RTT APIs need to be more complicated than this.

Hopefully this is helpful to folks.

David