(Whoops, accidentally hit reply instead of reply all.)

On Thu, May 22, 2025, 22:19 David Benjamin <david...@chromium.org> wrote:
> On Thu, May 22, 2025, 20:48 Martin Thomson <m...@lowentropy.net> wrote:
>
>> On Fri, May 23, 2025, at 06:17, Jeremy Harris wrote:
>> > Bad programming.
>>
>> I might not have said this, but I was going to say something
>> approximately like what you said afterwards.
>>
>> I don't think that you can fault the application, but you can
>> interpret the close differently at the TLS layer. That is, as you
>> suggest, the TLS layer can take the close under advisement and do what
>> is necessary to follow that instruction. Which might involve sticking
>> around long enough to complete the handshake, even if the application
>> doesn't need it.
>
> Yes, I said as much in the email:
>
> > So I think the cleanest way out of this is to say that even if the
> > server has nothing more to read or write on the connection at the
> > application level, the server MUST still drive the handshake to
> > completion in full or the response may not go through.
>
> However, this is not obvious because we failed to write it down. And
> thus the email. I don't write these to say the application is right. I
> write them to say that I observed it, spent some time figuring out what
> happened, and thought it was worth making a note of it. Even if wrong,
> this is a natural way for folks to interpret the protocol stacks we
> define, and when they do, you get this subtle failure mode, so it is
> worth spreading awareness.
>
> Sure, once the situation is described and the failure diagnosed, one
> can probably form opinions about what should have happened. But
> dismissing this as an application bug is kind of missing the point.
>
> Indeed, if you explore what *should* have happened, you'll find there
> are architectural reasons why these sorts of things tend to happen.
> More below.
>
>> Interestingly, this is exactly what the TCP stack does. David
>> mentioned that a retransmission of already-received data doesn't cause
>> the server to send a RST. Why is that? If the server dropped all
>> state, a PSH really would induce a RST in response. It's because the
>> TCP stack really does maintain some state, so that it can distinguish
>> a PSH that can be ignored (one with a sequence number between the
>> initial and final values for the closed connection) from one that does
>> need a RST (one outside of that narrow range).
>
> Yup. And there's a crucial architectural difference in how these
> protocols are typically deployed. TCP typically lives in the kernel.
> This extra bit of state outlives even your process. The application
> gets to pretend the connection stops existing and forget about it
> afterwards. This pretense is false, as we all know, but it is how
> applications are typically architected.
>
> Any kind of long-lived state above TCP typically does not get to play
> this game. And so long-lived shutdown processes, like this one which we
> did not write down, and close_notify which we did write down, tend not
> to work in practice because applications often are not architected to
> accommodate them. Indeed, close_notify has really not worked in
> practice. HTTPS clients don't enforce it because HTTPS servers don't
> reliably send it.
>
> A story: a long time ago, I accidentally caused Chrome to enforce
> close_notify in the course of other changes. As soon as it reached
> beta, something broke and I had to switch it back. What broke? IANA's
> website.
>
> Are those applications correct? No, clearly not! close_notify is in
> the spec and even provides some important truncation protections in
> EOF-delimited protocols, including some modes of HTTP/1.x. But it
> hasn't worked, and if even IANA couldn't deploy it, I'm not going to
> throw stones. It is useful to think about why, even though, yes, there
> is no fundamental difference here at the protocol level. And indeed,
> given close_notify's failure, we probably should not build
> EOF-delimited protocols. (Happily, protocols tend to move away from
> making EOF significant anyway as they grow connection reuse and
> whatnot. Even in HTTP/1.x, you can avoid it.)
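>
> (To make "enforce" concrete: with an SSLEngine-style API, the client's
> check on transport EOF is roughly the following. A sketch only; the
> surrounding I/O loop and error handling are elided.)
>
>     import javax.net.ssl.SSLEngine;
>     import javax.net.ssl.SSLException;
>
>     final class CloseNotifyCheck {
>         // Returns true if the EOF was clean (the peer sent
>         // close_notify first), false if the stream may have been
>         // truncated, which is the attack close_notify exists to
>         // prevent in EOF-delimited protocols.
>         static boolean eofWasClean(SSLEngine engine) {
>             try {
>                 // closeInbound() throws if the engine has not yet
>                 // seen the peer's close_notify.
>                 engine.closeInbound();
>                 return true;
>             } catch (SSLException possibleTruncation) {
>                 return false;
>             }
>         }
>     }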
>
>> Of course, this presumes a particular architecture, in which the TLS
>> stack is a layer on top of TCP. It's not a model I favor any more. I
>> prefer the SSLEngine one, where TLS is kept to the side, with the
>> application driving the TCP bits. In that case, this gotcha becomes an
>> application problem. Maybe the simple application in the example isn't
>> going to do that, so it's fine to think of this as being strictly
>> layered in software, but it's worth acknowledging our assumptions.
>
> Yup. I also prefer the SSLEngine model. I've found that, even when your
> TLS stack doesn't look like SSLEngine, application authors tend to make
> assumptions about the mapping between TLS and transport I/O anyway.
> After all, the natural assumption (that TLS is just a filter that
> application data flows through) is *basically* true. It fails mostly in
> weird edge cases that may not be apparent to folks who aren't experts
> in the protocol.
>
> These assumptions often show up in code because the POSIX-style
> non-blocking I/O model assumes that something on the outside is driving
> the event loop, and TLS stacks tend to be far enough down the tower
> that they can't post to the event loop themselves. SSLEngine-style APIs
> make this assumption even more explicit.
>
> And thus this burden goes to the application, and the application
> developer probably is not thinking of these details. They certainly
> won't if we didn't even think to write it down.
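>
> To make that burden concrete, here is roughly the shape the close path
> takes when the application drives an SSLEngine. (A sketch under
> simplifying assumptions: Transport is a hypothetical blocking wrapper
> around the socket the application owns, and buffer sizing,
> BUFFER_UNDERFLOW, and error handling are all elided.)
>
>     import java.nio.ByteBuffer;
>     import javax.net.ssl.SSLEngine;
>     import javax.net.ssl.SSLEngineResult.HandshakeStatus;
>
>     final class AppDrivenClose {
>         // Hypothetical stand-in for the socket the application owns
>         // in this model.
>         interface Transport {
>             void write(ByteBuffer buf) throws Exception;
>             void read(ByteBuffer buf) throws Exception;
>         }
>
>         // Even with nothing left to read or write at the application
>         // level, first drive any unfinished handshake to completion,
>         // then send close_notify. Only then is it safe to tear down
>         // the transport.
>         static void finishThenClose(SSLEngine engine, Transport t)
>                 throws Exception {
>             while (engine.getHandshakeStatus()
>                     != HandshakeStatus.NOT_HANDSHAKING) {
>                 step(engine, t);
>             }
>             engine.closeOutbound();
>             while (!engine.isOutboundDone()) {
>                 step(engine, t); // flushes the close_notify
>             }
>         }
>
>         // One turn of the crank: do whatever the engine asks for next.
>         static void step(SSLEngine engine, Transport t) throws Exception {
>             ByteBuffer empty = ByteBuffer.allocate(0);
>             ByteBuffer app = ByteBuffer.allocate(
>                     engine.getSession().getApplicationBufferSize());
>             ByteBuffer net = ByteBuffer.allocate(
>                     engine.getSession().getPacketBufferSize());
>             switch (engine.getHandshakeStatus()) {
>                 case NEED_UNWRAP:
>                     // Owed a flight from the peer (e.g. the client's
>                     // Finished): read it and feed it to the engine.
>                     t.read(net);
>                     net.flip();
>                     engine.unwrap(net, app);
>                     break;
>                 case NEED_TASK:
>                     // Delegated crypto work; run inline in this sketch.
>                     engine.getDelegatedTask().run();
>                     break;
>                 default:
>                     // NEED_WRAP or NOT_HANDSHAKING: flush what we owe.
>                     engine.wrap(empty, net);
>                     net.flip();
>                     t.write(net);
>             }
>         }
>     }
>
> The first loop is the step the server in the original story skipped.
> Nothing about the application's remaining I/O suggests it has to be
> there, which is exactly why it needs writing down.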
>
> Thus my hope is that writing about these will help in some small way
> towards these kinds of things being in folks' general awareness. We
> ultimately care about running code, and it is useful to understand the
> architecture of running code in practice.
>
> David