(Whoops, accidentally hit reply instead of reply all.)

On Thu, May 22, 2025, 22:19 David Benjamin <david...@chromium.org> wrote:

> On Thu, May 22, 2025, 20:48 Martin Thomson <m...@lowentropy.net> wrote:
>
>> On Fri, May 23, 2025, at 06:17, Jeremy Harris wrote:
>> > Bad programming.
>>
>> I might not have said this, but I was going to say something
>> approximately like what you said afterwards.
>>
>> I don't think that you can fault the application, but you can interpret
>> the close differently at the TLS layer.  That is, as you suggest, that the
>> TLS layer can take the close under advisement and do what is necessary to
>> follow that instruction.  Which might involve sticking around long enough
>> to complete the handshake, even if the application doesn't need it.
>>
>
> Yes, I said as much in the email:
>
> > So I think the cleanest way out of this is to say that even if the
> server has nothing more to read or write on the connection at the
> application level, the server MUST still drive the handshake to completion
> in full or the response may not go through.
>
> However, this is not obvious because we failed to write it down. And thus
> the email. I don't write these to say the application is right. I write
> them to say that I observed it, spent some time figuring out what happened,
> and I thought it was worth making a note of it. Even if wrong, this is a
> natural way for folks to interpret the protocol stacks we define and, when
> they do this, you get this subtle failure mode, so it is worth generally
> spreading awareness.
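
To make the rule concrete, here is a rough sketch of a non-blocking handshake
driver using Python's ssl.SSLObject (the function name and the send/recv
callbacks are illustrative, not from any particular stack). The point is that
the loop keeps running until do_handshake() succeeds, even if the application
itself has nothing left to read or write:

```python
import ssl

def drive_handshake(tls, incoming, outgoing, send, recv):
    # tls is an ssl.SSLObject wrapped over the two MemoryBIOs;
    # send/recv are hypothetical callables that move raw bytes
    # to/from the peer.  Even if the application is done with the
    # connection, this loop must run to completion or the peer's
    # final flight may never be acknowledged.
    while True:
        try:
            tls.do_handshake()
            return  # handshake complete; now it is safe to shut down
        except ssl.SSLWantReadError:
            pending = outgoing.read()  # flush our own flight first
            if pending:
                send(pending)
            data = recv()              # then wait on the peer
            if data:
                incoming.write(data)
            else:
                incoming.write_eof()   # peer went away mid-handshake

# Client-side setup, to show the shapes involved:
ctx = ssl.create_default_context()
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")
```

The same loop shape applies on the server side; the only difference is which
peer's flight is pending when the application decides it is "done".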
>
> Sure, once the situation is described and the failure diagnosed, one can
> probably form opinions about what should have happened. But dismissing this
> as an application bug is kind of missing the point.
>
> Indeed if you explore what *should* have happened, you'll find there are
> architectural reasons why these sorts of things tend to happen. More below.
>
> Interestingly, this is exactly what the TCP stack does.  David mentioned
>> that a retransmission of already-received data doesn't cause the server to
>> send a RST.  Why is that?  If the server dropped all state, a PSH really
>> would induce a RST in response.  It's because the TCP stack really does
>> maintain some state, so that it can distinguish a PSH that can be ignored
>> (one with a sequence number between the initial and final values for the
>> closed connection) from one that does need a RST (one outside of that
>> narrow range).
>>
>
> Yup. And there's a crucial difference architecturally in how these
> protocols are typically deployed. TCP typically lives in the kernel. This
> extra bit of state outlives even your process. The application gets to
> pretend the connection stops existing and forget about it afterwards. This
> pretense is false as we all know, but it is how applications are typically
> architected.
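
A toy model of the sequence-number check described above (illustrative only;
real TCP must also handle 32-bit sequence wraparound and the full TIME-WAIT
rules, all deliberately omitted here):

```python
def classify_late_segment(seq, rcv_start, rcv_end):
    """Classify a segment that arrives for an already-closed connection.

    rcv_start/rcv_end are the initial and final receive sequence numbers
    the stack remembered for the closed connection.  A retransmission of
    data we already received is silently ignored; a segment outside that
    window gets a RST, because no remembered state can explain it.
    """
    if rcv_start <= seq <= rcv_end:
        return "ignore"  # duplicate of already-ACKed data
    return "rst"         # outside the closed connection's window
```

This is exactly the extra state the kernel keeps on the application's behalf,
which anything layered above TCP does not get for free.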
>
> Any kind of long-lived state above TCP typically does not get to play this
> game. And so long-lived shutdown processes, like this one which we did not
> write down, and close_notify which we did write down, tend not to work in
> practice because applications often are not architected to accommodate
> this. Indeed close_notify has really not worked in practice. HTTPS clients
> don't enforce it because HTTPS servers don't reliably send it.
>
> A story: a long time ago, I accidentally caused Chrome to enforce
> close_notify in the course of other changes. As soon as it reached beta,
> something broke and I had to switch it back. What broke? IANA's website.
>
> Are those applications correct? No, clearly not! close_notify is in the
> spec and even provides some important truncation protections in
> EOF-delimited protocols, including some modes of HTTP/1.x. But it hasn't
> worked and if even IANA couldn't deploy it, I'm not going to throw stones.
> It is useful to think about why, even though, yes, there is no fundamental
> difference here at the protocol level. And indeed given close_notify's
> failure we probably should not build EOF-delimited protocols. (Happily
> protocols tend to move away from making EOF significant anyway as they grow
> connection reuse and whatnot. Even in HTTP/1.x, you can avoid it.)
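
For what it's worth, stacks do expose knobs for the strict behavior. In
Python's ssl module, for instance, a client can opt into enforcing
close_notify by turning off suppress_ragged_eofs (a real stdlib parameter;
the helper name below is made up):

```python
import socket
import ssl

def open_strict_tls(host, port=443):
    # With suppress_ragged_eofs=False, a bare TCP FIN arriving without a
    # preceding close_notify raises SSLEOFError instead of reading as a
    # clean EOF, so truncation becomes detectable.  As the IANA story
    # above suggests, many real servers will trip this, so enable with care.
    ctx = ssl.create_default_context()
    raw = socket.create_connection((host, port))
    return ctx.wrap_socket(raw, server_hostname=host,
                           suppress_ragged_eofs=False)
```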
>
> Of course, this presumes a particular architecture, in which the TLS stack
>> is a layer on top of TCP.  It's not a model I favor any more.  I prefer the
>> SSLEngine one, where TLS is kept to the side, with the application driving
>> the TCP bits.  In that case, this gotcha becomes an application problem.
>> Maybe the simple application in the example isn't going to do that, so it's
>> fine to think of this as being strictly layered in software, but it's worth
>> acknowledging our assumptions.
>>
>
> Yup. I also prefer the SSLEngine model. I've found that, even when your
> TLS stack doesn't look like SSLEngine, application authors tend to make
> assumptions about the mapping between TLS and transport I/O anyway. After
> all, the natural assumption (that TLS is just a filter that application
> data flows through) is *basically* true. It fails mostly in weird edge
> cases that may not be apparent to folks who aren't experts in the protocol.
>
> These assumptions often show up in code because the POSIX-style
> non-blocking I/O model assumes that something on the outside is driving the
> event loop and TLS stacks tend to be far enough down the tower that they
> can't post to the event loop themselves. SSLEngine-style APIs make this
> assumption even more explicit.
>
> And thus this burden goes to the application and the application developer
> probably is not thinking of these details. They certainly won't if we
> didn't even think to write it down.
>
> Thus my hope is that writing about these will help in some small way
> towards these kinds of things being in folks' general awareness. We
> ultimately care about running code and it is useful to understand the
> architecture of running code in practice.
>
> David
>
>>
_______________________________________________
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org
