[TLS] [Editorial Errata Reported] RFC9147 (8100)

2024-09-12 Thread RFC Errata System
The following errata report has been submitted for RFC9147,
"The Datagram Transport Layer Security (DTLS) Protocol Version 1.3".

--
You may review the report below and at:
https://www.rfc-editor.org/errata/eid8100

--
Type: Editorial
Reported by: David Benjamin 

Section: 4.1

Original Text
-
   *  If the first byte is alert(21), handshake(22), or ack(proposed,
  26), the record MUST be interpreted as a DTLSPlaintext record.

Corrected Text
--
   *  If the first byte is alert(21), handshake(22), or ack(26), the
  record MUST be interpreted as a DTLSPlaintext record.

Notes
-
This appears to be a remnant from before the codepoint was officially allocated.
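
For context, the corrected bullet is part of the record demultiplexing rule in
Section 4.1 of RFC 9147. A minimal Python sketch of that rule, showing the
allocated ack(26) codepoint alongside the unified-header check; the fallback
branch is a simplification for illustration, not RFC text:

    # Hedged sketch of the Section 4.1 first-byte demultiplexing, not RFC code.
    ALERT, HANDSHAKE, ACK = 21, 22, 26  # TLS ContentType values; ack(26) as allocated

    def classify_first_byte(first_byte: int) -> str:
        if first_byte in (ALERT, HANDSHAKE, ACK):
            return "DTLSPlaintext"
        if first_byte >> 5 == 0b001:  # DTLS 1.3 unified header starts with fixed bits 001
            return "DTLSCiphertext"
        return "other"  # handling of remaining values is outside this sketch

    assert classify_first_byte(26) == "DTLSPlaintext"        # ack uses the allocated codepoint
    assert classify_first_byte(0b00101101) == "DTLSCiphertext"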

Instructions:
-
This erratum is currently posted as "Reported". (If it is spam, it 
will be removed shortly by the RFC Production Center.) Please
use "Reply All" to discuss whether it should be verified or
rejected. When a decision is reached, the verifying party  
will log in to change the status and edit the report, if necessary.

--
RFC9147 (draft-ietf-tls-dtls13-43)
--
Title   : The Datagram Transport Layer Security (DTLS) Protocol 
Version 1.3
Publication Date: April 2022
Author(s)   : E. Rescorla, H. Tschofenig, N. Modadugu
Category: PROPOSED STANDARD
Source  : Transport Layer Security
Stream  : IETF
Verifying Party : IESG

___
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org


[TLS] Re: draft-ietf-tls-key-share-prediction next steps

2024-09-12 Thread Kampanakis, Panos
Hi David,

Note I am not against draft-ietf-tls-key-share-prediction. It is definitely 
better to not send unnecessary bytes on the wire.

> Yup. Even adding one PQ key was a noticeable size cost (we still haven't 
> shipped Kyber/ML-KEM to mobile Chrome because the performance regression was 
> more prominent) so, yeah, we definitely do not want to send two PQ keys in 
> the initial ClientHello.

I have seen this claim before and, respectfully, I don’t fully buy it. A mobile 
client that suffers with a two-packet ClientHello is probably already crawling 
through hundreds of KBs of web content per connection. Do you have any numbers 
to showcase the regression and the relevant affected web metrics?


From: David Benjamin 
Sent: Wednesday, September 11, 2024 8:02 PM
To: Ilari Liusvaara 
Cc:  
Subject: [EXTERNAL] [TLS] Re: draft-ietf-tls-key-share-prediction next steps




On Wed, Sep 11, 2024 at 3:58 AM Ilari Liusvaara
<ilariliusva...@welho.com> wrote:
On Wed, Sep 11, 2024 at 10:13:55AM +0400, Loganaden Velvindron wrote:
> On Wed, 11 Sept 2024 at 01:40, David Benjamin
> <david...@chromium.org> wrote:
> >
> > Hi all,
> >
> > Now that we're working through the Kyber to ML-KEM transition, TLS
> > 1.3's awkwardness around key share prediction is becoming starkly
> > visible. (It is difficult for clients to efficiently offer both
> > Kyber and ML-KEM, but a hard transition loses PQ coverage for some
> > clients. Kyber was a draft standard, just deployed by early
> > adopters, so while not ideal, I think the hard transition is not
> > the end of the world. ML-KEM is expected to be durable, so a
> > coverage-interrupting transition to FancyNewKEM would be a problem.)
> >
>
> Can you detail a little bit more in terms of numbers ?
> -Did you discover that handshakes are failing because of the larger
> ClientHello ?
> -Some web clients aren't auto-updating ?

The outright failures caused by a larger ClientHello are actually web
server issues. However, even ignoring any hard failures, a larger
ClientHello can cause performance issues.

The most relevant of the issues is tldr.fail (https://tldr.fail/),
where the web server ends up unable to deal with TCP-level fragmentation
of the ClientHello. Even one PQ key (1216 bytes) fills the vast majority
of a TCP segment (and other content in the ClientHello can easily push it
over, as the upper limit is around 1430-1460 bytes). There is no way to
fit two PQ keys.
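
To make the arithmetic concrete, here is a back-of-the-envelope Python sketch
using only the numbers above (1216 bytes per PQ key share, a 1430-1460 byte
single-segment budget); it is an illustration, not a measurement:

    # Rough arithmetic only; 1216-byte key share and 1460-byte budget come from
    # the text above. Everything else in a real ClientHello is extra on top.
    PQ_KEY_SHARE = 1216      # bytes, e.g. one X25519 + PQ hybrid share
    SEGMENT_BUDGET = 1460    # upper end of the 1430-1460 byte range

    for num_keys in (1, 2):
        key_bytes = num_keys * PQ_KEY_SHARE
        headroom = SEGMENT_BUDGET - key_bytes
        print(f"{num_keys} PQ key share(s): {key_bytes} bytes, "
              f"{headroom} bytes left for the rest of the ClientHello")
    # 1 share leaves only 244 bytes of headroom; 2 shares overflow the segment
    # before any other ClientHello field is counted.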

Then some web servers have ClientHello buffer limits. However, these
limits are almost invariably high enough that one could fit two PQ
keys. IIRC, some research a few years back came to the conclusion that
the maximum tolerable key size is about 3.3 kB, which is almost enough
for three PQ keys.

Then there are a lot of web servers that are unable to deal with TLS-
level fragmentation of ClientHello. However, this is not really
relevant, given that the limit is 16kB, which is easily enough for
10 PQ keys and more than enough to definitely cause performance issues
with TCP.

Yup. Even adding one PQ key was a noticeable size cost (we still haven't 
shipped Kyber/ML-KEM to mobile Chrome because the performance regression was 
more prominent), so, yeah, we definitely do not want to send two PQ keys in the 
initial ClientHello. Sending them in supported_groups is cheap, but as those 
options take an RTT hit, they're not really practical. Hence all the 
key-share-prediction work. (For some more background, see the earlier WG 
discussions around this draft, before it was adopted.)

And it is possible for web server to offer both, so even with hard
client transition both old and new clients get PQ coverage.

Yup, although that transition strategy requires that every PQ server move 
before any client moves, if your goal is to never interrupt coverage. That's 
not really a viable transition strategy in the long run, once PQ becomes widely 
deployed.

David
___
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org


[TLS] Re: draft-ietf-tls-key-share-prediction next steps

2024-09-12 Thread David Adrian
> Any numbers you have to showcase the regression and the relevant affected
web metrics?

Adding Kyber to the TLS handshake increased TLS handshake latency by 4% on
desktop [1] and 9% on Android at P50, and considerably higher at P95. In
general, Cloudflare found that every 1 kB of additional data added to the
server response caused median HTTPS handshake latency to increase by around
1.5% [2].

> I have seen this claim before and, respectfully, I don’t fully buy it. A
mobile client that suffers with two packet CHs is probably already crawling
for hundreds of KBs of web content per conn.

There is a considerable difference between loading large amounts of data
for a single site, which is a decision that is controllable by a site, and
adding a fixed amount of latency to _all_ connections to all sites to
defend against a computer that does not exist [3].

[1]:
https://blog.chromium.org/2024/05/advancing-our-amazing-bet-on-asymmetric.html
[2]: https://blog.cloudflare.com/pq-2024/
[3]: https://dadrian.io/blog/posts/pqc-not-plaintext/




[TLS] DTLS 1.3 ACKs near the version transition

2024-09-12 Thread David Benjamin
Hi all,

I noticed another issue with the DTLS 1.3 ACK design. :-)

So, DTLS 1.3 uses ACKs. DTLS 1.2 does not use ACKs. But you only learn what
version you're speaking partway through the lifetime of the connection, so
there are some interesting corner cases to answer. As an illustrative
example, I believe the diagram in section 6 is [probably] incorrect:
https://www.rfc-editor.org/rfc/rfc9147.html#section-6

If the client loses the first packet, it never sees the ServerHello and
thus never learns that it's speaking DTLS 1.3. While it does see the second
packet,
that packet only contains ciphertext that it cannot decrypt. Unless it
decides to say "this looks like a 1.3 record header, therefore I will turn
on the 1.3 state machine", which isn't supported by the RFC (maybe TLS 1.4
will use the same record header but redo ACKs once again), it shouldn't
activate the 1.3 state machine yet. I expect what will *actually* happen is
that the client will wait for the retransmission timeout a la DTLS 1.2.

More generally, I believe these are the situations to worry about:

1. If a DTLS 1.2 implementation (i.e. one that does not implement RFC 9147 at
all) receives an ACK record for whatever reason, what happens? This is not a
decision we get to make; rather, it is a design constraint. Both OpenSSL and
BoringSSL treat unexpected record types as a fatal error. I haven't checked
other implementations. So I think we must take as a constraint that you
cannot send an ACK unless you know the peer is 1.3-capable.

2. Do plaintext ACKs exist? Or is the plaintext epoch permanently at the
old state machine? Honestly, I wish the answer here was "no". That would
have avoided so many problems, because then epochs never change state
machines. Unfortunately, the RFC does not support this interpretation.
Section 4.1 talks about how to demux a plaintext ACK, and section 6, though
wrong, clearly depicts a plaintext ACK. So instead we get to worry about
the transition within an epoch. Keep in mind that transitions happen at
different times on both sides. Keep in mind that there is a portion of the
plaintext epoch that lasts after version negotiation in HelloRetryRequest
handshakes.

3. If a 1.3-capable server receives half of a ClientHello, does it send an
ACK? I believe (1) means the answer must be "no". If you haven't read the
ClientHello, you haven't selected the version, so you don't know if the
client is 1.3-capable or not. If the client is not 1.3-capable, sending an
ACK may be incompatible.

4. Is it possible for a 1.3-capable client to receive an ACK *before* it
receives a ServerHello? If so, how does the client respond? I believe the
answer to this question, if plaintext ACKs exist, is unavoidably "yes".
Suppose the server receives a 1.3 ClientHello and then negotiates DTLS 1.3.
That is a complete flight, so Section 7.1 discourages ACKing explicitly
(you can ACK implicitly), but it *does not forbid* an explicit ACK. An
explicit ACK may be sent if the server cannot generate its responding
flight immediately. That means a server could well send ACK followed by
ServerHello. Now suppose ServerHello is lost but the ACK gets through. Now
the client must decide what it's doing. Rejecting the ACK would result in
connection failure, so we must either drop the ACK on the floor, or process
it. While processing it would be more efficient (you don't need to
retransmit the whole ClientHello), it means the plaintext epoch must
support this hybrid state where 1.3 ACKs are processed but never sent! Or
perhaps receiving that ACK transitions you to the 1.3 state machine even
though you don't know the version yet. That all sounds like a mess, so I
would advocate that you simply drop it on the floor (see the sketch after
this list).

5. If a 1.3-capable client receives half of the server's first message (HRR
or ServerHello), does it send an ACK? Again, because of (1), I believe the
answer must be "no". If you don't know the server's selected version, the
server may not be 1.3-capable and may not be compatible with the ACK.

6. What does a 1.3-capable server do if it receives an ACK prior to picking
the TLS version? Unlike (4), I believe this is impossible. If the client
has something to ACK, the server must have sent something, which the server
will only do once it's received the full ClientHello and thus picked the
version. However, given (4), I suspect an implementation will naturally
just drop that ACK. In this state error vs drop is kinda academic.
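
To tie the points above together, here is a rough Python sketch of one
plausible client-side policy for the window around version selection. It is
my reading of the constraints above, not text from RFC 9147, and the names
are illustrative only:

    # Hypothetical illustration of points (1), (3), (4), (5) above; not RFC code.
    ALERT, HANDSHAKE, ACK = 21, 22, 26  # TLS ContentType values

    def handle_plaintext_record(content_type: int, peer_known_dtls13: bool) -> str:
        """Decide what to do with an epoch-0 record around version selection."""
        if content_type == ACK:
            if not peer_known_dtls13:
                # Point (4): an explicit ACK can arrive before the ServerHello if
                # the ServerHello datagram was lost; processing it would need a
                # hybrid state machine, so the simplest safe choice is to drop it.
                return "drop"
            return "process"   # peer is known 1.3-capable, the ACK is meaningful
        if content_type in (ALERT, HANDSHAKE):
            return "process"   # ordinary handling
        return "drop"          # anything else

    def may_send_ack(peer_known_dtls13: bool) -> bool:
        # Points (1), (3), (5): never send an ACK unless the peer is known to be
        # 1.3-capable, since a plain DTLS 1.2 peer may treat it as a fatal error.
        return peer_known_dtls13

    # Example: a plaintext ACK arrives before the ServerHello -> drop it silently,
    # and do not ACK a partially received peer flight yet.
    assert handle_plaintext_record(ACK, peer_known_dtls13=False) == "drop"
    assert may_send_ack(peer_known_dtls13=False) is False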

From what I can tell, RFC 9147 is silent on all of this. I think it should
say something. I believe these are the plausible options:

OPTION A -- There are no ACKs in epoch 0.

We avoid this ridiculous transition point and say that ACKs only exist
starting at epoch 1. Epoch 0 uses the old DTLS 1.2 state machine. This is very
attractive from a simplicity perspective, but since RFC 9147 was already
published with this ambiguity, I think we need to, at minimum, say that
DTLS 1.3 implementations drop epoch 0 ACKs on the floor. It also means that
packet loss in Hel