Hi Bas,

That is interesting and surprising, thank you.

I am mostly interested in the ~63% of non-resumed sessions that would be 
affected by 10-15KB of auth data. It looks like your data showed that each QUIC 
conn transfers about 4.7KB, which is very surprising to me. It seems very low.

In the experiments I am running here against top web servers, I see lots of 
conns which transfer hundreds of KB, even over QUIC in cached browser sessions. 
This aligns with the average from your blog, 551*0.6=~330KB, but not with the 
median of 4.7KB. Hundreds of KB also aligns with the p50 page weight / conns 
per page in 
https://httparchive.org/reports/page-weight?lens=top1k&start=2024_05_01&end=latest&view=list
Of course browsers cache a lot of things like JavaScript, images etc., so they 
don't transfer all resources, which could explain the median. Still, based on 
anecdotal experience looking at top visited servers, I am noticing many small 
transfers and just a few that transfer the larger HTML, CSS etc. on every page, 
even in cached browser sessions.

I am curious about the 4.7KB figure and the 15.8% of conns transferring >100KB 
in your blog. Like you say in the blog, if the 95th percentile includes very 
large transfers, that would skew the difference between the median and the 
average. But I am wondering if there is another explanation. In my experiments 
I see a lot of 301 and 302 redirects which transfer minimal data, and some 
pages have a lot of them. If there are many of them, the median gets skewed as 
it fills up with very small data transfers that basically don't do anything. In 
essence, we could have 10 pages which each transfer 100KB for one of their 
resources and have another 9 conns that are HTTP redirects or transfer 0.1KB. 
That would make us think that 90% of the conns will be blazing fast, but the 
100KB resource on each page will still take a good amount of time on a slow 
network.
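
The skew in that hypothetical example is easy to check with a few lines (the 
per-connection sizes below are invented to match the scenario, not real data):

```python
from statistics import mean, median

# Hypothetical page from the example above: one 100KB resource plus
# nine ~0.1KB redirect/near-empty conns (numbers invented for illustration).
page_conns_kb = [100.0] + [0.1] * 9

print(median(page_conns_kb))  # 0.1KB: the median says "tiny transfers"
print(mean(page_conns_kb))    # ~10.1KB: the mean is pulled up by the one big resource
```

So on the same connection mix the median and the mean tell opposite stories, 
which is exactly the 4.7KB-vs-330KB gap.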

To validate this theory, what would your data show if you queried for the % of 
conns that transfer <0.5KB or <1KB? If that is a lot, then there are many small 
conns that skew the median downwards. Or what if you ran the query excluding 
the very heavy conns and the very light ones (HTTP 301, 302 etc.)? For example, 
if you ran a report on the conns transferring 1KB < data < 80th-percentile KB, 
what would the median be for that subset? That would tell us whether the 
too-small and too-big conns skew the median.
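
A sketch of the trimmed query I have in mind, assuming the per-connection byte 
counts were available as a plain list (the function name, thresholds, and toy 
data are all mine, not from the blog):

```python
from statistics import median, quantiles

def trimmed_median(conn_kb, lo_kb=1.0, pct=80):
    """Median of conns with lo_kb < size < the pct-th percentile,
    i.e. excluding both redirect-sized and very heavy conns."""
    hi_kb = quantiles(conn_kb, n=100)[pct - 1]
    kept = [x for x in conn_kb if lo_kb < x < hi_kb]
    return median(kept)

# Toy data: half redirect-sized conns, some mid-sized, a heavy tail.
sample = [0.2] * 50 + [30.0] * 30 + [5000.0] * 20
print(median(sample))          # pulled down hard by the tiny conns
print(trimmed_median(sample))  # the "typical real transfer" is ~30KB here
```

If the trimmed median came out hundreds of KB, that would support the 
redirect-skew explanation; if it stayed single-digit KB, it would not.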

Btw, I am also curious about
> Chrome is more cautious and set 10% as their target for maximum TLS handshake 
> time regression.
Is this public somewhere? There is no immediate link between the TLS handshake 
and any of the Core Web Vitals metrics or the CrUX metrics other than TTFB. 
Even for TTFB, a 10% regression in the handshake does not mean a 10% regression 
in TTFB; TTFB is affected much less. I am wondering if we should start 
expecting the TLS handshake to slowly become a tracked web performance metric.


From: Bas Westerbaan <bas=40cloudflare....@dmarc.ietf.org>
Sent: Thursday, November 7, 2024 9:07 AM
To: <tls@ietf.org> <tls@ietf.org>; p...@ietf.org
Subject: [EXTERNAL] [TLS] Bytes server -> client


Hi all,

Just wanted to highlight a blog post we just published. 
https://blog.cloudflare.com/another-look-at-pq-signatures/  At the end we share 
some statistics that may be of interest:

On average, around 15 million TLS connections are established with Cloudflare 
per second. Upgrading each to ML-DSA would take 1.8Tbps, which is 0.6% of our 
current total network capacity. No problem so far. The question is how these 
extra bytes affect performance.

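
[Editor's note: as a sanity check on those numbers, the 1.8Tbps figure implies 
roughly 15kB of extra certificate bytes per connection; that per-connection 
figure is inferred here, not stated in the post.]

```python
# Back-of-the-envelope check: 15 million TLS conns/s times an assumed
# ~15kB of extra ML-DSA certificate bytes per connection.
conns_per_sec = 15e6
extra_bytes_per_conn = 15e3  # assumption, inferred from the 1.8Tbps figure

extra_tbps = conns_per_sec * extra_bytes_per_conn * 8 / 1e12
print(extra_tbps)  # 1.8 Tbps, matching the number in the post
```
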
Back in 2021, we ran a large-scale experiment to measure the impact of big 
post-quantum certificate chains on connections to Cloudflare’s network over the 
open Internet. There were two important results. First, we saw a steep increase 
in the rate of client and middlebox failures when we added more than 10kB to 
existing certificate chains. Secondly, when adding less than 9kB, the slowdown 
in TLS handshake time would be approximately 15%. We felt the latter is 
workable, but far from ideal: such a slowdown is noticeable and people might 
hold off deploying post-quantum certificates before it’s too late.

Chrome is more cautious and set 10% as their target for maximum TLS handshake 
time regression. They report that deploying post-quantum key agreement has 
already incurred a 4% slowdown in TLS handshake time, for the extra 1.1kB from 
server-to-client and 1.2kB from client-to-server. That slowdown is 
proportionally larger than the 15% we found for 9kB, but that could be 
explained by slower upload speeds than download speeds.

There has been pushback against the focus on TLS handshake times. One argument 
is that session resumption alleviates the need for sending the certificates 
again. A second argument is that the data required to visit a typical website 
dwarfs the additional bytes for post-quantum certificates. One example is this 
2024 publication, where Amazon researchers have simulated the impact of large 
post-quantum certificates on data-heavy TLS connections. They argue that 
typical connections transfer multiple requests and hundreds of kilobytes, and 
for those the TLS handshake slowdown disappears in the margin.

Are session resumption and hundreds of kilobytes over a connection typical 
though? We’d like to share what we see. We focus on QUIC connections, which are 
likely initiated by browsers or browser-like clients. Of all QUIC connections 
with Cloudflare that carry at least one HTTP request, 37% are resumptions, 
meaning that key material from a previous TLS connection is reused, avoiding 
the need to transmit certificates. The median number of bytes transferred from 
server-to-client over a resumed QUIC connection is 4.4kB, while the average is 
395kB. For non-resumptions the median is 7.8kB and average is 551kB. This vast 
difference between median and average indicates that a small fraction of 
data-heavy connections skew the average. In fact, only 15.8% of all QUIC 
connections transfer more than 100kB.

The median certificate chain today (with compression) is 3.2kB. That means that 
on more than half of the non-resumed QUIC connections, almost 40% of all data 
transferred from server to client is just for the certificates, and this only 
gets worse with post-quantum algorithms. For the majority of QUIC connections, 
using ML-DSA as a drop-in replacement for classical signatures would more than 
double the number of transmitted bytes over the lifetime of the connection.
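
[Editor's note: the "almost 40%" follows directly from the two medians quoted 
above.]

```python
# Share of server->client bytes taken by the certificate chain on the
# median non-resumed QUIC connection, using the figures from the post.
median_chain_kb = 3.2        # median compressed certificate chain
median_nonresumed_kb = 7.8   # median server->client bytes, non-resumed QUIC

share = median_chain_kb / median_nonresumed_kb
print(round(share * 100, 1))  # ~41% of the median non-resumed conn's bytes
```
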

It sounds quite bad if the vast majority of data transferred for a typical 
connection is just for the post-quantum certificates. It’s still only a proxy 
for what is actually important: the effect on metrics relevant to the end-user, 
such as the browsing experience (e.g. largest contentful paint) and the amount 
of data those certificates take from a user’s monthly data cap. We will 
continue to investigate and get a better understanding of the impact.

Best,

 Bas
_______________________________________________
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org
