Hi Bas,

That is interesting and surprising, thank you.

I am mostly interested in the ~63% of non-resumed sessions that would be affected by 10-15KB of auth data. It looks like your data showed a median of about 4.7KB transferred per QUIC conn, which is very surprising to me. It seems very low. In the experiments I am running here against top web servers, I see lots of conns which transfer hundreds of KB, even over QUIC in cached browser sessions. This aligns with the average from your blog (551 * 0.6 ≈ 330KB), but not with the 4.7KB median. Hundreds of KB also aligns with the p50 page weight / conns per page in https://httparchive.org/reports/page-weight?lens=top1k&start=2024_05_01&end=latest&view=list. Of course browsers cache a lot of things like JavaScript, images etc., so they don't transfer all resources, which could explain the median. But still, based on anecdotal experience looking at top visited servers, I am noticing many small transfers and just a few that transfer the larger HTML, CSS etc. on every page, even in cached browser sessions.

I am curious about the 4.7KB and about the 15.8% of conns transferring more than 100KB in your blog. Like you say in your blog, if the 95th percentile includes very large transfers, that would skew the difference between the median and the average. But I am wondering if there is another explanation. In my experiments I see a lot of 301 and 302 redirects which transfer minimal data. Some pages have a lot of those. If you have many of them, then your median will get skewed, as it fills up with very small data transfers that basically don't do anything.

In essence, we could have 10 pages, each of which transfers 100KB for one of its resources and has another 9 conns that are HTTP redirects or transfer 0.1KB. That would make us think that 90% of the conns will be blazing fast, but the 100KB resource on each page will still take a good amount of time on a slow network.
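To make that concrete, here is a toy back-of-the-envelope in Python (all numbers made up for illustration, not from your dataset) showing how redirect-sized conns drag the median far below the average:

    # Made-up illustration: 10 pages, each with one conn carrying a
    # 100KB resource and nine redirect-sized conns of ~0.1KB.
    from statistics import mean, median

    transfers_kb = []
    for _ in range(10):              # 10 hypothetical pages
        transfers_kb.append(100.0)   # the one conn with a real resource
        transfers_kb += [0.1] * 9    # nine 301/302-style conns

    print(f"median: {median(transfers_kb):.1f}KB")  # -> 0.1KB
    print(f"mean:   {mean(transfers_kb):.2f}KB")    # -> 10.09KB

Ninety percent of those conns look blazing fast, yet every page still waits on its 100KB transfer.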
To validate this theory, what would your data show if you queried for the % of conns that transfer <0.5KB or <1KB? If that is a lot, then there are many small conns that skew the median downwards. Or what if you ran the query excluding the very heavy conns and the very light ones (HTTP 301, 302 etc.)? For example, if you ran a report on the conns transferring 1KB < data < 80th-percentile KB, what would the median be for that? That would tell us whether the too-small and too-big conns skew the median.
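In rough Python terms (sizes_kb is just a placeholder for however your pipeline exposes per-conn byte counts; the thresholds are my guesses), the report I have in mind would be something like:

    # Hypothetical diagnostics over per-conn transfer sizes (in KB).
    from statistics import median, quantiles

    def skew_report(sizes_kb):
        n = len(sizes_kb)
        below_half = sum(1 for s in sizes_kb if s < 0.5)
        below_one = sum(1 for s in sizes_kb if s < 1)
        p80 = quantiles(sizes_kb, n=10)[7]  # 80th percentile (8th of 9 decile cut points)
        middle = [s for s in sizes_kb if 1 < s < p80]  # drop the tiny and the heavy conns
        print(f"conns < 0.5KB: {100 * below_half / n:.1f}%")
        print(f"conns < 1KB:   {100 * below_one / n:.1f}%")
        print(f"median of 1KB < data < p80 ({p80:.1f}KB): {median(middle):.1f}KB")

    skew_report([0.1, 0.3, 0.8, 2, 5, 8, 12, 40, 120, 600])  # made-up sizes

If the first two percentages are large, the tiny conns explain the low median; if the trimmed median comes out well above 4.7KB, the extra handshake bytes matter more than the overall median suggests.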
Btw, I am also curious about:

> Chrome is more cautious and set 10% as their target for maximum TLS handshake
> time regression.

Is this public somewhere? There is no immediate link between the TLS handshake and any of the Core Web Vitals metrics or the CrUX metrics other than TTFB. And even for TTFB, a 10% slowdown in the handshake does not mean a 10% slower TTFB; the TTFB is affected much less. I am wondering if we should start expecting the TLS handshake to slowly become a tracked web performance metric.

From: Bas Westerbaan <bas=40cloudflare....@dmarc.ietf.org>
Sent: Thursday, November 7, 2024 9:07 AM
To: <tls@ietf.org> <tls@ietf.org>; p...@ietf.org
Subject: [EXTERNAL] [TLS] Bytes server -> client

Hi all,

Just wanted to highlight a blog post we just published.

https://blog.cloudflare.com/another-look-at-pq-signatures/

At the end we share some statistics that may be of interest:

On average, around 15 million TLS connections are established with Cloudflare per second. Upgrading each to ML-DSA would take 1.8Tbps, which is 0.6% of our current total network capacity. No problem so far. The question is how these extra bytes affect performance.

Back in 2021, we ran a large-scale experiment to measure the impact of big post-quantum certificate chains on connections to Cloudflare's network over the open Internet. There were two important results. First, we saw a steep increase in the rate of client and middlebox failures when we added more than 10kB to existing certificate chains. Secondly, when adding less than 9kB, the slowdown in TLS handshake time would be approximately 15%. We felt the latter is workable, but far from ideal: such a slowdown is noticeable and people might hold off deploying post-quantum certificates before it's too late.

Chrome is more cautious and set 10% as their target for maximum TLS handshake time regression. They report that deploying post-quantum key agreement has already incurred a 4% slowdown in TLS handshake time, for the extra 1.1kB from server-to-client and 1.2kB from client-to-server. That slowdown is proportionally larger than the 15% we found for 9kB, but that could be explained by slower upload speeds than download speeds.

There has been pushback against the focus on TLS handshake times. One argument is that session resumption alleviates the need for sending the certificates again. A second argument is that the data required to visit a typical website dwarfs the additional bytes for post-quantum certificates. One example is this 2024 publication, where Amazon researchers have simulated the impact of large post-quantum certificates on data-heavy TLS connections. They argue that typical connections transfer multiple requests and hundreds of kilobytes, and for those the TLS handshake slowdown disappears in the margin.

Are session resumption and hundreds of kilobytes over a connection typical though? We'd like to share what we see. We focus on QUIC connections, which are likely initiated by browsers or browser-like clients. Of all QUIC connections with Cloudflare that carry at least one HTTP request, 37% are resumptions, meaning that key material from a previous TLS connection is reused, avoiding the need to transmit certificates. The median number of bytes transferred from server-to-client over a resumed QUIC connection is 4.4kB, while the average is 395kB. For non-resumptions the median is 7.8kB and average is 551kB. This vast difference between median and average indicates that a small fraction of data-heavy connections skew the average. In fact, only 15.8% of all QUIC connections transfer more than 100kB.

The median certificate chain today (with compression) is 3.2kB. That means that almost 40% of all data transferred from server to client on more than half of the non-resumed QUIC connections is just for the certificates, and this only gets worse with post-quantum algorithms. For the majority of QUIC connections, using ML-DSA as a drop-in replacement for classical signatures would more than double the number of transmitted bytes over the lifetime of the connection.

It sounds quite bad if the vast majority of data transferred for a typical connection is just for the post-quantum certificates. It's still only a proxy for what is actually important: the effect on metrics relevant to the end-user, such as the browsing experience (e.g. largest contentful paint) and the amount of data those certificates take from a user's monthly data cap. We will continue to investigate and get a better understanding of the impact.

Best,
Bas
_______________________________________________
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org