Hi Panos,

Here are some more details on what we see in connections to Cloudflare.

> To validate this theory, what would your data show if you queried for the % of conns that transfer <.5 or <1KB? If that is a lot, then there are many small conns that skew the median downwards. Or what if you run the query to exclude the very heavy conns and the very light (HTTP 301, 302 etc)? For example if you ran a report on the conns transferring 1KB<data<80th percentile KB, what would be the median for that? That would tell us if the too small and too big conns skew the median.

For non-resumed QUIC connections with at least one request where we transfer (including TLS data) between 4kB and 80kB (the 10th and 80th percentiles of the distribution, respectively), the median bytes transferred is 6.5kB and the average is 13.8kB. In other words, fewer than 10% of non-resumed QUIC connections with at least one request transfer less than 4kB, so it does not appear to be the case that a large number of small requests are skewing the median downwards. Ignoring the top 20% of connections in terms of bytes transferred shifts the average down significantly, which supports the idea that a relatively small number of large requests are skewing the average upwards.

Let me know if I can clarify further! This is just what we see today, but it'll be great to see more measurements to see what the real impact is on end-users.

Best,
Luke

On Thu, Nov 7, 2024 at 10:54 AM Kampanakis, Panos <kpanos= 40amazon....@dmarc.ietf.org> wrote:

> Hi Bas,
>
> That is interesting and surprising, thank you.
>
> I am mostly interested in the ~63% of non-resumed sessions that would be affected by 10-15KB of auth data. It looks like your data showed that each QUIC conn transfers about 4.7KB which is very surprising to me. It seems very low.
>
> In experiments I am running here for top web servers, I see lots of conns which transfer hundreds of KB even over QUIC in cached browser sessions.
> This aligns with the average from your blog (551*0.6=~330KB), but not with the 4.7KB median. Hundreds of KB also aligns with the p50 per page / conns per page in https://httparchive.org/reports/page-weight?lens=top1k&start=2024_05_01&end=latest&view=list . Of course browsers cache a lot of things like JavaScript, images etc, so they don’t transfer all resources, which could explain the median. But still, based on anecdotal experience looking at top visited servers, I am noticing many small transfers and just a few that transfer larger HTML, CSS etc on every page even in cached browser sessions.
>
> I am curious about the 4.7KB and the 15.8% of conns transferring >100KB in your blog. Like you say in your blog, if the 95th percentile includes very large transfers that would skew the diff between the median and the average. But I am wondering if there is another explanation. In my experiments I see a lot of 302 and 301 redirects which transfer minimal data. Some pages have a lot of those. If you have many of them, then your median will get skewed as it fills up with very small data transfers that basically don’t do anything. In essence, we could have 10 pages which transfer 100KB each for one of their resources and have another 9 that are HTTP Redirects or transfer 0.1KB. That would make us think that 90% of the conns for those 10 pages will be blazing fast, but the 100KB resource in each page will take a good amount of time in a slow network.
>
> To validate this theory, what would your data show if you queried for the % of conns that transfer <.5 or <1KB? If that is a lot, then there are many small conns that skew the median downwards. Or what if you run the query to exclude the very heavy conns and the very light (HTTP 301, 302 etc)? For example if you ran a report on the conns transferring 1KB<data<80th percentile KB, what would be the median for that? That would tell us if the too small and too big conns skew the median.
>
> Btw, I am curious also about
>
> > Chrome is more cautious and set 10% as their target for maximum TLS handshake time regression.
>
> Is this public somewhere? There is no immediate link between TLS handshake and any of the Core Web Vitals metrics or the CrUX metrics other than the TTFB. Even for the TTFB, 10% in the handshake does not mean 10% TTFB; the TTFB is affected much less. I am wondering if we should start expecting the TLS handshake to slowly become a tracked web performance metric.
>
> *From:* Bas Westerbaan <bas=40cloudflare....@dmarc.ietf.org>
> *Sent:* Thursday, November 7, 2024 9:07 AM
> *To:* <tls@ietf.org> <tls@ietf.org>; p...@ietf.org
> *Subject:* [EXTERNAL] [TLS] Bytes server -> client
>
> *CAUTION*: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
> Hi all,
>
> Just wanted to highlight a blog post we just published. https://blog.cloudflare.com/another-look-at-pq-signatures/ At the end we share some statistics that may be of interest:
>
> On average, around 15 million TLS connections are established with Cloudflare per second. Upgrading each to ML-DSA would take 1.8Tbps, which is 0.6% of our current total network capacity. No problem so far. The question is how these extra bytes affect performance.
> Back in 2021, we ran a large-scale experiment to measure the impact of big post-quantum certificate chains on connections to Cloudflare’s network over the open Internet. There were two important results. First, we saw a steep increase in the rate of client and middlebox failures when we added more than 10kB to existing certificate chains. Secondly, when adding less than 9kB, the slowdown in TLS handshake time would be approximately 15%.
> We felt the latter is workable, but far from ideal: such a slowdown is noticeable and people might hold off deploying post-quantum certificates before it’s too late.
>
> Chrome is more cautious and set 10% as their target for maximum TLS handshake time regression. They report that deploying post-quantum key agreement has already incurred a 4% slowdown in TLS handshake time, for the extra 1.1kB from server-to-client and 1.2kB from client-to-server. That slowdown is proportionally larger than the 15% we found for 9kB, but that could be explained by slower upload speeds than download speeds.
>
> There has been pushback against the focus on TLS handshake times. One argument is that session resumption alleviates the need for sending the certificates again. A second argument is that the data required to visit a typical website dwarfs the additional bytes for post-quantum certificates. One example is this 2024 publication, where Amazon researchers have simulated the impact of large post-quantum certificates on data-heavy TLS connections. They argue that typical connections transfer multiple requests and hundreds of kilobytes, and for those the TLS handshake slowdown disappears in the margin.
>
> Are session resumption and hundreds of kilobytes over a connection typical though? We’d like to share what we see. We focus on QUIC connections, which are likely initiated by browsers or browser-like clients. Of all QUIC connections with Cloudflare that carry at least one HTTP request, 37% are resumptions, meaning that key material from a previous TLS connection is reused, avoiding the need to transmit certificates. The median number of bytes transferred from server-to-client over a resumed QUIC connection is 4.4kB, while the average is 395kB. For non-resumptions the median is 7.8kB and average is 551kB.
> This vast difference between median and average indicates that a small fraction of data-heavy connections skew the average. In fact, only 15.8% of all QUIC connections transfer more than 100kB.
>
> The median certificate chain today (with compression) is 3.2kB. That means that almost 40% of all data transferred from server to client on more than half of the non-resumed QUIC connections are just for the certificates, and this only gets worse with post-quantum algorithms. For the majority of QUIC connections, using ML-DSA as a drop-in replacement for classical signatures would more than double the number of transmitted bytes over the lifetime of the connection.
>
> It sounds quite bad if the vast majority of data transferred for a typical connection is just for the post-quantum certificates. It’s still only a proxy for what is actually important: the effect on metrics relevant to the end-user, such as the browsing experience (e.g. largest contentful paint) and the amount of data those certificates take from a user’s monthly data cap. We will continue to investigate and get a better understanding of the impact.
>
> Best,
>
> Bas
> _______________________________________________
> TLS mailing list -- tls@ietf.org
> To unsubscribe send an email to tls-le...@ietf.org

--
Luke Valenta
Systems Engineer - Research
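
For readers who want to try the percentile-band comparison discussed in this thread on their own measurements, here is a minimal sketch. It is not the query Cloudflare ran; it only assumes that per-connection server-to-client byte counts are available as a plain list. The function names (`percentile`, `trimmed_stats`) and the sample values are hypothetical and chosen purely for illustration.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    # Nearest-rank: the smallest value with at least pct% of the data at or below it.
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def trimmed_stats(values, lo_pct=10, hi_pct=80):
    """Median and mean of the values lying between two percentiles (inclusive)."""
    lo = percentile(values, lo_pct)
    hi = percentile(values, hi_pct)
    kept = [v for v in values if lo <= v <= hi]
    return statistics.median(kept), statistics.mean(kept)


# Hypothetical bytes-per-connection sample: mostly small transfers plus
# two very large ones. These are NOT measured values.
sample_bytes = [300, 4000, 5000, 6000, 6500, 7000, 9000, 20000, 500000, 3000000]

overall = (statistics.median(sample_bytes), statistics.mean(sample_bytes))
banded = trimmed_stats(sample_bytes, lo_pct=10, hi_pct=80)

print("all connections: median=%s mean=%s" % overall)
print("10th-80th band:  median=%s mean=%s" % banded)
```

If the median barely moves after trimming while the mean drops sharply, the heavy tail rather than a mass of tiny connections is what separates the two, which is the pattern described in the reply above.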