> But, I think you should add the list and the reason for the range choice to 
> the paper. For example, I can't tell what range you actually used from your 
> description (although that might just be due to a hurried reply).

Section 3.2.4 talks about the selection of websites:

We collect HARs (and resulting DNS lookups) for the top 1,000 websites on the 
Tranco top-list to understand browser performance for the average user visiting 
popular sites. Furthermore, we measure the bottom 1,000 of the top 100,000 
websites (ranked 99,000 to 100,000) to understand browser performance for 
websites that are less popular. We chose to measure the tail of the top 100,000 
instead of the tail of the top 1 million because we found through 
experimentation that many of the websites in the tail of the top 1 million were 
offline at the time of our measurements. Furthermore, there is significant 
churn in the tail of top 1 million, which means that we would not be accurately 
measuring browser performance for the tail across the duration of our 
experiment.
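In code terms, that selection is just two slices of the Tranco list by rank. 
A minimal sketch (the file name and the assumption that the CSV is laid out as 
"rank,domain" are mine, not from the paper):

    import csv

    # Slice a downloaded Tranco CSV into the two sets from Section 3.2.4.
    popular = []   # ranks 1 to 1,000
    tail = []      # ranks 99,000 to 100,000

    with open("tranco.csv", newline="") as f:   # file name is a placeholder
        for rank_str, domain in csv.reader(f):
            rank = int(rank_str)
            if rank <= 1000:
                popular.append(domain)
            elif 99000 <= rank <= 100000:
                tail.append(domain)
            elif rank > 100000:
                break                           # list is sorted by rank

    print(len(popular), len(tail))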

We didn't include the full list in the paper itself for space reasons and 
because extracting the list from the paper would be cumbersome. It will be part 
of our future open source release though.

> Another issue is that, while your paper might accurately capture the network 
> conditions on your local network, it probably doesn't capture network 
> variation as well as a large scale test along the lines of what Mozilla did. 
> For example, if the university used a single router brand, this could skew 
> the test. As one data point, I've never seen the various network-throttling 
> apps match a real-user-metrics test very well, although they do catch really 
> problematic situations.

It is true that an experiment from more diverse vantage points could capture 
network variation better. However, we already see that the protocols perform 
differently to a degree that a human could notice (that is, we already show 
that selecting your DNS protocol based on your network characteristics can 
have a non-negligible impact). Running a large-scale test for page load times, 
like Mozilla did for resolution times, is also quite difficult: simple 
telemetry data won't answer the questions we want to ask, as websites need to 
be fetched multiple times (per protocol/recursor combination) and browser 
caches need to be flushed. These are not things you want to do to your users. 
Using existing measurement platforms was also not an option, as they are 
limited in what you can access or do not support the necessary protocols. That 
means you would need to set up your own measurement clients, which raises 
questions like "How do you select the vantage points?" (you can't use 
hosting/cloud providers, as this would lead to little network variation). If 
you have an idea on how to easily run this on a larger scale, we'd love to 
look into it.
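To make the combinatorics concrete, here is the kind of loop each vantage 
point would have to run (a sketch only; fetch_har() is a hypothetical 
placeholder for driving a browser with a cold cache, and the protocol, 
recursor, and repetition values are illustrative, not our exact 
configuration):

    import itertools

    # Illustrative values -- not our exact configuration.
    PROTOCOLS = ["Do53", "DoT", "DoH"]
    RECURSORS = ["Cloudflare", "Google", "Quad9"]
    WEBSITES = ["example.com"]      # e.g. the Tranco slices above
    REPETITIONS = 10

    def fetch_har(url, protocol, recursor):
        # Hypothetical placeholder: launch a browser with an empty cache,
        # configured for the given protocol/recursor, load the page, and
        # return the HAR. This active, repeated fetching is exactly what
        # passive telemetry cannot provide.
        return {}

    for protocol, recursor in itertools.product(PROTOCOLS, RECURSORS):
        for url in WEBSITES:
            for _ in range(REPETITIONS):
                har = fetch_har(url, protocol, recursor)
                # store the HAR and resulting DNS lookups for analysis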

We set link capacity and latency based on best-case values from real-world 
measurements by OpenSignal (Section 3.2.3), and we use iproute2 for traffic 
control (tc), which we verified to be accurate through latency measurements 
(Table 1) and multiple speed tests (via Ookla). By tuning the network setup in 
this way (instead of using a 4G/3G modem for connectivity), we eliminated 
potential differences in routing that could otherwise have influenced our 
results and impacted comparability.
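
For reference, the shaping itself is plain netem via tc; a minimal sketch of 
the call (the interface name and the rate/delay values here are placeholders, 
not the Section 3.2.3 numbers):

    import subprocess

    # Placeholders -- not the OpenSignal-derived values from Section 3.2.3.
    IFACE = "eth0"
    RATE = "10mbit"
    DELAY = "50ms"

    # Needs root; "replace" installs or updates the root qdisc in one step.
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root",
         "netem", "delay", DELAY, "rate", RATE],
        check=True,
    )

Pinging through the shaped interface (Table 1) and running speed tests then 
confirms that the emulated profile matches the intended values.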

-- Kevin
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop