> But, I think you should add the list and the reason for the range choice to
> the paper. For example, I can't tell what range you actually used from your
> description (although that might just be due to a hurried reply).
Section 3.2.4 talks about the selection of websites: We collect HARs (and the
resulting DNS lookups) for the top 1,000 websites on the Tranco top list to
understand browser performance for the average user visiting popular sites.
Furthermore, we measure the bottom 1,000 of the top 100,000 websites (ranked
99,000 to 100,000) to understand browser performance for websites that are
less popular. We chose to measure the tail of the top 100,000 instead of the
tail of the top 1 million because we found through experimentation that many
of the websites in the tail of the top 1 million were offline at the time of
our measurements. Furthermore, there is significant churn in the tail of the
top 1 million, which means that we would not have been accurately measuring
browser performance for the tail across the duration of our experiment.

We didn't include the full list in the paper itself for space reasons and
because extracting the list from the paper would be cumbersome. It will be
part of our future open-source release, though.

> Another issue is that, while your paper might accurately capture the network
> conditions on your local network, it probably doesn't capture network
> variation as well as a large-scale test along the lines of what Mozilla did.
> For example, if the university used a single router brand, this could skew
> the test. As one data point, I've never seen the various network-throttling
> apps match a real-user-metrics test very well, although they do catch really
> problematic situations.

It is true that an experiment from more diverse vantage points could better
capture network variation. However, we already see that the protocols perform
differently to a degree that a human could notice (that is, we already show
that selecting your DNS protocol based on your network characteristics can
have a non-negligible impact).

Running a large-scale test for page load times, like Mozilla did for
resolution times, is also quite difficult: simple telemetry data won't answer
the questions we want to ask, as websites need to be fetched multiple times
(once per protocol/recursor combination) and browser caches need to be
flushed. These are not things you want to do to your users. Using existing
measurement platforms was also not an option, as they are limited in what you
can access or do not support the necessary protocols. That means you would
need to set up your own measurement clients, which raises questions like "How
do you select the vantage points?" (you can't use hosting/cloud providers, as
this would lead to little network variation). If you have an idea of how to
easily run this on a larger scale, we'd love to look into it.

We set link capacity and latency based on best-case values from real-world
measurements by OpenSignal (Section 3.2.3), and we use iproute2 for traffic
control (tc), which we verified to be accurate through latency measurements
(Table 1) and multiple "speedtests" (via Ookla). By tuning the network setup
in this way (instead of using a 4G/3G modem for connectivity), we eliminated
potential differences in routing that could otherwise have influenced our
results and impacted comparability. A rough sketch of the tc setup is in the
P.S. below.

--
Kevin
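P.S.: In case it helps to make the emulation setup concrete, below is a
minimal sketch of the shaping step as a small Python wrapper around tc/netem.
The interface name and the rate/delay values are placeholders for
illustration, not the parameters from the paper (those come from the
OpenSignal data in Section 3.2.3), and running it requires root and iproute2.

    #!/usr/bin/env python3
    # Sketch of a netem-based shaping step. Interface, rate, and delay are
    # placeholder values, not the figures used in the paper.
    import subprocess

    IFACE = "eth0"      # assumed client-facing interface
    RATE = "50mbit"     # placeholder link capacity
    DELAY = "40ms"      # placeholder added latency

    def reset(iface):
        # Drop any existing root qdisc; ignore the error if none is set.
        subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"],
                       stderr=subprocess.DEVNULL)

    def shape(iface, rate, delay):
        # A single netem qdisc adds the fixed delay and caps the link rate.
        subprocess.run(["tc", "qdisc", "add", "dev", iface, "root",
                        "netem", "delay", delay, "rate", rate], check=True)

    if __name__ == "__main__":
        reset(IFACE)
        shape(IFACE, RATE, DELAY)

The resulting delay can then be checked with ping and the rate with a
speedtest, which is essentially the verification described above.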