Hello Geoff,

On Thu, 2025-03-06 at 01:24 +0000, Geoff Huston wrote:
> There is a difference between taking a single vantage point (or a
> small set of vantage) and looking at the behaviour of a resolution of
> a large domain set,
> and taking a very large set of vantage points and looking at the
> behaviours of resolution of a small set of domains that exhibit
> particular properties.
> 
> The work we do at APNIC Labs is focussed on the latter mode, and this
> has lead to our observations about the incremental issues arising
> from the use of IPv6 transport for the DNS. 

Actually, you are not using more vantage points. Given that you invert
the measurement direction, in your setup the authoritatives are the
vantage points.

> Clearly we disagree here over the quantification of this
> impact arising from these differing methodologies.

The problem is that your work is effectively impossible to reproduce,
and the raw data is not available. That makes it impossible to argue
about this data, because _any_ other measurement can ultimately be
countered with 'but our data does not show this'.

In fact, your measurements exhibit several limitations that strongly
affect the results and conclusions. These effects are in the same
order of magnitude as the _whole effects_ I am measuring. For many
cases, it seems like core effects are tied to an individual operator;
by the way, does somebody know someone working for LGI/Liberty
Global? Fixing their fragment dropping would make a lot of things a
lot better in those stats.

---
# Measured fragment drop rates

Given that we are discussing fragment dropping as a major concern,
this is the most critical issue in the data. During my scans of
authoritatives, the impact of lost fragments was not notable for .nl.

.nl saw between 0.25% and 2.5% of connections during resolution
encounter fragments over UDP, with a lower share of fragments for
IPv6-enabled NSSets. Naturally, fragmentation was higher for
DNSSEC-signed zones, but not notably different between v4 and v6:

https://data.measurement.network/dns-mtu-msmt/out_data/2025-03-04/plots/pcap_stats/pcap_stats_UDP_frag.pdf
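
(As a minimal sketch of how such fragments can be counted in a pcap;
'scan.pcap' is a placeholder, and the IPv6 filter only catches
fragment headers directly following the base header, which is the
common case:)

% tcpdump -n -r scan.pcap 'ip[6:2] & 0x3fff != 0' | wc -l  # IPv4 fragments
% tcpdump -n -r scan.pcap 'ip6 proto 44' | wc -l           # IPv6 fragments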

At the same time, the APNIC data reports a ~40% drop rate for IPv6
fragments in the Netherlands. Notably, up until the second half of
2023, that rate was well below 5%. After an event at the end of August
2023, during which both the frag drop and the atomic frag drop rate
rose to ~40%, the rate seemed to plateau around 15%, followed by a
steep increase at the end of November 2023, again to around 40%,
comparable to the jump observed at the end of August 2023.

Unlike two other such events since then, this increase is not
accompanied by a corresponding decrease in the number of clients.

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/frag-drop-nl.png

Listing this by AS, it is clear that LGI/AS33915 is the main culprit
here, providing Internet access for a large portion of people in the
Netherlands via its subsidiary Ziggo. However, this aggregation
approach heavily skews the country stats; if, say, an AS serving
roughly half of a country's eyeballs drops most fragments, the country
aggregate alone already lands near the reported ~40%. We have to
convince one operator to properly handle fragments, not half a
country.

Furthermore, this pattern continues across countries:

For PL, IE, and SK, the root cause is AS6830 (LGI).
For CH, the root cause is AS6730 (Sunrise, behind LGI).
For BE, the root cause is AS6848 (Telenet BV, behind LGI).
...


Interestingly, though, the LGI/Vodafone instance in DE (AS3209) seems
to _not_ have such a high frag drop rate. At the same time, Cloudflare
(likely due to their tunnel offerings) seems to have a notable impact
in the eyeball space.

Hence, I would argue that we do not really have a global problem with
IPv6 fragment dropping (at least in Europe). We have an issue with
LGI's network (or a specific device/vendor/configuration used only in
specific countries). Does somebody know someone working there?


# Impact of vantage point location in a fragment-dropping AS

Interestingly, though, AS63949 (AKAMAI-LINODE-AP, Akamai Connected
Cloud) shows up prominently in the data for .nl; according to the
APNIC data, this AS shows a volatile drop rate, averaging well above
50%.

This is also the AS in which the vantage points seem to be located. It
is unclear to what extent this additionally impacts the measurements'
reliability.
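
(As a quick sanity check on where the measurement authoritatives sit
AS-wise, a sketch using Team Cymru's IP-to-ASN whois; the address is
ns3.dotnxdomain.net's A record as resolved further below:)

% whois -h whois.cymru.com " -v 173.230.146.214"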

---

Beyond those points, there are several other aspects which raise some
doubts w.r.t. the reliability of the measurement platform:

# Noise Effects

The measurements exhibit strange cliffs, with volatile jumps spanning
orders of magnitude that affect all measured endpoints without a clear
root cause; see, for example, the suspicious cliff in q1 use from late
2022 until late 2024:

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/dns-resolver-use-all-resolvers.png

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/dns-resolver-use-first.png

Similar noise effects occur in other measurements as well, for
example in the RPKI validation data (e.g., for AS3320's announced
prefixes):

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/AS3320-RPKI.png


# High participation from North Korea

Some time ago I remarked that I found the high number of daily users
from NK a bit odd; indeed, it seems like KP is no longer listed as
part of Eastern Asia:

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/adds_eastern_asia.png

However, looking at the country page for 'KP', that data still seems
to be there, and it also receives a comparatively high 'Weight' for
the number of samples:

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/KP-samples-over-time.png

Interestingly, the AS responsible for this actually announces less
IPv4 space than I do, and those prefixes also seem to contain a fair
number of hosted services:

https://bgp.tools/prefix/175.45.179.0/24#dns

https://bgp.tools/prefix/175.45.176.0/24#dns

For comparison, look at, e.g., FFMUC:

https://rincewind.home.aperture-labs.org/~tfiebig/mtu/FFMUC-samples-over-time.png

This eyeball project has ~2-4k concurrently active users; see:

https://stats.ffmuc.net/d/kUoZ2DRWz/network-overview?orgId=1&from=now-7d&to=now&timezone=browser&refresh=1m


# Impact of MTU mangling by hypervisors

As I noted before, the set of vantage points for your measurements is
actually somewhat limited, effectively to the number (iirc <=10) of
authoritatives put up. Furthermore, as far as I know, you are running
the authoritatives on virtual machines rented from a commodity hoster.

The problem there is that the use of VirtIO NICs can heavily impact
MTU behavior in a way that is transparent to the virtual machine. The
only way I managed to force my systems to fragment and drop
'correctly' was by not only forcing TSO/GRO et al. off on the host,
but also explicitly doing so on the NIC of the hypervisor.
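
(A minimal sketch of what that looks like with ethtool; the interface
names eth0/eno1/tap0 are placeholders for the guest NIC, the
hypervisor's physical NIC, and the VM's tap device:)

% ethtool -K eth0 tso off gso off gro off lro off  # inside the VM
% ethtool -K eno1 tso off gso off gro off lro off  # hypervisor, physical NIC
% ethtool -K tap0 tso off gso off gro off lro off  # hypervisor, VM tap device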

# Zone-path of measurement domain

One of the domains involved in the measurements seems to be
dotnxdomain.net:

% dig +short NS dotnxdomain.net
ns1.dotnxdomain.net.
ns3.dotnxdomain.net.


At least from an EU PoP, these do resolve to IPs in Australia (ns1.)
and SoCal (ns3.):

% dig +short A ns1.dotnxdomain.net
203.133.248.6
% dig +short A ns3.dotnxdomain.net
173.230.146.214
% dig +short AAAA ns1.dotnxdomain.net
2401:2000:6660::6
% dig +short AAAA ns3.dotnxdomain.net
2600:3c01::f03c:93ff:fe02:be25

This induces notable latency and additional sources of error along
the path, even if the final resolution of the <cc/region> names
underneath, which are used in the actual measurements, ends up
elsewhere. Essentially, it adds 'traversing half the planet' to the
set of noise sources.
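
(The added path length is easy to eyeball with dig's statistics
output; the SOA query is just an arbitrary example:)

% dig +noall +stats SOA dotnxdomain.net @ns1.dotnxdomain.net | grep 'Query time'
% dig +noall +stats SOA dotnxdomain.net @ns3.dotnxdomain.net | grep 'Query time'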

# Measurement Setup / Config Location

The measurements seem to be (or to have been) configured via
cfg.dotnxdomain.net; this name (again, via the aforementioned NSes)
resolves to three distinct IPs, located in AP, EU, and NA:

% dig +short A cfg.dotnxdomain.net
139.162.2.194
139.162.149.100
45.79.7.112

% dig +short AAAA cfg.dotnxdomain.net
2a01:7e01::f03c:91ff:fe12:6bfe
2400:8901::f03c:91ff:fe98:63d6
2600:3c00::f03c:91ff:fe98:16c8

The problem with that is that clients may end up on a rather remote
one of these cfg. hosts. That, in turn, may lead to self-selection
depending on clock and timeout settings. At least for cfg. retrieval,
the timeout seems to be 10000ms, which should be mostly safe.
Nevertheless, the added latency may lead clients to abandon the ad; I
am also not sure what timeout behavior modern browsers enforce.
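
(A crude way to see the spread from a given client location is to
compare raw RTTs toward the three v4 cfg. addresses resolved above:)

% for ip in 139.162.2.194 139.162.149.100 45.79.7.112; do ping -c 3 -q $ip; done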

When throwing these names into Atlas with resolve-on-probe, at least,
the RTTs and the end-points hit vary wildly and are not congruent with
the physical locations of the probes:

v4:
https://atlas.ripe.net/measurements/89419109/overview

v6:
https://atlas.ripe.net/measurements/89419368/overview

---

So, in summary, I am not convinced that the commonly referenced data
is robust enough to support the strength of the conclusions drawn on
IPv6 fragment handling.

With best regards,
Tobias

