Thanks for your response, it helps me understand things better.

On Mon, May 1, 2023 at 1:39 PM Simon Kelley <si...@thekelleys.org.uk> wrote:
> I'm not sure there's much incentive to change this: it's not as if
> addresses are scarce in the IPv6 world.

Well, it may not be the most necessary change, but I don't think the
reason has much to do with the plentiful address space. I think it makes
the interface complex.

The clearest complication is having to advise the recipient of a prefix
that there's a little bite taken out of it: I have to say, "yeah, it's
::2/64, but not ::1," where the latter is in a manual somewhere. I have
received networks with little bites taken out of them here and there, and
I have to account for each such scheme, because I can't re-delegate
slices of the prefix freely: if there's a conflict, other layers do
things with the traffic. For example, I was recently working with a
vendor, and they punched three holes in their static route for my prefix:
two for VARP, one for a gateway. It's not that this can't be sustained,
but I have to itemize the holes out-of-band for each such strategy. Rinse
and repeat for other vendors or software stacks.

Thankfully, there are norms suggesting IPv6 networks shouldn't use :: or
::1 if they can help it, and although I use link-local addressing for the
gateway, I suppose having dnsmasq's recursive resolver and DHCP server at
::1 is "close enough" to the intended norm. So I suppose I could use
that.

The second problem, related to the first but more subtle, is how an
interface with an address assigned within the network affects Linux.
Here's an example of a network situation that works, with static
configuration, in a virtual machine's network namespace:

# ip -n vm57nhjf -6 a
2: vethivm57nhjf@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2a01:4f9:2b:35a:c6dd::/80 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::b068:43ff:fe48:50e1/64 scope link
       valid_lft forever preferred_lft forever
3: tapvm57nhjf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fe80::d487:b2ff:fe66:40cd/64 scope link
       valid_lft forever preferred_lft forever

# ip -n vm57nhjf -6 r
2a01:4f9:2b:35a:c6dc::/80 via fe80::d028:12ff:feb8:a206 dev tapvm57nhjf metric 1024 pref medium
2a01:4f9:2b:35a:c6dd::/80 dev vethivm57nhjf proto kernel metric 256 pref medium
2000::/3 via fe80::2ce1:51ff:feb6:c56f dev vethivm57nhjf metric 1024 pref medium
fd34:f6d7:d9b1:6c85::/64 via fe80::d028:12ff:feb8:a206 dev tapvm57nhjf metric 1024 pref medium
fe80::/64 dev vethivm57nhjf proto kernel metric 256 pref medium
fe80::/64 dev tapvm57nhjf proto kernel metric 256 pref medium

# ping 2a01:4f9:2b:35a:c6dc::
PING 2a01:4f9:2b:35a:c6dc::(2a01:4f9:2b:35a:c6dc::) 56 data bytes
< Hangs, which is expected >
^C
--- 2a01:4f9:2b:35a:c6dc:: ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7151ms

What you see here is a namespace that receives a /79 (not directly
visible) and divides it into two adjacent /80s. The lower /80 is sent
into the VM's tap device; the higher /80 is processed by the veth on the
host (where I, contra norms, use the :: address for now). Pinging
2a01:4f9:2b:35a:c6dc::, at the bottom of the VM's /80, gets no response
because the VM does not have that address configured (it configures ::2,
which returns pings as expected): tcpdump shows ICMP packets going in,
but no responses coming out. Everything works as expected.
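For concreteness, here's roughly how a topology like this can be put
together with iproute2. The namespace and in-namespace interface names
match the output above, but the host-side veth name (vethovm57nhjf) is
made up for the sketch, and the fe80:: next hops would in practice be
read off the peer interfaces; treat this as an illustration, not my
exact provisioning:

# create the namespace and a veth pair whose inner end lives in it
ip netns add vm57nhjf
ip link add vethovm57nhjf type veth peer name vethivm57nhjf netns vm57nhjf
ip -n vm57nhjf link set vethivm57nhjf up

# tap device the VM attaches to, inside the namespace
ip -n vm57nhjf tuntap add dev tapvm57nhjf mode tap
ip -n vm57nhjf link set tapvm57nhjf up

# the namespace routes between veth and tap, so it must forward
ip netns exec vm57nhjf sysctl -w net.ipv6.conf.all.forwarding=1

# higher /80 on the veth (with the :: address, contra norms)
ip -n vm57nhjf addr add 2a01:4f9:2b:35a:c6dd::/80 dev vethivm57nhjf

# lower /80 routed to the VM via its link-local address on the tap;
# everything else goes back out over the veth
ip -n vm57nhjf route add 2a01:4f9:2b:35a:c6dc::/80 via fe80::d028:12ff:feb8:a206 dev tapvm57nhjf
ip -n vm57nhjf route add 2000::/3 via fe80::2ce1:51ff:feb6:c56f dev vethivm57nhjf

The point to notice is that nothing inside the namespace owns an address
in the VM's /80; the via-route alone carries the traffic.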
Now, let's adapt for dnsmasq, and allocate an address in-network on the
host, say, ::1:

ip -n vm57nhjf addr add 2a01:4f9:2b:35a:c6dc::1/80 dev tapvm57nhjf

Now we get this:

# ping 2a01:4f9:2b:35a:c6dc::
PING 2a01:4f9:2b:35a:c6dc::(2a01:4f9:2b:35a:c6dc::) 56 data bytes
64 bytes from 2a01:4f9:2b:35a:c6dd::: icmp_seq=1 ttl=64 time=0.059 ms (DIFFERENT ADDRESS!)
64 bytes from 2a01:4f9:2b:35a:c6dd::: icmp_seq=2 ttl=64 time=0.080 ms (DIFFERENT ADDRESS!)
[...etc...]

Which is not so good: rather than forwarding the ping to the VM, to then
get no response, we get a reply from the veth, with an entirely different
source IP address. (My guess is that assigning ::1/80 makes the kernel,
as a router, join the subnet-router anycast address c6dc:: for the
prefix, which it then answers with a unicast source.) I could slather on
some more Linux features to fix this, I imagine (one possible sketch at
the end of this mail), but: complexity.

Pinging the bite taken out of the address space works, though it's
handled by the host, of course, as does the address the VM listens to:

root@Ubuntu-2204-jammy-amd64-base ~ # ping 2a01:4f9:2b:35a:c6dc::1
PING 2a01:4f9:2b:35a:c6dc::1(2a01:4f9:2b:35a:c6dc::1) 56 data bytes
64 bytes from 2a01:4f9:2b:35a:c6dc::1: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from 2a01:4f9:2b:35a:c6dc::1: icmp_seq=2 ttl=64 time=0.075 ms
^C
--- 2a01:4f9:2b:35a:c6dc::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.051/0.063/0.075/0.012 ms

root@Ubuntu-2204-jammy-amd64-base ~ # ping 2a01:4f9:2b:35a:c6dc::2
PING 2a01:4f9:2b:35a:c6dc::2(2a01:4f9:2b:35a:c6dc::2) 56 data bytes
64 bytes from 2a01:4f9:2b:35a:c6dc::2: icmp_seq=1 ttl=63 time=0.494 ms
64 bytes from 2a01:4f9:2b:35a:c6dc::2: icmp_seq=2 ttl=63 time=0.504 ms
64 bytes from 2a01:4f9:2b:35a:c6dc::2: icmp_seq=3 ttl=63 time=0.516 ms
^C
--- 2a01:4f9:2b:35a:c6dc::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss,

A small note: you can see the overhead of virtualization in the ::2
timings, which, as I understand it, comes from trying to batch network
traffic together before toggling in and out of the VM.

But moving on to ::3, or any other higher address the VM is not listening
for, we get another change in behavior:

root@Ubuntu-2204-jammy-amd64-base ~ # ping 2a01:4f9:2b:35a:c6dc::3
PING 2a01:4f9:2b:35a:c6dc::3(2a01:4f9:2b:35a:c6dc::3) 56 data bytes
From 2a01:4f9:2b:35a:c6dd:: icmp_seq=1 Destination unreachable: Address unreachable
From 2a01:4f9:2b:35a:c6dd:: icmp_seq=2 Destination unreachable: Address unreachable
From 2a01:4f9:2b:35a:c6dd:: icmp_seq=3 Destination unreachable: Address unreachable

It's not an entirely offensive change in behavior, but it is /different/,
and somewhat more complex than the former scenario (where packets are
forwarded without anything interesting going on in the host), because the
tap device now has a footprint in the guest VM's network and starts
processing traffic on it. These are the interesting transitions I noticed
making this change for dnsmasq, and they're somewhat orthogonal to there
being a lot of address space in IPv6.
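And, for completeness, the sketch promised above. One way to give the
host its ::1 without the side effects might be to assign it as a /128
host address, so the kernel installs neither the connected /80 route on
the tap nor (on recent kernels, I believe) the subnet-router anycast
membership for the prefix, and forwarding keeps following the existing
static route into the VM. Untested, and it assumes dnsmasq is happy to
work from a /128 assignment, which I haven't verified:

# replace the in-network /80 assignment with a bare host address
ip -n vm57nhjf addr del 2a01:4f9:2b:35a:c6dc::1/80 dev tapvm57nhjf
ip -n vm57nhjf addr add 2a01:4f9:2b:35a:c6dc::1/128 dev tapvm57nhjf

Which rather proves the original point: it may work, but it's one more
thing to itemize out-of-band.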
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss