Joel <jo...@sdf.org> writes:

> The root cause is that Azure Networking currently does not support
> Path MTU Discovery (PMTUD). As more Microsoft websites come up with
> AAAA records, most of those sites use Azure Networking (without Azure
> Front Door) and can cause this issue.

And because PMTUD is not optional, this is spelled "broken". It would be
great if anyone on this list knows people at MS and could get them to
fix it.

While running a test to make sure that NetBSD correctly does what I told
Joel it would do :-) I noticed multiple anomalies, but I'm not sure if
they are actually wrong.

In this case, the topology is:

- lan-client: normal netbsd 10 machine on Ethernet
- router: netbsd 9 Ethernet, and gif0 to HE, mtu 1280 (but HE configured
  to 1480) (another ethernet to a fiber ONT)
- HE endpoint etc.
- vps: netbsd 9 domU at a xen-based hosting service

On router: I removed the "mtu 1480" dynamic routes.

On lan-client, I ran "wget https://vps/foo/bar".

18:29:00.168831 IP6 lan-client.61111 > vps.https: Flags [S], seq 583236747, win 65535, options [mss 1440,nop,wscale 3,sackOK,TS val 1 ecr 0], length 0
18:29:00.168878 IP6 vps.https > lan-client.61111: Flags [S.], seq 942780486, ack 583236748, win 65535, options [mss 1440,nop,wscale 3,sackOK,TS val 1 ecr 1], length 0
18:29:00.252215 IP6 lan-client.61111 > vps.https: Flags [.], ack 1, win 16560, options [nop,nop,TS val 1 ecr 1], length 0

normal open

18:29:00.252229 IP6 lan-client.61111 > vps.https: Flags [P.], seq 1:412, ack 1, win 16560, options [nop,nop,TS val 1 ecr 1], length 411

client part of TLS

18:29:00.252332 IP6 vps.https > lan-client.61111: Flags [.], ack 1, win 16560, options [nop,nop,TS val 2 ecr 1], length 0

Really good question what this ack is for; it doesn't cover 412.
Probably crossing in the mail and sent before the previous client TLS
packet was processed.

QUESTION: is this some scheme to send ack pairs to measure link
capacity?

18:29:00.253813 IP6 vps.https > lan-client.61111: Flags [.], seq 1:1429, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1428
18:29:00.253815 IP6 vps.https > lan-client.61111: Flags [.], seq 1429:2857, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1428
18:29:00.253816 IP6 vps.https > lan-client.61111: Flags [P.], seq 2857:3149, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 292

vps sends packets #1A, 2A, and 3A, a pile of certs. Only 3A fits in the
tunnel.

18:29:00.315724 IP6 tserv1.nyc4.he.net > vps: ICMP6, packet too big, mtu 1480, length 1240

HE's tunnel endpoint objects to 1A, arriving ~62 ms later.

18:29:00.315836 IP6 vps.https > lan-client.61111: Flags [.], seq 1:1409, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
18:29:00.315838 IP6 vps.https > lan-client.61111: Flags [.], seq 1409:2817, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
18:29:00.315840 IP6 vps.https > lan-client.61111: Flags [P.], seq 2817:3149, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 332

vps immediately resends the certs, sliced into 3 packets differently as
1B, 2B, and 3B.

18:29:00.316729 IP6 tserv1.nyc4.he.net > vps: ICMP6, packet too big, mtu 1480, length 1240

HE's tunnel endpoint objects to 2A (the second oversized packet of the
first send; 1B was only 1 ms old and fits in 1480).

18:29:00.316760 IP6 vps.https > lan-client.61111: Flags [.], seq 1:1409, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
18:29:00.316762 IP6 vps.https > lan-client.61111: Flags [.], seq 1409:2817, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
18:29:00.316763 IP6 vps.https > lan-client.61111: Flags [P.], seq 2817:3149, ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 332

vps immediately resends the certs again, with the same split.

QUESTION: is this correct, or should it have somehow recorded that it
already did this?

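As a sanity check on the segment sizes seen so far, here is a minimal
sketch of the header arithmetic. It assumes a 40-byte IPv6 header with no
extension headers, a 20-byte TCP header, and 12 bytes of TCP options (the
"nop,nop,TS" seen on every data segment). The 1500 figure is an assumed
Ethernet MTU at vps, inferred from the advertised mss of 1440; the names
below are made up for illustration.

```python
# Sizes in bytes. These are assumptions matching what the trace shows,
# not values read from any NetBSD source.
IPV6_HEADER = 40           # fixed IPv6 header, no extension headers
TCP_HEADER = 20            # TCP header without options
TCP_TIMESTAMP_OPTION = 12  # 10-byte timestamp option plus two NOPs


def tcp_payload(path_mtu, option_bytes=TCP_TIMESTAMP_OPTION):
    """Largest TCP payload that fits in one packet of size path_mtu."""
    return path_mtu - IPV6_HEADER - TCP_HEADER - option_bytes


# 1500: assumed Ethernet MTU before any "packet too big" arrives.
# 1480: the MTU reported in the ICMP6 message from the tunnel endpoint.
for mtu in (1500, 1480):
    print(mtu, tcp_payload(mtu))
```

With these assumptions, 1500 gives 1428 (the size of 1A and 2A) and 1480
gives 1408 (the size of 1B and 2B), so the resegmentation after the ICMP6
message is at least consistent with vps adopting the reported MTU.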
18:29:00.326324 IP6 lan-client.61111 > vps.https: Flags [.], ack 1, win 16560, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2857:3149}], length 0

ack arrives for packet #3A as SACK, ack point still 1 (the SYN).

18:29:00.394163 IP6 lan-client.61111 > vps.https: Flags [.], ack 1409, win 16384, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2857:3149}], length 0

ack for 1B, and SACKing 3A.

18:29:00.394165 IP6 lan-client.61111 > vps.https: Flags [.], ack 2817, win 16208, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2857:3149}], length 0

ack for 2B, and SACKing 3A. Not acking 2817-2857.

18:29:00.394166 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 16166, options [nop,nop,TS val 2 ecr 2], length 0

full ack for 3B, not SACK.

18:29:00.394166 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 16515, options [nop,nop,TS val 2 ecr 2], length 0

second ack for 3B, not SACK. Yes, the time is the same but tcpdump
really has two lines.

QUESTION: why?

18:29:00.394167 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 16515, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {1:1409}], length 0

dupack for 1B 2nd transmit

18:29:00.394167 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 16515, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {1409:2817}], length 0

dupack for 2B 2nd transmit

18:29:00.394168 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 16515, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2817:3149}], length 0

dupack for 3B 2nd transmit

From here on, it looks mostly normal with full acks, no SACK.

18:29:00.394168 IP6 lan-client.61111 > vps.https: Flags [P.], seq 412:492, ack 3149, win 16560, options [nop,nop,TS val 2 ecr 2], length 80
18:29:00.394452 IP6 vps.https > lan-client.61111: Flags [P.], seq 3149:3436, ack 492, win 16560, options [nop,nop,TS val 2 ecr 2], length 287
18:29:00.394509 IP6 vps.https > lan-client.61111: Flags [P.], seq 3436:3723, ack 492, win 16560, options [nop,nop,TS val 2 ecr 2], length 287
18:29:00.468791 IP6 lan-client.61111 > vps.https: Flags [P.], seq 492:650, ack 3436, win 16524, options [nop,nop,TS val 2 ecr 2], length 158
18:29:00.470602 IP6 vps.https > lan-client.61111: Flags [.], seq 3723:5131, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.470604 IP6 vps.https > lan-client.61111: Flags [.], seq 5131:6539, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.470606 IP6 vps.https > lan-client.61111: Flags [.], seq 6539:7947, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.470609 IP6 vps.https > lan-client.61111: Flags [.], seq 7947:9355, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.470611 IP6 vps.https > lan-client.61111: Flags [.], seq 9355:10763, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.550668 IP6 lan-client.61111 > vps.https: Flags [.], ack 5131, win 16384, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.550756 IP6 lan-client.61111 > vps.https: Flags [.], ack 7947, win 16032, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.550757 IP6 lan-client.61111 > vps.https: Flags [.], ack 7947, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.550757 IP6 lan-client.61111 > vps.https: Flags [.], ack 10763, win 16208, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.550758 IP6 lan-client.61111 > vps.https: Flags [.], ack 10763, win 16560, options [nop,nop,TS val 2 ecr 2], length 0

except there are two acks for 7947 and two for 10763.

QUESTION: why and is this correct?
Could there be reordering?

18:29:00.550760 IP6 vps.https > lan-client.61111: Flags [.], seq 10763:12171, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.550762 IP6 vps.https > lan-client.61111: Flags [.], seq 12171:13579, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.550791 IP6 vps.https > lan-client.61111: Flags [.], seq 13579:14987, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.550793 IP6 vps.https > lan-client.61111: Flags [.], seq 14987:16395, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.550795 IP6 vps.https > lan-client.61111: Flags [.], seq 16395:17803, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
18:29:00.550796 IP6 vps.https > lan-client.61111: Flags [P.], seq 17803:18082, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 279
18:29:00.626999 IP6 lan-client.61111 > vps.https: Flags [.], ack 13579, win 16208, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.627001 IP6 lan-client.61111 > vps.https: Flags [.], ack 13579, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.630529 IP6 lan-client.61111 > vps.https: Flags [.], ack 16395, win 16208, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.630530 IP6 lan-client.61111 > vps.https: Flags [.], ack 18082, win 15997, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.630532 IP6 lan-client.61111 > vps.https: Flags [.], ack 18082, win 16560, options [nop,nop,TS val 2 ecr 2], length 0

Again, extra acks.

18:29:00.630532 IP6 lan-client.61111 > vps.https: Flags [F.], seq 650, ack 18082, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.630563 IP6 vps.https > lan-client.61111: Flags [.], ack 651, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.630665 IP6 vps.https > lan-client.61111: Flags [F.], seq 18082, ack 651, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
18:29:00.698763 IP6 lan-client.61111 > vps.https: Flags [.], ack 18083, win 16560, options [nop,nop,TS val 2 ecr 2], length 0

normal close (where vps acks the FIN before the server shuts down its
half); seems ok.
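
On the "extra acks" question: in each pair above, the two acks carry the
same ack number but a different win value, with the second one back up
near 16560. That would be consistent with the second ack being a window
update after the application read the data, though the trace alone does
not prove it. The following is a rough sketch for pulling such pairs out
of tcpdump text output; the regular expression and the function name are
my own, written against the line format shown here and not against any
tcpdump-provided parser.

```python
import re

# Matches pure ACK lines in the tcpdump text format used in this trace.
ACK_RE = re.compile(
    r"^(\S+) IP6 (\S+) > (\S+): Flags \[\.\], ack (\d+), win (\d+)"
)


def same_ack_different_win(lines):
    """Return (time, ack, first_win, second_win) for back-to-back pure
    ACKs in one direction that repeat the ack number but change win."""
    pairs = []
    prev = None
    for line in lines:
        m = ACK_RE.match(line.strip())
        if not m:
            prev = None  # a non-ACK line breaks the run
            continue
        ts, src, dst = m.group(1), m.group(2), m.group(3)
        ack, win = int(m.group(4)), int(m.group(5))
        if prev and prev[1:4] == (src, dst, ack) and prev[4] != win:
            pairs.append((ts, ack, prev[4], win))
        prev = (ts, src, dst, ack, win)
    return pairs


# Five lines copied from the trace above.
SAMPLE = [
    "18:29:00.550668 IP6 lan-client.61111 > vps.https: Flags [.], ack 5131, win 16384, options [nop,nop,TS val 2 ecr 2], length 0",
    "18:29:00.550756 IP6 lan-client.61111 > vps.https: Flags [.], ack 7947, win 16032, options [nop,nop,TS val 2 ecr 2], length 0",
    "18:29:00.550757 IP6 lan-client.61111 > vps.https: Flags [.], ack 7947, win 16560, options [nop,nop,TS val 2 ecr 2], length 0",
    "18:29:00.550757 IP6 lan-client.61111 > vps.https: Flags [.], ack 10763, win 16208, options [nop,nop,TS val 2 ecr 2], length 0",
    "18:29:00.550758 IP6 lan-client.61111 > vps.https: Flags [.], ack 10763, win 16560, options [nop,nop,TS val 2 ecr 2], length 0",
]

pairs = same_ack_different_win(SAMPLE)
for pair in pairs:
    print(pair)
```

On the sample this reports the 7947 and 10763 pairs, with win going from
16032 and 16208 respectively to 16560. It only lists candidates; whether
they really are window updates would need checking on lan-client itself.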