Joel <jo...@sdf.org> writes:

> The root cause is that Azure Networking currently does not support
> Path MTU Discovery (PMTUD). As more Microsoft websites come up with
> AAAA records, most of those sites use Azure Networking (without Azure
> Front Door) and can cause this issue.

And because PMTUD is not optional, this is spelled "broken".  It would
be great if anyone on this list knows people at MS and could get them to
fix it.

While running a test to make sure that NetBSD correctly does what I told
Joel it would do :-) I noticed multiple anomalies, but I'm not sure if
they are actually wrong.

In this case, the topology is:
  - lan-client: normal netbsd 10 machine on Ethernet
  - router: netbsd 9 Ethernet, and gif0 to he, mtu 1280 (but he configured
    to 1480)
    (another ethernet to a fiber ONT)
  - he endpoint etc.
  - vps: netbsd 9 domU at a xen-based hosting service

On router: I removed the "mtu 1480" dynamic routes.
On lan-client, I ran "wget https://vps/foo/bar".

  18:29:00.168831 IP6 lan-client.61111 > vps.https: Flags [S], seq 583236747, 
win 65535, options [mss 1440,nop,wscale 3,sackOK,TS val 1 ecr 0], length 0
  18:29:00.168878 IP6 vps.https > lan-client.61111: Flags [S.], seq 942780486, 
ack 583236748, win 65535, options [mss 1440,nop,wscale 3,sackOK,TS val 1 ecr 
1], length 0
  18:29:00.252215 IP6 lan-client.61111 > vps.https: Flags [.], ack 1, win 
16560, options [nop,nop,TS val 1 ecr 1], length 0

normal open

  18:29:00.252229 IP6 lan-client.61111 > vps.https: Flags [P.], seq 1:412, ack 
1, win 16560, options [nop,nop,TS val 1 ecr 1], length 411

client part of TLS

  18:29:00.252332 IP6 vps.https > lan-client.61111: Flags [.], ack 1, win 
16560, options [nop,nop,TS val 2 ecr 1], length 0

really good question what this ack is for; it doesn't cover 412.  Probably
crossing in the mail and sent before the previous client TLS packet is
processed.

QUESTION: is this some scheme to send ack pairs to measure link capacity?

  18:29:00.253813 IP6 vps.https > lan-client.61111: Flags [.], seq 1:1429, ack 
412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1428
  18:29:00.253815 IP6 vps.https > lan-client.61111: Flags [.], seq 1429:2857, 
ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1428
  18:29:00.253816 IP6 vps.https > lan-client.61111: Flags [P.], seq 2857:3149, 
ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 292

vps sends packets #1A, 2A, and 3A, pile of certs.  Only 3A fits in the
tunnel.

  18:29:00.315724 IP6 tserv1.nyc4.he.net > vps: ICMP6, packet too big, mtu 
1480, length 1240

he's tunnel endpoint objects to 1A, arriving ~62 ms later.

  18:29:00.315836 IP6 vps.https > lan-client.61111: Flags [.], seq 1:1409, ack 
412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
  18:29:00.315838 IP6 vps.https > lan-client.61111: Flags [.], seq 1409:2817, 
ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
  18:29:00.315840 IP6 vps.https > lan-client.61111: Flags [P.], seq 2817:3149, 
ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 332

vps immediately resends the certs, sliced into 3 packets differently as
1B, 2B, and 3B.
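
The segment sizes in the trace can be checked with a little arithmetic.
This is a rough sketch, not anything taken from the NetBSD sources; the
constant and function names are made up for illustration.  It assumes a
40-byte IPv6 header with no extension headers, a 20-byte TCP header, and
12 bytes of options on data segments (nop,nop,TS).  The reading of the
ICMP6 "length 1240" as the 1280-byte minimum MTU minus the IPv6 header
is my assumption, not something the trace states.

```python
IPV6_HEADER = 40     # fixed IPv6 header, no extension headers assumed
TCP_HEADER = 20      # TCP header without options
TS_OPTION = 12       # nop,nop,TS as seen on every data segment in the trace
IPV6_MIN_MTU = 1280  # minimum IPv6 link MTU


def tcp_payload(mtu, options=TS_OPTION):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER - options


def packet_size(payload, options=TS_OPTION):
    """On-the-wire IPv6 packet size for a given TCP payload length."""
    return payload + options + TCP_HEADER + IPV6_HEADER


print(tcp_payload(1500))           # 1428, the size of 1A and 2A
print(tcp_payload(1480))           # 1408, the size of 1B and 2B
print(packet_size(292))            # 364, so 3A fits under 1480
print(IPV6_MIN_MTU - IPV6_HEADER)  # 1240, matching the ICMP6 length
```

So the first cut matches a 1500-byte path (mss 1440 less 12 bytes of
timestamp option), and the second cut matches the 1480 reported in the
packet too big message.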

  18:29:00.316729 IP6 tserv1.nyc4.he.net > vps: ICMP6, packet too big, mtu 
1480, length 1240

he's tunnel endpoint objects to 1B.

  18:29:00.316760 IP6 vps.https > lan-client.61111: Flags [.], seq 1:1409, ack 
412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
  18:29:00.316762 IP6 vps.https > lan-client.61111: Flags [.], seq 1409:2817, 
ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 1408
  18:29:00.316763 IP6 vps.https > lan-client.61111: Flags [P.], seq 2817:3149, 
ack 412, win 16560, options [nop,nop,TS val 2 ecr 1], length 332

vps immediately resends the certs again, with the same split.

QUESTION: is this correct, or should it have somehow recorded that it
already did this?

  18:29:00.326324 IP6 lan-client.61111 > vps.https: Flags [.], ack 1, win 
16560, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2857:3149}], length 0

ack arrives for 3A as SACK, ack point still 1 (the SYN).

  18:29:00.394163 IP6 lan-client.61111 > vps.https: Flags [.], ack 1409, win 
16384, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2857:3149}], length 0

ack for 1B, and SACKing 3A

  18:29:00.394165 IP6 lan-client.61111 > vps.https: Flags [.], ack 2817, win 
16208, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2857:3149}], length 0

ack for 2B, and SACKing 3A.  Not acking 2817-2857.

  18:29:00.394166 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 
16166, options [nop,nop,TS val 2 ecr 2], length 0

full ack for 3B, not SACK.

  18:29:00.394166 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 
16515, options [nop,nop,TS val 2 ecr 2], length 0

second ack for 3B, not SACK.  Yes, the time is the same but tcpdump
really has two lines.

QUESTION: why?

  18:29:00.394167 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 
16515, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {1:1409}], length 0

dupack for 1B 2nd transmit

  18:29:00.394167 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 
16515, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {1409:2817}], length 0

dupack for 2B 2nd transmit

  18:29:00.394168 IP6 lan-client.61111 > vps.https: Flags [.], ack 3149, win 
16515, options [nop,nop,TS val 2 ecr 2,nop,nop,sack 1 {2817:3149}], length 0

dupack for 3B 2nd transmit
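
The ack/SACK sequence above can be reproduced with a toy receiver model.
This is only a sketch under assumptions: that 1A and 2A never reached
lan-client, that the segments arrived in the order 3A, 1B, 2B, 3B and
then the second copies of 1B, 2B, 3B, and that wholly duplicate segments
are reported as a single SACK block (D-SACK style).  The function name
is made up, it handles only the simple overlaps that occur here, and it
is not how NetBSD implements any of this.

```python
def receiver_acks(segments, rcv_nxt=1):
    """Return (cumulative_ack, sack_block_or_None) for each arriving segment.

    segments are (start, end) sequence ranges, end exclusive, as tcpdump
    prints them.
    """
    held = []   # out-of-order ranges above rcv_nxt
    acks = []
    for start, end in segments:
        if end <= rcv_nxt:
            # Entirely duplicate data: report the duplicate range itself.
            acks.append((rcv_nxt, (start, end)))
            continue
        if start > rcv_nxt:
            held.append((start, end))   # hole below this segment
        else:
            rcv_nxt = end               # in-order data advances the ack point
        # Absorb any held ranges that the ack point has now reached.
        held.sort()
        remaining = []
        for h_start, h_end in held:
            if h_start <= rcv_nxt:
                rcv_nxt = max(rcv_nxt, h_end)
            else:
                remaining.append((h_start, h_end))
        held = remaining
        acks.append((rcv_nxt, held[0] if held else None))
    return acks


arrivals = [
    (2857, 3149),                            # 3A
    (1, 1409), (1409, 2817), (2817, 3149),   # 1B, 2B, 3B
    (1, 1409), (1409, 2817), (2817, 3149),   # second transmit of 1B, 2B, 3B
]
for ack, sack in receiver_acks(arrivals):
    print(ack, sack)
```

Under those assumptions the model gives ack 1, 1409, 2817 each with
SACK {2857:3149}, then a plain ack 3149, then three acks of 3149 with
SACK blocks {1:1409}, {1409:2817} and {2817:3149}.  It does not produce
the second plain ack 3149 (win 16515) seen in the trace.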

from here on, looks mostly normal with full acks, no SACK.

  18:29:00.394168 IP6 lan-client.61111 > vps.https: Flags [P.], seq 412:492, 
ack 3149, win 16560, options [nop,nop,TS val 2 ecr 2], length 80
  18:29:00.394452 IP6 vps.https > lan-client.61111: Flags [P.], seq 3149:3436, 
ack 492, win 16560, options [nop,nop,TS val 2 ecr 2], length 287
  18:29:00.394509 IP6 vps.https > lan-client.61111: Flags [P.], seq 3436:3723, 
ack 492, win 16560, options [nop,nop,TS val 2 ecr 2], length 287
  18:29:00.468791 IP6 lan-client.61111 > vps.https: Flags [P.], seq 492:650, 
ack 3436, win 16524, options [nop,nop,TS val 2 ecr 2], length 158
  18:29:00.470602 IP6 vps.https > lan-client.61111: Flags [.], seq 3723:5131, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.470604 IP6 vps.https > lan-client.61111: Flags [.], seq 5131:6539, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.470606 IP6 vps.https > lan-client.61111: Flags [.], seq 6539:7947, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.470609 IP6 vps.https > lan-client.61111: Flags [.], seq 7947:9355, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.470611 IP6 vps.https > lan-client.61111: Flags [.], seq 9355:10763, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.550668 IP6 lan-client.61111 > vps.https: Flags [.], ack 5131, win 
16384, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.550756 IP6 lan-client.61111 > vps.https: Flags [.], ack 7947, win 
16032, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.550757 IP6 lan-client.61111 > vps.https: Flags [.], ack 7947, win 
16560, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.550757 IP6 lan-client.61111 > vps.https: Flags [.], ack 10763, win 
16208, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.550758 IP6 lan-client.61111 > vps.https: Flags [.], ack 10763, win 
16560, options [nop,nop,TS val 2 ecr 2], length 0

except there are two acks for 7947 and two for 10763.

QUESTION: why and is this correct?  Could there be reordering?
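
One way to look at the paired acks is to compare the advertised windows.
A small sketch, assuming tcpdump is printing the raw (unscaled) window
field, that the wscale 3 from the SYN applies, and that 16560 is the
window with an empty receive buffer; the names are made up for
illustration.

```python
WSCALE = 3           # from the SYN options
FULL_WINDOW = 16560  # window advertised when nothing is buffered (assumed)

# (ack, win) from lan-client, 18:29:00.550668 through 18:29:00.550758
acks = [
    (5131, 16384),
    (7947, 16032),
    (7947, 16560),
    (10763, 16208),
    (10763, 16560),
]


def buffered_bytes(win, full=FULL_WINDOW, wscale=WSCALE):
    """Bytes by which the advertised window is below the full window."""
    return (full - win) << wscale


for ack, win in acks:
    print(ack, win, buffered_bytes(win))
# 5131 16384 1408
# 7947 16032 4224
# 7947 16560 0
# 10763 16208 2816
# 10763 16560 0
```

4224 is three 1408-byte segments (3723:7947) and 2816 is two
(7947:10763).  That would be consistent with the second ack of each pair
being a window update sent after the application read the data, rather
than a duplicate ack, but that is one reading and the trace alone does
not prove it.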

  18:29:00.550760 IP6 vps.https > lan-client.61111: Flags [.], seq 10763:12171, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.550762 IP6 vps.https > lan-client.61111: Flags [.], seq 12171:13579, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.550791 IP6 vps.https > lan-client.61111: Flags [.], seq 13579:14987, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.550793 IP6 vps.https > lan-client.61111: Flags [.], seq 14987:16395, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.550795 IP6 vps.https > lan-client.61111: Flags [.], seq 16395:17803, 
ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 1408
  18:29:00.550796 IP6 vps.https > lan-client.61111: Flags [P.], seq 
17803:18082, ack 650, win 16560, options [nop,nop,TS val 2 ecr 2], length 279
  18:29:00.626999 IP6 lan-client.61111 > vps.https: Flags [.], ack 13579, win 
16208, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.627001 IP6 lan-client.61111 > vps.https: Flags [.], ack 13579, win 
16560, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.630529 IP6 lan-client.61111 > vps.https: Flags [.], ack 16395, win 
16208, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.630530 IP6 lan-client.61111 > vps.https: Flags [.], ack 18082, win 
15997, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.630532 IP6 lan-client.61111 > vps.https: Flags [.], ack 18082, win 
16560, options [nop,nop,TS val 2 ecr 2], length 0

Again extra acks

  18:29:00.630532 IP6 lan-client.61111 > vps.https: Flags [F.], seq 650, ack 
18082, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.630563 IP6 vps.https > lan-client.61111: Flags [.], ack 651, win 
16560, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.630665 IP6 vps.https > lan-client.61111: Flags [F.], seq 18082, ack 
651, win 16560, options [nop,nop,TS val 2 ecr 2], length 0
  18:29:00.698763 IP6 lan-client.61111 > vps.https: Flags [.], ack 18083, win 
16560, options [nop,nop,TS val 2 ecr 2], length 0

normal close (where vps acks the FIN before the server shuts down its
half); seems ok.
