On 19 January 2018 at 13:48, Mike Hammett <na...@ics-il.net> wrote:

> Other than people improperly blocking ICMP, when does PMTUD not work?
> Honest question, not troll.

It can break under _certain_ scenarios with Anycast.
It can break under _certain_ scenarios in v6 with ECMP.
It can break across an LB in L4 mode, when a real server behind the LB has
an unexpected MSS.

None of these scenarios is the norm, obviously, but PMTUD does have some
edge cases.

/Ruairi

> -----
> Mike Hammett
> Intelligent Computing Solutions
> http://www.ics-il.com
>
> Midwest-IX
> http://www.midwest-ix.com
>
> ----- Original Message -----
>
> From: "Mikael Abrahamsson" <swm...@swm.pp.se>
> To: "Michael Crapse" <mich...@wi-fiber.io>
> Cc: "NANOG list" <nanog@nanog.org>
> Sent: Friday, January 19, 2018 1:22:02 AM
> Subject: Re: MTU to CDN's
>
> On Thu, 18 Jan 2018, Michael Crapse wrote:
>
> > I don't mind letting the client premises routers break down 9000 byte
> > packets. My ISP controls end to end connectivity. 80% of people even let
> > our techs change settings on their computer, this would allow me to give
> > ~5% increase in speeds, and less network congestion for end users for a
> > one time $60 service many people would want. It's also where the internet
> > should be heading... Not to beat a dead horse (re: IPv6) but why hasn't
> > the entire internet just moved to 9000 (or 9600 L2) byte MTU? It was
> > created for the jump to gigabit... That's 4 orders of magnitude ago. The
> > internet backbone shouldn't be shuffling around 1500 byte packets at
> > 1 Tbps. That means if you want to layer 3 that data, you need a router
> > capable of more than half a billion packets/s forwarding capacity. On the
> > other hand, with even just a 9000 byte MTU, TCP/IP overhead is reduced
> > 6 fold, and forwarding capacity needs just 100 or so Mpps capacity.
> > Routers that forward at that rate are found for less than $2k.
>
> As usual, there are 5-10 (or more) factors playing into this. Some, in
> random order:
>
> 1. IEEE hasn't standardised > 1500 byte ethernet packets
> 2. DSL/WIFI chips typically don't support > ~2300 because reasons.
> 3. Because of 2, most SoC ethernet chips don't either
> 4. There is no standardised way to understand/probe the L2 MTU to your
>    next hop (ARP/ND and probing if the value actually works)
> 5. PMTUD doesn't always work.
> 6. PLPMTUD generally hasn't been implemented in either protocols or
>    hosts.
> 7. Some implementations have been optimized to work on packets < 2000
>    bytes and actually perform worse if they have to support larger
>    packets (they will allocate 2k buffer memory per packet); 9k is
>    ill-fitting across 2^X values
> 8. Because of all the above reasons, mixed-MTU LAN doesn't work, and it's
>    going to be mixed-MTU unless you control all devices (which is
>    typically not the case outside of the datacenter).
> 9. The PPS problem in hosts and routers was solved by hardware offloading
>    to NICs and forwarding NPUs/ASICs with very high lookup speeds, to the
>    point where PPS was no longer a big problem.
>
> On the value to choose for "large MTU", 9000 for edge and 9180 for core is
> what I advocate, after a non-trivial amount of looking into this. All major
> core routing platforms work with 9180 (with JunOS only supporting this
> after 2015 or something). So if we'd want to standardise on an MTU that all
> devices should support, then it's 9180, but we'd typically use 9000 in RA
> to send to devices.
>
> If we want a higher MTU to be deployable across the Internet, we need to
> make it incrementally deployable. Some key things to achieve that:
>
> 1. Get something like
>    https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented.
> 2. Go to the IETF and get a document published that advises all protocols
>    to support PLPMTUD (RFC4821)
>
> 1 to enable mixed-MTU LANs.
> 2 to enable large MTU hosts to actually be able to communicate when PMTUD
> doesn't work.
>
> With this in place (wait ~10 years), larger MTU is now incrementally
> deployable which means it'll be deployable on the Internet, and IEEE might
> actually agree to standardise > 1500 byte packets for ethernet.
>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se
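
As a rough sanity check on the packet-rate and overhead figures quoted above, here is a minimal Python sketch. It is only back-of-envelope arithmetic, not a model of any real router. The assumptions are mine, not from the thread: IPv4 + TCP headers with no options (40 bytes), 38 bytes of Ethernet framing per packet (preamble, header, FCS and inter-frame gap, no VLAN tag), and a link filled entirely with full-size packets. The function names are made up for illustration.

```python
# Back-of-envelope packet rate and TCP payload efficiency versus MTU.
# Assumptions (not from the thread): 40 bytes of IPv4 + TCP headers without
# options, and 38 bytes of Ethernet framing per packet (7 preamble + 1 SFD
# + 14 header + 4 FCS + 12 inter-frame gap).

IP_TCP_HEADERS = 40    # bytes, inside the MTU
ETHERNET_FRAMING = 38  # bytes, outside the MTU


def packets_per_second(link_bps, mtu):
    """Full-size packets per second needed to fill a link of link_bps."""
    bits_on_wire = (mtu + ETHERNET_FRAMING) * 8
    return link_bps / bits_on_wire


def payload_efficiency(mtu):
    """Fraction of on-wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_FRAMING)


if __name__ == "__main__":
    link_bps = 1e12  # 1 Tbps, the backbone rate mentioned in the quoted mail
    for mtu in (1500, 9000):
        pps = packets_per_second(link_bps, mtu)
        eff = payload_efficiency(mtu)
        print(f"{mtu:>5} byte MTU: {pps / 1e6:6.1f} Mpps, {eff:.1%} payload")
```

Under these assumptions, filling 1 Tbps takes roughly 81 Mpps at 1500 bytes and roughly 14 Mpps at 9000 bytes, both lower than the figures in the quoted mail. The payload gain from the larger MTU works out to a little over 4%, close to the ~5% mentioned. Links carrying many small packets would need proportionally higher packet rates.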