On Thu, Mar 24, 2022, 3:11 PM Joel M. Halpern <j...@joelhalpern.com> wrote:
> I do remember token ring. (I was working from 1983 for folks who
> delivered 50 megabits starting in 1976, and built some of the best FDDI
> around at the time.)
>
> I am not claiming that increasing the MTU from 1500 to 9K did nothing.
> I am claiming that diminishing returns has distinctly set in.
> If the Data Center folks (who tend these days to have the highest
> demand) really want a 64K link, they would have one.

Joel,

Indeed. Google, at least, is looking into it, at least insofar as getting bigger packets for GRO/GSO. See https://netdevconf.info/0x15/session.html?BIG-TCP

Tom

> They don't. They
> prefer to use Ethernet.
> The improvement via increasing the MTU further runs into many obstacles,
> including such issues as error detection code coverage, application
> desired communication size, retransmission costs, and on and on.
> Yes, they can all be overcome. But the returns get smaller and smaller.
>
> So absent real evidence that there is a problem needing the network
> stack and protocol to change, I just don't see this (IP Parcels) as
> providing enough benefit to justify the work.
>
>
> Yours,
> Joel
>
> On 3/24/2022 3:05 PM, Templin (US), Fred L wrote:
> > Hi Joel,
> >
> >> -----Original Message-----
> >> From: Joel M. Halpern [mailto:j...@joelhalpern.com]
> >> Sent: Thursday, March 24, 2022 11:41 AM
> >> To: Templin (US), Fred L <fred.l.temp...@boeing.com>
> >> Cc: int-area <int-area@ietf.org>
> >> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>
> >> This exchange seems to assume facts not in evidence.
> >
> > It is a fact that back in the 1980's the architects took simple token ring,
> > changed the over-the-wire coding to 4B/5B, replaced the copper with
> > fiber and then boosted the MTU by a factor of 3 and called it FDDI. They
> > were able to claim what at the time was an astounding 100Mbps (i.e., in
> > comparison to the 10Mbps Ethernet of the day), but the performance
> > gain was largely due to the increase in the MTU. They told me: "Fred,
> > go figure out the path MTU problem", and they said: "go talk to Jeff
> > Mogul out in Palo Alto who knows something about it". But, then, the
> > Path MTU discovery group took a left turn at Albuquerque and left the
> > Internet as a tiny MTU wasteland. We have the opportunity to fix all
> > of that now - so, let's get it right for once.
> >
> > Fred
> >
> >
> >>
> >> And the whole premise is spending resources in other parts of the
> >> network for a marginal diminishing return in the hosts.
> >>
> >> It simply does not add up.
> >>
> >> Yours,
> >> Joel
> >>
> >> On 3/24/2022 2:19 PM, Templin (US), Fred L wrote:
> >>>> The category 1) links are not yet in existence, but once parcels start to
> >>>> enter the mainstream innovation will drive the creation of new kinds of
> >>>> data links (1TB Ethernet?) that will be rolled out as new hardware.
> >>>
> >>> I want to put a gold star next to the above. AFAICT, pushing the MTU and
> >>> implementing IP parcels can get us to 1TB Ethernet practically overnight.
> >>> Back in the 1980's, FDDI proved that pushing to larger MTUs could boost
> >>> throughput without changing the speed of light, so why wouldn't the same
> >>> concept work for Ethernet in the modern era?
> >>>
> >>> Fred
> >>>
> >>>> -----Original Message-----
> >>>> From: Int-area [mailto:int-area-boun...@ietf.org] On Behalf Of Templin (US), Fred L
> >>>> Sent: Thursday, March 24, 2022 9:45 AM
> >>>> To: Tom Herbert <t...@herbertland.com>
> >>>> Cc: int-area <int-area@ietf.org>; Eggert, Lars <l...@netapp.com>; l...@eggert.org
> >>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>>>
> >>>> Hi Tom - responses below:
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Tom Herbert [mailto:t...@herbertland.com]
> >>>>> Sent: Thursday, March 24, 2022 9:09 AM
> >>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com>
> >>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; l...@eggert.org
> >>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>>>>
> >>>>> On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L
> >>>>> <fred.l.temp...@boeing.com> wrote:
> >>>>>>
> >>>>>> Tom - see below:
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Tom Herbert [mailto:t...@herbertland.com]
> >>>>>>> Sent: Thursday, March 24, 2022 6:22 AM
> >>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com>
> >>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; l...@eggert.org
> >>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>>>>>>
> >>>>>>> On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L
> >>>>>>> <fred.l.temp...@boeing.com> wrote:
> >>>>>>>>
> >>>>>>>> Tom, looks like you have switched over to HTML which can be a real conversation-killer.
> >>>>>>>>
> >>>>>>>> But, to some points you raised that require a response:
> >>>>>>>>
> >>>>>>>>> You can't turn off UDP checksums for IPv6 (except for narrow case of encapsulation).
> >>>>>>>>
> >>>>>>>> That sounds like a good reason to continue to use IPv4 – at least as far as end system
> >>>>>>>> addressing is concerned – right?
> >>>>>>>
> >>>>>>>
> >>>>>>> Not at all. All NICs today provide checksum offload and so it's
> >>>>>>> basically zero cost to perform the UDP checksum. The fact that we
> >>>>>>> don't have to do extra checks on the UDPv6 checksum field to see if
> >>>>>>> it's zero actually is a performance improvement over UDPv4. (btw, I
> >>>>>>> will present implementation of the Internet checksum at TSVWG Friday,
> >>>>>>> this will include discussion of checksum offloads).
> >>>>>>
> >>>>>> Actually, my assertion wasn't good to begin with because for IPv6 even if UDP
> >>>>>> checksums are turned off the OMNI encapsulation layer includes a checksum
> >>>>>> that ensures the integrity of the IPv6 header. UDP checksums off for IPv6 when
> >>>>>> OMNI encapsulation is used is perfectly fine.
> >>>>>>
> >>>>> I assume you are referring to RFC6935 and RFC6936 that allow the UDPv6
> >>>>> checksum to be zero for tunneling with a very constrained set of conditions.
> >>>>>
> >>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW could do it. If it's something like CRC32 then probably not.
> >>>>>>>>
> >>>>>>>> The integrity check is covered in RFC5327, and I honestly haven’t had a chance to
> >>>>>>>> look at that myself yet.
> >>>>>>>>
> >>>>>>>>> LTP is a nice experiment, but I'm more interested as to the interaction between IP parcels and TCP or QUIC.
> >>>>>>>>
> >>>>>>>> Please be aware that while LTP may seem obscure at the moment that may be changing now
> >>>>>>>> that the core DTN standards have been published. As DTN use becomes more widespread I
> >>>>>>>> think we can see LTP also come into wider adoption.
> >>>>>>>
> >>>>>>>
> >>>>>>> My assumption is that IP parcels is intended to be a general solution
> >>>>>>> of all protocols. Maybe in the next draft you could discuss the
> >>>>>>> details of TCP in IP parcels including how to offload the TCP
> >>>>>>> checksum.
> >>>>>>
> >>>>>> I could certainly add that. For TCP, each of the concatenated segments would
> >>>>>> include its own TCP header with checksum field included. Any hardware that
> >>>>>> knows the structure of an IP Parcel can then simply do the TCP checksum
> >>>>>> offload function for each segment.
> >>>>>
> >>>>> To be honest, the odds of ever getting support in NIC hardware for IP
> >>>>> parcels are extremely slim. Hardware vendors are driven by economics,
> >>>>> so the only way they would do that would be to demonstrate widespread
> >>>>> deployment of the protocol. But even then, with all the legacy
> >>>>> hardware in deployment it will take many years before there's any
> >>>>> appreciable traction. IMO, the better approach is to figure out how to
> >>>>> leverage the existing hardware features for use with IP parcels.
> >>>>
> >>>> There will be two kinds of links that will need to be "Parcel-capable":
> >>>> 1) Edge network (physical) links that natively forward large parcels, and
> >>>> 2) OMNI (virtual) links that forward parcels using encapsulation and
> >>>> fragmentation.
> >>>>
> >>>> The category 1) links are not yet in existence, but once parcels start to
> >>>> enter the mainstream innovation will drive the creation of new kinds of
> >>>> data links (1TB Ethernet?) that will be rolled out as new hardware. And
> >>>> that new hardware can be made to understand the structure of parcels
> >>>> from the beginning. The category 2) links might take a large parcel from
> >>>> the upper layers on the local node (or one that has been forwarded by
> >>>> a parcel-capable link) and break it down into smaller sub-parcels then
> >>>> apply IP fragmentation to each sub-parcel and send the fragments to an
> >>>> OMNI link egress node. You know better than me how checksum offload
> >>>> could be applied in an environment like that.
> >>>>
> >>>>>>>>> There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some
> >>>>>>>>> deployed devices required the IPID be set on receive, and setting IPID with DF equals to 1 is thought to be innocuous. You may
> >>>>>>>>> want to look at Alex Duyck's papers on UDP GSO, he wrote a lot of code in this area.
> >>>>>>>>
> >>>>>>>> RFC6864 has quite a bit to say about coding IP ID with DF=1 – mostly in the negative.
> >>>>>>>> But, what I have seen in the linux code seems to indicate that there is not even any
> >>>>>>>> coordination between the GSO source and the GRO destination – instead, GRO simply
> >>>>>>>> starts gluing together packets that appear to have consecutive IP IDs without ever first
> >>>>>>>> checking that they were sent by a peer that was earnestly doing GSO. These aspects
> >>>>>>>> would make it very difficult to work GSO/GRO into an IETF standard, plus it doesn’t
> >>>>>>>> work for IPv6 at all where there is no IP ID included by default. IP Parcels addresses
> >>>>>>>> all of these points, and can be made into a standard.
> >>>>>>>
> >>>>>>>
> >>>>>>> Huh? GRO/GSO works perfectly fine with IPv6.
> >>>>>>
> >>>>>> Where is the spec for that? My understanding is that GSO/GRO leverages the
> >>>>>> IP ID for IPv4. But, for IPv6, there is no IP ID unless you include a Fragment Header.
> >>>>>> Does IPv6 somehow do GSO/GRO differently?
> >>>>>>
> >>>>>
> >>>>> GRO and GSO don't use the IPID to match a flow. The primary match is
> >>>>> the TCP 4-tuple.
> >>>>
> >>>> Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-port, proto) is what is used
> >>>> to match the flow. But, you need more than that in order to correctly paste
> >>>> back together with GRO the segments of an original ULP buffer that was
> >>>> broken down by GSO - you need Identifications and/or other markings in
> >>>> the IP headers to give a reassembly context. Otherwise, GRO might end
> >>>> up gluing together old and new pieces of ULP data and/or impart a lot of
> >>>> reordering. IP Parcels have well behaved Identifications and Parcel IDs so
> >>>> that the original ULP buffer context is honored during reassembly.
> >>>>
> >>>>> There's also another possibility with IPv6-- use jumbograms. For
> >>>>> instance, instead of GRO reassembling segments up to a 64K packet, it
> >>>>> could be modified to reassemble up to a 4G packet using IPv6
> >>>>> jumbograms where one really big packet is given to the stack.
> >>>>>
> >>>>> But we probably don't even need jumbograms for that. In Linux, GRO
> >>>>> might be taught to reassemble up to 4G super packet and set a flag bit
> >>>>> in the skbuf to ignore the IP payload field and get the length from
> >>>>> the skbuf len field (as though a jumbogram was received). This trick
> >>>>> would work for IPv4 and IPv6 and GSO as well. It should also work for TSO
> >>>>> if the device takes the IP payload length to be that for each segment.
> >>>>
> >>>> Yes, I was planning to give that a try to see what kind of performance
> >>>> can be gotten with GSO/GRO when you exceed 64KB. But, my concern
> >>>> with GSO/GRO is that the reassembly is (relatively) unguided and
> >>>> haphazard and can result in mis-ordered concatenations. And, there is
> >>>> no protocol by which the GRO receiver can infer that the things it is
> >>>> gluing together actually originated from a sender that is earnestly doing
> >>>> GSO. So, I do not see how GSO/GRO as I see it in the implementation
> >>>> could be made into a standard, whereas there is a clear path for
> >>>> standardizing IP parcels.
> >>>>
> >>>> Another thing I forgot to mention is that in my experiments with GSO/GRO
> >>>> I found that it won't let me set a GSO segment size that would cause the
> >>>> resulting IP packets to exceed the path MTU (i.e., it won't allow fragmentation).
> >>>> I fixed that by configuring IPv4-in-IPv6 encapsulation per RFC2473 and then
> >>>> allowed the IPv6 layer to apply fragmentation to the encapsulated packet.
> >>>> That way, I can use IPv4 GSO segment sizes up to ~64KB.
> >>>>
> >>>> Fred
> >>>>
> >>>>>
> >>>>> Tom
> >>>>>
> >>>>>> Thanks - Fred
> >>>>>>
> >>>>>>> Tom
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Fred
> >>>>>>>>
> >>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com]
> >>>>>>>> Sent: Wednesday, March 23, 2022 9:37 AM
> >>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com>
> >>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; l...@eggert.org
> >>>>>>>> Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for end systems
> >>>>>>>>
> >>>>>>>> EXT email: be mindful of links/attachments.
> >>>>>>>>
> >>>>>>>> On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L <fred.l.temp...@boeing.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Tom,
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com]
> >>>>>>>>> Sent: Wednesday, March 23, 2022 6:19 AM
> >>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com>
> >>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org; l...@eggert.org
> >>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L
> >>>>>>>>> <fred.l.temp...@boeing.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Tom, see below:
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com]
> >>>>>>>>>>> Sent: Tuesday, March 22, 2022 10:00 AM
> >>>>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com>
> >>>>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org
> >>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L
> >>>>>>>>>>> <fred.l.temp...@boeing.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Lars, I did a poor job of answering your question. One of the most important aspects of
> >>>>>>>>>>>> IP Parcels in relation to TSO and GSO/GRO is that transports get to use a full 4MB buffer
> >>>>>>>>>>>> instead of the 64KB limit in current practices. This is possible due to the IP Parcel jumbo
> >>>>>>>>>>>> payload option encapsulation which provides a 32-bit length field instead of just a 16-bit.
> >>>>>>>>>>>> By allowing the transport to present the IP layer with a buffer of up to 4MB, it reduces
> >>>>>>>>>>>> the overhead, minimizes system calls and interrupts, etc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, yes, IP Parcels is very much about improving the performance for end systems in
> >>>>>>>>>>>> comparison with current practice (GSO/GRO and TSO).
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Fred,
> >>>>>>>>>>>
> >>>>>>>>>>> The nice thing about TSO/GSO/GRO is that they don't require any
> >>>>>>>>>>> changes to the protocol as just implementation techniques, also
> >>>>>>>>>>> they're one sided optimizations meaning for instance that TSO can be
> >>>>>>>>>>> used at the sender without requiring GRO to be used at the receiver.
> >>>>>>>>>>> My understanding is that IP parcels requires new protocol that would
> >>>>>>>>>>> need to be implemented on both endpoints and possibly in some routers.
> >>>>>>>>>>
> >>>>>>>>>> It is not entirely true that the protocol needs to be implemented on both
> >>>>>>>>>> endpoints. Sources that send IP Parcels send them into a Parcel-capable path
> >>>>>>>>>> which ends at either the final destination or a router for which the next hop is
> >>>>>>>>>> not Parcel-capable. If the Parcel-capable path extends all the way to the final
> >>>>>>>>>> destination, then the Parcel is delivered to the destination which knows how
> >>>>>>>>>> to deal with it. If the Parcel-capable path ends at a router somewhere in the
> >>>>>>>>>> middle, the router opens the Parcel and sends each enclosed segment as an
> >>>>>>>>>> independent IP packet. The final destination is then free to apply GRO to the
> >>>>>>>>>> incoming IP packets even if it does not understand Parcels.
> >>>>>>>>>>
> >>>>>>>>>> IP Parcels is about efficient shipping and handling just like the major online
> >>>>>>>>>> retailer service model I described during the talk. The goal is to deliver the
> >>>>>>>>>> fewest and largest possible parcels to the final destination rather than
> >>>>>>>>>> delivering lots of small IP packets. It is good for the network and good for
> >>>>>>>>>> the end systems both. If this were not true, then Amazon would send the
> >>>>>>>>>> consumer 50 small boxes with 1 item each instead of 1 larger box with all
> >>>>>>>>>> 50 items inside. And, we all know what they would choose to do.
> >>>>>>>>>>
> >>>>>>>>>>> Do you have data that shows the benefits of IP Parcels in light of
> >>>>>>>>>>> these requirements?
> >>>>>>>>>>
> >>>>>>>>>> I have data that shows that GSO/GRO is good for packaging sizes up to 64KB
> >>>>>>>>>> even if the enclosed segments will require IP fragmentation upon transmission.
> >>>>>>>>>> The data implies that even larger packaging sizes (up to a maximum of 4MB)
> >>>>>>>>>> would be better still.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Fred,
> >>>>>>>>>
> >>>>>>>>> You seem to be only looking at the problem from a per packet cost
> >>>>>>>>> point of view. There is also per byte cost, particularly in the
> >>>>>>>>> computation of the TCP/UDP checksum. The cost is hidden in modern
> >>>>>>>>> implementations by checksum offload, and for segmentation offload we
> >>>>>>>>> have methods to preserve the utility of checksum offload. IP parcels
> >>>>>>>>> will have to also leverage checksum offload, because if the checksum
> >>>>>>>>> is not offloaded then the cost of computing the payload checksum in
> >>>>>>>>> CPU would dwarf any benefits we'd get by using segments larger than
> >>>>>>>>> 64K.
> >>>>>>>>
> >>>>>>>> There is plenty of opportunity to apply hardware checksum offload since
> >>>>>>>> the structure of a Parcel will be very standard. My experiments have been
> >>>>>>>> with a protocol called LTP which is layered over UDP/IP as some other
> >>>>>>>> upper layer protocols are. LTP includes a segment-by-segment checksum
> >>>>>>>> that is used at its level in the absence of lower layer integrity checks, so
> >>>>>>>> for larger Parcels LTP would use that and turn off UDP checksums
> >>>>>>>> altogether.
> >>>>>>>>
> >>>>>>>> You can't turn off UDP checksums for IPv6 (except for narrow case of encapsulation).
> >>>>>>>>
> >>>>>>>> As far as I am aware, there are currently no hardware
> >>>>>>>> checksum offload implementations available for calculating the
> >>>>>>>> LTP checksums.
> >>>>>>>>
> >>>>>>>> If it's a standard per packet Internet checksum then a lot of HW could do it. If it's something like CRC32 then probably not.
> >>>>>>>>
> >>>>>>>> LTP is a nice experiment, but I'm more interested as to the interaction between IP parcels and TCP or QUIC.
> >>>>>>>>
> >>>>>>>> Speaking of standard, AFAICT GSO/GRO are doing something very
> >>>>>>>> non-standard. GSO seems to be coding the IP ID field in the IPv4
> >>>>>>>> headers of packets with DF=1 which goes against RFC 6864. When
> >>>>>>>> DF=1, GSO cannot simply claim the IP ID and code it as if there were
> >>>>>>>> some sort of protocol. Or, if it does, there would be no way to
> >>>>>>>> standardize it.
> >>>>>>>>
> >>>>>>>> There was quite a bit of work and discussion on this in Linux. I believe the deviation from the standard was motivated by some deployed
> >>>>>>>> devices required the IPID be set on receive, and setting IPID with DF equals to 1 is thought to be innocuous. You may want to look at Alex
> >>>>>>>> Duyck's papers on UDP GSO, he wrote a lot of code in this area.
> >>>>>>>>
> >>>>>>>> Tom
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Fred
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Tom
> >>>>>>>>>
> >>>>>>>>>> Fred
> >>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Tom
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks - Fred
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> Int-area mailing list
> >>>>>>>>>>>> Int-area@ietf.org
> >>>>>>>>>>>> https://www.ietf.org/mailman/listinfo/int-area
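For readers following the checksum part of the thread (the per-byte cost Tom raises, and Fred's point that each concatenated segment would carry its own checksum), here is a minimal sketch of the standard Internet checksum as described in RFC 1071. The function names are hypothetical, and `per_segment_checksums` is a simplification that sums payload bytes only: a real TCP or UDP checksum also covers a pseudo-header, and nothing here reflects the actual IP Parcels draft format or any NIC offload interface.

```python
from typing import List


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement of the ones' complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    # Every byte is touched once: this is the per-byte cost discussed in the thread.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def per_segment_checksums(buffer: bytes, segment_size: int) -> List[int]:
    """Checksum each fixed-size segment of a larger buffer independently.

    Assumption for illustration only: payload bytes alone are summed,
    with no transport pseudo-header.
    """
    return [
        internet_checksum(buffer[i:i + segment_size])
        for i in range(0, len(buffer), segment_size)
    ]


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example bytes from RFC 1071
    print(hex(internet_checksum(sample)))
    print([hex(c) for c in per_segment_checksums(sample, 4)])
```

Because the loop visits every 16-bit word, doing this in software grows linearly with buffer size, which is why the thread treats checksum offload as a precondition for any gain from buffers larger than 64KB.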
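Joel's diminishing-returns argument can be illustrated with simple arithmetic on wire efficiency. The sketch below rests on assumptions that are not from the thread: 38 bytes of per-frame Ethernet overhead (header, FCS, preamble and inter-frame gap), 40 bytes of IPv4 plus TCP headers with no options, and a hypothetical link that could carry a 65535-byte MTU. It models only bytes on the wire, not per-packet CPU cost, which is the separate cost that GSO/GRO and IP Parcels target.

```python
# Assumed constants, for illustration only.
ETHERNET_OVERHEAD = 38  # 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap
TCP_IPV4_HEADERS = 40   # 20-byte IPv4 header + 20-byte TCP header, no options


def wire_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that carry application payload."""
    payload = mtu - TCP_IPV4_HEADERS
    return payload / (mtu + ETHERNET_OVERHEAD)


def packets_needed(total_bytes: int, mtu: int) -> int:
    """Number of full-MTU packets needed to carry total_bytes of payload."""
    payload = mtu - TCP_IPV4_HEADERS
    return -(-total_bytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000, 65535):
        print(mtu, round(wire_efficiency(mtu), 4), packets_needed(1_000_000, mtu))
```

Under these assumptions, the efficiency gained going from 1500 to 9000 is several times larger than the gain from 9000 to 65535, while the packet count keeps falling. That is consistent with both positions in the thread: byte efficiency saturates quickly, and any remaining benefit has to come from reduced per-packet handling.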