Joel, continuing to argue only the MTU aspects of having an Adaptation Layer loses sight of the fact that it is about much more than that. The Adaptation Layer provides the "6M's of Modern Internetworking": Multilink, Multinet, Mobility, Multicast, Multihop and MTU determination. How these aspects fit together is explained in the document portfolio, and taken together - again, without disrupting the existing deployment - they more than motivate the introduction of the new layer.
If you want to continue to argue the benefits of layering and segmentation and reassembly while starting with something like GSO/GRO or IP parcels, the studies that have shown benefits are documented. If you wonder why large parcels are good, look no further than the Amazon shipping model. Fred > -----Original Message----- > From: Joel M. Halpern [mailto:j...@joelhalpern.com] > Sent: Thursday, March 24, 2022 1:38 PM > To: Templin (US), Fred L <fred.l.temp...@boeing.com> > Cc: int-area <int-area@ietf.org> > Subject: Re: [Int-area] IP Parcels improves performance for end systems > > I understood that. I just don't see the benefit. > > We have a host. It is assembling data to send. It is doing so > progressively. > It can either send in nice sized pieces (9K? 64K) as it has the data and > everything flows so that the receiver can process the data in pieces. > > Or it can wait until it has a VERY large amount of data to send. Get a > small I/O benefit in shipping all that data out. (Having spent latency > collecting the information). And then have some router upstream spend > cycles, power, etc breaking that back down into reasonable pieces? Why? > > Note that if you really want just the host I/O benefit. work with the > TCP (or presumably QUIC) offload folks so that the host sends data in > whatever size it likes to the outboard engine. And gets a continuous > stream of bytes / blocks from the outboard engine when receiving. No > changes to the transport protocol. No changes to IP. No adaptation > layer. No router trying to break a "parcel" apart. > > Yours, > Joel > > On 3/24/2022 4:25 PM, Templin (US), Fred L wrote: > > Joel, what you may be missing is that we are introducing a new layer in > > the Internet architecture known as the Adaptation Layer - that layer that > > logically resides between L3 and L2. Remember AAL5? it is kind of like that, > > except over heterogeneous Internetworks instead of over a switched fabric > > with 53B cells. > > > > So yes, the data is sent in pieces but the pieces are broken down > > progressively > > to smaller pieces through the layers. But, the core routers will see no > > changes > > while the end systems will see the benefits of more efficient packaging > > through > > the use of parcels. > > > > Fred > > > >> -----Original Message----- > >> From: Joel M. Halpern [mailto:j...@joelhalpern.com] > >> Sent: Thursday, March 24, 2022 12:41 PM > >> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >> Cc: int-area <int-area@ietf.org> > >> Subject: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for end > >> systems > >> > >> EXT email: be mindful of links/attachments. > >> > >> > >> > >> I will observe that if one is sending a very large set of data, one > >> needs to assemble that very large set of data. I have trouble > >> constructing a situation in which is better to spend all the time > >> assembling it, and then start sending data once it is all assembled. > >> Send it in pieces. I suppose that there are a few corner cases where > >> all the data is in memory to send for other reasons. And the receiver > >> wants to get it all in memory (rather than needing to store or process > >> it a piece at a time.) Although most of those divide the data and send > >> it to different places, so as to parallelize the computation. > >> > >> You claimed that if we had Terabit IP we would have terabit link MTUs. > >> Since we have 64K IP, and do not have 64K link MTUs, I think we need > >> real evidence. 
> >> > >> Yours, > >> Joel > >> > >> On 3/24/2022 3:25 PM, Templin (US), Fred L wrote: > >>> Joel, I can demonstrate today (and have documented) that some ULPs see > >>> dramatic > >>> increases in performance proportional to the ULP segment sizes they use. > >>> This is true > >>> when the ULP segments are encapsulated and fragmented, and so must also > >>> be true > >>> when they can be sent over the wire in once piece over large-MTU links > >>> and paths. > >>> This was known even back in the day when NFS was run over UDP and saw > >>> dramatic > >>> performance gains for boosting the NFS ULP block size. > >>> > >>> Your argument seems to be one of "let's just accept the status quo and > >>> never mind > >>> how we got here". What I am saying is that we can fix things to work the > >>> way they > >>> should have all along, and without having to do a forklift upgrade on the > >>> entire > >>> Internet. The OMNI service can be deployed on existing networking gear to > >>> make > >>> the virtual link extend from the core out to as far as the edges as > >>> possible making > >>> that entire expanse parcel-capable. And, then large-MTU parcel-capable > >>> links can > >>> begin to proliferate in the edges at a pace that suits them. > >>> > >>> BTW, coincidentally, my professional career got started in 1983 also. > >>> Admittedly, > >>> I did not get into network driver and NIC architecture support until 1986. > >>> > >>> Fred > >>> > >>>> -----Original Message----- > >>>> From: Joel M. Halpern [mailto:j...@joelhalpern.com] > >>>> Sent: Thursday, March 24, 2022 12:11 PM > >>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>> Cc: int-area <int-area@ietf.org> > >>>> Subject: [EXTERNAL] Re: [Int-area] IP Parcels improves performance for > >>>> end systems > >>>> > >>>> EXT email: be mindful of links/attachments. > >>>> > >>>> > >>>> > >>>> I do remember token ring. (I was working from 1983 for folks who > >>>> delivered 50 megabits starting in 1976, and built some of the best FDDI > >>>> around at the time.) > >>>> > >>>> I am not claiming that increasing the MTU from 1500 to 9K did nothing. > >>>> I am claiming that diminishing returns has distinctly set in. > >>>> If the Data Center folks (who tend these days to have the highest > >>>> demand) really want a 64K link, they would have one. They don't. They > >>>> prefer to use Ethernet. > >>>> The improvement via increasing the MTU further runs into many obstacles, > >>>> including such issues as error detection code coverage), application > >>>> desired communication size, retransmission costs, and on and on. > >>>> Yes, they can all be overcome. But the returns get smaller and smaller. > >>>> > >>>> So absent real evidence that there is a problem needing the network > >>>> stack and protocol to change, I just don't see this (IP Parcels) as > >>>> providing enough benefit to justify the work. > >>>> > >>>> Yours, > >>>> Joel > >>>> > >>>> On 3/24/2022 3:05 PM, Templin (US), Fred L wrote: > >>>>> Hi Joel, > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Joel M. Halpern [mailto:j...@joelhalpern.com] > >>>>>> Sent: Thursday, March 24, 2022 11:41 AM > >>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>>>> Cc: int-area <int-area@ietf.org> > >>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end systems > >>>>>> > >>>>>> This exchange seems to assume facts not in evidence. 
> >>>>> > >>>>> It is a fact that back in the 1980's the architects took simple token > >>>>> ring, > >>>>> changed the over-the-wire coding to 4B/5B, replaced the copper with > >>>>> fiber and then boosted the MTU by a factor of 3 and called it FDDI. They > >>>>> were able to claim what at the time was an astounding 100Mbps (i.e., in > >>>>> comparison to the 10Mbps Ethernet of the day), but the performance > >>>>> gain was largely due to the increase in the MTU. They told me: "Fred, > >>>>> go figure out the path MTU problem", and they said: "go talk to Jeff > >>>>> Mogul out in Palo Alto who knows something about it". But, then, the > >>>>> Path MTU discovery group took a left turn at Albuquerque and left the > >>>>> Internet as a tiny MTU wasteland. We have the opportunity to fix all > >>>>> of that now - so, let's get it right for once. > >>>>> > >>>>> Fred > >>>>> > >>>>> > >>>>>> > >>>>>> And the whole premise is spending resources in other parts of the > >>>>>> network for a marginal diminishing return in the hosts. > >>>>>> > >>>>>> It simply does not add up. > >>>>>> > >>>>>> Yours, > >>>>>> Joel > >>>>>> > >>>>>> On 3/24/2022 2:19 PM, Templin (US), Fred L wrote: > >>>>>>>> The category 1) links are not yet in existence, but once parcels > >>>>>>>> start to > >>>>>>>> enter the mainstream innovation will drive the creation of new kinds > >>>>>>>> of > >>>>>>>> data links (1TB Ethernet?) that will be rolled out as new hardware. > >>>>>>> > >>>>>>> I want to put a gold star next to the above. AFAICT, pushing the MTU > >>>>>>> and > >>>>>>> implementing IP parcels can get us to 1TB Ethernet practically > >>>>>>> overnight. > >>>>>>> Back in the 1980's, FDDI proved that pushing to larger MTUs could > >>>>>>> boost > >>>>>>> throughput without changing the speed of light, so why wouldn't the > >>>>>>> same > >>>>>>> concept work for Ethernet in the modern era? 
> >>>>>>> > >>>>>>> Fred > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: Int-area [mailto:int-area-boun...@ietf.org] On Behalf Of > >>>>>>>> Templin (US), Fred L > >>>>>>>> Sent: Thursday, March 24, 2022 9:45 AM > >>>>>>>> To: Tom Herbert <t...@herbertland.com> > >>>>>>>> Cc: int-area <int-area@ietf.org>; Eggert, Lars <l...@netapp.com>; > >>>>>>>> l...@eggert.org > >>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end > >>>>>>>> systems > >>>>>>>> > >>>>>>>> Hi Tom - responses below: > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com] > >>>>>>>>> Sent: Thursday, March 24, 2022 9:09 AM > >>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; > >>>>>>>>> l...@eggert.org > >>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end > >>>>>>>>> systems > >>>>>>>>> > >>>>>>>>> On Thu, Mar 24, 2022 at 7:27 AM Templin (US), Fred L > >>>>>>>>> <fred.l.temp...@boeing.com> wrote: > >>>>>>>>>> > >>>>>>>>>> Tom - see below: > >>>>>>>>>> > >>>>>>>>>>> -----Original Message----- > >>>>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com] > >>>>>>>>>>> Sent: Thursday, March 24, 2022 6:22 AM > >>>>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area <int-area@ietf.org>; > >>>>>>>>>>> l...@eggert.org > >>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end > >>>>>>>>>>> systems > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Mar 23, 2022 at 10:47 AM Templin (US), Fred L > >>>>>>>>>>> <fred.l.temp...@boeing.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Tom, looks like you have switched over to HTML which can be a > >>>>>>>>>>>> real conversation-killer. > >>>>>>>>>>>> > >>>>>>>>>>>> But, to some points you raised that require a response: > >>>>>>>>>>>> > >>>>>>>>>>>>> You can't turn it off UDP checksums for IPv6 (except for narrow > >>>>>>>>>>>>> case of encapsulation). > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> That sounds like a good reason to continue to use IPv4 – at > >>>>>>>>>>>> least as far as end system > >>>>>>>>>>>> > >>>>>>>>>>>> addressing is concerned – right? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Not at all. All NICs today provide checksum offload and so it's > >>>>>>>>>>> basically zero cost to perform the UDP checksum. The fact that we > >>>>>>>>>>> don't have to do extra checks on the UDPv6 checksum field to see > >>>>>>>>>>> if > >>>>>>>>>>> it's zero actually is a performance improvement over UDPv4. (btw, > >>>>>>>>>>> I > >>>>>>>>>>> will present implementation of the Internet checksum at TSVGWG > >>>>>>>>>>> Friday, > >>>>>>>>>>> this will include discussion of checksum offloads). > >>>>>>>>>> > >>>>>>>>>> Actually, my assertion wasn't good to begin with because for IPv6 > >>>>>>>>>> even if UDP > >>>>>>>>>> checksums are turned off the OMNI encapsulation layer includes a > >>>>>>>>>> checksum > >>>>>>>>>> that ensures the integrity of the IPv6 header. UDP checksums off > >>>>>>>>>> for IPv6 when > >>>>>>>>>> OMNI encapsulation is used is perfectly fine. > >>>>>>>>>> > >>>>>>>>> I assume you are referring to RFC6935 and RFC6936 that allow the > >>>>>>>>> UDPv6 > >>>>>>>>> to be zero for tunneling with a very constrained set of conditions. > >>>>>>>>> > >>>>>>>>>>>>> If it's a standard per packet Internet checksum then a lot of > >>>>>>>>>>>>> HW could do it. 
If it's something like CRC32 then probably not. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> The integrity check is covered in RFC5327, and I honestly > >>>>>>>>>>>> haven’t had a chance to > >>>>>>>>>>>> > >>>>>>>>>>>> look at that myself yet. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> LTP is a nice experiment, but I'm more interested as to the > >>>>>>>>>>>>> interaction between IP parcels and TCP or QUIC. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Please be aware that while LTP may seem obscure at the moment > >>>>>>>>>>>> that may be changing now > >>>>>>>>>>>> > >>>>>>>>>>>> that the core DTN standards have been published. As DTN use > >>>>>>>>>>>> becomes more widespread I > >>>>>>>>>>>> > >>>>>>>>>>>> think we can see LTP also come into wider adoption. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> My assumption is that IP parcels is intended to be a general > >>>>>>>>>>> solution > >>>>>>>>>>> of all protocols. Maybe in the next draft you could discuss the > >>>>>>>>>>> details of TCP in IP parcels including how to offload the TCP > >>>>>>>>>>> checksum. > >>>>>>>>>> > >>>>>>>>>> I could certainly add that. For TCP, each of the concatenated > >>>>>>>>>> segments would > >>>>>>>>>> include its own TCP header with checksum field included. Any > >>>>>>>>>> hardware that > >>>>>>>>>> knows the structure of an IP Parcel can then simply do the TCP > >>>>>>>>>> checksum > >>>>>>>>>> offload function for each segment. > >>>>>>>>> > >>>>>>>>> To be honest, the odds of ever getting support in NIC hardware for > >>>>>>>>> IP > >>>>>>>>> parcels are extremely slim. Hardware vendors are driven by > >>>>>>>>> economics, > >>>>>>>>> so the only way they would do that would be to demonstrate > >>>>>>>>> widespread > >>>>>>>>> deployment of the protocol. But even then, with all the legacy > >>>>>>>>> hardware in deployment it will take many years before there's any > >>>>>>>>> appreciable traction. IMO, the better approach is to figure out how > >>>>>>>>> to > >>>>>>>>> leverage the existing hardware features for use with IP parcels. > >>>>>>>> > >>>>>>>> There will be two kinds of links that will need to be > >>>>>>>> "Parcel-capable": > >>>>>>>> 1) Edge network (physical) links that natively forward large > >>>>>>>> parcels, and > >>>>>>>> 2) OMNI (virtual) links that forward parcels using encapsulation and > >>>>>>>> fragmentation. > >>>>>>>> > >>>>>>>> The category 1) links are not yet in existence, but once parcels > >>>>>>>> start to > >>>>>>>> enter the mainstream innovation will drive the creation of new kinds > >>>>>>>> of > >>>>>>>> data links (1TB Ethernet?) that will be rolled out as new hardware. > >>>>>>>> And > >>>>>>>> that new hardware can be made to understand the structure of parcels > >>>>>>>> from the beginning. The category 2) links might take a large parcel > >>>>>>>> from > >>>>>>>> the upper layers on the local node (or one that has been forwarded by > >>>>>>>> a parcel-capable link) and break it down into smaller sub-parcels > >>>>>>>> then > >>>>>>>> apply IP fragmentation to each sub-parcel and send the fragments to > >>>>>>>> an > >>>>>>>> OMNI link egress node. You know better than me how checksum offload > >>>>>>>> could be applied in an environment like that. > >>>>>>>> > >>>>>>>>>>>>> There was quite a bit of work and discussion on this in Linux. 
> >>>>>>>>>>>>> I believe the deviation from the standard was motivated by > some > >>>>>>>>>>>> > >>>>>>>>>>>>> deployed devices required the IPID be set on receive, and > >>>>>>>>>>>>> setting IPID with DF equals to 1 is thought to be innocuous. You > may > >>>>>>>>>>>> > >>>>>>>>>>>>> want to look at Alex Duyck's papers on UDP GSO, he wrote a lot > >>>>>>>>>>>>> of code in this area. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> RFC6864 has quite a bit to say about coding IP ID with DF=1 – > >>>>>>>>>>>> mostly in the negative. > >>>>>>>>>>>> > >>>>>>>>>>>> But, what I have seen in the linux code seems to indicate that > >>>>>>>>>>>> there is not even any > >>>>>>>>>>>> > >>>>>>>>>>>> coordination between the GSO source and the GRO destination – > >>>>>>>>>>>> instead, GRO simply > >>>>>>>>>>>> > >>>>>>>>>>>> starts gluing together packets that appear to have consecutive > >>>>>>>>>>>> IP IDs without ever first > >>>>>>>>>>>> > >>>>>>>>>>>> checking that they were sent by a peer that was earnestly doing > >>>>>>>>>>>> GSO. These aspects > >>>>>>>>>>>> > >>>>>>>>>>>> would make it very difficult to work GSO/GRO into an IETF > >>>>>>>>>>>> standard, plus it doesn’t > >>>>>>>>>>>> > >>>>>>>>>>>> work for IPv6 at all where there is no IP ID included by > >>>>>>>>>>>> default. IP Parcels addresses > >>>>>>>>>>>> > >>>>>>>>>>>> all of these points, and can be made into a standard. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Huh? GRO/GSO works perfectly fine with IPV6. > >>>>>>>>>> > >>>>>>>>>> Where is the spec for that? My understanding is that GSO/GRO > >>>>>>>>>> leverages the > >>>>>>>>>> IP ID for IPv4. But, for IPv6, there is no IP ID unless you > >>>>>>>>>> include a Fragment Header. > >>>>>>>>>> Does IPv6 somehow do GSO/GRO differently? > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> GRO and GSO don't use the IPID to match a flow. The primary match is > >>>>>>>>> the TCP 4-tuple. > >>>>>>>> > >>>>>>>> Correct, the 5-tuple (src-ip, src-port, dst-ip, dst-pot, proto) is > >>>>>>>> what is used > >>>>>>>> to match the flow. But, you need more than that in order to > >>>>>>>> correctly paste > >>>>>>>> back together with GRO the segments of an original ULP buffer that > >>>>>>>> was > >>>>>>>> broken down by GSO - you need Identifications and/or other markings > >>>>>>>> in > >>>>>>>> the IP headers to give a reassembly context. Otherwise, GRO might end > >>>>>>>> up gluing together old and new pieces of ULP data and/or impart a > >>>>>>>> lot of > >>>>>>>> reordering. IP Parcels have well behaved Identifications and Parcel > >>>>>>>> IDs so > >>>>>>>> that the original ULP buffer context is honored during reassembly. > >>>>>>>> > >>>>>>>>> There's also another possibility with IPv6-- use jumbograms. For > >>>>>>>>> instance, instead of GRO reassembling segments up to a 64K packet, > >>>>>>>>> it > >>>>>>>>> could be modified to reassemble up to a 4G packet using IPv6 > >>>>>>>>> jumbograms where one really big packet is given to the stack. > >>>>>>>>> > >>>>>>>>> But we probably don't even need jumbograms for that. In Linux, GRO > >>>>>>>>> might be taught to reassemble up to 4G super packet and set a flag > >>>>>>>>> bit > >>>>>>>>> in the skbuf to ignore the IP payload field and get the length from > >>>>>>>>> the skbuf len field (as though a jumbogram was received). This trick > >>>>>>>>> would work for IPV4 and IPv6 and GSO as well. It should also work > >>>>>>>>> TSO > >>>>>>>>> if the device takes the IP payload length to be that for each > >>>>>>>>> segment. 
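To make the jumbogram idea above concrete: an IPv6 packet longer than 65,535 octets carries its true length in an RFC 2675 Jumbo Payload Hop-by-Hop option and sets the base header's Payload Length field to zero. Below is a minimal sketch; the helper name build_jumbo_hbh and the raw-buffer interface are illustrative only, not any stack's actual API.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htonl() */

    /*
     * Illustrative helper: fill in the 8-octet IPv6 Hop-by-Hop extension
     * header that carries an RFC 2675 Jumbo Payload option.  next_hdr is
     * the protocol that follows (e.g. 6 for TCP); jumbo_len is the length
     * in octets of everything after the base IPv6 header, including this
     * extension header, and must be greater than 65535.  The base IPv6
     * header's own Payload Length field is set to zero by the caller.
     */
    static void build_jumbo_hbh(uint8_t buf[8], uint8_t next_hdr, uint32_t jumbo_len)
    {
        uint32_t len_n = htonl(jumbo_len);

        buf[0] = next_hdr;            /* Next Header                        */
        buf[1] = 0;                   /* Hdr Ext Len: 0 => 8 octets total   */
        buf[2] = 0xC2;                /* Option Type: Jumbo Payload         */
        buf[3] = 4;                   /* Opt Data Len: 4 octets             */
        memcpy(&buf[4], &len_n, 4);   /* 32-bit Jumbo Payload Length        */
    }

A GRO implementation that wanted to hand the stack one packet larger than 64K could in principle synthesize exactly this header, or, as suggested above, simply flag the skbuf and take the length from its len field.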
> >>>>>>>> > >>>>>>>> Yes, I was planning to give that a try to see what kind of > >>>>>>>> performance > >>>>>>>> can be gotten with GSO/GRO when you exceed 64KB. But, my concern > >>>>>>>> with GSO/GRO is that the reassembly is (relatively) unguided and > >>>>>>>> haphazard and can result in mis-ordered concatenations. And, there is > >>>>>>>> no protocol by which the GRO receiver can imply that the things it is > >>>>>>>> gluing together actually originated from a sender that is earnestly > >>>>>>>> doing > >>>>>>>> GSO. So, I do not see how GSO/GRO as I see it in the implementation > >>>>>>>> could be made into a standard, whereas there is a clear path for > >>>>>>>> standardizing IP parcels. > >>>>>>>> > >>>>>>>> Another thing I forgot to mention is that in my experiments with > >>>>>>>> GSO/GRO > >>>>>>>> I found that it won't let me set a GSO segment size that would cause > >>>>>>>> the > >>>>>>>> resulting IP packets to exceed the path MTU (i.e., it won't allow > >>>>>>>> fragmentation). > >>>>>>>> I fixed that by configuring IPv4-in-IPv6 encapsulation per RFC2473 > >>>>>>>> and then > >>>>>>>> allowed the IPv6 layer to apply fragmentation to the encapsulated > >>>>>>>> packet. > >>>>>>>> That way, I can use IPv4 GSO segment sizes up to ~64KB. > >>>>>>>> > >>>>>>>> Fred > >>>>>>>> > >>>>>>>>> > >>>>>>>>> Tom > >>>>>>>>> > >>>>>>>>>> Thanks - Fred > >>>>>>>>>> > >>>>>>>>>>> Tom > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Fred > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com] > >>>>>>>>>>>> Sent: Wednesday, March 23, 2022 9:37 AM > >>>>>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>>>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area > >>>>>>>>>>>> <int-area@ietf.org>; l...@eggert.org > >>>>>>>>>>>> Subject: Re: [EXTERNAL] Re: [Int-area] IP Parcels improves > >>>>>>>>>>>> performance for end systems > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> EXT email: be mindful of links/attachments. 
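For anyone who wants to reproduce the UDP GSO behavior Fred describes just above (including the kernel's refusal to accept a segment size that would force fragmentation), here is a minimal Linux sketch using the UDP_SEGMENT socket option; the destination address, port, segment size and buffer size are placeholders, and error handling is abbreviated.

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/udp.h>
    #include <arpa/inet.h>

    #ifndef SOL_UDP
    #define SOL_UDP IPPROTO_UDP        /* 17 */
    #endif
    #ifndef UDP_SEGMENT
    #define UDP_SEGMENT 103            /* from <linux/udp.h>, Linux >= 4.18 */
    #endif

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Ask the kernel to cut each large send() into 1400-byte datagrams. */
        int gso_size = 1400;
        if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)) < 0) {
            perror("UDP_SEGMENT");     /* old kernel, or size rejected */
            return 1;
        }

        struct sockaddr_in dst = { 0 };
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9000);               /* placeholder port       */
        dst.sin_addr.s_addr = htonl(0x7f000001);  /* 127.0.0.1, placeholder */

        /* One ~60KB send() is queued as a single GSO buffer and leaves the
         * host as ~43 separate 1400-byte UDP datagrams. */
        static char buf[60000];
        memset(buf, 'x', sizeof(buf));
        ssize_t n = sendto(fd, buf, sizeof(buf), 0,
                           (struct sockaddr *)&dst, sizeof(dst));
        if (n < 0) perror("sendto");
        else printf("queued %zd bytes as one GSO buffer\n", n);
        return 0;
    }

On newer kernels the complementary receive-side knob is the UDP_GRO socket option, which lets a single recvmsg() return a coalesced buffer along with the segment size in a control message.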
> >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Mar 23, 2022, 9:54 AM Templin (US), Fred L > >>>>>>>>>>>> <fred.l.temp...@boeing.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Hi Tom, > >>>>>>>>>>>> > >>>>>>>>>>>>> -----Original Message----- > >>>>>>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com] > >>>>>>>>>>>>> Sent: Wednesday, March 23, 2022 6:19 AM > >>>>>>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>>>>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org; > >>>>>>>>>>>>> l...@eggert.org > >>>>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for end > >>>>>>>>>>>>> systems > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, Mar 22, 2022 at 10:38 AM Templin (US), Fred L > >>>>>>>>>>>>> <fred.l.temp...@boeing.com> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Tom, see below: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -----Original Message----- > >>>>>>>>>>>>>>> From: Tom Herbert [mailto:t...@herbertland.com] > >>>>>>>>>>>>>>> Sent: Tuesday, March 22, 2022 10:00 AM > >>>>>>>>>>>>>>> To: Templin (US), Fred L <fred.l.temp...@boeing.com> > >>>>>>>>>>>>>>> Cc: Eggert, Lars <l...@netapp.com>; int-area@ietf.org > >>>>>>>>>>>>>>> Subject: Re: [Int-area] IP Parcels improves performance for > >>>>>>>>>>>>>>> end systems > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Tue, Mar 22, 2022 at 7:42 AM Templin (US), Fred L > >>>>>>>>>>>>>>> <fred.l.temp...@boeing.com> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Lars, I did a poor job of answering your question. One of > >>>>>>>>>>>>>>>> the most important aspects of > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> IP Parcels in relation to TSO and GSO/GRO is that transports > >>>>>>>>>>>>>>>> get to use a full 4MB buffer > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> instead of the 64KB limit in current practices. This is > >>>>>>>>>>>>>>>> possible due to the IP Parcel jumbo > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> payload option encapsulation which provides a 32-bit length > >>>>>>>>>>>>>>>> field instead of just a 16-bit. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> By allowing the transport to present the IP layer with a > >>>>>>>>>>>>>>>> buffer of up to 4MB, it reduces > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> the overhead, minimizes system calls and interrupts, etc. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So, yes, IP Parcels is very much about improving the > >>>>>>>>>>>>>>>> performance for end systems in > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> comparison with current practice (GSO/GRO and TSO). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Fred, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The nice thing about TSO/GSO/GRO is that they don't require > >>>>>>>>>>>>>>> any > >>>>>>>>>>>>>>> changes to the protocol as just implementation techniques, > >>>>>>>>>>>>>>> also > >>>>>>>>>>>>>>> they're one sided opitmizations meaning for instance that TSO > >>>>>>>>>>>>>>> can be > >>>>>>>>>>>>>>> used at the sender without requiring GRO to be used at the > >>>>>>>>>>>>>>> receiver. > >>>>>>>>>>>>>>> My understanding is that IP parcels requires new protocol > >>>>>>>>>>>>>>> that would > >>>>>>>>>>>>>>> need to be implemented on both endpoints and possibly in some > >>>>>>>>>>>>>>> routers. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> It is not entirely true that the protocol needs to be > >>>>>>>>>>>>>> implemented on both > >>>>>>>>>>>>>> endpoints . 
Sources that send IP Parcels send them into a > >>>>>>>>>>>>>> Parcel-capable path > >>>>>>>>>>>>>> which ends at either the final destination or a router for > >>>>>>>>>>>>>> which the next hop is > >>>>>>>>>>>>>> not Parcel-capable. If the Parcel-capable path extends all the > >>>>>>>>>>>>>> way to the final > >>>>>>>>>>>>>> destination, then the Parcel is delivered to the destination > >>>>>>>>>>>>>> which knows how > >>>>>>>>>>>>>> to deal with it. If the Parcel-capable path ends at a router > >>>>>>>>>>>>>> somewhere in the > >>>>>>>>>>>>>> middle, the router opens the Parcel and sends each enclosed > >>>>>>>>>>>>>> segment as an > >>>>>>>>>>>>>> independent IP packet. The final destination is then free to > >>>>>>>>>>>>>> apply GRO to the > >>>>>>>>>>>>>> incoming IP packets even if it does not understand Parcels. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> IP Parcels is about efficient shipping and handling just like > >>>>>>>>>>>>>> the major online > >>>>>>>>>>>>>> retailer service model I described during the talk. The goal > >>>>>>>>>>>>>> is to deliver the > >>>>>>>>>>>>>> fewest and largest possible parcels to the final destination > >>>>>>>>>>>>>> rather than > >>>>>>>>>>>>>> delivering lots of small IP packets. It is good for the > >>>>>>>>>>>>>> network and good for > >>>>>>>>>>>>>> the end systems both. If this were not true, then Amazon would > >>>>>>>>>>>>>> send the > >>>>>>>>>>>>>> consumer 50 small boxes with 1 item each instead of 1 larger > >>>>>>>>>>>>>> box with all > >>>>>>>>>>>>>> 50 items inside. And, we all know what they would choose to do. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Do you have data that shows the benefits of IP Parcels in > >>>>>>>>>>>>>>> light of > >>>>>>>>>>>>>>> these requirements? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I have data that shows that GSO/GRO is good for packaging > >>>>>>>>>>>>>> sizes up to 64KB > >>>>>>>>>>>>>> even if the enclosed segments will require IP fragmentation > >>>>>>>>>>>>>> upon transmission. > >>>>>>>>>>>>>> The data implies that even larger packaging sizes (up to a > >>>>>>>>>>>>>> maximum of 4MB) > >>>>>>>>>>>>>> would be better still. > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Fred, > >>>>>>>>>>>>> > >>>>>>>>>>>>> You seem to be only looking at the problem from a per packet > >>>>>>>>>>>>> cost > >>>>>>>>>>>>> point of view. There is also per byte cost, particularly in the > >>>>>>>>>>>>> computation of the TCP/UDP checksum. The cost is hidden in > >>>>>>>>>>>>> modern > >>>>>>>>>>>>> implementations by checksum offload, and for segmentation > >>>>>>>>>>>>> offload we > >>>>>>>>>>>>> have methods to preserve the utility of checksum offload. IP > >>>>>>>>>>>>> parcels > >>>>>>>>>>>>> will have to also leverage checksum offload, because if the > >>>>>>>>>>>>> checksum > >>>>>>>>>>>>> is not offloaded then the cost of computing the payload > >>>>>>>>>>>>> checksum in > >>>>>>>>>>>>> CPU would dwarf any benefits we'd get by using segments larger > >>>>>>>>>>>>> than > >>>>>>>>>>>>> 64K. > >>>>>>>>>>>> > >>>>>>>>>>>> There is plenty of opportunity to apply hardware checksum > >>>>>>>>>>>> offload since > >>>>>>>>>>>> the structure of a Parcel will be very standard. My experiments > >>>>>>>>>>>> have been > >>>>>>>>>>>> with a protocol called LTP which is layered over UDP/IP as some > >>>>>>>>>>>> other > >>>>>>>>>>>> upper layer protocols are. 
LTP includes a segment-by-segment > >>>>>>>>>>>> checksum > >>>>>>>>>>>> that is used at its level in the absence of lower layer > >>>>>>>>>>>> integrity checks, so > >>>>>>>>>>>> for larger Parcels LTP would use that and turn off UDP checksums > >>>>>>>>>>>> altogether. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> You can't turn it off UDP checksums for IPv6 (except for narrow > >>>>>>>>>>>> case of encapsulation). > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> As far as I am aware, there are currently no hardware > >>>>>>>>>>>> checksum offload implementations available for calculating the > >>>>>>>>>>>> LTP checksums. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> If it's a standard per packet Internet checksum then a lot of HW > >>>>>>>>>>>> could do it. If it's something like CRC32 then probably not. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> LTP is a nice experiment, but I'm more interested as to the > >>>>>>>>>>>> interaction between IP parcels and TCP or QUIC. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Speaking of standard, AFAICT GSO/GRO are doing something very > >>>>>>>>>>>> non-standard. GSO seems to be coding the IP ID field in the IPv4 > >>>>>>>>>>>> headers of packets with DF=1 which goes against RFC 6864. When > >>>>>>>>>>>> DF=1, GSO cannot simply claim the IP ID and code it as if there > >>>>>>>>>>>> were > >>>>>>>>>>>> some sort of protocol. Or, if it does, there would be no way to > >>>>>>>>>>>> standardize it. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> There was quite a bit of work and discussion on this in Linux. I > >>>>>>>>>>>> believe the deviation from the standard was motivated by > some > >>>>>>>>> deployed > >>>>>>>>>>> devices required the IPID be set on receive, and setting IPID > >>>>>>>>>>> with DF equals to 1 is thought to be innocuous. You may want to > >> look > >>>> at > >>>>>>>>> Alex > >>>>>>>>>>> Duyck's papers on UDP GSO, he wrote a lot of code in this area. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Tom > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Fred > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Tom > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Fred > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Tom > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks - Fred > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>>> Int-area mailing list > >>>>>>>>>>>>>>>> Int-area@ietf.org > >>>>>>>>>>>>>>>> https://www.ietf.org/mailman/listinfo/int-area > >>>>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Int-area mailing list > >>>>>>>> Int-area@ietf.org > >>>>>>>> https://www.ietf.org/mailman/listinfo/int-area > >>>>>>> _______________________________________________ > >>>>>>> Int-area mailing list > >>>>>>> Int-area@ietf.org > >>>>>>> https://www.ietf.org/mailman/listinfo/int-area > >>>>> > >>> > > _______________________________________________ Int-area mailing list Int-area@ietf.org https://www.ietf.org/mailman/listinfo/int-area