This RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT sounds promising indeed.
On Mar 26, 2024 at 9:14 AM -0700, Konstantin Ananyev 
<konstantin.anan...@huawei.com>, wrote:
> Just wondering what would happen if you receive an IPv6 packet with options 
> or some fancy encapsulation, IP-IP or so?
> BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:
> https://doc.dpdk.org/api/structrte__eth__rxseg__split.html
> which might be close to what you are looking for, but right now it is 
> supported by the mlx5 PMD only.
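>
> For illustration, a minimal sketch of that setup (the pool names, the
> 42-byte split point and the two-segment layout are my own example, not
> anything the API prescribes):
>
>     #include <string.h>
>     #include <rte_ethdev.h>
>
>     struct rte_eth_dev_info dev_info;
>     union rte_eth_rxseg rx_useg[2];
>     struct rte_eth_rxconf rxconf;
>     int ret;
>
>     rte_eth_dev_info_get(port_id, &dev_info);
>     rxconf = dev_info.default_rxconf;
>     rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
>
>     memset(rx_useg, 0, sizeof(rx_useg));
>     rx_useg[0].split.mp = hdr_pool;   /* headers land here */
>     rx_useg[0].split.length = 42;     /* e.g. Ethernet + IPv4 + UDP */
>     rx_useg[1].split.mp = pay_pool;   /* payload lands here */
>     rx_useg[1].split.length = 0;      /* 0 = rest of the packet */
>     rx_useg[1].split.offset = 0;      /* payload offset inside its mbuf */
>
>     rxconf.rx_seg = rx_useg;
>     rxconf.rx_nseg = 2;
>
>     /* mb_pool argument must be NULL when rx_seg/rx_nseg are set */
>     ret = rte_eth_rx_queue_setup(port_id, 0, nb_rxd, socket_id,
>                                  &rxconf, NULL);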
>
> From: Garrett D'Amore <garr...@damore.org>
> Sent: Tuesday, March 26, 2024 2:19 PM
> To: Bruce Richardson <bruce.richard...@intel.com>; Stephen Hemminger 
> <step...@networkplumber.org>; Morten Brørup <m...@smartsharesystems.com>
> Cc: dev@dpdk.org; Parthakumar Roy <parthakumar....@ibm.com>
> Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch
>
> This could work. Note that we would like to have the exceptional case of IPv6 
> use less headroom. So we would say 40 is our compiled-in default and then 
> we reduce it by 20 on IPv6, which doesn’t have to support all the same devices 
> that IPv4 does. This would give the lowest disruption to the existing IPv4 
> stack and allow PMDs to be updated incrementally.
> On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <m...@smartsharesystems.com>, 
> wrote:
>
> > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. 
> > traffic terminating) application, which doesn’t really care about the 
> > preceding headers, can benefit from having its actual data at a specific 
> > offset for alignment purposes. I don’t consider this very exotic. (Even the 
> > Linux kernel uses this trick to achieve improved IP header alignment on RX.)
> >
> > I think the proper solution would be to add a new offload parameter to 
> > rte_eth_rxconf to specify how many bytes the driver should subtract from 
> > RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. 
> > Depending on driver support, this would make it configurable per device and 
> > per RX queue.
> >
> > If this parameter is set, the driver should adjust m->data_off accordingly 
> > on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still 
> > point to the Ethernet header.
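> >
> > A rough sketch of what I mean (the "headroom_adj" value is the
> > hypothetical new rte_eth_rxconf parameter -- it does not exist today;
> > only rte_mbuf_data_iova_default() and data_off are existing API):
> >
> >     #include <rte_mbuf.h>
> >
> >     /* RX refill: the PMD programs the NIC buffer address this many
> >      * bytes earlier than the default headroom position. */
> >     static inline uint64_t
> >     rx_buf_dma_addr(struct rte_mbuf *m, uint16_t headroom_adj)
> >     {
> >             return rte_mbuf_data_iova_default(m) - headroom_adj;
> >     }
> >
> >     /* RX completion: adjust data_off to match, so that
> >      * rte_pktmbuf_mtod() still points at the Ethernet header. */
> >     static inline void
> >     rx_fix_data_off(struct rte_mbuf *m, uint16_t headroom_adj)
> >     {
> >             m->data_off = RTE_PKTMBUF_HEADROOM - headroom_adj;
> >     }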
> >
> >
> > Med venlig hilsen / Kind regards,
> > -Morten Brørup
> >
> > From: Garrett D'Amore [mailto:garr...@damore.org]
> > Sent: Monday, 25 March 2024 23.56
> > So we need (for reasons that I don't want to get into in too much 
> > detail) our UDP payload headers to be at a specific offset in the packet.
> >
> > This was not a problem as long as we only used IPv4.  (We have configured 
> > 40 bytes of headroom, which is more than any of our PMDs need by a hefty 
> > margin.)
> >
> > Now that we're extending to support IPv6, we need to reduce that headroom 
> > by 20 bytes, to preserve our UDP payload offset.
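> >
> > To spell out the arithmetic (standard header sizes, no VLAN tag or IP
> > options assumed; the 40-byte headroom figure is ours from above):
> >
> >     payload offset = headroom + Ethernet(14) + IP + UDP(8)
> >     IPv4: 40 + 14 + 20 + 8 = 82
> >     IPv6: 20 + 14 + 40 + 8 = 82   (same offset, headroom cut by 20)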
> >
> > This has big ramifications for how we fragment our own upper-layer 
> > messages, and it has been determined that updating the PMDs to allow us to 
> > change the headroom for this use case (on a per-port basis, as we will have 
> > some ports on IPv4 and others on IPv6) is the least effort, by a large 
> > margin.  (Well, copying the frames via memcpy would be less development 
> > effort, but would be a performance catastrophe.)
> >
> > For transmit side we don't need this, as we can simply adjust the packet as 
> > needed.  But for the receive side, we are kind of stuck, as the PMDs rely 
> > on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations.
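> >
> > (Sketch of the TX-side adjustment, using existing mbuf helpers; the
> > wrapper function is just an illustration:)
> >
> >     #include <rte_mbuf.h>
> >
> >     /* The app owns data_off on TX, so making room for the 20 extra
> >      * IPv6 header bytes is plain pointer math. */
> >     static inline char *
> >     make_room_for_ipv6(struct rte_mbuf *m)
> >     {
> >             /* moves data_off back 20B; NULL if headroom exhausted */
> >             return rte_pktmbuf_prepend(m, 20);
> >     }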
> >
> > As far as header splitting, that would indeed be a much much nicer solution.
> >
> > I haven't looked in the latest code to see if header splitting is even an 
> > option -- the version of the DPDK I'm working with is a little older 
> > (20.11) -- we have to update but we have other local changes and so 
> > updating is one of the things that we still have to do.
> >
> > At any rate, the version I did look at doesn't seem to support header 
> > splits on any device other than FM10K.  That's not terrifically interesting 
> > for us.  We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really 
> > -- ENA, virtio-net, etc.)   We also have a fair amount of ixgbe and i40e on 
> > client systems in the field.
> >
> > We also, unfortunately, have an older DPDK 18 with Mellanox contributions 
> > for IPoverIB.... though I'm not sure we will try to support IPv6 there.  
> > (We are working towards replacing that part of stack with UCX.)
> >
> > Unless header splitting will work on all of this (excepting the IPoIB 
> > piece), then it's not something we can really use.
> > On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger 
> > <step...@networkplumber.org>, wrote:
> > On Mon, 25 Mar 2024 10:01:52 +0000
> > Bruce Richardson <bruce.richard...@intel.com> wrote:
> >
> > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > > have customized heavily, and I am going to need to make the
> > > headroom *dynamic* (passed in at run time, and per port.)
> > > We have this requirement because we need payload to be at a specific
> > > offset, but have to deal with different header lengths for IPv4 and now
> > > IPv6.
> > > My reason for pointing this out is that I would dearly like it if we
> > > could collaborate on this -- this change is going to touch pretty much
> > > every PMD (we don't need it on all of them, as we only support a subset
> > > of PMDs, but it's still a significant set.)
> > > I'm not sure if anyone else has considered such a need -- this
> > > particular message caught my eye as I'm looking specifically in this
> > > area right now.
> > >
> > Hi
> >
> > thanks for reaching out. Can you clarify a little more as to the need for
> > this requirement? Can you not just set the headroom value to the max needed
> > value for any port and use that? Is there an issue with having blank space
> > at the start of a buffer?
> >
> > Thanks,
> > /Bruce
> >
> > If you have to make such a deep change across all PMDs, then maybe
> > it is not the best solution. What about being able to do some form of buffer
> > chaining or pullup?
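> >
> > Something along these lines, perhaps (sketch only; rte_pktmbuf_chain()
> > is existing API, the function around it is illustrative):
> >
> >     #include <rte_mbuf.h>
> >
> >     /* Headers land in one mbuf, payload in a second mbuf placed at
> >      * the desired alignment; link them into a single packet. */
> >     static inline int
> >     link_hdr_and_payload(struct rte_mbuf *hdr, struct rte_mbuf *pay)
> >     {
> >             /* returns 0 on success, -EOVERFLOW if chain is too long */
> >             return rte_pktmbuf_chain(hdr, pay);
> >     }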
