This had occurred to me as well.  I think most hardware DMA engines can align 
on 32-bit boundaries.  I've yet to see a device that actually requires 64-bit 
DMA alignment.  (But I have only looked at a subset of devices, and most of 
the ones I have looked at are not ones that would be considered 'modern'.)
On Mar 26, 2024 at 8:06 AM -0700, Morten Brørup <m...@smartsharesystems.com>, 
wrote:
> Something just struck me…
> The buffer address field in the RX descriptor of some NICs may have alignment 
> requirements, i.e. the lowest bits in the buffer address field of the NIC’s 
> RX descriptor may be used for other purposes (and assumed zero for buffer 
> address purposes). 40 is divisible by 8, but offset 20 requires that the NIC 
> hardware supports 4-byte aligned addresses (so only the 2 lowest bits may be 
> used for other purposes).
>
> Here’s an example of what I mean:
> https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words
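>
> To illustrate with a hypothetical descriptor layout (invented here, not 
> taken from any specific datasheet): if the hardware reuses the low 3 bits 
> of the 64-bit buffer address word for flags, then the DMA address must be 
> 8-byte aligned, because those bits are assumed zero when the hardware 
> reconstructs the address:
>
>     #include <assert.h>
>     #include <stdint.h>
>
>     #define DESC_ADDR_FLAG_MASK 0x7ULL /* hypothetical: low 3 bits carry flags */
>
>     static inline uint64_t
>     desc_encode_addr(uint64_t dma_addr, uint64_t flags)
>     {
>         /* An unaligned buffer address would be corrupted by the flag bits. */
>         assert((dma_addr & DESC_ADDR_FLAG_MASK) == 0);
>         return dma_addr | (flags & DESC_ADDR_FLAG_MASK);
>     }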
>
> If any of your supported NICs have that restriction, i.e. require an 8-byte 
> aligned buffer address, your concept of having the UDP payload at the same 
> fixed offset for both IPv4 and IPv6 is not going to be possible. (And you 
> were lucky that the offset happens to be sufficiently aligned to work for 
> IPv4 to begin with.)
>
> It seems you need to read a bunch of datasheets before proceeding.
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garr...@damore.org]
> Sent: Tuesday, 26 March 2024 15.19
>
> This could work. Note that we would like the exceptional case to be IPv6 
> using less headroom. So we would say 40 is our compiled-in default, and then 
> we reduce it by 20 on IPv6, which doesn’t have to support all the same devices 
> that IPv4 does. This would give the lowest disruption to the existing IPv4 
> stack and allow PMDs to be updated incrementally.
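>
> (For the arithmetic, assuming plain Ethernet/UDP framing with no VLAN tags, 
> the UDP payload lands at the same absolute buffer offset either way, which 
> can even be checked at compile time:
>
>     #define ETH_HDR_LEN 14
>     #define UDP_HDR_LEN 8
>     /* IPv4: 40 headroom + 14 + 20 + 8 = 82; IPv6: 20 headroom + 14 + 40 + 8 = 82 */
>     _Static_assert(40 + ETH_HDR_LEN + 20 + UDP_HDR_LEN ==
>                    20 + ETH_HDR_LEN + 40 + UDP_HDR_LEN,
>                    "UDP payload offset differs between IPv4 and IPv6");
> )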
> On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <m...@smartsharesystems.com>, 
> wrote:
>
> Interesting requirement. I can easily imagine how a (non-forwarding, i.e. 
> traffic terminating) application, which doesn’t really care about the 
> preceding headers, can benefit from having its actual data at a specific 
> offset for alignment purposes. I don’t consider this very exotic. (Even the 
> Linux kernel uses this trick to achieve improved IP header alignment on RX.)
>
> I think the proper solution would be to add a new offload parameter to 
> rte_eth_rxconf to specify how many bytes the driver should subtract from 
> RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. 
> Depending on driver support, this would make it configurable per device and 
> per RX queue.
>
> If this parameter is set, the driver should adjust m->data_off accordingly on 
> RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still 
> point to the Ethernet header.
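>
> Roughly like this (the field and variable names here are invented for 
> illustration; this is not existing DPDK API):
>
>     /* new member in struct rte_eth_rxconf: */
>     uint16_t rx_headroom_reduction; /* bytes to subtract from RTE_PKTMBUF_HEADROOM */
>
>     /* in the PMD's RX descriptor refill path: */
>     uint16_t headroom = RTE_PKTMBUF_HEADROOM - rxq->headroom_reduction;
>     desc_addr = mbuf->buf_iova + headroom; /* instead of the fixed constant */
>
>     /* and when completing a received packet: */
>     mbuf->data_off = headroom;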
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garr...@damore.org]
> Sent: Monday, 25 March 2024 23.56
> So we need (for reasons that I don't want to get into in too much detail) 
> our UDP payload headers to be at a specific offset in the packet.
>
> This was not a problem as long as we only used IPv4.  (We have configured 40 
> bytes of headroom, which is more than any of our PMDs need by a hefty margin.)
>
> Now that we're extending to support IPv6, we need to reduce that headroom by 
> 20 bytes, to preserve our UDP payload offset.
>
> This has big ramifications for how we fragment our own upper layer messages, 
> and it has been determined that updating the PMDs to allow us to change the 
> headroom for this use case (on a per-port basis, as we will have some ports 
> on IPv4 and others on IPv6) is the least effort, by a large margin.  (Well, 
> copying the frames via memcpy would be less development effort, but would be 
> a performance catastrophe.)
>
> For the transmit side we don't need this, as we can simply adjust the packet 
> as needed.  But for the receive side, we are kind of stuck, as the PMDs rely 
> on the hard-coded RTE_PKTMBUF_HEADROOM to program receive locations.
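>
> (For reference, the typical pattern in PMD RX refill paths, simplified and 
> varying per driver, looks something like:
>
>     dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
>     rxd->read.pkt_addr = dma_addr;
>
> where rte_mbuf_data_iova_default() resolves to buf_iova + 
> RTE_PKTMBUF_HEADROOM, so the offset is baked in at compile time.)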
>
> As far as header splitting goes, that would indeed be a much, much nicer solution.
>
> I haven't looked in the latest code to see if header splitting is even an 
> option -- the version of DPDK I'm working with is a little older (20.11) 
> -- we have to update, but we have other local changes, and so updating is one 
> of the things that we still have to do.
>
> At any rate, the version I did look at doesn't seem to support header splits 
> on any device other than FM10K.  That's not terrifically interesting for us.  
> We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, 
> virtio-net, etc.)  We also have a fair amount of ixgbe and i40e on client 
> systems in the field.
>
> We also, unfortunately, have an older DPDK 18 with Mellanox contributions for 
> IPoverIB... though I'm not sure we will try to support IPv6 there.  (We are 
> working towards replacing that part of the stack with UCX.)
>
> Unless header splitting will work on all of these (excepting the IPoIB piece), 
> it's not something we can really use.
> On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger 
> <step...@networkplumber.org>, wrote:
> On Mon, 25 Mar 2024 10:01:52 +0000
> Bruce Richardson <bruce.richard...@intel.com> wrote:
>
> On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > have customized heavily, and I am going to need to make the
> > headroom *dynamic* (passed in at run time, and per port.)
> > We have this requirement because we need payload to be at a specific
> > offset, but have to deal with different header lengths for IPv4 and now
> > IPv6.
> > My reason for pointing this out is that I would dearly like it if we
> > could collaborate on this -- this change is going to touch pretty much
> > every PMD (we don't need it on all of them, as we only support a subset
> > of PMDs, but it's still a significant set.)
> > I'm not sure if anyone else has considered such a need -- this
> > particular message caught my eye as I'm looking specifically in this
> > area right now.
> >
> Hi
>
> Thanks for reaching out. Can you clarify a little more as to the need for
> this requirement? Can you not just set the headroom value to the max needed
> value for any port and use that? Is there an issue with having blank space
> at the start of a buffer?
>
> Thanks,
> /Bruce
>
> If you have to make such a deep change across all PMDs, then maybe
> it is not the best solution. What about being able to do some form of buffer
> chaining or pullup?
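>
> (The chaining primitive itself exists; a sketch, with invented mbuf names 
> and error handling elided:
>
>     /* attach the payload mbuf after the header mbuf */
>     if (rte_pktmbuf_chain(hdr_mbuf, payload_mbuf) != 0)
>         rte_pktmbuf_free(payload_mbuf); /* chain too long */
>
> though whether the RX path can land the payload in the second segment 
> without a copy is the open question.)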
