This had occurred to me as well. I think most hardware DMA engines can align
on 32-bit boundaries. I've yet to see a device that actually requires 64-bit
DMA alignment. (But I have only looked at a subset of devices, and most of
the ones I have looked at are not ones that would be considered 'modern'.)
On Mar 26, 2024 at 8:06 AM -0700, Morten Brørup <m...@smartsharesystems.com>,
wrote:
> Something just struck me…
> The buffer address field in the RX descriptor of some NICs may have alignment
> requirements, i.e. the lowest bits of the buffer address field in the NIC’s
> RX descriptor may be used for other purposes (and assumed zero for buffer
> address purposes). An offset of 40 is divisible by 8, but an offset of 20
> requires that the NIC hardware support 4-byte-aligned addresses (so only the
> 2 lowest bits may be used for other purposes).
>
> Here’s an example of what I mean:
> https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words
>
> If any of your supported NICs have that restriction, i.e. require an
> 8-byte-aligned buffer address, your concept of having the UDP payload at the
> same fixed offset for both IPv4 and IPv6 is not going to be possible. (And
> you were lucky that the offset happens to be sufficiently aligned to work for
> IPv4 to begin with.)
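>
> To make the constraint concrete, a driver-side sanity check could look
> something like this (sketch only; "desc_addr_align" is a made-up per-device
> constraint, assumed to be a power of two, not an existing DPDK field):
>
>     #include <errno.h>
>     #include <rte_mbuf.h>
>
>     /* Return 0 if the DMA address that would be written into the RX
>      * descriptor (buffer start + headroom) satisfies the descriptor's
>      * address alignment requirement, -EINVAL otherwise. */
>     static int
>     check_rx_buf_alignment(const struct rte_mbuf *m, uint16_t headroom,
>                            uint16_t desc_addr_align)
>     {
>         rte_iova_t dma = m->buf_iova + headroom;
>
>         return (dma & ((rte_iova_t)desc_addr_align - 1)) == 0 ? 0 : -EINVAL;
>     }
>
> Since mbuf data buffers normally start at an address that is at least 8-byte
> aligned, a headroom of 40 keeps 8-byte alignment, while a headroom of 20 only
> guarantees 4-byte alignment.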
>
> It seems you need to read a bunch of datasheets before proceeding.
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garr...@damore.org]
> Sent: Tuesday, 26 March 2024 15.19
>
> This could work. Note that we would like the exceptional case to be IPv6
> using less headroom. So 40 would be our compiled-in default, and we would
> reduce it by 20 on IPv6, which doesn’t have to support all the same devices
> that IPv4 does. This would cause the least disruption to the existing IPv4
> stack and allow PMDs to be updated incrementally.
> On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <m...@smartsharesystems.com>,
> wrote:
>
> Interesting requirement. I can easily imagine how a (non-forwarding, i.e.
> traffic terminating) application, which doesn’t really care about the
> preceding headers, can benefit from having its actual data at a specific
> offset for alignment purposes. I don’t consider this very exotic. (Even the
> Linux kernel uses this trick to achieve improved IP header alignment on RX.)
>
> I think the proper solution would be to add a new offload parameter to
> rte_eth_rxconf to specify how many bytes the driver should subtract from
> RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware.
> Depending on driver support, this would make it configurable per device and
> per RX queue.
>
> If this parameter is set, the driver should adjust m->data_off accordingly on
> RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still
> point to the Ethernet header.
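>
> In code, I am thinking of something along these lines (sketch only; the
> "rx_headroom_reduce" field does not exist today, and the names are just for
> illustration):
>
>     /* Application side: ask the driver to program RX buffer addresses
>      * 20 bytes earlier than RTE_PKTMBUF_HEADROOM on this queue. */
>     struct rte_eth_rxconf rxconf = dev_info.default_rxconf;
>     rxconf.rx_headroom_reduce = 20;        /* hypothetical new field */
>     rte_eth_rx_queue_setup(port_id, queue_id, nb_desc, socket_id,
>                            &rxconf, mb_pool);
>
>     /* PMD side (pseudocode for the RX refill/completion path): */
>     uint16_t headroom = RTE_PKTMBUF_HEADROOM - rxq->headroom_reduce;
>     rxd->buf_addr = rte_cpu_to_le_64(mb->buf_iova + headroom);
>     /* ... */
>     mb->data_off = headroom; /* mtod() still points at the Ethernet header */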
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garr...@damore.org]
> Sent: Monday, 25 March 2024 23.56
> So we need (for reasons that I don't want to get into in too much detail)
> our UDP payload headers to be at a specific offset in the packet.
>
> This was not a problem as long as we only used IPv4. (We have configured 40
> bytes of headroom, which is more than any of our PMDs need by a hefty margin.)
>
> Now that we're extending to support IPv6, we need to reduce that headroom by
> 20 bytes, to preserve our UDP payload offset.
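>
> (To spell out the arithmetic, assuming a plain 14-byte Ethernet header, no
> VLAN tag, and no IP options or extension headers:
>
>     IPv4: 40 (headroom) + 14 (Ether) + 20 (IPv4) + 8 (UDP) -> payload at 82
>     IPv6: 20 (headroom) + 14 (Ether) + 40 (IPv6) + 8 (UDP) -> payload at 82
>
> so dropping the headroom from 40 to 20 keeps the payload at the same offset
> from the start of the buffer.)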
>
> This has big ramifications for how we fragment our own upper-layer messages,
> and we have determined that updating the PMDs to allow us to change the
> headroom for this use case (on a per-port basis, as we will have some ports
> on IPv4 and others on IPv6) is the least effort, by a large margin. (Well,
> copying the frames via memcpy would be less development effort, but it would
> be a performance catastrophe.)
>
> For the transmit side we don't need this, as we can simply adjust the packet
> as needed. But for the receive side we are kind of stuck, as the PMDs rely on
> the hard-coded RTE_PKTMBUF_HEADROOM to program the receive buffer addresses.
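>
> (Concretely, on TX it is just the usual prepend dance, something like:
>
>     /* make room for the UDP header in front of the payload we built;
>      * returns NULL if the mbuf has no headroom left */
>     struct rte_udp_hdr *udp = (struct rte_udp_hdr *)
>         rte_pktmbuf_prepend(m, sizeof(struct rte_udp_hdr));
>
> so the headroom constant never comes into play on that path.)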
>
> As far as header splitting, that would indeed be a much much nicer solution.
>
> I haven't looked at the latest code to see if header splitting is even an
> option -- the version of DPDK I'm working with is a little older (20.11) --
> we have to update, but we have other local changes, so updating is one of
> the things that we still have to do.
>
> At any rate, the version I did look at doesn't seem to support header splits
> on any device other than FM10K. That's not terrifically interesting for us.
> We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA,
> virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client
> systems in the field.
>
> We also, unfortunately, have an older DPDK 18 with Mellanox contributions for
> IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are
> working towards replacing that part of the stack with UCX.)
>
> Unless header splitting will work on all of this (excepting the IPoIB piece),
> it's not something we can really use.
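>
> (For reference, newer DPDK releases do seem to have a generic buffer split
> offload configured per RX queue, roughly like the sketch below -- names as
> in recent releases, so worth double-checking against whatever version we end
> up on, and it still depends on per-PMD support:
>
>     struct rte_eth_rxseg_split segs[2] = {
>         { .mp = hdr_pool,     .length = hdr_len, .offset = 0 },
>         { .mp = payload_pool, .length = 0 /* remainder */, .offset = 0 },
>     };
>     struct rte_eth_rxconf rxconf = dev_info.default_rxconf;
>     rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
>     rxconf.rx_seg  = (union rte_eth_rxseg *)segs;
>     rxconf.rx_nseg = 2;
>     /* mb_pool argument is NULL when rx_seg/rx_nseg are used */
>     rte_eth_rx_queue_setup(port_id, queue_id, nb_desc, socket_id,
>                            &rxconf, NULL);
>
> but again, only if the PMDs we care about actually implement it.)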
> On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger
> <step...@networkplumber.org>, wrote:
> On Mon, 25 Mar 2024 10:01:52 +0000
> Bruce Richardson <bruce.richard...@intel.com> wrote:
>
> On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > have customized heavily, and I am going to need to make the headroom
> > *dynamic* (passed in at run time, and per port).
> > We have this requirement because we need payload to be at a specific
> > offset, but have to deal with different header lengths for IPv4 and now
> > IPv6.
> > My reason for pointing this out is that I would dearly like it if we
> > could collaborate on this -- this change is going to touch pretty much
> > every PMD (we don't need it on all of them, as we only support a subset
> > of PMDs, but it's still a significant set).
> > I'm not sure if anyone else has considered such a need -- this
> > particular message caught my eye as I'm looking specifically in this
> > area right now.
> >
> Hi
>
> thanks for reaching out. Can you clarify a little more as to the need for
> this requirement? Can you not just set the headroom value to the max needed
> value for any port and use that? Is there an issue with having blank space
> at the start of a buffer?
>
> Thanks,
> /Bruce
>
> If you have to make such a deep change across all PMDs then maybe
> it is not the best solution. What about being able to do some form of buffer
> chaining or pullup?
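>
> For example, chaining an extra segment onto an existing packet is just
> something like:
>
>     /* append 'tail' after the last segment of 'head'; fails if the
>      * resulting chain would have too many segments */
>     if (rte_pktmbuf_chain(head, tail) != 0)
>         rte_pktmbuf_free(tail);
>
> which would avoid touching the descriptor programming in every PMD.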