> From: Robin Jarry [mailto:rja...@redhat.com]
> 
> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> > address and just provide a pointer to the memory inside the packet.

How can they do that? The bulk lookup function takes an array of IPv6 
addresses, not an array of pointers to IPv6 addresses.

What you are suggesting only works with single lookup, not bulk lookup.

> > Current vector implementation loads IPv6 addresses using unaligned access
> > (_mm512_loadu_si512) so it doesn't rely on alignment.
> 
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.

This would require changing the bulk lookup API to take an array of pointers 
instead of an array of IPv6 addresses.

It would be acceptable to introduce a new single address lookup function, 
taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the 
single lookup use cases mentioned above.
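
To make the difference concrete, here is a minimal sketch of the copy step 
that an array-based bulk lookup forces on a caller that only holds pointers 
into packets. The helper gather_ipv6_addrs() and the macro IPV6_ADDR_SIZE are 
hypothetical names used for illustration only; they are not part of DPDK.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV6_ADDR_SIZE 16 /* bytes in an IPv6 address */

/*
 * Hypothetical helper, not a DPDK function: copy the addresses referenced
 * by addr_ptrs[] (e.g. pointers into packet headers, with any alignment)
 * into the contiguous array that an array-based bulk lookup expects.
 */
static void
gather_ipv6_addrs(const uint8_t *const addr_ptrs[],
		uint8_t ips[][IPV6_ADDR_SIZE], size_t n)
{
	for (size_t i = 0; i < n; i++)
		memcpy(ips[i], addr_ptrs[i], IPV6_ADDR_SIZE);
}
```

A pointer-based lookup would make this copy unnecessary, which is why it 
needs a different function signature rather than a different alignment of 
the address type.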

> 
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > > aligned.
> 
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> > IPv6 is aligned only by 2 bytes.
> 
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.

I'm not worried about the IPv6 FIB functions. This proposal introduces a 
generic IPv6 address type for *all of DPDK*, so you need to consider *all* 
aspects, not just one library!

There may be current or future CPUs where alignment makes a performance 
difference. Do all architectures support unaligned 128-bit access at exactly 
the same performance as aligned 128-bit access? I think not!
E.g. on x86, a load/store that crosses a cache line boundary has a performance 
impact. If the type is explicitly unaligned, an instance on the stack (i.e. a 
local variable holding an IPv6 address) might cross a cache line boundary, 
whereas a 128-bit aligned instance on the stack is guaranteed not to cross 
one.
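
A minimal sketch of that arithmetic, assuming a 64 byte cache line (typical 
for current x86 CPUs, but an assumption here). The function name 
crosses_cache_line() is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumed cache line size in bytes */

/* True if the len-byte object starting at addr spans two cache lines. */
static bool
crosses_cache_line(uintptr_t addr, size_t len)
{
	uintptr_t first_line = addr / CACHE_LINE_SIZE;
	uintptr_t last_line = (addr + len - 1) / CACHE_LINE_SIZE;

	return first_line != last_line;
}
```

A 16 byte object at address 48 occupies bytes 48..63 and stays within one 
line, while the same object at address 50 occupies bytes 50..65 and spans 
two. Since 64 is a multiple of 16, no 16 byte aligned start address can 
produce a crossing.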

The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing 
an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 
byte aligned, so this is an *exception* from the general case, and must be 
treated as such. You don't want to make the general type unaligned (and thus 
inefficient) everywhere it is being used, only because a few use cases require 
the unaligned form.
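
The offsets behind this can be checked with a few lines of C. This is only a 
sketch: it assumes an untagged 14 byte Ethernet header and a frame that 
starts on an address aligned to at least 8 bytes, and the macro and function 
names are made up for illustration.

```c
#include <stddef.h>

#define ETHER_HDR_LEN 14 /* untagged Ethernet header */
#define IPV4_SRC_OFF  12 /* source address offset in the IPv4 header */
#define IPV4_DST_OFF  16 /* destination address offset in the IPv4 header */
#define IPV6_SRC_OFF   8 /* source address offset in the IPv6 header */
#define IPV6_DST_OFF  24 /* destination address offset in the IPv6 header */

/*
 * Largest power of two dividing a non-zero offset, i.e. the alignment of
 * that offset relative to the start of the frame. Returns 0 for offset 0.
 */
static size_t
offset_alignment(size_t off)
{
	return off & (~off + 1);
}
```

The IPv4 addresses end up at frame offsets 26 and 30, and the IPv6 addresses 
at 22 and 38. offset_alignment() yields 2 for all four, so neither address 
family is natively aligned behind an Ethernet header.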

The same principle must apply to the IPv6 address type. Let's make the generic 
type natively aligned (16 byte). And you might also offer an explicitly 
unaligned type for the exception use cases requiring unaligned access.
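
Roughly what I have in mind, as a plain C11 sketch rather than a concrete 
patch. The type and function names below are hypothetical, and real DPDK 
code would presumably use the existing alignment macros instead of alignas.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic type: natively (16 byte) aligned. */
struct ipv6_addr {
	alignas(16) uint8_t a[16];
};

/* Hypothetical variant for addresses inside packet headers: alignment 1. */
struct ipv6_addr_unaligned {
	uint8_t a[16];
};

_Static_assert(alignof(struct ipv6_addr) == 16,
	"generic type is 16 byte aligned");
_Static_assert(alignof(struct ipv6_addr_unaligned) == 1,
	"header variant has no alignment requirement");

/* Copy an address from a packet header into the aligned generic type. */
static inline struct ipv6_addr
ipv6_addr_load(const struct ipv6_addr_unaligned *src)
{
	struct ipv6_addr dst;

	memcpy(dst.a, src->a, sizeof(dst.a));
	return dst;
}
```

Local variables and table entries would use the aligned type, and only code 
that touches packet headers would need the unaligned one.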
