On Sun, 16 Jan 2022 09:09:49 -0500 Luc Pelletier <lucp.at.w...@gmail.com> wrote:
> > X86 always allows unaligned access. Irregardless of what tools say.
> > Why impose additional overhead in performance critical code.
>
> Let me preface my response by saying that I'm not a C compiler developer.
> Hopefully someone who is will read this and chime in.
>
> I agree that X86 allows unaligned store/load. However, the C standard doesn't,
> and says that it's undefined behavior. This means that the code relies on
> undefined behavior. It may do the right thing all the time, almost all the
> time, some of the time... it's undefined. It may work now but it may stop
> working in the future.
> Here's a good discussion on SO about unaligned accesses in C on x86:
>
> https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment/46790815#46790815
>
> There's no way to do the unaligned store/load in C (that I know of) without
> invoking undefined behavior. I can see 2 options, either write the code in
> assembly, or use some other C construct that doesn't rely on undefined
> behavior.
>
> While the for loop may seem slower than the other options, it surprisingly
> results in fewer load/store operations in certain scenarios. For example, if
> n == 15 and it's known at compile-time, the compiler will generate 2
> overlapping qword load/store operations (rather than the 4 that are currently
> being done with the current code).
>
> All that being said, I can go back to something similar to my first patch.
> Using inline assembly, and making sure this time that it works for 32-bit too.
> I will post a patch in a few minutes that does exactly that. Maintainers can
> then chime in with their preferred option.

I would propose that DPDK have the same kind of define as the kernel for
SAFE_UNALIGNED_ACCESS. The C standard has to apply to all architectures, but
DPDK will make the choice to be fast rather than standards conformant.
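
For reference, here is a minimal sketch of the "other C construct" idea from the quoted mail, using a fixed-size memcpy() for the unaligned qword load/store and two overlapping qword moves for a 9 to 16 byte copy such as n == 15. The helper names are hypothetical and are not existing DPDK API. It assumes the compiler lowers a fixed-size memcpy() to a plain load/store on x86 when optimizing, which is common but not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not DPDK API): read 8 bytes from any address.
 * memcpy() has no alignment requirement, so this is well defined in C. */
static inline uint64_t load_u64_unaligned(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper (not DPDK API): write 8 bytes to any address. */
static inline void store_u64_unaligned(void *p, uint64_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Copy n bytes, 9 <= n <= 16, with two qword moves that may overlap.
 * For n == 15 the second move covers bytes 7..14, so byte 7 is written
 * twice with the same value. Both loads happen before either store. */
static inline void copy_9_to_16(void *dst, const void *src, size_t n)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t head = load_u64_unaligned(s);
	uint64_t tail = load_u64_unaligned(s + n - 8);

	store_u64_unaligned(d, head);
	store_u64_unaligned(d + n - 8, tail);
}
```

Whether this matches the speed of a direct pointer cast depends on the compiler and optimization level, so it would need to be checked against the generated code before being relied on in a hot path.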