> Thue use of _mm_loadu_si128 provides for unaligned byte arrays (that's > the 'u' in the 'loadu'), so you will be Ok there, too.
Thanks Jeff, I wasn't going to push this with a "works for me" without knowing why. I'll remove the alignment code. > I believe the way to zero a __m128i is using _mm_setzero_si128(). It Perfect, it's outside the main loop so I'm not worried but this does look more correct. On Tue, 26 Nov 2024 at 22:47, Jeffrey Walton <noloa...@gmail.com> wrote: > On Tue, Nov 26, 2024 at 4:27 PM Sam Russell <sam.h.russ...@gmail.com> > wrote: > > > > I've added an alignment check in lib/crc, it looks like the code works > okay without it for me but an _m128 is supposed to be 128-bit aligned so > I'm happy that I've added it. > > The _m128i's are naturally aligned. They will be ok: > > + in1 = _mm_loadu_si128(data); > + in2 = _mm_loadu_si128(data + 1); > + in3 = _mm_loadu_si128(data + 2); > + in4 = _mm_loadu_si128(data + 3); > > Thue use of _mm_loadu_si128 provides for unaligned byte arrays (that's > the 'u' in the 'loadu'), so you will be Ok there, too. > > > The attached patch renames the module to crc-x86_64 while keeping the > source file crc-x86_64-pclmul.c, as well as the alignment check above. > > > > test-crc and bench-crc are fine, gzip builds with this gnulib and > decompresses my test file with no hash error > > A quick comment from the patch... > > + __m128i in1 = {0}; > + __m128i in2 = {0}; > + __m128i in3 = {0}; > + __m128i in4 = {0}; > > I believe the way to zero a __m128i is using _mm_setzero_si128(). It > performs a pxor, which is fast. So I would expect to see: > > __m128i in1 = _mm_setzero_si128(); > __m128i in2 = _mm_setzero_si128(); > __m128i in3 = _mm_setzero_si128(); > __m128i in4 = _mm_setzero_si128(); > > I'm not sure what the assignment does, and if it always generates the > best code. If the assignment loads a zeroized buffer using movdqa or > movdqu, then I would expect it to be slower than the pxor. > > Also see < > https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_setzero_si128 > >. > > Jeff >