On Fri, Jan 17, 2025 at 3:40 AM Sam Russell <sam.h.russ...@gmail.com> wrote:
>
> We discussed this previously and decided that, since AVX1 supports unaligned
> accesses, we could skip the alignment check at the start of the function, but
> as you've discovered, this memcpy issue creates undefined behaviour.

I don't believe the memcpy is causing the problem. I believe it is
the line that Bruno or Paul pointed out:

    const __m128i *data = buf;

> Most performant would probably be an alignment check at the start and then 
> manually processing the first N bytes. Another option could be to simply cast 
> data to unsigned char* and then we can guarantee the compiler doesn't hit 
> alignment issues?

Change:

    const __m128i *data = buf;

To this, so the compiler must emit an unaligned load (MOVDQU) and cannot
choose MOVDQA:

    const __m128i data = _mm_loadu_si128(buf);
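
To make the idea concrete, here is a minimal sketch of looping over a
buffer with unaligned loads only. It is not the Gnulib code: the names
load16_unaligned and xor_blocks are hypothetical, and the XOR fold merely
stands in for the real per-block work. It assumes an x86 target with SSE2
and is C only (it relies on the implicit conversion from void *).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128 and friends */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: load 16 bytes from an address of any alignment.
   The pointer stays a 'const void *' until it reaches the intrinsic, so
   no 16-byte-aligned '__m128i *' variable is ever created from it.  */
static __m128i
load16_unaligned (const void *buf)
{
  return _mm_loadu_si128 (buf);
}

/* Hypothetical stand-in for the real work: XOR together every complete
   16-byte block of BUF and store the 16-byte result in OUT.  Trailing
   bytes beyond the last complete block are ignored.  */
static void
xor_blocks (const void *buf, size_t len, unsigned char out[16])
{
  assert (buf != NULL && out != NULL);
  const unsigned char *p = buf;
  __m128i acc = _mm_setzero_si128 ();
  for (; len >= 16; p += 16, len -= 16)
    acc = _mm_xor_si128 (acc, load16_unaligned (p));
  memcpy (out, &acc, sizeof acc);  /* OUT need not be aligned either */
}
```

The point of the design is that the byte pointer is what gets advanced,
and every access to the data goes through the unaligned-load intrinsic.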

> What are people's preferences here?

Jeff

> On Fri, 17 Jan 2025 at 08:11, Paul Eggert <egg...@cs.ucla.edu> wrote:
>>
>> On 2025-01-16 21:25, Jeffrey Walton wrote:
>> > On Fri, Jan 17, 2025 at 12:07 AM Bruno Haible via Gnulib discussion
>> > list <bug-gnulib@gnu.org> wrote:
>> >> Yes, the undefined behaviour really starts here, in line 35:
>> >>
>> >>    const __m128i *data = buf;
>> >>
>> >> 'buf' was not aligned, 'const __m128i *' is 16-byte aligned.
>> >
>> > Disassemble the code around that line. See which asm instruction is
>> > being used for the load. I suspect MOVDQA (aligned) is being used
>> > instead of MOVDQU (unaligned).
>>
>> The compiler is entitled to do that. Bruno's right, the behavior is
>> undefined once the code assigns the unaligned pointer to an __m128i *
>> variable; see C23 §6.3.2.3 ¶7. Since behavior is undefined, the compiler
>> can do whatever it likes.
>>
>> I installed the attached patch to work around the immediate issue of the
>> undefined behavior. This skips the pclmul speedup if the buffer is not
>> properly aligned. If that is a significant performance issue in
>> Gnulib-using code, I hope Sam or somebody can come up with a
>> higher-performance fix.
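
For reference, the alignment test discussed above (skip the fast path, or
process a scalar prefix first) could look roughly like the sketch below.
This is not Paul's patch; the helper names are hypothetical, and it
assumes the usual flat address model in which the low bits of a pointer
converted to uintptr_t reflect its alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if P is 16-byte aligned.  */
static bool
is_aligned16 (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Hypothetical helper: number of bytes from P up to the next 16-byte
   boundary (0 if P is already aligned).  */
static size_t
bytes_to_align16 (const void *p)
{
  return (size_t) (-(uintptr_t) p & 15);
}

/* Hypothetical helper: how many leading bytes of BUF (of length LEN)
   would be handled by scalar code before an aligned vector loop could
   take over.  Returns LEN if the buffer ends before the boundary.  */
static size_t
scalar_prefix_len (const void *buf, size_t len)
{
  size_t head = bytes_to_align16 (buf);
  if (head > len)
    head = len;
  assert (head == len
          || is_aligned16 ((const unsigned char *) buf + head));
  return head;
}
```

Skipping the fast path corresponds to testing is_aligned16 (buf) once at
the start; the prefix approach would run the scalar code on the first
scalar_prefix_len (buf, len) bytes and the vector code on the rest.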
