> - s += 16;
> + v16qi data, t;
> + /* Unaligned load. Reading beyond the final newline is safe, since
> + files.cc:read_file_guts pads the allocation. */
You need to change that function to use 32-byte padding, as Jakub
pointed out (I forgot that too).
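
To illustrate why the padding matters, here is a rough sketch, not the
actual files.cc code. READ_PADDING, alloc_padded and find_newline are
made-up names for this example, and the per-byte check stands in for
the real pshufb/pcmp sequence. The only point is that a 16-byte load
starting near the final newline runs past the end of the text, so the
allocation has to cover it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; 32 is the padding size asked for in the review. */
#define READ_PADDING 32

/* 16-byte vector type with alignment 1, so it can be loaded from any address. */
typedef char v16qi __attribute__((vector_size(16)));
typedef char v16qi_u __attribute__((vector_size(16), aligned(1), may_alias));

/* Hypothetical stand-in for the buffer allocation: copy LEN bytes of
   TEXT and append READ_PADDING zero bytes.  */
static unsigned char *alloc_padded(const char *text, size_t len)
{
  unsigned char *buf = malloc(len + READ_PADDING);
  assert(buf != NULL);
  memcpy(buf, text, len);
  memset(buf + len, 0, READ_PADDING);
  return buf;
}

/* Return the offset of the first '\n' at or after S.  The text must
   contain a newline and be followed by at least 15 readable bytes.  */
static size_t find_newline(const unsigned char *s)
{
  size_t off = 0;
  for (;;)
    {
      /* Unaligned load; may read up to 15 bytes past the newline.  */
      v16qi data = *(const v16qi_u *)(s + off);
      for (int i = 0; i < 16; i++)
        if (data[i] == '\n')
          return off + i;
      off += 16;
    }
}
```

With a 3-byte input such as "ab\n", the first load already touches 13
bytes after the newline, which is only valid because of the padding.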
> + data = *(const v16qi_u *)s;
> + /* Prevent propagation into pshufb and pcmp as memory operand. */
> + __asm__ ("" : "+x" (data));
It would probably make sense to file a PR on this separately, so
that the compiler can eventually be fixed to not need such
workarounds. Not sure how much difference it makes, however.
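
For the PR, a reduced form of the workaround could look roughly like
the following. This is only a sketch: load16_opaque is a made-up name,
and the guard on __SSE2__ is my assumption so that the "x" constraint
is only used where an SSE register is available. The empty asm claims
to read and modify data in a register, so the compiler has to keep the
load as a separate instruction instead of folding it into a later
instruction as a memory operand.

```c
#include <assert.h>

/* 16-byte vector types; the _u variant has alignment 1 for unaligned loads. */
typedef char v16qi __attribute__((vector_size(16)));
typedef char v16qi_u __attribute__((vector_size(16), aligned(1), may_alias));

/* Hypothetical helper: unaligned 16-byte load whose result is forced
   into a register before any later use.  */
static inline v16qi load16_opaque(const unsigned char *s)
{
  v16qi data = *(const v16qi_u *)s;
#if defined(__SSE2__)
  /* Empty asm with a read-write SSE register operand: emits no
     instructions, but the value must live in a register here.  */
  __asm__ ("" : "+x" (data));
#endif
  return data;
}
```

The asm does not change the value, so the result is the same 16 bytes
that were loaded.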
-Andi