> - s += 16;
> + v16qi data, t;
> + /* Unaligned load.  Reading beyond the final newline is safe, since
> +    files.cc:read_file_guts pads the allocation.  */
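For illustration only, here is a minimal sketch of the invariant the quoted comment relies on. It assumes the file data is followed by 32 zeroed padding bytes and that the scanner may issue unaligned loads of up to 32 bytes starting at any offset up to the end of the data. `SCAN_PAD`, `alloc_padded_copy` and `load_in_bounds` are hypothetical names; this is not the actual files.cc:read_file_guts code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: number of zeroed bytes kept after the file data.  */
#define SCAN_PAD 32

/* Hypothetical helper: copy LEN bytes into a fresh buffer followed by
   SCAN_PAD zero bytes, so an unaligned load of up to SCAN_PAD bytes
   starting at any offset <= LEN stays inside the allocation.  */
static unsigned char *
alloc_padded_copy (const unsigned char *src, size_t len)
{
  /* Guard against overflow when adding the padding.  */
  assert (len <= SIZE_MAX - SCAN_PAD);
  unsigned char *buf = malloc (len + SCAN_PAD);
  if (!buf)
    return NULL;
  memcpy (buf, src, len);
  memset (buf + len, 0, SCAN_PAD);
  return buf;
}

/* Nonzero if a WIDTH-byte load at offset OFF fits in a buffer holding
   LEN data bytes plus SCAN_PAD bytes of padding.  */
static int
load_in_bounds (size_t off, size_t width, size_t len)
{
  return off <= len && off + width <= len + SCAN_PAD;
}
```

With 16 bytes of padding, a 32-byte load starting at the last data byte would run past the allocation; with 32 bytes it does not.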

You need to change that function to use 32-byte padding, as Jakub pointed out (I forgot that too).

> + data = *(const v16qi_u *)s;
> + /* Prevent propagation into pshufb and pcmp as memory operand.  */
> + __asm__ ("" : "+x" (data));

It would probably make sense to file a PR on this separately, so that the compiler can eventually be fixed to not need such workarounds. I'm not sure how much difference it makes, however.

-Andi