https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108
--- Comment #9 from Matthew Malcomson <matmal01 at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #8)
> Ok, so having looked at this I'm not sure the compiler is at fault here.
>
> Similar to the SVN case the snappy code is misaligning the loads
> intentionally and loading 64-bits at a time from the 8-bit pointer:
...
> So I think this is a case where the compiler can't do anything. (I also
> think that the C code uses UB similar to SVN, they misalign the byte array
> to 4-bytes but load 8-bytes at a time. They get lucky that the vector code
> is never entered).
...
>
> The code would benefit if they:
>
> 1. added restrict to the functions, as e.g. in `FindMatchLengthPlain` the
>    values are manually vectorized anyway so aliasing must not be a problem
> 2. had a simple scalar loop variant that's left up to the vectorizer to
>    vectorize. This would actually give them faster code and allow e.g. SVE
>    codegen.

Thanks for looking into it Tamar! A few questions (some just because I want
to make sure I understand -- some more on topic ;-)

Just to understand:
- Which SVN case are you referencing?
- How is this UB? UNALIGNED_LOAD64 seems to use `memcpy`, and they bound the
  8-byte reads with a relevant limit (first sketch below).

More relevant to the issue:
- I tried adding `__restrict__` to `s1` and `s2` in `FindMatchLengthPlain`
  and replacing the function body with a plain loop (second sketch below).
  I saw a significant slowdown. Is your point that this would let the
  compiler do something useful with the code even though it isn't faster
  right now? Or did you mean inlining the loop, or something else?

(N.b. I didn't double-check the codegen of that function -- I just reran the
benchmark naively -- so if there was an obvious adjustment to flags or the
like that I should have made, I didn't make it ;-)
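
For reference, the pattern I understand you to be describing is roughly the
following. UNALIGNED_LOAD64 is the name from the discussion above; the
match-length loop is my paraphrase of the idea, not the exact snappy code:

#include <cstddef>
#include <cstdint>
#include <cstring>

// memcpy-based unaligned load: well-defined regardless of the pointer's
// alignment, and typically lowered to a single load instruction.
static inline std::uint64_t UNALIGNED_LOAD64(const void *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Paraphrased sketch of the manually "vectorized" comparison: compare
// 8 bytes at a time while at least 8 bytes remain before the limit,
// then finish the tail byte-by-byte.
static std::size_t FindMatchLength(const char *s1, const char *s2,
                                   const char *s2_limit) {
  std::size_t matched = 0;
  while (s2 + 8 <= s2_limit &&
         UNALIGNED_LOAD64(s1 + matched) == UNALIGNED_LOAD64(s2)) {
    s2 += 8;
    matched += 8;
  }
  while (s2 < s2_limit && s1[matched] == *s2) {
    ++s2;
    ++matched;
  }
  return matched;
}

Since every read goes through memcpy and the 8-byte reads stop before
s2_limit, I don't see where the UB comes in -- hence the question above.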
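
And the change I tried was along these lines (a sketch, not the exact
patch):

#include <cstddef>

// Plain scalar loop with restrict-qualified pointers, leaving the
// vectorization entirely up to the compiler.
static std::size_t FindMatchLengthPlain(const char *__restrict__ s1,
                                        const char *__restrict__ s2,
                                        const char *s2_limit) {
  std::size_t matched = 0;
  while (s2 + matched < s2_limit && s1[matched] == s2[matched]) {
    ++matched;
  }
  return matched;
}

This is the variant that benchmarked significantly slower for me than the
manual 8-byte version above.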