https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108
--- Comment #9 from Matthew Malcomson <matmal01 at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #8)
> Ok, so having looked at this I'm not sure the compiler is at fault here.
>
> Similar to the SVN case the snappy code is misaligning the loads
> intentionally and loading 64-bits at a time from the 8-bit pointer:
...
> So I think this is a case where the compiler can't do anything. (I also
> think that the C code uses UB similar to SVN, they misalign the byte array
> to 4-bytes but load 8-bytes at a time. They get lucky that the vector code
> is never entered).
...
>
> The code would benefit if they:
>
> 1. added restrict to the functions, as e.g. in `FindMatchLengthPlain` the
>    values are manually vectorized anyway so aliasing must not be a problem
> 2. had a simple scalar loop variant that's left up to the vectorizer to
>    vectorize. This would actually give them faster code and allow e.g. SVE
>    codegen.

Thanks for looking into it Tamar! A few questions (some just because I want
to make sure I understand -- some more on topic ;-)

Just to understand:
- Which SVN case are you referencing?
- How is this UB? UNALIGNED_LOAD64 seems to use `memcpy`, and they bound the
  8-byte reads with a relevant limit (first sketch below).

More relevant to the issue:
- I tried adding `__restrict__` to `s1` and `s2` in `FindMatchLengthPlain`
  and replacing the function body with a plain loop (second sketch below).
  I saw a significant slowdown. Is your point that this would let the
  compiler do something useful with the code even though it isn't faster
  right now? Or did you mean inlining the loop, or something else?

(N.b. I didn't double-check the codegen of that function -- I just reran the
benchmark naively -- so if there was an obvious adjustment to flags or the
like that I should have made, I didn't make it ;-)
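
For reference, the pattern I understand you to be describing is roughly the
following. UNALIGNED_LOAD64 is the name from the discussion above; the
match-length loop is my paraphrase of the idea, not the exact snappy code:

#include <cstddef>
#include <cstdint>
#include <cstring>

// memcpy-based unaligned load: well-defined regardless of the pointer's
// alignment, and typically lowered to a single load instruction.
static inline std::uint64_t UNALIGNED_LOAD64(const void *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Paraphrased sketch of the manually "vectorized" comparison: compare
// 8 bytes at a time while at least 8 bytes remain before the limit,
// then finish the tail byte-by-byte.
static std::size_t FindMatchLength(const char *s1, const char *s2,
                                   const char *s2_limit) {
  std::size_t matched = 0;
  while (s2 + 8 <= s2_limit &&
         UNALIGNED_LOAD64(s1 + matched) == UNALIGNED_LOAD64(s2)) {
    s2 += 8;
    matched += 8;
  }
  while (s2 < s2_limit && s1[matched] == *s2) {
    ++s2;
    ++matched;
  }
  return matched;
}

Since every read goes through memcpy and the 8-byte reads stop before
s2_limit, I don't see where the UB comes in -- hence the question above.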
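
And the change I tried was along these lines (a sketch, not the exact
patch):

#include <cstddef>

// Plain scalar loop with restrict-qualified pointers, leaving the
// vectorization entirely up to the compiler.
static std::size_t FindMatchLengthPlain(const char *__restrict__ s1,
                                        const char *__restrict__ s2,
                                        const char *s2_limit) {
  std::size_t matched = 0;
  while (s2 + matched < s2_limit && s1[matched] == s2[matched]) {
    ++matched;
  }
  return matched;
}

This is the variant that benchmarked significantly slower for me than the
manual 8-byte version above.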