Hi,

I noticed a small SVE-related performance issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254
For the given test case, SLP succeeds with VNx8HImode both with and without the option -msve-vector-bits=256.

The root cause of the performance difference is that aarch64_vectorize_related_mode chooses a different mode under -msve-vector-bits=256: VNx2HImode instead of V4HImode. As a result, in the final tree-ssa-forwprop pass we need to do a VIEW_CONVERT from V4HImode to VNx2HImode.

The patch catches and simplifies this pattern in aarch64_expand_sve_mem_move, emitting a V4HImode mov pattern instead. I am assuming that endianness does not make a difference for this simplification.

Bootstrapped and tested on aarch64-linux-gnu. Comments?

Thanks,
Felix
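
P.S. For anyone checking the size argument behind the simplification, here is a minimal sketch of the arithmetic. It is not GCC code: sve_nunits and mem_bytes are hypothetical helpers, and it assumes that a VNx<N> mode holds N elements per 128-bit granule and that the elements are stored contiguously in memory. Under those assumptions, VNx2HImode at 256 bits and V4HImode both cover 4 halfwords, i.e. 8 bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC API: element count of an SVE mode
   VNx<N>... when the vector length is fixed at VECTOR_BITS.
   Assumption: such a mode holds N elements per 128-bit granule.  */
unsigned
sve_nunits (unsigned n_per_granule, unsigned vector_bits)
{
  return n_per_granule * (vector_bits / 128);
}

/* Hypothetical helper: bytes covered in memory by NUNITS elements of
   ELT_BYTES each, assuming the elements are packed contiguously.  */
size_t
mem_bytes (unsigned nunits, size_t elt_bytes)
{
  return (size_t) nunits * elt_bytes;
}
```

With these helpers, mem_bytes (sve_nunits (2, 256), 2) and mem_bytes (4, 2) both evaluate to 8, which is why a V4HImode move can stand in for the VNx2HImode memory move at this vector length. The sketch says nothing about element order on big-endian targets.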
pr95254-v1.diff