[PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

Yangfei (Felix) Thu, 21 May 2020 01:09:09 -0700

Hi,

  I noticed a minor SVE-related performance issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254


  For the test case in the PR, SLP vectorization succeeds with VNx8HImode both
with and without -msve-vector-bits=256.
  The difference in the generated code comes from
aarch64_vectorize_related_mode, which picks a different mode under
-msve-vector-bits=256: VNx2HImode instead of V4HImode.
  As a result, the final tree-SSA forwprop pass has to perform a VIEW_CONVERT
from V4HImode to VNx2HImode.
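  To show why these two modes can be related at all, here is a small Python
model of the size arithmetic. It assumes GCC's naming convention that an SVE
mode VNx<k><elt> holds k elements per 128-bit granule, so VNx2HImode is a
partial SVE mode whose HImode elements each occupy a 64-bit container. The
helper names are hypothetical, and the model deliberately ignores everything
except element counts and widths.

```python
# Hypothetical, simplified model of AArch64 vector mode sizes (illustration only).
# Assumption: SVE mode "VNx<k><elt>" holds k elements per 128-bit granule, so at a
# fixed vector length of vl_bits it has k * (vl_bits // 128) elements, each in a
# container of 128 // k bits.

ELEMENT_BITS = {"QI": 8, "HI": 16, "SI": 32, "DI": 64}


def sve_mode_layout(per_granule: int, elt: str, vl_bits: int) -> dict:
    """Element count and layout of VNx<per_granule><elt> at a fixed vector length."""
    if vl_bits % 128 != 0:
        raise ValueError("SVE vector length must be a multiple of 128 bits")
    nunits = per_granule * (vl_bits // 128)
    container_bits = 128 // per_granule
    elt_bits = ELEMENT_BITS[elt]
    return {
        "nunits": nunits,
        "container_bits": container_bits,
        "data_bits": nunits * elt_bits,         # bits of real element data
        "partial": container_bits > elt_bits,   # elements stored unpacked
    }


def simd_mode_layout(nunits: int, elt: str) -> dict:
    """Layout of a fixed-length Advanced SIMD mode V<nunits><elt>."""
    elt_bits = ELEMENT_BITS[elt]
    return {
        "nunits": nunits,
        "container_bits": elt_bits,             # elements are packed
        "data_bits": nunits * elt_bits,
        "partial": False,
    }


vl = 256  # corresponds to -msve-vector-bits=256
vnx8hi = sve_mode_layout(8, "HI", vl)
vnx2hi = sve_mode_layout(2, "HI", vl)
v4hi = simd_mode_layout(4, "HI")

for name, layout in [("VNx8HImode", vnx8hi),
                     ("VNx2HImode", vnx2hi),
                     ("V4HImode", v4hi)]:
    print(name, layout)
```

  Under this model, at a 256-bit vector length VNx8HImode has 16 elements,
while VNx2HImode and V4HImode both carry four HImode elements (64 bits of
data). The difference is that VNx2HImode stores each element in a 64-bit
container and V4HImode packs them into 16-bit lanes.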

  The patch catches this pattern in aarch64_expand_sve_mem_move and simplifies
it by emitting a V4HImode move pattern instead.
  I am assuming that endianness does not affect the correctness of this
simplification.
  Bootstrapped and tested on aarch64-linux-gnu.  Comments?

Thanks,
Felix

pr95254-v1.diff
