Hi:
   GCC 11 will be the system GCC about 2 years from now, and the
processors in use by then should no longer need an unaligned 256-bit
vector access to be split into two 128-bit accesses.
   For example, testing SPEC2017 with the two option sets below on
Zen3/ICL shows that Option B is better than Option A.
Option A:
-march=x86-64 -mtune=generic -mavx2 -mfma -Ofast

Option B:
Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
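
For illustration only, here is a minimal sketch of the kind of loop where
the two option sets can generate different code. The kernel and the name
axpy are hypothetical and are not taken from the patch or from SPEC2017.
The pointers carry no alignment guarantee, so an AVX2 vectorizer has to
use unaligned accesses. With Option A the generic tuning is expected to
split each 256-bit access into two 128-bit halves, while Option B should
allow a single 256-bit unaligned load or store. Comparing the output of
gcc -S for both option sets shows what a given compiler version does.

```c
#include <stddef.h>

/* Hypothetical kernel (an assumption for illustration, not from the
   patch): y[i] += a * x[i].  Neither pointer is known to be 32-byte
   aligned, so 256-bit vector accesses in this loop are unaligned.  */
void axpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```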

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?




-- 
BR,
Hongtao

Attachment: 0001-Enable-X86_TUNE_AVX256_UNALIGNED_-LOAD-STORE-_OPTIMA.patch
Description: Binary data
