Hi,
I have enabled SSE moves for znver1-3 since they are a performance win on these machines too (we avoid using loops or string operations, which are more costly). However, as discussed in the PR log, this triggers a bug in IRA, and it was decided that it is better not to backport the fix. This patch therefore removes znver1-3 from the AVX256 move_by_pieces and store_by_pieces tunings.
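
For illustration only (this is not the testcase from the PR), here is a minimal sketch of the kind of code the by-pieces expanders apply to. The struct name, its 32-byte size, and the function names are hypothetical, chosen so that one block matches a single 256-bit vector. Whether GCC actually expands these calls inline, and with which instructions, depends on the compiler version, optimization level and tuning flags.

```c
#include <string.h>

/* Hypothetical 32-byte block; the size is an assumption chosen to
   match the width of one 256-bit vector register.  */
struct block
{
  char bytes[32];
};

/* Fixed-size copy with a compile-time constant length: a candidate
   for inline move_by_pieces expansion instead of a memcpy call.  */
void
copy_block (struct block *dst, const struct block *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Fixed-size fill with a constant byte value: a candidate for inline
   store_by_pieces expansion instead of a memset call.  */
void
fill_block (struct block *dst)
{
  memset (dst, 0x2a, sizeof *dst);
}
```

Comparing the assembly from something like gcc -O2 -march=znver3 -S with and without the patch should show whether 256-bit moves are used for these functions.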

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

2023-04-14  Jan Hubicka  <hubi...@ucw.cz>

	PR target/109137
	* config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES): Remove znver1-3.
	(X86_TUNE_AVX256_STORE_BY_PIECES): Remove znver1-3.

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 78d815c32db..e6b9e21250f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -545,12 +545,12 @@ DEF_TUNE (X86_TUNE_AVX512_SPLIT_REGS, "avx512_split_regs", m_ZNVER4)
 /* X86_TUNE_AVX256_MOVE_BY_PIECES: Optimize move_by_pieces with 256-bit
    AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_MOVE_BY_PIECES, "avx256_move_by_pieces",
-          m_CORE_AVX512 | m_ZNVER1 | m_ZNVER2 | m_ZNVER3)
+          m_CORE_AVX512)
 
 /* X86_TUNE_AVX256_STORE_BY_PIECES: Optimize store_by_pieces with 256-bit
    AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_STORE_BY_PIECES, "avx256_store_by_pieces",
-          m_CORE_AVX512 | m_ZNVER1 | m_ZNVER2 | m_ZNVER3)
+          m_CORE_AVX512)
 
 /* X86_TUNE_AVX512_MOVE_BY_PIECES: Optimize move_by_pieces with 512-bit
    AVX instructions.  */