Hi,
  I submitted a patch to change the mode checking for CLEAR_BY_PIECES.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html
  It causes some regressions on aarch64. With that patch, V2x8QImode is
used for clear by pieces instead of TImode, because a vector mode is
preferred and V2x8QImode supports const0 store. As a result, the efficient
"stp" instructions can't be generated.

  I drafted the following patch to fix the problem. It fixes the
regressions found in memset-corner-cases.c, memset-q-reg.c,
auto-init-padding-11.c and auto-init-padding-5.c.

  Compared to the previous one, the main changes are:
1. Support all 16-byte vector modes.
2. Check the memory address when a pseudo can't be created.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660349.html

  I am sending the patch in order to trigger the automatic CI to test it.
The cfarm server is too slow to finish regression testing overnight. I will
check in the patch if there are no regressions and no one objects.

Thanks
Gui Haochen

ChangeLog
aarch64: Implement 16-byte vector mode const0 store by TImode

gcc/
        * config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD): Expand
        16-byte vector mode const0 store by TImode.

patch.diff
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 01b084d8ccb..acf86e191c7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7766,7 +7766,16 @@ (define_expand "mov<mode>"
         (match_operand:VSTRUCT_QD 1 "general_operand"))]
   "TARGET_FLOAT"
 {
-  if (can_create_pseudo_p ())
+  if (known_eq (GET_MODE_SIZE (<MODE>mode), 16)
+      && operands[1] == CONST0_RTX (<MODE>mode)
+      && MEM_P (operands[0])
+      && (can_create_pseudo_p ()
+          || memory_address_p (TImode, XEXP (operands[0], 0))))
+    {
+      operands[0] = adjust_address (operands[0], TImode, 0);
+      operands[1] = CONST0_RTX (TImode);
+    }
+  else if (can_create_pseudo_p ())
     {
       if (GET_CODE (operands[0]) != REG)
         operands[1] = force_reg (<MODE>mode, operands[1]);