Hi, I submitted a patch to change the mode checking for CLEAR_BY_PIECES. https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html
It causes some regressions on aarch64. With the patch, clear-by-pieces uses V2x8QImode instead of TImode, because vector modes are preferred and V2x8QImode supports const0 stores. As a result, the efficient "stp" instructions are no longer generated.

I drafted the following patch to fix the problem. It fixes the regressions found in memset-corner-cases.c, memset-q-reg.c, auto-init-padding-11.c and auto-init-padding-5.c.

I am not sure whether this should be done for all 16-byte vector modes, nor whether the patch is the proper approach, so I am sending it as an RFC.

Thanks
Gui Haochen

ChangeLog
aarch64: Implement 16-byte vector mode const0 store by TImode

gcc/
        * config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD): Expand
        V2x8QImode const0 store by TImode.

patch.diff
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 01b084d8ccb..8aa72940b12 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7766,7 +7766,14 @@ (define_expand "mov<mode>"
         (match_operand:VSTRUCT_QD 1 "general_operand"))]
   "TARGET_FLOAT"
 {
-  if (can_create_pseudo_p ())
+  if (<MODE>mode == V2x8QImode
+      && operands[1] == CONST0_RTX (V2x8QImode)
+      && MEM_P (operands[0]))
+    {
+      operands[0] = adjust_address (operands[0], TImode, 0);
+      operands[1] = CONST0_RTX (TImode);
+    }
+  else if (can_create_pseudo_p ())
     {
       if (GET_CODE (operands[0]) != REG)
         operands[1] = force_reg (<MODE>mode, operands[1]);
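A small reproducer in the spirit of the memset testcases could look like the sketch below. The function names (clear_16, clear_32, all_zero) are hypothetical and the code is not taken from memset-corner-cases.c or memset-q-reg.c. Compiling it for aarch64 with -O2 -S and comparing the assembly with and without the patch should show whether the 16- and 32-byte clears are still emitted as "stp" stores. Whether a given size is actually expanded by clear-by-pieces, rather than by a setmem expansion or a library call, depends on the target's cost heuristics, so that is an assumption to verify.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reproducer: constant-size zeroing that GCC may expand
   inline via clear-by-pieces instead of calling memset.  Inspect the
   generated assembly to see which store instructions are used.  */
void clear_16 (void *p)
{
  memset (p, 0, 16);
}

void clear_32 (void *p)
{
  memset (p, 0, 32);
}

/* Runtime sanity check: returns 1 if the first N bytes at P are zero.  */
int all_zero (const unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (p[i] != 0)
      return 0;
  return 1;
}
```

The runtime check only confirms the clears are correct; the codegen difference the patch targets has to be judged from the assembly.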