On Fri, Aug 23, 2024 at 5:46 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote: > > Hi Hongtao, > > 在 2024/8/23 11:47, Hongtao Liu 写道: > > On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI <guih...@linux.ibm.com> wrote: > >> > >> Hi Hongtao, > >> > >> 在 2024/8/23 9:47, Hongtao Liu 写道: > >>> On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI <guih...@linux.ibm.com> > >>> wrote: > >>>> > >>>> Hi Hongtao, > >>>> > >>>> 在 2024/8/21 11:21, Hongtao Liu 写道: > >>>>> r15-3058-gbb42c551905024 support const0 operand for movv16qi, please > >>>>> rebase your patch and see if there's still the regressions. > >>>> > >>>> There's still regressions. The patch enables V16QI const0 store, but > >>>> it also enables V8QI const0 store. The vector mode is preferable than > >>>> scalar mode so that V8QI is used for 8-byte memory clear instead of > >>>> DI. It's sub-optimal. > >>> Could we check if mode_size is greater than HOST_BITS_PER_WIDE_INT? > >> Not sure if all targets prefer it. Richard & Jeff, what's your opinion? > >> > >> IMHO, could we disable it from predicate or convert it to DI mode store > >> if V8QI const0 store is sub-optimal on i386? > >> > >> > >>>> > >>>> Another issue is it takes lots of subreg to generate an all-zero > >>>> V16QI register sometime. As PR92080 has been fixed, it can't reuse > >>>> existing all-zero V16QI register. > > Backend rtx_cost needs to be adjusted to prevent const0 propagation. > > The current rtx_cost for const0 for i386 is 0, which will enable > > propagation of const0. > > > > /* If MODE2 is appropriate for an MMX register, then tie > > @@ -21588,10 +21590,12 @@ ix86_rtx_costs (rtx x, machine_mode mode, > > int outer_code_i, int opno, > > case 0: > > break; > > case 1: /* 0: xor eliminates false dependency */ > > - *total = 0; > > + /* Add extra cost 1 to prevent propagation of CONST_VECTOR > > + for SET, which will enable more CSE optimization. */ > > + *total = 0 + (outer_code == SET); > > return true; > > default: /* -1: cmp contains false dependency */ > > - *total = 1; > > + *total = 1 + (outer_code == SET); > > return true; > > } > > > > the upper hunk should help for that. > Sorry, I didn't get your point. Which problem it will fix? I tested > upper code. Nothing changed. Which kind of const0 propagation you want > to prevent? The patch itself doesn't enable CSE for const0_rtx, but it's needed after cse_insn recognizes CONST0_RTX with a different mode and replaces them with subreg. I thought you had changed the cse_insn part. On the other hand, pxor is cheap, what matters more is the CSE of broadcasting the same value to different modes. i.e.
__m512i sinkz; __m256i sinky; void foo(char c) { sinkz = _mm512_set1_epi8(c); sinky = _mm256_set1_epi8(c); } > > Thanks > Gui Haochen > > >>>> > >>>> (insn 16 15 17 (set (reg:V4SI 118) > >>>> (const_vector:V4SI [ > >>>> (const_int 0 [0]) repeated x4 > >>>> ])) "auto-init-7.c":25:12 -1 > >>>> (nil)) > >>>> > >>>> (insn 17 16 18 (set (reg:V8HI 117) > >>>> (subreg:V8HI (reg:V4SI 118) 0)) "auto-init-7.c":25:12 -1 > >>>> (nil)) > >>>> > >>>> (insn 18 17 19 (set (reg:V16QI 116) > >>>> (subreg:V16QI (reg:V8HI 117) 0)) "auto-init-7.c":25:12 -1 > >>>> (nil)) > >>>> > >>>> (insn 19 18 0 (set (mem/c:V16QI (plus:DI (reg:DI 114) > >>>> (const_int 12 [0xc])) [0 MEM <char[1:28]> [(void > >>>> *)&temp3]+12 S16 A32]) > >>>> (reg:V16QI 116)) "auto-init-7.c":25:12 -1 > >>>> (nil)) > >>> I think those subregs can be simplified by later rtl passes? > >> > >> Here is the final dump. There are two all-zero 16-byte vector > >> registers. It can't figure out V4SI could be a subreg of V16QI. > >> > >> (insn 14 56 15 2 (set (reg:V16QI 20 xmm0 [115]) > >> (const_vector:V16QI [ > >> (const_int 0 [0]) repeated x16 > >> ])) "auto-init-7.c":25:12 2154 {movv16qi_internal} > >> (nil)) > >> (insn 15 14 16 2 (set (mem/c:V16QI (reg:DI 0 ax [114]) [0 MEM <char[1:28]> > >> [(void *)&temp3]+0 S16 A128]) > >> (reg:V16QI 20 xmm0 [115])) "auto-init-7.c":25:12 2154 > >> {movv16qi_internal} > >> (nil)) > >> (insn 16 15 19 2 (set (reg:V4SI 20 xmm0 [118]) > >> (const_vector:V4SI [ > >> (const_int 0 [0]) repeated x4 > >> ])) "auto-init-7.c":25:12 2160 {movv4si_internal} > >> (nil)) > >> (insn 19 16 57 2 (set (mem/c:V16QI (plus:DI (reg:DI 0 ax [114]) > >> (const_int 12 [0xc])) [0 MEM <char[1:28]> [(void > >> *)&temp3]+12 S16 A32]) > >> (reg:V16QI 20 xmm0 [116])) "auto-init-7.c":25:12 2154 > >> {movv16qi_internal} > >> > >> Thanks > >> Gui Haochen > >> > >>>> > >>>> Thanks > >>>> Gui Haochen > >>> > >>> > >>> > > > > > > -- BR, Hongtao