On Fri, Aug 23, 2024 at 5:46 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
>
> Hi Hongtao,
>
> 在 2024/8/23 11:47, Hongtao Liu 写道:
> > On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
> >>
> >> Hi Hongtao,
> >>
> >> 在 2024/8/23 9:47, Hongtao Liu 写道:
> >>> On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI <guih...@linux.ibm.com> 
> >>> wrote:
> >>>>
> >>>> Hi Hongtao,
> >>>>
> >>>> 在 2024/8/21 11:21, Hongtao Liu 写道:
> >>>>> r15-3058-gbb42c551905024 support const0 operand for movv16qi, please
> >>>>> rebase your patch and see if there's still the regressions.
> >>>>
> >>>> There's still regressions. The patch enables V16QI const0 store, but
> >>>> it also enables V8QI const0 store. The vector mode is preferable than
> >>>> scalar mode so that V8QI is used for 8-byte memory clear instead of
> >>>> DI. It's sub-optimal.
> >>> Could we check if mode_size is greater than HOST_BITS_PER_WIDE_INT?
> >> Not sure if all targets prefer it. Richard & Jeff, what's your opinion?
> >>
> >> IMHO, could we disable it from predicate or convert it to DI mode store
> >> if V8QI const0 store is sub-optimal on i386?
> >>
> >>
> >>>>
> >>>> Another issue is it takes lots of subreg to generate an all-zero
> >>>> V16QI register sometime. As PR92080 has been fixed, it can't reuse
> >>>> existing all-zero V16QI register.
> > Backend rtx_cost needs to be adjusted to prevent const0 propagation.
> > The current rtx_cost for const0 for i386 is 0, which will enable
> > propagation of const0.
> >
> >    /* If MODE2 is appropriate for an MMX register, then tie
> > @@ -21588,10 +21590,12 @@ ix86_rtx_costs (rtx x, machine_mode mode,
> > int outer_code_i, int opno,
> >         case 0:
> >           break;
> >         case 1:  /* 0: xor eliminates false dependency */
> > -         *total = 0;
> > +         /* Add extra cost 1 to prevent propagation of CONST_VECTOR
> > +            for SET, which will enable more CSE optimization.  */
> > +         *total = 0 + (outer_code == SET);
> >           return true;
> >         default: /* -1: cmp contains false dependency */
> > -         *total = 1;
> > +         *total = 1 + (outer_code == SET);
> >           return true;
> >         }
> >
> > the upper hunk should help for that.
> Sorry, I didn't get your point. Which problem it will fix? I tested
> upper code. Nothing changed. Which kind of const0 propagation you want
> to prevent?
The patch itself doesn't enable CSE for const0_rtx, but it's needed
after cse_insn recognizes CONST0_RTX with a different mode and
replaces them with subreg.
I thought you had changed the cse_insn part.
 On the other hand, pxor is cheap, what matters more is the CSE of
broadcasting the same value to different modes. i.e.

__m512i sinkz;
__m256i sinky;
void foo(char c) {
    sinkz = _mm512_set1_epi8(c);
    sinky = _mm256_set1_epi8(c);
}

>
> Thanks
> Gui Haochen
>
> >>>>
> >>>> (insn 16 15 17 (set (reg:V4SI 118)
> >>>>         (const_vector:V4SI [
> >>>>                 (const_int 0 [0]) repeated x4
> >>>>             ])) "auto-init-7.c":25:12 -1
> >>>>      (nil))
> >>>>
> >>>> (insn 17 16 18 (set (reg:V8HI 117)
> >>>>         (subreg:V8HI (reg:V4SI 118) 0)) "auto-init-7.c":25:12 -1
> >>>>      (nil))
> >>>>
> >>>> (insn 18 17 19 (set (reg:V16QI 116)
> >>>>         (subreg:V16QI (reg:V8HI 117) 0)) "auto-init-7.c":25:12 -1
> >>>>      (nil))
> >>>>
> >>>> (insn 19 18 0 (set (mem/c:V16QI (plus:DI (reg:DI 114)
> >>>>                 (const_int 12 [0xc])) [0 MEM <char[1:28]> [(void 
> >>>> *)&temp3]+12 S16 A32])
> >>>>         (reg:V16QI 116)) "auto-init-7.c":25:12 -1
> >>>>      (nil))
> >>> I think those subregs can be simplified by later rtl passes?
> >>
> >> Here is the final dump. There are two all-zero 16-byte vector
> >> registers. It can't figure out V4SI could be a subreg of V16QI.
> >>
> >> (insn 14 56 15 2 (set (reg:V16QI 20 xmm0 [115])
> >>         (const_vector:V16QI [
> >>                 (const_int 0 [0]) repeated x16
> >>             ])) "auto-init-7.c":25:12 2154 {movv16qi_internal}
> >>      (nil))
> >> (insn 15 14 16 2 (set (mem/c:V16QI (reg:DI 0 ax [114]) [0 MEM <char[1:28]> 
> >> [(void *)&temp3]+0 S16 A128])
> >>         (reg:V16QI 20 xmm0 [115])) "auto-init-7.c":25:12 2154 
> >> {movv16qi_internal}
> >>      (nil))
> >> (insn 16 15 19 2 (set (reg:V4SI 20 xmm0 [118])
> >>         (const_vector:V4SI [
> >>                 (const_int 0 [0]) repeated x4
> >>             ])) "auto-init-7.c":25:12 2160 {movv4si_internal}
> >>      (nil))
> >> (insn 19 16 57 2 (set (mem/c:V16QI (plus:DI (reg:DI 0 ax [114])
> >>                 (const_int 12 [0xc])) [0 MEM <char[1:28]> [(void 
> >> *)&temp3]+12 S16 A32])
> >>         (reg:V16QI 20 xmm0 [116])) "auto-init-7.c":25:12 2154 
> >> {movv16qi_internal}
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >>>>
> >>>> Thanks
> >>>> Gui Haochen
> >>>
> >>>
> >>>
> >
> >
> >



-- 
BR,
Hongtao

Reply via email to