On Tue, 2025-02-25 at 20:49 +0800, Lulu Cheng wrote:
> 
> On 2025/2/22 at 3:34 PM, Xi Ruoyao wrote:
> > Now for __builtin_popcountl we are getting things like
> > 
> >     vrepli.b        $vr0,0
> >     vinsgr2vr.d     $vr0,$r4,0
> >     vpcnt.d $vr0,$vr0
> >     vpickve2gr.du   $r4,$vr0,0
> >     slli.w  $r4,$r4,0
> >     jr  $r1
> > 
> > The "vrepli.b" instruction is introduced by the init-regs pass (see
> > PR61810 and all the issues it references).  To work around it, we can
> > use post-reload instead of define_expand: the "f" constraint will make
> > the compiler automatically move the scalar between GPR and FPR, and
> > reload is much later than init-regs so init-regs won't get in our way.
> > 
> > Now the code looks like:
> > 
> >     movgr2fr.d      $f0,$r4
> >     vpcnt.d $vr0,$vr0
> >     movfr2gr.d      $r4,$f0
> >     jr  $r1
> > 
> > gcc/ChangeLog:
> > 
> >     * config/loongarch/loongarch.md (cntmap): Change to uppercase.
> >     (popcount<GPR:mode>2): Modify to a post reload split.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> I am currently optimizing the alignment with the code of r15-7684,
> so should I submit the optimization patch for GCC16 stage1?

Hmm, I think my patch should only affect code that explicitly uses
__builtin_popcount (AFAIK the compiler cannot optimize a popcount loop
into __builtin_popcount yet).  So it depends on whether your benchmark
code uses __builtin_popcount much...
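
To illustrate the distinction, here is a minimal sketch (not taken from
the patch or its testsuite; both function names are hypothetical).  Only
the first function calls the builtin directly, so it is the kind of code
I would expect the popcount pattern to affect.  The second is a
hand-written counting loop; whether the compiler recognizes it as a
popcount depends on the GCC version and the shape of the loop, so I make
no claim about what it compiles to.

```c
/* Explicit use of the builtin: expanded through the target's
   popcount pattern when the required vector extension is enabled.  */
int
popcount_builtin (unsigned long x)
{
  return __builtin_popcountl (x);
}

/* Hand-written bit-counting loop, for comparison only.  It computes
   the same value, but it never names the builtin.  */
int
popcount_loop (unsigned long x)
{
  int n = 0;

  for (; x != 0; x >>= 1)
    n += (int) (x & 1UL);

  return n;
}
```

Compiling this file to assembly (for example with -O2 -S and LSX
enabled) and comparing the two functions should show whether a given
benchmark source would see any difference from the patch.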


-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
