On Tue, 2025-02-25 at 20:49 +0800, Lulu Cheng wrote:
>
> On 2025/2/22 at 3:34 PM, Xi Ruoyao wrote:
> > Now for __builtin_popcountl we are getting things like
> >
> > 	vrepli.b	$vr0,0
> > 	vinsgr2vr.d	$vr0,$r4,0
> > 	vpcnt.d	$vr0,$vr0
> > 	vpickve2gr.du	$r4,$vr0,0
> > 	slli.w	$r4,$r4,0
> > 	jr	$r1
> >
> > The "vrepli.b" instruction is introduced by the init-regs pass (see
> > PR61810 and all the issues it references).  To work around it, we can
> > use a post-reload split instead of define_expand: the "f" constraint
> > will make the compiler automatically move the scalar between GPR and
> > FPR, and reload runs much later than init-regs, so init-regs won't get
> > in our way.
> >
> > Now the code looks like:
> >
> > 	movgr2fr.d	$f0,$r4
> > 	vpcnt.d	$vr0,$vr0
> > 	movfr2gr.d	$r4,$f0
> > 	jr	$r1
> >
> > gcc/ChangeLog:
> >
> > 	* config/loongarch/loongarch.md (cntmap): Change to uppercase.
> > 	(popcount<GPR:mode>2): Modify to a post reload split.
> > ---
> >
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
> I am currently optimizing the alignment with the code of r15-7684,
> so should I submit the optimization patch for GCC16 stage1?
Hmm, I think my patch should only affect code that explicitly uses
__builtin_popcount (AFAIK the compiler cannot yet optimize a popcount loop
into __builtin_popcount).  So it depends on whether your benchmark code
uses __builtin_popcount much...

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University