On 2/2/23 05:26, Richard Biener wrote:
On Thu, 2 Feb 2023, juzhe.zh...@rivai.ai wrote:
Yeah, thanks. You are right, CSE should do the job.
I now know why CSE failed to optimize: I include the
VL_REGNUM(66)/VTYPE_REGNUM(67) hard regs
as dependencies of pred_broadcast:
(insn 19 18 20 4 (set (reg:VNx1DI 152)
        (if_then_else:VNx1DI (unspec:VNx1BI [
                    (const_vector:VNx1BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 4 [0x4])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:VNx1DI (reg/v:DI 148 [ x ]))
            (unspec:VNx1DI [
                    (const_int 0 [0])
                ] UNSPEC_VUNDEF))) "rvv.c":22:23 695 {pred_broadcastvnx1di}
     (nil))
CSE then fails to turn reg 152 into a copy.
VL_REGNUM(66)/VTYPE_REGNUM(67) are global hard regs, and I need every
RVV instruction to depend on them,
since we use the vsetvl instruction (which sets the global
VL_REGNUM(66)/VTYPE_REGNUM(67) state) to configure the global state for
each RVV instruction.
Including the dependency here ensures that the global VL/VTYPE state is
correct for each RVV instruction. (If we don't include
such a dependency in the RVV instructions, instruction scheduling may reorder the RVV
instructions and vsetvl instructions arbitrarily and then
produce an incorrect vsetvl configuration.)
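A minimal sketch, in C, of the dependency argument above. It is a toy model and not GCC code: the `toy_*` names, the bitmask encoding, and the swap rule are all assumptions made for illustration. It only shows why an instruction that lists vl/vtype as inputs cannot be moved across a vsetvl, while one that omits them can.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not GCC code.  Each bit stands for one hard register;
   only vl and vtype (regs 66 and 67 in the dump above) are modelled. */
enum { TOY_VL = 1 << 0, TOY_VTYPE = 1 << 1 };

struct toy_insn {
  const char *name;
  uint32_t reads;   /* registers the instruction uses */
  uint32_t writes;  /* registers the instruction sets */
};

/* Two adjacent instructions may be swapped only if there is no
   read-after-write, write-after-read, or write-after-write conflict. */
static bool toy_can_swap(const struct toy_insn *a, const struct toy_insn *b)
{
  bool raw = (a->writes & b->reads) != 0;
  bool war = (a->reads & b->writes) != 0;
  bool waw = (a->writes & b->writes) != 0;
  return !(raw || war || waw);
}
```

With a hypothetical vsetvl that writes `TOY_VL | TOY_VTYPE`, a broadcast that reads both registers cannot be swapped with it, whereas a broadcast with an empty `reads` mask can, which is the reordering the explicit dependency is meant to prevent.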
Originally I set the reg_class of VL_REGNUM(66)/VTYPE_REGNUM(67) as follows:
riscv_regno_to_class [VL_REGNUM] = VL_REGS;
riscv_regno_to_class [VTYPE_REGNUM] = VTYPE_REGS;
This configuration makes CSE fail.
However, if I change the reg_class to:
riscv_regno_to_class [VL_REGNUM] = NO_REGS;
riscv_regno_to_class [VTYPE_REGNUM] = NO_REGS;
CSE can now do the optimization!
1) Would you mind telling me the difference between them?
No idea. I think CSE avoids touching hard register references because
eliding them to copies can increase register pressure.
IIRC the costing is set up differently and for a given partition a
pseudo will be preferred over a hard reg. This is in addition to other
places that test the small register class hooks.
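The preference for a pseudo over a hard reg can be sketched as a simple cost comparison. This is an illustration only and not the code in GCC's CSE pass: the `toy_*` names, the pseudo boundary of 128, and the cost values 2 and 1 are all assumptions chosen to make the idea concrete.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boundary: register numbers below this are hard regs. */
#define TOY_FIRST_PSEUDO 128

static bool toy_is_hard(unsigned regno)
{
  return regno < TOY_FIRST_PSEUDO;
}

/* Assumed costs: a hard register reference is charged more than a pseudo. */
static int toy_reg_cost(unsigned regno)
{
  return toy_is_hard(regno) ? 2 : 1;
}

/* Pick the cheapest member of an equivalence class of registers.
   Ties go to the earliest entry.  Requires n >= 1. */
static unsigned toy_pick_representative(const unsigned *regs, size_t n)
{
  unsigned best = regs[0];
  for (size_t i = 1; i < n; i++)
    if (toy_reg_cost(regs[i]) < toy_reg_cost(best))
      best = regs[i];
  return best;
}
```

Given a class containing hard reg 66 and pseudos 148 and 152, this toy picks 148: the first pseudo wins over the hard reg, and the later pseudo does not displace it because its cost is equal.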
2) If I set these 2 global status registers to NO_REGS, will it create
issues for the global status configuration of each RVV instruction?
No idea either. Usually these kinds of dependences are introduced
by targets at the point the VL setting is introduced to avoid
pessimizing optimizations earlier. Often, for cases like a VL
register, this is done only after register allocation and is indeed
necessary to keep the second scheduling pass from breaking things.
Yea. I'm wondering about when the right place to introduce these
dependencies might be. I'm still a few months out from worrying about
RVV, but it's not too far away.
jeff