Qing Zhao <qing.z...@oracle.com> writes: > Hi, Richard, > > As you suggested, I added a default implementation of the target hook > “zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch > > > /* The default hook for TARGET_ZERO_CALL_USED_REGS. */ > > void > default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
FWIW, I was suggesting to return the set of registers that are actually cleared too. Here you have the hook emit the asm statement, but IMO the way we generate the asm for a given set of registers should be entirely target-independent, and happen outside the hook. So the hook returning the set of cleared registers does two things: (1) It indicates which registers should be clobbered by the asm (which would be generated after calling the hook, but emitted before the sequence of instructions generated by the hook). (2) It indicates which registers should be treated as live on return. FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl. Then the wrapper around EPILOGUE_USES that we talked about would check two things: - EPILOGUE_USES itself - the crtl HARD_REG_SET The crtl set would start out empty and remain empty unless the new option is used. > if (zero_rtx[(int)mode] == NULL_RTX) > { > zero_rtx[(int)mode] = reg; > tmp = gen_rtx_SET (reg, const0_rtx); > emit_insn (tmp); > } > else > emit_move_insn (reg, zero_rtx[(int)mode]); Hmm, OK, so you're assuming that it's better to zero one register and reuse that register for later moves. I guess this is my RISC background/bias showing, but I think it might be simpler to assume that zeroing is as cheap as a register move. The danger with reusing earlier registers is that you might introduce a cross-bank move, and some targets can only do those via memory. Or perhaps we could use insn_cost to choose between them. But I think the first implementation can just zero each register individually, unless we already know of a specific case in which reusing registers is necessary. > I tested this default implementation on aarch64 with a small testing case, > -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well, > however, > -fzero-call-used-regs=all-arg or -fzero-call-used-regs=all have an internal > compiler error as following: > > t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631 > 15 | } > | ^ > 0xcff58b gen_highpart(machine_mode, rtx_def*) > ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631 > 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*) > ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390 > 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**) > ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394 > > As I studied today, I found the major issue for this bug is because the > following statement: > > machine_mode mode = reg_raw_mode[regno]; > > “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector > register on aarch64) , as a result, the zeroing insn for this register is: > > (insn 112 111 113 7 (set (reg:TI 32 v0) > (const_int 0 [0])) "t1.c":15:1 -1 > (nil)) > > > However, looks like that the above RTL have to be splitted into two sub > register moves on aarch64, and the splitting has some issue. > > So, I guess that on aarch64, zeroing vector registers might need other modes > than the one returned by “reg_raw_mode”. > > My questions are: > > 1. Is there another available utility routine that returns the proper MODE > for the hard registers that can be readily used to zero the hardr register? > 2. If not, should I add one more target hook for this purpose? i.e > > /* Return the proper machine mode that can be used to zero this hard register > specified by REGNO. */ > machine_mode zero-call-used-regs-mode (unsigned int REGNO) Thanks for testing aarch64. I think there are two issues here, one in the patch and one in the aarch64 backend: - the patch should use emit_move_insn rather than use gen_rtx_SET directly. - the aarch64 backend doesn't handle zeroing TImode vector registers, but should. E.g. for: void foo () { register __int128_t q0 asm ("q0"); q0 = 0; asm volatile ("" :: "w" (q0)); } we generate: mov x0, 0 mov x1, 0 fmov d0, x0 fmov v0.d[1], x1 which is, er, somewhat suboptimal. I'll try to fix the aarch64 bug for Monday next week. Thanks, Richard