> On Sep 25, 2020, at 7:53 AM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
> Qing Zhao <qing.z...@oracle.com> writes:
>> Hi, Richard,
>>
>> As you suggested, I added a default implementation of the target hook
>> “zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch
>>
>>
>> /* The default hook for TARGET_ZERO_CALL_USED_REGS. */
>>
>> void
>> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>
> FWIW, I was suggesting to return the set of registers that are actually
> cleared too. Here you have the hook emit the asm statement, but IMO the
> way we generate the asm for a given set of registers should be entirely
> target-independent, and happen outside the hook.
>
> So the hook returning the set of cleared registers does two things:
>
> (1) It indicates which registers should be clobbered by the asm
> (which would be generated after calling the hook, but emitted
> before the sequence of instructions generated by the hook).
For this purpose, this hook should return a Set of RTX that hold the cleared
registers, a HARD_REG_SET is not enough.
Since in the ASM_OPERANDS, we will need the RTX for the register (not the
REGNO).
Which data structure in GCC should be used here to hold this returned value as
Set of RTX ?
>
> (2) It indicates which registers should be treated as live on return.
>
> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
Instead of storing this info in crtl, in my current patch, I added the
following in “df-scan.c":
+static HARD_REG_SET zeroed_reg_set;
And routines that manipulate this HARD_REG_SET.
I think that this should serve the same purpose as storing it to crtl?
> Then the wrapper around EPILOGUE_USES that we talked about would
> check two things:
>
> - EPILOGUE_USES itself
> - the crtl HARD_REG_SET
>
> The crtl set would start out empty and remain empty unless the
> new option is used.
Yes, I did this for zeroed_reg_set in my current patch.
>
>> if (zero_rtx[(int)mode] == NULL_RTX)
>> {
>> zero_rtx[(int)mode] = reg;
>> tmp = gen_rtx_SET (reg, const0_rtx);
>> emit_insn (tmp);
>> }
>> else
>> emit_move_insn (reg, zero_rtx[(int)mode]);
>
> Hmm, OK, so you're assuming that it's better to zero one register
> and reuse that register for later moves. I guess this is my RISC
> background/bias showing, but I think it might be simpler to assume
> that zeroing is as cheap as a register move. The danger with reusing
> earlier registers is that you might introduce a cross-bank move,
> and some targets can only do those via memory.
Okay, I will move zeroes to registers.
>
> Or perhaps we could use insn_cost to choose between them. But I think
> the first implementation can just zero each register individually,
> unless we already know of a specific case in which reusing registers
> is necessary.
The current X86 implementation uses register move instead of directly move zero
to register, I guess it’s because the register move on X86 is cheaper.
>
>> I tested this default implementation on aarch64 with a small testing case,
>> -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well,
>> however,
>> -fzero-call-used-regs=all-arg or -fzero-call-used-regs=all have an internal
>> compiler error as following:
>>
>> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>> 15 | }
>> | ^
>> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>> ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
>> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>> ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
>> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>> ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>>
>> As I studied today, I found the major issue for this bug is because the
>> following statement:
>>
>> machine_mode mode = reg_raw_mode[regno];
>>
>> “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector
>> register on aarch64) , as a result, the zeroing insn for this register is:
>>
>> (insn 112 111 113 7 (set (reg:TI 32 v0)
>> (const_int 0 [0])) "t1.c":15:1 -1
>> (nil))
>>
>>
>> However, looks like that the above RTL have to be splitted into two sub
>> register moves on aarch64, and the splitting has some issue.
>>
>> So, I guess that on aarch64, zeroing vector registers might need other modes
>> than the one returned by “reg_raw_mode”.
>>
>> My questions are:
>>
>> 1. Is there another available utility routine that returns the proper MODE
>> for the hard registers that can be readily used to zero the hardr register?
>> 2. If not, should I add one more target hook for this purpose? i.e
>>
>> /* Return the proper machine mode that can be used to zero this hard
>> register specified by REGNO. */
>> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
>
> Thanks for testing aarch64. I think there are two issues here,
> one in the patch and one in the aarch64 backend:
>
> - the patch should use emit_move_insn rather than use gen_rtx_SET directly.
Will try this.
>
> - the aarch64 backend doesn't handle zeroing TImode vector registers,
> but should. E.g. for:
>
> void
> foo ()
> {
> register __int128_t q0 asm ("q0");
> q0 = 0;
> asm volatile ("" :: "w" (q0));
> }
>
> we generate:
>
> mov x0, 0
> mov x1, 0
> fmov d0, x0
> fmov v0.d[1], x1
>
> which is, er, somewhat suboptimal.
>
> I'll try to fix the aarch64 bug for Monday next week.
Thanks a lot.
Qing
>
> Thanks,
> Richard