Qing Zhao <qing.z...@oracle.com> writes:
> Hi, Richard,
>
> As you suggested, I added a default implementation of the target hook 
> “zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch
>
>
> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>
> void
> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)

FWIW, I was suggesting to return the set of registers that are actually
cleared too.  Here you have the hook emit the asm statement, but IMO the
way we generate the asm for a given set of registers should be entirely
target-independent, and happen outside the hook.

So the hook returning the set of cleared registers does two things:

(1) It indicates which registers should be clobbered by the asm
    (which would be generated after calling the hook, but emitted
    before the sequence of instructions generated by the hook).

(2) It indicates which registers should be treated as live on return.

FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
Then the wrapper around EPILOGUE_USES that we talked about would
check two things:

- EPILOGUE_USES itself
- the crtl HARD_REG_SET

The crtl set would start out empty and remain empty unless the
new option is used.

>         if (zero_rtx[(int)mode] == NULL_RTX)
>           {
>             zero_rtx[(int)mode] = reg;
>             tmp = gen_rtx_SET (reg, const0_rtx);
>             emit_insn (tmp);
>           }
>         else
>           emit_move_insn (reg, zero_rtx[(int)mode]);

Hmm, OK, so you're assuming that it's better to zero one register
and reuse that register for later moves.  I guess this is my RISC
background/bias showing, but I think it might be simpler to assume
that zeroing is as cheap as a register move.  The danger with reusing
earlier registers is that you might introduce a cross-bank move,
and some targets can only do those via memory.

Or perhaps we could use insn_cost to choose between them.  But I think
the first implementation can just zero each register individually,
unless we already know of a specific case in which reusing registers
is necessary.

> I tested this default implementation on aarch64 with a small testing case, 
> -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well, 
> however, 
> -fzero-call-used-regs=all-arg or -fzero-call-used-regs=all have an internal 
> compiler error as following:
>
> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>    15 | }
>       | ^
> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>       ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>       ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>       ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>
> As I studied today, I found the major issue for this bug is because the 
> following statement:
>
>         machine_mode mode = reg_raw_mode[regno];
>
> “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector 
> register on aarch64) , as a result, the zeroing insn for this register is:
>
> (insn 112 111 113 7 (set (reg:TI 32 v0)
>         (const_int 0 [0])) "t1.c":15:1 -1
>      (nil))
>
>
> However, looks like that the above RTL have to be splitted into two sub 
> register moves on aarch64, and the splitting has some issue. 
>
> So, I guess that on aarch64, zeroing vector registers might need other modes 
> than the one returned by “reg_raw_mode”.  
>
> My questions are:
>
> 1. Is there another available utility routine that returns the proper MODE 
> for the hard registers that can be readily used to zero the hardr register?
> 2. If not, should I add one more target hook for this purpose? i.e 
>
> /* Return the proper machine mode that can be used to zero this hard register 
> specified by REGNO.  */
> machine_mode zero-call-used-regs-mode (unsigned int REGNO)

Thanks for testing aarch64.  I think there are two issues here,
one in the patch and one in the aarch64 backend:

- the patch should use emit_move_insn rather than use gen_rtx_SET directly.

- the aarch64 backend doesn't handle zeroing TImode vector registers,
  but should.  E.g. for:

    void
    foo ()
    {
      register __int128_t q0 asm ("q0");
      q0 = 0;
      asm volatile ("" :: "w" (q0));
    }

  we generate:

        mov     x0, 0
        mov     x1, 0
        fmov    d0, x0
        fmov    v0.d[1], x1

  which is, er, somewhat suboptimal.

I'll try to fix the aarch64 bug for Monday next week.

Thanks,
Richard

Reply via email to