Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Richard Sandiford Thu, 17 Sep 2020 09:28:18 -0700

Qing Zhao <qing.z...@oracle.com> writes:
>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandif...@arm.com> 
>> wrote:
>> 
>> Qing Zhao <qing.z...@oracle.com> writes:
>>> Segher and Richard, 
>>> 
>>> Now there are two major concerns from the discussion so far:
>>> 
>>> 1. (From Richard):  Inserting zero insns should be done after 
>>> pass_thread_prologue_and_epilogue since later passes (for example, 
>>> pass_regrename) might introduce new used caller-saved registers. 
>>>     So, we should do this in the beginning of pass_late_compilation (some 
>>> targets wouldn’t cope with doing it later). 
>>> 
>>> 2. (From Segher): The inserted zero insns should stay together with the 
>>> return, no other insns should move in-between zero insns and return insns. 
>>> Otherwise, a valid gadget could be formed. 
>>> 
>>> I think that both of the above 2 concerns are important and should be 
>>> addressed for the correct implementation. 
>>> 
>>> In order to support 1,  we cannot implementing it in “targetm.gen_return()” 
>>> and “targetm.gen_simple_return()”  since “targetm.gen_return()” and 
>>> “targetm.gen_simple_return()” are called during 
>>> pass_thread_prologue_and_epilogue, at that time, the use information still 
>>> not correct. 
>>> 
>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed 
>>> registgers is NOT enough to prevent all the zero insns from moving around.
>> 
>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>> being deleted as dead.
>> 
>>> More restrictions need to be added to these new zero insns.  (I think that 
>>> marking these new zeroed registers as “unspec_volatile” at RTL level is 
>>> necessary to prevent them from deleting from moving around). 
>>> 
>>> 
>>> So, based on the above, I propose the following approach that will resolve 
>>> the above 2 concerns:
>>> 
>>> 1. Add 2 new target hooks:
>>>   A. targetm.pro_epilogue_use (reg)
>>>   This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>   prevent deleting register setting instructions in prologue and epilogue.
>>> 
>>>   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>   This hook will gen a sequence of zeroing insns that zero the registers 
>>> that specified in NEED_ZEROED_HARDREGS.
>>> 
>>>    A default handler of “gen_zero_call_used_regs” could be defined in 
>>> middle end, which use mov insns to zero registers, and then use 
>>> “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 
>> 
>> This sounds like you're going back to using:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>        (const_int 0 [0])) "t10.c":11:1 -1
>>     (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>            (reg:SI 1 dx)
>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>     (nil))
>> 
>> This also doesn't prevent the zeroing from being moved around.  Like the
>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>> as dead.  I still think that the EPILOGUE_USES approach is the better
>> way of doing that.
>
> The following is what I see from i386.md: (I didn’t look at how 
> “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>
> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> ;; all of memory.  This blocks insns from being moved across this point.


Heh, it looks like that comment dates back to 1994. :-)

The comment is no longer correct though.  I wasn't around at the time,
but I assume the comment was only locally true even then.

If what the comment said was true, then something like:

(define_insn "cld"
  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
  ""
  "cld"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])

would invalidate the entire register file and so would require all values
to be spilt to the stack around the CLD.

> I am not very familiar with how the unspec_volatile actually works in gcc’s 
> data flow analysis, my understanding  from the above is, the RTL insns marked 
> with UNSPEC_volatile would be served as a barrier that no other insns can 
> move across this point. At the same time, since the marked RTL insns is 
> considered to use and clobber all hard registers and memory, it cannot be 
> deleted either. 

UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
to other UNSPEC_VOLATILEs.  But the problem with:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

is that the volatile occurs *after* the zeroing instruction.  So at best
it can stop insn 18 moving further down, to be closer to the return
instruction.  There's nothing to stop insn 18 moving further up,
away from the return instruction, which AIUI is what you're trying
to prevent.  E.g. suppose we had:

(insn 17 … pop a register other than dx from the stack …)
(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

There is nothing to stop an rtl pass reordering that to:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 17 … pop a register other than dx from the stack …)
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

There's also no dataflow reason why this couldn't be reordered to:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))
(insn 17 … pop a register other than dx from the stack …)

So…

> So, I thought that “UNSPEC_volatile” should be stronger than “EPILOGUE_USES”. 
> And it can serve the purpose of preventing zeroing insns from deleting and 
> moving. 

…both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
the instruction would IMO be counterproductive: it would stop the
zeroing instructions that we want to be close to the return instruction
from moving “closer” to the return instruction, but it wouldn't do the
same for unrelated instructions.  So if anything, the unspec_volatile
could increase the chances that something unrelated to the register
zeroing is moved later than the register zeroing.  E.g. this could
happen when filling delayed branch slots.

>> I don't think there's a foolproof way of preventing an unknown target
>> machine_reorg pass from moving the instructions around.  But since we
>> don't have unknown machine_reorgs (at least not in-tree), I think
>> instead we should be prepared to patch machine_reorgs where necessary
>> to ensure that they do the right thing.
>> 
>> If you want to increase the chances that machine_reorgs don't need to be
>> patched, you could either:
>> 
>> (a) to make the zeroing instructions themselves volatile or
>> (b) to insert a volatile reference to the register before (rather than
>>    after) the zeroing instruction
>> 
>> IMO (b) is the way to go, because it avoids the need to define special
>> volatile move patterns for each type of register.  (b) would be needed
>> on top of (rather than instead of) the EPILOGUE_USES thing.
>> 
> Okay, will take approach b. 
>
> But I still don’t quite understand why we still need “EPILOUGE_USES”? What’s 
> the additional benefit from EPILOGUE_USES?

The asm for (b) goes before the instruction, so we'd have:

(insn 17 … new asm …)
(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 … return …)

But something has to tell the df machinery that the value of edx
matters on return from the function, otherwise insn 18 could be
deleted as dead.  Adding edx to EPILOGUE_USES provides that information
and stops the instruction from being deleted.

>> I don't think we need a new target-specific unspec_volatile code to do (b).
>> We can just use an automatically-generated volatile asm to clobber the
>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>> scheduling barriers.
> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>
> static void
> expand_asm_memory_blockage (void)
> {
>   rtx asm_op, clob;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>                                  rtvec_alloc (0), rtvec_alloc (0),
>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   clob = gen_rtx_SCRATCH (VOIDmode);
>   clob = gen_rtx_MEM (BLKmode, clob);
>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
> }
>
>
> As the following? 
>
> /* Generate asm volatile("" : : : “regno") for REGNO.   */
>
> static void
> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
> {
>   rtx asm_op, clob;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>                                  rtvec_alloc (0), rtvec_alloc (0),
>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   clob = gen_rtx_REG (mode, regno);
>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
> }
>
> Is the above correct? 

Yeah, looks good.  You should be able to clobber all the registers you
want to clear in one asm.  For extra safety, it might be worth including
a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
moved across the asm.

Thanks,
Richard

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Reply via email to