Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao via Gcc-patches Thu, 17 Sep 2020 12:08:01 -0700

> On Sep 17, 2020, at 11:27 AM, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com <mailto:qing.z...@oracle.com>> writes:
>>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandif...@arm.com> 
>>> wrote:
>>> 
>>> Qing Zhao <qing.z...@oracle.com> writes:
>>>> Segher and Richard, 
>>>> 
>>>> Now there are two major concerns from the discussion so far:
>>>> 
>>>> 1. (From Richard):  Inserting zero insns should be done after 
>>>> pass_thread_prologue_and_epilogue since later passes (for example, 
>>>> pass_regrename) might introduce new used caller-saved registers. 
>>>>    So, we should do this in the beginning of pass_late_compilation (some 
>>>> targets wouldn’t cope with doing it later). 
>>>> 
>>>> 2. (From Segher): The inserted zero insns should stay together with the 
>>>> return, no other insns should move in-between zero insns and return insns. 
>>>> Otherwise, a valid gadget could be formed. 
>>>> 
>>>> I think that both of the above 2 concerns are important and should be 
>>>> addressed for the correct implementation. 
>>>> 
>>>> In order to support 1,  we cannot implementing it in 
>>>> “targetm.gen_return()” and “targetm.gen_simple_return()”  since 
>>>> “targetm.gen_return()” and “targetm.gen_simple_return()” are called during 
>>>> pass_thread_prologue_and_epilogue, at that time, the use information still 
>>>> not correct. 
>>>> 
>>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed 
>>>> registgers is NOT enough to prevent all the zero insns from moving around.
>>> 
>>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>>> being deleted as dead.
>>> 
>>>> More restrictions need to be added to these new zero insns.  (I think that 
>>>> marking these new zeroed registers as “unspec_volatile” at RTL level is 
>>>> necessary to prevent them from deleting from moving around). 
>>>> 
>>>> 
>>>> So, based on the above, I propose the following approach that will resolve 
>>>> the above 2 concerns:
>>>> 
>>>> 1. Add 2 new target hooks:
>>>>  A. targetm.pro_epilogue_use (reg)
>>>>  This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>>  prevent deleting register setting instructions in prologue and epilogue.
>>>> 
>>>>  B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>>  This hook will gen a sequence of zeroing insns that zero the registers 
>>>> that specified in NEED_ZEROED_HARDREGS.
>>>> 
>>>>   A default handler of “gen_zero_call_used_regs” could be defined in 
>>>> middle end, which use mov insns to zero registers, and then use 
>>>> “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 
>>> 
>>> This sounds like you're going back to using:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>    (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>           (reg:SI 1 dx)
>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>    (nil))
>>> 
>>> This also doesn't prevent the zeroing from being moved around.  Like the
>>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>>> as dead.  I still think that the EPILOGUE_USES approach is the better
>>> way of doing that.
>> 
>> The following is what I see from i386.md: (I didn’t look at how 
>> “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>> 
>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>> ;; all of memory.  This blocks insns from being moved across this point.
> 
> Heh, it looks like that comment dates back to 1994. :-)
> 
> The comment is no longer correct though.  I wasn't around at the time,
> but I assume the comment was only locally true even then.
> 
> If what the comment said was true, then something like:
> 
> (define_insn "cld"
>  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>  ""
>  "cld"
>  [(set_attr "length" "1")
>   (set_attr "length_immediate" "0")
>   (set_attr "modrm" "0")])
> 
> would invalidate the entire register file and so would require all values
> to be spilt to the stack around the CLD.

Okay, thanks for the info. 
then, what’s the current definition of UNSPEC_VOLATILE? 


> 
>> I am not very familiar with how the unspec_volatile actually works in gcc’s 
>> data flow analysis, my understanding  from the above is, the RTL insns 
>> marked with UNSPEC_volatile would be served as a barrier that no other insns 
>> can move across this point. At the same time, since the marked RTL insns is 
>> considered to use and clobber all hard registers and memory, it cannot be 
>> deleted either. 
> 
> UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
> to other UNSPEC_VOLATILEs.  But the problem with:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> 
> is that the volatile occurs *after* the zeroing instruction.  So at best
> it can stop insn 18 moving further down, to be closer to the return
> instruction.  There's nothing to stop insn 18 moving further up,
> away from the return instruction, which AIUI is what you're trying
> to prevent.  E.g. suppose we had:
> 
> (insn 17 … pop a register other than dx from the stack …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> 
> There is nothing to stop an rtl pass reordering that to:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 17 … pop a register other than dx from the stack …)
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))

Yes, agreed. And then the volatile marking insn should be put BEFORE the 
zeroing insn. 

> 
> There's also no dataflow reason why this couldn't be reordered to:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> (insn 17 … pop a register other than dx from the stack …)
> 

This is the place I don’t quite agree at this moment, maybe I still not quite 
understand the “UNSPEC_volatile”.

I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example,  
for the routine “can_move_insns_across” in gcc/df-problem.c:

      if (NONDEBUG_INSN_P (insn))
        {
          if (volatile_insn_p (PATTERN (insn)))
            return false;

From my understanding of reading the code, when an insn is UNSPEC_VOLATILE, 
another insn will NOT be able to move across it. 

Then for the above example, the insn 17 should Not be moved across insn 19 
either.

Let me know if I miss anything important. 


> So…
> 
>> So, I thought that “UNSPEC_volatile” should be stronger than 
>> “EPILOGUE_USES”. And it can serve the purpose of preventing zeroing insns 
>> from deleting and moving. 
> 
> …both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
> stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
> the instruction would IMO be counterproductive: it would stop the
> zeroing instructions that we want to be close to the return instruction
> from moving “closer” to the return instruction, but it wouldn't do the
> same for unrelated instructions.  So if anything, the unspec_volatile
> could increase the chances that something unrelated to the register
> zeroing is moved later than the register zeroing.  E.g. this could
> happen when filling delayed branch slots.
> 
>>> I don't think there's a foolproof way of preventing an unknown target
>>> machine_reorg pass from moving the instructions around.  But since we
>>> don't have unknown machine_reorgs (at least not in-tree), I think
>>> instead we should be prepared to patch machine_reorgs where necessary
>>> to ensure that they do the right thing.
>>> 
>>> If you want to increase the chances that machine_reorgs don't need to be
>>> patched, you could either:
>>> 
>>> (a) to make the zeroing instructions themselves volatile or
>>> (b) to insert a volatile reference to the register before (rather than
>>>   after) the zeroing instruction
>>> 
>>> IMO (b) is the way to go, because it avoids the need to define special
>>> volatile move patterns for each type of register.  (b) would be needed
>>> on top of (rather than instead of) the EPILOGUE_USES thing.
>>> 
>> Okay, will take approach b. 
>> 
>> But I still don’t quite understand why we still need “EPILOUGE_USES”? What’s 
>> the additional benefit from EPILOGUE_USES?
> 
> The asm for (b) goes before the instruction, so we'd have:
> 
> (insn 17 … new asm …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 … return …)
> 
> But something has to tell the df machinery that the value of edx
> matters on return from the function, otherwise insn 18 could be
> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
> and stops the instruction from being deleted.


In the above, insn 17 will be something like:

(insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
    ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
(nil))

So, the reg edx is marked as “UNSPEC_volatile” already, that should mean the 
value of edx matters on return from the function already, my understanding is 
that df should automatically pick up the “UNSPEC_VOLATILE” insn and it’s 
operands.   “UNSPEC_VOLATILE” insn should serve the same purpose as putting 
“edx” to EPILOGUE_USES. 

Do I miss anything here?

> 
>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>> We can just use an automatically-generated volatile asm to clobber the
>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>> scheduling barriers.
>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>> 
>> static void
>> expand_asm_memory_blockage (void)
>> {
>>  rtx asm_op, clob;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  clob = gen_rtx_SCRATCH (VOIDmode);
>>  clob = gen_rtx_MEM (BLKmode, clob);
>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>> 
>> 
>> As the following? 
>> 
>> /* Generate asm volatile("" : : : “regno") for REGNO.   */
>> 
>> static void
>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>> {
>>  rtx asm_op, clob;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  clob = gen_rtx_REG (mode, regno);
>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>> 
>> Is the above correct? 
> 
> Yeah, looks good.  You should be able to clobber all the registers you
> want to clear in one asm.

How to do this?

thanks.

Qing
>  For extra safety, it might be worth including
> a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
> moved across the asm.
> 
> Thanks,
> Richard
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Reply via email to