> On Sep 17, 2020, at 11:27 AM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
> Qing Zhao <qing.z...@oracle.com <mailto:qing.z...@oracle.com>> writes:
>>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandif...@arm.com>
>>> wrote:
>>>
>>> Qing Zhao <qing.z...@oracle.com> writes:
>>>> Segher and Richard,
>>>>
>>>> Now there are two major concerns from the discussion so far:
>>>>
>>>> 1. (From Richard): Inserting zero insns should be done after
>>>> pass_thread_prologue_and_epilogue since later passes (for example,
>>>> pass_regrename) might introduce new used caller-saved registers.
>>>> So, we should do this in the beginning of pass_late_compilation (some
>>>> targets wouldn’t cope with doing it later).
>>>>
>>>> 2. (From Segher): The inserted zero insns should stay together with the
>>>> return, no other insns should move in-between zero insns and return insns.
>>>> Otherwise, a valid gadget could be formed.
>>>>
>>>> I think that both of the above 2 concerns are important and should be
>>>> addressed for the correct implementation.
>>>>
>>>> In order to support 1, we cannot implementing it in
>>>> “targetm.gen_return()” and “targetm.gen_simple_return()” since
>>>> “targetm.gen_return()” and “targetm.gen_simple_return()” are called during
>>>> pass_thread_prologue_and_epilogue, at that time, the use information still
>>>> not correct.
>>>>
>>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed
>>>> registgers is NOT enough to prevent all the zero insns from moving around.
>>>
>>> Right. The purpose of EPILOGUE_USES was instead to stop the moves from
>>> being deleted as dead.
>>>
>>>> More restrictions need to be added to these new zero insns. (I think that
>>>> marking these new zeroed registers as “unspec_volatile” at RTL level is
>>>> necessary to prevent them from deleting from moving around).
>>>>
>>>>
>>>> So, based on the above, I propose the following approach that will resolve
>>>> the above 2 concerns:
>>>>
>>>> 1. Add 2 new target hooks:
>>>> A. targetm.pro_epilogue_use (reg)
>>>> This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>> prevent deleting register setting instructions in prologue and epilogue.
>>>>
>>>> B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>> This hook will gen a sequence of zeroing insns that zero the registers
>>>> that specified in NEED_ZEROED_HARDREGS.
>>>>
>>>> A default handler of “gen_zero_call_used_regs” could be defined in
>>>> middle end, which use mov insns to zero registers, and then use
>>>> “targetm.pro_epilogue_use(reg)” to mark each zeroed registers.
>>>
>>> This sounds like you're going back to using:
>>>
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>> (const_int 0 [0])) "t10.c":11:1 -1
>>> (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>> (reg:SI 1 dx)
>>> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>> (nil))
>>>
>>> This also doesn't prevent the zeroing from being moved around. Like the
>>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>>> as dead. I still think that the EPILOGUE_USES approach is the better
>>> way of doing that.
>>
>> The following is what I see from i386.md: (I didn’t look at how
>> “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>
>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>> ;; all of memory. This blocks insns from being moved across this point.
>
> Heh, it looks like that comment dates back to 1994. :-)
>
> The comment is no longer correct though. I wasn't around at the time,
> but I assume the comment was only locally true even then.
>
> If what the comment said was true, then something like:
>
> (define_insn "cld"
> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
> ""
> "cld"
> [(set_attr "length" "1")
> (set_attr "length_immediate" "0")
> (set_attr "modrm" "0")])
>
> would invalidate the entire register file and so would require all values
> to be spilt to the stack around the CLD.
Okay, thanks for the info.
then, what’s the current definition of UNSPEC_VOLATILE?
>
>> I am not very familiar with how the unspec_volatile actually works in gcc’s
>> data flow analysis, my understanding from the above is, the RTL insns
>> marked with UNSPEC_volatile would be served as a barrier that no other insns
>> can move across this point. At the same time, since the marked RTL insns is
>> considered to use and clobber all hard registers and memory, it cannot be
>> deleted either.
>
> UNSPEC_VOLATILEs can't be deleted. And they can't be reordered relative
> to other UNSPEC_VOLATILEs. But the problem with:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
> (nil))
> (insn 19 18 20 2 (unspec_volatile [
> (reg:SI 1 dx)
> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
> (nil))
>
> is that the volatile occurs *after* the zeroing instruction. So at best
> it can stop insn 18 moving further down, to be closer to the return
> instruction. There's nothing to stop insn 18 moving further up,
> away from the return instruction, which AIUI is what you're trying
> to prevent. E.g. suppose we had:
>
> (insn 17 … pop a register other than dx from the stack …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
> (nil))
> (insn 19 18 20 2 (unspec_volatile [
> (reg:SI 1 dx)
> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
> (nil))
>
> There is nothing to stop an rtl pass reordering that to:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
> (nil))
> (insn 17 … pop a register other than dx from the stack …)
> (insn 19 18 20 2 (unspec_volatile [
> (reg:SI 1 dx)
> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
> (nil))
Yes, agreed. And then the volatile marking insn should be put BEFORE the
zeroing insn.
>
> There's also no dataflow reason why this couldn't be reordered to:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
> (nil))
> (insn 19 18 20 2 (unspec_volatile [
> (reg:SI 1 dx)
> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
> (nil))
> (insn 17 … pop a register other than dx from the stack …)
>
This is the place I don’t quite agree at this moment, maybe I still not quite
understand the “UNSPEC_volatile”.
I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example,
for the routine “can_move_insns_across” in gcc/df-problem.c:
if (NONDEBUG_INSN_P (insn))
{
if (volatile_insn_p (PATTERN (insn)))
return false;
From my understanding of reading the code, when an insn is UNSPEC_VOLATILE,
another insn will NOT be able to move across it.
Then for the above example, the insn 17 should Not be moved across insn 19
either.
Let me know if I miss anything important.
> So…
>
>> So, I thought that “UNSPEC_volatile” should be stronger than
>> “EPILOGUE_USES”. And it can serve the purpose of preventing zeroing insns
>> from deleting and moving.
>
> …both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
> stopping insn 18 from being deleted. But an UNSPEC_VOLATILE after
> the instruction would IMO be counterproductive: it would stop the
> zeroing instructions that we want to be close to the return instruction
> from moving “closer” to the return instruction, but it wouldn't do the
> same for unrelated instructions. So if anything, the unspec_volatile
> could increase the chances that something unrelated to the register
> zeroing is moved later than the register zeroing. E.g. this could
> happen when filling delayed branch slots.
>
>>> I don't think there's a foolproof way of preventing an unknown target
>>> machine_reorg pass from moving the instructions around. But since we
>>> don't have unknown machine_reorgs (at least not in-tree), I think
>>> instead we should be prepared to patch machine_reorgs where necessary
>>> to ensure that they do the right thing.
>>>
>>> If you want to increase the chances that machine_reorgs don't need to be
>>> patched, you could either:
>>>
>>> (a) to make the zeroing instructions themselves volatile or
>>> (b) to insert a volatile reference to the register before (rather than
>>> after) the zeroing instruction
>>>
>>> IMO (b) is the way to go, because it avoids the need to define special
>>> volatile move patterns for each type of register. (b) would be needed
>>> on top of (rather than instead of) the EPILOGUE_USES thing.
>>>
>> Okay, will take approach b.
>>
>> But I still don’t quite understand why we still need “EPILOUGE_USES”? What’s
>> the additional benefit from EPILOGUE_USES?
>
> The asm for (b) goes before the instruction, so we'd have:
>
> (insn 17 … new asm …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
> (nil))
> (insn 19 … return …)
>
> But something has to tell the df machinery that the value of edx
> matters on return from the function, otherwise insn 18 could be
> deleted as dead. Adding edx to EPILOGUE_USES provides that information
> and stops the instruction from being deleted.
In the above, insn 17 will be something like:
(insn 17 ...(unspec_volatile [ (reg:SI 1 dx)
] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
(nil))
So, the reg edx is marked as “UNSPEC_volatile” already, that should mean the
value of edx matters on return from the function already, my understanding is
that df should automatically pick up the “UNSPEC_VOLATILE” insn and it’s
operands. “UNSPEC_VOLATILE” insn should serve the same purpose as putting
“edx” to EPILOGUE_USES.
Do I miss anything here?
>
>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>> We can just use an automatically-generated volatile asm to clobber the
>>> registers first. See e.g. how expand_asm_memory_blockage handles memory
>>> scheduling barriers.
>> /* Generate asm volatile("" : : : "memory") as the memory blockage. */
>>
>> static void
>> expand_asm_memory_blockage (void)
>> {
>> rtx asm_op, clob;
>>
>> asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>> rtvec_alloc (0), rtvec_alloc (0),
>> rtvec_alloc (0), UNKNOWN_LOCATION);
>> MEM_VOLATILE_P (asm_op) = 1;
>>
>> clob = gen_rtx_SCRATCH (VOIDmode);
>> clob = gen_rtx_MEM (BLKmode, clob);
>> clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>
>> emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>>
>>
>> As the following?
>>
>> /* Generate asm volatile("" : : : “regno") for REGNO. */
>>
>> static void
>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>> {
>> rtx asm_op, clob;
>>
>> asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>> rtvec_alloc (0), rtvec_alloc (0),
>> rtvec_alloc (0), UNKNOWN_LOCATION);
>> MEM_VOLATILE_P (asm_op) = 1;
>>
>> clob = gen_rtx_REG (mode, regno);
>> clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>
>> emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>>
>> Is the above correct?
>
> Yeah, looks good. You should be able to clobber all the registers you
> want to clear in one asm.
How to do this?
thanks.
Qing
> For extra safety, it might be worth including
> a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
> moved across the asm.
>
> Thanks,
> Richard