> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
> Qing Zhao <qing.z...@oracle.com> writes:
>> Segher and Richard,
>>
>> Now there are two major concerns from the discussion so far:
>>
>> 1. (From Richard): Inserting zero insns should be done after
>> pass_thread_prologue_and_epilogue since later passes (for example,
>> pass_regrename) might introduce new used caller-saved registers.
>> So, we should do this in the beginning of pass_late_compilation (some
>> targets wouldn’t cope with doing it later).
>>
>> 2. (From Segher): The inserted zero insns should stay together with the
>> return, no other insns should move in-between zero insns and return insns.
>> Otherwise, a valid gadget could be formed.
>>
>> I think that both of the above 2 concerns are important and should be
>> addressed for the correct implementation.
>>
>> In order to support 1, we cannot implementing it in “targetm.gen_return()”
>> and “targetm.gen_simple_return()” since “targetm.gen_return()” and
>> “targetm.gen_simple_return()” are called during
>> pass_thread_prologue_and_epilogue, at that time, the use information still
>> not correct.
>>
>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed
>> registgers is NOT enough to prevent all the zero insns from moving around.
>
> Right. The purpose of EPILOGUE_USES was instead to stop the moves from
> being deleted as dead.
>
>> More restrictions need to be added to these new zero insns. (I think that
>> marking these new zeroed registers as “unspec_volatile” at RTL level is
>> necessary to prevent them from deleting from moving around).
>>
>>
>> So, based on the above, I propose the following approach that will resolve
>> the above 2 concerns:
>>
>> 1. Add 2 new target hooks:
>> A. targetm.pro_epilogue_use (reg)
>> This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>> prevent deleting register setting instructions in prologue and epilogue.
>>
>> B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>> This hook will gen a sequence of zeroing insns that zero the registers
>> that specified in NEED_ZEROED_HARDREGS.
>>
>> A default handler of “gen_zero_call_used_regs” could be defined in middle
>> end, which use mov insns to zero registers, and then use
>> “targetm.pro_epilogue_use(reg)” to mark each zeroed registers.
>
> This sounds like you're going back to using:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
> (nil))
> (insn 19 18 20 2 (unspec_volatile [
> (reg:SI 1 dx)
> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
> (nil))
>
> This also doesn't prevent the zeroing from being moved around. Like the
> EPILOGUE_USES approach, it only prevents the clearing from being removed
> as dead. I still think that the EPILOGUE_USES approach is the better
> way of doing that.
The following is what I see from i386.md: (I didn’t look at how
“UNSPEC_volatile” is used in data flow analysis in GCC yet)
;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory. This blocks insns from being moved across this point.
I am not very familiar with how the unspec_volatile actually works in gcc’s
data flow analysis, my understanding from the above is, the RTL insns marked
with UNSPEC_volatile would be served as a barrier that no other insns can move
across this point. At the same time, since the marked RTL insns is considered
to use and clobber all hard registers and memory, it cannot be deleted either.
So, I thought that “UNSPEC_volatile” should be stronger than “EPILOGUE_USES”.
And it can serve the purpose of preventing zeroing insns from deleting and
moving.
>
> In other words: the use insns themselves are volatile and so can't be
> moved relative to each other and to other volatile insns. But the uses
> are fake instructions that don't do anything. The preceding zeroing
> instructions are just normal instructions that can be moved around
> freely before their respective uses.
But since the UNSPEC_volatile insns is considered as a barrier, no other insns
can move across them, then the zero insns cannot be moved around too, right?
>
> I don't think there's a foolproof way of preventing an unknown target
> machine_reorg pass from moving the instructions around. But since we
> don't have unknown machine_reorgs (at least not in-tree), I think
> instead we should be prepared to patch machine_reorgs where necessary
> to ensure that they do the right thing.
>
> If you want to increase the chances that machine_reorgs don't need to be
> patched, you could either:
>
> (a) to make the zeroing instructions themselves volatile or
> (b) to insert a volatile reference to the register before (rather than
> after) the zeroing instruction
>
> IMO (b) is the way to go, because it avoids the need to define special
> volatile move patterns for each type of register. (b) would be needed
> on top of (rather than instead of) the EPILOGUE_USES thing.
>
Okay, will take approach b.
But I still don’t quite understand why we still need “EPILOUGE_USES”? What’s
the additional benefit from EPILOGUE_USES?
> I don't think we need a new target-specific unspec_volatile code to do (b).
> We can just use an automatically-generated volatile asm to clobber the
> registers first. See e.g. how expand_asm_memory_blockage handles memory
> scheduling barriers.
/* Generate asm volatile("" : : : "memory") as the memory blockage. */
static void
expand_asm_memory_blockage (void)
{
rtx asm_op, clob;
asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
rtvec_alloc (0), rtvec_alloc (0),
rtvec_alloc (0), UNKNOWN_LOCATION);
MEM_VOLATILE_P (asm_op) = 1;
clob = gen_rtx_SCRATCH (VOIDmode);
clob = gen_rtx_MEM (BLKmode, clob);
clob = gen_rtx_CLOBBER (VOIDmode, clob);
emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
}
As the following?
/* Generate asm volatile("" : : : “regno") for REGNO. */
static void
expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
{
rtx asm_op, clob;
asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
rtvec_alloc (0), rtvec_alloc (0),
rtvec_alloc (0), UNKNOWN_LOCATION);
MEM_VOLATILE_P (asm_op) = 1;
clob = gen_rtx_REG (mode, regno);
clob = gen_rtx_CLOBBER (VOIDmode, clob);
emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
}
Is the above correct?
thanks.
Qing
>
> Thanks,
> Richard