Dan Li <ashim...@linux.alibaba.com> writes:
>>> +
>>>      if (flag_stack_usage_info)
>>>        current_function_static_stack_size = constant_lower_bound 
>>> (frame_size);
>>>    
>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>          RTX_FRAME_RELATED_P (insn) = 1;
>>>        }
>>>    
>>> +  /* Pop return address from shadow call stack.  */
>>> +  if (aarch64_shadow_call_stack_enabled ())
>>> +   emit_insn (gen_scs_pop ());
>>> +
>> 
>> This looks correct, but following on from the above, I guess we continue
>> to restore x30 from the frame in the traditional way first, and then
>> overwrite it with the SCS value.  I think the output code would be
>> slightly neater if we suppressed the first restore of x30.
>> 
> Yes, the current epilogue is like:
>      .......
>      ldp     x29, x30, [sp], #16
> +   ldr     x30, [x18, #-8]!
>
> I think may be we can keep the two x30 pops here, for two reasons:
> 1) The implementation of scs in clang is to pop x30 twice, it might be
> better to be consistent with clang here[1].

This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)

> 2) If we keep the first restore of x30, it may provide some flexibility
> for other scenarios. For example, we can directly patch the scs_push/pop
> insns in the binary to turn it into a binary without scs;

Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.

I'd like a definitive ruling on this from the kernel folks before
the patch goes in.

If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

  (set ...mem...
       (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.

>>> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
>>> +(define_insn "scs_push"
>>> +  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
>>> +   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
>>> +  ""
>>> +  "str\\tx30, [x18], #8"
>>> +  [(set_attr "type" "store_8")]
>>> +)
>>> +
>>> +;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
>>> +(define_insn "scs_pop"
>>> +  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
>>> +   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
>>> +  ""
>>> +  "ldr\\tx30, [x18, #-8]!"
>>> +  [(set_attr "type" "load_8")]
>>> +)
>>> +
>> 
>> The two SETs here work in parallel, with the define_insn as a whole
>> following a "read input operands, act, write output operands" sequence.
>> The (mem: …) address therefore sees the old value of r18 rather than the
>> new one.  Also, RTL rules say that subtractions need to be written as
>> additions of the negative.
>> 
> Oh, sorry, I got it wrong here.
>
>> I think the pattern would therefore be something like:
>> 
>>    [(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
>>                                           (const_int -8))))]
>>     (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]
>> 
>> However, since these are normal moves, I think we should be able to use
>> the standard move patterns.  E.g. the push could be:
>> 
>> (define_expand "scs_push"
>>    [(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
>>          (reg:DI R30_REGNUM))])
>> 
>> and similarly for the pop.
>> 
> Thanks, this template looks better.
>
> Since the push/pop of SCS and normal stack in clang are different
> (for example, scs push is post_inc, while normal stack is pre_dec),
> in order to maintain consistency, I think it can be changed to the
> following:
>
> (define_expand "scs_push"
>    [(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
>          (reg:DI R30_REGNUM))])
>
> (define_expand "scs_pop"
>    [(set (reg:DI R30_REGNUM)
>         (mem:DI (pre_dec:DI (reg:DI R18_REGNUM))))])

Yeah, I copied the wrong name above, sorry.

Thanks,
Richard

Reply via email to