Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Richard Sandiford Wed, 23 Sep 2020 08:22:15 -0700

Qing Zhao <qing.z...@oracle.com> writes:
>> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandif...@arm.com> 
>> wrote:
>> 
>> Qing Zhao <qing.z...@oracle.com> writes:
>>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandif...@arm.com> 
>>>> wrote:
>>>> 
>>>> Qing Zhao <qing.z...@oracle.com <mailto:qing.z...@oracle.com>> writes:
>>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford 
>>>>>> <richard.sandif...@arm.com> wrote:
>>>>>>>>> 
>>>>>>>>> The following is what I see from i386.md: (I didn’t look at how 
>>>>>>>>> “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>>> 
>>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>>> 
>>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>>> 
>>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>>> but I assume the comment was only locally true even then.
>>>>>>>> 
>>>>>>>> If what the comment said was true, then something like:
>>>>>>>> 
>>>>>>>> (define_insn "cld"
>>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>>> ""
>>>>>>>> "cld"
>>>>>>>> [(set_attr "length" "1")
>>>>>>>> (set_attr "length_immediate" "0")
>>>>>>>> (set_attr "modrm" "0")])
>>>>>>>> 
>>>>>>>> would invalidate the entire register file and so would require all
>>>>>>>> values to be spilt to the stack around the CLD.
>>>>>>> 
>>>>>>> Okay, thanks for the info.
>>>>>>> Then what’s the current definition of UNSPEC_VOLATILE?
>>>>>> 
>>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>>> 
>>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>>> that may trap; @code{unspec} is used for other operations.
>>>>>> 
>>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>>> to be expressions that are volatile.
>>>>>> 
>>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>>> for details.  In particular:
>>>>>> 
>>>>>> Note that the compiler can move even @code{volatile asm} instructions
>>>>>> relative to other code, including across jump instructions. For example,
>>>>>> on many targets there is a system register that controls the rounding
>>>>>> mode of floating-point operations. Setting it with a @code{volatile asm}
>>>>>> statement, as in the following PowerPC example, does not work reliably.
>>>>>> 
>>>>>> @example
>>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>>> sum = x + y;
>>>>>> @end example
>>>>>> 
>>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>>> example:
>>>>>> 
>>>>>> @example
>>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>>> sum = x + y;
>>>>>> @end example
>>>>>> 
>>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>>> 
>>>>>> To take an x86 example:
>>>>>> 
>>>>>> void
>>>>>> f (char *x)
>>>>>> {
>>>>>>  asm volatile ("");
>>>>>>  x[0] = 0;
>>>>>>  asm volatile ("");
>>>>>>  x[1] = 0;
>>>>>>  asm volatile ("");
>>>>>> }
>>>>> 
>>>>> If we change the above to the following (though the asm syntax might
>>>>> not be correct):
>>>>> 
>>>>> void
>>>>> f (char *x)
>>>>> {
>>>>> asm volatile ("x[0]");
>>>>> x[0] = 0;
>>>>> asm volatile ("x[1]");
>>>>> x[1] = 0;
>>>>> asm volatile ("");
>>>>> }
>>>>> 
>>>>> Will the moving and merging be blocked?
>>>> 
>>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>>> down across the x[1] asm.  Using:
>>>> 
>>>> asm volatile ("" ::: "memory");
>>>> 
>>>> would prevent moves in both directions, which was what I meant in my
>>>> later comment about memory clobbers.
>>>> 
>>>> In each case, the same would be true for unspec_volatile.
>>> 
>>> So, is the following good enough:
>>> 
>>> asm volatile (reg1, reg2, … regN, memory)
>>> mov reg1, 0
>>> mov reg2, 0
>>> ...
>>> mov regN,0
>>> asm volatile (reg1, reg2,… regN, memory)
>>> return
>>> 
>>> 
>>> I.e., just add one “asm volatile” insn whose operands include all registers
>>> and memory BEFORE and AFTER the whole zeroing sequence.
>> 
>> It isn't clear from your syntax whether the asm volatile arguments
>> are uses or clobbers.
>
> How can the syntax of asm volatile distinguish “Uses” and “Clobbers”?


Well, I wasn't trying to discuss correct syntax, I just wasn't sure what
you meant.

As mentioned in the quote below, I was expecting the asm volatile
before the zeroing to include clobbers generated as discussed in
the earlier message:

  rtx asm_op = gen_rtx_ASM_OPERANDS (…);
  MEM_VOLATILE_P (asm_op) = 1;

  rtvec v = rtvec_alloc (N + 1);
  RTVEC_ELT (v, 0) = asm_op;
  RTVEC_ELT (v, 1) = gen_rtx_CLOBBER (VOIDmode, …);
  …
  RTVEC_ELT (v, N) = gen_rtx_CLOBBER (VOIDmode, …);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));

But doing this after the zeroing would give:

  …clobber reg1 in an asm…
  …set reg1 to zero…
  …clobber reg1 in an asm…

Dataflow-wise, the second clobber overwrites the effect of the zeroing.
Since nothing uses reg1 between the zeroing and the clobber, the zeroing
could be removed as dead.

>>  The idea was:
>> 
>> - There would be an asm volatile before the moves that clobbers (but does
>>  not use) (mem:BLK (scratch)) and the zeroed registers.
>> 
>> - EPILOGUE_USES would make the zeroed registers live after the return.
>
> Is EPILOGUE_USES the only way for this purpose? Will adding another “asm
> volatile” immediately before the return serve the same purpose?

Why do you want to use an asm to keep the instructions live though?

As I think I mentioned before (but sorry if I'm misremembering),
using an asm would be counterproductive on delayed-branch targets.
The delayed branch scheduler looks backwards for something that could
fill the delay slot.  If we have an asm after the zeroing instructions
that uses the zeroed registers, that would prevent any zeroing
instruction from filling the delay slot.  The delayed branch scheduler
would therefore try to fill the delay slot with something from before
the zeroing sequence, which is exactly what we'd like to avoid.

Also, using an asm after the sequence would allow a machine_reorg
pass to reuse the zeroed registers for something else between the
second asm and the return.

IMO, marking the zeroed registers as being live out of the function
is the simplest, most direct way of representing the fact that the
zeroing effect has to survive to the function return.  It's how we
make sure that the function return value remains live and how we make
sure that the restored call-preserved registers remain live.
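
To make the dataflow argument concrete, here is a minimal sketch of a
backward liveness walk over a straight-line sequence.  It is only a toy
model, not GCC's actual df code: the names op_kind, struct op and
mark_dead_sets are hypothetical, and the live_out mask merely stands in
for what EPILOGUE_USES expresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are GCC internals.  */
enum op_kind { OP_CLOBBER, OP_SET_ZERO, OP_USE };

struct op
{
  enum op_kind kind;
  int reg;			/* Register number, 0..31.  */
};

/* Walk OPS backwards, tracking which registers are live.  LIVE_OUT is a
   bit mask of registers assumed live at function exit.  Set DEAD[i] for
   every OP_SET_ZERO whose value is never observed, and return how many
   such dead sets there are.  */
static size_t
mark_dead_sets (const struct op *ops, size_t n, unsigned live_out,
		bool *dead)
{
  unsigned live = live_out;
  size_t count = 0;

  for (size_t i = n; i-- > 0;)
    {
      assert (ops[i].reg >= 0 && ops[i].reg < 32);
      unsigned bit = 1u << ops[i].reg;

      dead[i] = false;
      switch (ops[i].kind)
	{
	case OP_USE:
	  /* A use makes the register live before this point.  */
	  live |= bit;
	  break;

	case OP_CLOBBER:
	  /* A clobber overwrites the register without reading it.  */
	  live &= ~bit;
	  break;

	case OP_SET_ZERO:
	  /* The zeroing is dead if nothing later reads the register.  */
	  if (!(live & bit))
	    {
	      dead[i] = true;
	      count++;
	    }
	  live &= ~bit;
	  break;
	}
    }
  return count;
}
```

In this model, the sequence clobber/zero/clobber with an empty live_out
reports the zeroing as dead, while clobber/zero with the register in
live_out keeps it.  A trailing OP_USE would also keep the zeroing live
here; the model says nothing about delay slots or machine_reorg, which
are the reasons given above for preferring the live-out representation.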

Thanks,
Richard
