> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>>>>
>>>> The following is what I see from i386.md: (I didn't look yet at how
>>>> "UNSPEC_VOLATILE" is used in data flow analysis in GCC)
>>>>
>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>> ;; all of memory. This blocks insns from being moved across this point.
>>>
>>> Heh, it looks like that comment dates back to 1994. :-)
>>>
>>> The comment is no longer correct though. I wasn't around at the time,
>>> but I assume the comment was only locally true even then.
>>>
>>> If what the comment said was true, then something like:
>>>
>>> (define_insn "cld"
>>>   [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>   ""
>>>   "cld"
>>>   [(set_attr "length" "1")
>>>    (set_attr "length_immediate" "0")
>>>    (set_attr "modrm" "0")])
>>>
>>> would invalidate the entire register file and so would require all values
>>> to be spilt to the stack around the CLD.
>>
>> Okay, thanks for the info.
>> Then what's the current definition of UNSPEC_VOLATILE?
>
> I'm not sure it's written down anywhere TBH. rtl.texi just says:
>
> @code{unspec_volatile} is used for volatile operations and operations
> that may trap; @code{unspec} is used for other operations.
>
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.
>
> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs. The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details. In particular:
>
> Note that the compiler can move even @code{volatile asm} instructions
> relative
> to other code, including across jump instructions. For example, on many
> targets there is a system register that controls the rounding mode of
> floating-point operations. Setting it with a @code{volatile asm} statement,
> as in the following PowerPC example, does not work reliably.
>
> @example
> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
> sum = x + y;
> @end example
>
> The compiler may move the addition back before the @code{volatile asm}
> statement. To make it work as expected, add an artificial dependency to
> the @code{asm} by referencing a variable in the subsequent code, for
> example:
>
> @example
> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
> sum = x + y;
> @end example
>
> which is very similar to the unspec_volatile case we're talking about.
>
> To take an x86 example:
>
> void
> f (char *x)
> {
>   asm volatile ("");
>   x[0] = 0;
>   asm volatile ("");
>   x[1] = 0;
>   asm volatile ("");
> }
If we change the above to the following (though the asm format might not be
correct):

void
f (char *x)
{
  asm volatile ("x[0]");
  x[0] = 0;
  asm volatile ("x[1]");
  x[1] = 0;
  asm volatile ("");
}

Will the moving and merging be blocked?
I found the following code in df-scan.c:
static void
df_uses_record (class df_collection_rec *collection_rec,
                rtx *loc, enum df_ref_type ref_type,
                basic_block bb, struct df_insn_info *insn_info,
                int flags)
{
  …
    case ASM_OPERANDS:
    case UNSPEC_VOLATILE:
    case TRAP_IF:
    case ASM_INPUT:
      …
      if (code == ASM_OPERANDS)
        {
          int j;
          for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
            df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
                            DF_REF_REG_USE, bb, insn_info, flags);
          return;
        }
      break;
  …
}
It looks like ONLY the operands of "ASM_OPERANDS" are recorded as USES in df
analysis; the operands of "UNSPEC_VOLATILE" are NOT.
If we use "ASM_OPERANDS" instead of "UNSPEC_VOLATILE" as you suggested, the
data flow analysis should automatically pick up the operands of the
"ASM_OPERANDS" and fix the data flow, right?
>
> gets optimised to:
>
> 	xorl	%eax, %eax
> 	movw	%ax, (%rdi)
>
> with the two stores being merged. The same thing is IMO valid for
> unspec_volatile. In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.
>
>>>
>>> There's also no dataflow reason why this couldn't be reordered to:
>>>
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>         (const_int 0 [0])) "t10.c":11:1 -1
>>>      (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>             (reg:SI 1 dx)
>>>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>      (nil))
>>> (insn 17 … pop a register other than dx from the stack …)
>>>
>>
>> This is the place where I don't quite agree at the moment; maybe I still
>> don't quite understand "UNSPEC_VOLATILE".
>>
>> I checked several places in GCC that handle "UNSPEC_VOLATILE"; for example,
>> the routine "can_move_insns_across" in gcc/df-problems.c:
>>
>> if (NONDEBUG_INSN_P (insn))
>>   {
>>     if (volatile_insn_p (PATTERN (insn)))
>>       return false;
>>
>> From my understanding of reading the code, when an insn is UNSPEC_VOLATILE,
>> another insn will NOT be able to move across it.
>>
>> Then for the above example, insn 17 should not be moved across insn 19
>> either.
>>
>> Let me know if I miss anything important.
>
> The above is conservatively correct. But not all passes do it.
> E.g. combine does have a similar approach:
>
> /* If INSN contains volatile references (specifically volatile MEMs),
>    we cannot combine across any other volatile references.
>    Even if INSN doesn't contain volatile references, any intervening
>    volatile insn might affect machine state.  */
>
> is_volatile_p = volatile_refs_p (PATTERN (insn))
>                 ? volatile_refs_p
>                 : volatile_insn_p;
>
> And like you say, the passes that use can_move_insns_across will be
> conservative too. But not many passes use that function.
Okay, I see.
>
> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
> about volatile asms or unspec_volatiles, and can move code across them.
> And that's kind-of inevitable. Having an “everything barrier” makes
> life very hard for global optimisation.
Okay, so UNSPEC_VOLATILE is intentionally not made an "everything barrier"?
(But I do feel that the design of UNSPEC_VOLATILE is not clean.)
>
>>> The asm for (b) goes before the instruction, so we'd have:
>>>
>>> (insn 17 … new asm …)
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>         (const_int 0 [0])) "t10.c":11:1 -1
>>>      (nil))
>>> (insn 19 … return …)
>>>
>>> But something has to tell the df machinery that the value of edx
>>> matters on return from the function, otherwise insn 18 could be
>>> deleted as dead. Adding edx to EPILOGUE_USES provides that information
>>> and stops the instruction from being deleted.
>>
>>
>> In the above, insn 17 will be something like:
>>
>> (insn 17 ... (unspec_volatile [(reg:SI 1 dx)
>>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>      (nil))
>
> In the example above, insn 17 would be an asm that clobbers dx
> (instead of using dx).
>
>> So, the reg edx is already used by the "UNSPEC_VOLATILE" insn, which should
>> mean that the value of edx matters on return from the function. My
>> understanding is that df should automatically pick up the "UNSPEC_VOLATILE"
>> insn and its operands, so the "UNSPEC_VOLATILE" insn should serve the same
>> purpose as putting edx in EPILOGUE_USES.
>>
>> Do I miss anything here?
>
> The point is that any use of dx at insn 17 comes before the definition
> in insn 18. So a use in insn 17 would keep alive any store to dx that
> happened before insn 17. But it would not keep the store in insn 18 live,
> since insn 18 executes later.
Okay, I see.
>
>>>>> I don't think we need a new target-specific unspec_volatile code to do
>>>>> (b).
>>>>> We can just use an automatically-generated volatile asm to clobber the
>>>>> registers first. See e.g. how expand_asm_memory_blockage handles memory
>>>>> scheduling barriers.
>>>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>>>>
>>>> static void
>>>> expand_asm_memory_blockage (void)
>>>> {
>>>>   rtx asm_op, clob;
>>>>
>>>>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                  rtvec_alloc (0), rtvec_alloc (0),
>>>>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>>>>   MEM_VOLATILE_P (asm_op) = 1;
>>>>
>>>>   clob = gen_rtx_SCRATCH (VOIDmode);
>>>>   clob = gen_rtx_MEM (BLKmode, clob);
>>>>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>>
>>>>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>>
>>>>
>>>> As the following?
>>>>
>>>> /* Generate asm volatile("" : : : "regno") for REGNO.  */
>>>>
>>>> static void
>>>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>>>> {
>>>>   rtx asm_op, clob;
>>>>
>>>>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                  rtvec_alloc (0), rtvec_alloc (0),
>>>>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>>>>   MEM_VOLATILE_P (asm_op) = 1;
>>>>
>>>>   clob = gen_rtx_REG (mode, regno);
>>>>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>>
>>>>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>>
>>>> Is the above correct?
>>>
>>> Yeah, looks good. You should be able to clobber all the registers you
>>> want to clear in one asm.
>>
>> How to do this?
>
> Rather than create:
>
> gen_rtvec (2, asm_op, clob)
>
> with just the asm and one clobber, you can create:
>
> gen_rtvec (N + 1, asm_op, clob1, …, clobN)
>
> with N clobbers side-by-side. When N is variable (as it probably would
> be in your case), it's easier to use rtvec_alloc and fill in the fields
> using RTVEC_ELT. E.g.:
>
> rtvec v = rtvec_alloc (N + 1);
> RTVEC_ELT (v, 0) = asm_op;
> RTVEC_ELT (v, 1) = clob1;
> …
> RTVEC_ELT (v, N) = clobN;
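For a variable number of clobbers, the fill-in loop might look like the following GCC-internals sketch (not compilable outside GCC; `n`, `modes[]`, and `regnos[]` are hypothetical caller-supplied descriptions of the registers to clobber):

```c
/* Sketch: one volatile asm clobbering n registers in a single PARALLEL.
   asm_op is built as in expand_asm_reg_volatile above.  */
rtvec v = rtvec_alloc (n + 1);
RTVEC_ELT (v, 0) = asm_op;
for (unsigned int i = 0; i < n; i++)
  RTVEC_ELT (v, i + 1)
    = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (modes[i], regnos[i]));
emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
```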
Thanks.
Qing
>
> Thanks,
> Richard