> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>>>>
>>>> The following is what I see from i386.md: (I didn't look yet at how
>>>> "UNSPEC_VOLATILE" is used in data flow analysis in GCC)
>>>>
>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>> ;; all of memory. This blocks insns from being moved across this point.
>>>
>>> Heh, it looks like that comment dates back to 1994. :-)
>>>
>>> The comment is no longer correct though. I wasn't around at the time,
>>> but I assume the comment was only locally true even then.
>>>
>>> If what the comment said was true, then something like:
>>>
>>> (define_insn "cld"
>>>   [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>   ""
>>>   "cld"
>>>   [(set_attr "length" "1")
>>>    (set_attr "length_immediate" "0")
>>>    (set_attr "modrm" "0")])
>>>
>>> would invalidate the entire register file and so would require all values
>>> to be spilt to the stack around the CLD.
>>
>> Okay, thanks for the info.
>> Then what's the current definition of UNSPEC_VOLATILE?
>
> I'm not sure it's written down anywhere TBH. rtl.texi just says:
>
> @code{unspec_volatile} is used for volatile operations and operations
> that may trap; @code{unspec} is used for other operations.
>
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.
>
> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs. The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details. In particular:
>
> Note that the compiler can move even @code{volatile asm} instructions
> relative
> to other code, including across jump instructions. For example, on many
> targets there is a system register that controls the rounding mode of
> floating-point operations. Setting it with a @code{volatile asm} statement,
> as in the following PowerPC example, does not work reliably.
>
> @example
> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
> sum = x + y;
> @end example
>
> The compiler may move the addition back before the @code{volatile asm}
> statement. To make it work as expected, add an artificial dependency to
> the @code{asm} by referencing a variable in the subsequent code, for
> example:
>
> @example
> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
> sum = x + y;
> @end example
>
> which is very similar to the unspec_volatile case we're talking about.
>
> To take an x86 example:
>
> void
> f (char *x)
> {
>   asm volatile ("");
>   x[0] = 0;
>   asm volatile ("");
>   x[1] = 0;
>   asm volatile ("");
> }
If we change the above to the following (though the asm format might not be
correct):

void
f (char *x)
{
  asm volatile ("x[0]");
  x[0] = 0;
  asm volatile ("x[1]");
  x[1] = 0;
  asm volatile ("");
}

Will the moving and merging be blocked?
I found the following code in df-scan.c:
static void
df_uses_record (class df_collection_rec *collection_rec,
                rtx *loc, enum df_ref_type ref_type,
                basic_block bb, struct df_insn_info *insn_info,
                int flags)
{
  …
    case ASM_OPERANDS:
    case UNSPEC_VOLATILE:
    case TRAP_IF:
    case ASM_INPUT:
      …
      if (code == ASM_OPERANDS)
        {
          int j;
          for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
            df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
                            DF_REF_REG_USE, bb, insn_info, flags);
          return;
        }
      break;
  …
}
It looks like ONLY the operands of "ASM_OPERANDS" are recorded as USES in df
analysis; the operands of "UNSPEC_VOLATILE" are NOT.
If we use "ASM_OPERANDS" instead of "UNSPEC_VOLATILE" as you suggested, the
data flow analysis should automatically pick up the operands of the
"ASM_OPERANDS" and fix the data flow, right?
>
> gets optimised to:
>
> 	xorl	%eax, %eax
> 	movw	%ax, (%rdi)
>
> with the two stores being merged. The same thing is IMO valid for
> unspec_volatile. In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.
>
>>>
>>> There's also no dataflow reason why this couldn't be reordered to:
>>>
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>         (const_int 0 [0])) "t10.c":11:1 -1
>>>      (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>             (reg:SI 1 dx)
>>>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>      (nil))
>>> (insn 17 … pop a register other than dx from the stack …)
>>>
>>
>> This is the place where I don't quite agree at the moment; maybe I still
>> don't quite understand "UNSPEC_VOLATILE".
>>
>> I checked several places in GCC that handle "UNSPEC_VOLATILE"; for example,
>> the routine "can_move_insns_across" in gcc/df-problems.c:
>>
>> if (NONDEBUG_INSN_P (insn))
>>   {
>>     if (volatile_insn_p (PATTERN (insn)))
>>       return false;
>>
>> From my understanding of reading the code, when an insn is UNSPEC_VOLATILE,
>> another insn will NOT be able to move across it.
>>
>> Then for the above example, insn 17 should not be moved across insn 19
>> either.
>>
>> Let me know if I miss anything important.
>
> The above is conservatively correct. But not all passes do it.
> E.g. combine does have a similar approach:
>
> /* If INSN contains volatile references (specifically volatile MEMs),
>    we cannot combine across any other volatile references.
>    Even if INSN doesn't contain volatile references, any intervening
>    volatile insn might affect machine state.  */
>
> is_volatile_p = volatile_refs_p (PATTERN (insn))
>                 ? volatile_refs_p
>                 : volatile_insn_p;
>
> And like you say, the passes that use can_move_insns_across will be
> conservative too. But not many passes use that function.
Okay, I see.
>
> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
> about volatile asms or unspec_volatiles, and can move code across them.
> And that's kind-of inevitable. Having an “everything barrier” makes
> life very hard for global optimisation.
Okay, so UNSPEC_VOLATILE is intentionally not made an "everything barrier"?
(But I do feel that the design of UNSPEC_VOLATILE is not clean.)
>
>>> The asm for (b) goes before the instruction, so we'd have:
>>>
>>> (insn 17 … new asm …)
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>         (const_int 0 [0])) "t10.c":11:1 -1
>>>      (nil))
>>> (insn 19 … return …)
>>>
>>> But something has to tell the df machinery that the value of edx
>>> matters on return from the function, otherwise insn 18 could be
>>> deleted as dead. Adding edx to EPILOGUE_USES provides that information
>>> and stops the instruction from being deleted.
>>
>>
>> In the above, insn 17 will be something like:
>>
>> (insn 17 ... (unspec_volatile [(reg:SI 1 dx)
>>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>      (nil))
>
> In the example above, insn 17 would be an asm that clobbers dx
> (instead of using dx).
>
>> So, the reg edx is already used by the "UNSPEC_VOLATILE" insn, which should
>> mean that the value of edx matters on return from the function. My
>> understanding is that df should automatically pick up the "UNSPEC_VOLATILE"
>> insn and its operands, so the "UNSPEC_VOLATILE" insn should serve the same
>> purpose as putting edx in EPILOGUE_USES.
>>
>> Do I miss anything here?
>
> The point is that any use of dx at insn 17 comes before the definition
> in insn 18. So a use in insn 17 would keep alive any store to dx that
> happened before insn 17. But it would not keep the store in insn 18 live,
> since insn 18 executes later.
Okay, I see.
>
>>>>> I don't think we need a new target-specific unspec_volatile code to do
>>>>> (b).
>>>>> We can just use an automatically-generated volatile asm to clobber the
>>>>> registers first. See e.g. how expand_asm_memory_blockage handles memory
>>>>> scheduling barriers.
>>>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>>>>
>>>> static void
>>>> expand_asm_memory_blockage (void)
>>>> {
>>>>   rtx asm_op, clob;
>>>>
>>>>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                  rtvec_alloc (0), rtvec_alloc (0),
>>>>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>>>>   MEM_VOLATILE_P (asm_op) = 1;
>>>>
>>>>   clob = gen_rtx_SCRATCH (VOIDmode);
>>>>   clob = gen_rtx_MEM (BLKmode, clob);
>>>>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>>
>>>>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>>
>>>>
>>>> As the following?
>>>>
>>>> /* Generate asm volatile("" : : : "regno") for REGNO.  */
>>>>
>>>> static void
>>>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>>>> {
>>>>   rtx asm_op, clob;
>>>>
>>>>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                  rtvec_alloc (0), rtvec_alloc (0),
>>>>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>>>>   MEM_VOLATILE_P (asm_op) = 1;
>>>>
>>>>   clob = gen_rtx_REG (mode, regno);
>>>>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>>
>>>>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>>
>>>> Is the above correct?
>>>
>>> Yeah, looks good. You should be able to clobber all the registers you
>>> want to clear in one asm.
>>
>> How to do this?
>
> Rather than create:
>
> gen_rtvec (2, asm_op, clob)
>
> with just the asm and one clobber, you can create:
>
> gen_rtvec (N + 1, asm_op, clob1, …, clobN)
>
> with N clobbers side-by-side. When N is variable (as it probably would
> be in your case), it's easier to use rtvec_alloc and fill in the fields
> using RTVEC_ELT. E.g.:
>
> rtvec v = rtvec_alloc (N + 1);
> RTVEC_ELT (v, 0) = asm_op;
> RTVEC_ELT (v, 1) = clob1;
> …
> RTVEC_ELT (v, N) = clobN;
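For a variable number of clobbers, the fill-in loop might look like the following GCC-internals sketch (not compilable outside GCC; `n`, `modes[]`, and `regnos[]` are hypothetical caller-supplied descriptions of the registers to clobber):

```c
/* Sketch: one volatile asm clobbering n registers in a single PARALLEL.
   asm_op is built as in expand_asm_reg_volatile above.  */
rtvec v = rtvec_alloc (n + 1);
RTVEC_ELT (v, 0) = asm_op;
for (unsigned int i = 0; i < n; i++)
  RTVEC_ELT (v, i + 1)
    = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (modes[i], regnos[i]));
emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
```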
Thanks.
Qing
>
> Thanks,
> Richard