Tucker Kern via Gcc <gcc@gcc.gnu.org> writes:
> TL;DR
>
> In GCC 9.3, I believe modified_between_p should return 1 if the memory
> reference is volatile. My layman's understanding of volatile memory is that
> it could change at any time, and thus could be modified between any two
> instructions.

That's true, but in RTL, volatile accesses provide no timing guarantee,
even in an abstract sense.  E.g. suppose we traced the execution of a
function call based on the instructions that exist immediately after
the expand pass.  If a volatile access occurs in the Nth instruction
of this trace, there's no guarantee that the access will still occur
in the Nth instruction of traces for later passes.  Unrelated
instructions can be added and removed at any time.

For example, a volatile access should not stop us from swapping:

  (set (reg:SI R0) (const_int 0))

and:

  (set (mem/v:SI (reg:SI R1)) (const_int 0))

So…

> Possible patch:
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 01af063a222..a395df0627b 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1308,6 +1308,8 @@ modified_between_p (const_rtx x, const rtx_insn *start, const rtx_insn *end)
>        return 1;
>
>      case MEM:
> +      if (MEM_VOLATILE_P (x))
> +        return 1;
>        if (modified_between_p (XEXP (x, 0), start, end))
>          return 1;
>        if (MEM_READONLY_P (x))

…I think the current code handles volatile correctly.  The issue
is whether any instruction in (START, END) might modify the same
memory as X.  Most of the real work is done by the alias machinery,
which understands (or is supposed to understand) the special rules
around volatile memory and how it interacts with other memory accesses:

      for (insn = NEXT_INSN (start); insn != end; insn = NEXT_INSN (insn))
        if (memory_modified_in_insn_p (x, insn))
          return 1;

> Details
> I'm writing a splitter to optimize bit-field conditionals (e.g. testing a
> bit of memory-mapped I/O). The non-optimized sequence consists of SET,
> AND, LSHIFTRT + CMP, JMP patterns, which I know can be simplified to
> AND + JMP, since the AND operation sets the necessary CC bits. To achieve
> this optimization in the combine pass, a custom general_operand has been
> defined which allows volatile memory.
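
Such predicates are usually written as a general_operand variant that
temporarily sets volatile_ok.  A sketch of the usual shape (the name
here is hypothetical; the backend's actual definition isn't shown):

  /* Body of a hypothetical define_predicate in the backend's
     predicates.md: accept everything general_operand accepts,
     plus volatile MEMs.  */
  int
  volatile_ok_general_operand (rtx op, machine_mode mode)
  {
    /* Temporarily allow volatile MEMs, then let general_operand
       do the real checking.  */
    int saved_volatile_ok = volatile_ok;
    volatile_ok = 1;
    int ok = general_operand (op, mode);
    volatile_ok = saved_volatile_ok;
    return ok;
  }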
>
> However, in certain situations I've found that the combine pass generates
> invalid patterns when it merges SET and IF_THEN_ELSE (CMP + JMP)
> patterns. One such situation is when GCC decides to reuse the result of the
> comparison for the return value.
>
> This issue seems intertwined with the combine + volatile_ok issue that has
> been discussed a number of times in the past. The backend in question is
> highly embedded and significant performance gains are made by enabling
> volatile references during combine.
>
> Optimized tree of the test program. In this program the bit-field value is
> saved in _1 to be reused in the return value. Why GCC makes this choice is
> beyond me.
>   <unnamed-unsigned:1> _1;
>   _Bool _3;
>
>   <bb 3> [local count: 1044213930]:
>   _1 ={v} MEM[(volatile struct register_t *)5B].done;
>   if (_1 != 0)
>     goto <bb 4>; [11.00%]
>   else
>     goto <bb 3>; [89.00%]
>
>   <bb 4> [local count: 1073741824]:
>   // Additional code trimmed for brevity...
>
>   <bb 6> [local count: 114863532]:
>   # _3 = PHI <0(4), _1(5)>
>   return _3;
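
For reference, a test case shaped something like this would produce
that dump (the type name comes from the dump; the field layout, bit
position and the trimmed code are guesses, and other_work is a
hypothetical placeholder):

  struct register_t
  {
    unsigned char pad : 3;
    unsigned char done : 1;   /* Bit 3: matches the lshiftrt by 3 below.  */
  };

  #define REG ((volatile struct register_t *)5)

  extern int other_work (void);

  _Bool
  poll_done (void)
  {
    _Bool bit;
    /* bb 3: spin until the bit is set; the final read is _1.  */
    while (!(bit = REG->done))
      ;
    /* bb 4 onward: trimmed in the dump.  */
    if (other_work ())
      return 0;
    /* Reuses the loop's read, as in the PHI for _3.  */
    return bit;
  }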
>
> This produces the following pre-combine RTL.
> (insn 9 6 10 3 (parallel [
>             (set (reg:QI 17 [ <retval> ])
>                 (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
>                     (const_int 3 [0x3])))
>             (clobber (reg:CC 7 FLAG))
>         ]) "fail.c":22:26 19 {*lshrqi3_im}
>      (expr_list:REG_DEAD (reg/f:QI 18)
>         (expr_list:REG_UNUSED (reg:CC 7 FLAG)
>             (nil))))
> (insn 10 9 12 3 (parallel [
>             (set (reg:QI 17 [ <retval> ])
>                 (and:QI (reg:QI 17 [ <retval> ])
>                     (const_int 1 [0x1])))
>             (clobber (reg:CC 7 FLAG))
>         ]) "fail.c":22:26 10 {andqi3}
>      (expr_list:REG_UNUSED (reg:CC 7 FLAG)
>         (nil)))
> (jump_insn 12 10 20 3 (parallel [
>             (set (pc)
>                 (if_then_else (eq (reg:QI 17 [ <retval> ])
>                         (const_int 0 [0]))
>                     (label_ref:QI 11)
>                     (pc)))
>             (clobber (scratch:QI))
>         ]) "fail.c":22:9 41 {*cbranchqi4_im}
>      (int_list:REG_BR_PROB 955630228 (nil))
>  -> 11)
>
> Which, when combined, generates patterns that will produce invalid code:
> the single volatile read has been duplicated into both insn 10 and
> jump_insn 12, so GCC has assumed the memory value will not change
> between insn 10 & 12.
> (insn 10 9 12 3 (parallel [
>             (set (reg:QI 17 [ <retval> ])
>                 (and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
>                         (const_int 3 [0x3]))
>                     (const_int 1 [0x1])))
>             (clobber (reg:CC 7 FLAG))
>         ]) "fail.c":22:18 10 {andqi3}
>      (expr_list:REG_UNUSED (reg:CC 7 FLAG)
>         (nil)))
> (jump_insn 12 10 20 3 (parallel [
>             (set (pc)
>                 (if_then_else (eq (zero_extract:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
>                             (const_int 1 [0x1])
>                             (const_int 3 [0x3]))
>                         (const_int 0 [0]))
>                     (label_ref:QI 11)
>                     (pc)))
>             (clobber (scratch:QI))
>         ]) "fail.c":22:9 43 {*cbranchqi4_zero_extract}
>      (int_list:REG_BR_PROB 955630228 (nil))
>  -> 11)
>
> With the above patch, the combination of insn 10 & 12 is rejected and the
> if_then_else switches on retval.
> (insn 10 9 12 3 (parallel [
>             (set (reg:QI 17 [ <retval> ])
>                 (and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
>                         (const_int 3 [0x3]))
>                     (const_int 1 [0x1])))
>             (clobber (reg:CC 7 FLAG))
>         ]) "fail.c":22:18 10 {andqi3}
>      (expr_list:REG_UNUSED (reg:CC 7 FLAG)
>         (nil)))
> (jump_insn 12 10 20 3 (parallel [
>             (set (pc)
>                 (if_then_else (eq (reg:QI 17 [ <retval> ])
>                         (const_int 0 [0]))
>                     (label_ref:QI 11)
>                     (pc)))
>             (clobber (scratch:QI))
>         ]) "fail.c":22:9 41 {*cbranchqi4_im}
>      (int_list:REG_BR_PROB 955630228 (nil))
>  -> 11)
>
> Issues like this are probably a reason volatile memory is disallowed during
> combine.

Yeah.

I think the issue in the example above is duplication rather than
modified_between_p.  There is:

  /* We are about to copy insns for the case where they need to be kept
     around.  Check that they can be copied in the merged instruction.  */

  if (targetm.cannot_copy_insn_p
      && ((added_sets_2 && targetm.cannot_copy_insn_p (i2))
          || (i1 && added_sets_1 && targetm.cannot_copy_insn_p (i1))
          || (i0 && added_sets_0 && targetm.cannot_copy_insn_p (i0))))
    {
      undo_all ();
      return 0;
    }

to stop us copying things that shouldn't be copied.  Perhaps volatile
accesses should be handled the same way.
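
Something along those lines (an untested sketch, reusing the existing
volatile_refs_p helper from rtlanal.c):

  /* Refuse the combination if an insn whose pattern would need to be
     duplicated contains a volatile reference.  */
  if ((added_sets_2 && volatile_refs_p (PATTERN (i2)))
      || (i1 && added_sets_1 && volatile_refs_p (PATTERN (i1)))
      || (i0 && added_sets_0 && volatile_refs_p (PATTERN (i0))))
    {
      undo_all ();
      return 0;
    }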

But like you imply, I'm far from confident that this is the only
problem with ignoring volatile_ok.

Thanks,
Richard
