Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

Bernd Edlinger Sun, 31 Jul 2016 03:44:35 -0700

On 07/30/16 13:39, Segher Boessenkool wrote:
> On Sat, Jul 30, 2016 at 08:17:59AM +0000, Bernd Edlinger wrote:
>> Ok, I the incorrect REG_POINTER is done here:
>>
>> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
>>
>> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
>> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
>> if x is a SUBREG as in this case.
>>
>> So I think that should be fixed this way:
>>
>> Index: emit-rtl.c
>> ===================================================================
>> --- emit-rtl.c       (revision 238891)
>> +++ emit-rtl.c       (working copy)
>> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
>>       || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
>>        {
>>    #if defined(POINTERS_EXTEND_UNSIGNED)
>> -      if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
>> +      if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
>>         || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
>>        && !targetm.have_ptr_extend ())
>>      can_be_reg_pointer = false;
>>
>>
>> What do you think does this look like the right fix?
>
> There also is (from rtl.texi), for paradoxical subregs:
>
> """
> @item @code{subreg} of @code{reg}s
> The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true.
> @code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold.
> Such subregs usually represent local variables, register variables
> and parameter pseudo variables that have been promoted to a wider mode.
> """
>
> so you might want to test for these as well.
>


like this?

Index: emit-rtl.c
===================================================================
--- emit-rtl.c  (revision 238891)
+++ emit-rtl.c  (working copy)
@@ -1156,7 +1156,11 @@
      {
  #if defined(POINTERS_EXTEND_UNSIGNED)
        if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-          || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+          || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+          || (paradoxical_subreg_p (x)
+              && ! (SUBREG_PROMOTED_VAR_P (x)
+                    && SUBREG_CHECK_PROMOTED_SIGN (x,
+                                                   POINTERS_EXTEND_UNSIGNED))))
          && !targetm.have_ptr_extend ())
        can_be_reg_pointer = false;
  #endif

In the test case of pr71779 the subreg is no promoted var, so this
has no influence at this time.  Also I have not POINTERS_EXTEND_SIGNED
target, but for the symmetry it ought to check explicitly for
ZERO_EXTEND as well, and allow the pointer value to pass thru a
TRUNCATE.

>> With this patch the code the reg/f:DI 481 does no longer appear,
>> and also the invalid combine does no longer happen.
>
> Good :-)
>
>> However the test case from pr70903 does not get fixed by this.
>>
>> But when I look at the dumps, I see again the first incorrect
>> transformation in cse2 (again cse!):
>>
>> (insn 20 19 21 2 (set (reg:DI 94)
>>           (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>>        (expr_list:REG_EQUAL (const_int 255 [0xff])
>>           (expr_list:REG_DEAD (reg:QI 93)
>>               (nil))))
>>
>> but that is simply wrong, because later optimization passes
>> expect reg 94 to be 0x000000ff but the upper bits are unspecified,
>> so that REG_EQUAL should better not exist.
>
> Agreed.  So where does that come from?
>
>> When I looked at cse.c I saw a comment in #if 0, which exactly
>> describes the problem that we have with paradoxical subreg here:
>
> <snip>
>
>> I am pretty sure, that this has to be reverted, and that it can
>> generate less efficient code.
>>
>> What do you think?
>
> I think this pessimises the generated code too much; there must be a
> better solution.
>

I debugged the cse again, to see how it works and why it mis-compiles
this example.

I found out that the trouble starts one instruction earlier:

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
         ) pr.c:8 50 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
         (nil)))

cse_main sees the constant value and maps:
(reg:QI 93)  =>  (const_int 255 [0xff])

plus (I mean that is wrong):
(subreg:DI (reg:QI 93) 0)  =>  (const_int 255 [0xff])

When the next insn is scanned

(insn 20 19 21 2 (set (reg:DI 94)
         (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:QI 93)
         (nil)))

I see fold_rtx (subreg:DI (reg:QI 93) 0))
return (const_int 255 [0xff]) which is wrong.

now cse maps:
(reg:DI 94)  =>  (const_int 255 [0xff])

which is also not guaranteed to be correct, but the REG_EQUAL at
insn 20 is probably only a symptom, suppressing only that does not
fix the test case, because later we have:

(insn 25 24 26 2 (set (reg:DI 97)
         (const_int 255 [0xff])) pr.c:9 50 {*movdi_aarch64}
      (nil))
(insn 26 25 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ])
             (const_int 32 [0x20])
             (const_int 32 [0x20]))
         (reg:DI 97)) pr.c:9 691 {*insv_regdi}
      (expr_list:REG_DEAD (reg:DI 97)
         (nil)))

get replaced to

(insn 26 24 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ])
             (const_int 32 [0x20])
             (const_int 32 [0x20]))
         (reg:DI 94)) pr.c:9 691 {*insv_regdi}
      (expr_list:REG_DEAD (reg:DI 97)
         (nil)))

because when insn 25 is scanned it the constant 255 is already
mapped to reg 94 at insn 20.

now cse maps:
(reg:DI 97)  =>  (reg:DI 94)

therefore canon_reg (reg:DI 97) returns (reg:DI 94)
which makes canonicalize_insn do the wrong transformation at insn 26.


Now I think I found a better place for a patch, where the first bogus
mapping is recorded:

Index: cse.c
===================================================================
--- cse.c       (revision 238891)
+++ cse.c       (working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
            || GET_MODE (dest) == BLKmode
            /* If we didn't put a REG_EQUAL value or a source into the hash
               table, there is no point is recording DEST.  */
-           || sets[i].src_elt == 0
-           /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-              or SIGN_EXTEND, don't record DEST since it can cause
-              some tracking to be wrong.
-
-              ??? Think about this more later.  */
-           || (paradoxical_subreg_p (dest)
-               && (GET_CODE (sets[i].src) == SIGN_EXTEND
-                   || GET_CODE (sets[i].src) == ZERO_EXTEND)))
+           || sets[i].src_elt == 0)
          continue;

        /* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
              sets[i].dest_hash = HASH (dest, GET_MODE (dest));
            }

+       /* If DEST is a paradoxical SUBREG, don't record DEST since it can
+          cause some tracking to be wrong.  */
+       if (paradoxical_subreg_p (dest))
+         continue;
+
        elt = insert (dest, sets[i].src_elt,
                      sets[i].dest_hash, GET_MODE (dest));



So apparently there was already an attempt of a fix for a similar bug,
and svn blame points to:

svn log -v -r8354
------------------------------------------------------------------------
r8354 | kenner | 1994-10-28 23:55:05 +0100 (Fri, 28 Oct 1994) | 3 lines
Changed paths:
    M /trunk/gcc/cse.c

(cse_insn): Don't record a DEST a paradoxical SUBREG and SRC is a
SIGN_EXTEND or ZERO_EXTEND.

------------------------------------------------------------------------

This way we can still map the underlying QI register to 255 but
not the SUBREG if it is a paradoxical subreg.

In the test case this patch still works (output code does not change).

What do you think?


Bernd.

Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

Reply via email to