> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, June 13, 2022 9:41 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
> set the lowpart.
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One
> > for the lowpart and one for the highpart.
> >
> > The problem with this is that in RTL the lvalue of the RTX is the only
> > thing tying the two instructions together.
> >
> > This means that e.g. combine is unable to try to combine the two
> > instructions for setting the lowpart and highpart.
> >
> > For ISAs that have bit extract instructions we can eliminate one of
> > the extracts if, and only if we're setting the entire complex number.
> >
> > This change changes the expand code when we're setting the entire
> > complex number to generate a subreg for the lowpart instead of a
> > vec_extract.
> >
> > This allows us to optimize sequences such as:
> >
> > _Complex int f(int a, int b) {
> >     _Complex int t = a + b * 1i;
> >     return t;
> > }
> >
> > from:
> >
> > f:
> >     bfi     x2, x0, 0, 32
> >     bfi     x2, x1, 32, 32
> >     mov     x0, x2
> >     ret
> >
> > into:
> >
> > f:
> >     bfi     x0, x1, 32, 32
> >     ret
> >
> > I have also confirmed the codegen for x86_64 did not change.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> 
> I'm not sure this is endian-safe.  For big-endian it's the imaginary part
> that can be written as a subreg.  The real part might be the high part of
> a register.
> 
> Maybe a more general way to handle this would be to add (yet another)
> parameter to store_bit_field that indicates that the current value of the
> structure is undefined.  That would also be useful in at least one other
> caller (from calls.cc).  write_complex_part could then have a similar
> parameter, true for the first write and false for the second.

Good morning!

I've rewritten it using the approach you suggested.  I've also tried to set
the new flag in the correct places.
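
The intent of the flag can be modelled roughly as follows.  This is a
standalone sketch and not code from the patch: model_store_field and
model_write_complex are made-up names, the "register" is a plain 64-bit
integer, and a little-endian layout (real part in the low 32 bits) is
assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bit-field store into a 64-bit register.
// NOT GCC code.  Requires bitnum < 64 and bitnum + bitsize <= 64.
inline uint64_t
model_store_field (uint64_t reg, unsigned bitnum, unsigned bitsize,
                   uint64_t value, bool undefined_p)
{
  const uint64_t mask
    = bitsize >= 64 ? ~uint64_t (0) : (uint64_t (1) << bitsize) - 1;
  const uint64_t field = (value & mask) << bitnum;

  // If the old contents are undefined there is nothing to preserve, so a
  // plain move of the new field is enough.
  if (undefined_p)
    return field;

  // Otherwise this is a real insertion: keep every bit outside the field.
  return (reg & ~(mask << bitnum)) | field;
}

// Hypothetical model of writing both parts of a complex value.  The first
// write may treat the destination as undefined, the second must not.
inline uint64_t
model_write_complex (uint64_t old_contents, uint32_t real, uint32_t imag,
                     bool first_write_undefined)
{
  uint64_t reg = old_contents;
  reg = model_store_field (reg, 0, 32, real, first_write_undefined);
  reg = model_store_field (reg, 32, 32, imag, false);
  return reg;
}
```

In this model both settings of the flag give the same final value once
both parts have been written; the flag only removes the dependency of the
first write on the old contents of the register.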

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * expmed.cc (store_bit_field): Add parameter that indicates if value is
        still undefined and if so emit a subreg move instead.
        * expr.h (write_complex_part): Likewise.
        * expmed.h (store_bit_field): Add new parameter.
        * builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
        parameter.
        (expand_ifn_atomic_compare_exchange): Likewise.
        * calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
        * emit-rtl.cc (validate_subreg): Allow size-changing subregs of
        floating-point modes when the outer mode is a complex mode.
        * expr.cc (emit_group_store): Use new parameter.
        (copy_blkmode_from_reg): Likewise.
        (copy_blkmode_to_reg): Likewise.
        (clear_storage_hints): Likewise.
        (write_complex_part): Add new parameter and pass it to
        store_bit_field.
        (emit_move_complex_parts): Use new parameter.
        (expand_assignment): Likewise.
        (store_expr): Likewise.
        (store_field): Likewise.
        (expand_expr_real_2): Likewise.
        * ifcvt.cc (noce_emit_move_insn): Likewise.
        * internal-fn.cc (expand_arith_set_overflow): Likewise.
        (expand_arith_overflow_result_store): Likewise.
        (expand_addsub_overflow): Likewise.
        (expand_neg_overflow): Likewise.
        (expand_mul_overflow): Likewise.
        (expand_arith_overflow): Likewise.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/complex-init.C: New test.

--- inline copy of patch ---

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4c6c29390531d8ae9765add598621727213b23ec..8c80e46d9c9c9c2a7e1ce0f8add86729fd542b16 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6014,8 +6014,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
       if (GET_MODE (boolret) != mode)
        boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
       x = force_reg (mode, x);
-      write_complex_part (target, boolret, true);
-      write_complex_part (target, x, false);
+      write_complex_part (target, boolret, true, true);
+      write_complex_part (target, x, false, false);
     }
 }
 
@@ -6070,8 +6070,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
       rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (GET_MODE (boolret) != mode)
        boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-      write_complex_part (target, boolret, true);
-      write_complex_part (target, oldval, false);
+      write_complex_part (target, boolret, true, true);
+      write_complex_part (target, oldval, false, false);
     }
 }
 
diff --git a/gcc/calls.cc b/gcc/calls.cc
index e13469cfd43b5bdd4ca0d2b8458a9e4f996e36e9..586af170879ab0152e9c0634a4b8e0ce03ea8d6e 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1214,7 +1214,7 @@ store_unaligned_arguments_into_pseudos (struct arg_data *args, int num_actuals)
 
            bytes -= bitsize / BITS_PER_UNIT;
            store_bit_field (reg, bitsize, endian_correction, 0, 0,
-                            word_mode, word, false);
+                            word_mode, word, false, false);
          }
       }
 }
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
           && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
     ;
   /* Subregs involving floating point modes are not allowed to
-     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
+     change size unless it's an insert into a complex mode.
+     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+          && !COMPLEX_MODE_P (omode))
     {
       if (! (known_eq (isize, osize)
             /* LRA can use subreg to store a floating point value in
diff --git a/gcc/expmed.h b/gcc/expmed.h
index ee1ddc82b601ce02957c493dad0d70eee2784ed7..0b2538c4c6bd51dfdc772ef70bdf631c0bed8717 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -715,7 +715,7 @@ extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 
 extern void store_bit_field (rtx, poly_uint64, poly_uint64,
                             poly_uint64, poly_uint64,
-                            machine_mode, rtx, bool);
+                            machine_mode, rtx, bool, bool);
 extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
                              machine_mode, machine_mode, bool, rtx *);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index ed39c88bd044279113f46661608c53b8a69d81a1..9c66cc40f60aadf5472b25770cb16d6e9f85a7e2 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1112,13 +1112,15 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
 
-   If REVERSE is true, the store is to be done in reverse order.  */
+   If REVERSE is true, the store is to be done in reverse order.
+
+   If UNDEFINED_P is true then STR_RTX is currently undefined.  */
 
 void
 store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
                 poly_uint64 bitregion_start, poly_uint64 bitregion_end,
                 machine_mode fieldmode,
-                rtx value, bool reverse)
+                rtx value, bool reverse, bool undefined_p)
 {
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   unsigned HOST_WIDE_INT ibitsize = 0, ibitnum = 0;
@@ -1160,6 +1162,18 @@ store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
       return;
     }
 
+  if (bitsize.is_constant (&ibitsize)
+      && bitnum.is_constant (&ibitnum)
+      && is_a <scalar_int_mode> (fieldmode, &int_mode)
+      && undefined_p
+      && ibitsize == GET_MODE_BITSIZE (int_mode))
+    {
+      gcc_assert (ibitnum % BITS_PER_UNIT == 0);
+      rtx dest = lowpart_subreg (GET_MODE (value), str_rtx, GET_MODE (str_rtx));
+      emit_move_insn (dest, value);
+      return;
+    }
+
   /* Under the C++0x memory model, we must not touch bits outside the
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
diff --git a/gcc/expr.h b/gcc/expr.h
index 7e5cf495a2bf12e15e2a00d293dfb54f830f38dd..41447e023c7218db195e8db3725714cffb10b3f3 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -253,7 +253,7 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
 extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
 extern rtx_insn *emit_move_complex_parts (rtx, rtx);
 extern rtx read_complex_part (rtx, bool);
-extern void write_complex_part (rtx, rtx, bool);
+extern void write_complex_part (rtx, rtx, bool, bool);
 extern rtx read_complex_part (rtx, bool);
 extern rtx emit_move_resolve_push (machine_mode, rtx);
 
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..c7f057a87fbc92d81388c87dd3754b4d21af6612 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2850,7 +2850,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
          store_bit_field (dest,
                           adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
                           bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
-                          VOIDmode, tmps[i], false);
+                          VOIDmode, tmps[i], false, false);
        }
 
       /* Optimize the access just a bit.  */
@@ -2864,7 +2864,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 
       else
        store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-                        0, 0, mode, tmps[i], false);
+                        0, 0, mode, tmps[i], false, false);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2997,7 +2997,7 @@ copy_blkmode_from_reg (rtx target, rtx srcreg, tree type)
                                          xbitpos % BITS_PER_WORD, 1,
                                          NULL_RTX, copy_mode, copy_mode,
                                          false, NULL),
-                      false);
+                      false, false);
     }
 }
 
@@ -3099,7 +3099,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
                                          bitpos % BITS_PER_WORD, 1,
                                          NULL_RTX, word_mode, word_mode,
                                          false, NULL),
-                      false);
+                      false, false);
     }
 
   if (mode == BLKmode)
@@ -3267,8 +3267,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
          zero = CONST0_RTX (GET_MODE_INNER (mode));
          if (zero != NULL)
            {
-             write_complex_part (object, zero, 0);
-             write_complex_part (object, zero, 1);
+             write_complex_part (object, zero, 0, true);
+             write_complex_part (object, zero, 1, false);
              return NULL;
            }
        }
@@ -3429,10 +3429,11 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
 
 

 /* Write to one of the components of the complex value CPLX.  Write VAL to
-   the real part if IMAG_P is false, and the imaginary part if its true.  */
+   the real part if IMAG_P is false, and the imaginary part if its true.
+   If UNDEFINED_P then the value in CPLX is currently undefined.  */
 
 void
-write_complex_part (rtx cplx, rtx val, bool imag_p)
+write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
 {
   machine_mode cmode;
   scalar_mode imode;
@@ -3487,7 +3488,7 @@ write_complex_part (rtx cplx, rtx val, bool imag_p)
     }
 
   store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
-                  false);
+                  false, undefined_p);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3740,8 +3741,8 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false);
-  write_complex_part (x, read_complex_part (y, true), true);
+  write_complex_part (x, read_complex_part (y, false), false, true);
+  write_complex_part (x, read_complex_part (y, true), true, false);
 
   return get_last_insn ();
 }
@@ -5385,7 +5386,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
        }
       else
        store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
-                        false);
+                        false, false);
       return;
     }
 
@@ -5607,8 +5608,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
            concat_store_slow:;
              rtx temp = assign_stack_temp (GET_MODE (to_rtx),
                                            GET_MODE_SIZE (GET_MODE (to_rtx)));
-             write_complex_part (temp, XEXP (to_rtx, 0), false);
-             write_complex_part (temp, XEXP (to_rtx, 1), true);
+             write_complex_part (temp, XEXP (to_rtx, 0), false, true);
+             write_complex_part (temp, XEXP (to_rtx, 1), true, false);
              result = store_field (temp, bitsize, bitpos,
                                    bitregion_start, bitregion_end,
                                    mode1, from, get_alias_set (to),
@@ -6166,7 +6167,8 @@ store_expr (tree exp, rtx target, int call_param_p,
                store_bit_field (target,
                                 rtx_to_poly_int64 (expr_size (exp))
                                 * BITS_PER_UNIT,
-                                0, 0, 0, GET_MODE (temp), temp, reverse);
+                                0, 0, 0, GET_MODE (temp), temp, reverse,
+                                false);
            }
          else
            convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
@@ -7556,7 +7558,7 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
       gcc_checking_assert (known_ge (bitpos, 0));
       store_bit_field (target, bitsize, bitpos,
                       bitregion_start, bitregion_end,
-                      mode, temp, reverse);
+                      mode, temp, reverse, false);
 
       return const0_rtx;
     }
@@ -10012,8 +10014,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
              complex_expr_swap_order:
                /* Move the imaginary (op1) and real (op0) parts to their
                   location.  */
-               write_complex_part (target, op1, true);
-               write_complex_part (target, op0, false);
+               write_complex_part (target, op1, true, true);
+               write_complex_part (target, op0, false, false);
 
                return target;
              }
@@ -10042,8 +10044,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
          }
 
       /* Move the real (op0) and imaginary (op1) parts to their location.  */
-      write_complex_part (target, op0, false);
-      write_complex_part (target, op1, true);
+      write_complex_part (target, op0, false, true);
+      write_complex_part (target, op1, true, false);
 
       return target;
 
@@ -10282,7 +10284,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
        rtx dst = gen_reg_rtx (mode);
        emit_move_insn (dst, op0);
        store_bit_field (dst, bitsize, bitpos, 0, 0,
-                        TYPE_MODE (TREE_TYPE (treeop1)), op1, false);
+                        TYPE_MODE (TREE_TYPE (treeop1)), op1, false, false);
        return dst;
       }
 
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 22960a67f893316563e46658fab8f06cf9cbeb80..149403586f4e220d27759a3c36cf5806d03f2040 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -999,7 +999,8 @@ noce_emit_move_insn (rtx x, rtx y)
                }
 
              gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-             store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false);
+             store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false,
+                              false);
              return;
            }
 
@@ -1056,7 +1057,7 @@ noce_emit_move_insn (rtx x, rtx y)
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
   store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
-                  0, 0, outmode, y, false);
+                  0, 0, outmode, y, false, false);
 }
 
 /* Return the CC reg if it is used in COND.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8b1733e20c4455e4e8c383c92fe859f4256cae69..8baf52f01516a0e8294176af68e69ec67e43278f 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -735,9 +735,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
 {
   if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
       && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
-    write_complex_part (target, constm1_rtx, true);
+    write_complex_part (target, constm1_rtx, true, false);
   else
-    write_complex_part (target, const1_rtx, true);
+    write_complex_part (target, const1_rtx, true, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into the __real__ part
@@ -792,7 +792,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
       expand_arith_set_overflow (lhs, target);
       emit_label (done_label);
     }
-  write_complex_part (target, lres, false);
+  write_complex_part (target, lres, false, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into TARGET.  */
@@ -837,7 +837,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-       write_complex_part (target, const0_rtx, true);
+       write_complex_part (target, const0_rtx, true, false);
     }
 
   /* We assume both operands and result have the same precision
@@ -1282,7 +1282,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-       write_complex_part (target, const0_rtx, true);
+       write_complex_part (target, const0_rtx, true, false);
     }
 
   enum insn_code icode = optab_handler (negv3_optab, mode);
@@ -1407,7 +1407,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-       write_complex_part (target, const0_rtx, true);
+       write_complex_part (target, const0_rtx, true, false);
     }
 
   if (is_ubsan)
@@ -2224,7 +2224,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
       do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
                               all_done_label, profile_probability::very_unlikely ());
       emit_label (set_noovf);
-      write_complex_part (target, const0_rtx, true);
+      write_complex_part (target, const0_rtx, true, false);
       emit_label (all_done_label);
     }
 
@@ -2493,7 +2493,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
        {
          /* The infinity precision result will always fit into result.  */
          rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-         write_complex_part (target, const0_rtx, true);
+         write_complex_part (target, const0_rtx, true, false);
          scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
          struct separate_ops ops;
          ops.code = code;
diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
new file mode 100644
index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** _Z1fii:
+** ...
+**     bfi     x0, x1, 32, 32
+**     ret
+** ...
+*/
+_Complex int f(int a, int b) {
+    _Complex int t = a + b * 1i;
+    return t;
+}
+
+/*
+** _Z2f2ii:
+** ...
+**     bfi     x0, x1, 32, 32
+**     ret
+** ...
+*/
+_Complex int f2(int a, int b) {
+    _Complex int t = {a, b};
+    return t;
+}
+
+/* 
+** _Z12f_convolutedii:
+** ...
+**     bfi     x0, x1, 32, 32
+**     ret
+** ...
+*/
+_Complex int f_convoluted(int a, int b) {
+    _Complex int t = (_Complex int)a;
+    __imag__ t = b;
+    return t;
+}

Attachment: rb15778.patch
Description: rb15778.patch
