[PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB

2024-06-17 Thread pan2 . li
From: Pan Li 

We missed one match pattern for the unsigned scalar .SAT_SUB, namely
form 11.

Form 11:
  #define SAT_SUB_U_11(T) \
  T sat_sub_u_11_##T (T x, T y) \
  { \
    T ret; \
    bool overflow = __builtin_sub_overflow (x, y, &ret); \
    return overflow ? 0 : ret; \
  }

Thus, add form 11 above to the match pattern gimple_unsigned_integer_sat_sub.
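
For illustration only, form 11 can be instantiated and checked outside the compiler. The sketch below is not part of the patch: the uint8_t and uint32_t instantiations and the reference helper sat_sub_ref are assumptions chosen for the example, and it relies on the GCC/Clang builtin __builtin_sub_overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 11 as quoted in the commit message.  */
#define SAT_SUB_U_11(T) \
  T sat_sub_u_11_##T (T x, T y) \
  { \
    T ret; \
    bool overflow = __builtin_sub_overflow (x, y, &ret); \
    return overflow ? 0 : ret; \
  }

/* Illustrative instantiations (not taken from the patch).  */
SAT_SUB_U_11 (uint8_t)
SAT_SUB_U_11 (uint32_t)

/* Hypothetical reference: plain branchy saturating subtraction,
   used only to cross-check the form 11 result.  */
static uint32_t
sat_sub_ref (uint32_t x, uint32_t y)
{
  return x > y ? x - y : 0;
}
```

The "ne" branch added by the patch corresponds to the `overflow ? 0 : ret` shape here, whereas the existing "eq" case would correspond to a function written as `!overflow ? ret : 0`.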

The following test suites pass with this patch:
1. The rv64gcv full regression test with newlib.
2. The rv64gcv build with glibc.
3. The x86 bootstrap test.
4. The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add form 11 match pattern for .SAT_SUB.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 99968d316ed..5c330a43ed0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3186,13 +3186,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1))))
 
-/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW).  */
+/* Unsigned saturation sub, case 7 (branch eq with .SUB_OVERFLOW).  */
 (match (unsigned_integer_sat_sub @0 @1)
  (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
   (realpart @2) integer_zerop)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1))))
 
+/* Unsigned saturation sub, case 8 (branch ne with .SUB_OVERFLOW).  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
+   integer_zerop (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1))))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
-- 
2.34.1



Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs

2024-06-17 Thread Andre Vehreschild
Hi Harald,

thank you very much for the review. Committed as:

gcc-15-1369-gdb75a6657e9

Regarding your question on the coarray tests that are not in the
coarray directory: in most cases these tests exercise only one method of
implementing coarrays, i.e., they test either just -fcoarray=single or
-fcoarray=lib -lcaf_single, which are two different approaches. The tests in
the coarray directory test all available methods of implementing coarrays.
Moving all coarray tests into the coarray directory would make a lot of them
fail, because the behavior of -fcoarray=single and -fcoarray=lib -lcaf_single
differs in some corner cases. That's why the coarray tests in the main
gfortran directory are kept separate.

I do understand why it may be confusing, but I don't see an easy solution. Does
this answer your question?

Thanks again for the review.

Regards,
Andre

On Fri, 14 Jun 2024 21:43:47 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> the patch looks fairly simple and obvious, so OK from my side.
>
> ***
>
> Regarding the testsuite: since you renamed one of the testcases
> gfortran.dg/coarray_alloc_comp_* and moved it to gfortran.dg/coarray/,
> I checked and noticed that there are other similar runtime tests for
> coarrays (while some are compile-time only tests).
>
> Do we plan to "clean" this up and move more/all related runtime
> tests to the coarray/ subdirectory?  What is the general opinion on
> this?
>
> ***
>
> Thanks for the patch!
>
> Harald
>
>
> Am 14.06.24 um 09:22 schrieb Andre Vehreschild:
> > Hi all,
> >
> > I messed up renaming of the coarray_alloc_comp-test. This is fixed in the
> > second version of the patch. Sorry for the inconvenience.
> >
> > Additionally I figured that this patch also fixed PR fortran/103112.
> >
> > Regtests ok on x86_64 Fedora 39. Ok for mainline?
> >
> > Regards,
> > Andre
> >
> > On Tue, 11 Jun 2024 16:12:38 +0200
> > Andre Vehreschild  wrote:
> >
> >> Hi all,
> >>
> > >> the attached patch was already posted in 2020, but slipped my attention.
> > >> It fixes an ICE in the testsuite. The old mail's description is:
> > >>
> > >> attached patch fixes PR96418 where the code in the testsuite when compiled
> > >> with -fcoarray=single led to an ICE. The reason was that the coarray
> > >> object was dereferenced as an array, but it was not an array. Introducing
> > >> the test for the descriptor removes the ICE.
> >>
> >> Regtests ok on x86_64-linux/Fedora 39. Ok for mainline?
> >>
> >> Regards,
> >>Andre
> >> --
> >> Andre Vehreschild * Email: vehre ad gmx dot de
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH Committed][APX ZU] Fix test for target-support check

2024-06-17 Thread Kong, Lingling
Fix the test for APX ZU: add the noinline, noclone and target("apxf")
attributes, and add a target-support check.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.

Committed as an obvious patch.

gcc/testsuite/ChangeLog:

   * gcc.target/i386/apx-zu-1.c: Add attribute for noinline,
   and target apx.
   * gcc.target/i386/apx-zu-2.c: Add target-support check.
---
 gcc/testsuite/gcc.target/i386/apx-zu-1.c | 6 ++
 gcc/testsuite/gcc.target/i386/apx-zu-2.c | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/apx-zu-1.c b/gcc/testsuite/gcc.target/i386/apx-zu-1.c
index 927a87673a7..bc0e7fbb4dd 100644
--- a/gcc/testsuite/gcc.target/i386/apx-zu-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-zu-1.c
@@ -9,26 +9,32 @@
 /* { dg-final { scan-assembler-times "setzue" 1} } */
 /* { dg-final { scan-assembler-times "setzuge" 1} } */
 /* { dg-final { scan-assembler "imulzu"} } */
+
+__attribute__((noinline, noclone, target("apxf")))
 long long foo0 (int a)
 {
   return a == 0 ? 0 : 1;
 }
+__attribute__((noinline, noclone, target("apxf")))
 long foo1 (int a, int b)
 {
   return a > b ? 0 : 1;
 }
+__attribute__((noinline, noclone, target("apxf")))
 int foo2 (int a, int b)
 {
   return a != b ? 0 : 1;
 }
+__attribute__((noinline, noclone, target("apxf")))
 short foo3 (int a, int b)
 {
   return a < b ? 0 : 1;
 }
+__attribute__((noinline, noclone, target("apxf")))
 unsigned long
 f1(unsigned short x)
 {
diff --git a/gcc/testsuite/gcc.target/i386/apx-zu-2.c b/gcc/testsuite/gcc.target/i386/apx-zu-2.c
index 3ee04495d98..7585492bd7c 100644
--- a/gcc/testsuite/gcc.target/i386/apx-zu-2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-zu-2.c
@@ -5,6 +5,9 @@
 int main(void)
 {
+  if (!__builtin_cpu_supports ("apxf"))
+return 0;
+
   if (foo0 (0))
  __builtin_abort ();
   if (foo1 (3, 2))
--
2.31.1



[Patch-2v3, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-06-17 Thread HAO CHEN GUI
Hi,
  This patch creates an insn_and_split pattern which helps the duplicated
constant vector replace the source pseudo of the store insn in the fwprop pass.
Thus the store can be implemented by a single stxvd2x, which eliminates the
unnecessary byte swap insn on P8 LE. The test case shows the optimization.
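
As a rough illustration of why the swap is unnecessary for a duplicated constant, the sketch below models the vec_select permutation (2 3 0 1) from the pattern on a plain four-element array. It is a toy model written under stated assumptions, not rs6000 code: the function names are hypothetical and the vector is modelled as four uint32_t elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same element order as the vec_select parallel in the pattern:
   element i of the result is element swap_perm[i] of the source.  */
static const int swap_perm[4] = { 2, 3, 0, 1 };

/* Toy model of the doubleword swap on a four-element vector.  */
static void
toy_vec_select (uint32_t dst[4], const uint32_t src[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[swap_perm[i]];
}

/* Returns 1 when swapping the vector { e0, e1, e2, e3 } leaves it
   unchanged, 0 otherwise.  */
static int
swap_is_noop (uint32_t e0, uint32_t e1, uint32_t e2, uint32_t e3)
{
  const uint32_t v[4] = { e0, e1, e2, e3 };
  uint32_t swapped[4];
  toy_vec_select (swapped, v);
  return memcmp (swapped, v, sizeof swapped) == 0;
}
```

For a vector whose elements are all equal the permutation cannot change the value, so under this model the constant can be stored directly; for a general vector it can.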

  The patch depends on the first generic patch which uses insn cost in fwprop.

  Compared to the previous version, the main change is to move
"can_create_pseudo_p ()" to the insn condition.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
PR target/113325
* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): New.

gcc/testsuite/
PR target/113325
* gcc.target/powerpc/pr113325.c: New.


patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..d350c92141c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3368,6 +3368,31 @@ (define_insn "*vsx_stxvd2x4_le_"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+   (match_operand:VSX_W 1 "immediate_operand" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (mode)
+   && !TARGET_P9_VECTOR
+   && const_vec_duplicate_p (operands[1])
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (vec_select:VSX_W
+ (match_dup 2)
+ (parallel [(const_int 2) (const_int 3)
+(const_int 0) (const_int 1)])))]
+{
+  /* Here all the constants must be loaded without memory.  */
+  gcc_assert (easy_altivec_constant (operands[1], mode));
+  operands[2] = gen_reg_rtx (mode);
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
 (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..3ca1fcbc9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}


Re: [PATCH-1v4] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-17 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi,
>   This patch replaces rtx_cost with insn_cost in forward propagation.
> In the PR, one constant vector should be propagated and replace a
> pseudo in a store insn if we know it's a duplicated constant vector.
> It reduces the insn cost but not rtx cost. In this case, the cost is
> determined by destination operand (memory or pseudo). Unfortunately,
> rtx cost can't help.
>
>   The test case is added in the second rs6000 specific patch.
>
>   Compared to previous version, the main changes are:
> 1. Invalidate recog_data when the cached INSN is swapped out.
> 2. Pass strict_p according to prop.likely_profitable_p () to
> change_is_worthwhile.
>
> Previous version
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654276.html
>
>
>   The patch causes a regression case on i386 as the pattern cost
> regulation has a bug. Please refer to the patch and discussion here.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> ChangeLog
> fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile
>
> gcc/
>   * fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile
>   to judge if a replacement is worthwhile.
>   * recog.cc (swap_change): Invalidate recog_data when the cached INSN
>   is swapped out.
>   * rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Check if the
>   insn cost of new rtl is unknown and fail the replacement.
>
> patch.diff
> diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
> index de543923b92..4a9f68b66b1 100644
> --- a/gcc/fwprop.cc
> +++ b/gcc/fwprop.cc
> @@ -471,29 +471,18 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
>redo_changes (0);
>  }
>
> -  /* ??? In theory, it should be better to use insn costs rather than
> - set_src_costs here.  That would involve replacing this code with
> - change_is_worthwhile.  */
>bool ok = recog (attempt, use_change);
> -  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
> -if (rtx use_set = single_set (use_rtl))
> -  {
> - bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
> - temporarily_undo_changes (0);
> - auto old_cost = set_src_cost (SET_SRC (use_set),
> -   GET_MODE (SET_DEST (use_set)), speed);
> - redo_changes (0);
> - auto new_cost = set_src_cost (SET_SRC (use_set),
> -   GET_MODE (SET_DEST (use_set)), speed);
> - if (new_cost > old_cost
> - || (new_cost == old_cost && !prop.likely_profitable_p ()))
> -   {
> - if (dump_file)
> -   fprintf (dump_file, "change not profitable"
> -" (cost %d -> cost %d)\n", old_cost, new_cost);
> - ok = false;
> -   }
> -  }
> +  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()
> +  && single_set (use_rtl))

I don't think we should keep the single_set condition after this change.
insn_cost can handle all instructions.

OK for trunk with that removed.

Thanks,
Richard

> +{
> +  bool strict_p = !prop.likely_profitable_p ();
> +  if (!change_is_worthwhile (use_change, strict_p))
> + {
> +   if (dump_file)
> + fprintf (dump_file, "change not profitable");
> +   ok = false;
> + }
> +}
>
>if (!ok)
>  {
> diff --git a/gcc/recog.cc b/gcc/recog.cc
> index a6799e3f5e6..56370e40e01 100644
> --- a/gcc/recog.cc
> +++ b/gcc/recog.cc
> @@ -614,7 +614,11 @@ swap_change (int num)
>else
>  std::swap (*changes[num].loc, changes[num].old);
>if (changes[num].object && !MEM_P (changes[num].object))
> -std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
> +{
> +  std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
> +  if (recog_data.insn == changes[num].object)
> + recog_data.insn = nullptr;
> +}
>  }
>
>  /* Temporarily undo all the changes numbered NUM and up, with a view
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index 11639e81bb7..c5ac4956a19 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -186,6 +186,14 @@ rtl_ssa::changes_are_worthwhile (array_slice *const> changes,
>if (!change->is_deletion ())
>   {
> change->new_cost = insn_cost (change->rtl (), for_speed);
> +   /* If the cost is unknown, replacement is not worthwhile.  */
> +   if (!change->new_cost)
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "Reject replacement due to unknown insn cost.\n");
> +   return false;
> + }
> new_cost += change->new_cost;
> if (for_speed)
>   weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)


Re: [PATCH V4] Add warning options -W[no-]compare-distinct-pointer-types

2024-06-17 Thread Jose E. Marchesi


> On Thu, 17 Aug 2023, Jose E. Marchesi via Gcc-patches wrote:
>> [Changes from V3:
> :
>> LLVM supports an option -W[no-]compare-distinct-pointer-types that can
>> be used in order to enable or disable the emission of such warnings.
>
> It looks like this went in, but alas it is not covered in gcc-14/changes.html?
>
> Was that intentional? If not, would you mind considering adding something?

Will do.


Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 14 Jun 2024, Richard Biener wrote:
>
>> On Fri, 14 Jun 2024, Richard Sandiford wrote:
>> 
>> > Richard Biener  writes:
>> > > On Fri, 14 Jun 2024, Richard Sandiford wrote:
>> > >
>> > >> Richard Biener  writes:
>> > >> > The following retires vcond{,u,eq} optabs by stopping to use them
>> > >> > from the middle-end.  Targets instead (should) implement vcond_mask
>> > >> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
>> > >> > possibly affected targets - those implementing these patterns,
>> > >> > and in particular it lists mips, sparc and ia64 as targets that
>> > >> > most definitely will regress while others might simply remove
>> > >> > their vcond{,u,eq} patterns.
>> > >> >
>> > >> > I'd appreciate testing, I do not expect fallout for x86 or
>> > >> > arm/aarch64.
>> > >> > I know riscv doesn't implement any of the legacy optabs.  But less
>> > >> > maintained vector targets might need adjustments.
>> > >> >
>> > >> > I want to get rid of those optabs for GCC 15.  If I don't hear from
>> > >> > you I will assume your target is fine.
>> > >> 
>> > >> Great!  Thanks for doing this.
>> > >> 
>> > >> Is there a plan for how we should handle vector comparisons that
>> > >> have to be done as the inverse of the negated condition?  Should
>> > >> targets simply not provide vec_cmp for such conditions and leave
>> > >> the target-independent code to deal with the fallout?  (For a
>> > >> standalone comparison, it would invert the result.  For a VEC_COND_EXPR
>> > >> it would swap the true and false values.)
>> > >
>> > > I would expect that the ISEL pass which currently deals with finding
>> > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
>> > > So how do we deal with this right now?  I expect RTL expansion will
>> > > do the inverse trick, no?
>> > 
>> > I think in practice (at least for the targets I've worked on),
>> > the target's vec_cmp handles the inversion itself.  Thus the
>> > main optimisation done by targets' vcond patterns is to avoid
>> > the inversion (and instead swap the true/false values) when the
>> > "opposite" comparison is the native one.
>> 
>> I see.  I suppose whether or not vec_cmp is handled is determined
>> by a FAIL so it's somewhat difficult to determine this at ISEL time.

In principle we could say that the predicates should accept only the
conditions that can be done natively.  Then target-independent code
can apply the usual approaches to generating other conditions
(which tend to be replicated across targets anyway).

> I'll also note that we document vec_cmp{,u,eq} as having all zeros,
> all ones for the result while vcond_mask might only care for the MSB
> (it's documented to work on the result of a pre-computed vector
> comparison).

Not sure how much the docs reflect reality.  At least for SVE,
vec_cmp returns 0/1 results for vector boolean modes. 

But I think for integer comparison results, vec_cmp must produce 0/-1
and vcond only accepts 0/-1.
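
To make the two points above concrete, here is a small scalar model of one vector lane. It is only a sketch under stated assumptions: the function names are hypothetical, one int32_t stands in for one lane, and the select is written as a bitwise blend, which is only correct when the mask is 0 or -1.

```c
#include <assert.h>
#include <stdint.h>

/* Models one lane of an integer vec_cmp for EQ: all ones (-1) when
   the comparison holds, all zeros otherwise.  */
static int32_t
lane_cmp_eq (int32_t a, int32_t b)
{
  return a == b ? -1 : 0;
}

/* Models one lane of a mask select as a bitwise blend.  This is only
   correct for 0/-1 masks, which is why the mask format matters.  */
static int32_t
lane_select (int32_t mask, int32_t if_true, int32_t if_false)
{
  return (if_true & mask) | (if_false & ~mask);
}

/* "a != b ? t : f" when only EQ is native: invert the EQ result.  */
static int32_t
select_ne_by_inverting (int32_t a, int32_t b, int32_t t, int32_t f)
{
  int32_t ne_mask = ~lane_cmp_eq (a, b);
  return lane_select (ne_mask, t, f);
}

/* Same select, but keeping the native EQ mask and swapping the
   true and false values instead of inverting.  */
static int32_t
select_ne_by_swapping (int32_t a, int32_t b, int32_t t, int32_t f)
{
  return lane_select (lane_cmp_eq (a, b), f, t);
}
```

Both helpers compute the same value; the second avoids the extra inversion, which is the optimisation attributed earlier in the thread to targets' vcond patterns.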

> So this eventually asks for targets to work out the optimal sequence
> via combine helpers and thus eventually splitters to fixup invalid
> compare operators late?

I really hope we can do this in late gimple & expand.

Thanks,
Richard


[PATCH 0/8] Follow-on force_subreg patches

2024-06-17 Thread Richard Sandiford
This series expands on the fix for PR115464 by using force_subreg
in more places.  It also adds some convenience wrappers for lowpart
and highpart subregs.

A part of this will need to be backported after a grace period,
but I'll post the cherry-picked parts separately.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard Sandiford (8):
  Make force_subreg emit nothing on failure
  aarch64: Use force_subreg in more places
  Make more use of force_subreg
  Add force_lowpart_subreg
  aarch64: Add some uses of force_lowpart_subreg
  Make more use of force_lowpart_subreg
  Add force_highpart_subreg
  aarch64: Add some uses of force_highpart_subreg

 gcc/builtins.cc   | 22 +++---
 gcc/config/aarch64/aarch64-builtins.cc| 15 +++
 gcc/config/aarch64/aarch64-simd.md|  4 +-
 .../aarch64/aarch64-sve-builtins-base.cc  | 10 ++---
 .../aarch64/aarch64-sve-builtins-functions.h  |  6 +--
 .../aarch64/aarch64-sve-builtins-sme.cc   |  2 +-
 gcc/config/aarch64/aarch64.cc | 31 -
 gcc/explow.cc | 34 +-
 gcc/explow.h  |  2 +
 gcc/expmed.cc | 26 ---
 gcc/expr.cc   | 44 +--
 gcc/optabs.cc | 26 ++-
 .../aarch64/sve/acle/general/pr115464_2.c | 11 +
 13 files changed, 111 insertions(+), 122 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c

-- 
2.25.1



[PATCH 1/8] Make force_subreg emit nothing on failure

2024-06-17 Thread Richard Sandiford
While adding more uses of force_subreg, I realised that it should
be more careful to emit no instructions on failure.  This kind of
failure should be very rare, so I don't think it's a case worth
optimising for.
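
The idea can be sketched with a toy instruction log: remember where the stream ended, emit speculatively, and roll back if the later step fails. Everything in this sketch is hypothetical (the log, the toy_ helpers and the step_ok flag); it only mirrors the shape of the get_last_insn / delete_insns_since pairing used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the insn stream: a fixed-size log.  */
#define MAX_INSNS 16
static int insn_log[MAX_INSNS];
static size_t insn_count;

/* Returns a marker for the current end of the stream.  */
static size_t
toy_get_last_insn (void)
{
  return insn_count;
}

/* Appends one "instruction" to the stream.  */
static void
toy_emit (int insn)
{
  if (insn_count < MAX_INSNS)
    insn_log[insn_count++] = insn;
}

/* Discards everything emitted after the marker START.  */
static void
toy_delete_insns_since (size_t start)
{
  insn_count = start;
}

/* Emits a copy, then attempts a second step that succeeds only when
   STEP_OK is nonzero.  Returns 1 on success.  On failure the stream
   is restored, so the caller sees no leftover instructions.  */
static int
toy_force_op (int value, int step_ok)
{
  size_t start = toy_get_last_insn ();
  toy_emit (value);
  if (!step_ok)
    {
      toy_delete_insns_since (start);
      return 0;
    }
  return 1;
}
```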

gcc/
* explow.cc (force_subreg): Emit no instructions on failure.
---
 gcc/explow.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index f6843398c4b..bd93c878064 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -756,8 +756,12 @@ force_subreg (machine_mode outermode, rtx op,
   if (x)
 return x;
 
+  auto *start = get_last_insn ();
   op = copy_to_mode_reg (innermode, op);
-  return simplify_gen_subreg (outermode, op, innermode, byte);
+  rtx res = simplify_gen_subreg (outermode, op, innermode, byte);
+  if (!res)
+delete_insns_since (start);
+  return res;
 }
 
 /* If X is a memory ref, copy its contents to a new temp reg and return
-- 
2.25.1



[PATCH 2/8] aarch64: Use force_subreg in more places

2024-06-17 Thread Richard Sandiford
This patch makes the aarch64 code use force_subreg instead of
simplify_gen_subreg in more places.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
can be created).

(2) The value is obviously an rvalue rather than an lvalue.

(3) The offset wasn't a simple lowpart or highpart calculation;
a later patch will deal with those.

gcc/
* config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin):
Use force_subreg instead of simplify_gen_subreg.
* config/aarch64/aarch64-simd.md (ctz2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc
(svget_impl::expand): Likewise.
(svget_neonq_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-functions.h
(multireg_permute::expand): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc  | 4 ++--
 gcc/config/aarch64/aarch64-simd.md  | 4 ++--
 gcc/config/aarch64/aarch64-sve-builtins-base.cc | 8 +++-
 gcc/config/aarch64/aarch64-sve-builtins-functions.h | 6 +++---
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index d589e59defc..7d827cbc2ac 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2592,12 +2592,12 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   rtx temp2 = gen_reg_rtx (DImode);
   temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
   subreg_lowpart_offset (d->mode, quadmode));
-  temp1 = simplify_gen_subreg (V2DImode, temp1, d->mode, 0);
+  temp1 = force_subreg (V2DImode, temp1, d->mode, 0);
   if (BYTES_BIG_ENDIAN)
emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
   else
emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const1_rtx));
-  op2 = simplify_gen_subreg (d->mode, temp2, GET_MODE (temp2), 0);
+  op2 = force_subreg (d->mode, temp2, GET_MODE (temp2), 0);
 
   /* And recalculate the index.  */
   lane -= nunits / 4;
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0bb39091a38..01b084d8ccb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -389,8 +389,8 @@ (define_expand "ctz2"
   "TARGET_SIMD"
   {
  emit_insn (gen_bswap2 (operands[0], operands[1]));
- rtx op0_castsi2qi = simplify_gen_subreg(mode, operands[0],
-mode, 0);
+ rtx op0_castsi2qi = force_subreg (mode, operands[0],
+  mode, 0);
  emit_insn (gen_aarch64_rbit (op0_castsi2qi, op0_castsi2qi));
  emit_insn (gen_clz2 (operands[0], operands[0]));
  DONE;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 823d60040f9..99932037124 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1121,9 +1121,8 @@ public:
   expand (function_expander &e) const override
   {
 /* Fold the access into a subreg rvalue.  */
-return simplify_gen_subreg (e.vector_mode (0), e.args[0],
-   GET_MODE (e.args[0]),
-   INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);
+return force_subreg (e.vector_mode (0), e.args[0], GET_MODE (e.args[0]),
+INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);
   }
 };
 
@@ -1157,8 +1156,7 @@ public:
e.add_fixed_operand (indices);
return e.generate_insn (icode);
   }
-return simplify_gen_subreg (e.result_mode (), e.args[0],
-   GET_MODE (e.args[0]), 0);
+return force_subreg (e.result_mode (), e.args[0], GET_MODE (e.args[0]), 0);
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 3b8e575e98e..7d06a57ff83 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -639,9 +639,9 @@ public:
   {
machine_mode elt_mode = e.vector_mode (0);
rtx arg = e.args[0];
-   e.args[0] = simplify_gen_subreg (elt_mode, arg, GET_MODE (arg), 0);
-   e.args.safe_push (simplify_gen_subreg (elt_mode, arg, GET_MODE (arg),
-  GET_MODE_SIZE (elt_mode)));
+   e.args[0] = force_subreg (elt_mode, arg, GET_MODE (arg), 0);
+   e.args.safe_push (force_subreg (elt_mode, arg, GET_MODE (arg),
+   GET_MODE_SIZE (elt_mode)));
   }
 return e.use_exact_insn (icode);
   }
-- 
2.25.1



[PATCH 3/8] Make more use of force_subreg

2024-06-17 Thread Richard Sandiford
This patch makes target-independent code use force_subreg instead
of simplify_gen_subreg in some places.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
can be created), or at least would be invalid to call when
!can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

(3) The offset wasn't a simple lowpart or highpart calculation;
a later patch will deal with those.

Doing this should reduce the likelihood of bugs like PR115464
occurring in other situations.

gcc/
* expmed.cc (store_bit_field_using_insv): Use force_subreg
instead of simplify_gen_subreg.
(store_bit_field_1): Likewise.
(extract_bit_field_as_subreg): Likewise.
(extract_integral_bit_field): Likewise.
(emit_store_flag_1): Likewise.
* expr.cc (convert_move): Likewise.
(convert_modes): Likewise.
(emit_group_load_1): Likewise.
(emit_group_store): Likewise.
(expand_assignment): Likewise.
---
 gcc/expmed.cc | 22 --
 gcc/expr.cc   | 27 ---
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 9ba01695f53..1f68e7be721 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -695,13 +695,7 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
 if we must narrow it, be sure we do it correctly.  */
 
  if (GET_MODE_SIZE (value_mode) < GET_MODE_SIZE (op_mode))
-   {
- tmp = simplify_subreg (op_mode, value1, value_mode, 0);
- if (! tmp)
-   tmp = simplify_gen_subreg (op_mode,
-  force_reg (value_mode, value1),
-  value_mode, 0);
-   }
+   tmp = force_subreg (op_mode, value1, value_mode, 0);
  else
{
  if (targetm.mode_rep_extended (op_mode, value_mode) != UNKNOWN)
@@ -806,7 +800,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   if (known_eq (bitnum, 0U)
  && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
{
- sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0);
+ sub = force_subreg (GET_MODE (op0), value, fieldmode, 0);
  if (sub)
{
  if (reverse)
@@ -1633,7 +1627,7 @@ extract_bit_field_as_subreg (machine_mode mode, rtx op0,
   && known_eq (bitsize, GET_MODE_BITSIZE (mode))
   && lowpart_bit_field_p (bitnum, bitsize, op0_mode)
   && TRULY_NOOP_TRUNCATION_MODES_P (mode, op0_mode))
-return simplify_gen_subreg (mode, op0, op0_mode, bytenum);
+return force_subreg (mode, op0, op0_mode, bytenum);
   return NULL_RTX;
 }
 
@@ -2000,11 +1994,11 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
  return convert_extracted_bit_field (target, mode, tmode, unsignedp);
}
   /* If OP0 is a hard register, copy it to a pseudo before calling
-simplify_gen_subreg.  */
+force_subreg.  */
   if (REG_P (op0) && HARD_REGISTER_P (op0))
op0 = copy_to_reg (op0);
-  op0 = simplify_gen_subreg (word_mode, op0, op0_mode.require (),
-bitnum / BITS_PER_WORD * UNITS_PER_WORD);
+  op0 = force_subreg (word_mode, op0, op0_mode.require (),
+ bitnum / BITS_PER_WORD * UNITS_PER_WORD);
   op0_mode = word_mode;
   bitnum %= BITS_PER_WORD;
 }
@@ -5774,8 +5768,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
 
  /* Do a logical OR or AND of the two words and compare the
 result.  */
- op00 = simplify_gen_subreg (word_mode, op0, int_mode, 0);
- op01 = simplify_gen_subreg (word_mode, op0, int_mode, UNITS_PER_WORD);
+ op00 = force_subreg (word_mode, op0, int_mode, 0);
+ op01 = force_subreg (word_mode, op0, int_mode, UNITS_PER_WORD);
  tem = expand_binop (word_mode,
  op1 == const0_rtx ? ior_optab : and_optab,
  op00, op01, NULL_RTX, unsignedp,
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9cecc1758f5..31a7346e33f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -301,7 +301,7 @@ convert_move (rtx to, rtx from, int unsignedp)
GET_MODE_BITSIZE (to_mode)));
 
   if (VECTOR_MODE_P (to_mode))
-   from = simplify_gen_subreg (to_mode, from, GET_MODE (from), 0);
+   from = force_subreg (to_mode, from, GET_MODE (from), 0);
   else
to = simplify_gen_subreg (from_mode, to, GET_MODE (to), 0);
 
@@ -935,7 +935,7 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
 {
   gcc_assert (known_eq (GET_MODE_BITSIZE (mode),
GET_MODE_BITSIZE (oldmode)));
-  return simplify_gen_subreg (mode, x, oldmode, 0);
+  re

[PATCH 5/8] aarch64: Add some uses of force_lowpart_subreg

2024-06-17 Thread Richard Sandiford
This patch makes more use of force_lowpart_subreg, similarly
to the recent patch for force_subreg.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
can be created).

(2) The value is obviously an rvalue rather than an lvalue.

gcc/
PR target/115464
* config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin)
(aarch64_expand_rwsr_builtin): Use force_lowpart_subreg instead of
simplify_gen_subreg and lowpart_subreg.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc
(add_load_store_slice_operand): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_reinterpret): Likewise.
(aarch64_addti_scratch_regs, aarch64_subvti_scratch_regs): Likewise.

gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464_2.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.cc | 11 +--
 gcc/config/aarch64/aarch64-sve-builtins-base.cc|  2 +-
 gcc/config/aarch64/aarch64-sve-builtins-sme.cc |  2 +-
 gcc/config/aarch64/aarch64.cc  | 14 +-
 .../aarch64/sve/acle/general/pr115464_2.c  | 11 +++
 5 files changed, 23 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 7d827cbc2ac..30669f8aa18 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2579,8 +2579,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   int lane = INTVAL (lane_idx);
 
   if (lane < nunits / 4)
-op2 = simplify_gen_subreg (d->mode, op2, quadmode,
-  subreg_lowpart_offset (d->mode, quadmode));
+op2 = force_lowpart_subreg (d->mode, op2, quadmode);
   else
 {
   /* Select the upper 64 bits, either a V2SF or V4HF, this however
@@ -2590,8 +2589,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
 gen_highpart_mode generates code that isn't optimal.  */
   rtx temp1 = gen_reg_rtx (d->mode);
   rtx temp2 = gen_reg_rtx (DImode);
-  temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
-  subreg_lowpart_offset (d->mode, quadmode));
+  temp1 = force_lowpart_subreg (d->mode, op2, quadmode);
   temp1 = force_subreg (V2DImode, temp1, d->mode, 0);
   if (BYTES_BIG_ENDIAN)
emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
@@ -2836,7 +2834,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode)
case AARCH64_WSR64:
case AARCH64_WSRF64:
case AARCH64_WSR128:
- subreg = lowpart_subreg (sysreg_mode, input_val, mode);
+ subreg = force_lowpart_subreg (sysreg_mode, input_val, mode);
  break;
case AARCH64_WSRF:
  subreg = gen_lowpart_SUBREG (SImode, input_val);
@@ -2871,7 +2869,8 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode)
 case AARCH64_RSR64:
 case AARCH64_RSRF64:
 case AARCH64_RSR128:
-  return lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)), target, sysreg_mode);
+  return force_lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)),
+  target, sysreg_mode);
 case AARCH64_RSRF:
   subreg = gen_lowpart_SUBREG (SImode, target);
   return gen_lowpart_SUBREG (SFmode, subreg);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 99932037124..aa26370d397 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1183,7 +1183,7 @@ public:
 if (BYTES_BIG_ENDIAN)
   return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
 insn_code icode = code_for_vcond_mask (mode, mode);
-e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
+e.args[1] = force_lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
 e.add_output_operand (icode);
 e.add_input_operand (icode, e.args[1]);
 e.add_input_operand (icode, e.args[0]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
index f4c91bcbb95..b66b35ae60b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
@@ -112,7 +112,7 @@ add_load_store_slice_operand (function_expander &e, 
insn_code icode,
   rtx base = e.args[argno];
   if (e.mode_suffix_id == MODE_vnum)
 {
-  rtx vnum = lowpart_subreg (SImode, e.args[vnum_argno], DImode);
+  rtx vnum = force_lowpart_subreg (SImode, e.args[vnum_argno], DImode);
   base = simplify_gen_binary (PLUS, SImode, base, vnum);
 }
   e.add_input_operand (icode, base);
diff --git a/gcc/config/aar

[PATCH 4/8] Add force_lowpart_subreg

2024-06-17 Thread Richard Sandiford
optabs had a local function called lowpart_subreg_maybe_copy
that is very similar to the lowpart version of force_subreg.
This patch adds a force_lowpart_subreg wrapper around
force_subreg and uses it in optabs.cc.

The only difference between the old and new functions is that
the old one asserted success while the new one doesn't.
It's common not to assert elsewhere when taking subregs;
normally a null result is enough.

Later patches will make more use of the new function.

gcc/
* explow.h (force_lowpart_subreg): Declare.
* explow.cc (force_lowpart_subreg): New function.
* optabs.cc (lowpart_subreg_maybe_copy): Delete.
(expand_absneg_bit): Use force_lowpart_subreg instead of
lowpart_subreg_maybe_copy.
(expand_copysign_bit): Likewise.
---
 gcc/explow.cc | 14 ++
 gcc/explow.h  |  1 +
 gcc/optabs.cc | 24 ++--
 3 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index bd93c878064..2a91cf76ea6 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -764,6 +764,20 @@ force_subreg (machine_mode outermode, rtx op,
   return res;
 }
 
+/* Try to return an rvalue expression for the OUTERMODE lowpart of OP,
+   which has mode INNERMODE.  Allow OP to be forced into a new register
+   if necessary.
+
+   Return null on failure.  */
+
+rtx
+force_lowpart_subreg (machine_mode outermode, rtx op,
+ machine_mode innermode)
+{
+  auto byte = subreg_lowpart_offset (outermode, innermode);
+  return force_subreg (outermode, op, innermode, byte);
+}
+
 /* If X is a memory ref, copy its contents to a new temp reg and return
that reg.  Otherwise, return X.  */
 
diff --git a/gcc/explow.h b/gcc/explow.h
index cbd1fcb7eb3..dd654649b06 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -43,6 +43,7 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode);
 extern rtx force_reg (machine_mode, rtx);
 
 extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode);
 
 /* Return given rtx, copied into a new temp reg if it was in memory.  */
 extern rtx force_not_mem (rtx);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c54d275b8b7..d569742beea 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -3096,26 +3096,6 @@ expand_ffs (scalar_int_mode mode, rtx op0, rtx target)
   return 0;
 }
 
-/* Extract the OMODE lowpart from VAL, which has IMODE.  Under certain
-   conditions, VAL may already be a SUBREG against which we cannot generate
-   a further SUBREG.  In this case, we expect forcing the value into a
-   register will work around the situation.  */
-
-static rtx
-lowpart_subreg_maybe_copy (machine_mode omode, rtx val,
-  machine_mode imode)
-{
-  rtx ret;
-  ret = lowpart_subreg (omode, val, imode);
-  if (ret == NULL)
-{
-  val = force_reg (imode, val);
-  ret = lowpart_subreg (omode, val, imode);
-  gcc_assert (ret != NULL);
-}
-  return ret;
-}
-
 /* Expand a floating point absolute value or negation operation via a
logical operation on the sign bit.  */
 
@@ -3204,7 +3184,7 @@ expand_absneg_bit (enum rtx_code code, scalar_float_mode 
mode,
   gen_lowpart (imode, op0),
   immed_wide_int_const (mask, imode),
   gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
-  target = lowpart_subreg_maybe_copy (mode, temp, imode);
+  target = force_lowpart_subreg (mode, temp, imode);
 
   set_dst_reg_note (get_last_insn (), REG_EQUAL,
gen_rtx_fmt_e (code, mode, copy_rtx (op0)),
@@ -4043,7 +4023,7 @@ expand_copysign_bit (scalar_float_mode mode, rtx op0, rtx 
op1, rtx target,
 
   temp = expand_binop (imode, ior_optab, op0, op1,
   gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
-  target = lowpart_subreg_maybe_copy (mode, temp, imode);
+  target = force_lowpart_subreg (mode, temp, imode);
 }
 
   return target;
-- 
2.25.1



[PATCH 6/8] Make more use of force_lowpart_subreg

2024-06-17 Thread Richard Sandiford
This patch makes target-independent code use force_lowpart_subreg
instead of simplify_gen_subreg and lowpart_subreg in some places.
The criteria were:

(1) The code is obviously specific to expand (where new pseudos
can be created), or at least would be invalid to call when
!can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

Doing this should reduce the likelihood of bugs like PR115464
occurring in other situations.

gcc/
* builtins.cc (expand_builtin_issignaling): Use force_lowpart_subreg
instead of simplify_gen_subreg and lowpart_subreg.
* expr.cc (convert_mode_scalar, expand_expr_real_2): Likewise.
* optabs.cc (expand_doubleword_mod): Likewise.
---
 gcc/builtins.cc |  7 ++-
 gcc/expr.cc | 17 +
 gcc/optabs.cc   |  2 +-
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 5b5307c67b8..bde517b639e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2940,8 +2940,7 @@ expand_builtin_issignaling (tree exp, rtx target)
  {
hi = simplify_gen_subreg (imode, temp, fmode,
  subreg_highpart_offset (imode, fmode));
-   lo = simplify_gen_subreg (imode, temp, fmode,
- subreg_lowpart_offset (imode, fmode));
+   lo = force_lowpart_subreg (imode, temp, fmode);
if (!hi || !lo)
  {
scalar_int_mode imode2;
@@ -2951,9 +2950,7 @@ expand_builtin_issignaling (tree exp, rtx target)
hi = simplify_gen_subreg (imode, temp2, imode2,
  subreg_highpart_offset (imode,
  imode2));
-   lo = simplify_gen_subreg (imode, temp2, imode2,
- subreg_lowpart_offset (imode,
-imode2));
+   lo = force_lowpart_subreg (imode, temp2, imode2);
  }
  }
if (!hi || !lo)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 31a7346e33f..ffbac513692 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -423,7 +423,8 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
0).exists (&toi_mode))
{
  start_sequence ();
- rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+ rtx fromi = force_lowpart_subreg (fromi_mode, from,
+   from_mode);
  rtx tof = NULL_RTX;
  if (fromi)
{
@@ -443,7 +444,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
  NULL_RTX, 1);
  if (toi)
{
- tof = lowpart_subreg (to_mode, toi, toi_mode);
+ tof = force_lowpart_subreg (to_mode, toi, toi_mode);
  if (tof)
emit_move_insn (to, tof);
}
@@ -475,7 +476,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
0).exists (&toi_mode))
{
  start_sequence ();
- rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+ rtx fromi = force_lowpart_subreg (fromi_mode, from, from_mode);
  rtx tof = NULL_RTX;
  do
{
@@ -510,11 +511,11 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
  temp4, shift, NULL_RTX, 1);
  if (!temp5)
break;
- rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+ rtx temp6 = force_lowpart_subreg (toi_mode, temp5,
+   fromi_mode);
  if (!temp6)
break;
- tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
-   toi_mode);
+ tof = force_lowpart_subreg (to_mode, temp6, toi_mode);
  if (tof)
emit_move_insn (to, tof);
}
@@ -9784,9 +9785,9 @@ expand_expr_real_2 (const_sepops ops, rtx target, 
machine_mode tmode,
inner_mode = TYPE_MODE (inner_type);
 
  if (modifier == EXPAND_INITIALIZER)
-   op0 = lowpart_subreg (mode, op0, inner_mode);
+   op0 = force_lowpart_subreg (mode, op0, inner_mode);
  else
-   op0=  convert_modes (mode, inner_mode, op0,
+   op0 = convert_modes (mode, inner_mode, op0,
 TYPE_UNSIGNED (inner_type));
}
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index d569742beea..1

[PATCH 8/8] aarch64: Add some uses of force_highpart_subreg

2024-06-17 Thread Richard Sandiford
This patch adds uses of force_highpart_subreg to places that
already use force_lowpart_subreg.

gcc/
* config/aarch64/aarch64.cc (aarch64_addti_scratch_regs): Use
force_highpart_subreg instead of gen_highpart and simplify_gen_subreg.
(aarch64_subvti_scratch_regs): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index c952a7cdefe..026f8627a89 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26873,19 +26873,12 @@ aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx 
*low_dest,
   *low_in1 = force_lowpart_subreg (DImode, op1, TImode);
   *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
   *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *high_in1 = force_highpart_subreg (DImode, op1, TImode);
+  *high_in2 = force_highpart_subreg (DImode, op2, TImode);
 }
 
 /* Generate DImode scratch registers for 128-bit (TImode) subtraction.
 
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
OP1 represents the TImode destination operand 1
OP2 represents the TImode destination operand 2
LOW_DEST represents the low half (DImode) of TImode operand 0
@@ -26907,10 +26900,8 @@ aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx 
*low_dest,
   *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
   *high_dest = gen_reg_rtx (DImode);
 
-  *high_in1 = simplify_gen_subreg (DImode, op1, TImode,
-  subreg_highpart_offset (DImode, TImode));
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *high_in1 = force_highpart_subreg (DImode, op1, TImode);
+  *high_in2 = force_highpart_subreg (DImode, op2, TImode);
 }
 
 /* Generate RTL for 128-bit (TImode) subtraction with overflow.
-- 
2.25.1



[PATCH 7/8] Add force_highpart_subreg

2024-06-17 Thread Richard Sandiford
This patch adds a force_highpart_subreg to go along with the
recently added force_lowpart_subreg.

gcc/
* explow.h (force_highpart_subreg): Declare.
* explow.cc (force_highpart_subreg): New function.
* builtins.cc (expand_builtin_issignaling): Use it.
* expmed.cc (emit_store_flag_1): Likewise.
---
 gcc/builtins.cc | 15 ---
 gcc/explow.cc   | 14 ++
 gcc/explow.h|  1 +
 gcc/expmed.cc   |  4 +---
 4 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index bde517b639e..d467d1697b4 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2835,9 +2835,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 it is, working on the DImode high part is usually better.  */
  if (!MEM_P (temp))
{
- if (rtx t = simplify_gen_subreg (imode, temp, fmode,
-  subreg_highpart_offset (imode,
-  fmode)))
+ if (rtx t = force_highpart_subreg (imode, temp, fmode))
hi = t;
  else
{
@@ -2845,9 +2843,7 @@ expand_builtin_issignaling (tree exp, rtx target)
  if (int_mode_for_mode (fmode).exists (&imode2))
{
  rtx temp2 = gen_lowpart (imode2, temp);
- poly_uint64 off = subreg_highpart_offset (imode, imode2);
- if (rtx t = simplify_gen_subreg (imode, temp2,
-  imode2, off))
+ if (rtx t = force_highpart_subreg (imode, temp2, imode2))
hi = t;
}
}
@@ -2938,8 +2934,7 @@ expand_builtin_issignaling (tree exp, rtx target)
   it is, working on DImode parts is usually better.  */
if (!MEM_P (temp))
  {
-   hi = simplify_gen_subreg (imode, temp, fmode,
- subreg_highpart_offset (imode, fmode));
+   hi = force_highpart_subreg (imode, temp, fmode);
lo = force_lowpart_subreg (imode, temp, fmode);
if (!hi || !lo)
  {
@@ -2947,9 +2942,7 @@ expand_builtin_issignaling (tree exp, rtx target)
if (int_mode_for_mode (fmode).exists (&imode2))
  {
rtx temp2 = gen_lowpart (imode2, temp);
-   hi = simplify_gen_subreg (imode, temp2, imode2,
- subreg_highpart_offset (imode,
- imode2));
+   hi = force_highpart_subreg (imode, temp2, imode2);
lo = force_lowpart_subreg (imode, temp2, imode2);
  }
  }
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 2a91cf76ea6..b4a0df89bc3 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -778,6 +778,20 @@ force_lowpart_subreg (machine_mode outermode, rtx op,
   return force_subreg (outermode, op, innermode, byte);
 }
 
+/* Try to return an rvalue expression for the OUTERMODE highpart of OP,
+   which has mode INNERMODE.  Allow OP to be forced into a new register
+   if necessary.
+
+   Return null on failure.  */
+
+rtx
+force_highpart_subreg (machine_mode outermode, rtx op,
+  machine_mode innermode)
+{
+  auto byte = subreg_highpart_offset (outermode, innermode);
+  return force_subreg (outermode, op, innermode, byte);
+}
+
 /* If X is a memory ref, copy its contents to a new temp reg and return
that reg.  Otherwise, return X.  */
 
diff --git a/gcc/explow.h b/gcc/explow.h
index dd654649b06..de89e9e2933 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -44,6 +44,7 @@ extern rtx force_reg (machine_mode, rtx);
 
 extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
 extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode);
+extern rtx force_highpart_subreg (machine_mode, rtx, machine_mode);
 
 /* Return given rtx, copied into a new temp reg if it was in memory.  */
 extern rtx force_not_mem (rtx);
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 1f68e7be721..3b9475f5aa0 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5784,9 +5784,7 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
  rtx op0h;
 
  /* If testing the sign bit, can just test on high word.  */
- op0h = simplify_gen_subreg (word_mode, op0, int_mode,
- subreg_highpart_offset (word_mode,
- int_mode));
+ op0h = force_highpart_subreg (word_mode, op0, int_mode);
  tem = emit_store_flag (NULL_RTX, code, op0h, op1, word_mode,
 unsignedp, normalizep);
}
-- 
2.25.1



Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Andrew Stubbs

On 14/06/2024 11:31, Richard Biener wrote:

The following retires vcond{,u,eq} optabs by stopping to use them
from the middle-end.  Targets instead (should) implement vcond_mask
and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
possibly affected targets - those implementing these patterns,
and in particular it lists mips, sparc and ia64 as targets that
most definitely will regress while others might simply remove
their vcond{,u,eq} patterns.

I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
I know riscv doesn't implement any of the legacy optabs.  But less
maintained vector targets might need adjustments.

I want to get rid of those optabs for GCC 15.  If I don't hear from
you I will assume your target is fine.


Seems OK for GCN.

The GCN vcond patterns are expanded directly to vec_cmp/vcond_mask, so 
the set of supported operations should be identical.


Andrew



Re: [PATCH] libstdc++: Do not use memset _Hashtable buckets allocation

2024-06-17 Thread Jonathan Wakely
On Sat, 15 Jun 2024 at 14:04, François Dumont  wrote:
>
> Here is the simplified patch then.

The use of std::__to_address seems wrong.

The allocator returns a __buckets_ptr, and that function returns a
__buckets_ptr, so it should just be returned unchanged, not by
converting to a raw pointer with __to_address.


>
>  libstdc++: Do not use memset in _Hashtable buckets allocation
>
>  Using memset is incorrect if the __bucket_ptr type is non-trivial, or
>  does not use an all-zero bit pattern for its null value.
>
>  Replace the use of memset with std::__uninitialized_default_n to set the
>  pointers to nullptr. Doing so and corresponding std::_Destroy_n
> when deallocating
>  buckets.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/bits/hashtable_policy.h
>  (_Hashtable_alloc::_M_allocate_buckets): Do not use memset
> to zero
>  out bucket pointers.
>  (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of
> buckets.
>
> Tested under Linux x64, ok to commit ?
>
> François
>
> On 13/06/2024 20:58, Jonathan Wakely wrote:
> > On Thu, 13 Jun 2024 at 19:57, Jonathan Wakely  wrote:
> >> On Thu, 13 Jun 2024 at 18:40, François Dumont  wrote:
> >>> Hi
> >>>
> >>> Following your recent change here:
> >>>
> >>> https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html
> >>>
> >>> I think we also need to fix the memset at bucket allocation level.
> >>>
> >>> I did it trying also to be more fancy pointer friendly by running
> >>> __uninitialized_default_n_a on the allocator returned pointer rather
> >>> than on the __to_address result. I wonder if an __uninitialized_fill_n_a
> >>> would have been better ? Doing so I also had to call std::_Destroy on
> >>> deallocation. Let me know if it is too early.
> >> You don't need the RAII guard. Initializing Alloc::pointer isn't
> >> allowed to throw exceptions:
> >>
> >> "An allocator type X shall meet the Cpp17CopyConstructible
> >> requirements (Table 32). The XX::pointer,
> >> XX::const_pointer, XX::void_pointer, and XX::const_void_pointer types
> >> shall meet the Cpp17Nullable-
> >> Pointer requirements (Table 36). No constructor, comparison operator
> >> function, copy operation, move
> >> operation, or swap operation on these pointer types shall exit via an
> >> exception."
> >>
> >> And you should not pass the allocator to the __uninitialized_xxx call,
> >> nor the _Destroy call. We don't want to use the allocator's
> >> construct/destroy members for those pointers. They are not container
> >> elements.
> >>
> >> I think either uninitialized_fill_n with nullptr or
> >> __uninitialized_default_n is fine. Not the _a forms taking an
> >> allocator though.
> > And I'd use _Destroy_n(_M_buckets, _M_bucket_count)
> >
> >
> >>> I also wonder if the compiler will be able to optimize it to a memset
> >>> call ? I'm interested to work on it if you confirm that it won't.
> >> It will do whatever is fastest, which might be memset or might be
> >> vectorized code to zero it out (which is probably what libc memset
> >> does too).
> >>
> >>> libstdc++: Do not use memset in _Hashtable buckets allocation
> >>>
> >>> Using memset is incorrect if the __bucket_ptr type is non-trivial, or
> >>> does not use an all-zero bit pattern for its null value.
> >>>
> >>> Replace the use of memset with std::__uninitialized_default_n_a to set the
> >>> pointers to nullptr. Doing so and corresponding std::_Destroy when
> >>> deallocating
> >>> buckets.
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>   * include/bits/hashtable_policy.h
> >>>   (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero
> >>>   out bucket pointers.
> >>>   (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets.
> >>>
> >>>
> >>> I hope you won't ask for copy rights on the changelog entry :-)
> >>>
> >>> Tested under Linux x64, ok to commit ?
> >>>
> >>> François



Re: [PATCH] libstdc++: Do not use memset _Hashtable buckets allocation

2024-06-17 Thread Jonathan Wakely
On Mon, 17 Jun 2024 at 11:18, Jonathan Wakely  wrote:
>
> On Sat, 15 Jun 2024 at 14:04, François Dumont  wrote:
> >
> > Here is the simplified patch then.
>
> The use of std::__to_address seems wrong.
>
> The allocator returns a __buckets_ptr, and that function returns a
> __buckets_ptr, so it should just be returned unchanged, not by
> converting to a raw pointer with __to_address.

It was already wrong, but we should fix that now rather than keep it wrong.

Using __to_address to get a pointer to pass to memset was correct. But
the result of the __to_address call was used to initialize another
__buckets_ptr variable, which is what we already had before calling
__to_address.

It would have made sense like this:

  auto __ptr = __buckets_alloc_traits::allocate(__alloc, __bkt_count);
  auto* __p = std::__to_address(__ptr);
  __builtin_memset(__p, 0, __bkt_count * sizeof(__node_base_ptr));
  return __ptr;

i.e. __p should be a raw pointer (not a __buckets_ptr), and then it
should return __ptr not __p. But that isn't what we had.

Anyway, now that we're not using memset, we don't need any raw pointer
at all, so don't need std::__to_address at all.
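
To illustrate why memset is the wrong tool here, below is a minimal
sketch, not the libstdc++ code.  IndexPtr is a hypothetical pointer-like
type invented for this example, whose null value is -1 rather than an
all-zero bit pattern.  The public std::uninitialized_fill_n and
std::destroy_n (C++17) stand in for the internal
__uninitialized_default_n and _Destroy_n helpers discussed above, and
std::allocator stands in for the bucket allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical pointer-like type (an assumption for this sketch only):
// its null state is index -1, so all-zero bytes are NOT a null value.
struct IndexPtr {
  long idx;
  IndexPtr() noexcept : idx(-1) {}
  IndexPtr(std::nullptr_t) noexcept : idx(-1) {}
  explicit operator bool() const noexcept { return idx != -1; }
};

// Allocate N buckets, construct each one as a null pointer, and count
// how many of them test as null.  No raw-pointer conversion is needed.
std::size_t count_null_after_fill(std::size_t n) {
  std::allocator<IndexPtr> alloc;
  IndexPtr* buckets = alloc.allocate(n);
  std::uninitialized_fill_n(buckets, n, IndexPtr(nullptr));
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (!buckets[i])
      ++nulls;
  std::destroy_n(buckets, n);   // matches the construction above
  alloc.deallocate(buckets, n);
  return nulls;
}

// Same as above, but zero the bytes with memset afterwards, the way the
// old code cleared its buckets.
std::size_t count_null_after_memset(std::size_t n) {
  std::allocator<IndexPtr> alloc;
  IndexPtr* buckets = alloc.allocate(n);
  std::uninitialized_value_construct_n(buckets, n);
  std::memset(static_cast<void*>(buckets), 0, n * sizeof(IndexPtr));
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (!buckets[i])
      ++nulls;
  std::destroy_n(buckets, n);
  alloc.deallocate(buckets, n);
  return nulls;
}

// Small usage check: construction yields null buckets, memset does not.
void demo() {
  assert(count_null_after_fill(8) == 8);
  assert(count_null_after_memset(8) == 0);
}
```

With this toy type, every bucket is null after the fill, while none is
null after the memset, because an index of zero is a valid non-null
value for IndexPtr.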




>
>
> >
> >  libstdc++: Do not use memset in _Hashtable buckets allocation
> >
> >  Using memset is incorrect if the __bucket_ptr type is non-trivial, or
> >  does not use an all-zero bit pattern for its null value.
> >
> >  Replace the use of memset with std::__uninitialized_default_n to set the
> >  pointers to nullptr. Doing so and corresponding std::_Destroy_n
> > when deallocating
> >  buckets.
> >
> >  libstdc++-v3/ChangeLog:
> >
> >  * include/bits/hashtable_policy.h
> >  (_Hashtable_alloc::_M_allocate_buckets): Do not use memset
> > to zero
> >  out bucket pointers.
> >  (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of
> > buckets.
> >
> > Tested under Linux x64, ok to commit ?
> >
> > François
> >
> > On 13/06/2024 20:58, Jonathan Wakely wrote:
> > > On Thu, 13 Jun 2024 at 19:57, Jonathan Wakely  wrote:
> > >> On Thu, 13 Jun 2024 at 18:40, François Dumont  
> > >> wrote:
> > >>> Hi
> > >>>
> > >>> Following your recent change here:
> > >>>
> > >>> https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html
> > >>>
> > >>> I think we also need to fix the memset at bucket allocation level.
> > >>>
> > >>> I did it trying also to be more fancy pointer friendly by running
> > >>> __uninitialized_default_n_a on the allocator returned pointer rather
> > >>> than on the __to_address result. I wonder if an __uninitialized_fill_n_a
> > >>> would have been better ? Doing so I also had to call std::_Destroy on
> > >>> deallocation. Let me know if it is too early.
> > >> You don't need the RAII guard. Initializing Alloc::pointer isn't
> > >> allowed to throw exceptions:
> > >>
> > >> "An allocator type X shall meet the Cpp17CopyConstructible
> > >> requirements (Table 32). The XX::pointer,
> > >> XX::const_pointer, XX::void_pointer, and XX::const_void_pointer types
> > >> shall meet the Cpp17Nullable-
> > >> Pointer requirements (Table 36). No constructor, comparison operator
> > >> function, copy operation, move
> > >> operation, or swap operation on these pointer types shall exit via an
> > >> exception."
> > >>
> > >> And you should not pass the allocator to the __uninitialized_xxx call,
> > >> nor the _Destroy call. We don't want to use the allocator's
> > >> construct/destroy members for those pointers. They are not container
> > >> elements.
> > >>
> > >> I think either uninitialized_fill_n with nullptr or
> > >> __uninitialized_default_n is fine. Not the _a forms taking an
> > >> allocator though.
> > > And I'd use _Destroy_n(_M_buckets, _M_bucket_count)
> > >
> > >
> > >>> I also wonder if the compiler will be able to optimize it to a memset
> > >>> call ? I'm interested to work on it if you confirm that it won't.
> > >> It will do whatever is fastest, which might be memset or might be
> > >> vectorized code to zero it out (which is probably what libc memset
> > >> does too).
> > >>
> > >>> libstdc++: Do not use memset in _Hashtable buckets allocation
> > >>>
> > >>> Using memset is incorrect if the __bucket_ptr type is non-trivial, or
> > >>> does not use an all-zero bit pattern for its null value.
> > >>>
> > >>> Replace the use of memset with std::__uninitialized_default_n_a to set 
> > >>> the
> > >>> pointers to nullptr. Doing so and corresponding std::_Destroy when
> > >>> deallocating
> > >>> buckets.
> > >>>
> > >>> libstdc++-v3/ChangeLog:
> > >>>
> > >>>   * include/bits/hashtable_policy.h
> > >>>   (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero
> > >>>   out bucket pointers.
> > >>>   (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets.
> > >>>
> > >>>
> > >>> I hope you won't ask for copy rights on the changelog entry :-)
> > >>>
> > >>> Tested under Linux x64, ok to commit ?
> > >>>
> > >>> François



Re: [PATCH 06/52] m2: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-17 Thread Kewen.Lin
on 2024/6/15 13:00, Gaius Mulley wrote:
> "Kewen.Lin"  writes:
> 
>> Hi Gaius,
>>
  static tree
  build_m2_short_real_node (void)
  {
 -  tree c;
 -
 -  /* Define `REAL'.  */
 -
 -  c = make_node (REAL_TYPE);
 -  TYPE_PRECISION (c) = FLOAT_TYPE_SIZE;
 -  layout_type (c);
 -  return c;
 +  /* Define `SHORTREAL'.  */
 +  layout_type (float_type_node);
>>>
>>> It looks that float_type_node, double_type_node, float128_type_node and
>>> long_double_type_node have been called with layout_type when they are
>>> being initialized in function build_common_tree_nodes, maybe we can just
>>> assert their TYPE_SIZE.
>>
>> I just noticed that latest trunk still has {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> in gcc/m2 and realized that my comment above was misleading, sorry about 
>> that.
>> It meant TYPE_SIZE (float_type_node) etc. instead of 
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
>> as this patch series would like to get rid of 
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE.
>>
>> I adjusted them as below patch, does this look good to you?
> 
> Hi Kewen,
> 
> ah yes indeed, lgtm,

Thanks Gaius!  Pushed as r15-1362-g96fe23eb8a9eba.

BR,
Kewen


[COMMITTED] Rename Value_Range to value_range.

2024-06-17 Thread Aldy Hernandez
Now that all remaining users of value_range have been renamed to
int_range<>, we can reclaim value_range as a temporary, thus removing
the annoying CamelCase.

gcc/ChangeLog:

* data-streamer-in.cc (streamer_read_value_range): Rename
Value_Range to value_range.
* data-streamer.h (streamer_read_value_range): Same.
* gimple-pretty-print.cc (dump_ssaname_info): Same.
* gimple-range-cache.cc (ssa_block_ranges::dump): Same.
(ssa_lazy_cache::merge): Same.
(block_range_cache::dump): Same.
(ssa_cache::merge_range): Same.
(ssa_cache::dump): Same.
(ranger_cache::edge_range): Same.
(ranger_cache::propagate_cache): Same.
(ranger_cache::fill_block_cache): Same.
(ranger_cache::resolve_dom): Same.
(ranger_cache::range_from_dom): Same.
(ranger_cache::register_inferred_value): Same.
* gimple-range-fold.cc (op1_range): Same.
(op2_range): Same.
(fold_relations): Same.
(fold_using_range::range_of_range_op): Same.
(fold_using_range::range_of_phi): Same.
(fold_using_range::range_of_call): Same.
(fold_using_range::condexpr_adjust): Same.
(fold_using_range::range_of_cond_expr): Same.
(fur_source::register_outgoing_edges): Same.
* gimple-range-fold.h (gimple_range_type): Same.
(gimple_range_ssa_p): Same.
* gimple-range-gori.cc (gori_compute::compute_operand_range): Same.
(gori_compute::logical_combine): Same.
(gori_compute::refine_using_relation): Same.
(gori_compute::compute_operand1_range): Same.
(gori_compute::compute_operand2_range): Same.
(gori_compute::compute_operand1_and_operand2_range): Same.
(gori_calc_operands): Same.
(gori_name_helper): Same.
* gimple-range-infer.cc (gimple_infer_range::check_assume_func): Same.
(gimple_infer_range::gimple_infer_range): Same.
(infer_range_manager::maybe_adjust_range): Same.
(infer_range_manager::add_range): Same.
* gimple-range-infer.h: Same.
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Same.
(gimple_range_op_handler::calc_op1): Same.
(gimple_range_op_handler::calc_op2): Same.
(gimple_range_op_handler::maybe_builtin_call): Same.
* gimple-range-path.cc (path_range_query::internal_range_of_expr): Same.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::compute_ranges_in_phis): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::add_to_exit_dependencies): Same.
* gimple-range-trace.cc (debug_seed_ranger): Same.
* gimple-range.cc (gimple_ranger::range_of_expr): Same.
(gimple_ranger::range_on_entry): Same.
(gimple_ranger::range_on_edge): Same.
(gimple_ranger::range_of_stmt): Same.
(gimple_ranger::prefill_stmt_dependencies): Same.
(gimple_ranger::register_inferred_ranges): Same.
(gimple_ranger::register_transitive_inferred_ranges): Same.
(gimple_ranger::export_global_ranges): Same.
(gimple_ranger::dump_bb): Same.
(assume_query::calculate_op): Same.
(assume_query::calculate_phi): Same.
(assume_query::dump): Same.
(dom_ranger::range_of_stmt): Same.
* ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Same.
(ipa_vr_operation_and_type_effects): Same.
(ipa_value_range_from_jfunc): Same.
(propagate_bits_across_jump_function): Same.
(propagate_vr_across_jump_function): Same.
(ipcp_store_vr_results): Same.
* ipa-cp.h: Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
(evaluate_properties_for_edge): Same.
* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same.
(ipa_vr::get_vrange): Same.
(ipa_vr::streamer_read): Same.
(ipa_vr::streamer_write): Same.
(ipa_vr::dump): Same.
(ipa_set_jfunc_vr): Same.
(ipa_compute_jump_functions_for_edge): Same.
(ipcp_get_parm_bits): Same.
(ipcp_update_vr): Same.
(ipa_record_return_value_range): Same.
(ipa_return_value_range): Same.
* ipa-prop.h (ipa_return_value_range): Same.
(ipa_record_return_value_range): Same.
* range-op.h (range_cast): Same.
* tree-ssa-dom.cc
(dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): Same.
(cprop_operand): Same.
* tree-ssa-loop-ch.cc (loop_static_stmt_p): Same.
* tree-ssa-loop-niter.cc (record_nonwrapping_iv): Same.
* tree-ssa-loop-split.cc (split_at_bb_p): Same.
* tree-ssa-phiopt.cc (value_replacement): Same.
* tree-ssa-strlen.cc (get_range): Same.
* tree-ssa-threadedge.cc (hybrid_jt_simplifier::simplify): Same.
(hybrid_jt_simplifier::compute_exit_dependencies): Same.
* tree-ssanames.cc (set_rang

[PATCH] c++, contracts: Ensure return statements on checkers.

2024-06-17 Thread Iain Sandoe
This is a minor tidy-up, tested on x86_64-darwin,
OK For trunk?
thanks
Iain

--- 8< ---

At present, for pre-conditions and for post-conditions with a void
return, we are not emitting a return statement. This patch adds the
relevant return statements.

gcc/cp/ChangeLog:

* contracts.cc (finish_function_contracts): Add return
statements to pre-condition and void post-condition
checking functions.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/contracts.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 634e3cf4fa9..0822624a910 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -2052,6 +2052,7 @@ finish_function_contracts (tree fndecl)
   DECL_PENDING_INLINE_P (pre) = false;
   start_preparsed_function (pre, DECL_ATTRIBUTES (pre), flags);
   remap_and_emit_conditions (fndecl, pre, PRECONDITION_STMT);
+  finish_return_stmt (NULL_TREE);
   tree finished_pre = finish_function (false);
   expand_or_defer_fn (finished_pre);
 }
@@ -2065,6 +2066,8 @@ finish_function_contracts (tree fndecl)
   remap_and_emit_conditions (fndecl, post, POSTCONDITION_STMT);
   if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (post
finish_return_stmt (get_postcondition_result_parameter (fndecl));
+  else
+   finish_return_stmt (NULL_TREE);
 
   tree finished_post = finish_function (false);
   expand_or_defer_fn (finished_post);
-- 
2.39.2 (Apple Git-143)



[PATCH] c++, coroutines, contracts: Handle coroutine and void functions [PR110871, PR110872, PR115434].

2024-06-17 Thread Iain Sandoe
This patch came out of a discussion on Mattermost about how to deal
with contracts/coroutines integration.  Actually, it would also allow
some semantic checking to be deferred until the same spot - at which
time there are no dependent types, which can simplify the process.

NOTE: this is a fix for bugs in the existing '2a' contracts implementation; it
does not attempt to make any of the changes required by P2900 to
either code-gen or constexpr handling.

Tested on x86_64-darwin, so far, OK for trunk if testing succeeds on
x86_64/powerpc64 linux too?
thanks,
Iain

--- 8< ---

The current implementation of contracts emits the checks into function
bodies in three places; for pre-conditions at the start of the body,
for asserts in-line in the function body and for post-conditions as an
addition to return statements.

In general (at least with existing "2a" contract semantics) the in-line
contract asserts behave as expected.

However, the mechanism is not applicable to:

 * Handling pre-conditions in coroutines since, for those, the standard
   specifies a wrapping of the original function body by functionality
   implementing initial and final suspends (along with some housekeeping
   to route exceptions).  Thus for such transformed function bodies, the
   preconditions then get actioned after the initial suspend, which does
   not behave as intended.

 * Handling post-conditions in functions that do not have return
   statements (which applies to coroutines and void functions).

In the following, we identify a potentially transformed function body
(in the case of coroutines, this is usually called the "ramp()" function).

The patch here re-implements the code insertion in one of the two following
ways (code for exposition only):

  * For functions with no post-conditions we wrap the potentially
transformed function as follows:

  {
 handle_pre_condition_checking ();
 potentially_transformed_function_body ();
  }

  This implements the intent that the preconditions are processed after
  the function parameters are initialised but before any other actions.

  * For functions with post-conditions:

  try
   {
 if (preconditions_exist)
   handle_pre_condition_checking ();
 potentially_transformed_function_body ();
   }
  finally
   {
 handle_post_condition_checking ();
   }
  else [only if the function is not marked noexcept(true) ]
   {
 __rethrow ();
   }

In this, post-conditions [that might apply to the return value etc.]
are evaluated on every non-exceptional edge out of the function.

At present, the model here is that exceptions thrown by the function
propagate upwards as if there were no contracts present.  If the desired
semantic becomes that an exception is counted as equivalent to a contract
violation - then we can add a second handler in place of the rethrow.
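
For exposition only, the ordering described above can be modelled in
ordinary C++.  This is a sketch, not the code the patch generates: the
helper name run_with_contracts and the logging are hypothetical, and the
pre- and post-condition checks are reduced to log entries.  It shows the
post-condition step running only on the non-exceptional path, with an
exception from the body propagating to the caller unchanged.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: wraps BODY between the
// pre-condition and post-condition steps and records each step in LOG.
inline void
run_with_contracts (std::vector<std::string> &log,
                    const std::function<void ()> &body,
                    bool has_preconditions = true)
{
  if (has_preconditions)
    log.push_back ("pre");
  body ();                  // a throw here skips the post-condition step
  log.push_back ("post");   // reached on non-exceptional exit only
}

// Normal exit: pre, body, post.
inline std::vector<std::string>
demo_normal_exit ()
{
  std::vector<std::string> log;
  run_with_contracts (log, [&] { log.push_back ("body"); });
  return log;
}

// Exceptional exit: the exception propagates as if no contracts were
// present, and the post-condition step is not run.
inline std::vector<std::string>
demo_exceptional_exit ()
{
  std::vector<std::string> log;
  try
    {
      run_with_contracts (log, [&] {
        log.push_back ("body");
        throw std::runtime_error ("boom");
      });
    }
  catch (const std::runtime_error &)
    {
      log.push_back ("caught by caller");
    }
  return log;
}
```

If an exception were instead to count as a contract violation, the place
to change in this model would be a handler around the call to body ().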

At constexpr time we need to evaluate the contract conditions, but not
the exceptional path, which is handled by a flag on the EH_ELSE_EXPR that
indicates it is in use for contract handling.

This patch specifically does not address changes to code-gen and constexpr
handling that are contained in P2900.

PR c++/115434
PR c++/110871
PR c++/110872

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR.
* contracts.cc (finish_contract_attribute): Remove excess line.
(build_contract_condition_function): Post condition handlers are
void now.
(emit_postconditions_cleanup): Remove.
(emit_postconditions): New.
(add_pre_condition_fn_call): New.
(add_post_condition_fn_call): New.
(apply_preconditions): New.
(apply_postconditions): New.
(maybe_apply_function_contracts): New.
(apply_postcondition_to_return): Remove.
* contracts.h (apply_postcondition_to_return): Remove.
(maybe_apply_function_contracts): Add.
* coroutines.cc (coro_build_actor_or_destroy_function): Do not
copy contracts to coroutine helpers.
* cp-tree.h (CONTRACT_EH_ELSE_P): New.
* decl.cc (finish_function): Handle wrapping a possibly
transformed function body in contract checks.
* typeck.cc (check_return_expr): Remove handling of post
conditions on return expressions.

gcc/ChangeLog:

* gimplify.cc (struct gimplify_ctx): Add a flag to show we are
expanding a handler.
(gimplify_expr): When we are expanding a handler, and the body
transforms might have re-written DECL_RESULT into a gimple var,
ensure that handler references to DECL_RESULT are also re-written
to refer to the gimple var.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr115434.C: New test.
* g++.dg/coroutines/pr110871.C: New test.
* g++.dg/coroutines/pr110872.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/constexpr.cc|  16 ++
 gcc/cp/contracts.cc| 249 

[PATCH] tree-optimization/115508 - fix ICE with SLP scheduling and extern vector

2024-06-17 Thread Richard Biener
When there's a permute after an extern vector we can run into a case
that didn't consider the scheduled node being a permute which lacks
a representative.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115508
* tree-vect-slp.cc (vect_schedule_slp_node): Guard check on
representative.

* gcc.target/i386/pr115508.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr115508.c | 15 +++
 gcc/tree-vect-slp.cc |  1 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115508.c

diff --git a/gcc/testsuite/gcc.target/i386/pr115508.c b/gcc/testsuite/gcc.target/i386/pr115508.c
new file mode 100644
index 000..a97b2007f7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115508.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=znver1" } */
+
+typedef long long v4di __attribute__((vector_size(4 * sizeof (long long))));
+
+v4di vec_var;
+extern long long array1[];
+long long g(void)
+{
+  int total_error_4 = 0;
+  total_error_4 += array1 [0] + array1 [1] + array1 [2] + array1 [3];
+  v4di t = vec_var;
+  long long iorvar = t [1] | t [0] | t [2] | t [3];
+  return iorvar + total_error_4;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 38e7fadb679..6ef04b14dd8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9674,6 +9674,7 @@ vect_schedule_slp_node (vec_info *vinfo,
  si = gsi_after_labels (vinfo->bbs[0]);
}
   else if (is_a <bb_vec_info> (vinfo)
+  && SLP_TREE_CODE (node) != VEC_PERM_EXPR
   && gimple_bb (last_stmt) != gimple_bb (stmt_info->stmt)
   && gimple_could_trap_p (stmt_info->stmt))
{
-- 
2.35.3


Re: [RFC PATCH] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-06-17 Thread Richard Earnshaw (lists)
Hi Siarhei,

On 16/06/2024 09:51, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> E.g.
>  ldr r0, [r3, #0]
>  ldr r1, [r3, #4]  @ r3 is dead after
> will be replaced by
>  ldmia r3!, {r0, r1}
> 
> also for reused reg is legal to:
>  ldr r2, [r3, #0]
>  ldr r3, [r3, #4] @ r3 reused
> will be replaced by
>  ldmia r3, {r2, r3}
> 
> However, I know little about other thumb CPUs except Cortex M0/M0+.
> 1. Is there any drawbacks if optimizing speed?
> 2. Might it be profitable for thumb2?

I like the idea behind this patch, but I think I'd try first doing this as a 
peephole2 rule to rewrite the address in this case.  That has the additional 
advantage that we then estimate the size of the instruction more accurately.  

I think it would then be easy to extend this to thumb2 as well if it looks like 
a win (perhaps only for -Os in the thumb2 case).


> 
> Regarding code size with the patch gives for v6-m/nofp:
>libgcc:  -52 bytes / -0.10%
> Newlib's libc:  -68 bytes / -0.03%
>  libm:  -96 bytes / -0.10%
> libstdc++: -140 bytes / -0.02%
> 
> Also I have questions regarding testing the patch.
> It's obscure how to do it properly, for now I compile
> for arm-none-eabi target and make check seems failing
> on any compilable test due to missing symbols from libnosys.
> I guess that arm-gnu-elf is the correct triple but it still
> advisable for proper commands to make & run the testsuite.

For testing, I'd start with something like 
gcc/testsuite/gcc.target/arm/thumb-andsi.c as a template and adapt that for 
your specific case.  Matching something like "ldmia\tr[0-7]!," should be enough.
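
As a rough sketch of what such a test might look like (not taken from
thumb-andsi.c; the directives, the options and the function name
load_pair are assumptions to adapt), something along these lines could
serve as a starting point.  The two dummy int arguments are there so
that the pointer does not arrive in one of the registers that receive
the 64-bit result, which is the case where a write-back form can be
used.  The square brackets are escaped because dg-final patterns are
evaluated by Tcl.  The code is written so that it is valid as both C
and C++.

```cpp
/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb1_ok } */
/* { dg-options "-Os -mthumb" } */

/* Hypothetical test function: the address in P is not needed after the
   load, so with the proposed change a single LDMIA with write-back is
   the hoped-for output rather than a pair of LDRs.  */
long long
load_pair (int a, int b, long long *p)
{
  (void) a;
  (void) b;
  return *p;
}

/* { dg-final { scan-assembler "ldmia\tr\[0-7\]!," } } */
```

Whether this exact function triggers the new path depends on register
allocation, so the pattern may need adjusting once the patch is reworked.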

R.

> 
> Signed-off-by: Siarhei Volkau 
> ---
>  gcc/config/arm/arm-protos.h |  2 +-
>  gcc/config/arm/arm.cc   |  7 ++-
>  gcc/config/arm/thumb1.md| 10 --
>  3 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 2cd560c9925..548bfbaccdc 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -254,7 +254,7 @@ extern int thumb_shiftable_const (unsigned HOST_WIDE_INT);
>  extern enum arm_cond_code maybe_get_arm_condition_code (rtx);
>  extern void thumb1_final_prescan_insn (rtx_insn *);
>  extern void thumb2_final_prescan_insn (rtx_insn *);
> -extern const char *thumb_load_double_from_address (rtx *);
> +extern const char *thumb_load_double_from_address (rtx *, rtx_insn *);
>  extern const char *thumb_output_move_mem_multiple (int, rtx *);
>  extern const char *thumb_call_via_reg (rtx);
>  extern void thumb_expand_cpymemqi (rtx *);
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index b8c32db0a1d..73c2478ed77 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -28350,7 +28350,7 @@ thumb1_output_interwork (void)
> a computed memory address.  The computed address may involve a
> register which is overwritten by the load.  */
>  const char *
> -thumb_load_double_from_address (rtx *operands)
> +thumb_load_double_from_address (rtx *operands, rtx_insn *insn)
>  {
>rtx addr;
>rtx base;
> @@ -28368,6 +28368,11 @@ thumb_load_double_from_address (rtx *operands)
>switch (GET_CODE (addr))
>  {
>  case REG:
> +  if (find_reg_note (insn, REG_DEAD, addr))
> +return "ldmia\t%m1!, {%0, %H0}";
> +  else if (REGNO (addr) == REGNO (operands[0]) + 1)
> +return "ldmia\t%m1, {%0, %H0}";
> +
>operands[2] = adjust_address (operands[1], SImode, 4);
>  
>if (REGNO (operands[0]) == REGNO (addr))
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..8da6887b560 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -637,8 +637,11 @@
>  case 5:
>return \"stmia\\t%0, {%1, %H1}\";
>  case 6:
> -  return thumb_load_double_from_address (operands);
> +  return thumb_load_double_from_address (operands, insn);
>  case 7:
> +  if (MEM_P (operands[0]) && REG_P (XEXP (operands[0], 0))
> +  && find_reg_note (insn, REG_DEAD, XEXP (operands[0], 0)))
> +return \"stmia\\t%m0!, {%1, %H1}\";
>operands[2] = gen_rtx_MEM (SImode,
>plus_constant (Pmode, XEXP (operands[0], 0), 4));
>output_asm_insn (\"str\\t%1, %0\;str\\t%H1, %2\", operands);
> @@ -970,8 +973,11 @@
>  case 2:
>return \"stmia\\t%0, {%1, %H1}\";
>  case 3:
> -  return thumb_load_double_from_address (operands);
> +  return thumb_load_double_from_address (operands, insn);
>  case 4:
> +  if (MEM_P (operands[0]) && REG_P (XEXP (operands[0], 0))
> +  && find_reg_note (insn, REG_DEAD, XEXP (operands[0], 0)))
> +return \"stmia\\t%m0!, {%1, %H1}\";
>operands[2] = gen_rtx_MEM (SImode,
>plus_const

RE: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-17 Thread Tamar Christina
Hi,

> -Original Message-
> From: Pengxuan Zheng 
> Sent: Friday, June 14, 2024 12:57 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Pengxuan Zheng 
> Subject: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]
> 
> This patch improves GCC’s vectorization of __builtin_popcount for aarch64 
> target
> by adding popcount patterns for vector modes besides QImode, i.e., HImode,
> SImode and DImode.
> 
> With this patch, we now generate the following for V8HI:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
> 
> For V4HI, we generate:
>   cnt v1.8b, v.8b
>   uaddlp  v2.4h, v1.8b
> 
> For V4SI, we generate:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
> 
> For V2SI, we generate:
>   cnt v1.8b, v.8b
>   uaddlp  v2.4h, v1.8b
>   uaddlp  v3.2s, v2.4h
> 
> For V2DI, we generate:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>   uaddlp  v4.2d, v3.4s

Nice patch!  We can do better for these sequences though. Would you instead
consider using udot with a 0 accumulator and a multiplicand of 1?

Essentially
movi v0.16b, #0
movi v1.16b, #1
cnt v3.16b, v2.16b
udot  v0.4s, v3.16b, v1.16b

This has one fewer instruction on the critical path, so it should be half the
latency of the uaddlp variants.

For the DI case you'll still need a final uaddlp.
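
To make the equivalence concrete, here is a scalar model of one 32-bit
lane, as a sketch only.  The function names are made up for this
example and nothing here is aarch64 code: cnt_byte stands in for one
byte lane of CNT, the first variant mimics two widening pairwise adds,
and the second mimics a dot product of the byte counts with a vector of
ones into a zero accumulator.

```cpp
#include <cstdint>

using std::uint8_t;
using std::uint16_t;
using std::uint32_t;

// Population count of a single byte (models one byte lane of CNT).
inline uint8_t
cnt_byte (uint8_t b)
{
  uint8_t n = 0;
  for (; b != 0; b = static_cast<uint8_t> (b & (b - 1)))
    n++;
  return n;
}

// UADDLP-style: bytes -> halfwords -> word, two dependent widening adds.
inline uint32_t
popcount32_uaddlp (uint32_t x)
{
  uint8_t c[4];
  for (int i = 0; i < 4; i++)
    c[i] = cnt_byte (static_cast<uint8_t> (x >> (8 * i)));
  uint16_t lo = static_cast<uint16_t> (c[0] + c[1]);
  uint16_t hi = static_cast<uint16_t> (c[2] + c[3]);
  return static_cast<uint32_t> (lo) + hi;
}

// UDOT-style: multiply each byte count by 1 and accumulate into a
// 32-bit accumulator that starts at 0, in a single step.
inline uint32_t
popcount32_udot (uint32_t x)
{
  const uint8_t ones[4] = {1, 1, 1, 1};
  uint32_t acc = 0;
  for (int i = 0; i < 4; i++)
    acc += static_cast<uint32_t> (cnt_byte (static_cast<uint8_t> (x >> (8 * i))))
           * ones[i];
  return acc;
}
```

Both variants return the same count for every input; the difference is
only in how many dependent steps follow the byte count.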

Cheers,
Tamar

> 
>   PR target/113859
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md (aarch64_addlp):
> Rename to...
>   (@aarch64_addlp): ... This.
>   (popcount2): New define_expand.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/popcnt-vec.c: New test.
> 
> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-simd.md| 28 +++-
>  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++
>  2 files changed, 96 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 0bb39091a38..ee73e13534b 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3461,7 +3461,7 @@ (define_insn
> "*aarch64_addlv_ze"
>[(set_attr "type" "neon_reduc_add")]
>  )
> 
> -(define_expand "aarch64_addlp"
> +(define_expand "@aarch64_addlp"
>[(set (match_operand: 0 "register_operand")
>   (plus:
> (vec_select:
> @@ -3517,6 +3517,32 @@ (define_insn "popcount2"
>[(set_attr "type" "neon_cnt")]
>  )
> 
> +(define_expand "popcount2"
> +  [(set (match_operand:VDQHSD 0 "register_operand")
> +(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
> +  "TARGET_SIMD"
> +  {
> +/* Generate a byte popcount. */
> +machine_mode mode =  == 64 ? V8QImode : V16QImode;
> +rtx tmp = gen_reg_rtx (mode);
> +auto icode = optab_handler (popcount_optab, mode);
> +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
> +
> +/* Use a sequence of UADDLPs to accumulate the counts. Each step doubles
> +   the element size and halves the number of elements. */
> +do
> +  {
> +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp));
> +mode = insn_data[icode].operand[0].mode;
> +rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode);
> +emit_insn (GEN_FCN (icode) (dest, tmp));
> +tmp = dest;
> +  }
> +while (mode != mode);
> +DONE;
> +  }
> +)
> +
>  ;; 'across lanes' max and min ops.
> 
>  ;; Template for outputting a scalar, so we can create __builtins which can be
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> new file mode 100644
> index 000..0c4926d7ca8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-vect-cost-model" } */
> +
> +/* This function should produce cnt v.16b. */
> +void
> +bar (unsigned char *__restrict b, unsigned char *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcount (b[i]);
> +}
> +
> +/* This function should produce cnt v.16b and uaddlp (Add Long Pairwise). */
> +void
> +bar1 (unsigned short *__restrict b, unsigned short *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcount (b[i]);
> +}
> +
> +/* This function should produce cnt v.16b and 2 uaddlp (Add Long Pairwise). 
> */
> +void
> +bar2 (unsigned int *__restrict b, unsigned int *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcount (b[i]);
> +}
> +
> +/* This function should produce cnt v.16b and 3 uaddlp (Add Long Pairwise). 
> */
> +void
> +bar3 (unsigned long long *__restrict b, unsigned long long *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcountll (b[i]);
> +}
> +
> +/* SLP
> +   This function should produce cnt v.8b and uaddlp (Add Long Pairwise)

Ping^2 [PATCHv5] Optab: add isnormal_optab for __builtin_isnormal

2024-06-17 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html

Thanks
Gui Haochen

On 2024/6/3 10:37, HAO CHEN GUI wrote:
> Hi,
>   All issues were addressed. Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html
> 
> Thanks
> Gui Haochen
> 
> 
> On 2024/5/29 14:36, HAO CHEN GUI wrote:
>> Hi,
>>   This patch adds an optab for __builtin_isnormal. The normal check can be
>> implemented on rs6000 by a single instruction. It needs an optab to be
>> expanded to the certain sequence of instructions.
>>
>>   The subsequent patches will implement the expand on rs6000.
>>
>>   Compared to previous version, the main change is to specify return
>> value of the optab should be either 0 or 1.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> optab: Add isnormal_optab for isnormal builtin
>>
>> gcc/
>>  * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
>>  for isnormal builtin.
>>  * optabs.def (isnormal_optab): New.
>>  * doc/md.texi (isnormal): Document.
>>
>>
>> patch.diff
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index 53e9d210541..89ba56abf17 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>builtin_optab = isfinite_optab;
>>break;
>>  case BUILT_IN_ISNORMAL:
>> +  builtin_optab = isnormal_optab;
>> +  break;
>>  CASE_FLT_FN (BUILT_IN_FINITE):
>>  case BUILT_IN_FINITED32:
>>  case BUILT_IN_FINITED64:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 3eb4216141e..4fd7da095fe 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point 
>> number and 0
>>  otherwise.  @var{m} is a scalar floating point mode.  Operand 0
>>  has mode @code{SImode}, and operand 1 has mode @var{m}.
>>
>> +@cindex @code{isnormal@var{m}2} instruction pattern
>> +@item @samp{isnormal@var{m}2}
>> +Return 1 if operand 1 is a normal floating point number and 0
>> +otherwise.  @var{m} is a scalar floating point mode.  Operand 0
>> +has mode @code{SImode}, and operand 1 has mode @var{m}.
>> +
>>  @end table
>>
>>  @end ifset
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index dcd77315c2a..3c401fc0b4c 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>>  OPTAB_D (ilogb_optab, "ilogb$a2")
>>  OPTAB_D (isinf_optab, "isinf$a2")
>>  OPTAB_D (isfinite_optab, "isfinite$a2")
>> +OPTAB_D (isnormal_optab, "isnormal$a2")
>>  OPTAB_D (issignaling_optab, "issignaling$a2")
>>  OPTAB_D (ldexp_optab, "ldexp$a3")
>>  OPTAB_D (log10_optab, "log10$a2")


[pushed] doc: Mark up __cxa_atexit as @code.

2024-06-17 Thread Gerald Pfeifer
Pushed. (The diff is a bit larger due to line breaks.)

Gerald

gcc:
* doc/install.texi (Configuration): Mark up __cxa_atexit as @code.
---
 gcc/doc/install.texi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 298031dc2de..1774a010889 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1779,12 +1779,12 @@ Produce code conforming to version 20191213.
 In the absence of this configuration option the default version is 20191213.
 
 @item --enable-__cxa_atexit
-Define if you want to use __cxa_atexit, rather than atexit, to
+Define if you want to use @code{__cxa_atexit}, rather than atexit, to
 register C++ destructors for local statics and global objects.
 This is essential for fully standards-compliant handling of
-destructors, but requires __cxa_atexit in libc.  This option is currently
-only available on systems with GNU libc.  When enabled, this will cause
-@option{-fuse-cxa-atexit} to be passed by default.
+destructors, but requires @code{__cxa_atexit} in libc.  This option is
+currently only available on systems with GNU libc.  When enabled, this
+will cause @option{-fuse-cxa-atexit} to be passed by default.
 
 @item --enable-gnu-indirect-function
 Define if you want to enable the @code{ifunc} attribute.  This option is
-- 
2.45.2


[to-be-committed][RISC-V] Handle zero_extract destination for single bit insertions

2024-06-17 Thread Jeff Law
Combine will use zero_extract destinations for certain bitfield 
insertions.  If the bitfield is a single bit constant, then we can use 
bset/bclr.
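
As a sketch of why this works, here is a scalar model for a 64-bit
word.  The function names are made up for this example, and bset and
bclr below only model the instruction semantics as I understand them
(the bit position is taken modulo the word size).  Writing a constant 1
into a one-bit field at a variable position is the same as setting that
bit, and writing a constant 0 is the same as clearing it.

```cpp
#include <cstdint>

using std::uint64_t;

// Generic one-bit field insertion: store VALUE (0 or 1) into bit POS
// of WORD.  This models the zero_extract destination.
inline uint64_t
insert_bit (uint64_t word, unsigned pos, unsigned value)
{
  const unsigned shift = pos & 63;
  const uint64_t mask = static_cast<uint64_t> (1) << shift;
  return (word & ~mask) | (static_cast<uint64_t> (value & 1) << shift);
}

// Model of bset: set bit POS.
inline uint64_t
bset (uint64_t word, unsigned pos)
{
  return word | (static_cast<uint64_t> (1) << (pos & 63));
}

// Model of bclr: clear bit POS.
inline uint64_t
bclr (uint64_t word, unsigned pos)
{
  return word & ~(static_cast<uint64_t> (1) << (pos & 63));
}
```

For any word and position, insert_bit (w, p, 1) equals bset (w, p) and
insert_bit (w, p, 0) equals bclr (w, p).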


In this case we are only dealing with word_mode objects, so we don't 
have to worry about the SI->DI extension issues for TARGET_64BIT.


The testcase was derived from 502.gcc in spec from the RAU team.


An earlier version of this (TARGET_64BIT only) went through Ventana's CI 
system.  This version has gone through mine after generalizing it to 
handle rv32 as well.  I'll wait for pre-commit CI to render its verdict 
before moving forward.


Jeff


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 311f0d373c0..c6bd55c53f9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -654,6 +654,18 @@ (define_split
  (any_or:DI (ashift:DI (const_int 1) (match_dup 1))
(match_dup 3)))])
 
+;; Yet another form of a bset/bclr that can be created by combine.
+(define_insn "*bsetclr_zero_extract"
+  [(set (zero_extract:X (match_operand:X 0 "register_operand" "+r")
+   (const_int 1)
+   (zero_extend:X (match_operand:QI 1 "register_operand" "r")))
+   (match_operand 2 "immediate_operand" "n"))]
+  "TARGET_ZBS
+   && (operands[2] == CONST0_RTX (<MODE>mode)
+   || operands[2] == CONST1_RTX (<MODE>mode))"
+  { return operands[2] == CONST0_RTX (<MODE>mode) ? "bclr\t%0,%0,%1" : "bset\t%0,%0,%1"; }
+  [(set_attr "type" "bitmanip")])
+
 (define_insn "*bclr"
   [(set (match_operand:X 0 "register_operand" "=r")
(and:X (rotate:X (const_int -2)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-zext-3.c b/gcc/testsuite/gcc.target/riscv/zbs-zext-3.c
new file mode 100644
index 000..0239014e06b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-zext-3.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zba_zbb_zbs -mabi=ilp32" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+/* We need to adjust the constant so this works for rv32 and rv64.  */
+#if __riscv_xlen == 32
+#define ONE 1U
+#else
+#define ONE 1ULL
+#endif
+
+void add_to_hard_reg_set(long long *a, unsigned int count) {
+  int i = 0;
+  while(i++ < count)
+*a |= (1U << i);
+}
+
+void remove_from_hard_reg_set(long long *a, unsigned int count) {
+  int i = 0;
+  while(i++ < count)
+*a &= ~(ONE << i);
+}
+
+
+/* { dg-final { scan-assembler-not "and\t" } } */
+/* { dg-final { scan-assembler-not "andn\t" } } */


Patch ping

2024-06-17 Thread Jakub Jelinek
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653573.html
patch.  While the committed and backported patch fixed PCH on PIE
cc1/cc1plus etc. on PowerPC, it considerably increased the size of the
rs6000_init_generated_builtins function.
The above patch shrinks it again, to even less than the size of
the function before my fix.

Jakub



[PATCH] tree-optimization/115493 - fix wrong code with SLP induction cond reduction

2024-06-17 Thread Richard Biener
The following fixes a bad final value being used when doing single-lane
SLP integer induction cond reduction vectorization.
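
For readers less familiar with this kind of reduction, here is a scalar
model of the idea, as a sketch only.  It is not the vectorizer's code:
the function names, the four-lane layout and the sentinel value -1
(standing in for induc_val) are assumptions made for the example.  The
point it illustrates is that the lanes are reduced to a single scalar
first, and it is that scalar which is compared against the sentinel to
decide whether to fall back to the initial value.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Scalar reference: index of the last element below LIMIT, or INITIAL
// when no element matches.
inline int
last_below_scalar (const std::vector<int> &a, int limit, int initial)
{
  int last = initial;
  for (std::size_t i = 0; i < a.size (); i++)
    if (a[i] < limit)
      last = static_cast<int> (i);
  return last;
}

// Lane-wise emulation with a hypothetical 4-lane vector.
inline int
last_below_lanes (const std::vector<int> &a, int limit, int initial)
{
  const int sentinel = -1;  // assumed stand-in for induc_val, below any index
  std::array<int, 4> lanes;
  lanes.fill (sentinel);
  for (std::size_t i = 0; i < a.size (); i++)
    if (a[i] < limit)
      lanes[i % 4] = static_cast<int> (i);
  // Reduce the lanes to one scalar first ...
  const int reduced = *std::max_element (lanes.begin (), lanes.end ());
  // ... then compare that scalar with the sentinel to pick the result.
  return reduced == sentinel ? initial : reduced;
}
```

Both functions agree on every input, including the case where no
element matches and the initial value has to be returned.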

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/115493
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
the first scalar result.
---
 gcc/tree-vect-loop.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d9a2ad69484..7c79e9da106 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6843,8 +6843,8 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 with the original initial value, unless induc_val is
 the same as initial_def already.  */
  tree zcompare = make_ssa_name (boolean_type_node);
- epilog_stmt = gimple_build_assign (zcompare, EQ_EXPR, new_temp,
-induc_val);
+ epilog_stmt = gimple_build_assign (zcompare, EQ_EXPR,
+scalar_results[0], induc_val);
  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
  tree initial_def = reduc_info->reduc_initial_values[0];
  tree tmp = make_ssa_name (new_scalar_dest);
-- 
2.35.3


[PATCH][v2] Enhance if-conversion for automatic arrays

2024-06-17 Thread Richard Biener
Automatic arrays that are not address-taken should not be subject to
store data races.  This applies to the result array of OMP SIMD
in-branch lowered functions, which for the testcase otherwise prevents
vectorization with SSE and, for AVX and AVX512, leaves a spurious
.MASK_STORE to the stack.

This inefficiency was noted in PR111793.

I've introduced ref_can_have_store_data_races, commonizing uses
of flag_store_data_races in if-conversion, cselim and store motion.
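
As an illustration of the transformation this enables, here is a
source-level sketch.  The function names are made up and this is not
GIMPLE: the first function stores to a local array only when the mask
is set, and the second is the if-converted shape, which stores
unconditionally and selects between the new and the old value.  Since
the array is automatic and its address never escapes, no other thread
can observe the extra stores, so the second form cannot introduce a
data race.

```cpp
#include <array>

// Conditional stores into an automatic array.
inline int
sum_selected_branchy (const std::array<int, 4> &in,
                      const std::array<bool, 4> &mask)
{
  int res[4] = {0, 0, 0, 0};
  for (int i = 0; i < 4; i++)
    if (mask[i])
      res[i] = in[i] * 2;
  return res[0] + res[1] + res[2] + res[3];
}

// If-converted shape: every iteration stores, writing back the old
// value when the mask is clear.
inline int
sum_selected_ifconverted (const std::array<int, 4> &in,
                          const std::array<bool, 4> &mask)
{
  int res[4] = {0, 0, 0, 0};
  for (int i = 0; i < 4; i++)
    res[i] = mask[i] ? in[i] * 2 : res[i];
  return res[0] + res[1] + res[2] + res[3];
}
```

For a global or otherwise shared array the second form would write
locations the original program never wrote, which is the case the
flag_store_data_races checks guard against.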

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/111793
* tree-ssa-alias.h (ref_can_have_store_data_races): Declare.
* tree-ssa-alias.cc (ref_can_have_store_data_races): New
function.
* tree-if-conv.cc (ifcvt_memrefs_wont_trap): Use
ref_can_have_store_data_races to allow more unconditional
stores.
* tree-ssa-loop-im.cc (execute_sm): Likewise.
* tree-ssa-phiopt.cc (cond_store_replacement): Likewise.

* gcc.dg/vect/vect-simd-clone-21.c: New testcase.
---
 .../gcc.dg/vect/vect-simd-clone-21.c  | 16 
 gcc/tree-if-conv.cc   | 11 +--
 gcc/tree-ssa-alias.cc | 19 +++
 gcc/tree-ssa-alias.h  |  2 ++
 gcc/tree-ssa-loop-im.cc   |  2 +-
 gcc/tree-ssa-phiopt.cc|  4 +---
 6 files changed, 44 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
new file mode 100644
index 000..49c52fb59bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#pragma omp declare simd simdlen(4) inbranch
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target i?86-*-* x86_64-*-* } } } */
+/* if-conversion shouldn't need to resort to masked stores for the result
+   array created by OMP lowering since that's automatic and does not have
+   its address taken.  */
+/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index c4c3ed41a44..57992b6deca 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -936,12 +936,11 @@ ifcvt_memrefs_wont_trap (gimple *stmt, 
vec drs)
 
   /* an unconditionaly write won't trap if the base is written
  to unconditionally.  */
-  if (base_master_dr
- && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
-   return flag_store_data_races;
-  /* or the base is known to be not readonly.  */
-  else if (base_object_writable (DR_REF (a)))
-   return flag_store_data_races;
+  if ((base_master_dr
+  && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
+ /* or the base is known to be not readonly.  */
+ || base_object_writable (DR_REF (a)))
+   return !ref_can_have_store_data_races (base);
 }
 
   return false;
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 1a91d63a31e..fab048b0b59 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -3704,6 +3704,25 @@ stmt_kills_ref_p (gimple *stmt, tree ref)
   return stmt_kills_ref_p (stmt, &r);
 }
 
+/* Return whether REF can be subject to store data races.  */
+
+bool
+ref_can_have_store_data_races (tree ref)
+{
+  /* With -fallow-store-data-races do not care about them.  */
+  if (flag_store_data_races)
+return false;
+
+  tree base = get_base_address (ref);
+  if (auto_var_p (base)
+  && ! may_be_aliased (base))
+/* Automatic variables not aliased are not subject to
+   data races.  */
+return false;
+
+  return true;
+}
+
 
 /* Walk the virtual use-def chain of VUSE until hitting the virtual operand
TARGET or a statement clobbering the memory reference REF in which
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 5cd64e72295..5834533ae9c 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -144,6 +144,8 @@ extern bool call_may_clobber_ref_p (gcall *, tree, bool = true);
 extern bool call_may_clobber_ref_p_1 (gcall *, ao_ref *, bool = true);
 extern bool stmt_kills_ref_p (gimple *, tree);
 extern bool stmt_kills_ref_p (gimple *, ao_ref *);
+extern bool ref_can_have_store_data_races (tree);
+
 enum translate_flags
   { TR_TRANSLATE, TR_VALUEIZE_AND_DISAMBIGUATE, TR_DISAMBIGUATE };
 extern tree get_continuation_for_phi (gimple *, ao_ref *, bool,
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index f3fda2bd7ce..3acbd886a0d 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2298,7 +2298,7 @@ execute_sm (class loop *loop, im_mem_ref *ref,
   bool always_stored = ref_always_access

Re: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-17 Thread Andrew Pinski
On Mon, Jun 17, 2024, 5:59 AM Tamar Christina 
wrote:

> Hi,
>
> > -Original Message-
> > From: Pengxuan Zheng 
> > Sent: Friday, June 14, 2024 12:57 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Pengxuan Zheng 
> > Subject: [PATCH v3] aarch64: Add vector popcount besides QImode
> [PR113859]
> >
> > This patch improves GCC’s vectorization of __builtin_popcount for
> aarch64 target
> > by adding popcount patterns for vector modes besides QImode, i.e.,
> HImode,
> > SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >
> > For V4HI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >
> > For V4SI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >
> > For V2SI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >   uaddlp  v3.2s, v2.4h
> >
> > For V2DI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >   uaddlp  v4.2d, v3.4s
>
> Nice patch!  We can do better for these sequences though. Would you
> instead consider using udot with a 0 accumulator and a multiplicand of 1?
>
> Essentially
> movi v0.16b, #0
> movi v1.16b, #1
> cnt v3.16b, v2.16b
> udot  v0.4s, v3.16b, v1.16b
>
> this has 1 instruction less on the critical path so should be half the
> latency of the uaddlp variants.
>

Of course that can only be done if the udot is enabled. But yes I agree
that is better.


> For the DI case you'll still need a final uaddlp.
>
> Cheers,
> Tamar
>
> >
> >   PR target/113859
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-simd.md (aarch64_addlp):
> > Rename to...
> >   (@aarch64_addlp): ... This.
> >   (popcount2): New define_expand.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/popcnt-vec.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md| 28 +++-
> >  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++
> >  2 files changed, 96 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3461,7 +3461,7 @@ (define_insn
> > "*aarch64_addlv_ze"
> >[(set_attr "type" "neon_reduc_add")]
> >  )
> >
> > -(define_expand "aarch64_addlp"
> > +(define_expand "@aarch64_addlp"
> >[(set (match_operand: 0 "register_operand")
> >   (plus:
> > (vec_select:
> > @@ -3517,6 +3517,32 @@ (define_insn "popcount2"
> >[(set_attr "type" "neon_cnt")]
> >  )
> >
> > +(define_expand "popcount2"
> > +  [(set (match_operand:VDQHSD 0 "register_operand")
> > +(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
> > +  "TARGET_SIMD"
> > +  {
> > +/* Generate a byte popcount. */
> > +machine_mode mode =  == 64 ? V8QImode : V16QImode;
> > +rtx tmp = gen_reg_rtx (mode);
> > +auto icode = optab_handler (popcount_optab, mode);
> > +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
> > +
> > +/* Use a sequence of UADDLPs to accumulate the counts. Each step
> doubles
> > +   the element size and halves the number of elements. */
> > +do
> > +  {
> > +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE
> (tmp));
> > +mode = insn_data[icode].operand[0].mode;
> > +rtx dest = mode == mode ? operands[0] : gen_reg_rtx
> (mode);
> > +emit_insn (GEN_FCN (icode) (dest, tmp));
> > +tmp = dest;
> > +  }
> > +while (mode != mode);
> > +DONE;
> > +  }
> > +)
> > +
> >  ;; 'across lanes' max and min ops.
> >
> >  ;; Template for outputting a scalar, so we can create __builtins which
> can be
> > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > new file mode 100644
> > index 000..0c4926d7ca8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > @@ -0,0 +1,69 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fno-vect-cost-model" } */
> > +
> > +/* This function should produce cnt v.16b. */
> > +void
> > +bar (unsigned char *__restrict b, unsigned char *__restrict d)
> > +{
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]);
> > +}
> > +
> > +/* This function should produce cnt v.16b and uaddlp (Add Long
> Pairwise). */
> > +void
> > +bar1 (unsigned short *__restrict b, unsigned short *__restrict d)
> > +{
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]);
> > +}
> > +
> > +/* This function should produce cnt v.16b and 2 uaddlp (Add Long
> Pairwise). */
> > +void
> > +bar2 (unsigned int *__restrict b, unsigned int *__restrict d)

[PATCH v1 1/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 2

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 2 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 2.

Form 2:
  #define DEF_VEC_SAT_U_ADD_FMT_2(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = (T)(x + y) >= x ? (x + y) : -1; \
      } \
  }

Passed the rv64gcv regression tests.
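
As a side note on why the (T) cast in form 2 matters, here is a small scalar sketch for T = uint8_t. The function names are hypothetical and not part of the patch. For types narrower than int, x + y is computed in int, so without the cast the comparison can never detect the wrap.

```c
#include <stdint.h>

/* Scalar body of form 2 for T = uint8_t.  */
uint8_t sat_u_add_fmt_2_u8 (uint8_t x, uint8_t y)
{
  /* x + y is computed in int; the cast truncates it back to 8 bits,
     so a wrapped sum compares less than x.  The -1 arm converts to
     0xFF on return.  */
  return (uint8_t) (x + y) >= x ? (x + y) : -1;
}

/* Same expression without the cast: the comparison is done in int,
   is always true, and the result simply wraps on conversion.  */
uint8_t wrapping_add_u8 (uint8_t x, uint8_t y)
{
  return (x + y) >= x ? (x + y) : -1;
}
```

With x = 200 and y = 100 the first function saturates to 255 while the second wraps to 44.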

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 16 
 .../riscv/rvv/autovec/binop/vec_sat_u_add-5.c | 19 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-6.c | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-7.c | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-8.c | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-5.c   | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-6.c   | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-7.c   | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-8.c   | 75 +++
 9 files changed, 395 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 450f0fbbc72..57b1bce4bd2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -19,9 +19,25 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_2(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  out[i] = (T)(x + y) >= x ? (x + y) : -1;   \
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c 
b/gcc/testsuite/gcc.target/risc

[PATCH v1 2/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 3

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 3 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 3.

Form 3:
  #define DEF_VEC_SAT_U_ADD_FMT_3(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T ret; \
        T overflow = __builtin_add_overflow (x, y, &ret); \
        out[i] = (T)(-overflow) | ret; \
      } \
  }

Passed the rv64gcv regression tests.
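
To spell out the branchless trick in form 3, here is a scalar sketch for T = uint16_t (the function name is made up for illustration). The overflow flag is 0 or 1, so negating it and casting back to T gives an all-zeros or all-ones mask, and OR-ing that mask with the wrapped sum either keeps the sum or forces the maximum value.

```c
#include <stdint.h>

/* Scalar body of form 3 for T = uint16_t.  */
uint16_t sat_u_add_fmt_3_u16 (uint16_t x, uint16_t y)
{
  uint16_t ret;
  /* 1 if x + y does not fit in uint16_t, otherwise 0; ret receives
     the wrapped sum either way.  */
  uint16_t overflow = __builtin_add_overflow (x, y, &ret);
  /* -overflow is 0 or -1 in int; the cast yields 0x0000 or 0xFFFF.  */
  return (uint16_t) (-overflow) | ret;
}
```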

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
 .../rvv/autovec/binop/vec_sat_u_add-10.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-11.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-12.c  | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-9.c | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 +++
 9 files changed, 397 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 57b1bce4bd2..76f393fffbd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -32,12 +32,30 @@ vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_3(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  T ret; \
+  T overflow = __builtin_add_overflow (x, y, &ret);  \
+  out[i] = (T)(-overflow) | ret; \
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
 #define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \

[PATCH v1 3/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 4

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 4 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 4.

Form 4:
  #define DEF_VEC_SAT_U_ADD_FMT_4(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T ret; \
        out[i] = __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
      } \
  }

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: New test.

Passed the rv64gcv regression tests.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 17 +
 .../rvv/autovec/binop/vec_sat_u_add-13.c  | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-14.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-15.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-16.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-13.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-14.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-15.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-16.c  | 75 +++
 9 files changed, 396 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 76f393fffbd..e00769e35b6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -47,6 +47,20 @@ vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_4(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  T ret; \
+  out[i] = __builtin_add_overflow (x, y, &ret) ? -1 : ret;   \
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -56,6 +70,9 @@ vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 #define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_3(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_4(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_4(out, op_1, op_2, N)
+

[PATCH v1 4/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 5

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 5 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 5.

Form 5:
  #define DEF_VEC_SAT_U_ADD_FMT_5(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T ret; \
        out[i] = __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
      } \
  }

Passed the rv64gcv regression tests.
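
For comparison, here is a scalar sketch of form 5 next to the form 4 shape (overflow ? -1 : ret) for T = uint32_t. The function names are illustrative only. The two differ just in the polarity of the condition and the order of the arms, which is why each needs its own match pattern even though the results are identical.

```c
#include <stdint.h>

/* Form 4 shape: branch on "overflow happened".  */
uint32_t sat_u_add_fmt_4_u32 (uint32_t x, uint32_t y)
{
  uint32_t ret;
  /* -1 is converted to unsigned, i.e. UINT32_MAX.  */
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

/* Form 5 shape: branch on "no overflow", arms swapped.  */
uint32_t sat_u_add_fmt_5_u32 (uint32_t x, uint32_t y)
{
  uint32_t ret;
  return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1;
}
```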

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 17 +
 .../rvv/autovec/binop/vec_sat_u_add-17.c  | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-18.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-19.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-20.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-17.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-18.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-19.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-20.c  | 75 +++
 9 files changed, 396 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index e00769e35b6..1f2ee31577d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -61,6 +61,20 @@ vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_5(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  T ret; \
+  out[i] = __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1;  \
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -73,6 +87,9 @@ vec_sat_u_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 #define RUN_VEC_SAT_U_ADD_FMT_4(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_4(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_5(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_5(out, op_1, op_2, N)
+

[PATCH v1 6/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 7

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 7 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 7.

Form 7:
  #define DEF_VEC_SAT_U_ADD_FMT_7(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = (T)(x + y) < x ? -1 : (x + y); \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 16 
 .../rvv/autovec/binop/vec_sat_u_add-25.c  | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-26.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-27.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-28.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-25.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-26.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-27.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-28.c  | 75 +++
 9 files changed, 395 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 0f08822cbeb..46fae4555be 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -88,6 +88,19 @@ vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_7(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  out[i] = (T)(x + y) < x ? -1 : (x + y);\
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -106,6 +119,9 @@ vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_ADD_FMT_6(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_6(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_7(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_7(out, op_1, op_2, N)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed) 

[PATCH v1 5/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 6

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 6 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 6.

Form 6:
  #define DEF_VEC_SAT_U_ADD_FMT_6(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = x <= (T)(x + y) ? (x + y) : -1; \
      } \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 16 
 .../rvv/autovec/binop/vec_sat_u_add-21.c  | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-22.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-23.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-24.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-21.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-22.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-23.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-24.c  | 75 +++
 9 files changed, 395 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 1f2ee31577d..0f08822cbeb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -75,6 +75,19 @@ vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_6(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_6 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  out[i] = x <= (T)(x + y) ? (x + y) : -1;   \
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -90,6 +103,9 @@ vec_sat_u_add_##T##_fmt_5 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_ADD_FMT_5(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_5(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_6(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_6(out, op_1, op_2, N)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)  

[PATCH v1 7/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 8

2024-06-17 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 8 of unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 8.

Form 8:
  #define DEF_VEC_SAT_U_ADD_FMT_8(T) \
  void __attribute__((noinline)) \
  vec_sat_u_add_##T##_fmt_8 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        out[i] = x > (T)(x + y) ? -1 : (x + y); \
      } \
  }

Passed the rv64gcv regression tests.
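
The comparison-based forms 6, 7 and 8 in this series are all spellings of the same wrap test. The following sketch checks that exhaustively for T = uint8_t against a widen-and-clamp reference. It is not one of the added tests; sat_ref and count_mismatches are hypothetical helpers written only for this illustration.

```c
#include <stdint.h>

/* Scalar bodies of forms 6, 7 and 8 for T = uint8_t.  */
uint8_t fmt_6 (uint8_t x, uint8_t y)
{
  return x <= (uint8_t) (x + y) ? (x + y) : -1;
}

uint8_t fmt_7 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) < x ? -1 : (x + y);
}

uint8_t fmt_8 (uint8_t x, uint8_t y)
{
  return x > (uint8_t) (x + y) ? -1 : (x + y);
}

/* Reference: add in a wider type, then clamp to the maximum.  */
uint8_t sat_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Number of (x, y) pairs where any form disagrees with the reference.  */
unsigned count_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t a = (uint8_t) x, b = (uint8_t) y;
        uint8_t want = sat_ref (a, b);
        if (fmt_6 (a, b) != want
            || fmt_7 (a, b) != want
            || fmt_8 (a, b) != want)
          bad++;
      }
  return bad;
}
```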

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-29.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-30.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-31.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-32.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 16 
 .../rvv/autovec/binop/vec_sat_u_add-29.c  | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-30.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-31.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-32.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-29.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-30.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-31.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-32.c  | 75 +++
 9 files changed, 395 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-29.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-30.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-31.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-32.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 46fae4555be..443f88261ba 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -101,6 +101,19 @@ vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }
 
+#define DEF_VEC_SAT_U_ADD_FMT_8(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_8 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  out[i] = x > (T)(x + y) ? -1 : (x + y);\
+}\
+}
+
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -122,6 +135,9 @@ vec_sat_u_add_##T##_fmt_7 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_ADD_FMT_7(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_7(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_ADD_FMT_8(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_8(out, op_1, op_2, N)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Kewen.Lin
on 2024/6/17 14:16, Richard Biener wrote:
> On Mon, 17 Jun 2024, Kewen.Lin wrote:
> 
>> Hi Richi,
>>
>> on 2024/6/14 18:31, Richard Biener wrote:
>>> The following retires the vcond{,u,eq} optabs by no longer using them
>>> from the middle-end.  Targets instead (should) implement vcond_mask
>>> and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
>>> possibly affected targets - those implementing these patterns,
>>> and in particular it lists mips, sparc and ia64 as targets that
>>> most definitely will regress while others might simply remove
>>> their vcond{,u,eq} patterns.
>>>
>>> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
>>> I know riscv doesn't implement any of the legacy optabs.  But less
>>> maintained vector targets might need adjustments.
>>
>> Thanks for making this change.  This patch bootstraps on ppc64{,le},
>> but both show one failure in gcc/testsuite/gcc.target/powerpc/pr66144-3.c.
>> Looking into it, I found it just exposed an oversight in the current
>> rs6000 vcond_mask support (the condition mask location is wrong), so I think
>> this change is fine for the rs6000 port.  I'll also test SPEC2017 with it
>> (plus the rs6000 vcond_mask change) soon.
> 
> Btw, for those targets where the patch works out fine it would be nice
> to delete their vcond{,u,eq} expanders (and double-check that doesn't
> cause issues on its own).

OK, will do, thanks for reminding!

> 
> Can target maintainers note whether their targets support all condition
> codes for their vector comparisons (including FP variants)?  And 

On Power, the hardware only supports EQ and GT for vector INT (though ISA 3.0
adds NE for b/h/w), and EQ, GT and GE for vector FP.  But the vec_cmp optab
supports {EQ,NE,LT,LE,GT,GE} for signed, {EQ,NE,LTU,LEU,GTU,GEU} for unsigned,
and {EQ,NE,LT,LE,GT,GE,UNORDERED,ORDERED,UNEQ,LTGT,UNGE,UNGT,UNLT,UNLE} for fp.

> whether they choose to implement all condition codes in vec_cmp
> and adjust with inversion / operand swapping for not supported cases?

Yes for the rs6000 port; some of them rely on define_insn_and_split.
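
To illustrate what "inversion / operand swapping" means here, a minimal scalar sketch for the signed integer case follows. It assumes only EQ and GT exist natively; hw_eq and hw_gt are stand-ins, not rs6000 code, and the real port works on vector masks rather than bool. Note that the inversion step is only this simple for integers; FP comparisons have to account for unordered operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two comparisons assumed to be available natively.  */
static bool hw_eq (int32_t a, int32_t b) { return a == b; }
static bool hw_gt (int32_t a, int32_t b) { return a > b; }

/* NE: invert EQ.  */
bool cmp_ne (int32_t a, int32_t b) { return !hw_eq (a, b); }

/* LT: swap the operands of GT.  */
bool cmp_lt (int32_t a, int32_t b) { return hw_gt (b, a); }

/* LE: invert GT.  */
bool cmp_le (int32_t a, int32_t b) { return !hw_gt (a, b); }

/* GE: swap the operands of GT, then invert.  */
bool cmp_ge (int32_t a, int32_t b) { return !hw_gt (b, a); }
```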

BR,
Kewen



Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-17 Thread Kewen.Lin
on 2024/6/17 10:31, Peter Bergner wrote:
> On 6/16/24 9:10 PM, Kewen.Lin wrote:
>> on 2024/6/15 01:05, Peter Bergner wrote:
>>> That said, the --with-cpu=power5 build without fortran did bootstrap and
>>> regtest with no regressions, so the build did test that code path and
>>> exposed no problems.
>>
>> OK, nice!  Thanks!
> 
> I assume this means you're "OK" with the updated patch, correct?

Yes, OK for trunk, thanks!

>>> Currently, TARGET_ALTIVEC_ABI is defined as:
>>>
>>>   #define TARGET_ALTIVEC_ABI rs6000_altivec_abi
>>>
>>> Would it make sense to redefine it to:
>>>
>>>   #define TARGET_ALTIVEC_ABI (TARGET_ALTIVEC && rs6000_altivec_abi)
>>>
>>> ...or add some code in rs6000 option handling to disable rs6000_altivec_abi
>>> when TARGET_ALTIVEC is false?  Or do we care enough to even change it? :-)
>>
>> Assuming the current code is robust enough (perfectly guarded by some
>> altivec-related condition, like this altivec register saving slot), there
>> may not be any actual errors.  But so as not to surprise people, I'm
>> inclined to add some option handling for it, like unsetting
>> rs6000_altivec_abi if !TARGET_ALTIVEC and giving a warning if it's
>> explicitly specified.  What do you think?
> 
> I like it, since if Altivec is disabled, having TARGET_ALTIVEC_ABI enabled
> makes no sense to me.  That is orthogonal to this bug though, so should be
> a separate patch.

Yes.

> Do you want to take a stab at writing that or do you want me to do that?

Either is fine with me; let me give it a shot.
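
Roughly what I have in mind, as a standalone sketch only.  The struct and function names below are hypothetical stand-ins for the real option state (TARGET_ALTIVEC, rs6000_altivec_abi and whether -mabi=altivec was given explicitly); the actual change would live in the rs6000 option override code and use GCC's own warning machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant option state; not GCC's types.  */
struct abi_opts
{
  bool altivec;               /* models TARGET_ALTIVEC */
  bool altivec_abi;           /* models rs6000_altivec_abi */
  bool altivec_abi_explicit;  /* models an explicit -mabi=altivec */
};

/* Unset the AltiVec ABI when AltiVec itself is disabled.  Warn only if
   the user asked for the ABI explicitly.  Returns true if it warned.  */
static bool
normalize_altivec_abi (struct abi_opts *o)
{
  assert (o != NULL);
  bool warned = false;
  if (!o->altivec && o->altivec_abi)
    {
      if (o->altivec_abi_explicit)
        {
          fprintf (stderr,
                   "warning: AltiVec ABI ignored without AltiVec support\n");
          warned = true;
        }
      o->altivec_abi = false;
    }
  return warned;
}
```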

BR,
Kewen



Ping^2 [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-06-17 Thread HAO CHEN GUI
Hi,
   Gently ping the series of patches.
 [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
 [PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html
 [PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html

Thanks
Gui Haochen

On 2024/6/3 10:40, HAO CHEN GUI wrote:
> Hi,
>   Gently ping the series of patches.
> [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
> [PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html
> [PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/5/24 14:02, HAO CHEN GUI wrote:
>> Hi,
>>   This patch implements optab_isinf for SFDF and IEEE128 using the test
>> data class instructions.
>>
>>   Compared with the previous version, the main change is to narrow
>> down the predicate for the float operand according to the review advice.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Implement optab_isinf for SFDF and IEEE128
>>
>> gcc/
>>  PR target/97786
>>  * config/rs6000/vsx.md (isinf2 for SFDF): New expand.
>>  (isinf2 for IEEE128): New expand.
>>
>> gcc/testsuite/
>>  PR target/97786
>>  * gcc.target/powerpc/pr97786-1.c: New test.
>>  * gcc.target/powerpc/pr97786-2.c: New test.
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index f135fa079bd..08cce11da60 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
>>operands[4] = CONST0_RTX (SImode);
>>  })
>>
>> +(define_expand "isinf2"
>> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
>> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
>> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
>> +{
>> +  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
>> +  DONE;
>> +})
>> +
>> +(define_expand "isinf2"
>> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
>> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
>> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
>> +{
>> +  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30)));
>> +  DONE;
>> +})
>> +
>>  ;; The VSX Scalar Test Negative Quad-Precision
>>  (define_expand "xststdcnegqp_"
>>[(set (match_dup 2)
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
>> new file mode 100644
>> index 000..c1c4f64ee8b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
>> @@ -0,0 +1,22 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
>> +
>> +int test1 (double x)
>> +{
>> +  return __builtin_isinf (x);
>> +}
>> +
>> +int test2 (float x)
>> +{
>> +  return __builtin_isinf (x);
>> +}
>> +
>> +int test3 (float x)
>> +{
>> +  return __builtin_isinff (x);
>> +}
>> +
>> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
>> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
>> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
>> new file mode 100644
>> index 000..ed305e8572e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target ppc_float128_hw } */
>> +/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
>> +
>> +int test1 (long double x)
>> +{
>> +  return __builtin_isinf (x);
>> +}
>> +
>> +int test2 (long double x)
>> +{
>> +  return __builtin_isinfl (x);
>> +}
>> +
>> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
>> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
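
As a rough software model of what the expanders above ask the hardware to do, the sketch below classifies a binary64 value into one data-class bit and tests it against a mask.  The bit layout (0x40 NaN, 0x20 +Inf, 0x10 -Inf, and so on) is my reading of the DCMX field and should be treated as an assumption; the function names are illustrative only.  Under that layout the 0x30 passed via GEN_INT selects exactly the two infinities.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Data-class bits, in the layout this sketch assumes for the DCMX mask.  */
enum
{
  DC_NAN = 0x40, DC_POS_INF = 0x20, DC_NEG_INF = 0x10,
  DC_POS_ZERO = 0x08, DC_NEG_ZERO = 0x04,
  DC_POS_DENORM = 0x02, DC_NEG_DENORM = 0x01
};

/* Classify a binary64 value into one class bit, or 0 for normal numbers.  */
static unsigned
data_class (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  int neg = (int) (bits >> 63);
  uint64_t exp = (bits >> 52) & 0x7ff;
  uint64_t frac = bits & 0xfffffffffffffULL;  /* low 52 bits */

  if (exp == 0x7ff)
    return frac ? DC_NAN : (neg ? DC_NEG_INF : DC_POS_INF);
  if (exp == 0)
    {
      if (frac == 0)
        return neg ? DC_NEG_ZERO : DC_POS_ZERO;
      return neg ? DC_NEG_DENORM : DC_POS_DENORM;
    }
  return 0;
}

/* Model of a "test data class" operation: 1 if X is in any class in MASK.  */
static int
test_data_class (double x, unsigned mask)
{
  assert ((mask & ~0x7fu) == 0);
  return (data_class (x) & mask) != 0;
}

/* isinf expressed as a data-class test with both infinity bits set.  */
static int
sketch_isinf (double x)
{
  return test_data_class (x, DC_POS_INF | DC_NEG_INF);
}
```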


Ping^2 [PATCHv5] Optab: add isfinite_optab for __builtin_isfinite

2024-06-17 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html

Thanks
Gui Haochen

On 2024/6/3 10:37, HAO CHEN GUI wrote:
> Hi,
>   All issues were addressed. Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/5/29 14:36, HAO CHEN GUI wrote:
>> Hi,
>>   This patch adds an optab for __builtin_isfinite. The finite check can be
>> implemented on rs6000 by a single instruction. An optab is needed so that
>> the builtin can be expanded to that instruction sequence.
>>
>>   The subsequent patches will implement the expander on rs6000.
>>
>>   Compared to the previous version, the main change is to specify that the
>> return value of the optab should be either 0 or 1.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> optab: Add isfinite_optab for isfinite builtin
>>
>> gcc/
>>  * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>>  for isfinite builtin.
>>  * optabs.def (isfinite_optab): New.
>>  * doc/md.texi (isfinite): Document.
>>
>>
>> patch.diff
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index f8d94c4b435..53e9d210541 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>errno_set = true; builtin_optab = ilogb_optab; break;
>>  CASE_FLT_FN (BUILT_IN_ISINF):
>>builtin_optab = isinf_optab; break;
>> -case BUILT_IN_ISNORMAL:
>>  case BUILT_IN_ISFINITE:
>> +  builtin_optab = isfinite_optab;
>> +  break;
>> +case BUILT_IN_ISNORMAL:
>>  CASE_FLT_FN (BUILT_IN_FINITE):
>>  case BUILT_IN_FINITED32:
>>  case BUILT_IN_FINITED64:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 5730bda80dc..3eb4216141e 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered 
>> with operand 2.
>>
>>  This pattern is not allowed to @code{FAIL}.
>>
>> +@cindex @code{isfinite@var{m}2} instruction pattern
>> +@item @samp{isfinite@var{m}2}
>> +Return 1 if operand 1 is a finite floating point number and 0
>> +otherwise.  @var{m} is a scalar floating point mode.  Operand 0
>> +has mode @code{SImode}, and operand 1 has mode @var{m}.
>> +
>>  @end table
>>
>>  @end ifset
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index ad14f9328b9..dcd77315c2a 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>>  OPTAB_D (hypot_optab, "hypot$a3")
>>  OPTAB_D (ilogb_optab, "ilogb$a2")
>>  OPTAB_D (isinf_optab, "isinf$a2")
>> +OPTAB_D (isfinite_optab, "isfinite$a2")
>>  OPTAB_D (issignaling_optab, "issignaling$a2")
>>  OPTAB_D (ldexp_optab, "ldexp$a3")
>>  OPTAB_D (log10_optab, "log10$a2")
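
To illustrate the "either 0 or 1" contract documented for the pattern, here is a small software sketch for binary64.  It is not the rs6000 expansion; sketch_isfinite and count_finite are made-up names.  The point is that a caller may then use the result arithmetically instead of only testing it against zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns exactly 1 for finite values and exactly 0 for Inf/NaN.  */
static int
sketch_isfinite (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* Finite means the biased exponent field is not all ones.  */
  return ((bits >> 52) & 0x7ff) != 0x7ff;
}

/* A caller that relies on the 0/1 contract by summing the results.  */
static size_t
count_finite (const double *v, size_t n)
{
  assert (v != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    count += (size_t) sketch_isfinite (v[i]);
  return count;
}
```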


Re: Patch ping

2024-06-17 Thread Segher Boessenkool
On Mon, Jun 17, 2024 at 03:26:52PM +0200, Jakub Jelinek wrote:
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653573.html
> patch.  While the committed and backported patch fixed PCH on PIE
> cc1/cc1plus etc. on PowerPC, it increased the size of the
> rs6000_init_generated_builtins function quite a lot.
> The above patch decreases it back, to even less than the size of
> the function before my fix.

A patch in the middle of a thread.  I missed it, sorry.  Please send
patches as separate threads?


Segher


[PATCH] diagnostics: Fix add_misspelling_candidates [PR115440]

2024-06-17 Thread Jakub Jelinek
Hi!

The option_map array for most entries contains just non-NULL opt0
{ "-Wno-", NULL, "-W", false, true },
{ "-fno-", NULL, "-f", false, true },
{ "-gno-", NULL, "-g", false, true },
{ "-mno-", NULL, "-m", false, true },
{ "--debug=", NULL, "-g", false, false },
{ "--machine-", NULL, "-m", true, false },
{ "--machine-no-", NULL, "-m", false, true },
{ "--machine=", NULL, "-m", false, false },
{ "--machine=no-", NULL, "-m", false, true },
{ "--machine", "", "-m", false, false },
{ "--machine", "no-", "-m", false, true },
{ "--optimize=", NULL, "-O", false, false },
{ "--std=", NULL, "-std=", false, false },
{ "--std", "", "-std=", false, false },
{ "--warn-", NULL, "-W", true, false },
{ "--warn-no-", NULL, "-W", false, true },
{ "--", NULL, "-f", true, false },
{ "--no-", NULL, "-f", false, true }
and so add_misspelling_candidates works correctly for it, but 3 out of
these,
{ "--machine", "", "-m", false, false },
{ "--machine", "no-", "-m", false, true },
and
{ "--std", "", "-std=", false, false },
use non-NULL opt1.  That says that
--machine foo
should map to
-mfoo
and
--machine no-foo
should map to
-mno-foo
and
--std c++17
should map to
-std=c++17
add_misspelling_candidates was not handling this, so it happily
registered, say,
--stdc++17
or
--machineavx512
(twice) as spelling alternatives, when those options aren't recognized.
Instead we support
--std c++17
or
--machine avx512
--machine no-avx512

The following patch fixes that.  On this particular testcase, we no longer
suggest anything, even though the candidates include, say,
--std c++17
or
-std=c++17
etc.
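
A simplified standalone sketch of the candidate construction follows.  The struct below keeps only three of the option_map fields, and build_alternative is a made-up helper rather than the real add_misspelling_candidates code; it also keeps the leading dash in the result for readability.  What it shows is the rule described above: when opt1 is non-NULL it is a separate argument, so a space goes between opt0 and opt1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Reduced form of an option_map entry (illustrative, not GCC's struct).  */
struct option_map_entry
{
  const char *opt0;        /* alternative spelling of the prefix */
  const char *opt1;        /* NULL, or the start of a separate argument */
  const char *new_prefix;  /* canonical prefix, e.g. "-m" or "-std=" */
};

/* Write the alternative spelling of OPT_TEXT under entry M into BUF.
   Returns false if OPT_TEXT does not start with M->new_prefix or if
   BUF is too small.  */
static bool
build_alternative (const struct option_map_entry *m, const char *opt_text,
                   char *buf, size_t buflen)
{
  assert (m != NULL && opt_text != NULL && buf != NULL);
  size_t prefix_len = strlen (m->new_prefix);
  if (strncmp (opt_text, m->new_prefix, prefix_len) != 0)
    return false;
  /* With a non-NULL opt1, opt0 and the rest are two arguments.  */
  int n = snprintf (buf, buflen, "%s%s%s%s", m->opt0,
                    m->opt1 ? " " : "", m->opt1 ? m->opt1 : "",
                    opt_text + prefix_len);
  return n >= 0 && (size_t) n < buflen;
}
```

With the entry { "--std", "", "-std=" } and the option text -std=c++17 this produces "--std c++17" rather than "--stdc++17".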

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-17  Jakub Jelinek  

PR driver/115440
* opts-common.cc (add_misspelling_candidates): If opt1 is non-NULL,
add a space and opt1 to the alternative suggestion text.

* g++.dg/cpp1z/pr115440.C: New test.

--- gcc/opts-common.cc.jj   2024-06-14 19:44:34.434236887 +0200
+++ gcc/opts-common.cc  2024-06-17 10:58:14.351178400 +0200
@@ -524,6 +524,7 @@ add_misspelling_candidates (auto_vecsafe_push (alternative);
}
 }
--- gcc/testsuite/g++.dg/cpp1z/pr115440.C.jj   2024-06-17 10:55:28.607380969 +0200
+++ gcc/testsuite/g++.dg/cpp1z/pr115440.C   2024-06-17 11:04:38.334075632 +0200
@@ -0,0 +1,8 @@
+// PR driver/115440
+// { dg-do compile { target c++17_only } }
+// { dg-options "--c++17" }
+
+int i;
+
+// { dg-bogus "unrecognized command-line option '--c\\\+\\\+17'; did you mean '--stdc\\\+\\\+17'" "" { target *-*-* } 0 }
+// { dg-error "unrecognized command-line option '--c\\\+\\\+17'" "" { target *-*-* } 0 }

Jakub



[Committed] RISC-V: Add configure check for Zaamo/Zalrsc assembler support

2024-06-17 Thread Patrick O'Neill
Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to prevent emitting Zaamo/Zalrsc in the arch string when the
assembler does not support it.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zaamo/zalrsc when not
supported by the assembler.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add zaamo/zalrsc assembler check.

Signed-off-by: Patrick O'Neill 
Acked-by: Palmer Dabbelt  # RISC-V
Reviewed-by: Palmer Dabbelt  # RISC-V
---
Tested using newlib rv64gc with binutils tip-of-tree and 2.42.

This results in calls being emitted when compiling for _zaamo_zalrsc
when the assembler does not support these extensions.

> cat amo.c
void foo (int* bar, int* baz)
{
  __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
}
> gcc -march=rv64id_zaamo_zalrsc -O3 amo.c
results in:
foo:
sext.w  a1,a1
li  a2,0
tail    __atomic_fetch_add_4

As a result there are some testsuite failures on zalrsc specific
testcases and when using an old version of binutils on non-a targets.
Not a cause for concern imo but worth calling out.
Also testcases that check for the default isa string will fail with
the old binutils since zaamo/zalrsc aren't emitted anymore.
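
For anyone wondering what the skip amounts to, here is a simplified standalone sketch.  The real code tests HAVE_AS_MARCH_ZAAMO_ZALRSC at build time inside riscv_subset_list::to_string; the runtime flag, the arch_string helper and the "every extension gets an underscore" rule below are simplifications for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build BASE followed by "_ext" for each entry of EXTS, leaving out
   zaamo/zalrsc when the assembler is assumed not to know them.
   Returns false if BUF is too small.  */
static bool
arch_string (const char *base, const char *const *exts, size_t n,
             bool as_has_zaamo_zalrsc, char *buf, size_t buflen)
{
  assert (base != NULL && buf != NULL);
  int w = snprintf (buf, buflen, "%s", base);
  if (w < 0 || (size_t) w >= buflen)
    return false;
  size_t used = (size_t) w;

  for (size_t i = 0; i < n; i++)
    {
      /* Mirrors the skip_zaamo_zalrsc checks in the patch.  */
      if (!as_has_zaamo_zalrsc
          && (strcmp (exts[i], "zaamo") == 0
              || strcmp (exts[i], "zalrsc") == 0))
        continue;
      w = snprintf (buf + used, buflen - used, "_%s", exts[i]);
      if (w < 0 || (size_t) w >= buflen - used)
        return false;
      used += (size_t) w;
    }
  return true;
}
```

With an old assembler the string for { zicsr, zaamo, zalrsc } on an rv64i base comes out as rv64i_zicsr, which is why the atomics end up as library calls in the example above.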
---
 gcc/common/config/riscv/riscv-common.cc | 11 +
 gcc/config.in   |  6 +
 gcc/configure   | 31 +
 gcc/configure.ac|  5 
 4 files changed, 53 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 78dfd6b1470..1dc1d9904c7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -916,6 +916,7 @@ riscv_subset_list::to_string (bool version_p) const
   riscv_subset_t *subset;
 
   bool skip_zifencei = false;
+  bool skip_zaamo_zalrsc = false;
   bool skip_zicsr = false;
   bool i2p0 = false;
 
@@ -943,6 +944,10 @@ riscv_subset_list::to_string (bool version_p) const
  a mistake in that binutils 2.35 supports zicsr but not zifencei.  */
   skip_zifencei = true;
 #endif
+#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
+  /* Skip since binutils 2.42 and earlier don't recognize zaamo/zalrsc.  */
+  skip_zaamo_zalrsc = true;
+#endif
 
   for (subset = m_head; subset != NULL; subset = subset->next)
 {
@@ -954,6 +959,12 @@ riscv_subset_list::to_string (bool version_p) const
  subset->name == "zicsr")
continue;
 
+  if (skip_zaamo_zalrsc && subset->name == "zaamo")
+   continue;
+
+  if (skip_zaamo_zalrsc && subset->name == "zalrsc")
+   continue;
+
   /* For !version_p, we only separate extension with underline for
 multi-letter extension.  */
   if (!first &&
diff --git a/gcc/config.in b/gcc/config.in
index e41b6dc97cd..acab3c0f126 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -629,6 +629,12 @@
 #endif
 
 
+/* Define if the assembler understands -march=rv*_zaamo_zalrsc. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_MARCH_ZAAMO_ZALRSC
+#endif
+
+
 /* Define if the assembler understands -march=rv*_zifencei. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_MARCH_ZIFENCEI
diff --git a/gcc/configure b/gcc/configure
index 94970e24051..9dc0b65dfaa 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -30820,6 +30820,37 @@ if test $gcc_cv_as_riscv_march_zifencei = yes; then
 
 $as_echo "#define HAVE_AS_MARCH_ZIFENCEI 1" >>confdefs.h
 
+fi
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for 
-march=rv32i_zaamo_zalrsc support" >&5
+$as_echo_n "checking assembler for -march=rv32i_zaamo_zalrsc support... " >&6; 
}
+if ${gcc_cv_as_riscv_march_zaamo_zalrsc+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_riscv_march_zaamo_zalrsc=no
+  if test x$gcc_cv_as != x; then
+$as_echo '' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -march=rv32i_zaamo_zalrsc -o 
conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_riscv_march_zaamo_zalrsc=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$gcc_cv_as_riscv_march_zaamo_zalrsc" >&5
+$as_echo "$gcc_cv_as_riscv_march_zaamo_zalrsc" >&6; }
+if test $gcc_cv_as_riscv_march_zaamo_zalrsc = yes; then
+
+$as_echo "#define HAVE_AS_MARCH_ZAAMO_ZALRSC 1" >>confdefs.h
+
 fi
 
 ;;
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 35475cf5aae..b2243e9954a 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5452,6 +5452,11 @@ configured with --enable-newlib-nano-formatted-io.])
   [-march=rv32i_zifencei2p0],,,
   [AC_DEFINE(HAVE_AS_MARCH_ZIFENCEI, 1,
 [

Re: [PATCH] RISC-V: Add configure check for Zaamo/Zalrsc assembler support

2024-06-17 Thread Patrick O'Neill



On 6/13/24 13:02, Jeff Law wrote:



On 6/12/24 5:20 PM, Patrick O'Neill wrote:

Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to prevent emitting Zaamo/Zalrsc in the arch string when the
assembler does not support it.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
  (riscv_subset_list::to_string): Skip zaamo/zalrsc when not
  supported by the assembler.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add zaamo/zalrsc assembler check.

OK.

It looks like you've got some unexpected diff fragments in configure -- 
all the LARGE_OFF_T stuff.  They look OK to me, but something like 
that is usually a sign of different autoconf versions.   I wouldn't 
lose any sleep if you left them as-is or removed those hunks before 
committing.


jeff


Removed the hunks and committed.
Sent the committed version to the list for the archiver.

I'll rebase the promotion RFC [1] on top and resolve the warning that 
Andreas Schwab noticed.


Patrick

[1]: 
https://patchwork.sourceware.org/project/gcc/patch/20240613233059.1451117-1-patr...@rivosinc.com/


Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-17 Thread Kewen.Lin
on 2024/6/15 01:05, Peter Bergner wrote:
> On 6/13/24 10:26 PM, Peter Bergner wrote:
>> On 6/13/24 9:26 PM, Kewen.Lin wrote:
> I understand this is just copied from the if arm, but if I read this 
> right, it can be
> simplified as:

 Ok, I'll retest with that simplification.
>>
>> So I retested a normal powerpc64le-linux build (ie, we default to Power8
>> with Altivec) and it bootstrapped and regtested with no regressions.
>> I then attempted a --with-cpu=power5 build to test the non-altivec path,
>> but both the unpatched and patched builds died building libgfortran with
>> the following error: "error: ‘_Float128’ is not supported on this target".
>> I believe that is related to PR113652.  I'll kick off the build again,
>> this time disabling Fortran and seeing if the build completes.
> 
> My bad for calling the --with-cpu=power5 bootstrap build on ELFv2 a "bug".
> It's not, since ELFv2 mandates a cpu with at least ISA 2.07 (eg. Power8)
> support and some of the libgfortran code was written assuming that, so what
> I was trying to do was really not supported (ie, luser error).
> 
> That said, the --with-cpu=power5 build without fortran did bootstrap and
> regtest with no regressions, so the build did test that code path and
> exposed no problems.

OK, nice!  Thanks!

> 
> 
> 
 That's what I expected too! :-)  However, I was surprised to learn that 
 -mno-altivec
 does *not* disable TARGET_ALTIVEC_ABI.  I had to explicitly use the -mabi= 
 option to
 expose the bug.
>>>
>>> oh, it's surprising, I learn something today! :) I guess it's not
>>> intentional but just no one noticed it, as it seems nonsense to have
>>> altivec ABI extension but not using any altivec features.
> 
> Currently, TARGET_ALTIVEC_ABI is defined as:
> 
>   #define TARGET_ALTIVEC_ABI rs6000_altivec_abi
> 
> Would it make sense to redefine it to:
> 
>   #define TARGET_ALTIVEC_ABI (TARGET_ALTIVEC && rs6000_altivec_abi)
> 
> ...or add some code in rs6000 option handling to disable rs6000_altivec_abi
> when TARGET_ALTIVEC is false?  or do we care enough to even change it? :-)

Assuming the current code is robust enough (perfectly guarded by some
altivec related condition like this altivec register saving slot), there
may not be any actual errors, but to avoid surprising people, I'm inclined
to add some option handling for it, like unsetting rs6000_altivec_abi if
!TARGET_ALTIVEC and giving a warning if it's explicitly specified, what do
you think?

BR,
Kewen



[committed] c++: Fix up floating point conversion rank comparison for _Float32 and float if float/double are same size [PR115511]

2024-06-17 Thread Jakub Jelinek
Hi!

On AVR and SH with some options sizeof (float) == sizeof (double) and
the 2 types have the same set of values.
http://eel.is/c++draft/conv.rank#2.2 for this says that double still
has bigger rank than float and http://eel.is/c++draft/conv.rank#2.2
says that extended type with the same set of values as more than one
standard floating point type shall have the same rank as double.
I've implemented the latter rule as
   if (cnt > 1 && mv2 == long_double_type_node)
 return -2;
with the case of _Float64/double/long double having the same mode (various
targets with -mlong-double-64) in mind.
But I never thought there are actually targets where float and double
are the same, and that needs handling too: if cnt > 1 (that is, the extended
type mv1 has the same set of values as 2 or 3 of float/double/long double)
and mv2 is float, we need to return 2, because mv1 in that case should
have the same rank as double and double has bigger rank than float.

Bootstrapped/regtested on x86_64-linux and i686-linux and checked with
a cross-compiler to avr-none on the testcase, which previously ICEd because
the function reported that _Float32 and float have the same rank, just a
different subrank, and also reported that _Float32 and double have the same
rank.
Committed to trunk as obvious, will backport to 14/13 soon.
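
The rule can be sketched in isolation as below.  This is a reduced model, not the body of cp_compare_floating_point_conversion_ranks: the enum, the function name and the return convention (2 = extended type has greater rank, -2 = standard type has greater rank, 1 = equal rank with the extended type having the higher subrank) are stated here as the sketch's own assumptions.

```c
#include <assert.h>

/* The standard type T2 that shares its value set with extended type T1.  */
enum std_fp { STD_FLOAT, STD_DOUBLE, STD_LONG_DOUBLE };

/* CNT is how many of float/double/long double have the same set of
   values as the extended type T1.  Compare T1's rank against T2.  */
static int
compare_extended_vs_standard (int cnt, enum std_fp t2)
{
  assert (cnt >= 1 && cnt <= 3);
  /* With more than one match T1 ranks like double, so long double
     still has the greater rank...  */
  if (cnt > 1 && t2 == STD_LONG_DOUBLE)
    return -2;
  /* ...and, the case this patch adds, float has the lower rank.  */
  if (cnt > 1 && t2 == STD_FLOAT)
    return 2;
  /* Otherwise equal rank; the extended type has the higher subrank.  */
  return 1;
}
```

On a target where float and double share a value set, _Float32 has cnt == 2, so comparing it against float yields 2; that matches the new test expecting the double overload to be chosen there.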

2024-06-17  Jakub Jelinek  

PR target/111343
PR c++/115511
* typeck.cc (cp_compare_floating_point_conversion_ranks): If an
extended floating point type mv1 has same set of values as more
than one standard floating point type and mv2 is float, return 2.

* g++.dg/cpp23/ext-floating18.C: New test.

--- gcc/cp/typeck.cc.jj 2024-06-04 13:19:03.755604346 +0200
+++ gcc/cp/typeck.cc    2024-06-17 10:32:02.063088961 +0200
@@ -393,6 +393,9 @@ cp_compare_floating_point_conversion_ran
  has higher rank.  */
   if (cnt > 1 && mv2 == long_double_type_node)
 return -2;
+  /* And similarly if t2 is float, t2 has lower rank.  */
+  if (cnt > 1 && mv2 == float_type_node)
+return 2;
   /* Otherwise, they have equal rank, but extended types
  (other than std::bfloat16_t) have higher subrank.
  std::bfloat16_t shouldn't have equal rank to any standard
--- gcc/testsuite/g++.dg/cpp23/ext-floating18.C.jj  2024-06-17 18:39:01.740020581 +0200
+++ gcc/testsuite/g++.dg/cpp23/ext-floating18.C 2024-06-17 18:47:19.152779782 +0200
@@ -0,0 +1,26 @@
+// P1467R9 - Extended floating-point types and standard names.
+// { dg-do compile { target c++23 } }
+// { dg-options "" }
+// { dg-add-options float32 }
+
+constexpr int foo (float) { return 1; }
+constexpr int foo (double) { return 2; }
+constexpr int foo (long double) { return 3; }
+
+#ifdef __STDCPP_FLOAT32_T__
+#if __FLT_MAX_EXP__ == __FLT32_MAX_EXP__ \
+&& __FLT_MAX_DIG__ == __FLT32_MAX_DIG__
+#if __FLT_MAX_EXP__ == __DBL_MAX_EXP__ \
+&& __FLT_MAX_DIG__ == __DBL_MAX_DIG__
+static_assert (foo (1.0f32) == 2);
+#else
+static_assert (foo (1.0f32) == 1);
+#endif
+#endif
+#endif
+#ifdef __STDCPP_FLOAT64_T__
+#if __DBL_MAX_EXP__ == __FLT64_MAX_EXP__ \
+&& __DBL_MAX_DIG__ == __FLT64_MAX_DIG__
+static_assert (foo (1.0f64) == 2);
+#endif
+#endif

Jakub



[PATCH] c-family: Fix -Warray-compare warning ICE [PR115290]

2024-06-17 Thread Jakub Jelinek
Hi!

The warning code uses %D to print the ARRAY_REF first operands.
That works in the most common case where those operands are decls, but
as can be seen on the following testcase, they can be other expressions
with array type.
Just changing %D to %E isn't enough, because then the diagnostics can
suggest something like
note: use '&(x) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0] == &(y) != 0 ? (int 
(*)[32])&a : (int (*)[32])&b[0]' to compare the addresses
which is a bad suggestion, the %E printing doesn't know that the
warning code will want to add & before it and [0] after it.
So, the following patch adds ()s around the operand as well, but does
that only for non-decls, for decls keeps it as &arr[0] like before.
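
The formatting rule is easy to see outside the diagnostics machinery.  The helper below is a plain snprintf sketch with made-up names; the operands are passed as already-printed strings plus a flag standing in for DECL_P.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the "&op0[0] CMP &op1[0]" hint.  Operands that are not plain
   declarations get parentheses so the added & and [0] bind to the whole
   expression.  Returns the snprintf result.  */
static int
format_array_compare_hint (char *buf, size_t buflen,
                           const char *op0, bool op0_is_decl,
                           const char *cmp,
                           const char *op1, bool op1_is_decl)
{
  assert (buf != NULL && op0 != NULL && cmp != NULL && op1 != NULL);
  return snprintf (buf, buflen, "&%s%s%s[0] %s &%s%s%s[0]",
                   op0_is_decl ? "" : "(", op0, op0_is_decl ? "" : ")",
                   cmp,
                   op1_is_decl ? "" : "(", op1, op1_is_decl ? "" : ")");
}
```

For two declarations a and b this gives "&a[0] == &b[0]" as before, while for conditional-expression operands each side is wrapped, e.g. "&(x ? a : b)[0]".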

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and release branches?

2024-06-17  Jakub Jelinek  

PR c/115290
* c-warn.cc (do_warn_array_compare): Use %E rather than %D for
printing op0 and op1; if those operands aren't decls, also print
parens around them.

* c-c++-common/Warray-compare-3.c: New test.

--- gcc/c-family/c-warn.cc.jj   2024-06-04 13:19:03.371609456 +0200
+++ gcc/c-family/c-warn.cc  2024-06-17 15:07:09.005737065 +0200
@@ -3832,11 +3832,16 @@ do_warn_array_compare (location_t locati
   /* C doesn't allow +arr.  */
   if (c_dialect_cxx ())
inform (location, "use unary %<+%> which decays operands to pointers "
-   "or %<&%D[0] %s &%D[0]%> to compare the addresses",
-   op0, op_symbol_code (code), op1);
+   "or %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses",
+   DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")",
+   op_symbol_code (code),
+   DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? "" : ")");
   else
-   inform (location, "use %<&%D[0] %s &%D[0]%> to compare the addresses",
-   op0, op_symbol_code (code), op1);
+   inform (location,
+   "use %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses",
+   DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")",
+   op_symbol_code (code),
+   DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? "" : ")");
 }
 }
 
--- gcc/testsuite/c-c++-common/Warray-compare-3.c.jj2024-06-17 
15:13:57.098422635 +0200
+++ gcc/testsuite/c-c++-common/Warray-compare-3.c   2024-06-17 
15:13:24.339849049 +0200
@@ -0,0 +1,13 @@
+/* PR c/115290 */
+/* { dg-do compile } */
+/* { dg-options "-Warray-compare" } */
+
+int a[32][32], b[32][32];
+
+int
+foo (int x, int y)
+{
+  return (x ? a : b) == (y ? a : b); /* { dg-warning "comparison between two arrays" } */
+/* { dg-message "use '&\\\(\[^\n\r]*\\\)\\\[0\\\] == &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the addresses" "" { target c } .-1 } */
+/* { dg-message "use unary '\\\+' which decays operands to pointers or '&\\\(\[^\n\r]*\\\)\\\[0\\\] == &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the addresses" "" { target c++ } .-2 } */
+}

Jakub



Re: [PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-17 Thread Kewen.Lin
Hi Paul,

on 2024/6/14 23:20, Paul Koning wrote:
> Ok, I understand better now.  But if those macros are supposed to be replaced 
> by hook functions, could you make that replacement part of the proposed patch?

The default implementation of the introduced hook mode_for_floating_type
returns SFmode for float and DFmode for double or long double, which matches
what the pdp11 port requires, so there is no need to add its own hook
implementation.
This patch series only redefines this hook macro with a customized hook
implementation for those ports which need something beyond the default.
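
In other words, the default behaves like the sketch below.  The enums and function names are simplified stand-ins invented for this mail, not the real GCC declarations; the point is only that the default yields 32/64/64 bits, the same values as the macros being removed from pdp11.h.

```c
#include <assert.h>

/* Simplified stand-ins for the type index and machine modes.  */
enum fp_type_index { FP_FLOAT, FP_DOUBLE, FP_LONG_DOUBLE };
enum fp_mode { MODE_SF, MODE_DF };

/* Default behaviour as described above: SFmode for float, DFmode for
   double and long double.  */
static enum fp_mode
default_mode_for_fp_type (enum fp_type_index ti)
{
  assert (ti == FP_FLOAT || ti == FP_DOUBLE || ti == FP_LONG_DOUBLE);
  return ti == FP_FLOAT ? MODE_SF : MODE_DF;
}

/* Size in bits of each mode in this sketch.  */
static unsigned
mode_bits (enum fp_mode m)
{
  return m == MODE_SF ? 32 : 64;
}
```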

BR,
Kewen

> 
>   paul
> 
>> On Jun 13, 2024, at 11:22 PM, Kewen.Lin  wrote:
>>
>> Hi Paul,
>>
>> on 2024/6/14 04:07, Paul Koning wrote:
>>> What is the effect of this change?  The original code intended to have
>>> "float" mean a 32 bit value, and "double" a 64 bit value.  There aren't any
>>> larger floats, so I defined the long double size as 64 also.  Is the right
>>> answer not to define it?
>>
>> Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE,
>> target code building will fail if it still has these macros.  As I'd like
>> to squash these target changes onto 09/52, I didn't note the
>> background/context here, sorry about that.
>>
>>>
>>> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE 
>>> and DOUBLE_TYPE_SIZE without explanation and without mention in the 
>>> changelog?
>>
>> Oops, thanks for catching!  I just noticed this sub-patch has an inconsistent
>> subject & changelog, I should have noticed this as it has a quite different
>> subject from the others. :(  With your finding, I just re-visited all the
>> other sub-patches, luckily they are consistent.
>>
>> The below is the updated revision, hope it looks good to you.  Thanks again.
>>
>> BR,
>> Kewen
>> -
>>
>> Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE
>>
>> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> defines in pdp11 port, as we want to replace these macros
>> with hook mode_for_floating_type and poison them.
>>
>> gcc/ChangeLog:
>>
>>* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove.
>>(DOUBLE_TYPE_SIZE): Likewise.
>>(LONG_DOUBLE_TYPE_SIZE): Likewise.
>> ---
>> gcc/config/pdp11/pdp11.h | 11 ---
>> 1 file changed, 11 deletions(-)
>>
>> diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
>> index 2446fea0b58..6c8e045bc57 100644
>> --- a/gcc/config/pdp11/pdp11.h
>> +++ b/gcc/config/pdp11/pdp11.h
>> @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3.  If not see
>> #define LONG_TYPE_SIZE 32
>> #define LONG_LONG_TYPE_SIZE 64
>>
>> -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64,
>> -   but that conflicts with Fortran language rules.  Since there is no
>> -   obvious reason why we should have that feature -- other targets
>> -   generally don't have float and double the same size -- I've removed
>> -   it.  Note that it continues to be true (for now) that arithmetic is
>> -   always done with 64-bit values, i.e., the FPU is always in "double"
>> -   mode.  */
>> -#define FLOAT_TYPE_SIZE 32
>> -#define DOUBLE_TYPE_SIZE   64
>> -#define LONG_DOUBLE_TYPE_SIZE  64
>> -
>> /* machine types from ansi */
>> #define SIZE_TYPE "short unsigned int" /* definition of size_t */
>> #define WCHAR_TYPE "short int" /* or long int */
>> --
>> 2.43.0
>>
>>
> 



Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Stefan Schulze Frielinghaus
On Mon, Jun 17, 2024 at 08:16:34AM +0200, Richard Biener wrote:
> On Mon, 17 Jun 2024, Kewen.Lin wrote:
> 
> > Hi Richi,
> > 
> > on 2024/6/14 18:31, Richard Biener wrote:
> > > The following retires vcond{,u,eq} optabs by stopping to use them
> > > from the middle-end.  Targets instead (should) implement vcond_mask
> > > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> > > possibly affected targets - those implementing these patterns,
> > > and in particular it lists mips, sparc and ia64 as targets that
> > > most definitely will regress while others might simply remove
> > > their vcond{,u,eq} patterns.
> > > 
> > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> > > I know riscv doesn't implement any of the legacy optabs.  But less
> > > maintained vector targets might need adjustments.
> > 
> > Thanks for making this change, this patch can be bootstrapped on ppc64{,le}
> > but both have one failure on gcc/testsuite/gcc.target/powerpc/pr66144-3.c,
> > by looking into it, I found it just exposed one oversight in the current
> > rs6000 vcond_mask support (the condition mask location is wrong), so I think
> > this change is fine for rs6000 port, I'll also test SPEC2017 for this (with
> > rs6000 vcond_mask change) soon.
> 
> Btw, for those targets where the patch works out fine it would be nice
> to delete their vcond{,u,eq} expanders (and double-check that doesn't
> cause issues on its own).
> 
> Can target maintainers note whether their targets support all condition
> codes for their vector comparisons (including FP variants)?  And 
> whether they choose to implement all condition codes in vec_cmp
> and adjust with inversion / operand swapping for not supported cases?

On s390 we support all comparison operations with inverse / operand
swapping via s390_expand_vec_compare.  However, we still have some
failures for which I opened PR115519.  Currently it is unclear to me
what precisely is missing and I will have a further look.  The vcond_mask
expander is also implemented for all modes.
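
For context, the equivalence the middle-end relies on after this change can be modelled on plain arrays as below: a fused compare-and-select (the retired vcond) gives the same result as a compare producing a mask (vec_cmp) followed by a masked select (vcond_mask).  The function names are illustrative and only the LT case on 32-bit lanes is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of vec_cmp: per-lane a < b, producing all-ones or all-zeros.  */
static void
vec_cmp_lt (const int32_t *a, const int32_t *b, int32_t *mask, size_t n)
{
  assert (a != NULL && b != NULL && mask != NULL);
  for (size_t i = 0; i < n; i++)
    mask[i] = a[i] < b[i] ? -1 : 0;
}

/* Model of vcond_mask: per-lane select of x or y driven by the mask.  */
static void
vcond_mask (const int32_t *mask, const int32_t *x, const int32_t *y,
            int32_t *out, size_t n)
{
  assert (mask != NULL && x != NULL && y != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = (x[i] & mask[i]) | (y[i] & ~mask[i]);
}

/* Model of the legacy fused vcond: a < b ? x : y per lane.  */
static void
vcond_lt (const int32_t *a, const int32_t *b, const int32_t *x,
          const int32_t *y, int32_t *out, size_t n)
{
  assert (a != NULL && b != NULL && x != NULL && y != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] < b[i] ? x[i] : y[i];
}
```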

Cheers,
Stefan

> 
> Thanks,
> Richard.
> 
> > BR,
> > Kewen
> > 
> > > 
> > > I want to get rid of those optabs for GCC 15.  If I don't hear from
> > > you I will assume your target is fine.
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > >   PR middle-end/114189
> > >   * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing.
> > >   (get_vcond_eq_icode): Likewise.
> > > ---
> > >  gcc/optabs-query.h | 13 -
> > >  1 file changed, 4 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> > > index 0cb2c21ba85..31fbce80175 100644
> > > --- a/gcc/optabs-query.h
> > > +++ b/gcc/optabs-query.h
> > > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, 
> > > machine_mode mask_mode)
> > > mode CMODE, unsigned if UNS is true, resulting in a value of mode 
> > > VMODE.  */
> > >  
> > >  inline enum insn_code
> > > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
> > > +get_vcond_icode (machine_mode, machine_mode, bool)
> > >  {
> > > -  enum insn_code icode = CODE_FOR_nothing;
> > > -  if (uns)
> > > -icode = convert_optab_handler (vcondu_optab, vmode, cmode);
> > > -  else
> > > -icode = convert_optab_handler (vcond_optab, vmode, cmode);
> > > -  return icode;
> > > +  return CODE_FOR_nothing;
> > >  }
> > >  
> > >  /* Return insn code for a conditional operator with a mask mode
> > > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, 
> > > machine_mode mmode)
> > > mode CMODE (only EQ/NE), resulting in a value of mode VMODE.  */
> > >  
> > >  inline enum insn_code
> > > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode)
> > > +get_vcond_eq_icode (machine_mode, machine_mode)
> > >  {
> > > -  return convert_optab_handler (vcondeq_optab, vmode, cmode);
> > > +  return CODE_FOR_nothing;
> > >  }
> > >  
> > >  /* Enumerates the possible extraction_insn operations.  */
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] rs6000: Shrink rs6000_init_generated_builtins size [PR115324]

2024-06-17 Thread Jakub Jelinek
Hi!

While my r15-1001-g4cf2de9b5268224 PCH PIE power fix change decreased the
.data section sizes (219792 -> 189336), it increased the size of already
huge rs6000_init_generated_builtins generated function, from 218328
to 228668 bytes.  That is because there are thousands of array references
to global arrays and we keep constructing the addresses of the arrays
again and again.

Ideally some optimization would figure out we have a single function which
has
    461   rs6000_overload_info
   1257   rs6000_builtin_info_fntype
   1768   rs6000_builtin_decls
   2548   rs6000_instance_info_fntype
array references and that maybe it might be a good idea to just preload
the addresses of those arrays into some register if it decreases code size
and doesn't slow things down.
The function actually is called just once and is huge, so code size is even
more important than speed, which is dominated by all the GC allocations
anyway.

Until that is done, here is a slightly cleaner version of the hack, which
makes the function noipa (so that LTO doesn't undo it) for GCC 8.1+ and
passes the 4 arrays as arguments to the function from the caller.
This decreases the function size from 228668 bytes to 207572 bytes.
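
A minimal sketch of the trick described above, not the actual generated code: every `demo_*` name and the `DEMO_NOIPA` guard are hypothetical, and plain `int` arrays stand in for the real `tree` tables. It only illustrates the shape of the change, where a function that is kept out of IPA optimizations receives the global arrays as pointer arguments from its single caller.

```c
/* Hypothetical guard: request noipa only where GCC is known to accept it
   (the mail above says GCC 8.1+); expand to nothing elsewhere.  */
#if defined (__GNUC__) && !defined (__clang__) \
    && (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ >= 1))
# define DEMO_NOIPA __attribute__ ((__noipa__))
#else
# define DEMO_NOIPA
#endif

#define DEMO_MAX 4

/* Stand-ins for the global tables; the real ones hold trees.  */
int demo_builtin_decls[DEMO_MAX];
int demo_builtin_info_fntype[DEMO_MAX];

/* The arrays arrive as arguments, so the body can index off its
   parameters instead of naming each global at every store.  */
DEMO_NOIPA void
demo_init_generated (int *builtin_decls, int *builtin_info_fntype)
{
  for (int i = 0; i < DEMO_MAX; i++)
    {
      builtin_info_fntype[i] = 100 + i;
      builtin_decls[i] = 200 + i;
    }
}

/* The single caller passes the globals once.  */
void
demo_init_builtins (void)
{
  demo_init_generated (demo_builtin_decls, demo_builtin_info_fntype);
}
```

Whether this actually shrinks the code depends on the target and on how expensive it is to form a global's address there; the numbers quoted above are for powerpc64le.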

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2024-06-17  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Change
declaration of rs6000_init_generated_builtins from no arguments
to 4 pointer arguments.
(write_init_bif_table): Change rs6000_builtin_info_fntype to
builtin_info_fntype and rs6000_builtin_decls to builtin_decls.
(write_init_ovld_table): Change rs6000_instance_info_fntype to
instance_info_fntype, rs6000_builtin_decls to builtin_decls and
rs6000_overload_info to overload_info.
(write_init_file): Add __noipa__ attribute to
rs6000_init_generated_builtins for GCC 8.1+ and change the function
from no arguments to 4 pointer arguments.  Change rs6000_builtin_decls
to builtin_decls.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Adjust
rs6000_init_generated_builtins caller.

--- gcc/config/rs6000/rs6000-gen-builtins.cc.jj 2024-06-03 23:11:02.662631144 
+0200
+++ gcc/config/rs6000/rs6000-gen-builtins.cc2024-06-03 23:38:31.727620920 
+0200
@@ -2376,7 +2376,10 @@ write_decls (void)
   "rs6000_instance_info_fntype[RS6000_INST_MAX];\n");
   fprintf (header_file, "extern ovldrecord rs6000_overload_info[];\n\n");
 
-  fprintf (header_file, "extern void rs6000_init_generated_builtins ();\n\n");
+  fprintf (header_file,
+  "extern void rs6000_init_generated_builtins (tree *, tree *,\n");
+  fprintf (header_file,
+  "\t\t\t\t\tovldrecord *, tree *);\n\n");
   fprintf (header_file,
   "extern bool rs6000_builtin_is_supported (rs6000_gen_builtins);\n");
   fprintf (header_file,
@@ -2651,7 +2654,7 @@ write_init_bif_table (void)
   for (int i = 0; i <= curr_bif; i++)
 {
   fprintf (init_file,
-  "  rs6000_builtin_info_fntype[RS6000_BIF_%s]"
+  "  builtin_info_fntype[RS6000_BIF_%s]"
   "\n= %s;\n",
   bifs[i].idname, bifs[i].fndecl);
 
@@ -2678,7 +2681,7 @@ write_init_bif_table (void)
}
 
   fprintf (init_file,
-  "  rs6000_builtin_decls[(int)RS6000_BIF_%s] = t\n",
+  "  builtin_decls[(int)RS6000_BIF_%s] = t\n",
   bifs[i].idname);
   fprintf (init_file,
   "= add_builtin_function (\"%s\",\n",
@@ -2719,7 +2722,7 @@ write_init_bif_table (void)
  fprintf (init_file, "}\n");
  fprintf (init_file, "  else\n");
  fprintf (init_file, "{\n");
- fprintf (init_file, "  rs6000_builtin_decls"
+ fprintf (init_file, "  builtin_decls"
   "[(int)RS6000_BIF_%s] = NULL_TREE;\n", bifs[i].idname);
  fprintf (init_file, "}\n");
}
@@ -2740,7 +2743,7 @@ write_init_ovld_table (void)
   for (int i = 0; i <= curr_ovld; i++)
 {
   fprintf (init_file,
-  "  rs6000_instance_info_fntype[RS6000_INST_%s]"
+  "  instance_info_fntype[RS6000_INST_%s]"
   "\n= %s;\n",
   ovlds[i].ovld_id_name, ovlds[i].fndecl);
 
@@ -2772,7 +2775,7 @@ write_init_ovld_table (void)
}
 
  fprintf (init_file,
-  "  rs6000_builtin_decls[(int)RS6000_OVLD_%s] = t\n",
+  "  builtin_decls[(int)RS6000_OVLD_%s] = t\n",
   stanza->stanza_id);
  fprintf (init_file,
   "= add_builtin_function (\"%s\",\n",
@@ -2793,7 +2796,7 @@ write_init_ovld_table (void)
  fprintf (init_file, "\n");
 
  fprintf (init_file,
-  "  rs6000_overload_info[RS6000_OVLD_%s - base]"
+  "  overload_info[RS6000_OVLD_%s - base]"
   ".first_instance\n",
  

Re: [PATCH] c-family: Fix -Warray-compare warning ICE [PR115290]

2024-06-17 Thread Marek Polacek
On Mon, Jun 17, 2024 at 07:09:03PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The warning code uses %D to print the ARRAY_REF first operands.
> That works in the most common case where those operands are decls, but
> as can be seen on the following testcase, they can be other expressions
> with array type.
> Just changing %D to %E isn't enough, because then the diagnostics can
> suggest something like
> note: use '&(x) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0] == &(y) != 0 ? 
> (int (*)[32])&a : (int (*)[32])&b[0]' to compare the addresses
> which is a bad suggestion, the %E printing doesn't know that the
> warning code will want to add & before it and [0] after it.
> So, the following patch adds ()s around the operand as well, but does
> that only for non-decls, for decls keeps it as &arr[0] like before.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and release branches?

Ok, thanks.
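
For readers following along, here is a small sketch of the form the note suggests once the operands are not plain declarations. The `demo_*` names are made up for illustration; the expression shape follows the new testcase quoted below.

```c
int demo_a[32][32], demo_b[32][32];

/* Writing (x ? demo_a : demo_b) == (y ? demo_a : demo_b) is what the new
   test expects -Warray-compare to flag.  The version below is the
   parenthesized &(...)[0] form the note proposes for non-decl operands.  */
int
demo_same_array (int x, int y)
{
  return &(x ? demo_a : demo_b)[0] == &(y ? demo_a : demo_b)[0];
}
```

For a plain declaration the suggestion stays in the shorter `&arr[0]` form, without the parentheses.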
 
> 2024-06-17  Jakub Jelinek  
> 
>   PR c/115290
>   * c-warn.cc (do_warn_array_compare): Use %E rather than %D for
>   printing op0 and op1; if those operands aren't decls, also print
>   parens around them.
> 
>   * c-c++-common/Warray-compare-3.c: New test.
> 
> --- gcc/c-family/c-warn.cc.jj 2024-06-04 13:19:03.371609456 +0200
> +++ gcc/c-family/c-warn.cc2024-06-17 15:07:09.005737065 +0200
> @@ -3832,11 +3832,16 @@ do_warn_array_compare (location_t locati
>/* C doesn't allow +arr.  */
>if (c_dialect_cxx ())
>   inform (location, "use unary %<+%> which decays operands to pointers "
> - "or %<&%D[0] %s &%D[0]%> to compare the addresses",
> - op0, op_symbol_code (code), op1);
> + "or %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses",
> + DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")",
> + op_symbol_code (code),
> + DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? "" : ")");
>else
> - inform (location, "use %<&%D[0] %s &%D[0]%> to compare the addresses",
> - op0, op_symbol_code (code), op1);
> + inform (location,
> + "use %<&%s%E%s[0] %s &%s%E%s[0]%> to compare the addresses",
> + DECL_P (op0) ? "" : "(", op0, DECL_P (op0) ? "" : ")",
> + op_symbol_code (code),
> + DECL_P (op1) ? "" : "(", op1, DECL_P (op1) ? "" : ")");
>  }
>  }
>  
> --- gcc/testsuite/c-c++-common/Warray-compare-3.c.jj  2024-06-17 
> 15:13:57.098422635 +0200
> +++ gcc/testsuite/c-c++-common/Warray-compare-3.c 2024-06-17 
> 15:13:24.339849049 +0200
> @@ -0,0 +1,13 @@
> +/* PR c/115290 */
> +/* { dg-do compile } */
> +/* { dg-options "-Warray-compare" } */
> +
> +int a[32][32], b[32][32];
> +
> +int
> +foo (int x, int y)
> +{
> +  return (x ? a : b) == (y ? a : b); /* { dg-warning "comparison between two 
> arrays" } */
> +/* { dg-message "use '&\\\(\[^\n\r]*\\\)\\\[0\\\] == 
> &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the addresses" "" { target c } .-1 } 
> */
> +/* { dg-message "use unary '\\\+' which decays operands to pointers or 
> '&\\\(\[^\n\r]*\\\)\\\[0\\\] == &\\\(\[^\n\r]*\\\)\\\[0\\\]' to compare the 
> addresses" "" { target c++ } .-2 } */
> +}
> 
>   Jakub
> 

Marek



Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-17 Thread Bert Wesarg
Hi,

On Thu, Jun 6, 2024 at 7:05 PM Andrew Pinski  wrote:
>
> On Thu, Jun 6, 2024 at 9:00 AM David Malcolm  wrote:
> >
> > On Thu, 2024-06-06 at 08:40 -0700, Andrew Pinski wrote:
> > > On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg
> > >  wrote:
> > > >
> > > > Dear David,
> > > >
> > > > On Tue, May 28, 2024 at 10:07 PM David Malcolm
> > > >  wrote:
> > > > >
> > > > > No functional change intended.
> > > > >
> > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > > Pushed to trunk as r15-874-g9bda2c4c81b668.
> > > > >
> > > > > libcpp/ChangeLog:
> > > > > * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> > > > > * include/label-text.h: New file.
> > > > > * include/rich-location.h: Include "label-text.h".
> > > > > (class label_text): Move to label-text.h.
> > > > >
> > > > > Signed-off-by: David Malcolm 
> > > > > ---
> > > > >  libcpp/Makefile.in |   2 +-
> > > > >  libcpp/include/label-text.h| 102
> > > > > +
> > > > >  libcpp/include/rich-location.h |  79 +
> > > > >  3 files changed, 105 insertions(+), 78 deletions(-)
> > > > >  create mode 100644 libcpp/include/label-text.h
> > > > >
> > > > > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > > > > index ebbca3fb..7e47153264c0 100644
> > > > > --- a/libcpp/Makefile.in
> > > > > +++ b/libcpp/Makefile.in
> > > > > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
> > > > >
> > > > >  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
> > > > >  include/cpplib.h include/line-map.h include/mkdeps.h
> > > > > include/symtab.h \
> > > > > -include/rich-location.h
> > > > > +include/rich-location.h include/label-text.h
> > > >
> > > > > this does not seem to be enough for the new header to be
> > > > > installed.
> > > > > I get compile errors when compiling a plug-in with this patch:
> > > >
> > > > In file included from
> > > > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-
> > > > gnu/15.0.0/plugin/include/diagnostic.h:24,
> > > > from
> > > > /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-
> > > > plugin/../src/adapters/compiler/gcc-
> > > > plugin/scorep_plugin_inst_descriptor.cpp:43:
> > > > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-
> > > > gnu/15.0.0/plugin/include/rich-location.h:25:10:
> > > > fatal error: label-text.h: No such file or directory
> > > > 25 | #include "label-text.h"
> > > > > ^~
> > > > compilation terminated.
> > >
> > > I have a fix which I am testing.
> >
> > Likewise (and sorry about the breakage)
>
> Committed as r15-1076-g6e6471806d886b .

Thanks. I can confirm that my external plugin builds again.

Bert

>
> >
> > Dave
> >


Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-17 Thread Peter Bergner
On 6/14/24 1:37 PM, Carl Love wrote:
> Per the additional feedback after patch: 
> 
>   commit c892525813c94b018464d5a4edc17f79186606b7
>   Author: Carl Love 
>   Date:   Tue Jun 11 14:01:16 2024 -0400
> 
>   rs6000, altivec-2-runnable.c should be a runnable test
> 
>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>   test.  This patch changes the dg-do command argument to run.
> 
>   gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.

Test case altivec-1-runnable.c seems to have the same issue, in that it
is currently a dg-do compile test case rather than the intended dg-do run.
Can you have a look at changing that to dg-do run too?  My guess is that
this one will want something similar to some other altivec test cases, ala:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-require-effective-target powerpc_altivec_ok } */
/* { dg-options "-O2 -maltivec -mabi=altivec" } */


That said, I don't like not having a -mdejagnu-cpu=... here.
I think for our server cpus, this is fine, but on an embedded system
with an old ISA default for -mcpu=... (so we'd be doing a dg-do compile),
just adding -maltivec to that default may not make much sense
and probably should be an error.  Maybe something like:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-require-effective-target powerpc_altivec_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */

...makes more sense?   Ke Wen & Segher, thoughts on that?
Ke Wen, should powerpc_altivec_ok be powerpc_altivec here???

Peter




Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-17 Thread Joseph Myers
On Fri, 14 Jun 2024, Jonathan Wakely wrote:

> Both, ideally. The libstdc++ test should definitely be fixed because
> it fails with released versions of glibc already in the wild. But
> glibc should also be fixed because it's a standards conformance issue.

The __ctx macro used in various sys/ucontext.h headers prepends __ in 
standards conformance modes (the point being to avoid breaking the API 
outside such modes when we fixed the namespace issues).

#ifdef __USE_MISC
# define __ctx(fld) fld
#else
# define __ctx(fld) __ ## fld
#endif

(bits/sigcontext.h didn't get any such fixes as it's not included at all 
in standards conformance modes, only if __USE_MISC.)
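
To make the mechanism concrete, here is a small self-contained sketch of the same token-pasting pattern. It is not the glibc header: `DEMO_USE_MISC`, `demo_ctx` and `struct demo_mcontext` are hypothetical stand-ins, and a `demo_` prefix replaces the reserved `__` prefix.

```c
/* DEMO_USE_MISC is a hypothetical stand-in for __USE_MISC.  */
#ifdef DEMO_USE_MISC
# define demo_ctx(fld) fld
#else
# define demo_ctx(fld) demo_ ## fld
#endif

/* Two-level stringification so the macro argument is expanded first.  */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2 (x)

/* The member is spelled gregs with DEMO_USE_MISC, demo_gregs without.  */
struct demo_mcontext
{
  long demo_ctx (gregs)[4];
};

/* Code inside the header keeps using the macro, so it works either way.  */
static long
demo_first_greg (const struct demo_mcontext *mc)
{
  return mc->demo_ctx (gregs)[0];
}

/* Reports which spelling the current mode selected.  */
static const char *
demo_field_name (void)
{
  return DEMO_STR (demo_ctx (gregs));
}
```

User code that spells the member name directly only compiles in the mode whose spelling it uses, which is the situation the libstdc++ names tests run into.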

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-17 Thread Paul Koning
Thanks Kewen.

Given that background, the patch is OK.

paul

> On Jun 16, 2024, at 10:01 PM, Kewen.Lin  wrote:
> 
> Hi Paul,
> 
> on 2024/6/14 23:20, Paul Koning wrote:
>> Ok, I understand better now.  But if those macros are supposed to be 
>> replaced by hook functions, could you make that replacement part of the 
>> proposed patch?
> 
> The default implementation of the introduced hook mode_for_floating_type
> returns SFmode for float and DFmode for double or long double, which matches
> what pdp11 port requires, so there is no need to add its own hook 
> implementation.
> This patch series only re-defines this hook macro with the customized hook
> implementation for those ports which need something beyond the default.
> 
> BR,
> Kewen
> 
>> 
>>  paul
>> 
>>> On Jun 13, 2024, at 11:22 PM, Kewen.Lin  wrote:
>>> 
>>> Hi Paul,
>>> 
>>> on 2024/6/14 04:07, Paul Koning wrote:
>>>> What is the effect of this change?  The original code intended to have 
>>>> "float" mean a 32 bit value, and "double" a 64 bit value.  There aren't 
>>>> any larger floats, so I defined the long double size as 64 also.  Is the 
>>>> right answer not to define it?
>>> 
>>> Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE, 
>>> target code building will fail
>>> if it still has these macros.  As I'd like to squash these target changes 
>>> onto 09/52, so I didn't note
>>> the background/context here, sorry about that.
>>> 
 
>>>> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE 
>>>> and DOUBLE_TYPE_SIZE without explanation and without mention in the 
>>>> changelog?
>>> 
>>> Oops, thanks for catching!  I just noticed this sub-patch has inconsistent 
>>> subject & changelog, I should
>>> have noticed this as it has a quite different subject from the others. :(  
>>> With your finding, I just
>>> re-visited all the other sub-patches, luckily they are consistent.
>>> 
>>> The below is the updated revision, hope it looks good to you.  Thanks again.
>>> 
>>> BR,
>>> Kewen
>>> -
>>> 
>>> Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE
>>> 
>>> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>>> defines in pdp11 port, as we want to replace these macros
>>> with hook mode_for_floating_type and poison them.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>   * config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove.
>>>   (DOUBLE_TYPE_SIZE): Likewise.
>>>   (LONG_DOUBLE_TYPE_SIZE): Likewise.
>>> ---
>>> gcc/config/pdp11/pdp11.h | 11 ---
>>> 1 file changed, 11 deletions(-)
>>> 
>>> diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
>>> index 2446fea0b58..6c8e045bc57 100644
>>> --- a/gcc/config/pdp11/pdp11.h
>>> +++ b/gcc/config/pdp11/pdp11.h
>>> @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3.  If not see
>>> #define LONG_TYPE_SIZE 32
>>> #define LONG_LONG_TYPE_SIZE 64
>>> 
>>> -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64,
>>> -   but that conflicts with Fortran language rules.  Since there is no
>>> -   obvious reason why we should have that feature -- other targets
>>> -   generally don't have float and double the same size -- I've removed
>>> -   it.  Note that it continues to be true (for now) that arithmetic is
>>> -   always done with 64-bit values, i.e., the FPU is always in "double"
>>> -   mode.  */
>>> -#define FLOAT_TYPE_SIZE 32
>>> -#define DOUBLE_TYPE_SIZE   64
>>> -#define LONG_DOUBLE_TYPE_SIZE  64
>>> -
>>> /* machine types from ansi */
>>> #define SIZE_TYPE "short unsigned int" /* definition of size_t */
>>> #define WCHAR_TYPE "short int" /* or long int */
>>> --
>>> 2.43.0
>>> 
>>> 
>> 
> 



[PATCH] c++: ICE with generic lambda and pack expansion [PR115425]

2024-06-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In r13-272 we hardened the *_PACK_EXPANSION and *_ARGUMENT_PACK macros.
That trips up here because make_pack_expansion returns error_mark_node
and we access that with PACK_EXPANSION_LOCAL_P.
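
A rough C analogy of the guard this patch adds, assuming nothing about the real tree API: `struct demo_node`, `demo_error_mark` and both functions are hypothetical stand-ins for trees, error_mark_node, make_pack_expansion and tsubst_pack_expansion.

```c
#include <stddef.h>

/* Stand-in for a tree node carrying a pack-expansion flag.  */
struct demo_node
{
  int local_p;
};

/* Stand-in for error_mark_node: a unique sentinel, not a real expansion.  */
static struct demo_node demo_error_mark;

/* Stand-in for a constructor that can fail and return the sentinel.  */
static struct demo_node *
demo_make_expansion (struct demo_node *pattern)
{
  return pattern == NULL ? &demo_error_mark : pattern;
}

/* Check for the sentinel before copying flags, mirroring the early
   return added to tsubst_pack_expansion.  */
static struct demo_node *
demo_subst_expansion (const struct demo_node *t, struct demo_node *pattern)
{
  struct demo_node *result = demo_make_expansion (pattern);
  if (result == &demo_error_mark)
    return &demo_error_mark;
  result->local_p = t->local_p;
  return result;
}
```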

PR c++/115425

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_expansion): Return error_mark_node if
make_pack_expansion doesn't work out.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic12.C: New test.
---
 gcc/cp/pt.cc  |  2 ++
 gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C | 25 +++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 607753ae6b7..e676372f75b 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -13775,6 +13775,8 @@ tsubst_pack_expansion (tree t, tree args, 
tsubst_flags_t complain,
   else
result = tsubst (pattern, args, complain, in_decl);
   result = make_pack_expansion (result, complain);
+  if (result == error_mark_node)
+   return error_mark_node;
   PACK_EXPANSION_LOCAL_P (result) = PACK_EXPANSION_LOCAL_P (t);
   PACK_EXPANSION_SIZEOF_P (result) = PACK_EXPANSION_SIZEOF_P (t);
   if (PACK_EXPANSION_AUTO_P (t))
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C
new file mode 100644
index 000..219529c7c32
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C
@@ -0,0 +1,25 @@
+// PR c++/115425
+// { dg-do compile { target c++20 } }
+
+using size_t = decltype(sizeof(0));
+
+template 
+struct X {};
+
+template
+void foo(X);
+
+template
+struct S;
+
+template
+auto test() {
+  constexpr static auto x = foo>(); // { dg-error "no 
matching function" }
+  return [](X) {
+(typename S::type{}, ...);
+  }(X<__integer_pack (0)...>{});
+}
+
+int main() {
+  test();
+}

base-commit: b63c7d92012f92e0517190cf263d29bbef8a06bf
-- 
2.45.1



[PATCH V3 0/2] Fix ICE with vwsll combine on 32bit targets

2024-06-17 Thread Edwin Lu
The following testcases have been failing on rv32 targets since 
r15-953-gaf4bf422a69:
FAIL: gcc.target/riscv/rvv/autovec/binop/vwsll-1.c (internal compiler
error: in maybe_legitimize_operand, at optabs.cc:8056)
FAIL: gcc.target/riscv/rvv/autovec/binop/vwsll-1.c (test for excess
errors)

Fix the bug and also robustify our emit_insn by making an assertion
check unconditional.

I'm not sure if this ICE warrants its own separate testcase since it is
already being tested. I do have a minimal testcase on hand if we would
like to add one.

V2: Remove subreg condition and change assert to internal error

V3: Update the _trunc_scalar splitter as well

Edwin Lu (2):
  RISC-V: Fix vwsll combine on rv32 targets
  RISC-V: Move mode assertion out of conditional branch in emit_insn

 gcc/config/riscv/autovec-opt.md |  6 ++
 gcc/config/riscv/riscv-v.cc | 25 +++--
 2 files changed, 21 insertions(+), 10 deletions(-)

-- 
2.34.1



[PATCH V3 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-17 Thread Edwin Lu
On rv32 targets, vwsll_zext1_scalar_ would trigger an ICE in
maybe_legitimize_operand when zero extending a uint32 to uint64 due
to a mismatch between the input operand's mode (DI) and the expanded insn
operand's mode (Pmode == SI). Ensure that the modes of the operands match.
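
As a scalar analogy only (this is plain C, not RTL, and the `demo_*` helpers are hypothetical): taking the low 32-bit part of a value that was zero extended from 32 bits gives back the original value, which is the intuition behind narrowing the operand to the mode the insn expects.

```c
#include <stdint.h>

/* Scalar stand-in for taking the low 32-bit part of a 64-bit value.  */
static uint32_t
demo_lowpart32 (uint64_t wide)
{
  return (uint32_t) wide;
}

/* Zero-extend a 32-bit shift amount, then take the low part again.
   Returns 1 when the original value comes back unchanged.  */
static int
demo_zext_roundtrip_ok (uint32_t shift)
{
  uint64_t widened = (uint64_t) shift;
  return demo_lowpart32 (widened) == shift;
}
```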

Tested on rv32/64 gcv newlib. Letting CI perform additional testing

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Fix mode mismatch

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
V2: Remove subreg check

V3: Update _trunc_scalar splitter as well
---
 gcc/config/riscv/autovec-opt.md | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 6a2eabbd854..d7a3cfd4602 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1517,8 +1517,7 @@ (define_insn_and_split "*vwsll_zext1_scalar_"
   "&& 1"
   [(const_int 0)]
   {
-if (GET_CODE (operands[2]) == SUBREG)
-  operands[2] = SUBREG_REG (operands[2]);
+operands[2] = gen_lowpart (Pmode, operands[2]);
 insn_code icode = code_for_pred_vwsll_scalar (mode);
 riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
 DONE;
@@ -1584,8 +1583,7 @@ (define_insn_and_split "*vwsll_zext1_trunc_scalar_"
   "&& 1"
   [(const_int 0)]
   {
-if (GET_CODE (operands[2]) == SUBREG)
-  operands[2] = SUBREG_REG (operands[2]);
+operands[2] = gen_lowpart (Pmode, operands[2]);
 insn_code icode = code_for_pred_vwsll_scalar (mode);
 riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
 DONE;
-- 
2.34.1



[PATCH V3 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn

2024-06-17 Thread Edwin Lu
When emitting insns, we have an early assertion to ensure the input
operand's mode and the expanded operand's mode are the same; however, it
does not perform this check if the pattern does not have an explicit
machine mode specifying the operand. In this scenario, it will always
assume that mode = Pmode to correctly satisfy the
maybe_legitimize_operand check. However, there may be problems when
working in 32 bit environments.

Make the assert unconditional and replace it with an internal error for
more descriptive logging.
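
A standalone sketch of the idea, assuming nothing about GCC's own diagnostics machinery: `demo_check_operand_mode` is a hypothetical helper that reports a mismatch with the same pieces of information the patch prints, with strings standing in for machine modes.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 when the modes agree.  Otherwise writes a descriptive
   message into BUF and returns 0, instead of a bare assertion failure.  */
static int
demo_check_operand_mode (const char *expected, const char *got, int opno,
                         const char *insn, char *buf, size_t len)
{
  /* An empty string stands in for VOIDmode, which is always accepted.  */
  if (got[0] == '\0' || strcmp (expected, got) == 0)
    return 1;

  snprintf (buf, len,
            "expected mode %s for operand %d of insn %s but got mode %s",
            expected, opno, insn, got);
  return 0;
}
```

The check runs for every operand, including those whose expected mode was defaulted, which is the part the patch changes.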

gcc/ChangeLog:

* config/riscv/riscv-v.cc: Move assert out of conditional block

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
V2: change assert to internal error

V3: No change
---
 gcc/config/riscv/riscv-v.cc | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8911f5783c8..5306711c1b7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -50,6 +50,7 @@
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
 #include "predict.h"
+#include "errors.h"
 
 using namespace riscv_vector;
 
@@ -290,11 +291,17 @@ public:
   always Pmode.  */
if (mode == VOIDmode)
  mode = Pmode;
-   else
- /* Early assertion ensures same mode since maybe_legitimize_operand
-will check this.  */
- gcc_assert (GET_MODE (ops[opno]) == VOIDmode
- || GET_MODE (ops[opno]) == mode);
+
+   /* Early assertion ensures same mode since maybe_legitimize_operand
+  will check this.  */
+   machine_mode required_mode = GET_MODE (ops[opno]);
+   if (required_mode != VOIDmode && required_mode != mode)
+ internal_error ("expected mode %s for operand %d of "
+ "insn %s but got mode %s.\n",
+ GET_MODE_NAME (mode),
+ opno,
+ insn_data[(int) icode].name,
+ GET_MODE_NAME (required_mode));
 
add_input_operand (ops[opno], mode);
   }
@@ -346,7 +353,13 @@ public:
 else if (m_insn_flags & VXRM_RDN_P)
   add_rounding_mode_operand (VXRM_RDN);
 
-gcc_assert (insn_data[(int) icode].n_operands == m_opno);
+
+if (insn_data[(int) icode].n_operands != m_opno)
+  internal_error ("invalid number of operands for insn %s, "
+ "expected %d but got %d.\n",
+ insn_data[(int) icode].name,
+ insn_data[(int) icode].n_operands, m_opno);
+
 expand (icode, any_mem_p);
   }
 
-- 
2.34.1



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-17 Thread Jonathan Wakely
On Mon, 17 Jun 2024 at 19:03, Joseph Myers  wrote:
>
> On Fri, 14 Jun 2024, Jonathan Wakely wrote:
>
> > Both, ideally. The libstdc++ test should definitely be fixed because
> > it fails with released versions of glibc already in the wild. But
> > glibc should also be fixed because it's a standards conformance issue.
>
> The __ctx macro used in various sys/ucontext.h headers prepends __ in
> standards conformance modes (the point being to avoid breaking the API
> outside such modes when we fixed the namespace issues).
>
> #ifdef __USE_MISC
> # define __ctx(fld) fld
> #else
> # define __ctx(fld) __ ## fld
> #endif
>
> (bits/sigcontext.h didn't get any such fixes as it's not included at all
> in standards conformance modes, only if __USE_MISC.)

I see, thanks. So it's not a problem in C, only in C++ due to G++
defining _GNU_SOURCE.

Let's just change the libstdc++ tests then.



Re: [C PATCH, v3] Fix for redeclared enumerator initialized with different type [PR115109]

2024-06-17 Thread Joseph Myers
On Sat, 15 Jun 2024, Martin Uecker wrote:

> The patch fails on arm because the tests make assumptions
> about enums that are not true everywhere. Should we just 
> limit the tests to x86?

For compilation tests, using -fno-short-enums should work.  That won't 
work for link / execute tests, but in those cases you can use { target { ! 
short_enums } }.  (If there are other issues beyond a short-enums default, 
other effective-targets may be needed.)
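
As an illustration of the first option, a compile-only test could start roughly like this. The enum and the helper are hypothetical and not taken from the patch under review; only the two directives reflect the suggestion above.

```c
/* { dg-do compile } */
/* { dg-options "-fno-short-enums" } */

enum demo_e { DEMO_A, DEMO_B };

/* With -fno-short-enums the enum is expected to be at least as wide as
   int, even on targets that would otherwise shrink it.  */
int
demo_enum_not_narrower_than_int (void)
{
  return sizeof (enum demo_e) >= sizeof (int);
}
```

For a link or execute test the options line would be dropped and the first directive would instead restrict the test to targets matching { ! short_enums }, as described above.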

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-17 Thread Peter Bergner
On 6/16/24 9:10 PM, Kewen.Lin wrote:
> on 2024/6/15 01:05, Peter Bergner wrote:
>> That said, the --with-cpu=power5 build without fortran did bootstrap and
>> regtest with no regressions, so the build did test that code path and
>> exposed no problems.
> 
> OK, nice!  Thanks!

I assume this means you're "OK" with the updated patch, correct?




>> Currently, TARGET_ALTIVEC_ABI is defined as:
>>
>>   #define TARGET_ALTIVEC_ABI rs6000_altivec_abi
>>
>> Would it make sense to redefine it to:
>>
>>   #define TARGET_ALTIVEC_ABI (TARGET_ALTIVEC && rs6000_altivec_abi)
>>
>> ...or add some code in rs6000 option handling to disable rs6000_altivec_abi
>> when TARGET_ALTIVEC is false?  Or do we care enough to even change it? :-)
> 
> Assuming the current code is robust enough (perfectly guarded by some
> altivec related condition like this altivec register saving slot), there
> may not be any actual errors, but to avoid surprising people, I'm inclined
> to add some option handling for it, like unsetting rs6000_altivec_abi if
> !TARGET_ALTIVEC and giving a warning if it's explicitly specified.  What do
> you think?

I like it, since if Altivec is disabled, having TARGET_ALTIVEC_ABI enabled
makes no sense to me.  That is orthogonal to this bug though, so should be a
separate patch.  Do you want to take a stab at writing that or do you want
me to do that?


Peter




Re: [committed] testsuite: Add -Wno-psabi to vshuf-mem.C test

2024-06-17 Thread Jakub Jelinek
On Mon, Jun 17, 2024 at 09:09:37PM +0200, Andreas Krebbel wrote:
> On 6/14/24 20:03, Jakub Jelinek wrote:
> > Also wonder about the
> > // { dg-additional-options "-march=z14" { target s390*-*-* } }
> > line, doesn't that mean the test will FAIL on all pre-z14 HW?
> > Shouldn't it use some z14_runtime or similar effective target, or
> > check in main (in that case copied over to g++.target/s390) whether
> > z14 instructions can be actually used at runtime?
> 
> Oh right. I'll remove that line and replicate the testcase in the arch
> specific test dir.

Though, looking around some more, perhaps
// { dg-additional-options "-march=z14" { target s390_vxe } }
might be all that is needed, even in current dir.

Jakub



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-17 Thread Frank Scheiner

On 17.06.24 20:53, Jonathan Wakely wrote:
> On Mon, 17 Jun 2024 at 19:03, Joseph Myers  wrote:
>>
>> On Fri, 14 Jun 2024, Jonathan Wakely wrote:
>>
>>> Both, ideally. The libstdc++ test should definitely be fixed because
>>> it fails with released versions of glibc already in the wild. But
>>> glibc should also be fixed because it's a standards conformance issue.
>>
>> The __ctx macro used in various sys/ucontext.h headers prepends __ in
>> standards conformance modes (the point being to avoid breaking the API
>> outside such modes when we fixed the namespace issues).
>>
>> #ifdef __USE_MISC
>> # define __ctx(fld) fld
>> #else
>> # define __ctx(fld) __ ## fld
>> #endif
>>
>> (bits/sigcontext.h didn't get any such fixes as it's not included at all
>> in standards conformance modes, only if __USE_MISC.)
>
> I see, thanks. So it's not a problem in C, only in C++ due to G++
> defining _GNU_SOURCE.
>
> Let's just change the libstdc++ tests then.


Great, I tested that on Friday, patched in the same way as in [1]. It
makes the three failing tests pass:

```
# make check-target-libstdc++-v3
RUNTESTFLAGS="conformance.exp=17_intro/names*\ experimental/names.cc"

Test run by root on Fri Jun 14 16:04:26 2024
Native configuration is ia64-t2-linux-gnu

=== libstdc++ tests ===

Schedule of variations:
unix

Running target unix
Running
/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
PASS: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
PASS: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
PASS: experimental/names.cc  -std=gnu++17 (test for excess errors)

=== libstdc++ Summary ===

# of expected passes            3
```

[1]:
https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=cf5f7791056b3ed993bc8024be767a86157514a9

You most likely want the workaround as a separate patch on this list, as
the failures were happening for both the non-LRA and LRA case, right?

Cheers,
Frank


Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs

2024-06-17 Thread Harald Anlauf

Hi Andre,

On 17.06.24 at 09:51, Andre Vehreschild wrote:
> Regarding your question on the coarray-tests that are not in the
> coarray-directory: These tests in most cases test only one method of
> implementing coarrays. I.e., they are either testing just -fcoarray=single or
> -fcoarray=lib -lcaf_single, which are two different approaches. The tests in
> the coarray-directory test all available methods to implement coarrays.  Pushing

Ah, that explains it.  I only looked at some of the test sources,
but did not think of looking at caf.exp ...

> all coarray-tests into the coarray-directory will fail a lot of them, because
> the behavior of -fcoarray=single and -fcoarray=lib -lcaf_single is different in
> some corner cases. That's why the coarray-tests in the main gfortran-dir are
> separate.
>
> I do understand why it may be confusing, but I don't see an easy solution. Does
> this answer your question?


Indeed it does!

Thanks,
Harald



[committed] c: Implement C2Y alignof on incomplete arrays

2024-06-17 Thread Joseph Myers
C2Y has adopted support for alignof applied to incomplete array types
(N3273).  Add this support to GCC.  As the relevant checks are in
c-family code that doesn't have access to functions such as
pedwarn_c23, this remains a hard error for older versions and isn't
handled by -Wc23-c2y-compat, although preferably it would work like
pedwarn_c23 (pedwarn-if-pedantic for older versions, warning with
-Wc23-c2y-compat in C2Y mode).
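
As a rough illustration of what the new rule is expected to mean in practice, here is a minimal C11 sketch. It only checks the long-standing rule that the alignment of a complete array type is the alignment of its element type. The assumption (not verified against the text of N3273 here) is that in C2Y mode `alignof(int[])` yields the same value. The helper name `element_alignment_of_int_array` is hypothetical and is not part of the patch.

```c
#include <assert.h>    /* static_assert (C11 macro, keyword in C23) */
#include <stdalign.h>  /* alignof (C11 macro, keyword in C23) */
#include <stddef.h>

/* For complete array types the result of alignof is the alignment
   requirement of the element type.  */
static_assert (alignof (int[4]) == alignof (int),
               "array alignment follows the element type");
static_assert (alignof (int[2][1]) == alignof (int),
               "nested arrays also follow the element type");

/* Hypothetical helper: the value that alignof (int[]) is assumed to
   produce under -std=c2y.  Older dialects reject that operand, so the
   sketch computes it from the element type instead.  */
size_t
element_alignment_of_int_array (void)
{
  return alignof (int);
}
```

With a compiler that includes this patch, replacing the operand with `int[]` should only be accepted with `-std=c2y`, as the new tests below check.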

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c-family/
* c-common.cc (c_sizeof_or_alignof_type): Allow alignof on an
incomplete array type for C2Y.

gcc/testsuite/
* gcc.dg/c23-align-10.c, gcc.dg/c2y-align-1.c,
gcc.dg/c2y-align-2.c: New tests.

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 24335deeb58..7d752acd430 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -3972,7 +3972,9 @@ c_sizeof_or_alignof_type (location_t loc,
   value = size_one_node;
 }
   else if (!COMPLETE_TYPE_P (type)
-  && (!c_dialect_cxx () || is_sizeof || type_code != ARRAY_TYPE))
+  && ((!c_dialect_cxx () && !flag_isoc2y)
+  || is_sizeof
+  || type_code != ARRAY_TYPE))
 {
   if (complain)
error_at (loc, "invalid application of %qs to incomplete type %qT",
diff --git a/gcc/testsuite/gcc.dg/c23-align-10.c 
b/gcc/testsuite/gcc.dg/c23-align-10.c
new file mode 100644
index 000..bd6b9c268c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-align-10.c
@@ -0,0 +1,6 @@
+/* Test C2Y alignof on an incomplete array type: not allowed in C23.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+
+int a = alignof(int[]); /* { dg-error "incomplete" } */
+int b = alignof(int[][1]); /* { dg-error "incomplete" } */
diff --git a/gcc/testsuite/gcc.dg/c2y-align-1.c 
b/gcc/testsuite/gcc.dg/c2y-align-1.c
new file mode 100644
index 000..3f9ab18c518
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2y-align-1.c
@@ -0,0 +1,6 @@
+/* Test C2Y alignof on an incomplete array type.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2y -pedantic-errors" } */
+
+int a = alignof(int[]);
+int b = alignof(int[][1]);
diff --git a/gcc/testsuite/gcc.dg/c2y-align-2.c 
b/gcc/testsuite/gcc.dg/c2y-align-2.c
new file mode 100644
index 000..b7b87150413
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2y-align-2.c
@@ -0,0 +1,8 @@
+/* Test C2Y alignof on an incomplete array type: still not allowed for other
+   incomplete types.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2y -pedantic-errors" } */
+
+int a = alignof(void); /* { dg-error "void" } */
+struct s;
+int b = alignof(struct s); /* { dg-error "incomplete" } */

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] diagnostics: Fix add_misspelling_candidates [PR115440]

2024-06-17 Thread Joseph Myers
On Mon, 17 Jun 2024, Jakub Jelinek wrote:

> 2024-06-17  Jakub Jelinek  
> 
>   PR driver/115440
>   * opts-common.cc (add_misspelling_candidates): If opt1 is non-NULL,
>   add a space and opt1 to the alternative suggestion text.
> 
>   * g++.dg/cpp1z/pr115440.C: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Kewen.Lin
Hi Richi,

on 2024/6/14 18:31, Richard Biener wrote:
> The following retires vcond{,u,eq} optabs by stopping to use them
> from the middle-end.  Targets instead (should) implement vcond_mask
> and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> possibly affected targets - those implementing these patterns,
> and in particular it lists mips, sparc and ia64 as targets that
> most definitely will regress while others might simply remove
> their vcond{,u,eq} patterns.
> 
> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> I know riscv doesn't implement any of the legacy optabs.  But less
> maintained vector targets might need adjustments.

Thanks for making this change, this patch can be bootstrapped on ppc64{,le}
but both have one failure on gcc/testsuite/gcc.target/powerpc/pr66144-3.c,
by looking into it, I found it just exposed one oversight in the current
rs6000 vcond_mask support (the condition mask location is wrong), so I think
this change is fine for rs6000 port, I'll also test SPEC2017 for this (with
rs6000 vcond_mask change) soon.
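
For readers less familiar with these optabs, the decomposition Richard describes can be modelled lane by lane in plain C. This is only a scalar sketch under assumptions: the `model_*` names, the lane count, and the choice of a less-than comparison are illustrative, and the operand order does not claim to match the machine-description patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };

/* vec_cmp-style step: produce an all-ones or all-zeros mask per lane.  */
void
model_vec_cmp_lt (int32_t mask[LANES], const int32_t x[LANES],
                  const int32_t y[LANES])
{
  assert (mask && x && y);
  for (size_t i = 0; i < LANES; i++)
    mask[i] = x[i] < y[i] ? -1 : 0;
}

/* vcond_mask-style step: select between A and B using a precomputed mask.  */
void
model_vcond_mask (int32_t out[LANES], const int32_t mask[LANES],
                  const int32_t a[LANES], const int32_t b[LANES])
{
  assert (out && mask && a && b);
  for (size_t i = 0; i < LANES; i++)
    out[i] = mask[i] ? a[i] : b[i];
}

/* Legacy fused form: compare and select in one operation.  */
void
model_vcond_lt (int32_t out[LANES], const int32_t x[LANES],
                const int32_t y[LANES], const int32_t a[LANES],
                const int32_t b[LANES])
{
  assert (out && x && y && a && b);
  for (size_t i = 0; i < LANES; i++)
    out[i] = x[i] < y[i] ? a[i] : b[i];
}
```

Running the compare step followed by the mask-select step gives the same lanes as the fused form, which is why a target that provides the two separate patterns no longer needs the fused one.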

BR,
Kewen

> 
> I want to get rid of those optabs for GCC 15.  If I don't hear from
> you I will assume your target is fine.
> 
> Thanks,
> Richard.
> 
>   PR middle-end/114189
>   * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing.
>   (get_vcond_eq_icode): Likewise.
> ---
>  gcc/optabs-query.h | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index 0cb2c21ba85..31fbce80175 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode 
> mask_mode)
> mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  
> */
>  
>  inline enum insn_code
> -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
> +get_vcond_icode (machine_mode, machine_mode, bool)
>  {
> -  enum insn_code icode = CODE_FOR_nothing;
> -  if (uns)
> -icode = convert_optab_handler (vcondu_optab, vmode, cmode);
> -  else
> -icode = convert_optab_handler (vcond_optab, vmode, cmode);
> -  return icode;
> +  return CODE_FOR_nothing;
>  }
>  
>  /* Return insn code for a conditional operator with a mask mode
> @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode 
> mmode)
> mode CMODE (only EQ/NE), resulting in a value of mode VMODE.  */
>  
>  inline enum insn_code
> -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode)
> +get_vcond_eq_icode (machine_mode, machine_mode)
>  {
> -  return convert_optab_handler (vcondeq_optab, vmode, cmode);
> +  return CODE_FOR_nothing;
>  }
>  
>  /* Enumerates the possible extraction_insn operations.  */



Re: [PATCH] xtensa: constantsynth: Reforge to fix some non-fatal issues

2024-06-17 Thread Max Filippov
Hi Suwa-san,

On Mon, Jun 17, 2024 at 04:17:15PM +0900, Takayuki 'January June' Suwa wrote:
> The previous constant synthesis logic had some issues that were non-fatal
> but worth considering:
> 
> - It didn't work with DFmode literals, because those were cast to SImode
>   rather than SFmode when splitting into two natural-width words by
>   split_double().
> 
> - It didn't work with large literals when TARGET_AUTO_LITPOOLS was enabled,
>   because those were relaxed MOVI immediates rather than references to
>   literal pool entries.
> 
> - It didn't take into account that when literals with the same RTL
>   representation are pooled multiple times within a function, those entries
>   are shared (especially important when optimizing for size).
> 
> This patch addresses the above issues by making appropriate tweaks to the
> constant synthesis logic.
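
The "bit-exact canonicalization" idea mentioned in the ChangeLog below can be sketched in standalone C. This is a hedged illustration only: it assumes `float` is IEEE 754 binary32, and the names `sf_literal_bits` and `count_literal_uses` are hypothetical stand-ins for the CONST_INT canonicalization and the per-function usage map, not the actual xtensa.cc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t),
               "sketch assumes a 32-bit float");

/* Reinterpret a single-precision literal as its 32-bit pattern, so that
   an SFmode-like value and an SImode-like value with the same bits
   compare equal.  */
uint32_t
sf_literal_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Count how often one canonical bit pattern occurs among the literals of
   a function.  More than one use suggests that a shared literal pool
   entry may be smaller than synthesizing the constant each time.  */
size_t
count_literal_uses (const uint32_t *literals, size_t n, uint32_t key)
{
  size_t uses = 0;
  for (size_t i = 0; i < n; i++)
    if (literals[i] == key)
      uses++;
  return uses;
}
```

Under that assumption, `sf_literal_bits (1.0f)` and the integer literal `0x3F800000` map to the same key, so both would count towards the same pool entry.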
> 
> gcc/ChangeLog:
> 
>   * config/xtensa/xtensa-protos.h (xtensa_constantsynth):
>   Change the second argument from HOST_WIDE_INT to rtx.
>   * config/xtensa/xtensa.cc (#include):
>   Add "context.h" and "pass_manager.h".
>   (machine_function): Add a new hash_map field "litpool_usage".
>   (xtensa_constantsynth): Make "src" (the second operand) accept
>   RTX literal instead of its value, and treat both bare and pooled
>   SI/SFmode literals equally by bit-exact canonicalization into
>   CONST_INT RTX internally.  And then, make avoid synthesis if
>   such multiple identical canonicalized literals are found in same
>   function when optimizing for size.  Finally, for literals where
>   synthesis is not possible or has been avoided, re-emit "move"
>   RTXes with canonicalized ones to increase the chances of sharing
>   literal pool entries.
>   * config/xtensa/xtensa.md (split patterns for constant synthesis):
>   Change to simply invoke xtensa_constantsynth() as mentioned above,
>   and add new patterns for when TARGET_AUTO_LITPOOLS is enabled.
> ---
>  gcc/config/xtensa/xtensa-protos.h |  2 +-
>  gcc/config/xtensa/xtensa.cc   | 75 ---
>  gcc/config/xtensa/xtensa.md   | 56 ++-
>  3 files changed, 103 insertions(+), 30 deletions(-)

This series introduced a few ICE regressions:

+FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: gcc.dg/atomic/c11-atomic-exec-3.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: gcc.dg/atomic/c11-atomic-exec-4.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: gcc.dg/torture/vec-cvt-1.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: c-c++-common/torture/complex-sign-mixed-add.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: c-c++-common/torture/complex-sign-mixed-div.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: c-c++-common/torture/complex-sign-mixed-sub.c   -Os  (internal compiler error: Segmentation fault)
+FAIL: gfortran.dg/bind-c-contiguous-1.f90   -Os  (internal compiler error: Segmentation fault)
+FAIL: gfortran.dg/bind-c-contiguous-4.f90   -Os  (internal compiler error: Segmentation fault)
+FAIL: gfortran.dg/minlocval_4.f90   -Os  (internal compiler error: Segmentation fault)

they all have a backtrace like this:

/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c: In function 'test_main_long_double_postinc':
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c:73:1: internal compiler error: Segmentation fault
/home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c:97:1: note: in expansion of macro 'TEST_FUNCS'
0xf0493f crash_signal
        /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/toplev.cc:319
0x7fcc65b98d5f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x98cd63 lookup_page_table_entry
        /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/ggc-page.cc:630
0x98cd63 ggc_set_mark(void const*)
        /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/ggc-page.cc:1553
0x12b31bd gt_ggc_mx_hash_map_rtx_int_(void*)
        ./gt-xtensa.h:39
0xc19207 gt_ggc_mx_function(void*)
        /home/jcmvbkbc/ws/tensilica/gcc/builds/gcc-15-1382-g448482d3d5c2-xtensa-call0-le/gcc/gtype-desc.cc:1696
0xc19207 gt_ggc_mx_function(void*)
        /home/jcmvbkbc/ws/tensilica/gcc/builds/gcc-15-1382-g448482d3d5c2-xtensa-call0-le/gcc/gtype-desc.cc:1680

[c-family] Add minimal support for __bf16 to -fdump-ada-spec

2024-06-17 Thread Eric Botcazou
Tested on x86-64/Linux, applied on the mainline.


2024-06-17  Eric Botcazou  

c-family/
* c-ada-spec.cc (is_float16): New predicate.
(dump_ada_node) : Call it.

-- 
Eric Botcazou

diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc
index a41e93aeafb..e1b1b2a4b73 100644
--- a/gcc/c-family/c-ada-spec.cc
+++ b/gcc/c-family/c-ada-spec.cc
@@ -2077,6 +2077,22 @@ dump_ada_enum_type (pretty_printer *pp, tree node, tree type, int spc)
 }
 }
 
+/* Return true if NODE is the __bf16 type.  */
+
+static bool
+is_float16 (tree node)
+{
+  if (!TYPE_NAME (node) || TREE_CODE (TYPE_NAME (node)) != TYPE_DECL)
+return false;
+
+  tree name = DECL_NAME (TYPE_NAME (node));
+
+  if (IDENTIFIER_POINTER (name) [0] != '_')
+return false;
+
+  return id_equal (name, "__bf16");
+}
+
 /* Return true if NODE is the _Float32/_Float32x type.  */
 
 static bool
@@ -2210,7 +2226,12 @@ dump_ada_node (pretty_printer *pp, tree node, tree type, int spc,
   break;
 
 case REAL_TYPE:
-  if (is_float32 (node))
+  if (is_float16 (node))
+	{
+	  pp_string (pp, "Short_Float");
+	  break;
+	}
+  else if (is_float32 (node))
 	{
 	  pp_string (pp, "Float");
 	  break;


Re: [c-family] Add minimal support for __bf16 to -fdump-ada-spec

2024-06-17 Thread Andrew Pinski
On Mon, Jun 17, 2024 at 2:29 PM Eric Botcazou  wrote:
>
> Tested on x86-64/Linux, applied on the mainline.
>
>
> 2024-06-17  Eric Botcazou  
>
> c-family/
> * c-ada-spec.cc (is_float16): New predicate.
> (dump_ada_node) : Call it.

Hmm, is_float16 seems to me like a name for _Float16 rather than __bf16.
Those two are two different formats; both could be supported on a
target (both aarch64 and x86_64 support both at the same time).
Also for __bf16, I think comparing against the format being
arm_bfloat_half_format would be a better choice rather than depending
on the name.
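
To make the difference between the two 16-bit formats concrete, here is a small hedged sketch. It assumes `float` is IEEE 754 binary32 and converts to bfloat16 by plain truncation, whereas real conversions normally round. The function names and the `FP16_ONE_BITS` constant are illustrative and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t),
               "sketch assumes a 32-bit float");

/* Field widths: bfloat16 keeps the 8 exponent bits of binary32, while
   IEEE binary16 (_Float16) has 5 exponent bits and 10 mantissa bits.  */
enum { BF16_EXP_BITS = 8, BF16_MANT_BITS = 7,
       FP16_EXP_BITS = 5, FP16_MANT_BITS = 10 };

/* Bit pattern of 1.0 in IEEE binary16, for comparison.  */
#define FP16_ONE_BITS 0x3C00u

/* bfloat16 is the upper half of a binary32 value; this sketch truncates
   instead of rounding to keep the idea visible.  */
uint16_t
float_to_bf16_bits_truncate (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Widen a bfloat16 bit pattern back to binary32 by appending zeros.  */
float
bf16_bits_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

The value 1.0 encodes as `0x3F80` in bfloat16 but as `0x3C00` in binary16, so a predicate meant for one format cannot stand in for the other.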

Thanks,
Andrew Pinski

>
> --
> Eric Botcazou


[C PATCH, v4] Fix for redeclared enumerator initialized with different type [PR115109]

2024-06-17 Thread Martin Uecker


This is a new version of the patch.  This adds the -fno-short-enums flag 
to the tests. I will commit it if the CI for arm does not complain this time.

Bootstrapped and regression tested on x86_64.


c23: Fix for redeclared enumerator initialized with different type 
[PR115109]

c23 specifies that the type of a redeclared enumerator is the one of the
previous declaration.  Convert initializers with different type accordingly
and emit an error when the value does not fit.
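
The range check behind the new error can be modelled outside the compiler. The sketch below assumes the previous declaration gave the enumerator type `int` and that `long long` is wider than `int`. `value_fits_int` is a hypothetical stand-in for the `int_fits_type_p` test in the patch, not the GCC function itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static_assert (sizeof (long long) > sizeof (int),
               "sketch assumes long long is wider than int");

/* Model of the check: does the initializer value fit the type that the
   enumerator had in the earlier declaration (assumed to be int)?  */
bool
value_fits_int (long long v)
{
  return v >= INT_MIN && v <= INT_MAX;
}
```

For instance, `value_fits_int (2LL + UINT_MAX)` is false, which corresponds to the `enum H { x = 2UL + UINT_MAX }` case in the new test where an "outside the range" error is expected.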

2024-06-01 Martin Uecker  

PR c/115109

gcc/c/
* c-decl.cc (build_enumerator): When redeclaring an
enumerator convert value to previous type.  For redeclared
enumerators use underlying type for computing the next value.

gcc/testsuite/
* gcc.dg/pr115109.c: New test.
* gcc.dg/c23-tag-enum-6.c: New test.
* gcc.dg/c23-tag-enum-7.c: New test.

commit c8a0ec5150299689e6e36b0044ea811b82d90b2f
Author: Martin Uecker 
Date:   Sat May 18 22:00:04 2024 +0200

c23: Fix for redeclared enumerator initialized with different type 
[PR115109]

c23 specifies that the type of a redeclared enumerator is the one of the
previous declaration.  Convert initializers with different type accordingly
and emit an error when the value does not fit.

2024-06-01 Martin Uecker  

PR c/115109

gcc/c/
* c-decl.cc (build_enumerator): When redeclaring an
enumerator convert value to previous type.  For redeclared
enumerators use underlying type for computing the next value.

gcc/testsuite/
* gcc.dg/pr115109.c: New test.
* gcc.dg/c23-tag-enum-6.c: New test.
* gcc.dg/c23-tag-enum-7.c: New test.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 6c09eb73128..01326570e2b 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -10277,6 +10277,7 @@ build_enumerator (location_t decl_loc, location_t loc,
  struct c_enum_contents *the_enum, tree name, tree value)
 {
   tree decl;
+  tree old_decl;
 
   /* Validate and default VALUE.  */
 
@@ -10336,6 +10337,23 @@ build_enumerator (location_t decl_loc, location_t loc,
 definition.  */
   value = convert (the_enum->enum_type, value);
 }
+  else if (flag_isoc23
+  && (old_decl = lookup_name_in_scope (name, current_scope))
+  && old_decl != error_mark_node
+  && TREE_TYPE (old_decl)
+  && TREE_TYPE (TREE_TYPE (old_decl))
+  && TREE_CODE (old_decl) == CONST_DECL)
+{
+  /* Enumeration constants in a redeclaration have the previous type.  */
+  tree previous_type = TREE_TYPE (DECL_INITIAL (old_decl));
+  if (!int_fits_type_p (value, previous_type))
+   {
+ error_at (loc, "value of redeclared enumerator outside the range "
+"of %qT", previous_type);
+ locate_old_decl (old_decl);
+   }
+  value = convert (previous_type, value);
+}
   else
 {
   /* Even though the underlying type of an enum is unspecified, the
@@ -10402,9 +10420,14 @@ build_enumerator (location_t decl_loc, location_t loc,
 false);
 }
   else
-the_enum->enum_next_value
-  = build_binary_op (EXPR_LOC_OR_LOC (value, input_location),
-PLUS_EXPR, value, integer_one_node, false);
+{
+  /* In a redeclaration the type can already be the enumeral type.  */
+  if (TREE_CODE (TREE_TYPE (value)) == ENUMERAL_TYPE)
+   value = convert (ENUM_UNDERLYING_TYPE (TREE_TYPE (value)), value);
+  the_enum->enum_next_value
+   = build_binary_op (EXPR_LOC_OR_LOC (value, input_location),
+  PLUS_EXPR, value, integer_one_node, false);
+}
   the_enum->enum_overflow = tree_int_cst_lt (the_enum->enum_next_value, value);
   if (the_enum->enum_overflow
   && !ENUM_FIXED_UNDERLYING_TYPE_P (the_enum->enum_type))
diff --git a/gcc/testsuite/gcc.dg/c23-tag-enum-6.c 
b/gcc/testsuite/gcc.dg/c23-tag-enum-6.c
new file mode 100644
index 000..29aef7ee3fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-tag-enum-6.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -fno-short-enums" } */
+
+#include <limits.h>
+
+enum E : int { a = 1, b = 2 };
+enum E : int { b = _Generic(a, enum E: 2), a = 1 };
+
+enum H { x = 1 };
+enum H { x = 2UL + UINT_MAX }; /* { dg-error "outside the range" } */
+
+enum K : int { z = 1 };
+enum K : int { z = 2UL + UINT_MAX };   /* { dg-error "outside the range" } */
+
+enum F { A = 0, B = UINT_MAX };
+enum F { B = UINT_MAX, A };/* { dg-error "outside the range" } */
+
+enum G : unsigned int { C = 0, D = UINT_MAX };
+enum G : unsigned int { D = UINT_MAX, C }; /* { dg-error "overflow" } */
+
diff --git a/gcc/testsuite/gcc.dg/c23-tag-enum-7.c 
b/gcc/testsuite/gcc.dg/c23-tag-enum-7.c
new file mode 100644
index 000..d4c787c8f71
--- /dev/n

Re: [wwwdocs,pushed] backends.html - Update weblinks to AVR simulators

2024-06-17 Thread Gerald Pfeifer
On Sat, 15 Jun 2024, Georg-Johann Lay wrote:
> Applied this one:

Cool.

> +SimulAVR at https://www.nongnu.org/simulavr";

This one gives a http response of "301 Moved Permanently" redirecting to 
https://www.nongnu.org/simulavr/ . I'll fix this in a minute.

On a related note, though, can we update the references to the simulators 
from (exemplary)

   +avrtest at
   +  https://github.com/sprintersb/atest";
   +>https://github.com/sprintersb/atest

to

   +https://github.com/sprintersb/atest";>avrtest


Thanks,
Gerald


[pushed] wwwdocs: backends: Adjust SimulAVR link

2024-06-17 Thread Gerald Pfeifer
The original link gives a "301 Moved Permanently", easily fixed by 
appending a slash.

Pushed.

Gerald

---
 htdocs/backends.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/backends.html b/htdocs/backends.html
index 1f7c85d7..d86783a6 100644
--- a/htdocs/backends.html
+++ b/htdocs/backends.html
@@ -128,8 +128,8 @@ xtensa | C
   https://github.com/sprintersb/atest?tab=readme-ov-file#running-the-avr-gcc-testsuite-using-the-avrtest-simulator";
 >README: Running the avr-gcc Testsuite using the avrtest Simulator
 
-SimulAVR at https://www.nongnu.org/simulavr";
-  >https://www.nongnu.org/simulavr
+SimulAVR at https://www.nongnu.org/simulavr/";
+  >https://www.nongnu.org/simulavr/
 
 
 
-- 
2.45.2


Re: [PATCH] rs6000: Shrink rs6000_init_generated_builtins size [PR115324]

2024-06-17 Thread Segher Boessenkool
Hi!

Thanks for posting this again.  Much easier to find that way :-)

On Mon, Jun 17, 2024 at 07:15:48PM +0200, Jakub Jelinek wrote:
> While my r15-1001-g4cf2de9b5268224 PCH PIE power fix change decreased the
> .data section sizes (219792 -> 189336), it increased the size of already
> huge rs6000_init_generated_builtins generated function, from 218328
> to 228668 bytes.  That is because there are thousands of array references
> to global arrays and we keep constructing the addresses of the arrays
> again and again.

Less than 5%, for some perspective ;-)

> Ideally some optimization would figure out we have a single function which
> has
> 461   rs6000_overload_info
>1257   rs6000_builtin_info_fntype
>1768   rs6000_builtin_decls
>2548   rs6000_instance_info_fntype
> array references and that maybe it might be a good idea to just preload
> the addresses of those arrays into some register if it decreases code size
> and doesn't slow things down.
> The function actually is called just once and is huge, so code size is even
> more important than speed, which is dominated by all the GC allocations
> anyway.

Yup.

> Until that is done, here is a slightly cleaner version of the hack, which
> makes the function noipa (so that LTO doesn't undo it) for GCC 8.1+ and
> passes the 4 arrays as arguments to the function from the caller.
> This decreases the function size from 228668 bytes to 207572 bytes.
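
The shape of the hack can be shown with a small standalone sketch. Everything named `model_*` or `MODEL_*` here is hypothetical, and the array sizes and stored values are made up. The version guard mirrors the "GCC 8.1+" condition mentioned above, on the assumption that the `noipa` attribute is available from that release.

```c
#include <assert.h>
#include <stddef.h>

/* Use noipa only where the compiler is known to support it.  */
#if defined (__GNUC__) && !defined (__clang__) \
    && (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ >= 1))
# define MODEL_NOIPA __attribute__ ((__noipa__))
#else
# define MODEL_NOIPA
#endif

enum { N_BUILTINS = 4, N_OVERLOADS = 2 };

/* Stand-ins for the large global tables.  */
int model_builtin_decls[N_BUILTINS];
int model_overload_info[N_OVERLOADS];

/* The big initializer receives the table addresses as arguments, so each
   store can index from a register instead of rebuilding the address of a
   global.  noipa is meant to stop interprocedural optimization (including
   LTO) from propagating the globals back in.  */
MODEL_NOIPA void
model_init_generated (int *builtin_decls, int *overload_info)
{
  assert (builtin_decls != NULL && overload_info != NULL);
  for (size_t i = 0; i < N_BUILTINS; i++)
    builtin_decls[i] = 100 + (int) i;
  for (size_t i = 0; i < N_OVERLOADS; i++)
    overload_info[i] = 200 + (int) i;
}

/* The single caller passes the globals once.  */
void
model_init_builtins (void)
{
  model_init_generated (model_builtin_decls, model_overload_info);
}
```

Whether this saves code depends on the target's addressing modes; the numbers quoted above are for the real generated function, not for this toy.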
> 
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

> 2024-06-17  Jakub Jelinek  
> 
>   PR target/115324
>   * config/rs6000/rs6000-gen-builtins.cc (write_decls): Change
>   declaration of rs6000_init_generated_builtins from no arguments
>   to 4 pointer arguments.
>   (write_init_bif_table): Change rs6000_builtin_info_fntype to
>   builtin_info_fntype and rs6000_builtin_decls to builtin_decls.
>   (write_init_ovld_table): Change rs6000_instance_info_fntype to
>   instance_info_fntype, rs6000_builtin_decls to builtin_decls and
>   rs6000_overload_info to overload_info.
>   (write_init_file): Add __noipa__ attribute to
>   rs6000_init_generated_builtins for GCC 8.1+ and change the function
>   from no arguments to 4 pointer arguments.  Change rs6000_builtin_decls
>   to builtin_decls.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Adjust
>   rs6000_init_generated_builtins caller.

It would have been much easier to review if you had done the renaming in
a separate patch :-)  You typically notice such things when writing the
changelog is much harder than expected, and this is the True Value of
changelogs!

Seen from the other side, when reviewing a patch I like to start with
the changelog (after the commit message), it should tell everything
there is to know, and then if something in the actual patch surprises
me, something is not ideal, or wrong even.

> +  /* The reason to pass pointers to the function instead of accessing
> + the rs6000_{{builtin,instance}_info_fntype,overload_info,builtin_decls}
> + arrays directly is to decrease size of the already large function and
> + noipa prevents the compiler with LTO to undo that optimization.  */

Some of these array names no longer have the rs6000_ prefix now.  Oh
wait, you already took that into account?  I'm not saying anything :-)

The patch is fine for trunk, thank you!  If you want backports those
are okay, too (but I don't think you want any?  Or does this work
without the previous patches as well?)


Segher


Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-17 Thread Segher Boessenkool
Hi!

On Mon, Jun 17, 2024 at 05:26:39PM -0500, Peter Bergner wrote:
> While auditing our ROP code generation for some test cases I wrote, I noticed
> a few issues which I'm tracking in PR114759.  The first issue I noticed is we
> disable shrink-wrapping when using -mrop-protect, even in the cases where we
> never emit the ROP instructions because they're not needed.

Please don't call this "ROP instructions".  -mrop-protect tries to make
it much harder to successfully do exploits in a style called "return-
oriented programming", starting from a stack overwrite normally.  It
does this by hashing the return address together with the stack pointer
value and with the previous hash value (so the whole call stack hashed),
and checking that before returning.
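
The idea can be pictured with a toy model in C. To be clear, this is not the algorithm the hardware uses: the mixing function below is an arbitrary invertible 64-bit scrambler chosen for the sketch, and all `toy_*` names are hypothetical. It only shows the store-then-check shape described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary invertible 64-bit mixer (xor-shifts and odd multiplications);
   a placeholder, not the real hash.  */
static uint64_t
toy_mix (uint64_t x)
{
  x ^= x >> 30;
  x *= UINT64_C (0xbf58476d1ce4e5b9);
  x ^= x >> 27;
  x *= UINT64_C (0x94d049bb133111eb);
  x ^= x >> 31;
  return x;
}

/* Prologue side: combine return address, stack pointer and the previous
   hash value into one value that is stored in the frame.  */
uint64_t
toy_frame_hash (uint64_t ret_addr, uint64_t sp, uint64_t prev_hash)
{
  return toy_mix (ret_addr ^ toy_mix (sp ^ toy_mix (prev_hash)));
}

/* Epilogue side: recompute and compare.  */
bool
toy_frame_check (uint64_t stored, uint64_t ret_addr, uint64_t sp,
                 uint64_t prev_hash)
{
  return stored == toy_frame_hash (ret_addr, sp, prev_hash);
}

/* In this model a mismatch aborts, standing in for the trap taken when
   the check fails before the function returns.  */
void
toy_epilogue_check (uint64_t stored, uint64_t ret_addr, uint64_t sp,
                    uint64_t prev_hash)
{
  assert (toy_frame_check (stored, ret_addr, sp, prev_hash));
}
```

An attacker who overwrites the saved return address without being able to recompute the stored value fails the check, which is the property the protection relies on.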

"ROP insns" are the instructions used in such exploits, not what you
mean here :-)

The instructions are called "hash*", so maybe call them "hash insns"
or "ROP protect hash insns"?

> The problem is
> we disable shrink-wrapping too early, before we know whether we will need to
> emit the ROP instructions or not.  The fix is to delay disabling shrink
> wrapping until we've decided whether we will or won't be emitting the ROP
> instructions.

>   * config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
>   the disabling of shrink-wrapping from here
>   * config/rs6000/rs6000-logue.cc (rs6000_stack_info): ...to here.

Hrm.  Can you do it in some particular caller of rs6000_stack_info,
instead?  The rs6000_stack_info function itself is not supposed to
change any state whatsoever.

> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -720,7 +720,11 @@ rs6000_stack_info (void)
>&& info->calls_p
>&& DEFAULT_ABI == ABI_ELFv2
>&& rs6000_rop_protect)
> -info->rop_hash_size = 8;
> +{
> +  /* If we are inserting ROP-protect instructions, disable shrink wrap.  
> */
> +  flag_shrink_wrap = 0;
> +  info->rop_hash_size = 8;
> +}

The comment should say *why*!  The fact that we do is clear from the
code itself already.  But why do we want this?

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void)
>  }
>else if (!OPTION_SET_P (flag_cunroll_grow_size))
>  flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> -
> -  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
> -  if (rs6000_rop_protect)
> -flag_shrink_wrap = 0;
>  }

(Yes, I know the original code didn't say either, but let's try to make
things better :-) )

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect 
> -fdump-rtl-pro_and_epilogue" } */
> +/* { dg-require-effective-target rop_ok } */

Do you want rop_ok while you are *forcing* it to be okay anyway?  Why?


Segher


Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-17 Thread Peter Bergner
On 6/16/24 9:40 PM, Kewen.Lin wrote:
> on 2024/6/17 10:31, Peter Bergner wrote:
>> On 6/16/24 9:10 PM, Kewen.Lin wrote:
>>> on 2024/6/15 01:05, Peter Bergner wrote:
>>>> That said, the --with-cpu=power5 build without fortran did bootstrap and
>>>> regtest with no regressions, so the build did test that code path and
>>>> exposed no problems.
>>>
>>> OK, nice!  Thanks!
>>
>> I assume this means you're "OK" with the updated patch, correct?
> 
> Yes, OK for trunk, thanks!

Thanks.  We will need backports back to GCC 11, as this has been broken
since ROP was first added.  I'll let things burn-in on trunk for a couple
of days so Bill's CI builders have a chance to test it on all of our
configs.  





>> Do you want to take a stab at writing that or do you want me to do that?
> 
> Either is fine for me, then let me give it a shot.

Sounds good, thanks.  That will allow me to handle the other ROP issues
I came across, which are reported in PR114759.

Peter




Re: [PATCH V3 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-17 Thread Jeff Law




On 6/17/24 12:33 PM, Edwin Lu wrote:

On rv32 targets, vwsll_zext1_scalar_ would trigger an ICE in
maybe_legitimize_instruction when zero extending a uint32 to uint64 due
to a mismatch between the input operand's mode (DI) and the expanded insn
operand's mode (Pmode == SI). Ensure that the modes of the operands match.

Tested on rv32/64 gcv newlib. Letting CI perform additional testing

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Fix mode mismatch

OK
jeff




Re: [PATCH V3 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn

2024-06-17 Thread Jeff Law




On 6/17/24 12:33 PM, Edwin Lu wrote:

When emitting insns, we have an early assertion to ensure the input
operand's mode and the expanded operand's mode are the same; however, it
does not perform this check if the pattern does not have an explicit
machine mode specifying the operand. In this scenario, it will always
assume that mode = Pmode to correctly satisfy the
maybe_legitimize_operand check, however, there may be problems when
working in 32 bit environments.

Make the assert unconditional and replace it with an internal error for
more descriptive logging

gcc/ChangeLog:

* config/riscv/riscv-v.cc: Move assert out of conditional block

OK.

Jeff



[COMMITTED] aarch64: Add testcase for PR97405

2024-06-17 Thread Andrew Pinski
This aarch64 SVE-specific code was fixed by r15-917-gc9842f99042454,
which added a riscv-specific testcase. Adding an aarch64 testcase to
check that the fix does not regress is a good idea.

Committed as obvious after testing the testcase for aarch64-linux-gnu.

PR tree-optimization/97405

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pr97405-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c | 13 +
 1 file changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c
new file mode 100644
index 000..5efa32c9928
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr97405-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.2-a+sve -O2" } */
+/* PR tree-optimization/97405 */
+#include "arm_sve.h"
+
+void
+a (svuint8x3_t b, unsigned char *p, int c) {
+  if (c)
+svst1_u8(svptrue_pat_b8(SV_VL16), p, svget3_u8(b, 1));
+  else
+svst1_u8(svwhilelt_b8(6, 6), p, svget3_u8(b, 1));
+}
+
-- 
2.43.0



[RFC v3] RISC-V: Promote Zaamo/Zalrsc to a when using an old binutils

2024-06-17 Thread Patrick O'Neill
Binutils 2.42 and before don't support Zaamo/Zalrsc. Promote Zaamo/Zalrsc to
'a' in the -march string when assembling.

This change respects Zaamo/Zalrsc when generating code.

Testcases that check for the default isa string will fail with the old binutils
since zaamo/zalrsc aren't emitted anymore. All other Zaamo/Zalrsc testcases
pass.
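
The promotion rule itself is small and can be sketched independently of riscv_subset_list. The helper below is a hypothetical illustration in C working on plain strings, not the code from the patch: it reports whether an 'a' entry would have to be added for an assembler that does not know Zaamo/Zalrsc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true when the extension list names zaamo or zalrsc but not 'a',
   i.e. when the -march string handed to an old assembler would need 'a'
   inserted.  */
bool
needs_a_promotion (const char *const *exts, size_t n)
{
  assert (exts != NULL || n == 0);

  bool has_a = false;
  bool has_sub_ext = false;

  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (exts[i], "a") == 0)
        has_a = true;
      else if (strcmp (exts[i], "zaamo") == 0
               || strcmp (exts[i], "zalrsc") == 0)
        has_sub_ext = true;
    }

  return has_sub_ext && !has_a;
}
```

A list such as `i, m, zaamo` would need the promotion, while `i, a, zaamo, zalrsc` would not, since 'a' is already present.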

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Add toggle to promote Zaamo/Zalrsc
extensions to 'a'.
(riscv_arch_str): Ditto.
(riscv_expand_arch): Ditto.
(riscv_expand_arch_from_cpu): Ditto.
(riscv_expand_arch_upgrade_exts): New function. Wrapper around
riscv_expand_arch to preserve the function signature.
(riscv_expand_arch_no_upgrade_exts): Ditto
(riscv_expand_arch_from_cpu_upgrade_exts): New function. Wrapper around
riscv_expand_arch_from_cpu to preserve the function signature.
(riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto.
* config/riscv/riscv-protos.h (riscv_arch_str): Add toggle to function
prototype.
* config/riscv/riscv-subset.h: Ditto.
* config/riscv/riscv-target-attr.cc (riscv_process_target_attr):
* config/riscv/riscv.cc (riscv_emit_attribute):
(riscv_declare_function_name):
* config/riscv/riscv.h (riscv_expand_arch): Remove.
(riscv_expand_arch_from_cpu): Ditto.
(riscv_expand_arch_upgrade_exts): Add toggle wrapper functions.
(riscv_expand_arch_no_upgrade_exts): Ditto.
(riscv_expand_arch_from_cpu_upgrade_exts): Ditto.
(riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto.
(EXTRA_SPEC_FUNCTIONS): Ditto.
(OPTION_DEFAULT_SPECS): Use non-upgraded march string when invoking the
compiler.
(ASM_SPEC): Use upgraded march string when invoking the assembler.

Signed-off-by: Patrick O'Neill 
---
v3 ChangeLog:
Rebased on non-promoting patch.
Wrap all Zaamo/Zalrsc upgrade code in #ifndef to prevent compiler
warnings about unused/potentially undefined variables.
Silence unused parameter warning with a voidcast.
---
RFC since I'm not sure if this upgrade behavior is more trouble than
it's worth - this is a pretty invasive change. Happy to iterate further
or just drop these changes.
---
 gcc/common/config/riscv/riscv-common.cc | 111 +---
 gcc/config/riscv/riscv-protos.h |   3 +-
 gcc/config/riscv/riscv-subset.h |   2 +-
 gcc/config/riscv/riscv-target-attr.cc   |   4 +-
 gcc/config/riscv/riscv.cc   |   7 +-
 gcc/config/riscv/riscv.h|  46 ++
 6 files changed, 137 insertions(+), 36 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 1dc1d9904c7..05c26f73b73 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -907,7 +907,7 @@ riscv_subset_list::add (const char *subset, bool implied_p)
VERSION_P to determine append version info or not.  */

 std::string
-riscv_subset_list::to_string (bool version_p) const
+riscv_subset_list::to_string (bool version_p, bool upgrade_exts) const
 {
   std::ostringstream oss;
   oss << "rv" << m_xlen;
@@ -916,10 +916,17 @@ riscv_subset_list::to_string (bool version_p) const
   riscv_subset_t *subset;

   bool skip_zifencei = false;
-  bool skip_zaamo_zalrsc = false;
   bool skip_zicsr = false;
   bool i2p0 = false;

+#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
+  bool upgrade_zaamo_zalrsc = false;
+  bool has_a_ext = false;
+  bool insert_a_ext = false;
+  bool inserted_a_ext = false;
+  riscv_subset_t *a_subset;
+#endif
+
   /* For RISC-V ISA version 2.2 or earlier version, zicsr and zifencei is
  included in the base ISA.  */
   if (riscv_isa_spec == ISA_SPEC_CLASS_2P2)
@@ -945,8 +952,33 @@ riscv_subset_list::to_string (bool version_p) const
   skip_zifencei = true;
 #endif
 #ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
-  /* Skip since binutils 2.42 and earlier don't recognize zaamo/zalrsc.  */
-  skip_zaamo_zalrsc = true;
+  /* Upgrade Zaamo/Zalrsc extensions to 'a' since binutils 2.42 and earlier
+ don't recognize zaamo/zalrsc.  */
+  upgrade_zaamo_zalrsc = upgrade_exts;
+  if (upgrade_zaamo_zalrsc)
+{
+  for (subset = m_head; subset != NULL; subset = subset->next)
+   {
+ if (subset->name == "a")
+   has_a_ext = true;
+ if (subset->name == "zaamo" || subset->name == "zalrsc")
+   insert_a_ext = true;
+   }
+  if (insert_a_ext && !has_a_ext)
+   {
+ unsigned int major_version = 0, minor_version = 0;
+ get_default_version ("a", &major_version, &minor_version);
+ a_subset = new riscv_subset_t ();
+ a_subset->name = "a";
+ a_subset->implied_p = false;
+ a_subset->major_version = major_version;
+ a_subset->minor_version = minor_version;
+   }
+}
+#else
+  /* Silence unused parameter warning when HAV

RE: [PATCH] aarch64: Add fix_truncv4sfv4hi2 pattern [PR113882]

2024-06-17 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is
> > implemented using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2
> > (V4SI->V4HI).
> >
> > PR target/113882
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (fix_truncv4sfv4hi2): New pattern.
> 
> Could we handle this by extending the target-independent code instead?
> Richard mentioned in comment 1 that the current set of intermediate
> conversions is hard-coded, but it didn't sound like he was implying that the
> set shouldn't change.

Yes, Richard. I checked the target-independent code. In fact, SLP already
handles this kind of intermediate conversion. However, the logic is guarded by
"!flag_trapping_math", so if we pass -fno-trapping-math, SLP does generate the
right vectorized code. It also looks like the "!flag_trapping_math" check was
added intentionally in r14-2085-g77a50c772771f6 to fix some PRs, so I'm not
sure what we should do here. Thoughts?

  if (GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode)
  && (code == FLOAT_EXPR ||
  (code == FIX_TRUNC_EXPR && !flag_trapping_math)))

Thanks,
Pengxuan
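
For reference, here is a scalar sketch of what the two-step conversion does
per lane: float to 32-bit int (the fcvtzs step), then keeping the low 16 bits
(the xtn step). The function name fix_trunc_lane is made up for this sketch.
Lanes whose truncated value does not fit in a short are where the two-step
form and a direct conversion can differ, which appears to be the concern
behind the "!flag_trapping_math" guard; that reading is an assumption on my
part.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of V4SF -> V4SI -> V4HI.  */
static int16_t
fix_trunc_lane (float x)
{
  /* Step 1: float -> 32-bit int, truncating toward zero.  The value
     must be inside the int32_t range (this also rejects NaN).  */
  assert (x > -2147483648.0f && x < 2147483648.0f);
  int32_t wide = (int32_t) x;

  /* Step 2: keep only the low 16 bits, as a narrowing move would.
     Converting a value above INT16_MAX to int16_t is
     implementation-defined in C; GCC wraps modulo 2^16.  */
  return (int16_t) (uint16_t) ((uint32_t) wide & 0xffffu);
}
```

For in-range inputs such as 1.9f or -1.9f this gives 1 and -1, the same as a
direct conversion; for 65537.0f the narrowing step silently keeps the low
bits.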
> 
> Thanks,
> Richard
> 
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/fix_trunc2.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md| 13 +
> >  gcc/testsuite/gcc.target/aarch64/fix_trunc2.c | 14 ++
> >  2 files changed, 27 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index 868f4486218..096f7b56a27 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3032,6 +3032,19 @@ (define_expand
> "2"
> >"TARGET_SIMD"
> >{})
> >
> > +
> > +(define_expand "fix_truncv4sfv4hi2"
> > +  [(match_operand:V4HI 0 "register_operand")
> > +   (match_operand:V4SF 1 "register_operand")]
> > +  "TARGET_SIMD"
> > +  {
> > +rtx tmp = gen_reg_rtx (V4SImode);
> > +emit_insn (gen_fix_truncv4sfv4si2 (tmp, operands[1]));
> > +emit_insn (gen_truncv4siv4hi2 (operands[0], tmp));
> > +DONE;
> > +  }
> > +)
> > +
> >  (define_expand "ftrunc2"
> >[(set (match_operand:VHSDF 0 "register_operand")
> > (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
> > diff --git a/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > new file mode 100644
> > index 000..57cc00913a3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +void
> > +f (short *__restrict a, float *__restrict b) {
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {fcvtzs\tv[0-9]+.4s, v[0-9]+.4s} 1 } } */
> > +/* { dg-final { scan-assembler-times {xtn\tv[0-9]+.4h, v[0-9]+.4s} 1 } } */


[PATCH] function.h: eliminate macros "dom_computed" and "n_bbs_in_dom_tree"

2024-06-17 Thread David Malcolm
Be explicit when we use "cfun".

No functional change intended.
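
To show what "explicit" means here, a small sketch with stand-in types. The
struct layouts and the two mark_dom_ok_* helpers are hypothetical and much
simpler than the real ones; the macro body is assumed to be the
cfun->cfg->x_dom_computed access that the diff spells out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real GCC types, for illustration only.  */
enum dom_state { DOM_NONE, DOM_NO_FAST_QUERY, DOM_OK };

struct control_flow_graph { enum dom_state x_dom_computed[2]; };
struct function { struct control_flow_graph *cfg; };

static struct function *cfun;

/* Old style: the macro hides the dependency on the global cfun.  */
#define dom_computed (cfun->cfg->x_dom_computed)

static void
mark_dom_ok_macro (unsigned int dir_index)
{
  assert (cfun != NULL && dir_index < 2);
  dom_computed[dir_index] = DOM_OK;
}

/* New style: same effect, but the use of cfun is visible at the
   point of use.  */
static void
mark_dom_ok_explicit (unsigned int dir_index)
{
  assert (cfun != NULL && dir_index < 2);
  cfun->cfg->x_dom_computed[dir_index] = DOM_OK;
}
```

Both helpers write the same field; the second form just makes the implicit
global visible to the reader.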

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* dominance.cc (compute_dom_fast_query): Replace uses of
"dom_computed" macro with explicit use of cfun.
(compute_dom_fast_query_in_region): Likewise.
(calculate_dominance_info): Likewise, also for macro
"n_bbs_in_dom_tree".
(calculate_dominance_info_for_region): Likewise for
"dom_computed" macro.
(get_immediate_dominator): Likewise.
(set_immediate_dominator): Likewise.
(get_dominated_by): Likewise.
(redirect_immediate_dominators): Likewise.
(nearest_common_dominator): Likewise.
(dominated_by_p): Likewise.
(bb_dom_dfs_in): Likewise.
(bb_dom_dfs_out): Likewise.
(recompute_dominator): Likewise.
(iterate_fix_dominators): Likewise.
(add_to_dominance_info): Likewise, also for macro
"n_bbs_in_dom_tree".
(delete_from_dominance_info): Likewise.
(set_dom_info_availability): Likewise for
"dom_computed" macro.
* function.h (dom_computed): Delete macro.
(n_bbs_in_dom_tree): Delete macro.

Signed-off-by: David Malcolm 
---
 gcc/dominance.cc | 70 +---
 gcc/function.h   |  3 ---
 2 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/gcc/dominance.cc b/gcc/dominance.cc
index 0357210ed27f..528b38caa9db 100644
--- a/gcc/dominance.cc
+++ b/gcc/dominance.cc
@@ -672,7 +672,7 @@ compute_dom_fast_query (enum cdi_direction dir)
 
   gcc_checking_assert (dom_info_available_p (dir));
 
-  if (dom_computed[dir_index] == DOM_OK)
+  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
 return;
 
   FOR_ALL_BB_FN (bb, cfun)
@@ -681,7 +681,7 @@ compute_dom_fast_query (enum cdi_direction dir)
assign_dfs_numbers (bb->dom[dir_index], &num);
 }
 
-  dom_computed[dir_index] = DOM_OK;
+  cfun->cfg->x_dom_computed[dir_index] = DOM_OK;
 }
 
 /* Analogous to the previous function but compute the data for reducible
@@ -697,7 +697,7 @@ compute_dom_fast_query_in_region (enum cdi_direction dir,
 
   gcc_checking_assert (dom_info_available_p (dir));
 
-  if (dom_computed[dir_index] == DOM_OK)
+  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
 return;
 
   /* Assign dfs numbers for region nodes except for entry and exit nodes.  */
@@ -708,7 +708,7 @@ compute_dom_fast_query_in_region (enum cdi_direction dir,
assign_dfs_numbers (bb->dom[dir_index], &num);
 }
 
-  dom_computed[dir_index] = DOM_OK;
+  cfun->cfg->x_dom_computed[dir_index] = DOM_OK;
 }
 
 /* The main entry point into this module.  DIR is set depending on whether
@@ -721,7 +721,7 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query)
 {
   unsigned int dir_index = dom_convert_dir_to_idx (dir);
 
-  if (dom_computed[dir_index] == DOM_OK)
+  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
 {
   checking_verify_dominators (dir);
   return;
@@ -730,14 +730,14 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query)
   timevar_push (TV_DOMINANCE);
   if (!dom_info_available_p (dir))
 {
-  gcc_assert (!n_bbs_in_dom_tree[dir_index]);
+  gcc_assert (!cfun->cfg->x_n_bbs_in_dom_tree[dir_index]);
 
   basic_block b;
   FOR_ALL_BB_FN (b, cfun)
{
  b->dom[dir_index] = et_new_tree (b);
}
-  n_bbs_in_dom_tree[dir_index] = n_basic_blocks_for_fn (cfun);
+  cfun->cfg->x_n_bbs_in_dom_tree[dir_index] = n_basic_blocks_for_fn (cfun);
 
   dom_info di (cfun, dir);
   di.calc_dfs_tree ();
@@ -749,7 +749,7 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query)
et_set_father (b->dom[dir_index], d->dom[dir_index]);
}
 
-  dom_computed[dir_index] = DOM_NO_FAST_QUERY;
+  cfun->cfg->x_dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }
   else
 checking_verify_dominators (dir);
@@ -772,7 +772,7 @@ calculate_dominance_info_for_region (cdi_direction dir,
   basic_block bb;
   unsigned int i;
 
-  if (dom_computed[dir_index] == DOM_OK)
+  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
 return;
 
   timevar_push (TV_DOMINANCE);
@@ -791,7 +791,7 @@ calculate_dominance_info_for_region (cdi_direction dir,
 if (basic_block d = di.get_idom (bb))
   et_set_father (bb->dom[dir_index], d->dom[dir_index]);
 
-  dom_computed[dir_index] = DOM_NO_FAST_QUERY;
+  cfun->cfg->x_dom_computed[dir_index] = DOM_NO_FAST_QUERY;
   compute_dom_fast_query_in_region (dir, region);
 
   timevar_pop (TV_DOMINANCE);
@@ -858,7 +858,7 @@ get_immediate_dominator (enum cdi_direction dir, basic_block bb)
   unsigned int dir_index = dom_convert_dir_to_idx (dir);
   struct et_node *node = bb->dom[dir_index];
 
-  gcc_checking_assert (dom_computed[dir_index]);
+  gcc_checking_assert (cfun->cfg->x_dom_computed[dir_index]);
 
   if (!node->father)
   

[PATCH 00/11] CodeView variables and type system

2024-06-17 Thread Mark Harmstone
This patch series adds support for outputting global variables when the
-gcodeview option is provided, along with the type system to go along
with this.

As with previous patches, the best way to see the output is to run
Microsoft's cvdump.exe against the object file:
https://github.com/microsoft/microsoft-pdb/raw/master/cvdump/cvdump.exe

You'll also need a recentish version of binutils in order to get ld to
output an actual PDB file that can be read by MSVC or windbg.

This ought to be fairly complete as far as C is concerned. Still to come
are functions, local variables, and some C++ things.

Mark Harmstone (11):
  Output CodeView data about variables
  Handle CodeView base types
  Handle typedefs for CodeView
  Handle pointers for CodeView
  Handle const and volatile modifiers for CodeView
  Handle enums for CodeView
  Handle structs and classes for CodeView
  Handle unions for CodeView
  Handle arrays for CodeView
  Handle bitfields for CodeView
  Handle subroutine types in CodeView

 gcc/dwarf2codeview.cc | 2278 -
 gcc/dwarf2codeview.h  |   67 ++
 gcc/dwarf2out.cc  |5 +
 3 files changed, 2341 insertions(+), 9 deletions(-)

-- 
2.44.2



[PATCH 04/11] Handle pointers for CodeView

2024-06-17 Thread Mark Harmstone
Translates DW_TAG_pointer_type DIEs into LF_POINTER symbols, which get
output into the .debug$T section.
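
As a quick illustration of how the new types get their numbers: custom types
are appended to a singly-linked list and numbered consecutively from
FIRST_TYPE (0x1000), since CodeView type indices below 0x1000 refer to
predefined types. The following is a simplified standalone sketch; the names
custom_type, type_list and append_type are invented for it and it is not the
patch's add_custom_type itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_TYPE 0x1000  /* Value taken from the patch.  */

/* Simplified stand-in for codeview_custom_type.  */
struct custom_type
{
  struct custom_type *next;
  uint32_t num;
};

/* Head and tail of the list, kept together for the sketch.  */
struct type_list
{
  struct custom_type *head, *tail;
};

/* Append CT to LIST, give it the next free type number and return
   that number.  */
static uint32_t
append_type (struct type_list *list, struct custom_type *ct)
{
  assert (list != NULL && ct != NULL);

  ct->next = NULL;
  if (list->tail)
    {
      ct->num = list->tail->num + 1;
      list->tail->next = ct;
    }
  else
    {
      ct->num = FIRST_TYPE;
      list->head = ct;
    }

  list->tail = ct;
  return ct->num;
}
```

The first type appended gets 0x1000, the second 0x1001, and so on, in the
order in which they will later be written out.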

gcc/
* dwarf2codeview.cc (FIRST_TYPE): Define.
(struct codeview_custom_type): New structure.
(custom_types, last_custom_type): New variables.
(get_type_num): Prototype.
(write_lf_pointer, write_custom_types): New functions.
(codeview_debug_finish): Call write_custom_types.
(add_custom_type, get_type_num_pointer_type): New functions.
(get_type_num): Handle DW_TAG_pointer_type DIEs.
* dwarf2codeview.h (T_VOID): Define.
(CV_POINTER_32, CV_POINTER_64): Likewise.
(T_32PVOID, T_64PVOID): Likewise.
(CV_PTR_NEAR32, CV_PTR64, LF_POINTER): Likewise.
---
 gcc/dwarf2codeview.cc | 179 +-
 gcc/dwarf2codeview.h  |  13 +++
 2 files changed, 188 insertions(+), 4 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 5006a176260..51401f2d5bc 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -56,6 +56,8 @@ along with GCC; see the file COPYING3.  If not see
 #define CV_CFL_C   0x00
 #define CV_CFL_CXX 0x01
 
+#define FIRST_TYPE 0x1000
+
 #define LINE_LABEL "Lcvline"
 #define END_FUNC_LABEL "Lcvendfunc"
 #define SYMBOL_START_LABEL "Lcvsymstart"
@@ -168,6 +170,22 @@ struct die_hasher : free_ptr_hash 
   }
 };
 
+struct codeview_custom_type
+{
+  struct codeview_custom_type *next;
+  uint32_t num;
+  uint16_t kind;
+
+  union
+  {
+struct
+{
+  uint32_t base_type;
+  uint32_t attributes;
+} lf_pointer;
+  };
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -181,6 +199,9 @@ static const char* last_filename;
 static uint32_t last_file_id;
 static codeview_symbol *sym, *last_sym;
 static hash_table *types_htab;
+static codeview_custom_type *custom_types, *last_custom_type;
+
+static uint32_t get_type_num (dw_die_ref type);
 
 /* Record new line number against the current function.  */
 
@@ -845,6 +866,71 @@ write_codeview_symbols (void)
   asm_fprintf (asm_out_file, "%LLcv_syms_end:\n");
 }
 
+/* Write an LF_POINTER type.  */
+
+static void
+write_lf_pointer (codeview_custom_type *t)
+{
+  /* This is lf_pointer in binutils and lfPointer in Microsoft's cvinfo.h:
+
+struct lf_pointer
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t base_type;
+  uint32_t attributes;
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_pointer.base_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_pointer.attributes);
+  putc ('\n', asm_out_file);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
+/* Write the .debug$T section, which contains all of our custom type
+   definitions.  */
+
+static void
+write_custom_types (void)
+{
+  targetm.asm_out.named_section (".debug$T", SECTION_DEBUG, NULL);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, CV_SIGNATURE_C13);
+  putc ('\n', asm_out_file);
+
+  while (custom_types)
+{
+  codeview_custom_type *n = custom_types->next;
+
+  switch (custom_types->kind)
+   {
+   case LF_POINTER:
+ write_lf_pointer (custom_types);
+ break;
+   }
+
+  free (custom_types);
+  custom_types = n;
+}
+}
+
 /* Finish CodeView debug info emission.  */
 
 void
@@ -861,6 +947,9 @@ codeview_debug_finish (void)
   write_line_numbers ();
   write_codeview_symbols ();
 
+  if (custom_types)
+write_custom_types ();
+
   if (types_htab)
 delete types_htab;
 }
@@ -993,10 +1082,88 @@ get_type_num_base_type (dw_die_ref type)
 }
 }
 
-/* Process a DIE representing a type definition and return its number.  If
-   it's something we can't handle, return 0.  We keep a hash table so that
-   we're not adding the same type multiple times - though if we do it's not
-   disastrous, as ld will deduplicate everything for us.  */
+/* Add a new codeview_custom_type to our singly-linked custom_types list.  */
+
+static void
+add_custom_type (codeview_custom_type *ct)
+{
+  uint32_t num;
+
+  if (last_custom_type)
+{
+  num = last_custom_type->num + 1;
+  last_custom_type->next = ct;
+}
+  else
+{
+  num = FIRST_TYPE;
+  custom_types = ct;
+}
+
+  last_custom_type = ct;
+
+  ct->num = num;
+}
+
+/* Process a DW_TAG_pointer_type DIE.  If 

[PATCH 01/11] Output CodeView data about variables

2024-06-17 Thread Mark Harmstone
Parses the DW_TAG_variable DIEs and outputs S_GDATA32 (for global variables)
and S_LDATA32 (for static global variables) symbols into the .debug$S section.
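
To make the record shape concrete, here is a small sketch of the symbol kind
selection and of the value that ends up in the record's size field, following
the struct datasym layout quoted in the diff. The helper names and the
is_static parameter are made up for this sketch; how the patch itself decides
that a variable is static is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define S_LDATA32 0x110c  /* Values taken from the patch.  */
#define S_GDATA32 0x110d

/* Hypothetical helper: pick the symbol kind for a variable.  */
static uint16_t
data_symbol_kind (bool is_static)
{
  return is_static ? S_LDATA32 : S_GDATA32;
}

/* Hypothetical helper: value of the size field, which counts the bytes
   following it: kind (2) + type (4) + offset (4) + section (2) plus
   the NUL-terminated name.  */
static size_t
data_symbol_size (const char *name)
{
  assert (name != NULL);
  return 2 + 4 + 4 + 2 + strlen (name) + 1;
}
```

For a variable called "foo" the size field would hold 16.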

gcc/
* dwarf2codeview.cc (S_LDATA32, S_GDATA32): Define.
(struct codeview_symbol): New structure.
(sym, last_sym): New variables.
(write_data_symbol): New function.
(write_codeview_symbols): Call write_data_symbol.
(add_variable, codeview_debug_early_finish): New functions.
* dwarf2codeview.h (codeview_debug_early_finish): Prototype.
* dwarf2out.cc
(dwarf2out_early_finish): Call codeview_debug_early_finish.
---
 gcc/dwarf2codeview.cc | 160 ++
 gcc/dwarf2codeview.h  |   1 +
 gcc/dwarf2out.cc  |   5 ++
 3 files changed, 166 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index db776d79be4..60e84635971 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #define CHKSUM_TYPE_MD5    1
 
+#define S_LDATA32  0x110c
+#define S_GDATA32  0x110d
 #define S_COMPILE3 0x113c
 
 #define CV_CFL_80386   0x03
@@ -129,6 +131,22 @@ struct codeview_function
   codeview_line_block *blocks, *last_block;
 };
 
+struct codeview_symbol
+{
+  codeview_symbol *next;
+  uint16_t kind;
+
+  union
+  {
+struct
+{
+  uint32_t type;
+  char *name;
+  dw_die_ref die;
+} data_symbol;
+  };
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -140,6 +158,7 @@ static codeview_string *strings, *last_string;
 static codeview_function *funcs, *last_func;
 static const char* last_filename;
 static uint32_t last_file_id;
+static codeview_symbol *sym, *last_sym;
 
 /* Record new line number against the current function.  */
 
@@ -698,6 +717,77 @@ write_compile3_symbol (void)
   targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num);
 }
 
+/* Write an S_GDATA32 symbol, representing a global variable, or an S_LDATA32
+   symbol, for a static global variable.  */
+
+static void
+write_data_symbol (codeview_symbol *s)
+{
+  unsigned int label_num = ++sym_label_num;
+  dw_attr_node *loc;
+  dw_loc_descr_ref loc_ref;
+
+  /* This is struct datasym in binutils:
+
+  struct datasym
+  {
+   uint16_t size;
+   uint16_t kind;
+   uint32_t type;
+   uint32_t offset;
+   uint16_t section;
+   char name[];
+  } ATTRIBUTE_PACKED;
+  */
+
+  /* Extract the DW_AT_location attribute from the DIE, and make sure it's in
+ a format we can parse.  */
+
+  loc = get_AT (s->data_symbol.die, DW_AT_location);
+  if (!loc)
+goto end;
+
+  if (loc->dw_attr_val.val_class != dw_val_class_loc)
+goto end;
+
+  loc_ref = loc->dw_attr_val.v.val_loc;
+  if (!loc_ref || loc_ref->dw_loc_opc != DW_OP_addr)
+goto end;
+
+  /* Output the S_GDATA32 / S_LDATA32 record.  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file,
+  "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n",
+  label_num, label_num);
+
+  targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_LABEL, label_num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, s->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, s->data_symbol.type);
+  putc ('\n', asm_out_file);
+
+  asm_fprintf (asm_out_file, "\t.secrel32 ");
+  output_addr_const (asm_out_file, loc_ref->dw_loc_oprnd1.v.val_addr);
+  fputc ('\n', asm_out_file);
+
+  asm_fprintf (asm_out_file, "\t.secidx ");
+  output_addr_const (asm_out_file, loc_ref->dw_loc_oprnd1.v.val_addr);
+  fputc ('\n', asm_out_file);
+
+  ASM_OUTPUT_ASCII (asm_out_file, s->data_symbol.name,
+   strlen (s->data_symbol.name) + 1);
+
+  targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num);
+
+end:
+  free (s->data_symbol.name);
+}
+
 /* Write the CodeView symbols into the .debug$S section.  */
 
 static void
@@ -714,6 +804,22 @@ write_codeview_symbols (void)
 
   write_compile3_symbol ();
 
+  while (sym)
+{
+  codeview_symbol *n = sym->next;
+
+  switch (sym->kind)
+   {
+   case S_LDATA32:
+   case S_GDATA32:
+ write_data_symbol (sym);
+ break;
+   }
+
+  free (sym);
+  sym = n;
+}
+
   asm_fprintf (asm_out_file, "%LLcv_syms_end:\n");
 }
 
@@ -734,4 +840,58 @@ codeview_debug_finish (void)
   write_codeview_symbols ();
 }
 
+/* Process a DW_TAG_variable DIE, and add an S_GDATA32 or S_LDATA32 symbol for
+   this.  */
+
+static void
+add_variable (dw_die_ref die)
+{
+  codeview_symbol *s;
+  const char *name;
+
+  name = get_AT_string (die, DW_AT_name);
+  if (!name)
+return;
+
+  s = (codeview_symbol *) xm

[PATCH 07/11] Handle structs and classes for CodeView

2024-06-17 Thread Mark Harmstone
Translates DW_TAG_structure_type DIEs into LF_STRUCTURE symbols, and
DW_TAG_class_type DIEs into LF_CLASS symbols.

gcc/
* dwarf2codeview.cc
(struct codeview_type): Add is_fwd_ref member.
(struct codeview_subtype): Add lf_member to union.
(struct codeview_custom_type): Add lf_structure to union.
(struct codeview_deferred_type): New structure.
(deferred_types, last_deferred_type): New variables.
(get_type_num): Add new args to prototype.
(write_lf_fieldlist): Handle LF_MEMBER subtypes.
(write_lf_structure): New function.
(write_custom_types): Call write_lf_structure.
(get_type_num_pointer_type): Add in_struct argument.
(get_type_num_const_type): Likewise.
(get_type_num_volatile_type): Likewise.
(add_enum_forward_def): Fix get_type_num call.
(get_type_num_enumeration_type): Add in_struct argument.
(add_deferred_type, flush_deferred_types): New functions.
(add_struct_forward_def, get_type_num_struct): Likewise.
(get_type_num): Handle self-referential structs.
(add_variable): Fix get_type_num call.
(codeview_debug_early_finish): Call flush_deferred_types.
* dwarf2codeview.h (LF_CLASS, LF_STRUCTURE, LF_MEMBER): Define.
---
 gcc/dwarf2codeview.cc | 513 --
 gcc/dwarf2codeview.h  |   3 +
 2 files changed, 493 insertions(+), 23 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 475a53573e9..9c6614f6297 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -158,6 +158,7 @@ struct codeview_type
 {
   dw_die_ref die;
   uint32_t num;
+  bool is_fwd_ref;
 };
 
 struct die_hasher : free_ptr_hash 
@@ -197,6 +198,13 @@ struct codeview_subtype
 {
   uint32_t type_num;
 } lf_index;
+struct
+{
+  uint16_t attributes;
+  uint32_t type;
+  codeview_integer offset;
+  char *name;
+} lf_member;
   };
 };
 
@@ -232,9 +240,25 @@ struct codeview_custom_type
   uint32_t fieldlist;
   char *name;
 } lf_enum;
+struct
+{
+  uint16_t num_members;
+  uint16_t properties;
+  uint32_t field_list;
+  uint32_t derived_from;
+  uint32_t vshape;
+  codeview_integer length;
+  char *name;
+} lf_structure;
   };
 };
 
+struct codeview_deferred_type
+{
+  struct codeview_deferred_type *next;
+  dw_die_ref type;
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -249,8 +273,9 @@ static uint32_t last_file_id;
 static codeview_symbol *sym, *last_sym;
 static hash_table *types_htab;
 static codeview_custom_type *custom_types, *last_custom_type;
+static codeview_deferred_type *deferred_types, *last_deferred_type;
 
-static uint32_t get_type_num (dw_die_ref type);
+static uint32_t get_type_num (dw_die_ref type, bool in_struct, bool no_fwd_ref);
 
 /* Record new line number against the current function.  */
 
@@ -1217,6 +1242,51 @@ write_lf_fieldlist (codeview_custom_type *t)
  free (v->lf_enumerate.name);
  break;
 
+   case LF_MEMBER:
+ /* This is lf_member in binutils and lfMember in Microsoft's
+cvinfo.h:
+
+   struct lf_member
+   {
+ uint16_t kind;
+ uint16_t attributes;
+ uint32_t type;
+ uint16_t offset;
+ char name[];
+   } ATTRIBUTE_PACKED;
+ */
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_MEMBER);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, v->lf_member.attributes);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (4, false), asm_out_file);
+ fprint_whex (asm_out_file, v->lf_member.type);
+ putc ('\n', asm_out_file);
+
+ leaf_len = 8 + write_cv_integer (&v->lf_member.offset);
+
+ if (v->lf_member.name)
+   {
+ name_len = strlen (v->lf_member.name) + 1;
+ ASM_OUTPUT_ASCII (asm_out_file, v->lf_member.name, name_len);
+   }
+ else
+   {
+ name_len = 1;
+ ASM_OUTPUT_ASCII (asm_out_file, "", name_len);
+   }
+
+ leaf_len += name_len;
+ write_cv_padding (4 - (leaf_len % 4));
+
+ free (v->lf_member.name);
+ break;
+
case LF_INDEX:
  /* This is lf_index in binutils and lfIndex in Microsoft's cvinfo.h:
 
@@ -1308,6 +1378,82 @@ write_lf_enum (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_STRUCTURE or LF_CLASS type (the two have the same structure).  */
+
+static void
+write_lf_structure (codeview_custom_type *t)
+{
+  size_t name_len, leaf_len

[PATCH 05/11] Handle const and volatile modifiers for CodeView

2024-06-17 Thread Mark Harmstone
Translate DW_TAG_const_type and DW_TAG_volatile_type DIEs into
LF_MODIFIER symbols.
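
A const volatile type only needs a single LF_MODIFIER record, with both bits
set in its modifier field. A minimal sketch of that combination follows. The
numeric values of MOD_const and MOD_volatile are assumptions here (the real
definitions live in dwarf2codeview.h), and modifier_bits is a made-up helper,
not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; see dwarf2codeview.h for the real definitions.  */
#define MOD_const    0x1
#define MOD_volatile 0x2

/* Hypothetical helper: build the modifier field of one LF_MODIFIER
   record from the qualifiers found on the DWARF side.  */
static uint16_t
modifier_bits (bool is_const, bool is_volatile)
{
  uint16_t bits = 0;

  /* An LF_MODIFIER with no qualifier at all would be pointless.  */
  assert (is_const || is_volatile);

  if (is_const)
    bits |= MOD_const;
  if (is_volatile)
    bits |= MOD_volatile;

  return bits;
}
```

So "const volatile int" yields one record with MOD_const | MOD_volatile
rather than two chained records.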

gcc/
* dwarf2codeview.cc
(struct codeview_custom_type): Add lf_modifier to union.
(write_cv_padding, write_lf_modifier): New functions.
(write_custom_types): Call write_lf_modifier.
(get_type_num_const_type): New function.
(get_type_num_volatile_type): Likewise.
(get_type_num): Handle DW_TAG_const_type and
DW_TAG_volatile_type DIEs.
* dwarf2codeview.h (MOD_const, MOD_volatile): Define.
(LF_MODIFIER): Likewise.
---
 gcc/dwarf2codeview.cc | 157 ++
 gcc/dwarf2codeview.h  |   5 ++
 2 files changed, 162 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 51401f2d5bc..05f5f60997e 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -183,6 +183,11 @@ struct codeview_custom_type
   uint32_t base_type;
   uint32_t attributes;
 } lf_pointer;
+struct
+{
+  uint32_t base_type;
+  uint16_t modifier;
+} lf_modifier;
   };
 };
 
@@ -903,6 +908,76 @@ write_lf_pointer (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* All CodeView type definitions have to be aligned to a four-byte boundary,
+   so write some padding bytes if necessary.  These have to be specific values:
+   f3, f2, f1.  */
+
+static void
+write_cv_padding (size_t padding)
+{
+  if (padding == 4 || padding == 0)
+return;
+
+  if (padding == 3)
+{
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, 0xf3);
+  putc ('\n', asm_out_file);
+}
+
+  if (padding >= 2)
+{
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, 0xf2);
+  putc ('\n', asm_out_file);
+}
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, 0xf1);
+  putc ('\n', asm_out_file);
+}
+
+/* Write an LF_MODIFIER type, representing a const and/or volatile modification
+   of another type.  */
+
+static void
+write_lf_modifier (codeview_custom_type *t)
+{
+  /* This is lf_modifier in binutils and lfModifier in Microsoft's cvinfo.h:
+
+struct lf_modifier
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t base_type;
+  uint16_t modifier;
+  uint16_t padding;
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_modifier.base_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_modifier.modifier);
+  putc ('\n', asm_out_file);
+
+  write_cv_padding (2);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -924,6 +999,10 @@ write_custom_types (void)
case LF_POINTER:
  write_lf_pointer (custom_types);
  break;
+
+   case LF_MODIFIER:
+ write_lf_modifier (custom_types);
+ break;
}
 
   free (custom_types);
@@ -1159,6 +1238,76 @@ get_type_num_pointer_type (dw_die_ref type)
   return ct->num;
 }
 
+/* Process a DW_TAG_const_type DIE, adding an LF_MODIFIER type and returning
+   its number.  */
+
+static uint32_t
+get_type_num_const_type (dw_die_ref type)
+{
+  dw_die_ref base_type;
+  uint32_t base_type_num;
+  codeview_custom_type *ct;
+  bool is_volatile = false;
+
+  base_type = get_AT_ref (type, DW_AT_type);
+  if (!base_type)
+return 0;
+
+  /* Handle case when this is a const volatile type - we only need one
+ LF_MODIFIER for this.  */
+  if (dw_get_die_tag (base_type) == DW_TAG_volatile_type)
+{
+  is_volatile = true;
+
+  base_type = get_AT_ref (base_type, DW_AT_type);
+  if (!base_type)
+   return 0;
+}
+
+  base_type_num = get_type_num (base_type);
+  if (base_type_num == 0)
+return 0;
+
+  ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
+
+  ct->next = NULL;
+  ct->kind = LF_MODIFIER;
+  ct->lf_modifier.base_type = base_type_num;
+  ct->lf_modifier.modifier = MOD_const;
+
+  if (is_volatile)
+ct->lf_modifier.modifier |= MOD_volatile;
+
+  add_custom_type (ct);
+
+  return ct->num;
+}
+
+/* Process a DW_TAG_volatile_type DIE, adding an LF_MODIFIER type and
+   returning its number.  */
+
+static uint32_t
+get_type_num_volatile_type (dw_die_ref type)
+{
+  uint32_t base_type_num;
+  codeview_custom_type *ct;
+
+  base_type_num = get_type_num (get_AT_ref (type, DW_AT_ty

[PATCH 10/11] Handle bitfields for CodeView

2024-06-17 Thread Mark Harmstone
Translates structure members with DW_AT_data_bit_offset set in DWARF
into LF_BITFIELD symbols.
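
Note that the LF_BITFIELD record as laid out in this patch is ten bytes long
(2 + 2 + 4 + 1 + 1), so it needs two bytes of padding to reach a four-byte
boundary, hence the write_cv_padding (2) call. The sketch below mirrors the
padding rule of write_cv_padding but stores the bytes in a buffer instead of
emitting assembler directives; cv_padding_bytes is a made-up name for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-based variant of write_cv_padding: store the
   padding bytes for PADDING (0 to 4) in OUT, which must have room for
   three bytes, and return how many were stored.  As in the patch, 0
   and 4 both mean "already aligned".  */
static size_t
cv_padding_bytes (size_t padding, uint8_t out[3])
{
  size_t n = 0;

  assert (padding <= 4);

  if (padding == 4 || padding == 0)
    return 0;

  if (padding == 3)
    out[n++] = 0xf3;

  if (padding >= 2)
    out[n++] = 0xf2;

  out[n++] = 0xf1;
  return n;
}
```

For the bitfield record, a padding of 2 gives the byte sequence f2 f1.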

gcc/
* dwarf2codeview.cc
(struct codeview_custom_type): Add lf_bitfield to union.
(write_lf_bitfield): New function.
(write_custom_types): Call write_lf_bitfield.
(create_bitfield): New function.
(get_type_num_struct): Handle bitfields.
* dwarf2codeview.h (LF_BITFIELD): Define.
---
 gcc/dwarf2codeview.cc | 89 ++-
 gcc/dwarf2codeview.h  |  1 +
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 3f1ce5577fc..06267639169 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -256,6 +256,12 @@ struct codeview_custom_type
   uint32_t index_type;
   codeview_integer length_in_bytes;
 } lf_array;
+struct
+{
+  uint32_t base_type;
+  uint8_t length;
+  uint8_t position;
+} lf_bitfield;
   };
 };
 
@@ -1573,6 +1579,50 @@ write_lf_array (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_BITFIELD type.  */
+
+static void
+write_lf_bitfield (codeview_custom_type *t)
+{
+  /* This is lf_bitfield in binutils and lfBitfield in Microsoft's cvinfo.h:
+
+struct lf_bitfield
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t base_type;
+  uint8_t length;
+  uint8_t position;
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_bitfield.base_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_bitfield.length);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_bitfield.position);
+  putc ('\n', asm_out_file);
+
+  write_cv_padding (2);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1619,6 +1669,10 @@ write_custom_types (void)
case LF_ARRAY:
  write_lf_array (custom_types);
  break;
+
+   case LF_BITFIELD:
+ write_lf_bitfield (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2199,6 +2253,33 @@ add_struct_forward_def (dw_die_ref type)
   return ct->num;
 }
 
+/* Add an LF_BITFIELD type, returning its number.  DWARF represents bitfields
+   as members in a struct with a DW_AT_data_bit_offset attribute, whereas in
+   CodeView they're a distinct type.  */
+
+static uint32_t
+create_bitfield (dw_die_ref c)
+{
+  codeview_custom_type *ct;
+  uint32_t base_type;
+
+  base_type = get_type_num (get_AT_ref (c, DW_AT_type), true, false);
+  if (base_type == 0)
+return 0;
+
+  ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
+
+  ct->next = NULL;
+  ct->kind = LF_BITFIELD;
+  ct->lf_bitfield.base_type = base_type;
+  ct->lf_bitfield.length = get_AT_unsigned (c, DW_AT_bit_size);
+  ct->lf_bitfield.position = get_AT_unsigned (c, DW_AT_data_bit_offset);
+
+  add_custom_type (ct);
+
+  return ct->num;
+}
+
 /* Process a DW_TAG_structure_type, DW_TAG_class_type, or DW_TAG_union_type
DIE, add an LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS / LF_UNION type,
and return the number of the latter.  */
@@ -2279,8 +2360,12 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
  break;
}
 
- el->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type), true,
-   false);
+ if (get_AT (c, DW_AT_data_bit_offset))
+   el->lf_member.type = create_bitfield (c);
+ else
+   el->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type),
+  true, false);
+
  el->lf_member.offset.neg = false;
  el->lf_member.offset.num = get_AT_unsigned (c,
      DW_AT_data_member_location);
diff --git a/gcc/dwarf2codeview.h b/gcc/dwarf2codeview.h
index 70eed6bf2aa..70eae554b80 100644
--- a/gcc/dwarf2codeview.h
+++ b/gcc/dwarf2codeview.h
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #define LF_MODIFIER	0x1001
 #define LF_POINTER	0x1002
 #define LF_FIELDLIST	0x1203
+#define LF_BITFIELD	0x1205
 #define LF_INDEX	0x1404
 #define LF_ENUMERATE	0x1502
 #define LF_ARRAY	0x1503
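
As a rough illustration of what write_lf_bitfield above ends up emitting,
here is a minimal sketch that serializes one LF_BITFIELD record into a byte
buffer instead of going through the assembler.  The helper names (put_le,
emit_lf_bitfield) are made up for this example, and it assumes that
write_cv_padding (2) emits the two bytes LF_PAD2, LF_PAD1 (0xf2, 0xf1), which
is not shown in this patch.

```c
#include <stddef.h>
#include <stdint.h>

#define LF_BITFIELD 0x1205
#define LF_PAD1 0xf1 /* assumed CodeView padding byte values */
#define LF_PAD2 0xf2

/* Hypothetical helper: store the low N bytes of V little-endian.  */
static size_t
put_le (uint8_t *out, uint64_t v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint8_t) (v >> (8 * i));
  return n;
}

/* Serialize one LF_BITFIELD record into OUT, which must have room for
   12 bytes.  Returns the number of bytes written.  */
size_t
emit_lf_bitfield (uint8_t *out, uint32_t base_type, uint8_t length,
                  uint8_t position)
{
  size_t n = 0;

  /* The size field counts everything after itself:
     kind (2) + base_type (4) + length (1) + position (1) + padding (2).  */
  n += put_le (out + n, 10, 2);
  n += put_le (out + n, LF_BITFIELD, 2);
  n += put_le (out + n, base_type, 4);
  n += put_le (out + n, length, 1);
  n += put_le (out + n, position, 1);

  /* Two padding bytes bring the whole record to a multiple of 4.  */
  out[n++] = LF_PAD2;
  out[n++] = LF_PAD1;

  return n;
}
```

For a 3-bit field at bit 5 of a type numbered 0x74, this yields the 12 bytes
0a 00 05 12 74 00 00 00 03 05 f2 f1.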

[PATCH 03/11] Handle typedefs for CodeView

2024-06-17 Thread Mark Harmstone
gcc/
* dwarf2codeview.cc (get_type_num): Handle typedefs.
---
 gcc/dwarf2codeview.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index eb7c1270e31..5006a176260 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -1024,6 +1024,12 @@ get_type_num (dw_die_ref type)
   t->num = get_type_num_base_type (type);
   break;
 
+case DW_TAG_typedef:
+  /* FIXME - signed longs typedef'd as "HRESULT" should get their
+own type (T_HRESULT) */
+  t->num = get_type_num (get_AT_ref (type, DW_AT_type));
+  break;
+
 default:
   t->num = 0;
   break;
-- 
2.44.2



Re: [committed] testsuite: Add -Wno-psabi to vshuf-mem.C test

2024-06-17 Thread Andreas Krebbel

On 6/14/24 20:03, Jakub Jelinek wrote:

> Also wonder about the
> // { dg-additional-options "-march=z14" { target s390*-*-* } }
> line, doesn't that mean the test will FAIL on all pre-z14 HW?
> Shouldn't it use some z14_runtime or similar effective target, or
> check in main (in that case copied over to g++.target/s390) whether
> z14 instructions can be actually used at runtime?


Oh right. I'll remove that line and replicate the testcase in the
arch-specific test dir.


Andreas




[PATCH 02/11] Handle CodeView base types

2024-06-17 Thread Mark Harmstone
Adds a get_type_num function to translate type DIEs into CodeView
numbers, along with a hash table for this.  For now we just deal with
the base types (integers, Unicode chars, floats, and bools).

gcc/
* dwarf2codeview.cc (struct codeview_type): New structure.
(struct die_hasher): Likewise.
(types_htab): New variable.
(codeview_debug_finish): Free types_htab if allocated.
(get_type_num_base_type, get_type_num): New function.
(add_variable): Call get_type_num.
* dwarf2codeview.h (T_CHAR, T_SHORT, T_LONG, T_QUAD): Define.
(T_UCHAR, T_USHORT, T_ULONG, T_UQUAD, T_BOOL08): Likewise.
(T_REAL32, T_REAL64, T_REAL80, T_REAL128, T_RCHAR): Likewise.
(T_WCHAR, T_INT4, T_UINT4, T_CHAR16, T_CHAR32, T_CHAR8): Likewise.
---
 gcc/dwarf2codeview.cc | 196 +-
 gcc/dwarf2codeview.h  |  23 +
 2 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 60e84635971..eb7c1270e31 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -147,6 +147,27 @@ struct codeview_symbol
   };
 };
 
+struct codeview_type
+{
+  dw_die_ref die;
+  uint32_t num;
+};
+
+struct die_hasher : free_ptr_hash <codeview_type>
+{
+  typedef dw_die_ref compare_type;
+
+  static hashval_t hash (const codeview_type *x)
+  {
+return htab_hash_pointer (x->die);
+  }
+
+  static bool equal (const codeview_type *x, const dw_die_ref y)
+  {
+return x->die == y;
+  }
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -159,6 +180,7 @@ static codeview_function *funcs, *last_func;
 static const char* last_filename;
 static uint32_t last_file_id;
 static codeview_symbol *sym, *last_sym;
+static hash_table<die_hasher> *types_htab;
 
 /* Record new line number against the current function.  */
 
@@ -838,6 +860,178 @@ codeview_debug_finish (void)
   write_source_files ();
   write_line_numbers ();
   write_codeview_symbols ();
+
+  if (types_htab)
+delete types_htab;
+}
+
+/* Translate a DWARF base type (DW_TAG_base_type) into its CodeView
+   equivalent.  */
+
+static uint32_t
+get_type_num_base_type (dw_die_ref type)
+{
+  unsigned int size = get_AT_unsigned (type, DW_AT_byte_size);
+
+  switch (get_AT_unsigned (type, DW_AT_encoding))
+{
+case DW_ATE_signed_char:
+  {
+   const char *name = get_AT_string (type, DW_AT_name);
+
+   if (size != 1)
+ return 0;
+
+   if (name && !strcmp (name, "signed char"))
+ return T_CHAR;
+   else
+ return T_RCHAR;
+  }
+
+case DW_ATE_unsigned_char:
+  if (size != 1)
+   return 0;
+
+  return T_UCHAR;
+
+case DW_ATE_signed:
+  switch (size)
+   {
+   case 2:
+ return T_SHORT;
+
+   case 4:
+ {
+   const char *name = get_AT_string (type, DW_AT_name);
+
+   if (name && !strcmp (name, "int"))
+ return T_INT4;
+   else
+ return T_LONG;
+ }
+
+   case 8:
+ return T_QUAD;
+
+   default:
+ return 0;
+   }
+
+case DW_ATE_unsigned:
+  switch (size)
+   {
+   case 2:
+ {
+   const char *name = get_AT_string (type, DW_AT_name);
+
+   if (name && !strcmp (name, "wchar_t"))
+ return T_WCHAR;
+   else
+ return T_USHORT;
+ }
+
+   case 4:
+ {
+   const char *name = get_AT_string (type, DW_AT_name);
+
+   if (name && !strcmp (name, "unsigned int"))
+ return T_UINT4;
+   else
+ return T_ULONG;
+ }
+
+   case 8:
+ return T_UQUAD;
+
+   default:
+ return 0;
+   }
+
+case DW_ATE_UTF:
+  switch (size)
+   {
+   case 1:
+ return T_CHAR8;
+
+   case 2:
+ return T_CHAR16;
+
+   case 4:
+ return T_CHAR32;
+
+   default:
+ return 0;
+   }
+
+case DW_ATE_float:
+  switch (size)
+   {
+   case 4:
+ return T_REAL32;
+
+   case 8:
+ return T_REAL64;
+
+   case 12:
+ return T_REAL80;
+
+   case 16:
+ return T_REAL128;
+
+   default:
+ return 0;
+   }
+
+case DW_ATE_boolean:
+  if (size == 1)
+   return T_BOOL08;
+  else
+   return 0;
+
+default:
+  return 0;
+}
+}
+
+/* Process a DIE representing a type definition and return its number.  If
+   it's something we can't handle, return 0.  We keep a hash table so that
+   we're not adding the same type multiple times - though if we do it's not
+   disastrous, as ld will deduplicate everything for us.  */
+
+static uint32_t
+get_type_num (dw_die_ref type)
+{
+  codeview_type **slot, *t;
+
+  if (!type)
+return 0;
+
+  if (!types_htab)
+types_htab = new hash_table<die_hasher> (10);
+
+  slot = types_htab->find_slot
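
To make the name-based distinction in get_type_num_base_type concrete, here
is a condensed sketch of just the DW_ATE_signed branch as a standalone
function.  The function name signed_type_num is invented for the example, and
the numeric T_* values are the ones I recall from Microsoft's cvinfo.h; they
should be checked against dwarf2codeview.h, which is not quoted in full here.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values, as in Microsoft's cvinfo.h.  */
enum
{
  T_SHORT = 0x0011,
  T_LONG = 0x0012,
  T_QUAD = 0x0013,
  T_INT4 = 0x0074
};

/* Map a signed integer base type of SIZE bytes, with DWARF name NAME
   (may be NULL), to a CodeView type number.  Returns 0 if unhandled.  */
uint32_t
signed_type_num (unsigned int size, const char *name)
{
  switch (size)
    {
    case 2:
      return T_SHORT;

    case 4:
      /* int and long are both 4 bytes on Windows targets, so only the
         name tells them apart.  */
      if (name && !strcmp (name, "int"))
        return T_INT4;
      else
        return T_LONG;

    case 8:
      return T_QUAD;

    default:
      return 0;
    }
}
```

So a 4-byte type named "int" maps to T_INT4, while any other 4-byte signed
type (including an unnamed one) falls back to T_LONG.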

Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-17 Thread Peter Bergner
On 6/17/24 6:11 PM, Segher Boessenkool wrote:
> "ROP insns" are the instructions used in such exploits, not what you
> mean here :-)
> 
> The instructions are called "hash*", so maybe call them "hash insns"
> or "ROP protect hash insns"?

Ok, that bad verbiage was in the extra commentary not part of the git
log entry.  That said, I'll reword that to the following:

 Only disable shrink-wrapping when using -mrop-protect when we know we
-will be emitting the ROP instructions (ie, non-leaf functions).
+will be emitting the ROP protect hash instructions (ie, non-leaf functions).




>>  * config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
>>  the disabling of shrink-wrapping from here
>>  * config/rs6000/rs6000-logue.cc (rs6000_stack_info): ...to here.
> 
> Hrm.  Can you do it in some particular caller of rs6000_stack_info,
> instead?  The rs6000_stack_info function itself is not supposed to
> change any state whatsoever.

Sure, I can look at maybe moving that to the caller or maybe somewhere
better.  I'll repost the patch once I find a better location.



> The comment should say *why*!  The fact that we do is clear from the
> code itself already.  But why do we want this?
> 
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void)
>>  }
>>else if (!OPTION_SET_P (flag_cunroll_grow_size))
>>  flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
>> -
>> -  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
>> -  if (rs6000_rop_protect)
>> -flag_shrink_wrap = 0;
>>  }
> 
> (Yes, I know the original code didn't say either, but let's try to make
> things better :-) )

Yeah, I didn't write that, I only moved it, but I can try to come up with
an explanation of why we need to disable it now.  That said, my hope is to
not have to disable shrink-wrapping even when we emit the ROP protect hash
insns in the future, but that will take some extra work.  If I can manage
that, then this should all just go away. :-)  Until then, we can stick
with this patch's micro-optimization.




>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -fdump-rtl-pro_and_epilogue" } */
>> +/* { dg-require-effective-target rop_ok } */
> 
> Do you want rop_ok while you are *forcing* it to be okay anyway?  Why?

At the moment, yes, since the rop_ok test not only checks for the -mcpu= level,
it also verifies that the ABI is ok.  Currently, rop_ok makes sure we have
Power10 and ELFv2 ABI being used.  So currently, if we were to run this test
on BE, we'd get an UNSUPPORTED using the rop_ok check, but if we removed it,
we'd see a FAIL.  

As we discussed offline, the plan is to eventually enable emitting the ROP
protect hash insns on other ABIs, but until then, I think we want to keep the
rop_ok check so as to keep Bill's CI builder from flagging it as a FAIL.

Peter




[PATCH 06/11] Handle enums for CodeView

2024-06-17 Thread Mark Harmstone
Translates DW_TAG_enumeration_type DIEs into LF_ENUM symbols.

gcc/
* dwarf2codeview.cc (MAX_FIELDLIST_SIZE): Define.
(struct codeview_integer): New structure.
(struct codeview_subtype): Likewise
(struct codeview_custom_type): Add lf_fieldlist and lf_enum
to union.
(write_cv_integer, cv_integer_len): New functions.
(write_lf_fieldlist, write_lf_enum): Likewise.
(write_custom_types): Call write_lf_fieldlist and write_lf_enum.
(add_enum_forward_def): New function.
(get_type_num_enumeration_type): Likewise.
(get_type_num): Handle DW_TAG_enumeration_type DIEs.
* dwarf2codeview.h (LF_FIELDLIST, LF_INDEX, LF_ENUMERATE): Define.
(LF_ENUM, LF_CHAR, LF_SHORT, LF_USHORT, LF_LONG): Likewise.
(LF_ULONG, LF_QUADWORD, LF_UQUADWORD): Likewise.
(CV_ACCESS_PRIVATE, CV_ACCESS_PROTECTED): Likewise.
(CV_ACCESS_PUBLIC, CV_PROP_FWDREF): Likewise.
---
 gcc/dwarf2codeview.cc | 524 ++
 gcc/dwarf2codeview.h  |  17 ++
 2 files changed, 541 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 05f5f60997e..475a53573e9 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -63,6 +63,11 @@ along with GCC; see the file COPYING3.  If not see
 #define SYMBOL_START_LABEL "Lcvsymstart"
 #define SYMBOL_END_LABEL   "Lcvsymend"
 
+/* There's two bytes available for each type's size, but follow MSVC's lead in
+   capping the LF_FIELDLIST size at fb00 (minus 8 bytes for the LF_INDEX
+   pointing to the overflow entry).  */
+#define MAX_FIELDLIST_SIZE 0xfaf8
+
 #define HASH_SIZE 16
 
 struct codeview_string
@@ -170,6 +175,31 @@ struct die_hasher : free_ptr_hash 
   }
 };
 
+struct codeview_integer
+{
+  bool neg;
+  uint64_t num;
+};
+
+struct codeview_subtype
+{
+  struct codeview_subtype *next;
+  uint16_t kind;
+
+  union
+  {
+struct
+{
+  char *name;
+  struct codeview_integer value;
+} lf_enumerate;
+struct
+{
+  uint32_t type_num;
+} lf_index;
+  };
+};
+
 struct codeview_custom_type
 {
   struct codeview_custom_type *next;
@@ -188,6 +218,20 @@ struct codeview_custom_type
   uint32_t base_type;
   uint16_t modifier;
 } lf_modifier;
+struct
+{
+  size_t length;
+  codeview_subtype *subtypes;
+  codeview_subtype *last_subtype;
+} lf_fieldlist;
+struct
+{
+  uint16_t count;
+  uint16_t properties;
+  uint32_t underlying_type;
+  uint32_t fieldlist;
+  char *name;
+} lf_enum;
   };
 };
 
@@ -978,6 +1022,292 @@ write_lf_modifier (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write a CodeView extensible integer.  If the value is non-negative and
+   < 0x8000, the value gets written directly as an uint16_t.  Otherwise, we
+   output two bytes for the integer type (LF_CHAR, LF_SHORT, ...), and the
+   actual value follows.  */
+
+static size_t
+write_cv_integer (codeview_integer *i)
+{
+  if (i->neg)
+{
+  if (i->num <= 0x80)
+   {
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_CHAR);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (1, false), asm_out_file);
+ fprint_whex (asm_out_file, -i->num);
+ putc ('\n', asm_out_file);
+
+ return 3;
+   }
+  else if (i->num <= 0x8000)
+   {
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_SHORT);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, -i->num);
+ putc ('\n', asm_out_file);
+
+ return 4;
+   }
+  else if (i->num <= 0x80000000)
+   {
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_LONG);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (4, false), asm_out_file);
+ fprint_whex (asm_out_file, -i->num);
+ putc ('\n', asm_out_file);
+
+ return 6;
+   }
+  else
+   {
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_QUADWORD);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (8, false), asm_out_file);
+ fprint_whex (asm_out_file, -i->num);
+ putc ('\n', asm_out_file);
+
+ return 10;
+   }
+}
+  else
+{
+  if (i->num <= 0x7fff)
+   {
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, i->num);
+ putc ('\n', asm_out_file);
+
+ return 2;
+   }
+  else if (i->num <= 0xffff)
+   {
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_USHORT);
+ putc ('
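
The extensible integer encoding described in the comment on write_cv_integer
can be sketched as a function that writes into a byte buffer rather than
emitting assembler directives.  This is only an illustration: the names
put_le and encode_cv_integer are invented here, and the LF_* numeric-leaf
values are the ones I recall from Microsoft's cvinfo.h rather than anything
quoted in this message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed numeric leaf values, as in Microsoft's cvinfo.h.  */
enum
{
  LF_CHAR = 0x8000,
  LF_SHORT = 0x8001,
  LF_USHORT = 0x8002,
  LF_LONG = 0x8003,
  LF_ULONG = 0x8004,
  LF_QUADWORD = 0x8009,
  LF_UQUADWORD = 0x800a
};

/* Hypothetical helper: store the low N bytes of V little-endian.  */
static size_t
put_le (uint8_t *out, uint64_t v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint8_t) (v >> (8 * i));
  return n;
}

/* Encode the value (NEG ? -NUM : NUM) into OUT, which must have room for
   10 bytes.  Returns the number of bytes written.  */
size_t
encode_cv_integer (uint8_t *out, bool neg, uint64_t num)
{
  size_t len = 0;
  uint64_t kind, value = num;
  size_t width;

  if (!neg && num <= 0x7fff)
    /* Small non-negative values are stored directly as a uint16_t.  */
    return put_le (out, num, 2);

  if (neg)
    {
      /* Two's complement of the magnitude; put_le truncates it.  */
      value = 0 - num;

      if (num <= 0x80)
        kind = LF_CHAR, width = 1;
      else if (num <= 0x8000)
        kind = LF_SHORT, width = 2;
      else if (num <= 0x80000000)
        kind = LF_LONG, width = 4;
      else
        kind = LF_QUADWORD, width = 8;
    }
  else
    {
      if (num <= 0xffff)
        kind = LF_USHORT, width = 2;
      else if (num <= 0xffffffff)
        kind = LF_ULONG, width = 4;
      else
        kind = LF_UQUADWORD, width = 8;
    }

  /* Two bytes naming the integer type, then the value itself.  */
  len += put_le (out + len, kind, 2);
  len += put_le (out + len, value, width);
  return len;
}
```

With this, 5 encodes as 05 00, 0x8000 as 02 80 00 80, and -1 as 00 80 ff,
matching the return values 2, 4 and 3 in the patch.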

[PATCH 08/11] Handle unions for CodeView.

2024-06-17 Thread Mark Harmstone
Translates DW_TAG_union_type DIEs into LF_UNION symbols.

gcc/
* dwarf2codeview.cc (write_lf_union): New function.
(write_custom_types): Call write_lf_union.
(add_struct_forward_def): Handle DW_TAG_union_type DIEs.
(get_type_num_struct): Handle unions.
(get_type_num): Handle DW_TAG_union_type DIEs.
* dwarf2codeview.h (LF_UNION): Define.
---
 gcc/dwarf2codeview.cc | 91 ---
 gcc/dwarf2codeview.h  |  1 +
 2 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9c6614f6297..9e3b64522b2 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -1454,6 +1454,72 @@ write_lf_structure (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_UNION type.  */
+
+static void
+write_lf_union (codeview_custom_type *t)
+{
+  size_t name_len, leaf_len;
+
+  /* This is lf_union in binutils and lfUnion in Microsoft's cvinfo.h:
+
+struct lf_union
+{
+  uint16_t size;
+  uint16_t kind;
+  uint16_t num_members;
+  uint16_t properties;
+  uint32_t field_list;
+  uint16_t length;
+  char name[];
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.num_members);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.properties);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.field_list);
+  putc ('\n', asm_out_file);
+
+  leaf_len = 12 + write_cv_integer (&t->lf_structure.length);
+
+  if (t->lf_structure.name)
+{
+  name_len = strlen (t->lf_structure.name) + 1;
+  ASM_OUTPUT_ASCII (asm_out_file, t->lf_structure.name, name_len);
+}
+  else
+{
+  static const char unnamed_struct[] = "";
+
+  name_len = sizeof (unnamed_struct);
+  ASM_OUTPUT_ASCII (asm_out_file, unnamed_struct, name_len);
+}
+
+  leaf_len += name_len;
+  write_cv_padding (4 - (leaf_len % 4));
+
+  free (t->lf_structure.name);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1492,6 +1558,10 @@ write_custom_types (void)
case LF_CLASS:
  write_lf_structure (custom_types);
  break;
+
+   case LF_UNION:
+ write_lf_union (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2026,7 +2096,7 @@ flush_deferred_types (void)
   last_deferred_type = NULL;
 }
 
-/* Add a forward definition for a struct or class.  */
+/* Add a forward definition for a struct, class, or union.  */
 
 static uint32_t
 add_struct_forward_def (dw_die_ref type)
@@ -2047,6 +2117,10 @@ add_struct_forward_def (dw_die_ref type)
   ct->kind = LF_STRUCTURE;
   break;
 
+case DW_TAG_union_type:
+  ct->kind = LF_UNION;
+  break;
+
 default:
   break;
 }
@@ -2068,9 +2142,9 @@ add_struct_forward_def (dw_die_ref type)
   return ct->num;
 }
 
-/* Process a DW_TAG_structure_type or DW_TAG_class_type DIE, add an
-   LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS type, and return the number of
-   the latter.  */
+/* Process a DW_TAG_structure_type, DW_TAG_class_type, or DW_TAG_union_type
+   DIE, add an LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS / LF_UNION type,
+   and return the number of the latter.  */
 
 static uint32_t
 get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
@@ -2227,8 +2301,8 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
   ct = ct2;
 }
 
-  /* Now add an LF_STRUCTURE / LF_CLASS, pointing to the LF_FIELDLIST we just
- added.  */
+  /* Now add an LF_STRUCTURE / LF_CLASS / LF_UNION, pointing to the
+ LF_FIELDLIST we just added.  */
 
   ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
 
@@ -2244,6 +2318,10 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
   ct->kind = LF_STRUCTURE;
   break;
 
+case DW_TAG_union_type:
+  ct->kind = LF_UNION;
+  break;
+
 default:
   break;
 }
@@ -2325,6 +2403,7 @@ get_type_num (dw_die_ref type, bool in_struct, bool no_fwd_ref)
 
 case DW_TAG_structure_type:
 case DW_TAG_class_type:
+case DW_TAG_union_type:
   num = get_type_num_struct (type, in_struct, &is_fwd_ref);
   break;
 
diff --git a/gcc/dwarf2codeview.h b/gcc/dwar
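
The leaf_len bookkeeping at the end of write_lf_union exists only to work out
how many padding bytes are needed to reach a 4-byte boundary.  A minimal
sketch of that arithmetic follows.  The function names are made up, and it
assumes that write_cv_padding treats a request for 4 bytes as "no padding"
and otherwise emits LF_PAD3, LF_PAD2, LF_PAD1 (0xf3, 0xf2, 0xf1) counting
down, which is not shown in this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of an LF_UNION leaf before padding: 12 fixed bytes (size, kind,
   num_members, properties, field_list), the encoded length integer, and
   the NUL-terminated name.  */
size_t
union_leaf_len (size_t int_len, const char *name)
{
  return 12 + int_len + strlen (name) + 1;
}

/* Write the padding bytes needed after a leaf of LEAF_LEN bytes into OUT
   (room for 3 bytes).  Returns the number of bytes written.  */
size_t
cv_pad_bytes (size_t leaf_len, uint8_t *out)
{
  size_t padding = 4 - (leaf_len % 4);
  size_t n = 0;

  /* Already aligned: the expression above yields 4, meaning none.  */
  if (padding == 4)
    return 0;

  /* Each pad byte is 0xf0 plus the number of pad bytes remaining.  */
  while (padding > 0)
    {
      out[n++] = (uint8_t) (0xf0 | padding);
      padding--;
    }

  return n;
}
```

For example, a union named "foo" whose length fits in two bytes has
leaf_len = 12 + 2 + 4 = 18, so two padding bytes (f2 f1) follow the name.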

[PATCH 09/11] Handle arrays for CodeView

2024-06-17 Thread Mark Harmstone
Translates DW_TAG_array_type DIEs into LF_ARRAY symbols.

gcc/
* dwarf2codeview.cc
(struct codeview_custom_type): Add lf_array to union.
(write_lf_array): New function.
(write_custom_types): Call write_lf_array.
(get_type_num_array_type): New function.
(get_type_num): Handle DW_TAG_array_type DIEs.
* dwarf2codeview.h (LF_ARRAY): Define.
---
 gcc/dwarf2codeview.cc | 179 ++
 gcc/dwarf2codeview.h  |   1 +
 2 files changed, 180 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9e3b64522b2..3f1ce5577fc 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -250,6 +250,12 @@ struct codeview_custom_type
   codeview_integer length;
   char *name;
 } lf_structure;
+struct
+{
+  uint32_t element_type;
+  uint32_t index_type;
+  codeview_integer length_in_bytes;
+} lf_array;
   };
 };
 
@@ -1520,6 +1526,53 @@ write_lf_union (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_ARRAY type.  */
+
+static void
+write_lf_array (codeview_custom_type *t)
+{
+  size_t leaf_len;
+
+  /* This is lf_array in binutils and lfArray in Microsoft's cvinfo.h:
+
+struct lf_array
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t element_type;
+  uint32_t index_type;
+  uint16_t length_in_bytes;
+  char name[];
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_array.element_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_array.index_type);
+  putc ('\n', asm_out_file);
+
+  leaf_len = 13 + write_cv_integer (&t->lf_array.length_in_bytes);
+
+  ASM_OUTPUT_ASCII (asm_out_file, "", 1);
+
+  write_cv_padding (4 - (leaf_len % 4));
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1562,6 +1615,10 @@ write_custom_types (void)
case LF_UNION:
  write_lf_union (custom_types);
  break;
+
+   case LF_ARRAY:
+ write_lf_array (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2346,6 +2403,124 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
   return ct->num;
 }
 
+/* Process a DW_TAG_array_type DIE, adding an LF_ARRAY type and returning its
+   number.  */
+
+static uint32_t
+get_type_num_array_type (dw_die_ref type, bool in_struct)
+{
+  dw_die_ref base_type, t, first_child, c, *dimension_arr;
+  uint64_t size = 0;
+  unsigned int dimensions, i;
+  uint32_t element_type;
+
+  base_type = get_AT_ref (type, DW_AT_type);
+  if (!base_type)
+return 0;
+
+  /* We need to know the size of our base type.  Loop through until we find
+ it.  */
+  t = base_type;
+  while (t && size == 0)
+{
+  switch (dw_get_die_tag (t))
+   {
+   case DW_TAG_const_type:
+   case DW_TAG_volatile_type:
+   case DW_TAG_typedef:
+   case DW_TAG_enumeration_type:
+ t = get_AT_ref (t, DW_AT_type);
+ break;
+
+   case DW_TAG_base_type:
+   case DW_TAG_structure_type:
+   case DW_TAG_class_type:
+   case DW_TAG_union_type:
+   case DW_TAG_pointer_type:
+ size = get_AT_unsigned (t, DW_AT_byte_size);
+ break;
+
+   default:
+ return 0;
+   }
+}
+
+  if (size == 0)
+return 0;
+
+  first_child = dw_get_die_child (type);
+  if (!first_child)
+return 0;
+
+  element_type = get_type_num (base_type, in_struct, false);
+  if (element_type == 0)
+return 0;
+
+  /* Create an array of our DW_TAG_subrange_type children, in reverse order.
+ We have to do this because unlike DWARF CodeView doesn't have
+ multidimensional arrays, so instead we do arrays of arrays.  */
+
+  dimensions = 0;
+  c = first_child;
+  do
+{
+  c = dw_get_die_sib (c);
+  if (dw_get_die_tag (c) != DW_TAG_subrange_type)
+   continue;
+
+  dimensions++;
+}
+  while (c != first_child);
+
+  if (dimensions == 0)
+return 0;
+
+  dimension_arr = (dw_die_ref *) xmalloc (sizeof (dw_die_ref) * dimensions);
+
+  c = first_child;
+  i = 0;
+  do
+{
+  c = dw_get_die_sib (c);
+  if (dw_get_die_tag (c) != DW_TAG_subrange_type)
+   continue;
+
+  dimension_arr[dimensions - i - 1] = c;
+  i++;
+}
+  while (c != first_child);
+
+  /* Record an LF_ARRAY entry for each array dimensi
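
The reason for collecting the DW_TAG_subrange_type children in reverse order
can be seen from the sizes involved: the innermost array has to be created
first, because each outer LF_ARRAY refers to it as its element type.  The
rest of the function is cut off above, so the following is only a sketch of
how the length_in_bytes values would plausibly accumulate; the function name
and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the element size and the element COUNTS of each dimension in
   DWARF (outermost-first) order, fill LENGTHS with the byte length of
   each nested array, innermost first.  Returns the number of entries.  */
size_t
nested_array_lengths (uint64_t elem_size, const uint64_t *counts,
                      size_t dimensions, uint64_t *lengths)
{
  uint64_t len = elem_size;

  for (size_t i = 0; i < dimensions; i++)
    {
      /* Walk the dimensions back to front, as dimension_arr does.  */
      len *= counts[dimensions - i - 1];
      lengths[i] = len;
    }

  return dimensions;
}
```

For int a[2][3] with 4-byte ints, the inner LF_ARRAY (int[3]) would be 12
bytes long and the outer one, whose element type is that inner array, 24.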
