Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended

2025-01-22 Thread Xi Ruoyao
On Wed, 2025-01-22 at 10:53 +0800, Xi Ruoyao wrote:
> On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote:
> > 
> > 在 2025/1/22 上午8:49, Xi Ruoyao 写道:
> > > The second source register of this insn cannot be the same as the
> > > destination register.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * config/loongarch/loongarch.md
> > >   (_alsl_reversesi_extended): Add '&' to the destination
> > >   register constraint and append '0' to the first source register
> > >   constraint to indicate the destination register cannot be same
> > >   as the second source register, and change the split condition to
> > >   reload_completed so that the insn will be split only after RA in
> > >   order to obtain allocated registers that satisfy the above
> > >   constraints.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
> > >   test.
> > > ---
> > >   gcc/config/loongarch/loongarch.md |  6 +++---
> > >   .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++
> > >   2 files changed, 24 insertions(+), 3 deletions(-)
> > >   create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
> > > 
> > > diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> > > index 223e2b9f37f..1392325038c 100644
> > > --- a/gcc/config/loongarch/loongarch.md
> > > +++ b/gcc/config/loongarch/loongarch.md
> > > @@ -3160,13 +3160,13 @@ (define_insn_and_split "_shift_reverse"
> > >   ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
> > >   ;; our own.
> > >   (define_insn_and_split "_alsl_reversesi_extended"
> > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > +  [(set (match_operand:DI 0 "register_operand" "=&r")
> > >   (sign_extend:DI
> > >     (plus:SI
> > >       (subreg:SI
> > >     (any_bitwise:DI
> > >   (ashift:DI
> > > -   (match_operand:DI 1 "register_operand" "r")
> > > +   (match_operand:DI 1 "register_operand" "r0")
> > >     (match_operand:SI 2 "const_immalsl_operand" ""))
> > >   (match_operand:DI 3 "const_int_operand" "i"))
> > >     0)
> > > @@ -3175,7 +3175,7 @@ (define_insn_and_split "_alsl_reversesi_extended"
> > >  && loongarch_reassoc_shift_bitwise (, operands[2], 
> > > operands[3],
> > >      SImode)"
> > >     "#"
> > > -  "&& true"
> > > +  "&& reload_completed"
> > 
> > I have no problem with this patch.
> > 
> > But, I have always been confused about the use of reload_completed.
> > 
> > I can understand that it needs to be true here, but I don't quite 
> > understand the following:
> > 
> > ```
> > 
> > (define_insn_and_split "*zero_extendsidi2_internal"
> >    [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
> >  (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" 
> > "r,m,ZC,k")))]
> >    "TARGET_64BIT"
> >    "@
> >     bstrpick.d\t%0,%1,31,0
> >     ld.wu\t%0,%1
> >     #
> >     ldx.wu\t%0,%1"
> >    "&& reload_completed
> 
> I don't really understand it either... It is here since day 1 LoongArch
> support was added and I've never had enough courage to hack this part.

I pushed the 1st patch as an emergency wrong-code fix now.  The 2nd
patch can wait until we figure out the best way to support the fusion.
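
For illustration, the reassociation this pattern depends on can be sketched in plain C (a hedged sketch of the idea only, not GCC code; the instruction names in the comments come from the comment in loongarch.md):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the slli.d + and + add.w => and + alsl.w reassociation:
   ((x << s) & m) + b  ==  ((x & (m >> s)) << s) + b, because the low
   s bits of (x << s) are already zero, so the low s bits of m never
   matter.  The rewritten form exposes a shift-and-add pair that a
   single alsl.w can perform.  */
static int32_t shift_then_mask (int64_t x, int64_t m, int64_t b)
{
  return (int32_t) (((x << 3) & m) + b);        /* slli.d + and + add.w */
}

static int32_t mask_then_alsl (int64_t x, int64_t m, int64_t b)
{
  return (int32_t) (((x & (m >> 3)) << 3) + b); /* and + alsl.w */
}
```

The constraint change presumably matters because the split materializes the AND into the destination register before the final shift-and-add reads the addend; if the destination were allocated to the same register as the addend, the addend would be clobbered first, hence the "=&r"/"0" pairing and splitting only after reload.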

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] c++/modules: Fix linkage checks for exported using-decls

2025-01-22 Thread Richard Earnshaw (lists)

On 21/01/2025 21:58, Jason Merrill wrote:

On 1/15/25 7:36 PM, yxj-github-437 wrote:

On Fri, Jan 03, 2025 at 05:18:55PM +, xxx wrote:

From: yxj-github-437 <2457369...@qq.com>

This patch attempts to fix an error when building module std.  The
reason for the error is that __builtin_va_list (aka struct __va_list)
has internal linkage, so attempt to handle this builtin type by
identifying whether DECL_SOURCE_LOCATION (entity) is BUILTINS_LOCATION.



Hi, thanks for the patch!  I suspect this may not be sufficient to
completely avoid issues with the __gnuc_va_list type; in particular, if
it's internal linkage that may prevent it from being referred to in
other ways by inline functions in named modules (due to P1815).

Maybe a better approach would be to instead mark this builtin type as
TREE_PUBLIC (presumably in aarch64_build_builtin_va_list)?


Thanks, I changed my patch to mark it TREE_PUBLIC.


Looks good to me if the ARM maintainers don't object.


Looks OK to me.

R.



This patch is small enough not to worry about copyright, but
"yxj-github-437 <2457369...@qq.com>" seems like a placeholder name; what
name would you like the commit to use?



-- >8 --

This patch attempts to fix an error when building module std.  The
reason for the error is that __builtin_va_list (aka struct __va_list)
has internal linkage, so mark this builtin type as TREE_PUBLIC to give
struct __va_list external linkage.

/x/gcc-15.0.0/usr/bin/aarch64-linux-android-g++ -fmodules -std=c++23 -fPIC -O2
-fsearch-include-path bits/std.cc -c
/x/gcc-15.0.0/usr/lib/gcc/aarch64-linux-android/15.0.0/include/c++/bits/std.cc:3642:14:
error: exporting ‘typedef __gnuc_va_list va_list’ that does not have
external linkage

  3642 |   using std::va_list;
   |  ^~~
: note: ‘struct __va_list’ declared here with internal linkage

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): Mark
	__builtin_va_list as TREE_PUBLIC.
	* config/arm/arm.cc (arm_build_builtin_va_list): Mark
	__builtin_va_list as TREE_PUBLIC.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/builtin-8.C: New test.
---
  gcc/config/aarch64/aarch64.cc    | 1 +
  gcc/config/arm/arm.cc    | 1 +
  gcc/testsuite/g++.dg/modules/builtin-8.C | 9 +
  3 files changed, 11 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/builtin-8.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc

index ad31e9d255c..e022526e573 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -21566,6 +21566,7 @@ aarch64_build_builtin_va_list (void)
   get_identifier ("__va_list"),
   va_list_type);
    DECL_ARTIFICIAL (va_list_name) = 1;
+  TREE_PUBLIC (va_list_name) = 1;
    TYPE_NAME (va_list_type) = va_list_name;
    TYPE_STUB_DECL (va_list_type) = va_list_name;
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 1e0791dc8c2..86838ebde5f 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -2906,6 +2906,7 @@ arm_build_builtin_va_list (void)
   get_identifier ("__va_list"),
   va_list_type);
    DECL_ARTIFICIAL (va_list_name) = 1;
+  TREE_PUBLIC (va_list_name) = 1;
    TYPE_NAME (va_list_type) = va_list_name;
    TYPE_STUB_DECL (va_list_type) = va_list_name;
    /* Create the __ap field.  */
diff --git a/gcc/testsuite/g++.dg/modules/builtin-8.C b/gcc/testsuite/g++.dg/modules/builtin-8.C

new file mode 100644
index 000..ff91104e4a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/builtin-8.C
@@ -0,0 +1,9 @@
+// { dg-additional-options -fmodules-ts }
+module;
+#include 
+export module builtins;
+// { dg-module-cmi builtins }
+
+export {
+  using ::va_list;
+}






Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Michael Matz
Hello,

On Tue, 21 Jan 2025, Martin Uecker wrote:

> > > Coudn't you use the rule that .len refers to the closest enclosing 
> > > structure
> > > even without __self__ ?  This would then also disambiguate between 
> > > designators
> > > and other uses.
> > 
> > Right now, an expression cannot start with '.', which provides the 
> > disambiguation between designators and expressions as initializers. 
> 
> You could disambiguate directly after parsing the identifier, which
> does not seem overly problematic.

Which way?  When you allow ". identifier" as primary expression, then in

  struct S s = { .x = 42 };

the initializer can be parsed as designated initializer (with error 
when 'x' is not a member of S) or as assignment expression like in

  struct T t = { foo = 42 };

You need to decide which is which after seeing the ".".  I'm guessing what 
you mean is that on seeing ".ident" as the first two tokens inside an
initializer-list you go the designator route, and not the
initializer/assignment-expression route, even though the latter can now 
also start with ".ident".  But then, what about:

  struct U u = { .y };

It's certainly not a designation anymore, but you only know after not 
seeing the '='.  So here it's actually an assignment-expression.  And as 
you allowed ".ident" as primary expression this might rightfully refer to 
some in-scope 'y' member of some outer struct (or give an error).

Note further that you may have '{ .y[1][3].z }', which is still not a 
designation, but an expression under your proposal, whereas
'{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
now need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.
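
To make the shapes concrete, here is a small C sketch; only the first initializer is valid C today, and the proposed ".ident"-as-expression forms appear solely in comments, since they are exactly what is under discussion:

```c
#include <assert.h>

/* Valid C today: inside an initializer, a leading '.' can only start a
   designation, so the parser can commit as soon as it sees the '.'.  */
struct S { int x; };
struct S s = { .x = 42 };

/* Under the ".ident as primary expression" proposal, each of these would
   need unbounded look-ahead to classify (not valid C, shown as comments):

     struct U u = { .y };              expression - known only once no
                                      '=' follows
     struct V v = { .y[1][3].z };      still an expression
     struct W w = { .y[1][3].z = 1 };  a designation after all          */
```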


Ciao,
Michael.

[PATCH] c++/modules: Fix exporting temploid friends in header units [PR118582]

2025-01-22 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

When we started streaming the bit to handle merging of imported temploid
friends in r15-2807, I unthinkingly only streamed it in the
'!state->is_header ()' case.

This patch reworks the streaming logic to ensure that this data is
always streamed, including for unique entities (in case that ever comes
up somehow).  This does make the streaming slightly less efficient, as
functions and types will need an extra byte, but this doesn't appear to
make a huge difference to the size of the resulting module; the 'std'
module on my machine grows by 0.2% from 30671136 to 30730144 bytes.

PR c++/118582

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Always stream
imported_temploid_friends information.
(trees_in::decl_value): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118582_a.H: New test.
* g++.dg/modules/pr118582_b.H: New test.
* g++.dg/modules/pr118582_c.H: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc  | 47 +++
 gcc/testsuite/g++.dg/modules/pr118582_a.H | 16 
 gcc/testsuite/g++.dg/modules/pr118582_b.H |  6 +++
 gcc/testsuite/g++.dg/modules/pr118582_c.H |  5 +++
 4 files changed, 49 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_b.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_c.H

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 813c1436141..17215594fd3 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -2791,7 +2791,7 @@ static keyed_map_t *keyed_table;
 
 /* Instantiations of temploid friends imported from another module
need to be attached to the same module as the temploid.  This maps
-   these decls to the temploid they are instantiated them, as there is
+   these decls to the temploid they are instantiated from, as there is
no other easy way to get this information.  */
 static GTY((cache)) decl_tree_cache_map *imported_temploid_friends;
 
@@ -7961,7 +7961,6 @@ trees_out::decl_value (tree decl, depset *dep)
 }
 
   merge_kind mk = get_merge_kind (decl, dep);
-  bool is_imported_temploid_friend = imported_temploid_friends->get (decl);
 
   if (CHECKING_P)
 {
@@ -7998,10 +7997,6 @@ trees_out::decl_value (tree decl, depset *dep)
is_attached = true;
 
  bits.b (is_attached);
-
- /* Also tell the importer whether this is an imported temploid
-friend, which has implications for merging.  */
- bits.b (is_imported_temploid_friend);
}
  bits.b (dep && dep->has_defn ());
}
@@ -8087,6 +8082,16 @@ trees_out::decl_value (tree decl, depset *dep)
   tree container = decl_container (decl);
   unsigned tpl_levels = 0;
 
+  /* Also tell the importer whether this is a temploid friend attached
+ to a different module (which has implications for merging), so that
+ importers can reconstruct this information on stream-in.  */
+  if (TREE_CODE (inner) == FUNCTION_DECL || TREE_CODE (inner) == TYPE_DECL)
+{
+  tree* temploid_friend_slot = imported_temploid_friends->get (decl);
+  gcc_checking_assert (!temploid_friend_slot || *temploid_friend_slot);
+  tree_node (temploid_friend_slot ? *temploid_friend_slot : NULL_TREE);
+}
+
   {
 auto wmk = make_temp_override (dep_hash->writing_merge_key, true);
 if (decl != inner)
@@ -8182,14 +8187,6 @@ trees_out::decl_value (tree decl, depset *dep)
}
 }
 
-  if (is_imported_temploid_friend)
-{
-  /* Write imported temploid friends so that importers can reconstruct
-this information on stream-in.  */
-  tree* slot = imported_temploid_friends->get (decl);
-  tree_node (*slot);
-}
-
   bool is_typedef = false;
   if (!type && TREE_CODE (inner) == TYPE_DECL)
 {
@@ -8266,7 +8263,6 @@ trees_in::decl_value ()
 {
   int tag = 0;
   bool is_attached = false;
-  bool is_imported_temploid_friend = false;
   bool has_defn = false;
   unsigned mk_u = u ();
   if (mk_u >= MK_hwm || !merge_kind_name[mk_u])
@@ -8287,10 +8283,7 @@ trees_in::decl_value ()
{
  bits_in bits = stream_bits ();
  if (!(mk & MK_template_mask) && !state->is_header ())
-   {
- is_attached = bits.b ();
- is_imported_temploid_friend = bits.b ();
-   }
+   is_attached = bits.b ();
 
  has_defn = bits.b ();
}
@@ -8385,6 +8378,12 @@ trees_in::decl_value ()
   tree container = decl_container ();
   unsigned tpl_levels = 0;
 
+  /* If this is an imported temploid friend, get the owning decl its
+ attachment is determined by (or NULL_TREE otherwise).  */
+  tree temploid_friend = NULL_TREE;
+  if (TREE_CODE (inner) == FUNCTION_DECL || TREE_CODE (inner) == TYPE_DECL)
+temploid_friend = tree_node ();
+
   /* Figure out i

[PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]

2025-01-22 Thread Xi Ruoyao
The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign3): Use
force_lowpart_subreg.
---

Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?

 gcc/config/loongarch/loongarch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 0325994ebd6..04a9a79d548 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1343,8 +1343,8 @@ (define_expand "@xorsign3"
   machine_mode lsx_mode
 = mode == SFmode ? V4SFmode : V2DFmode;
   rtx tmp = gen_reg_rtx (lsx_mode);
-  rtx op1 = lowpart_subreg (lsx_mode, operands[1], mode);
-  rtx op2 = lowpart_subreg (lsx_mode, operands[2], mode);
+  rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], mode);
+  rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], mode);
   emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
   emit_move_insn (operands[0],
   lowpart_subreg (mode, tmp, lsx_mode));
-- 
2.48.1



Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Kyrylo Tkachov


> On 22 Jan 2025, at 13:53, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi Richard,
>> 
>>> On 22 Jan 2025, at 13:21, Richard Sandiford  
>>> wrote:
>>> 
>>> GCC 15 is the first release to support FP8 intrinsics.
>>> The underlying instructions depend on the value of a new register,
>>> FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
>>> register rather than a global register.  So:
>>> 
>>> - The FP8 intrinsics take a final uint64_t argument that
>>> specifies what value FPMR should have.
>>> 
>>> - If an FP8 operation is split across multiple functions,
>>> it is likely that those functions would have a similar argument.
>>> 
>>> If the object code has the structure:
>>> 
>>>   for (...)
>>> fp8_kernel (..., fpmr_value);
>>> 
>>> then fp8_kernel would set FPMR to fpmr_value each time it is
>>> called, even though FPMR will already have that value for at
>>> least the second and subsequent calls (and possibly the first).
>>> 
>>> The working assumption for the ABI has been that writes to
>>> registers like FPMR can in general be more expensive than
>>> reads and so it would be better to use a conditional write like:
>>> 
>>>  mrs tmp, fpmr
>>>  cmp tmp, 
>>>  beq 1f
>>>  nsr fpmr, 
>> 
>> Typo “msr” here and in the comment in the code.
> 
> Oops, thanks, will fix.
> 
>> [...]
>>> @@ -1883,6 +1884,44 @@ (define_split
>>>  }
>>> )
>>> 
>>> +;; The preferred way of writing to the FPMR is to test whether it already
>>> +;; has the desired value and branch around the write if so.  This reduces
>>> +;; the number of redundant FPMR writes caused by ABI boundaries, such as 
>>> in:
>>> +;;
>>> +;;for (...)
>>> +;;  fp8_kernel (..., fpmr_value);
>>> +;;
>>> +;; Without this optimization, fp8_kernel would set FPMR to fpmr_value each
>>> +;; time that it is called.
>>> +;;
>>> +;; We do this as a split so that hardreg_pre can optimize the moves first.
>>> +(define_split
>>> +  [(set (reg:DI FPM_REGNUM)
>>> +(match_operand:DI 0 "aarch64_reg_or_zero"))]
>>> +  "TARGET_FP8 && !TARGET_CHEAP_FPMR_WRITE && can_create_pseudo_p ()"
>>> +  [(const_int 0)]
>>> +  {
>>> +auto label = gen_label_rtx ();
>>> +rtx current = copy_to_reg (gen_rtx_REG (DImode, FPM_REGNUM));
>>> +rtx cond = gen_rtx_EQ (VOIDmode, current, operands[0]);
>>> +emit_jump_insn (gen_cbranchdi4 (cond, current, operands[0], label));
>> 
>> Do you think it’s worth marking this jump as likely?
>> In some other expand code in the backend where we emit jumps we sometimes 
>> use aarch64_emit_unlikely_jump.
> 
> Ah, yeah, I should have said that I'd wondered about that.  But in the
> end it didn't seem appropriate.  Given that hardreg_pre should remove
> local instances of redundancy, we don't really have much information
> about whether the branch is likely or unlikely.  I think instead the
> hope/expectation is that the branch has a predictable pattern.

Ok, thanks for clarifying.
Kyrill

> 
> Thanks,
> Richard



[pushed: r15-7126] jit: fix startup on aarch64

2025-01-22 Thread David Malcolm
libgccjit fails on startup on aarch64 (and probably other archs).

The issues are that

(a) within jit_langhook_init the call to
targetm.init_builtins can use types that aren't representable
via jit::recording::type, and

(b) targetm.init_builtins can call lang_hooks.decls.pushdecl, which
although a no-op for libgccjit has a gcc_unreachable.

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7126-g27470f9a818538.

gcc/jit/ChangeLog:
* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
bail out if the inner call to tree_type_to_jit_type fails.
Don't abort on unknown types.
(jit_langhook_pushdecl): Replace gcc_unreachable with return of
NULL_TREE.

Signed-off-by: David Malcolm 
---
 gcc/jit/dummy-frontend.cc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
index 574851696311..1d0080d6fecb 100644
--- a/gcc/jit/dummy-frontend.cc
+++ b/gcc/jit/dummy-frontend.cc
@@ -1278,6 +1278,8 @@ recording::type* tree_type_to_jit_type (tree type)
   {
 tree inner_type = TREE_TYPE (type);
 recording::type* element_type = tree_type_to_jit_type (inner_type);
+if (!element_type)
+  return nullptr;
 return element_type->get_pointer ();
   }
   else
@@ -1299,10 +1301,6 @@ recording::type* tree_type_to_jit_type (tree type)
}
   }
 }
-
-fprintf (stderr, "Unknown type:\n");
-debug_tree (type);
-abort ();
   }
 
   return NULL;
@@ -1372,7 +1370,7 @@ jit_langhook_global_bindings_p (void)
 static tree
 jit_langhook_pushdecl (tree decl ATTRIBUTE_UNUSED)
 {
-  gcc_unreachable ();
+  return NULL_TREE;
 }
 
 static tree
-- 
2.26.3



Please Ignore -- testing email

2025-01-22 Thread Matthew Malcomson
Apologies for the noise.


Re: [PATCH v2 01/12] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2025-01-22 Thread Tejas Belagod

On 1/21/25 10:16 PM, Jakub Jelinek wrote:

On Fri, Oct 18, 2024 at 11:52:22AM +0530, Tejas Belagod wrote:

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

gcc/ChangeLog
* omp-low.c (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.


I think I've acked this one earlier already.
It is still ok.



Thanks Jakub for the reviews. Just to clarify  - this series is all now 
for Stage 1, I'm guessing?


Thanks,
Tejas.


---
  gcc/omp-low.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index da2051b0279..6b3853ed528 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -466,7 +466,8 @@ static bool
  use_pointer_for_field (tree decl, omp_context *shared_ctx)
  {
if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
-  || TYPE_ATOMIC (TREE_TYPE (decl)))
+  || TYPE_ATOMIC (TREE_TYPE (decl))
+  || POLY_INT_CST_P (DECL_SIZE (decl)))
  return true;
  
/* We can only use copy-in/copy-out semantics for shared variables

--
2.25.1


Jakub





Re: [pushed: r15-7126] jit: fix startup on aarch64

2025-01-22 Thread Antoni Boucher

Hi David.
I had a patch for this here: https://github.com/antoyo/libgccjit/pull/20

The fact that you removed the debug_tree (and abort) will make it harder
to figure out which missing types still need to be handled.
It will also probably make it hard for people to understand why they
get a type error when calling a builtin function with an unsupported type.
And as you can see in my PR, at least a few types were missing that you
didn't add in your patch.


Do you have a better solution for this?

I just thought about a potential solution: perhaps if we get an
unsupported type, we could add the builtin to an array instead of the
hashmap; this way, we could tell the user that the builtin is not
currently supported.
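
A minimal sketch of that idea (all names here are hypothetical illustrations of the data structure, not existing libgccjit internals):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: when a builtin's type cannot be represented,
   remember its name in a side table instead of registering it, so a
   later lookup can produce a "builtin not supported yet" diagnostic
   rather than a confusing type error.  None of these names exist in
   libgccjit; this only illustrates the array-instead-of-hashmap idea. */
#define MAX_UNSUPPORTED 128

static const char *unsupported_builtins[MAX_UNSUPPORTED];
static int num_unsupported;

static void record_unsupported_builtin (const char *name)
{
  if (num_unsupported < MAX_UNSUPPORTED)
    unsupported_builtins[num_unsupported++] = name;
}

static int builtin_is_unsupported (const char *name)
{
  for (int i = 0; i < num_unsupported; i++)
    if (strcmp (unsupported_builtins[i], name) == 0)
      return 1;
  return 0;
}
```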

What are your thoughts on this?

Thanks.

Le 2025-01-22 à 08 h 38, David Malcolm a écrit :

libgccjit fails on startup on aarch64 (and probably other archs).

The issues are that

(a) within jit_langhook_init the call to
targetm.init_builtins can use types that aren't representable
via jit::recording::type, and

(b) targetm.init_builtins can call lang_hooks.decls.pushdecl, which
although a no-op for libgccjit has a gcc_unreachable.

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7126-g27470f9a818538.

gcc/jit/ChangeLog:
* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
bail out if the inner call to tree_type_to_jit_type fails.
Don't abort on unknown types.
(jit_langhook_pushdecl): Replace gcc_unreachable with return of
NULL_TREE.

Signed-off-by: David Malcolm 
---
  gcc/jit/dummy-frontend.cc | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
index 574851696311..1d0080d6fecb 100644
--- a/gcc/jit/dummy-frontend.cc
+++ b/gcc/jit/dummy-frontend.cc
@@ -1278,6 +1278,8 @@ recording::type* tree_type_to_jit_type (tree type)
{
  tree inner_type = TREE_TYPE (type);
  recording::type* element_type = tree_type_to_jit_type (inner_type);
+if (!element_type)
+  return nullptr;
  return element_type->get_pointer ();
}
else
@@ -1299,10 +1301,6 @@ recording::type* tree_type_to_jit_type (tree type)
}
}
  }
-
-fprintf (stderr, "Unknown type:\n");
-debug_tree (type);
-abort ();
}
  
return NULL;

@@ -1372,7 +1370,7 @@ jit_langhook_global_bindings_p (void)
  static tree
  jit_langhook_pushdecl (tree decl ATTRIBUTE_UNUSED)
  {
-  gcc_unreachable ();
+  return NULL_TREE;
  }
  
  static tree




Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker


Hello Michael,

Am Mittwoch, dem 22.01.2025 um 16:54 +0100 schrieb Michael Matz:
> On Wed, 22 Jan 2025, Martin Uecker wrote:
> 
> > > > So you do not need to look further.  But maybe I am missing something 
> > > > else.
> > > 
> > > Like ...
> > > 
> > > > > Note further that you may have '{ .y[1][3].z }', which is still not a 
> > > > > designation, but an expression under your proposal, whereas
> > > > > '{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
> > > > > now need arbitrary look-ahead to disambiguate the two.  A Very Bad 
> > > > > Idea.
> > > 
> > > ... this?
> > 
> > In .y[1][3].z  after .y you can decide whether y is a member of the
> > struct being initialized.  If it is, it is a designator and if not
> > it must be an expression.
> 
> If y is not a member it must be an expression, true.  But if it's a member 
> you don't know, it may be a designation or an expression.

In an initializer I know all the members.

Martin 




Re: [PATCH] c++, v2: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]

2025-01-22 Thread Jason Merrill

On 9/6/24 8:02 AM, Jakub Jelinek wrote:

Hi!

On Wed, Aug 14, 2024 at 06:11:35PM +0200, Jakub Jelinek wrote:

Here is the I believe ABI compatible version, which uses the separate
guard variables, so different structured binding variables can be
initialized in different threads, but the thread that did the artificial
base initialization will keep temporaries live at least until the last
guard variable is released (i.e. when even that variable has been
initialized).

Bootstrapped/regtested on x86_64-linux and i686-linux on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660354.html
patch, ok for trunk?

As for namespace scope structured bindings and this DR, all of
set_up_extended_ref_temp, cp_finish_decl -> expand_static_init and
cp_finish_decl -> cp_finish_decomp -> cp_finish_decl -> expand_static_init
in that case just push some decls into the static_aggregates or
tls_aggregates chains.
So, we can end up e.g. with the most important decl for an extended ref
temporary (which initializes some temporaries), then perhaps some more
of those, then DECL_DECOMPOSITION_P base, then n times optionally some further
extended refs and DECL_DECOMPOSITION_P non-base and I think we need
to one_static_initialization_or_destruction all of them together, by
omitting CLEANUP_POINT_EXPR from the very first one (or all until the
DECL_DECOMPOSITION_P base?), say through temporarily clearing
stmts_are_full_exprs_p and then wrapping whatever
one_static_initialization_or_destruction produces for all of those into
a single CLEANUP_POINT_EXPR argument.
Perhaps remember static_aggregates or tls_aggregates early before any
check_initializer etc. calls and then after cp_finish_decomp cut that
TREE_LIST nodes and pass that as a separate TREE_VALUE in the list.
Though, not sure what to do about modules.cc uses of these, it needs
to save/restore that stuff somehow too.


Now that the CWG 2867 patch for automatic structured bindings is in,
here is an updated version of the block scope static structured bindings
CWG 2867 patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


No patch for the namespace scope structured bindings yet, will work on that
soon.

2024-09-05  Jakub Jelinek  

PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decl): If need_decomp_init, for function scope structure
binding bases, temporarily clear stmts_are_full_exprs_p before
calling expand_static_init, after it call cp_finish_decomp and wrap
code emitted by both into maybe_cleanup_point_expr_void and ensure
cp_finish_decomp isn't called again.

* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.

--- gcc/cp/decl.cc.jj   2024-09-04 19:55:59.046491602 +0200
+++ gcc/cp/decl.cc  2024-09-04 20:04:35.695952219 +0200
@@ -9140,7 +9140,24 @@ cp_finish_decl (tree decl, tree init, bo
 initializer.  It is not legal to redeclare a static data
 member, so this issue does not arise in that case.  */
else if (var_definition_p && TREE_STATIC (decl))
-   expand_static_init (decl, init);
+   {
+ if (decomp && DECL_FUNCTION_SCOPE_P (decl))
+   {
+ tree sl = push_stmt_list ();
+ auto saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p ();
+ current_stmt_tree ()->stmts_are_full_exprs_p = 0;
+ expand_static_init (decl, init);
+ current_stmt_tree ()->stmts_are_full_exprs_p
+   = saved_stmts_are_full_exprs_p;
+ cp_finish_decomp (decl, decomp);
+ decomp = NULL;
+ sl = pop_stmt_list (sl);
+ sl = maybe_cleanup_point_expr_void (sl);
+ add_stmt (sl);
+   }
+ else
+   expand_static_init (decl, init);
+   }
  }
  
/* If a CLEANUP_STMT was created to destroy a temporary bound to a

--- gcc/testsuite/g++.dg/DRs/dr2867-3.C.jj  2024-08-13 21:05:42.876446125 +0200
+++ gcc/testsuite/g++.dg/DRs/dr2867-3.C 2024-08-13 21:05:42.876446125 +0200
@@ -0,0 +1,159 @@
+// CWG2867 - Order of initialization for structured bindings.
+// { dg-do run { target c++11 } }
+// { dg-options "" }
+
+#define assert(X) do { if (!(X)) __builtin_abort(); } while (0)
+
+namespace std {
+  template struct tuple_size;
+  template struct tuple_element;
+}
+
+int a, c, d, i;
+
+struct A {
+  A () { assert (c == 3); ++c; }
+  ~A () { ++a; }
+  template  int &get () const { assert (c == 5 + I); ++c; return i; }
+};
+
+template <> struct std::tuple_size  { static const int value = 4; };
+template  struct std::tuple_element  { using type = int; };
+template <> struct std::tuple_size  { static const int value = 4; };
+template  struct std::tuple_element  { using type = int; };
+
+struct B {
+  B () { assert (c >= 1 && c <= 2); ++c; }
+  ~B () { assert (c >= 9 && c <= 10); ++c; }
+};
+
+struct C {
+  conste

Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Michael Matz
Hello,

On Wed, 22 Jan 2025, Martin Uecker wrote:

> > > In .y[1][3].z  after .y you can decide whether y is a member of the
> > > struct being initialized.  If it is, it is a designator and if not
> > > it must be an expression.
> > 
> > If y is not a member it must be an expression, true.  But if it's a member 
> > you don't know, it may be a designation or an expression.
> 
> In an initializer I know all the members.

My sentence was ambiguous :-)  Trying again: When it's a member, and you 
know it's a member, then you still don't know if it's going to be a 
designation or an expression.  It can be both.


Ciao,
Michael.


[PATCH] builtins: Store unspecified value to *exp for inf/nan [PR114877]

2025-01-22 Thread Jakub Jelinek
Hi!

The fold_builtin_frexp folding for NaN/Inf just returned the first argument
while still evaluating the second argument's side effects, rather than
storing anything to what the second argument points to.

The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so not storing there
anything can result in UB if the value isn't initialized and is read later.

glibc and newlib store there 0, musl apparently doesn't store anything.

The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
  arg = builtin_save_expr (arg);
  return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have a case COMPOUND_EXPR:
added that recurses on the second argument, but that feels like stage 1
material to me if we want to do that at all.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-22  Jakub Jelinek  

PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.

* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.

--- gcc/builtins.cc.jj  2025-01-22 09:15:41.700176403 +0100
+++ gcc/builtins.cc 2025-01-22 09:42:42.764927141 +0100
@@ -9574,14 +9574,16 @@ fold_builtin_frexp (location_t loc, tree
   switch (value->cl)
   {
   case rvc_zero:
+  case rvc_nan:
+  case rvc_inf:
/* For +-0, return (*exp = 0, +-0).  */
+   /* For +-NaN or +-Inf, *exp is unspecified, but something should
+  be stored there so that it isn't read from uninitialized object.
+  As glibc and newlib store *exp = 0 for +-Inf/NaN, storing
+  0 here as well is easiest.  */
exp = integer_zero_node;
frac = arg0;
break;
-  case rvc_nan:
-  case rvc_inf:
-   /* For +-NaN or +-Inf, *exp is unspecified, return arg0.  */
-   return omit_one_operand_loc (loc, rettype, arg0, arg1);
   case rvc_normal:
{
  /* Since the frexp function always expects base 2, and in
--- gcc/testsuite/gcc.dg/torture/builtin-frexp-1.c.jj   2020-01-12 
11:54:37.546396315 +0100
+++ gcc/testsuite/gcc.dg/torture/builtin-frexp-1.c  2025-01-22 
10:31:49.497851625 +0100
@@ -11,6 +11,7 @@
floating point formats need -funsafe-math-optimizations.  */
 /* { dg-require-effective-target inf } */
 /* { dg-options "-funsafe-math-optimizations" { target powerpc*-*-* } } */
+/* { dg-additional-options "-Wmaybe-uninitialized" } */
 
 extern void link_error(int);
 
@@ -52,22 +53,36 @@ extern void link_error(int);
 link_error(__LINE__); \
   } while (0)
 
+int __attribute__ ((__noipa__))
+bar (int x)
+{
+  (void) x;
+  return 42;
+} 
+
 /* Test that FUNCRES(frexp(NEG FUNCARG(ARGARG),&i)) is false.  Check
-   the sign as well.  Ensure side-effects are evaluated in i.  */
+   the sign as well.  Ensure side-effects are evaluated in the second
+   frexp argument.  */
 #define TESTIT_FREXP2(NEG,FUNCARG,ARGARG,FUNCRES) do { \
-  int i=5; \
+  int i, j = 5; \
   if (!__builtin_##FUNCRES##f(__builtin_frexpf(NEG 
__builtin_##FUNCARG##f(ARGARG),&i)) \
-  || CKSGN_F(__builtin_frexpf(NEG 
__builtin_##FUNCARG##f(ARGARG),(i++,&i)), NEG __builtin_##FUNCARG##f(ARGARG)) \
-  || CKEXP(i,6)) \
+  || CKSGN_F(__builtin_frexpf(NEG 
__builtin_##FUNCARG##f(ARGARG),(j++,&i)), NEG __builtin_##FUNCARG##f(ARGARG)) \
+  || CKEXP(j,6)) \
 link_error(__LINE__); \
+  if (CKEXP(bar(i),42)) \
+__builtin_abort(); \
   if (!__builtin_##FUNCRES(__builtin_frexp(NEG 
__builtin_##FUNCARG(ARGARG),&i)) \
-  || CKSGN(__builtin_frexp(NEG __builtin_##FUNCARG(ARGARG),(i++,&i)), NEG 
__builtin_##FUNCARG(ARGARG)) \
-  || CKEXP(i,7)) \
+  || CKSGN(__builtin_frexp(NEG __builtin_##FUNCARG(ARGARG),(j++,&i)), NEG 
__builtin_##FUNCARG(ARGARG)) \
+  ||

Re: [PATCH v2 01/12] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2025-01-22 Thread Jakub Jelinek
On Wed, Jan 22, 2025 at 04:19:37PM +0530, Tejas Belagod wrote:
> On 1/21/25 10:16 PM, Jakub Jelinek wrote:
> > On Fri, Oct 18, 2024 at 11:52:22AM +0530, Tejas Belagod wrote:
> > > Currently poly-int type structures are passed by value to OpenMP runtime
> > > functions for shared clauses etc.  This patch improves on this by passing
> > > around poly-int structures by address to avoid copy-overhead.
> > > 
> > > gcc/ChangeLog
> > >   * omp-low.c (use_pointer_for_field): Use pointer if the OMP data
> > >   structure's field type is a poly-int.
> > 
> > I think I've acked this one earlier already.
> > It is still ok.
> > 
> 
> Thanks Jakub for the reviews. Just to clarify  - this series is all now for
> Stage 1, I'm guessing?

Not necessarily, but likely.
That said, I think the use_pointer_for_field patch can be committed
separately now, doesn't have to wait for the rest.

The general idea behind the tests would be to test something that users
could be naturally using.  So for worksharing constructs, test something
parallelizable, with multiple threads doing some work (of course, for the
testcase's purposes it doesn't need to be some really meaningful work, just
something simple).  And when testing the various data sharing or mapping of
the variable length types, the tests should check that they are actually
handled correctly.

For something shared, see if multiple threads can read the shared variable
(and otherwise perhaps do some slightly different work rather than the same
in all threads); then, in a parallel, after a #pragma omp barrier, write it
in one of the threads; then after another #pragma omp barrier try to read it
again in all the threads and verify that each thread sees the value written
earlier.

For the privatization clauses, verify that they are really private: let each
thread write a different value to them concurrently and, say after a
barrier, verify that they read what they actually wrote.  Also test the
various extra properties of the privatization clauses: say for firstprivate,
test that all threads read the expected value from before the
parallelization; for lastprivate, that the value from the last iteration or
section has been propagated back to the original item; etc.

In any case the tests shouldn't have data races.  Some tests e.g. for tasks
would be nice too.
Perhaps, as the work for the different threads or different iterations of
say omp for or omp loop or omp distribute, you could use helper functions
that take some SVE vectors as arguments (and perhaps the iteration number
or thread number as another, so that the work is different in each) and/or
return them; and e.g. for the privatization, also check passing the address
of an SVE var to a helper function and reading the value in there.

Now, obviously somewhere in the gomp/libgomp testsuites we have tests that
test the corner cases like behavior of a parallel with a single thread or
worksharing constructs outside of an explicit parallel, but I think doing
the same for SVE isn't really needed, or at least it should be tested in
addition to tests that actually test something parallelized.

Jakub



[patch,avr,applied] Add tests for LRA's PR118591

2025-01-22 Thread Georg-Johann Lay

Added 2 tests for PR118591.

Johann

--

AVR: Add test cases for PR118591.

gcc/testsuite/
PR rtl-optimization/118591
* gcc.target/avr/torture/pr118591-1.c: New test.
* gcc.target/avr/torture/pr118591-2.c: New test.

diff --git a/gcc/testsuite/gcc.target/avr/torture/pr118591-1.c 
b/gcc/testsuite/gcc.target/avr/torture/pr118591-1.c

new file mode 100644
index 000..814f0410a7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr118591-1.c
@@ -0,0 +1,22 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=c99 -mlra" } */
+
+__attribute__((noipa))
+void func2 (long long a1, long long a2, long b)
+{
+  static unsigned char count = 0;
+  if (b != count++)
+__builtin_abort ();
+}
+
+int main (void)
+{
+  for (long b = 0; b < 5; ++b)
+{
+  __asm ("" ::: "r5", "r9", "r24", "r20", "r16", "r12", "r30");
+
+  func2 (0, 0, b);
+}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr118591-2.c 
b/gcc/testsuite/gcc.target/avr/torture/pr118591-2.c

new file mode 100644
index 000..83d36060088
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr118591-2.c
@@ -0,0 +1,21 @@
+/* Test case failed on avrtiny.  */
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99 -mlra" } */
+
+__attribute__((noipa))
+void func2 (long a, long b)
+{
+  static unsigned char count = 0;
+  if (b != count++)
+__builtin_abort ();
+}
+
+int main (void)
+{
+  for (long b = 0; b < 5; ++b)
+{
+  func2 (0, b);
+}
+
+  return 0;
+}


Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Kyrylo Tkachov
Hi Richard,

> On 22 Jan 2025, at 13:21, Richard Sandiford  wrote:
> 
> GCC 15 is the first release to support FP8 intrinsics.
> The underlying instructions depend on the value of a new register,
> FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
> register rather than a global register.  So:
> 
> - The FP8 intrinsics take a final uint64_t argument that
>  specifies what value FPMR should have.
> 
> - If an FP8 operation is split across multiple functions,
>  it is likely that those functions would have a similar argument.
> 
> If the object code has the structure:
> 
>for (...)
>  fp8_kernel (..., fpmr_value);
> 
> then fp8_kernel would set FPMR to fpmr_value each time it is
> called, even though FPMR will already have that value for at
> least the second and subsequent calls (and possibly the first).
> 
> The working assumption for the ABI has been that writes to
> registers like FPMR can in general be more expensive than
> reads and so it would be better to use a conditional write like:
> 
>   mrs tmp, fpmr
>   cmp tmp, 
>   beq 1f
>   nsr fpmr, 

Typo “msr” here and in the comment in the code.

> 1:
> 
> instead of writing the same value to FPMR repeatedly.
> 
> This patch implements that.  It also adds a tuning flag that suppresses
> the behaviour, both to make testing easier and to support any future
> cores that (for example) are able to rename FPMR.
> 
> Hopefully this really is the last part of the FP8 enablement.
> 
> Tested on aarch64-linux-gnu.  I'll push in about 24 hours
> if there are no comments before then.
> 
> Richard
> 
> 
> gcc/
> * config/aarch64/aarch64-tuning-flags.def
> (AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
> * config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
> * config/aarch64/aarch64.md: Split moves into FPMR into a test
> and branch around.
> (aarch64_write_fpmr): New pattern.
> 
> gcc/testsuite/
> * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
> cheap_fpmr_write by default.
> * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
> * gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
> * gcc.target/aarch64/acle/fpmr-2.c: Likewise.
> * gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
> * gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
> * gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
> * gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
> * gcc.target/aarch64/acle/fpmr-6.c: New test.
> ---
> gcc/config/aarch64/aarch64-tuning-flags.def   | 15 +++
> gcc/config/aarch64/aarch64.h  |  5 +++
> gcc/config/aarch64/aarch64.md | 39 +++
> .../sve2/acle/aarch64-sve2-acle-asm.exp   |  2 +-
> gcc/testsuite/gcc.target/aarch64/acle/fp8.c   |  2 +-
> .../gcc.target/aarch64/acle/fpmr-2.c  |  2 +-
> .../gcc.target/aarch64/acle/fpmr-6.c  | 36 +
> .../gcc.target/aarch64/simd/vcvt_fpm.c|  2 +-
> .../gcc.target/aarch64/simd/vdot2_fpm.c   |  2 +-
> .../gcc.target/aarch64/simd/vdot4_fpm.c   |  2 +-
> .../gcc.target/aarch64/simd/vmla_fpm.c|  2 +-
> .../sve2/acle/aarch64-sve2-acle-asm.exp   |  2 +-
> 12 files changed, 103 insertions(+), 8 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fpmr-6.c
> 
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 60967aac903..7a67d6197d9 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -48,6 +48,21 @@ AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", 
> FULLY_PIPELINED_FMA)
>rather than re-use an input predicate register.  */
> AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
> 
> +/* Whether writes to the FPMR are cheap enough that:
> +
> +   msr fpmr, 
> +
> +   is better than:
> +
> +   mrs tmp, fpmr
> +   cmp tmp, 
> +   beq 1f
> +   nsr fpmr, 
> + 1:
> +
> +   even when the branch is predictably taken.  */
> +AARCH64_EXTRA_TUNING_OPTION ("cheap_fpmr_write", CHEAP_FPMR_WRITE)
> +
> /* Baseline tuning settings suitable for all modern cores.  */
> #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND \
> | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 218868a5246..5cbf442130b 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -486,6 +486,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE 
> ATTRIBUTE_UNUSED
> /* fp8 instructions are enabled through +fp8.  */
> #define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
> 
> +/* See the comment above the tuning flag for details.  */
> +#define TARGET_CHEAP_FPMR_WRITE \
> +  (bool (aarch64_tune_params.extra_tuning_flags \
> + & AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE))
> +
> /* Combinatorial tests.  */
> 
> #define TARGET_SVE2_OR_SME2 \
> diff --git a/gcc/config/aarch64/aarch6

Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi Richard,
>
>> On 22 Jan 2025, at 13:21, Richard Sandiford  
>> wrote:
>> 
>> GCC 15 is the first release to support FP8 intrinsics.
>> The underlying instructions depend on the value of a new register,
>> FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
>> register rather than a global register.  So:
>> 
>> - The FP8 intrinsics take a final uint64_t argument that
>>  specifies what value FPMR should have.
>> 
>> - If an FP8 operation is split across multiple functions,
>>  it is likely that those functions would have a similar argument.
>> 
>> If the object code has the structure:
>> 
>>for (...)
>>  fp8_kernel (..., fpmr_value);
>> 
>> then fp8_kernel would set FPMR to fpmr_value each time it is
>> called, even though FPMR will already have that value for at
>> least the second and subsequent calls (and possibly the first).
>> 
>> The working assumption for the ABI has been that writes to
>> registers like FPMR can in general be more expensive than
>> reads and so it would be better to use a conditional write like:
>> 
>>   mrs tmp, fpmr
>>   cmp tmp, 
>>   beq 1f
>>   nsr fpmr, 
>
> Typo “msr” here and in the comment in the code.

Oops, thanks, will fix.

> [...]
>> @@ -1883,6 +1884,44 @@ (define_split
>>   }
>> )
>> 
>> +;; The preferred way of writing to the FPMR is to test whether it already
>> +;; has the desired value and branch around the write if so.  This reduces
>> +;; the number of redundant FPMR writes caused by ABI boundaries, such as in:
>> +;;
>> +;;for (...)
>> +;;  fp8_kernel (..., fpmr_value);
>> +;;
>> +;; Without this optimization, fp8_kernel would set FPMR to fpmr_value each
>> +;; time that it is called.
>> +;;
>> +;; We do this as a split so that hardreg_pre can optimize the moves first.
>> +(define_split
>> +  [(set (reg:DI FPM_REGNUM)
>> +(match_operand:DI 0 "aarch64_reg_or_zero"))]
>> +  "TARGET_FP8 && !TARGET_CHEAP_FPMR_WRITE && can_create_pseudo_p ()"
>> +  [(const_int 0)]
>> +  {
>> +auto label = gen_label_rtx ();
>> +rtx current = copy_to_reg (gen_rtx_REG (DImode, FPM_REGNUM));
>> +rtx cond = gen_rtx_EQ (VOIDmode, current, operands[0]);
>> +emit_jump_insn (gen_cbranchdi4 (cond, current, operands[0], label));
>
> Do you think it’s worth marking this jump as likely?
> In some other expand code in the backend where we emit jumps we sometimes use 
> aarch64_emit_unlikely_jump.

Ah, yeah, I should have said that I'd wondered about that.  But in the
end it didn't seem appropriate.  Given that hardreg_pre should remove
local instances of redundancy, we don't really have much information
about whether the branch is likely or unlikely.  I think instead the
hope/expectation is that the branch has a predictable pattern.

Thanks,
Richard


[PUSHED] s390: Fix arch15 machine string for binutils

2025-01-22 Thread Stefan Schulze Frielinghaus
gcc/ChangeLog:

* config/s390/s390.cc: Fix arch15 machine string which must not
be empty.
---
 gcc/config/s390/s390.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 313f968c87e..86a5f059b85 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -342,7 +342,7 @@ const struct s390_processor processor_table[] =
   { "z14","arch12", PROCESSOR_3906_Z14,&zEC12_cost,  12 },
   { "z15","arch13", PROCESSOR_8561_Z15,&zEC12_cost,  13 },
   { "z16","arch14", PROCESSOR_3931_Z16,&zEC12_cost,  14 },
-  { "arch15", "",   PROCESSOR_ARCH15,  &zEC12_cost,  15 },
+  { "arch15", "arch15", PROCESSOR_ARCH15,  &zEC12_cost,  15 },
   { "native", "",   PROCESSOR_NATIVE,  NULL, 0  }
 };
 
-- 
2.47.0



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Michael Matz
Hello,

On Wed, 22 Jan 2025, Martin Uecker wrote:

> > > So you do not need to look further.  But maybe I am missing something 
> > > else.
> > 
> > Like ...
> > 
> > > > Note further that you may have '{ .y[1][3].z }', which is still not a 
> > > > designation, but an expression under your proposal, whereas
> > > > '{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
> > > > now need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.
> > 
> > ... this?
> 
> In .y[1][3].z  after .y you can decide whether y is a member of the
> struct being initialized.  If it is, it is a designator and if not
> it must be an expression.

If y is not a member it must be an expression, true.  But if it's a member 
you don't know, it may be a designation or an expression.


Ciao,
Michael.


[PATCH 1/3] aarch64: Allow FPMR source values to be zero

2025-01-22 Thread Richard Sandiford
GCC 15 is going to be the first release to support FPMR.
The alternatives for moving values into FPMR were missing
a zero alternative, meaning that moves of zero would use an
unnecessary temporary register.

Tested on aarch64-linux-gnu.  I'll push in about 24 hours
if there are no comments before then.

Richard


gcc/
* config/aarch64/aarch64.md (*mov_aarch64)
(*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
to be zero.

gcc/testsuite/
* gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.
---
 gcc/config/aarch64/aarch64.md   | 50 ++---
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 47 +++
 2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 39655ea5e39..776c4c4ceee 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1531,7 +1531,7 @@ (define_insn "*mov_aarch64"
  [w, r Z  ; neon_from_gp, nosimd ] fmov\t%s0, %w1
  [w, w; neon_dup   , simd   ] dup\t%0, %1.[0]
  [w, w; neon_dup   , nosimd ] fmov\t%s0, %s1
- [Umv, r  ; mrs, *  ] msr\t%0, %x1
+ [Umv, rZ ; mrs, *  ] msr\t%0, %x1
  [r, Umv  ; mrs, *  ] mrs\t%x0, %1
   }
 )
@@ -1595,7 +1595,7 @@ (define_insn_and_split "*movsi_aarch64"
  [r  , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
  [w  , w  ; fmov , fp  , 4] fmov\t%s0, %s1
  [w  , Ds ; neon_move, simd, 4] << 
aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
- [Umv, r  ; mrs  , *   , 4] msr\t%0, %x1
+ [Umv, rZ ; mrs  , *   , 4] msr\t%0, %x1
  [r, Umv  ; mrs  , *   , 4] mrs\t%x0, %1
   }
   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
SImode)
@@ -1613,30 +1613,30 @@ (define_insn_and_split "*movdi_aarch64"
   "(register_operand (operands[0], DImode)
 || aarch64_reg_or_zero (operands[1], DImode))"
   {@ [cons: =0, 1; attrs: type, arch, length]
- [w, Z  ; neon_move, simd, 4] movi\t%0.2d, #0
- [r, r  ; mov_reg  , *   , 4] mov\t%x0, %x1
- [k, r  ; mov_reg  , *   , 4] mov\t%0, %x1
- [r, k  ; mov_reg  , *   , 4] mov\t%x0, %1
- [r, O  ; mov_imm  , *   , 4] << aarch64_is_mov_xn_imm (INTVAL 
(operands[1])) ? "mov\t%x0, %1" : "mov\t%w0, %1";
- [r, n  ; mov_imm  , *   ,16] #
+ [w, Z   ; neon_move, simd, 4] movi\t%0.2d, #0
+ [r, r   ; mov_reg  , *   , 4] mov\t%x0, %x1
+ [k, r   ; mov_reg  , *   , 4] mov\t%0, %x1
+ [r, k   ; mov_reg  , *   , 4] mov\t%x0, %1
+ [r, O   ; mov_imm  , *   , 4] << aarch64_is_mov_xn_imm (INTVAL 
(operands[1])) ? "mov\t%x0, %1" : "mov\t%w0, %1";
+ [r, n   ; mov_imm  , *   ,16] #
  /* The "mov_imm" type for CNT is just a placeholder.  */
- [r, Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", 
"%x0", operands[1]);
- [r, Usr; mov_imm  , sve,  4] << aarch64_output_sve_rdvl (operands[1]);
- [r, UsR; mov_imm  , sme,  4] << aarch64_output_rdsvl (operands[1]);
- [r, m  ; load_8   , *   , 4] ldr\t%x0, %1
- [w, m  ; load_8   , fp  , 4] ldr\t%d0, %1
- [m, r Z; store_8  , *   , 4] str\t%x1, %0
- [m, w  ; store_8  , fp  , 4] str\t%d1, %0
- [r, Usw; load_8   , *   , 8] << TARGET_ILP32 ? "adrp\t%0, %A1\;ldr\t%w0, 
[%0, %L1]" : "adrp\t%0, %A1\;ldr\t%0, [%0, %L1]";
- [r, Usa; adr  , *   , 4] adr\t%x0, %c1
- [r, Ush; adr  , *   , 4] adrp\t%x0, %A1
- [w, r Z; f_mcr, fp  , 4] fmov\t%d0, %x1
- [r, w  ; f_mrc, fp  , 4] fmov\t%x0, %d1
- [w, w  ; fmov , fp  , 4] fmov\t%d0, %d1
- [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], DImode);
- [w, Dx ; neon_move, simd, 8] #
- [Umv, r; mrs  , *   , 4] msr\t%0, %1
- [r, Umv; mrs  , *   , 4] mrs\t%0, %1
+ [r, Usv ; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", 
"%x0", operands[1]);
+ [r, Usr ; mov_imm  , sve,  4] << aarch64_output_sve_rdvl (operands[1]);
+ [r, UsR ; mov_imm  , sme,  4] << aarch64_output_rdsvl (operands[1]);
+ [r, m   ; load_8   , *   , 4] ldr\t%x0, %1
+ [w, m   ; load_8   , fp  , 4] ldr\t%d0, %1
+ [m, r Z ; store_8  , *   , 4] str\t%x1, %0
+ [m, w   ; store_8  , fp  , 4] str\t%d1, %0
+ [r, Usw ; load_8   , *   , 8] << TARGET_ILP32 ? "adrp\t%0, %A1\;ldr\t%w0, 
[%0, %L1]" : "adrp\t%0, %A1\;ldr\t%0, [%0, %L1]";
+ [r, Usa ; adr  , *   , 4] adr\t%x0, %c1
+ [r, Ush ; adr  , *   , 4] adrp\t%x0, %A1
+ [w, r Z ; f_mcr, fp  , 4] fmov\t%d0, %x1
+ [r, w   ; f_mrc, fp  , 4] fmov\t%x0, %d1
+ [w, w   ; fmov , fp  , 4] fmov\t%d0, %d1
+ [w, Dd  ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], DImode);
+ [w, Dx  ; neon_move, simd, 8] #
+ [Umv, rZ; mrs  , *   , 4] msr\t%0, %x1
+ [r, Umv ; mrs  , *   , 4] mrs\t%0, %1
   }
   "CONST_INT

[PATCH 2/3] aarch64: Fix memory cost for FPM_REGNUM

2025-01-22 Thread Richard Sandiford
GCC 15 is going to be the first release to support FPMR.
While working on a follow-up patch, I noticed that for:

(set (reg:DI R) ...)
...
(set (reg:DI fpmr) (reg:DI R))

IRA would prefer to spill R to memory rather than allocate a GPR.
This is because the register move cost for GENERAL_REGS to
MOVEABLE_SYSREGS is very high:

  /* Moves to/from sysregs are expensive, and must go via GPR.  */
  if (from == MOVEABLE_SYSREGS)
return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
  if (to == MOVEABLE_SYSREGS)
return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);

but the memory cost for MOVEABLE_SYSREGS was the same as for
GENERAL_REGS, making memory much cheaper.

Loading and storing FPMR involves a GPR temporary, so the cost should
account for moving into and out of that temporary.

This did show up indirectly in some of the existing asm tests,
where the stack frame allocated 16 bytes for callee saves (D8)
and another 16 bytes for spilling a temporary register.

It's possible that other registers need the same treatment
and it's more than probable that this code needs a rework.
None of that seems suitable for stage 4 though.

Tested on aarch64-linux-gnu.  I'll push in about 24 hours
if there are no comments before then.

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
for the cost of moving in and out of GENERAL_SYSREGS.

gcc/testsuite/
* gcc.target/aarch64/acle/fpmr-5.c: New test.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
a spill slot to be allocated.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc| 11 +--
 gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c   | 16 
 .../aarch64/sve2/acle/asm/dot_lane_mf8.c |  4 ++--
 .../aarch64/sve2/acle/asm/mlalb_lane_mf8.c   |  2 +-
 .../aarch64/sve2/acle/asm/mlallbb_lane_mf8.c |  2 +-
 .../aarch64/sve2/acle/asm/mlallbt_lane_mf8.c |  2 +-
 .../aarch64/sve2/acle/asm/mlalltb_lane_mf8.c |  2 +-
 .../aarch64/sve2/acle/asm/mlalltt_lane_mf8.c |  2 +-
 .../aarch64/sve2/acle/asm/mlalt_lane_mf8.c   |  2 +-
 9 files changed, 33 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dba779a8e51..a1f5619a615 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15858,9 +15858,16 @@ aarch64_memory_move_cost (machine_mode mode, 
reg_class_t rclass_i, bool in)
? aarch64_tune_params.memmov_cost.load_fp
: aarch64_tune_params.memmov_cost.store_fp);
 
+  /* If the move needs to go through GPRs, add the cost of doing that.  */
+  int base = 0;
+  if (rclass_i == MOVEABLE_SYSREGS)
+base += (in
+? aarch64_register_move_cost (DImode, GENERAL_REGS, rclass_i)
+: aarch64_register_move_cost (DImode, rclass_i, GENERAL_REGS));
+
   return (in
- ? aarch64_tune_params.memmov_cost.load_int
- : aarch64_tune_params.memmov_cost.store_int);
+ ? base + aarch64_tune_params.memmov_cost.load_int
+ : base + aarch64_tune_params.memmov_cost.store_int);
 }
 
 /* Implement TARGET_INSN_COST.  We have the opportunity to do something
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c 
b/gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c
new file mode 100644
index 000..da6d7f62f90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fpmr-5.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O" } */
+
+#include 
+
+void f(int cond)
+{
+  uint64_t x;
+  asm volatile ("" : "=r" (x));
+  if (cond)
+{
+  register uint64_t fpmr asm ("fpmr") = x;
+  asm volatile ("" :: "Umv" (fpmr));
+}
+}
+
+/* { dg-final { scan-assembler-not {\tsub\tsp,} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c 
b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c
index 9e54cd11c4b..83fe5cff5d3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c
@@ -70,7 +70,7 @@ TEST_DUAL_Z (dot_lane_1_f16, svfloat16_t, svmfloat8_t,
 ** msr fpmr, x0
 ** mov (z[0-7])\.d, z8\.d
 ** fdotz0\.h, z1\.b, \1\.b\[1\]
-** ldr d8, \[sp\], 32
+** ldr d8, \[sp\], 16
 ** ret
 */
 TEST_DUAL_LANE_REG (dot_lane_z8_f16, svfloat16_t, svmfloat8_t, z8,
@@ -151,7 +151,7 @@ TEST_DUAL_Z (dot_lane_1_f32, svfloat32_t, svmfloat8_t,
 ** msr  

[PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Richard Sandiford
GCC 15 is the first release to support FP8 intrinsics.
The underlying instructions depend on the value of a new register,
FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
register rather than a global register.  So:

- The FP8 intrinsics take a final uint64_t argument that
  specifies what value FPMR should have.

- If an FP8 operation is split across multiple functions,
  it is likely that those functions would have a similar argument.

If the object code has the structure:

for (...)
  fp8_kernel (..., fpmr_value);

then fp8_kernel would set FPMR to fpmr_value each time it is
called, even though FPMR will already have that value for at
least the second and subsequent calls (and possibly the first).

The working assumption for the ABI has been that writes to
registers like FPMR can in general be more expensive than
reads and so it would be better to use a conditional write like:

   mrs tmp, fpmr
   cmp tmp, 
   beq 1f
   nsr fpmr, 
 1:

instead of writing the same value to FPMR repeatedly.

This patch implements that.  It also adds a tuning flag that suppresses
the behaviour, both to make testing easier and to support any future
cores that (for example) are able to rename FPMR.

Hopefully this really is the last part of the FP8 enablement.

Tested on aarch64-linux-gnu.  I'll push in about 24 hours
if there are no comments before then.

Richard


gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
* config/aarch64/aarch64.md: Split moves into FPMR into a test
and branch around.
(aarch64_write_fpmr): New pattern.

gcc/testsuite/
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
cheap_fpmr_write by default.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
* gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
* gcc.target/aarch64/acle/fpmr-2.c: Likewise.
* gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
* gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
* gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
* gcc.target/aarch64/acle/fpmr-6.c: New test.
---
 gcc/config/aarch64/aarch64-tuning-flags.def   | 15 +++
 gcc/config/aarch64/aarch64.h  |  5 +++
 gcc/config/aarch64/aarch64.md | 39 +++
 .../sve2/acle/aarch64-sve2-acle-asm.exp   |  2 +-
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c   |  2 +-
 .../gcc.target/aarch64/acle/fpmr-2.c  |  2 +-
 .../gcc.target/aarch64/acle/fpmr-6.c  | 36 +
 .../gcc.target/aarch64/simd/vcvt_fpm.c|  2 +-
 .../gcc.target/aarch64/simd/vdot2_fpm.c   |  2 +-
 .../gcc.target/aarch64/simd/vdot4_fpm.c   |  2 +-
 .../gcc.target/aarch64/simd/vmla_fpm.c|  2 +-
 .../sve2/acle/aarch64-sve2-acle-asm.exp   |  2 +-
 12 files changed, 103 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fpmr-6.c

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 60967aac903..7a67d6197d9 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,6 +48,21 @@ AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", 
FULLY_PIPELINED_FMA)
rather than re-use an input predicate register.  */
 AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
 
+/* Whether writes to the FPMR are cheap enough that:
+
+   msr fpmr, 
+
+   is better than:
+
+   mrs tmp, fpmr
+   cmp tmp, 
+   beq 1f
+   nsr fpmr, 
+ 1:
+
+   even when the branch is predictably taken.  */
+AARCH64_EXTRA_TUNING_OPTION ("cheap_fpmr_write", CHEAP_FPMR_WRITE)
+
 /* Baseline tuning settings suitable for all modern cores.  */
 #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND \
 | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 218868a5246..5cbf442130b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -486,6 +486,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 /* fp8 instructions are enabled through +fp8.  */
 #define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
 
+/* See the comment above the tuning flag for details.  */
+#define TARGET_CHEAP_FPMR_WRITE \
+  (bool (aarch64_tune_params.extra_tuning_flags \
+& AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE))
+
 /* Combinatorial tests.  */
 
 #define TARGET_SVE2_OR_SME2 \
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 776c4c4ceee..071058dbeb3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -356,6 +356,7 @@ (d

[pushed] aarch64: Fix aarch64_write_sysregdi predicate

2025-01-22 Thread Richard Sandiford
While working on another MSR-related patch, I noticed that
aarch64_write_sysregdi's constraints allowed zero, but its
predicate didn't.  This could in principle lead to an ICE
during or after RA, since "Z" allows the RA to rematerialise
a known zero directly into the instruction.

The usual techniques for exposing a bug like that didn't work in this
case, since the optimisers seem to make no attempt to remove redundant
zero moves (at least not for these unspec_volatiles).  But the problem
still seems worth fixing pre-emptively.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.md (aarch64_write_sysregdi): Change
the source predicate to aarch64_reg_or_zero.

gcc/testsuite/
* gcc.target/aarch64/acle/rwsr-4.c: New test.
* gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized
variable.
---
 gcc/config/aarch64/aarch64.md |  2 +-
 gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c| 15 +++
 .../gcc.target/aarch64/acle/rwsr-armv8p9.c|  4 +---
 3 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8d82cee903..39655ea5e39 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -659,7 +659,7 @@ (define_insn "aarch64_read_sysregti"
 
 (define_insn "aarch64_write_sysregdi"
   [(unspec_volatile:DI [(match_operand 0 "aarch64_sysreg_string" "")
-   (match_operand:DI 1 "register_operand" "rZ")]
+   (match_operand:DI 1 "aarch64_reg_or_zero" "rZ")]
   UNSPEC_SYSREG_WDI)]
   ""
   "msr\t%0, %x1"
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c
new file mode 100644
index 000..52fb603f3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-4.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O" } */
+
+#include 
+#include 
+
+void f();
+void g()
+{
+  int64_t x = 0;
+  __arm_wsr64("tpidr_el0", x);
+  f();
+  __arm_wsr64("tpidr_el0", x);
+}
+
+/* { dg-final { scan-assembler-times {\tmsr\t[^,]+, xzr\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c
index e2f297bbeeb..c49fbb5368e 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c
@@ -3,10 +3,8 @@
 /* { dg-options "-O2 -march=armv8.9-a" } */
 #include 
 void
-readwrite_armv8p9a_sysregs ()
+readwrite_armv8p9a_sysregs (long long int a)
 {
-  long long int a;
-
   /* Write-only system registers.  */
  __arm_wsr64 ("pmzr_el0", a); /* { dg-final { scan-assembler "msr\ts3_3_c9_c13_4, x0" } } */
 
-- 
2.25.1



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Michael Matz
Hello,

On Wed, 22 Jan 2025, Martin Uecker wrote:

> > You need to decide which is which after seeing the ".".  I'm guessing what 
> > you mean is that on seeing ".ident" as first two tokens inside in 
> > initializer-list you go the designator route, and not the 
> > initializer/assignment-expression route, even though the latter can now 
> > also start with ".ident".  
> 
> What I mean is that after parsing the dot followed by an identifier x, 
> if x is the name of a member of the structure S which is being initialized,
> it is  a designator, otherwise it is an expression that uses .x to refer
> to some member of an enclosing definition.

So, as I guessed.

> So you do not need to look further.  But maybe I am missing something 
> else.

Like ...

> > Note further that you may have '{ .y[1][3].z }', which is still not a 
> > designation, but an expression under your proposal, whereas
> > '{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
> > now need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.

... this?


Ciao,
Michael.

Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker
Am Mittwoch, dem 22.01.2025 um 16:25 +0100 schrieb Michael Matz:
> Hello,
> 
> On Wed, 22 Jan 2025, Martin Uecker wrote:
> 
> > > You need to decide which is which after seeing the ".".  I'm guessing 
> > > what 
> > > you mean is that on seeing ".ident" as first two tokens inside in 
> > > initializer-list you go the designator route, and not the 
> > > initializer/assignment-expression route, even though the latter can now 
> > > also start with ".ident".  
> > 
> > What I mean is that after parsing the dot followed by an identifier x, 
> > if x is the name of a member of the structure S which is being initialized,
> > it is  a designator, otherwise it is an expression that uses .x to refer
> > to some member of an enclosing definition.
> 
> So, as I guessed.

I think you missed the part that you can lookup whether x is
a member of the struct S.

> 
> > So you do not need to look further.  But maybe I am missing something 
> > else.
> 
> Like ...
> 
> > > Note further that you may have '{ .y[1][3].z }', which is still not a 
> > > designation, but an expression under your proposal, whereas
> > > '{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
> > > now need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.
> 
> ... this?

In .y[1][3].z  after .y you can decide whether y is a member of the
struct being initialized.  If it is, it is a designator and if not
it must be an expression.

Martin







Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker
Am Mittwoch, dem 22.01.2025 um 15:53 +0100 schrieb Michael Matz:
> Hello,
> 
> On Tue, 21 Jan 2025, Martin Uecker wrote:
> 
> > > > Coudn't you use the rule that .len refers to the closest enclosing 
> > > > structure
> > > > even without __self__ ?  This would then also disambiguate between 
> > > > designators
> > > > and other uses.
> > > 
> > > Right now, an expression cannot start with '.', which provides the 
> > > disambiguation between designators and expressions as initializers. 
> > 
> > You could disambiguate directly after parsing the identifier, which
> > does not seem overly problematic.
> 
> Which way?  When you allow ". identifier" as primary expression, then in
> 
>   struct S s = { .x = 42 };
> 
> the initializer can be parsed as designated initializer (with error 
> when 'x' is not a member of S) or as assignment expression like in
> 
>   struct T t = { foo = 42 };
> 
> You need to decide which is which after seeing the ".".  I'm guessing what 
> you mean is that on seeing ".ident" as first two tokens inside in 
> initializer-list you go the designator route, and not the 
> initializer/assignment-expression route, even though the latter can now 
> also start with ".ident".  

What I mean is that after parsing the dot followed by an identifier x, 
if x is the name of a member of the structure S which is being initialized,
it is  a designator, otherwise it is an expression that uses .x to refer
to some member of an enclosing definition.

So you do not need to look further.  But maybe I am missing something
else.

Martin


> But then, what about:
> 
>   struct U u = { .y };
> 
> It's certainly not a designation anymore, but you only know after not 
> seeing the '='.  So here it's actually an assignment-expression.  And as 
> you allowed ".ident" as primary expression this might rightfully refer to 
> some in-scope 'y' member of some outer struct (or give an error).
> 
> Note further that you may have '{ .y[1][3].z }', which is still not a 
> designation, but an expression under your proposal, whereas
> '{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
> now need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.
> 
> 
> Ciao,
> Michael.

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging




[PATCH v4] arm: [MVE intrinsics] Avoid warnings when floating-point is not supported [PR 117814]

2025-01-22 Thread Christophe Lyon
If the target does not support floating-point, we register FP vector
types as 'void' (see register_vector_type).

This leads to warnings about 'pure attribute on function returning
void' when we declare the various load intrinsics because their
call_properties say CP_READ_MEMORY (thus giving them the 'pure'
attribute), but their return type is void.  This happens for instance
in gcc.target/arm/pr112337.c, depending on how GCC is built (I didn't
notice the warnings because arm_mve.h is considered as a system
include in my environment, and the warning is not emitted, but CI
reported it).

To avoid such warnings, declare floating-point scalar and vector types
even if the target does not have an FPU.

Note that since an FPU can be activated via #pragma GCC target
("arch=armv8.1-m.main+mve.fp" for instance), it means that such types
cannot appear and disappear within a single TU; they have to be
available in all contexts.  This implies a noteworthy change for
__fp16: it no longer depends on using -mfp16-format=ieee or
alternative.  Also note that if the target ISA has the fp16 bit set,
we already silently activate -mfp16-format=ieee (with an error if
-mfp16-format=alternative was supplied).  The patch now enforces
-mfp16-format=none if the option was used.

In arm-mve-builtins.cc (register_builtin_types, register_vector_type,
register_builtin_tuple_types), this means simply removing the early
exits.  However, for this to work, we need to update
arm_vector_mode_supported_p, so that vector floating-point types are
always defined, and __fp16 must always be registered by
arm_init_fp16_builtins (as it is the base type for vectors of
float16_t).  Another side effect is that the declaration of float16_t
and float32_t typedefs is now unconditional.

The new tests verify that:
- we emit an error if the code tries to use floating-point intrinsics
  and the target does not have the floating-point extension
- we emit the expected code when activating the floating-point
  extension via a pragma
- we emit the expected code when the target supports floating-point
  (no pragma needed)
- we apply -mfp16-format=none where we used to default to ieee

An update is needed in g++.target/arm/mve/general-c++/nomve_fp_1.c,
because the error message now correctly uses float16x8_t instead of
void as return type.

The patch removes gcc.target/arm/fp16-compile-none-1.c which tests
that using __fp16 produces an error with -mfp16-format=none, since it
is no longer the case.

gcc/ChangeLog:

PR target/117814
* config/arm/arm-builtins.cc (arm_init_fp16_builtins): Always
register __fp16 type.
* config/arm/arm-mve-builtins.cc (register_builtin_types):
Remove special handling when TARGET_HAVE_MVE_FLOAT is false.
(register_vector_type): Likewise.
(register_builtin_tuple_types): Likewise.
* config/arm/arm-opts.h (arm_fp16_format_type): Add
ARM_FP16_FORMAT_DEFAULT.
* config/arm/arm.cc (arm_vector_mode_supported_p): Accept
floating-point vector modes even if TARGET_HAVE_MVE_FLOAT is
false.
(arm_option_reconfigure_globals): Apply ARM_FP16_FORMAT_NONE if
requested.
* config/arm/arm.opt (mfp16-format): Default to
ARM_FP16_FORMAT_DEFAULT.
* config/arm/arm_mve_types.h (float16_t, float32_t): Define
unconditionally.
* doc/extend.texi (Half-precision Floating-point): __fp16 is now
always available on arm.  Move the x86 paragraph closer to the rest of
the x86 information.
* doc/sourcebuild.texi (ARM-specific attributes): Document
arm_v8_1m_mve_nofp_ok.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/intrinsics/pr117814-f16.c: New test.
* gcc.target/arm/mve/intrinsics/pr117814-2-f16.c: New test.
* gcc.target/arm/mve/intrinsics/pr117814-3-f16.c: New test.
* gcc.target/arm/mve/intrinsics/pr117814-4-f16.c: New test.
* gcc.target/arm/mve/intrinsics/pr117814-f32.c: New test.
* gcc.target/arm/mve/intrinsics/pr117814-2-f32.c: New test.
* gcc.target/arm/mve/intrinsics/pr117814-3-f32.c: New test.
* gcc.target/arm/fp16-compile-none-1.c: Delete.
* g++.target/arm/mve/general-c++/nomve_fp_1.c: Fix expected error
message.
* lib/target-supports.exp
(check_effective_target_arm_v8_1m_mve_nofp_ok_nocache): New.
(check_effective_target_arm_v8_1m_mve_nofp_ok): New.
(add_options_for_arm_v8_1m_mve_nofp): New.
---
 gcc/config/arm/arm-builtins.cc|  4 +-
 gcc/config/arm/arm-mve-builtins.cc| 22 +
 gcc/config/arm/arm-opts.h |  1 +
 gcc/config/arm/arm.cc | 14 +++---
 gcc/config/arm/arm.opt|  2 +-
 gcc/config/arm/arm_mve_types.h|  2 -
 gcc/doc/extend.texi   | 29 ++-
 gcc/doc/sourcebuild.texi  |  6 +++
 

Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended

2025-01-22 Thread Lulu Cheng



在 2025/1/22 下午5:21, Xi Ruoyao 写道:

On Wed, 2025-01-22 at 10:53 +0800, Xi Ruoyao wrote:

On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote:

在 2025/1/22 上午8:49, Xi Ruoyao 写道:

The second source register of this insn cannot be the same as the
destination register.

gcc/ChangeLog:

* config/loongarch/loongarch.md
(_alsl_reversesi_extended): Add '&' to the destination
register constraint and append '0' to the first source register
constraint to indicate the destination register cannot be same
as the second source register, and change the split condition to
reload_completed so that the insn will be split only after RA in
order to obtain allocated registers that satisfy the above
constraints.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
test.
---
   gcc/config/loongarch/loongarch.md |  6 +++---
   .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++
   2 files changed, 24 insertions(+), 3 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 223e2b9f37f..1392325038c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3160,13 +3160,13 @@ (define_insn_and_split "_shift_reverse"
   ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
   ;; our own.
   (define_insn_and_split "_alsl_reversesi_extended"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=&r")
    (sign_extend:DI
      (plus:SI
        (subreg:SI
      (any_bitwise:DI
    (ashift:DI
-     (match_operand:DI 1 "register_operand" "r")
+     (match_operand:DI 1 "register_operand" "r0")
      (match_operand:SI 2 "const_immalsl_operand" ""))
    (match_operand:DI 3 "const_int_operand" "i"))
      0)
@@ -3175,7 +3175,7 @@ (define_insn_and_split "_alsl_reversesi_extended"
  && loongarch_reassoc_shift_bitwise (, operands[2], operands[3],
       SImode)"
     "#"
-  "&& true"
+  "&& reload_completed"

I have no problem with this patch.

But, I have always been confused about the use of reload_completed.

I can understand that it needs to be true here, but I don't quite
understand the following:

```

(define_insn_and_split "*zero_extendsidi2_internal"
    [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
  (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand"
"r,m,ZC,k")))]
    "TARGET_64BIT"
    "@
     bstrpick.d\t%0,%1,31,0
     ld.wu\t%0,%1
     #
     ldx.wu\t%0,%1"
    "&& reload_completed

I don't really understand it either... It has been here since day 1 of
LoongArch support and I've never had enough courage to hack this part.

I pushed the 1st patch as an emergency wrong-code fix now.  The 2nd
patch can wait until we figure out the best way to support the fusion.
I agree. I also compared the previous assembly code when 
bstrpick_alsl_paired was not deleted to analyze this problem.




[PATCH 2/5] LoongArch: Don't use "+" for atomic_{load, store} "m" constraint

2025-01-22 Thread Xi Ruoyao
Atomic load does not modify the memory.  Atomic store does not read the
memory, thus we can use "=" instead.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_load): Remove "+" for
the memory operand.
(atomic_store): Use "=" instead of "+" for the memory
operand.
---
 gcc/config/loongarch/sync.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 75b134cd853..15413e4df1b 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -105,7 +105,7 @@ (define_insn "mem_thread_fence_1"
 (define_insn "atomic_load"
   [(set (match_operand:QHWD 0 "register_operand" "=r")
 (unspec_volatile:QHWD
-  [(match_operand:QHWD 1 "memory_operand" "+m")
+  [(match_operand:QHWD 1 "memory_operand" "m")
 (match_operand:SI 2 "const_int_operand")];; model
   UNSPEC_ATOMIC_LOAD))]
   ""
@@ -142,7 +142,7 @@ (define_insn "atomic_load"
 
 ;; Implement atomic stores with amoswap.  Fall back to fences for atomic loads.
 (define_insn "atomic_store"
-  [(set (match_operand:QHWD 0 "memory_operand" "+m")
+  [(set (match_operand:QHWD 0 "memory_operand" "=m")
 (unspec_volatile:QHWD
   [(match_operand:QHWD 1 "reg_or_0_operand" "rJ")
(match_operand:SI 2 "const_int_operand")]  ;; model
-- 
2.48.1



[PATCH 5/5] LoongArch: Remove "b 3f" instruction if unneeded

2025-01-22 Thread Xi Ruoyao
This instruction is used to skip a redundant barrier if -mno-ld-seq-sa
or the memory model requires a barrier on failure.  But with -mld-seq-sa
and other memory models the barrier may not exist at all, and we
should remove the "b 3f" instruction as well.

The implementation uses a new operand modifier "%T" to output a comment
marker if the operand is a memory order for which the barrier won't be
generated.  "%T", and also "%t", were not really used before, and the code
for them in loongarch_print_operand_reloc is just some MIPS legacy.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Make "%T" output a comment marker if the operand is a memory
order for which the barrier won't be generated; remove "%t".
* config/loongarch/sync.md (atomic_cas_value_strong): Add
%T before "b 3f".
(atomic_cas_value_cmp_and_7_): Likewise.
---
 gcc/config/loongarch/loongarch.cc | 19 ---
 gcc/config/loongarch/sync.md  |  4 ++--
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e9978370e8c..341a92bc942 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6183,9 +6183,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
'Q'  Print R_LARCH_RELAX for TLS IE.
'r'  Print address 12-31bit relocation associated with OP.
'R'  Print address 32-51bit relocation associated with OP.
-   'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
- 'z' for (eq:?I ...), 'n' for (ne:?I ...).
-   't' Like 'T', but with the EQ/NE cases reversed
+   'T' Print a comment marker if %G outputs nothing.
'u' Print a LASX register.
'v' Print the insn size suffix b, h, w or d for vector modes V16QI, V8HI,
  V4SI, V2SI, and w, d for vector modes V4SF, V2DF respectively.
@@ -6264,6 +6262,13 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
fputs ("dbar\t0x700", file);
   break;
 
+case 'T':
+  if (!loongarch_cas_failure_memorder_needs_acquire (
+   memmodel_from_int (INTVAL (op)))
+ && ISA_HAS_LD_SEQ_SA)
+   fprintf (file, "%s", ASM_COMMENT_START);
+  break;
+
 case 'h':
   if (code == HIGH)
op = XEXP (op, 0);
@@ -6342,14 +6347,6 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
 false /* lo_reloc */);
   break;
 
-case 't':
-case 'T':
-  {
-   int truth = (code == NE) == (letter == 'T');
-   fputc ("zfnt"[truth * 2 + FCC_REG_P (REGNO (XEXP (op, 0)))], file);
-  }
-  break;
-
 case 'V':
   if (CONST_VECTOR_P (op))
{
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 67cf9f47e5a..881d216aed0 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -267,7 +267,7 @@ (define_insn "atomic_cas_value_strong"
   output_asm_insn ("or%i3\t%5,$zero,%3", operands);
   output_asm_insn ("sc.\t%5,%1", operands);
   output_asm_insn ("beqz\t%5,1b", operands);
-  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("%T4b\t3f", operands);
   output_asm_insn ("2:", operands);
   output_asm_insn ("%G4", operands);
   output_asm_insn ("3:", operands);
@@ -411,7 +411,7 @@ (define_insn "atomic_cas_value_cmp_and_7_"
 "or%i5\\t%7,%7,%5\\n\\t"
 "sc.\\t%7,%1\\n\\t"
 "beq\\t$zero,%7,1b\\n\\t"
-"b\\t3f\\n\\t"
+"%T6b\\t3f\\n\\t"
 "2:\\n\\t"
 "%G6\\n\\t"
 "3:\\n\\t";
-- 
2.48.1



[PATCH 1/5] LoongArch: (NFC) Remove atomic_optab and use amop instead

2025-01-22 Thread Xi Ruoyao
They are the same.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_): Change atomic_optab to amop.
(atomic_fetch_): Likewise.
---
 gcc/config/loongarch/sync.md | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index fd8d732dd67..75b134cd853 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -35,8 +35,6 @@ (define_c_enum "unspec" [
 ])
 
 (define_code_iterator any_atomic [plus ior xor and])
-(define_code_attr atomic_optab
-  [(plus "add") (ior "or") (xor "xor") (and "and")])
 
 ;; This attribute gives the format suffix for atomic memory operations.
 (define_mode_attr amo [(QI "b") (HI "h") (SI "w") (DI "d")])
@@ -175,7 +173,7 @@ (define_insn "atomic_store"
 }
   [(set (attr "length") (const_int 12))])
 
-(define_insn "atomic_"
+(define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+ZB")
(unspec_volatile:GPR
  [(any_atomic:GPR (match_dup 0)
@@ -197,7 +195,7 @@ (define_insn "atomic_add"
   "amadd%A2.\t$zero,%z1,%0"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_fetch_"
+(define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
(match_operand:GPR 1 "memory_operand" "+ZB"))
(set (match_dup 1)
-- 
2.48.1



[PATCH 0/5] LoongArch: Atomic operation clean-up and micro-optimization

2025-01-22 Thread Xi Ruoyao
Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (5):
  LoongArch: (NFC) Remove atomic_optab and use amop instead
  LoongArch: Don't use "+" for atomic_{load,store} "m" constraint
  LoongArch: Allow using bstrins for masking the address in
atomic_test_and_set
  LoongArch: Don't emit overly-restrictive barrier for LL-SC loops
  LoongArch: Remove "b 3f" instruction if unneeded

 gcc/config/loongarch/loongarch.cc | 19 ++---
 gcc/config/loongarch/sync.md  | 46 ++-
 2 files changed, 29 insertions(+), 36 deletions(-)

-- 
2.48.1



Re: [Patch, fortran] PR96087 - [12/13/14/15 Regression] ICE in gfc_get_symbol_decl, at fortran/trans-decl.c:1575

2025-01-22 Thread Andre Vehreschild
Hi Paul,

the patch looks reasonable to me. Ok for mainline.

Just a side-thought: Could it be possible that the for-loop in trans-decl does
not find the result? Would an assert after the loop at least give a hint where
something went wrong? That's just from reading the code, so if you think that
cannot happen, feel free to commit w/o the assert.

Regards,
Andre

On Wed, 22 Jan 2025 14:03:15 +
Paul Richard Thomas  wrote:

> Hi All,
>
> This patch fixes a double ICE arising from confusion between the dummy
> symbol arising from a module function/subroutine interface and the module
> procedure itself. In both cases, use of the name is unambiguous, as
> explained in the ChangeLog. The testcase contains both the original and the
> variant in comment 1.
>
> Regtests OK with FC41/x86_64 - OK for mainline and later backporting?
>
> Paul
>
> Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087]
>
> 2025-01-22  Paul Thomas  
>
> gcc/fortran
> PR fortran/96087
> * trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
> backend decl, it is likely that it has come from a module proc
> interface. Look for the formal symbol by name in the containing
> proc and use its backend decl.
> * trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
> same reason, match the name, rather than the symbol address to
> perform the mapping.
>
> gcc/testsuite/
> PR fortran/96087
> * gfortran.dg/pr96087.f90: New test.


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH 3/5] LoongArch: Allow using bstrins for masking the address in atomic_test_and_set

2025-01-22 Thread Xi Ruoyao
We can use bstrins for masking the address here.  As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into a
register if not.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_test_and_set): Use bstrins
for masking the address if possible.
---
 gcc/config/loongarch/sync.md | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 15413e4df1b..5c2eb34f4d4 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -359,12 +359,13 @@ (define_expand "atomic_test_and_set"
   rtx mem = operands[1];
   rtx model = operands[2];
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
-  rtx tmp_reg = gen_reg_rtx (Pmode);
-  rtx zero_reg = gen_rtx_REG (Pmode, 0);
-
+  rtx mask = gen_int_mode (-4, Pmode);
   rtx aligned_addr = gen_reg_rtx (Pmode);
-  emit_move_insn (tmp_reg, gen_rtx_PLUS (Pmode, zero_reg, GEN_INT (-4)));
-  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, tmp_reg));
+
+  if (!and_operand (mask, Pmode))
+mask = force_reg (Pmode, mask);
+
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, mask));
 
   rtx aligned_mem = change_address (mem, SImode, aligned_addr);
   set_mem_alias_set (aligned_mem, 0);
-- 
2.48.1



[PATCH 4/5] LoongArch: Don't emit overly-restrictive barrier for LL-SC loops

2025-01-22 Thread Xi Ruoyao
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
inserted only needs to account for the failure memorder, not
the success memorder (the barrier is skipped with "b 3f" on success
anyway).

Note that if we use the AMCAS instructions, we indeed need to consider
both the success memorder and the failure memorder when deciding if the
"_db" suffix is needed.  Thus the semantics of atomic_cas_value_strong
and atomic_cas_value_strong_amcas start to be different.  To
prevent the compiler from being too clever, use a different unspec code
for AMCAS instructions.

gcc/ChangeLog:

* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AMCAS): New
UNSPEC code.
(atomic_cas_value_strong): NFC, update the comment to note
we only need to consider failure memory order.
(atomic_cas_value_strong_amcas): Use
UNSPEC_COMPARE_AND_SWAP_AMCAS instead of
UNSPEC_COMPARE_AND_SWAP.
(atomic_compare_and_swap): Pass failure memorder to
gen_atomic_cas_value_strong.
(atomic_compare_and_swap): Pass failure memorder to
gen_atomic_cas_value_cmp_and_7_si.
---
 gcc/config/loongarch/sync.md | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 5c2eb34f4d4..67cf9f47e5a 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -21,6 +21,7 @@
 
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_AMCAS
   UNSPEC_COMPARE_AND_SWAP_ADD
   UNSPEC_COMPARE_AND_SWAP_SUB
   UNSPEC_COMPARE_AND_SWAP_AND
@@ -238,7 +239,7 @@ (define_insn "atomic_cas_value_strong"
(set (match_dup 1)
(unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
  (match_operand:GPR 3 "reg_or_0_operand" "rJ")
- (match_operand:SI 4 "const_int_operand")]  ;; mod_s
+ (match_operand:SI 4 "const_int_operand")]  ;; mod_f
 UNSPEC_COMPARE_AND_SWAP))
(clobber (match_scratch:GPR 5 "=&r"))]
   ""
@@ -286,8 +287,8 @@ (define_insn "atomic_cas_value_strong_amcas"
(set (match_dup 1)
(unspec_volatile:QHWD [(match_operand:QHWD 2 "reg_or_0_operand" "rJ")
   (match_operand:QHWD 3 "reg_or_0_operand" "rJ")
-  (match_operand:SI 4 "const_int_operand")]  ;; mod_s
-UNSPEC_COMPARE_AND_SWAP))]
+  (match_operand:SI 4 "const_int_operand")]  ;; mod
+UNSPEC_COMPARE_AND_SWAP_AMCAS))]
   "ISA_HAS_LAMCAS"
   "ori\t%0,%z2,0\n\tamcas%A4.\t%0,%z3,%1"
   [(set (attr "length") (const_int 8))])
@@ -316,16 +317,14 @@ (define_expand "atomic_compare_and_swap"
   && is_mm_release (memmodel_base (INTVAL (mod_s
 mod_s = GEN_INT (MEMMODEL_ACQ_REL);
 
-  operands[6] = mod_s;
-
   if (ISA_HAS_LAMCAS)
 emit_insn (gen_atomic_cas_value_strong_amcas (operands[1], operands[2],
operands[3], operands[4],
-operands[6]));
+mod_s));
   else
 emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2],
  operands[3], operands[4],
- operands[6]));
+ mod_f));
 
   rtx compare = operands[1];
   if (operands[3] != const0_rtx)
@@ -399,7 +398,7 @@ (define_insn "atomic_cas_value_cmp_and_7_"
  (match_operand:GPR 3 "reg_or_0_operand" "rJ")
  (match_operand:GPR 4 "reg_or_0_operand"  "rJ")
  (match_operand:GPR 5 "reg_or_0_operand"  "rJ")
- (match_operand:SI 6 "const_int_operand")] ;; model
+ (match_operand:SI 6 "const_int_operand")] ;; mod_f
 UNSPEC_COMPARE_AND_SWAP))
(clobber (match_scratch:GPR 7 "=&r"))]
   ""
@@ -443,18 +442,16 @@ (define_expand "atomic_compare_and_swap"
   && is_mm_release (memmodel_base (INTVAL (mod_s
 mod_s = GEN_INT (MEMMODEL_ACQ_REL);
 
-  operands[6] = mod_s;
-
   if (ISA_HAS_LAMCAS)
 emit_insn (gen_atomic_cas_value_strong_amcas (operands[1], operands[2],
   operands[3], operands[4],
-  operands[6]));
+  mod_s));
   else
 {
   union loongarch_gen_fn_ptrs generator;
   generator.fn_7 = gen_atomic_cas_value_cmp_and_7_si;
   loongarch_expand_atomic_qihi (generator, operands[1], operands[2],
-   operands[3], operands[4], operands[6]);
+   operands[3], operands[

Re: [PATCH v3] c++: fix wrong-code with constexpr prvalue opt [PR118396]

2025-01-22 Thread Jason Merrill

On 1/21/25 7:04 PM, Marek Polacek wrote:

On Tue, Jan 21, 2025 at 11:00:13AM -0500, Jason Merrill wrote:

On 1/21/25 9:54 AM, Jason Merrill wrote:

On 1/20/25 5:58 PM, Marek Polacek wrote:

@@ -9087,7 +9092,9 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
   return r;
     else if (non_constant_p && TREE_CONSTANT (r))
   r = mark_non_constant (r);
-  else if (non_constant_p)
+  else if (non_constant_p
+   /* Check we are not trying to return the wrong type.  */
+   || !same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r)))
   return t;


Actually, I guess we also want to give an error if !allow_non_constant.


Like so?

(It would not have triggered pre-r15-7103 because there we're called
from maybe_constant_init, thus allow_non_constant=true.)

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch adds an error in a !allow_non_constant case when the
initializer/object types don't match.

PR c++/118396

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Add an error call
when !allow_non_constant.
---
  gcc/cp/constexpr.cc | 14 +++---
  1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 9f950ffed74..85d469b055e 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9092,11 +9092,19 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
  return r;
else if (non_constant_p && TREE_CONSTANT (r))
  r = mark_non_constant (r);
-  else if (non_constant_p
-  /* Check we are not trying to return the wrong type.  */
-  || !same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r)))
+  else if (non_constant_p)
  return t;
  
+  /* Check we are not trying to return the wrong type.  */

+  if (!same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r)))
+{
+  /* If so, this is not a constant expression.  */
+  if (!allow_non_constant)
+   error ("%qE is not a constant expression because it initializes "
+  "the wrong object", t);


Let's print the types as well.  Maybe "because it initializes a %qT 
rather than %T"?


OK with that change.


+  return t;
+}
+
if (should_unshare)
  r = unshare_expr (r);
  


base-commit: 16d778239397b2f70a1e0680c0b82ae6ee98fe9e




[Patch, fortran] PR96087 - [12/13/14/15 Regression] ICE in gfc_get_symbol_decl, at fortran/trans-decl.c:1575

2025-01-22 Thread Paul Richard Thomas
Hi All,

This patch fixes a double ICE arising from confusion between the dummy
symbol generated for a module function/subroutine interface and the
module procedure itself. In both cases, use of the name is unambiguous, as
explained in the ChangeLog. The testcase contains both the original and the
variant in comment 1.

Regtests OK with FC41/x86_64 - OK for mainline and later backporting?

Paul

Fortran: Regression - fix ICE at fortran/trans-decl.c:1575 [PR96087]

2025-01-22  Paul Thomas  

gcc/fortran
PR fortran/96087
* trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
backend decl, it is likely that it has come from a module proc
interface. Look for the formal symbol by name in the containing
proc and use its backend decl.
* trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
same reason, match the name, rather than the symbol address to
perform the mapping.

gcc/testsuite/
PR fortran/96087
* gfortran.dg/pr96087.f90: New test.
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 4ae22a5584d..97bb0a41858 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -1722,6 +1722,21 @@ gfc_get_symbol_decl (gfc_symbol * sym)
 	sym->backend_decl = DECL_CHAIN (sym->backend_decl);
 	}
 
+  /* Automatic array indices in module procedures need the backend_decl
+	 to be extracted from the procedure formal arglist.  */
+  if (sym->attr.dummy && !sym->backend_decl)
+	{
+	  gfc_formal_arglist *f;
+	  for (f = sym->ns->proc_name->formal; f; f = f->next)
+	{
+	  gfc_symbol *fsym = f->sym;
+	  if (strcmp (sym->name, fsym->name))
+		continue;
+	  sym->backend_decl = fsym->backend_decl;
+	  break;
+	 }
+	}
+
   /* Dummy variables should already have been created.  */
   gcc_assert (sym->backend_decl);
 
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index bef49d32a58..1bf46c616bb 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -5099,7 +5099,7 @@ gfc_apply_interface_mapping_to_expr (gfc_interface_mapping * mapping,
   /* TODO Find out why the condition on expr->symtree had to be moved into
  the loop rather than being outside it, as originally.  */
   for (sym = mapping->syms; sym; sym = sym->next)
-if (expr->symtree && sym->old == expr->symtree->n.sym)
+if (expr->symtree && !strcmp (sym->old->name, expr->symtree->n.sym->name))
   {
 	if (sym->new_sym->n.sym->backend_decl)
 	  expr->symtree = sym->new_sym;
diff --git a/gcc/testsuite/gfortran.dg/pr96087.f90 b/gcc/testsuite/gfortran.dg/pr96087.f90
new file mode 100644
index 000..6c75d4f0cf2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96087.f90
@@ -0,0 +1,46 @@
+! { dg-do run }
+
+module m
+   interface
+  module function f(a, n, b) result(z)
+ integer, intent(in) :: n
+ real :: z(n + 1)
+ real :: a, b
+  end
+   end interface
+contains
+   module procedure f
+  integer :: i
+  do i = 1, size(z)
+z(i) = real(i)
+  end do
+   end procedure
+end
+
+! Comment 1
+module n
+   interface
+  module subroutine g(n, z)
+ integer, intent(in) :: n
+ real :: z(n)
+  end
+   end interface
+contains
+   module procedure g
+  z = 1
+  if (int (sum (z)) /= n) stop 1
+   end procedure
+end
+
+  use m
+  use n
+  real, allocatable :: r(:)
+  integer :: i = 2
+  r = f (1.0, i+1, 2.0)
+  if (any (r .ne. [(real(i), i = 1,4)])) stop 2
+  if (any (f (3.0, 1, 4.0) .ne. [(real(i), i = 1,2)])) stop 3
+
+  r = [(real (i), i = 10,20)]
+  call g (5, r)
+  if (int (sum (r)) /= (sum ([(i, i = 15,20)]) + 5)) stop 4
+end


Re: [PATCH] c++/modules: Fix exporting temploid friends in header units [PR118582]

2025-01-22 Thread Jason Merrill

On 1/22/25 6:30 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

When we started streaming the bit to handle merging of imported temploid
friends in r15-2807, I unthinkingly only streamed it in the
'!state->is_header ()' case.

This patch reworks the streaming logic to ensure that this data is
always streamed, including for unique entities (in case that ever comes
up somehow).  This does make the streaming slightly less efficient, as
functions and types will need an extra byte, but this doesn't appear to
make a huge difference to the size of the resulting module; the 'std'
module on my machine grows by 0.2% from 30671136 to 30730144 bytes.

PR c++/118582

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Always stream
imported_temploid_friends information.
(trees_in::decl_value): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118582_a.H: New test.
* g++.dg/modules/pr118582_b.H: New test.
* g++.dg/modules/pr118582_c.H: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  | 47 +++
  gcc/testsuite/g++.dg/modules/pr118582_a.H | 16 
  gcc/testsuite/g++.dg/modules/pr118582_b.H |  6 +++
  gcc/testsuite/g++.dg/modules/pr118582_c.H |  5 +++
  4 files changed, 49 insertions(+), 25 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_b.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr118582_c.H

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 813c1436141..17215594fd3 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -2791,7 +2791,7 @@ static keyed_map_t *keyed_table;
  
  /* Instantiations of temploid friends imported from another module

 need to be attached to the same module as the temploid.  This maps
-   these decls to the temploid they are instantiated them, as there is
+   these decls to the temploid they are instantiated from, as there is
 no other easy way to get this information.  */
  static GTY((cache)) decl_tree_cache_map *imported_temploid_friends;
  
@@ -7961,7 +7961,6 @@ trees_out::decl_value (tree decl, depset *dep)

  }
  
merge_kind mk = get_merge_kind (decl, dep);

-  bool is_imported_temploid_friend = imported_temploid_friends->get (decl);
  
if (CHECKING_P)

  {
@@ -7998,10 +7997,6 @@ trees_out::decl_value (tree decl, depset *dep)
is_attached = true;
  
  	  bits.b (is_attached);

-
- /* Also tell the importer whether this is an imported temploid
-friend, which has implications for merging.  */
- bits.b (is_imported_temploid_friend);
}
  bits.b (dep && dep->has_defn ());
}
@@ -8087,6 +8082,16 @@ trees_out::decl_value (tree decl, depset *dep)
tree container = decl_container (decl);
unsigned tpl_levels = 0;
  
+  /* Also tell the importer whether this is a temploid friend attached

+ to a different module (which has implications for merging), so that
+ importers can reconstruct this information on stream-in.  */
+  if (TREE_CODE (inner) == FUNCTION_DECL || TREE_CODE (inner) == TYPE_DECL)
+{
+  tree* temploid_friend_slot = imported_temploid_friends->get (decl);
+  gcc_checking_assert (!temploid_friend_slot || *temploid_friend_slot);
+  tree_node (temploid_friend_slot ? *temploid_friend_slot : NULL_TREE);
+}
+
{
  auto wmk = make_temp_override (dep_hash->writing_merge_key, true);
  if (decl != inner)
@@ -8182,14 +8187,6 @@ trees_out::decl_value (tree decl, depset *dep)
}
  }
  
-  if (is_imported_temploid_friend)

-{
-  /* Write imported temploid friends so that importers can reconstruct
-this information on stream-in.  */
-  tree* slot = imported_temploid_friends->get (decl);
-  tree_node (*slot);
-}
-
bool is_typedef = false;
if (!type && TREE_CODE (inner) == TYPE_DECL)
  {
@@ -8266,7 +8263,6 @@ trees_in::decl_value ()
  {
int tag = 0;
bool is_attached = false;
-  bool is_imported_temploid_friend = false;
bool has_defn = false;
unsigned mk_u = u ();
if (mk_u >= MK_hwm || !merge_kind_name[mk_u])
@@ -8287,10 +8283,7 @@ trees_in::decl_value ()
{
  bits_in bits = stream_bits ();
  if (!(mk & MK_template_mask) && !state->is_header ())
-   {
- is_attached = bits.b ();
- is_imported_temploid_friend = bits.b ();
-   }
+   is_attached = bits.b ();
  
  	  has_defn = bits.b ();

}
@@ -8385,6 +8378,12 @@ trees_in::decl_value ()
tree container = decl_container ();
unsigned tpl_levels = 0;
  
+  /* If this is an imported temploid friend, get the owning decl its

+ attachment is determined by (or NULL_TREE otherwise).  */
+  tree temploid_friend = NULL_TREE;
+  if (TREE_CODE (inner) == FUNCTION_DECL

Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Qing Zhao



> On Jan 22, 2025, at 11:22, Martin Uecker  wrote:
> 
> 
> Hello Michael,
> 
> Am Mittwoch, dem 22.01.2025 um 16:54 +0100 schrieb Michael Matz:
>> On Wed, 22 Jan 2025, Martin Uecker wrote:
>> 
> So you do not need to look further.  But maybe I am missing something 
> else.
 
 Like ...
 
>> Note further that you may have '{ .y[1][3].z }', which is still not a 
>> designation, but an expression under your proposal, whereas
>> '{ .y[1][3].z = 1 }' would remain a designation.  This shows that you 
>> now need arbitrary look-ahead to disambiguate the two.  A Very Bad Idea.
 
 ... this?
>>> 
>>> In .y[1][3].z  after .y you can decide whether y is a member of the
>>> struct being initialized.  If it is, it is a designator and if not
>>> it must be an expression.
>> 
>> If y is not a member it must be an expression, true.  But if it's a member 
>> you don't know, it may be a designation or an expression.
> 
> In an initializer I know all the members.

I am not familiar with the parser, so I am a little confused about the
following:

Suppose we have:

struct foo {
  int z;
  float f;
}

struct bar {
  char *array __attribute__ ((counted_by (.y[1][3].z + 4)));
  struct foo y[5][10];
}

So, in the above, when parsing the above expression inside counted_by, can
the current parser easily be extended to parse it?

thanks.

Qing
> 
> Martin 
> 
> 



[PATCH] ipa-cp: Perform operations in the appropriate types (PR 118097)

2025-01-22 Thread Martin Jambor
Hi,

one of the testcases from PR 118097 and the one from PR 118535 show
that the fix to PR 118138 was incomplete.  We must not only make sure
that (intermediate) results of operations performed by IPA-CP are
fold_converted to the type of the destination formal parameter, but we
must also decouple these types from the ones in which the operations
are performed.

This patch does that, even though we do not store or stream the
operation types, instead we simply limit ourselves to tcc_comparisons
and operations for which the first operand and the result are of the
same type as determined by expr_type_first_operand_type_p.  If we
wanted to go beyond these, we would indeed need to store/stream the
respective operation type.

ipa_value_from_jfunc needs an additional check that res_type is not
NULL because it is not called just from within IPA-CP (where we know
we have a destination lattice slot belonging to a defined parameter)
but also from inlining, ipa-fnsummary and ipa-modref where it is used
to examine a call to a function with variadic arguments and we do not
have types for the unknown parameters.  But we cannot really work with
those or estimate any benefits when it comes to them, so ignoring them
should be OK.

Even after this patch, ipa_get_jf_arith_result has a parameter called
res_type in which it performs operations for aggregate jump functions,
where we do not allow type conversions when constructing the jump
functions and the type is the type of the stored data.  In GCC 16, we
could relax this and allow conversions like for scalars.

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
master?

Thanks,

Honza


gcc/ChangeLog:

2025-01-20  Martin Jambor  

PR ipa/118097
* ipa-cp.cc (ipa_get_jf_arith_result): Adjust comment.
(ipa_get_jf_pass_through_result): Removed.
(ipa_value_from_jfunc): Use directly ipa_get_jf_arith_result, do
not specify operation type but make sure we check and possibly
convert the result.
(get_val_across_arith_op): Remove the last parameter, always pass
NULL_TREE to ipa_get_jf_arith_result in its last argument.
(propagate_vals_across_arith_jfunc): Do not pass res_type to
get_val_across_arith_op.
(propagate_vals_across_pass_through): Add checking assert that
parm_type is not NULL.

gcc/testsuite/ChangeLog:

2025-01-20  Martin Jambor  

PR ipa/118097
* gcc.dg/ipa/pr118097.c: New test.
* gcc.dg/ipa/pr118535.c: Likewise.
* gcc.dg/ipa/ipa-notypes-1.c: Likewise.
---
 gcc/ipa-cp.cc| 46 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-notypes-1.c | 17 +
 gcc/testsuite/gcc.dg/ipa/pr118097.c  | 23 
 gcc/testsuite/gcc.dg/ipa/pr118535.c  | 17 +
 4 files changed, 75 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-notypes-1.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr118097.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr118535.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d89324a0077..68959f2677b 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1467,11 +1467,10 @@ ipacp_value_safe_for_type (tree param_type, tree value)
 return NULL_TREE;
 }
 
-/* Return the result of a (possibly arithmetic) operation on the constant
-   value INPUT.  OPERAND is 2nd operand for binary operation.  RES_TYPE is
-   the type of the parameter to which the result is passed.  Return
-   NULL_TREE if that cannot be determined or be considered an
-   interprocedural invariant.  */
+/* Return the result of a (possibly arithmetic) operation on the constant value
+   INPUT.  OPERAND is 2nd operand for binary operation.  RES_TYPE is the type
+   in which any operation is to be performed.  Return NULL_TREE if that cannot
+   be determined or be considered an interprocedural invariant.  */
 
 static tree
 ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand,
@@ -1513,21 +1512,6 @@ ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand,
   return res;
 }
 
-/* Return the result of a (possibly arithmetic) pass through jump function
-   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
-   to which the result is passed.  Return NULL_TREE if that cannot be
-   determined or be considered an interprocedural invariant.  */
-
-static tree
-ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
-   tree res_type)
-{
-  return ipa_get_jf_arith_result (ipa_get_jf_pass_through_operation (jfunc),
- input,
- ipa_get_jf_pass_through_operand (jfunc),
- res_type);
-}
-
 /* Return the result of an ancestor jump function JFUNC on the constant value
INPUT.  Return NULL_TREE if that cannot be determined.  */
 
@@ -1595,7 +1579,14 @@ ipa_value_from_jfunc (class ipa_node_params *info

Re: [PATCH] builtins: Store unspecified value to *exp for inf/nan [PR114877]

2025-01-22 Thread Jeff Law




On 1/22/25 2:48 AM, Jakub Jelinek wrote:

Hi!

The fold_builtin_frexp folding for NaN/Inf just returned the first argument
while evaluating the second argument's side effects, rather than storing
something to what the second argument points to.

The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so not storing there
anything can result in UB if the value isn't initialized and is read later.

glibc and newlib store there 0, musl apparently doesn't store anything.

The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
   arg = builtin_save_expr (arg);
   return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-22  Jakub Jelinek  

PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.

* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.

I think storing zero is the way to go.  I can't see a compelling reason
for any other value.


OK for the trunk.



Jeff


[patch,avr] ad PR117726: Tweak 32-bit logical shifts of 25...30 for -Oz

2025-01-22 Thread Georg-Johann Lay

As it turns out, logical 32-bit shifts with an offset of 25..30 can
be performed in 7 instructions or less.  This beats the 7 instructions
required for the default code of a shift loop.
Plus, with zero overhead, these cases can be 3-operand.

This is only relevant for -Oz because with -Os, 3op shifts are
split with -msplit-bit-shift (which is not performed with -Oz).

Passes without new regressions.  Ok for trunk?

Johann

--

AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz.

As it turns out, logical 32-bit shifts with an offset of 25..30 can
be performed in 7 instructions or less.  This beats the 7 instructions
required for the default code of a shift loop.
Plus, with zero overhead, these cases can be 3-operand.

This is only relevant for -Oz because with -Os, 3op shifts are
split with -msplit-bit-shift (which is not performed with -Oz).

PR target/117726
gcc/
* config/avr/avr.cc (avr_ld_regno_p): New function.
(ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak.
(lshrsi3_out): Same.
(avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs.
* config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const):
Add "r,r,C4L" alternative.
(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative.
* config/avr/constraints.md (C4R, C4L): New.
gcc/testsuite/
* gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS):
Turn one option variant into -Oz.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index e5a5aa34ec0..8628a438ab5 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -418,6 +418,15 @@ avr_adiw_reg_p (rtx reg)
 }
 
 
+/* Return true iff REGNO is in R16...R31.  */
+
+static bool
+avr_ld_regno_p (int regno)
+{
+  return TEST_HARD_REG_CLASS (LD_REGS, regno);
+}
+
+
 static bool
 ra_in_progress ()
 {
@@ -7397,17 +7406,20 @@ ashlsi3_out (rtx_insn *insn, rtx operands[], int *plen)
 {
   if (CONST_INT_P (operands[2]))
 {
+  int off = INTVAL (operands[2]);
   int reg0 = true_regnum (operands[0]);
   int reg1 = true_regnum (operands[1]);
   bool reg1_unused_after = reg_unused_after (insn, operands[1]);
-
+  bool scratch_p = (GET_CODE (PATTERN (insn)) == PARALLEL
+			&& XVECLEN (PATTERN (insn), 0) == 3
+			&& REG_P (operands[3]));
   if (plen)
 	*plen = 0;
 
-  switch (INTVAL (operands[2]))
+  switch (off)
 	{
 	default:
-	  if (INTVAL (operands[2]) < 32)
+	  if (off < 32)
 	break;
 
 	  return AVR_HAVE_MOVW
@@ -7461,11 +7473,58 @@ ashlsi3_out (rtx_insn *insn, rtx operands[], int *plen)
 			   "mov %D0,%B1"  CR_TAB
 			   "clr %B0"  CR_TAB
 			   "clr %A0", operands, plen, 4);
+	case 30:
+	  if (AVR_HAVE_MUL && scratch_p)
+	    return avr_asm_len ("ldi %3,1<<6"  CR_TAB
+				"mul %3,%A1"  CR_TAB
+				"mov %D0,r0"  CR_TAB
+				"clr __zero_reg__"  CR_TAB
+				"clr %C0"  CR_TAB
+				"clr %B0"  CR_TAB
+				"clr %A0", operands, plen, 7);
+	  // Fallthrough
+
+	case 28:
+	case 29:
+	  {
+	const bool ld_reg0_p = avr_ld_regno_p (reg0 + 3); // %D0
+	const bool ld_reg1_p = avr_ld_regno_p (reg1 + 0); // %A1
+	if (ld_reg0_p
+		|| (ld_reg1_p && reg1_unused_after)
+		|| scratch_p)
+	  {
+		if (ld_reg0_p)
+		  avr_asm_len ("mov %D0,%A1"CR_TAB
+			   "swap %D0"   CR_TAB
+			   "andi %D0,0xf0", operands, plen, 3);
+		else if (ld_reg1_p && reg1_unused_after)
+		  avr_asm_len ("swap %A1"   CR_TAB
+			   "andi %A1,0xf0"  CR_TAB
+			   "mov %D0,%A1", operands, plen, 3);
+		else
+		  avr_asm_len ("mov %D0,%A1"CR_TAB
+			   "swap %D0"   CR_TAB
+			   "ldi %3,0xf0"CR_TAB
+			   "and %D0,%3", operands, plen, 4);
+		for (int i = 28; i < off; ++i)
+		  avr_asm_len ("lsl %D0", operands, plen, 1);
+		return avr_asm_len ("clr %C0"  CR_TAB
+"clr %B0"  CR_TAB
+"clr %A0", operands, plen, 3);
+	  }
+	  }
+	  // Fallthrough
+
 	case 24:
-	  return avr_asm_len ("mov %D0,%A1"  CR_TAB
-			  "clr %C0"  CR_TAB
+	case 25:
+	case 26:
+	case 27:
+	  avr_asm_len ("mov %D0,%A1", operands, plen, 1);
+	  for (int i = 24; i < off; ++i)
+	avr_asm_len ("lsl %D0", operands, plen, 1);
+	  return avr_asm_len ("clr %C0"  CR_TAB
 			  "clr %B0"  CR_TAB
-			  "clr %A0", operands, plen, 4);
+			  "clr %A0", operands, plen, 3);
 	case 31:
 	  return AVR_HAVE_MOVW
 	? avr_asm_len ("bst %A1,0"CR_TAB
@@ -8298,17 +8357,20 @@ lshrsi3_out (rtx_insn *insn, rtx operands[], int *plen)
 {
   if (CONST_INT_P (operands[2]))
 {
+  int off = INTVAL (operands[2]);
   int reg0 = true_regnum (operands[0]);
   int reg1 = true_regnum (operands[1]);
   bool reg1_unused_after = reg_unused_after (insn, operands[1]);
-
+  bool scratch_p = (GET_CODE (PATTERN (insn)) == PARALLEL
+			&& XVECLEN (PATTERN (insn), 0) == 3
+			&& REG_P (operands[3]));
   if (plen)
 	*plen = 0;
 
-  switch (INTVAL (operands[2]))
+  switch (off)

Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker


Hi Qing,

sorry, another idea I noted down some time ago that I would like
to mention.

> > 
> > - use it only in limited contexts where you do not need to know
> >   the type (e.g. this works for goto labels) or for a basic
> >   counted_by attribute that only takes an identifier as we have it now.

A version of this is to allow taking the address of the member, but
give it the type void*.  The user can then convert it to the right type
(and we could warn later if the cast goes to the wrong type).

struct foo {
  int z;
  float f;
}

typedef struct foo counter_t[5][10];

struct bar {
  char *array __attribute__ ((counted_by ( (*(counter_t)&.y)[1][3].z + 4 )));
  struct foo y[5][10];
}

This could also be used together with the function idea:

struct bar {
  char *array __attribute__ (( counted_by ( bar_count(&.y) )));
  struct foo y[5][10];
}


This would be simple to implement and do the job.

Martin



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker
Am Mittwoch, dem 22.01.2025 um 16:37 + schrieb Qing Zhao:
> 
> > On Jan 22, 2025, at 11:22, Martin Uecker  wrote:
> > 
> > 
> > Hello Michael,
> > 
> > Am Mittwoch, dem 22.01.2025 um 16:54 +0100 schrieb Michael Matz:
> > > On Wed, 22 Jan 2025, Martin Uecker wrote:
> > > 
> > > > > > So you do not need to look further.  But maybe I am missing 
> > > > > > something 
> > > > > > else.
> > > > > 
> > > > > Like ...
> > > > > 
> > > > > > > Note further that you may have '{ .y[1][3].z }', which is still 
> > > > > > > not a 
> > > > > > > designation, but an expression under your proposal, whereas
> > > > > > > '{ .y[1][3].z = 1 }' would remain a designation.  This shows that 
> > > > > > > you 
> > > > > > > now need arbitrary look-ahead to disambiguate the two.  A Very 
> > > > > > > Bad Idea.
> > > > > 
> > > > > ... this?
> > > > 
> > > > In .y[1][3].z  after .y you can decide whether y is a member of the
> > > > struct being initialized.  If it is, it is a designator and if not
> > > > it must be an expression.
> > > 
> > > If y is not a member it must be an expression, true.  But if it's a 
> > > member 
> > > you don't know, it may be a designation or an expression.
> > 
> > In an initializer I know all the members.
> 
> I am not familiar with the parser, so, I am a little confused about the 
> following:
> 
> Suppose we have:
> 
> struct foo {
>   int z;
>   float f;
> }
> 
> struct bar {
>   char *array __attribute__ ((counted_by (.y[1][3].z + 4)));
>   struct foo y[5][10];
> }
> 
> So, in the above, when parsing the above expression inside counted_by, can
> the current parser easily be extended to parse it?

No, I don't think this can be done easily. The issue is that you do
not know the declaration for y because it hasn't been parsed yet. 

If you forward reference some struct member, you have several
possibilities:

- use it only in limited contexts where you do not need to know
  the type (e.g. this works for goto labels) or for a basic
  counted_by attribute that only takes an identifier as we have it now.

- simply assume it has a certain type (size_t as is proposed in the 
  WG14 paper Joseph mentioned) and fail later if it does not.


Both options would rule out the construct above (but there could be
workarounds).  Other alternatives are:

- you have some kind of forward declaration (as we have for
  parameters as a GNU extension). In the context of C, this is the
  cleanest solution but either requires forward declaring the
  full struct (which can be done in C23) or new syntax for only
  forward declaring the member.

- or you use some delayed parsing where you store away the tokens
  and parse them later when all structure members are done. I think
  this is a highly problematic approach for a variety of reasons.


Martin


> 
> thanks.
> 
> Qing
> > 
> > Martin 
> > 
> > 
> 



[PATCH] [ifcombine] out-of-bounds bitfield refs can trap [PR118514]

2025-01-22 Thread Alexandre Oliva
On Jan 21, 2025, Richard Biener  wrote:

> So - your fix looks almost good, I'd adjust it to

>> +case BIT_FIELD_REF:
>> +  if (DECL_P (TREE_OPERAND (expr, 0))
>> + && !bit_field_ref_in_bounds_p (expr))
>> +   return true;
>> +  /* Fall through.  */

> OK if that works.

It does.  But my intuition isn't happy with this.  I don't get why only
DECLs need to be checked here, and why we should check the type size
rather than the decl size if we're only checking decls.

I have another patch coming up that doesn't raise concerns for me, so
I'll hold off from installing the above, in case you also prefer the
other one.


[ifcombine] out-of-bounds bitfield refs can trap [PR118514]

Check that BIT_FIELD_REFs of DECLs are in range before deciding they
don't trap.

Check that a replacement bitfield load is as trapping as the replaced
load.


for  gcc/ChangeLog

PR tree-optimization/118514
* tree-eh.cc (bit_field_ref_in_bounds_p): New.
(tree_could_trap_p) : Call it.
* gimple-fold.cc (make_bit_field_load): Check trapping status
of replacement load against original load.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118514
* gcc.dg/field-merge-23.c: New.
---
 gcc/gimple-fold.cc|5 +
 gcc/testsuite/gcc.dg/field-merge-23.c |   19 +++
 gcc/tree-eh.cc|   29 -
 3 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-23.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 3c971a29ef045..01c4d076af26c 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7859,6 +7859,11 @@ make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type,
   gimple *new_stmt = gsi_stmt (i);
   if (gimple_has_mem_ops (new_stmt))
gimple_set_vuse (new_stmt, reaching_vuse);
+  gcc_checking_assert (! (gimple_assign_load_p (point)
+			  && gimple_assign_load_p (new_stmt))
+		       || (tree_could_trap_p (gimple_assign_rhs1 (point))
+			   == tree_could_trap_p (gimple_assign_rhs1
+						 (new_stmt))));
 }
 
   gimple_stmt_iterator gsi = gsi_for_stmt (point);
diff --git a/gcc/testsuite/gcc.dg/field-merge-23.c b/gcc/testsuite/gcc.dg/field-merge-23.c
new file mode 100644
index 0..d60f76206ebea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-23.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+/* PR tree-optimization/118514 */
+
+/* Check that we don't pull optimized references that could trap out of
+   loops.  */
+
+int a, c = 1;
+unsigned char b[1], d;
+int main() {
+  while (a || !c) {
+signed char e = b[10];
+d = e < 0 || b[10] > 1;
+if (d)
+  __builtin_abort ();
+  }
+  return 0;
+}
diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
index 1033b124e4df3..27a4b2b5b1665 100644
--- a/gcc/tree-eh.cc
+++ b/gcc/tree-eh.cc
@@ -2646,6 +2646,29 @@ range_in_array_bounds_p (tree ref)
   return true;
 }
 
+/* Return true iff EXPR, a BIT_FIELD_REF, accesses a bit range that is known to
+   be in bounds for the referred operand type.  */
+
+static bool
+bit_field_ref_in_bounds_p (tree expr)
+{
+  tree size_tree;
+  poly_uint64 size_max, min, wid, max;
+
+  size_tree = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (expr, 0)));
+  if (!size_tree || !poly_int_tree_p (size_tree, &size_max))
+return false;
+
+  min = bit_field_offset (expr);
+  wid = bit_field_size (expr);
+  max = min + wid;
+  if (maybe_lt (max, min)
+  || maybe_lt (size_max, max))
+return false;
+
+  return true;
+}
+
 /* Return true if EXPR can trap, as in dereferencing an invalid pointer
location or floating point arithmetic.  C.f. the rtl version, may_trap_p.
This routine expects only GIMPLE lhs or rhs input.  */
@@ -2688,10 +2711,14 @@ tree_could_trap_p (tree expr)
  restart:
   switch (code)
 {
+case BIT_FIELD_REF:
+  if (DECL_P (TREE_OPERAND (expr, 0)) && !bit_field_ref_in_bounds_p (expr))
+   return true;
+  /* Fall through.  */
+
 case COMPONENT_REF:
 case REALPART_EXPR:
 case IMAGPART_EXPR:
-case BIT_FIELD_REF:
 case VIEW_CONVERT_EXPR:
 case WITH_SIZE_EXPR:
   expr = TREE_OPERAND (expr, 0);


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] testsuite: Only run test if alarm is available

2025-01-22 Thread Mike Stump
On Jan 19, 2025, at 12:47 PM, Torbjorn SVENSSON  
wrote:
> 
> On 2025-01-19 21:20, Andrew Pinski wrote:
>> On Sun, Jan 19, 2025 at 12:17 PM Torbjörn SVENSSON
>>  wrote:
>>> 
>>> Ok for trunk?
>>> 
>>> --
>>> 
>>> Most baremetal toolchains will not have an implementation for alarm and
>>> sigaction as they are target specific.
>>> For arm-none-eabi with newlib, function signatures are exposed, but
>>> there is no implementation and thus the test cases cause an undefined
>>> symbol link error.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> * gcc.dg/pr78185.c: Remove dg-do and replace
>>> with dg-require-effective-target of signal and alarm.
>>> * gcc.dg/pr116906-1.c: Likewise.
>>> * gcc.dg/pr116906-2.c: Likewise.
>>> * gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
>>> * gcc.dg/vect/pr101145inf_1.c: Likewise.
>>> * lib/target-supports.exp (check_effective_target_alarm): New.
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * doc/sourcebuild.texi (Effective-Target Keywords): Document
>>> 'alarm'.
>>> 
>>> Signed-off-by: Torbjörn SVENSSON 
>>> ---
>>>  gcc/doc/sourcebuild.texi  |  3 +++
>>>  gcc/testsuite/gcc.dg/pr116906-1.c |  3 ++-
>>>  gcc/testsuite/gcc.dg/pr116906-2.c |  3 ++-
>>>  gcc/testsuite/gcc.dg/pr78185.c|  3 ++-
>>>  gcc/testsuite/gcc.dg/vect/pr101145inf.c   |  1 +
>>>  gcc/testsuite/gcc.dg/vect/pr101145inf_1.c |  1 +
>>>  gcc/testsuite/lib/target-supports.exp | 23 +++
>>>  7 files changed, 34 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>>> index b5c1b23e527..98ede70f23c 100644
>>> --- a/gcc/doc/sourcebuild.texi
>>> +++ b/gcc/doc/sourcebuild.texi
>>> @@ -2808,6 +2808,9 @@ both scalar and vector modes.
>>>  @subsubsection Environment attributes
>>> 
>>>  @table @code
>>> +@item alarm
>>> +Target supports @code{alarm}.
>>> +
>>>  @item c
>>>  The language for the compiler under test is C.
>>> 
>>> diff --git a/gcc/testsuite/gcc.dg/pr116906-1.c b/gcc/testsuite/gcc.dg/pr116906-1.c
>>> index 27b1fdae02b..7187507a60d 100644
>>> --- a/gcc/testsuite/gcc.dg/pr116906-1.c
>>> +++ b/gcc/testsuite/gcc.dg/pr116906-1.c
>>> @@ -1,4 +1,5 @@
>>> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
>>> +/* { dg-require-effective-target alarm } */
>>> +/* { dg-require-effective-target signal } */
>>>  /* { dg-options "-O2" } */
>>> 
>>>  #include 
>>> diff --git a/gcc/testsuite/gcc.dg/pr116906-2.c 
>>> b/gcc/testsuite/gcc.dg/pr116906-2.c
>>> index 3478771678c..41a352bf837 100644
>>> --- a/gcc/testsuite/gcc.dg/pr116906-2.c
>>> +++ b/gcc/testsuite/gcc.dg/pr116906-2.c
>>> @@ -1,4 +1,5 @@
>>> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
>>> +/* { dg-require-effective-target alarm } */
>>> +/* { dg-require-effective-target signal } */
>>>  /* { dg-options "-O2 -fno-tree-ch" } */
>>> 
>>>  #include 
>>> diff --git a/gcc/testsuite/gcc.dg/pr78185.c b/gcc/testsuite/gcc.dg/pr78185.c
>>> index d7781b2080f..ada8b1b9f90 100644
>>> --- a/gcc/testsuite/gcc.dg/pr78185.c
>>> +++ b/gcc/testsuite/gcc.dg/pr78185.c
>>> @@ -1,4 +1,5 @@
>>> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
>>> +/* { dg-require-effective-target alarm } */
>>> +/* { dg-require-effective-target signal } */
>>>  /* { dg-options "-O" } */
>>> 
>>>  #include 
>>> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf.c 
>>> b/gcc/testsuite/gcc.dg/vect/pr101145inf.c
>>> index aa598875aa5..70aea94b6e0 100644
>>> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf.c
>>> +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf.c
>>> @@ -1,3 +1,4 @@
>>> +/* { dg-require-effective-target alarm } */
>>>  /* { dg-require-effective-target signal } */
>>>  /* { dg-additional-options "-O3" } */
>>>  #include 
>>> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c 
>>> b/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c
>>> index 0465788c3cc..fe008284e1d 100644
>>> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c
>>> +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c
>>> @@ -1,3 +1,4 @@
>>> +/* { dg-require-effective-target alarm } */
>>>  /* { dg-require-effective-target signal } */
>>>  /* { dg-additional-options "-O3" } */
>>>  #include 
>>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>>> b/gcc/testsuite/lib/target-supports.exp
>>> index 939ef3a4119..93795a7e27f 100644
>>> --- a/gcc/testsuite/lib/target-supports.exp
>>> +++ b/gcc/testsuite/lib/target-supports.exp
>>> @@ -14255,3 +14255,26 @@ proc add_options_for_nvptx_alloca_ptx { flags } {
>>> 
>>>  return $flags
>>>  }
>>> +
>>> +# Return true if alarm is supported on the target.
>>> +
>>> +proc check_effective_target_alarm { } {
>> Maybe a small optimization: return false here if signal is not supported.
>> That is:
>>   if ![check_effective_target_signal] {
>> return 0
>>   }
> 
> Sure, I'll add that.
> 
> Is it okay with this change?
> Or should I send a v2 with this?

Ok.



Re: [PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]

2025-01-22 Thread Jason Merrill

On 9/10/24 2:29 PM, Jakub Jelinek wrote:

Hi!

The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html
patch adds CWG 2867 support for namespace locals.

Those vars are just pushed into {static,tls}_aggregates chain, then
pruned from those lists, separated by priority and finally emitted into
the corresponding dynamic initialization functions.
The patch adds two flags used on the TREE_LIST nodes in those lists,
one marks the structured binding base variable and/or associated ref
extended temps, another marks the vars initialized using get methods.
The flags are preserved across the pruning, for splitting into by priority
all associated decls of a structured binding using tuple* are forced
into the same priority as the first one, and finally when actually emitting
code, CLEANUP_POINT_EXPRs are disabled in the base initializer(s) and
code from the bases and non-bases together is wrapped into a single
CLEANUP_POINT_EXPR.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, I haven't touched the module handling; from what I can see,
prune_vars_needing_no_initialization is destructive to the
{static,tls}_aggregates lists (keeps the list NULL at the end or if there
are errors or it contains some DECL_EXTERNAL decls, keeps in there just
those, not the actual vars that need dynamic initialization) and
the module writing is done only afterwards, so I think it could work
reasonably only if header_module_p ().  Can namespace scope structured
bindings appear in header_module_p () or !header_module_p () modules?
What would a testcase using them look like?  Especially when structured
bindings can't be extern nor templates nor inline there can be just one
definition, so the module would need to be included in a single file, no?
In any case, the patch shouldn't make the modules case any worse, it
just adds TREE_LIST flags which will not be streamed for modules and so
if one can use structured bindings in modules, possibly CWG 2867 would be
not fixed for those but nothing worse than that.

2024-09-10  Jakub Jelinek  

PR c++/115769
gcc/cp/
* cp-tree.h (STATIC_INIT_DECOMP_BASE_P): Define.
(STATIC_INIT_DECOMP_NONBASE_P): Define.
* decl.cc (cp_finish_decl): Mark nodes in {static,tls}_aggregates
with
* decl2.cc (decomp_handle_one_var, decomp_finalize_var_list): New
functions.
(emit_partial_init_fini_fn): Use them.
(prune_vars_needing_no_initialization): Clear
STATIC_INIT_DECOMP_*BASE_P flags if needed.
(partition_vars_for_init_fini): Use same priority for
consecutive STATIC_INIT_DECOMP_*BASE_P vars and propagate
those flags to new TREE_LISTs when possible.  Formatting fix.
(handle_tls_init): Use decomp_handle_one_var and
decomp_finalize_var_list functions.
gcc/testsuite/
* g++.dg/DRs/dr2867-5.C: New test.
* g++.dg/DRs/dr2867-6.C: New test.
* g++.dg/DRs/dr2867-7.C: New test.
* g++.dg/DRs/dr2867-8.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-09-07 09:31:20.601484156 +0200
+++ gcc/cp/cp-tree.h2024-09-09 15:53:44.924112247 +0200
@@ -470,6 +470,7 @@ extern GTY(()) tree cp_global_trees[CPTI
BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P (in BASELINK)
BIND_EXPR_VEC_DTOR (in BIND_EXPR)
ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (in ATOMIC_CONSTR)
+  STATIC_INIT_DECOMP_BASE_P (in the TREE_LIST for {static,tls}_aggregates)
 2: IDENTIFIER_KIND_BIT_2 (in IDENTIFIER_NODE)
ICS_THIS_FLAG (in _CONV)
DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (in VAR_DECL)
@@ -489,6 +490,8 @@ extern GTY(()) tree cp_global_trees[CPTI
IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR)
PACK_EXPANSION_AUTO_P (in *_PACK_EXPANSION)
contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
+  STATIC_INIT_DECOMP_NONBASE_P (in the TREE_LIST
+   for {static,tls}_aggregates)
 3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR)
ICS_BAD_FLAG (in _CONV)
FN_TRY_BLOCK_P (in TRY_BLOCK)
@@ -5947,6 +5950,21 @@ extern bool defer_mangling_aliases;
  
  extern bool flag_noexcept_type;
  
+/* True if this TREE_LIST in {static,tls}_aggregates is for a dynamic
+   initialization of namespace scope structured binding base or related
+   extended ref init temps.  Temporaries from the initialization of
+   STATIC_INIT_DECOMP_BASE_P dynamic initializers should be destroyed only
+   after the last STATIC_INIT_DECOMP_NONBASE_P dynamic initializer following
+   it.  */
+#define STATIC_INIT_DECOMP_BASE_P(NODE) \
+  TREE_LANG_FLAG_1 (TREE_LIST_CHECK (NODE))
+
+/* True if this TREE_LIST in {static,tls}_aggregates is for a dynamic
+   initialization of namespace scope structured binding non-base
+   variable using get.  */
+#define STATIC_INIT_DECOMP_NONBASE_P(NODE) \
+  TREE_LANG_FLAG_2 (TREE_LIST_CHECK (NODE))
+
  /* A list of namespace-sco

Re: [PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]

2025-01-22 Thread Jason Merrill

On 1/3/25 6:39 AM, Jakub Jelinek wrote:

On Thu, Dec 19, 2024 at 11:56:10AM -0500, Jason Merrill wrote:

Looks right.


So, I've tried to construct a testcase, but turns out modules.cc just
doesn't handle structured bindings at namespace scope at all, so it is
premature to try to get the ordering right in it.

The first patch contains the attempt to turn dr2867-5.C testcase into
a module header testcase with a fix for the first ICE I've run into
(but it ICEs immediately elsewhere, the assumption that all namespace
scope decls have names is all around).


How about giving it a name?  That would seem useful for debugging as well.


The second patch is much simpler testcase which doesn't care about ordering
of anything and ICEs without/with the other patches the same.

Should be all !DECL_NAME (decl) MK_unique?  Or just
DECL_DECOMPOSITION_P (decl) && !DECL_NAME (decl)?
Structured bindings can't be redeclared, so I guess they are unique.

Jakub




Re: [PATCH v3 3/6] c++: Fix ABI for lambdas declared in alias templates [PR116568]

2025-01-22 Thread Jason Merrill

On 1/16/25 7:24 PM, Nathaniel Shead wrote:

On Thu, Jan 16, 2025 at 07:09:33PM -0500, Jason Merrill wrote:

On 1/6/25 7:22 AM, Nathaniel Shead wrote:

I'm not 100% sure I've handled this properly, any feedback welcome.
In particular, maybe should I use `DECL_IMPLICIT_TYPEDEF_P` in the
mangling logic instead of `!TYPE_DECL_ALIAS_P`?  They both seem to work
in this case but not sure which would be clearer.

I also looked into trying do a limited form of 'start_decl' before
parsing the type but there were too many circular dependencies for me to
work through, so I think any such changes would have to wait till GCC16
(if they're even possible at all).

-- >8 --

This adds mangling support for lambdas with a mangling context of an
alias template, and gives that context when instantiating such a lambda.


I think this is wrong, an alias is not an entity so it is not a definable
item.

The ABI change proposal also doesn't mention aliases.


Ah right, I see; I'd treated https://eel.is/c++draft/basic.def.odr#1.5
as being any template, but I see now it's "any templated entity" which
is different (since as you say an alias isn't an entity).

In that case, how do you think we should handle class-scope alias
templates of lambdas?  Such a class is surely a definable item, and so
e.g.

   struct S {
 template 
 using X = decltype([]{ return I; });
   };
   using L1 = S::X<1>;
   using L2 = S::X<2>;

should this work and declare L1 to be the same type across TUs?


Hmm, I suppose it should.  So then using the alias template name in the 
mangling is not because it's a definable item, but just as a 
convenient label to indicate where it appears in the class and what the 
template arguments apply to.


But even with that understanding, many of the changes in this patch to 
make aliases more special seem wrong, we shouldn't need those just to 
push/pop lambda scope?


Jason



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker
Am Mittwoch, dem 22.01.2025 um 17:30 +0100 schrieb Michael Matz:
> Hello,
> 
> On Wed, 22 Jan 2025, Martin Uecker wrote:
> 
> > > > In .y[1][3].z  after .y you can decide whether y is a member of the
> > > > struct being initialized.  If it is, it is a designator and if not
> > > > it must be an expression.
> > > 
> > > If y is not a member it must be an expression, true.  But if it's a 
> > > member 
> > > you don't know, it may be a designation or an expression.
> > 
> > In an initializer I know all the members.
> 
> My sentence was ambiguous :-)  Trying again: When it's a member, and you 
> know it's a member, then you still don't know if it's going to be a 
> designation or an expression.  It can be both.

I guess this depends on what you mean by "it can be".  The rule would simply
be that it is not an expression. The rationale is the following:

If it is inside the initializer of a structure and references a member of the
same structure, then it can not simultaneously be inside the argument to a
counted_by attribute used in the declaration of this structure (which at this
time has already been parsed completely).  So there is no reason to allow it
to be interpreted as an expression, and the rule I proposed would therefore
simply state that it then is not an expression.

struct { 
  int n;
  int *buf [[counted_by(.n)]]; // this n is in a counted_by
} x = { .n }; // this n can not be in counted_by for the same struct


If we allowed this to be interpreted as an expression, then you could use
it to reference a member during initialization, e.g.

struct { int y; int x; } a = { .y = 1, .x = .y };

but this would be another extension unrelated to counted_by, which I did 
not intend to suggest.


There are other possibilities for disambiguation; we could also simply state
that in initializers, at the syntactic position where a designator is allowed,
it is always a designator and not an expression, and if it then does not
reference a member of the structure being initialized, it is an error.
Maybe this is even preferable.


I would like to mention that what clang currently has in a prototype uses a
mechanism called "delayed parsing", which is essentially infinite lookahead
(in addition to being confusing and incoherent with C language rules).  So
IMHO we need something better.

My proposal for the moment would be to only allow very restricted syntactic
forms, and not generic expressions, side stepping all these issues.

Martin


> 
> 
> Ciao,
> Michael.



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-22 Thread Martin Uecker
Am Mittwoch, dem 22.01.2025 um 18:11 +0100 schrieb Martin Uecker:
> Am Mittwoch, dem 22.01.2025 um 16:37 + schrieb Qing Zhao:
> > 
> > > On Jan 22, 2025, at 11:22, Martin Uecker  wrote:
> > > 
> > > 
> > > Hello Michael,
> > > 
> > > Am Mittwoch, dem 22.01.2025 um 16:54 +0100 schrieb Michael Matz:
> > > > On Wed, 22 Jan 2025, Martin Uecker wrote:
> > > > 
> > > > > > > So you do not need to look further.  But maybe I am missing 
> > > > > > > something 
> > > > > > > else.
> > > > > > 
> > > > > > Like ...
> > > > > > 
> > > > > > > > Note further that you may have '{ .y[1][3].z }', which is still 
> > > > > > > > not a 
> > > > > > > > designation, but an expression under your proposal, whereas
> > > > > > > > '{ .y[1][3].z = 1 }' would remain a designation.  This shows 
> > > > > > > > that you 
> > > > > > > > now need arbitrary look-ahead to disambiguate the two.  A Very 
> > > > > > > > Bad Idea.
> > > > > > 
> > > > > > ... this?
> > > > > 
> > > > > In .y[1][3].z  after .y you can decide whether y is a member of the
> > > > > struct being initialized.  If it is, it is a designator and if not
> > > > > it must be an expression.
> > > > 
> > > > If y is not a member it must be an expression, true.  But if it's a 
> > > > member 
> > > > you don't know, it may be a designation or an expression.
> > > 
> > > In an initializer I know all the members.
> > 
> > I am not familiar with the parser, so I am a little confused about the 
> > following:
> > 
> > Suppose we have:
> > 
> > struct foo {
> >   int z;
> >   float f;
> > }
> > 
> > struct bar {
> >   char *array __attribute__ ((counted_by (.y[1][3].z + 4)));
> >   struct foo y[5][10];
> > }
> > 
> > So, in the above, when parsing the above expression inside counted_by, can 
> > the
> > current parser be easily to be extended to parse it?
> 
> No, I don't think this can be done easily. The issue is that you do
> not know the declaration for y because it hasn't been parsed yet. 
> 
> If you forward reference some struct member, you have several
> possibilities:
> 
> - use it only in limited contexts where you do not need to know
>   the type (e.g. this works for goto labels) or for a basic
>   counted_by attribute that only takes an identifier as we have it now.
> 
> - simply assume it has a certain type (size_t as is proposed in the 
>   WG14 paper Joseph mentioned) and fail later if it does not.
> 
> 
> Both options would rule out the construct above (but there could be
> workarounds). 

One of the workarounds could be to instead call a function (which could
be inlined later) that takes a pointer to the struct.  Then it does not
need to know anything about any member, e.g.:


struct foo {
  int z;
  float f;
}

size_t bar_count(struct bar *);

struct bar {
  char *array __attribute__ ((counted_by (bar_count (__self__))));
  struct foo y[5][10];
}

size_t bar_count(struct bar *p)
{
  return p->y[1][3].z + 4;
}



>  Other alternatives are:
> 
> - you have some kind of forward declaration (as we have for
>   parameters as a GNU extension). In the context of C, this is the
>   cleanest solution but either requires forward declaring the
>   full struct (which can be done in C23) or new syntax for only
>   forward declaring the member.

A possible C23 workaround could be:

struct foo {
  int z;
  float f;
}

struct bar {
  char *array __attribute__ ((counted_by (*))); 
  // star indicates missing size expression
  struct foo y[5][10];
}

struct bar { // redeclare with known size
  char *array __attribute__ ((counted_by (.y[1][3].z + 4)));
  struct foo y[5][10];
}


Martin


> 
> - or you use some delayed parsing where you store away the tokens
>   and parse it later when all structure members are done. I think
>   this is a highly problematic approach for a variety of reasons.
> 
> 
> Martin
> 
> 
> > 
> > thanks.
> > 
> > Qing
> > > 
> > > Martin 
> > > 
> > > 
> > 
> 



Re: [PATCH] RISC-V: Disable two-source permutes for now [PR117173].

2025-01-22 Thread Jeff Law




On 1/22/25 12:29 AM, Robin Dapp wrote:

Hi,

after testing on the BPI (4.2% improvement for x264 input 1, 4.4% for input 2)
and the discussion in PR117173 I figured it's best to disable the two-source
permutes by default for now.  We quickly talked about this on the patchwork
call last week.  Conclusion was to just post the patch and discuss here.

The patch adds a parameter "riscv-two-source-permutes" which restores
the old behavior.  It does not add a uarch knob to override the default.

It's still possible to get two-source permutes in a register via
the expander but the obvious constant cases are covered.

Regtested on rv64gcv_zvl512b.

Regards
  Robin

PR target/117173

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_generic_patterns): Only
support single-source permutes by default.
* config/riscv/riscv.opt: New param "riscv-two-source-permutes".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Run with two-source permutes.
* lib/target-supports.exp: Ditto.
* gcc.dg/fold-perm-2.c: Ditto.
* gcc.dg/pr54346.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: Ditto.
So this isn't a regression, but I can also understand the desire to fix 
this fairly significant performance issue.


IMHO this is extremely safe -- if rejecting two operand permutes here 
were to result in a failure I would argue that we've got a clear bug in 
our other permutation support that could be triggered without this change.


So I'm OK with it as well.  I think giving folks through the end of the 
week to chime in with potential objections would be wise though.


jeff



[PATCH] Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613]

2025-01-22 Thread Harald Anlauf

Dear all,

while looking at details of a related but slightly different PR, I found
that we did evaluate the arguments to MINLOC/MAXLOC too often in the
inlined version.

The attached patch creates temporaries for array elements where needed,
and ensures that each array element is only touched once.  This required
a minor adjustment for the rank-1 algorithm, which is documented.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From d2a52256b3e4817e16a5d222c2fecd7bc66e5613 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 22 Jan 2025 22:44:39 +0100
Subject: [PATCH] Fortran: do not evaluate arguments of MAXVAL/MINVAL too often
 [PR118613]

	PR fortran/118613

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust algorithm
	for inlined version of MINLOC and MAXLOC so that arguments are only
	evaluated once, and create temporaries where necessary.  Document
	change of algorithm.

gcc/testsuite/ChangeLog:

	* gfortran.dg/maxval_arg_eval_count.f90: New test.
---
 gcc/fortran/trans-intrinsic.cc| 37 +++-
 .../gfortran.dg/maxval_arg_eval_count.f90 | 88 +++
 2 files changed, 121 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index afbec5b2752..51237d0d3be 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -6409,8 +6409,16 @@ gfc_conv_intrinsic_findloc (gfc_se *se, gfc_expr *expr)
   nonempty = false;
   S = from;
   while (S <= to) {
-	if (mask[S]) { nonempty = true; if (a[S] <= limit) goto lab; }
-	S++;
+	if (mask[S]) {
+	  nonempty = true;
+	  if (a[S] <= limit) {
+	limit = a[S];
+	S++;
+	goto lab;
+	  }
+	else
+	  S++;
+	}
   }
   limit = nonempty ? NaN : huge (limit);
   lab:
@@ -6419,7 +6427,15 @@ gfc_conv_intrinsic_findloc (gfc_se *se, gfc_expr *expr)
   at runtime whether array is nonempty or not, rank 1:
   limit = Infinity;
   S = from;
-  while (S <= to) { if (a[S] <= limit) goto lab; S++; }
+  while (S <= to) {
+	if (a[S] <= limit) {
+	  limit = a[S];
+	  S++;
+	  goto lab;
+	  }
+	else
+	  S++;
+  }
   limit = (from <= to) ? NaN : huge (limit);
   lab:
   while (S <= to) { limit = min (a[S], limit); S++; }
@@ -6710,6 +6726,7 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_copy_loopinfo_to_se (&arrayse, &loop);
   arrayse.ss = arrayss;
   gfc_conv_expr_val (&arrayse, arrayexpr);
+  arrayse.expr = gfc_evaluate_now (arrayse.expr, &arrayse.pre);
   gfc_add_block_to_block (&block, &arrayse.pre);
 
   gfc_init_block (&block2);
@@ -6722,7 +6739,18 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tmp = fold_build2_loc (input_location, op == GT_EXPR ? GE_EXPR : LE_EXPR,
 			 logical_type_node, arrayse.expr, limit);
   if (lab)
-	ifbody = build1_v (GOTO_EXPR, lab);
+	{
+	  stmtblock_t ifblock;
+	  tree inc_loop;
+	  inc_loop = fold_build2_loc (input_location, PLUS_EXPR,
+  TREE_TYPE (loop.loopvar[0]),
+  loop.loopvar[0], gfc_index_one_node);
+	  gfc_init_block (&ifblock);
+	  gfc_add_modify (&ifblock, limit, arrayse.expr);
+	  gfc_add_modify (&ifblock, loop.loopvar[0], inc_loop);
+	  gfc_add_expr_to_block (&ifblock, build1_v (GOTO_EXPR, lab));
+	  ifbody = gfc_finish_block (&ifblock);
+	}
   else
 	{
 	  stmtblock_t ifblock;
@@ -6816,6 +6844,7 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_copy_loopinfo_to_se (&arrayse, &loop);
   arrayse.ss = arrayss;
   gfc_conv_expr_val (&arrayse, arrayexpr);
+  arrayse.expr = gfc_evaluate_now (arrayse.expr, &arrayse.pre);
   gfc_add_block_to_block (&block, &arrayse.pre);
 
   /* MIN_EXPR/MAX_EXPR has unspecified behavior with NaNs or
diff --git a/gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90 b/gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90
new file mode 100644
index 000..1c1a04819a0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/maxval_arg_eval_count.f90
@@ -0,0 +1,88 @@
+! { dg-do run }
+!
+! PR fortran/118613 - check argument evaluation count of MAXVAL
+
+program p
+  implicit none
+  integer, parameter :: k = 2
+  integer :: n
+  integer :: i1(k*k), i2(k,k), mm
+  real:: a1(k*k), a2(k,k), mx
+  complex :: c1(k*k), c2(k,k)
+  logical :: m1(k*k), m2(k,k)
+
+  ! prepare mask for masked variants
+  m1 = .true.
+  m2 = .true.
+  i1 = 0
+  i2 = 0
+  a1 = 0.
+  a2 = 0.
+  c1 = 0.
+  c2 = 0.
+
+  ! integer
+  n = 0
+  mm = maxval (h(i1))
+  if (n /= k*k .or. mm /= 0) stop 1
+  n = 0
+  mm = maxval (h(i2))
+  if (n /= k*k .or. mm /= 0) stop 2
+  n = 0
+  mm = maxval (h(i1),m1)
+  if (n /= k*k .or. mm /= 0) stop 3
+  n = 0
+  mm = maxval (h(i2),m2)
+  if (n /= k*k .or. mm /= 0) stop 4
+
+  ! real
+  n = 0
+  mx = maxval (f(a1))
+  if (n /= k*k .or. mx /= 0) st

Re: [PATCH] i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

2025-01-22 Thread Hongtao Liu
On Wed, Jan 22, 2025 at 11:13 AM Haochen Jiang  wrote:
>
> Hi all,
>
> These two testcases were missed in the previous addition of
> -march=x86-64-v3 to silence warnings for -march=native tests.
>
> Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/vnniint16-auto-vectorize-4.c: Append
> -march=x86-64-v3.
> * gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c | 2 +-
>  gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c 
> b/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c
> index beaab189406..1bd1400d1a4 100644
> --- a/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c
> +++ b/gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -mavx10.2-512" } */
> +/* { dg-options "-O2 -march=x86-64-v3 -mavx10.2-512" } */
>  /* { dg-require-effective-target avx10_2_512 } */
>
>  #define N 512
> diff --git a/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c 
> b/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c
> index 70cd80c225b..dafab344d1d 100644
> --- a/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c
> +++ b/gcc/testsuite/gcc.target/i386/vnniint8-auto-vectorize-4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -mavx10.2-512" } */
> +/* { dg-options "-O2 -march=x86-64-v3 -mavx10.2-512" } */
>  /* { dg-require-effective-target avx10_2_512 } */
>
>  #define N 512
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH v3 2/6] c++: Fix mangling of otherwise unattached class-scope lambdas [PR118245]

2025-01-22 Thread Jason Merrill

On 1/6/25 7:21 AM, Nathaniel Shead wrote:

Something like this should probably be backported to GCC 14 too, since
my change in r14-9232-g3685fae23bb008 inadvertently caused ICEs that
this fixes.  But without the previous patch this patch will cause ABI
changes, and I'm not sure how easily it would be to divorce those
changes from the fix here.

I suppose the true issue is probably that r14-9232 inadvertently changed
ABI for lambdas in base classes, and we should just revert the parser.cc
changes for 14.3?  (And accept that it'll regress those modules tests.)


I suppose so.


-- >8 --

This is a step closer to implementing the suggested changes for
https://github.com/itanium-cxx-abi/cxx-abi/pull/85.  Most lambdas
defined within a class should have an extra scope of that class so that
uses across different TUs are properly merged by the linker.  This also
needs to happen during template instantiation.

While I was working on this I found some other cases where the mangling
of lambdas was incorrect and causing issues, notably the testcase
lambda-ctx3.C which currently emits the same mangling for the base class
and member lambdas, causing mysterious assembler errors since r14-9232.
This is also the root cause of PR c++/118245.

One notable case not handled either here or in the ABI is what is
supposed to happen with lambdas declared in alias templates; see
lambda-ctx4.C.  I believe that by the C++ standard, such lambdas should
also dedup across TUs, but this isn't currently implemented (for
class-scope or not).  I wasn't able to work out how to fix the mangling
logic for this case easily so I've just excluded alias templates from
the class-scope mangling rules in template instantiation.


I'm skeptical of the instantiate_template change for other member 
templates as well, since the set of instantiations of such member 
templates might vary between translation units, so the lambda should be 
in the scope of that member template instantiation rather than just the 
class.


And I don't see any other member templates in the testcases?  Like, for

struct S {
  template 
  decltype([]{ return I; }) f() { return {}; };
};

void f(decltype(S().f<24>())*) {}
void f(decltype(S().f<42>())*) {}

how are these lambdas mangled?  Before this patch, the two lambdas have 
arbitrary discriminators in S but they are treated as TU-local so it 
doesn't matter.  After this patch, they get the same number and are no 
longer TU-local, and assembly fails.


As with the member alias template example, I think it would be coherent 
to say the lambdas are TU-local because they're between a template 
parameter scope and a class/function/initializer, as a clarification to 
https://eel.is/c++draft/basic#link-15.2 .  Or we need to mangle them in 
the scope of the member template instantiation even though they aren't 
within the body of the function.


Interestingly, clang mangles them in the scope of S::f, without any 
template arguments.  That seems to have the same problem as just using 
S.  And they aren't treated as TU-local, so the arbitrary discriminators 
are an ABI problem.


Jason



Re: [PATCH v3 4/6] c++: Update mangling of lambdas in expressions

2025-01-22 Thread Jason Merrill

On 1/6/25 7:23 AM, Nathaniel Shead wrote:

https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that
mangling a lambda expression should use 'L' rather than "tl".  This only
affects C++20 (and later) so no ABI flag is given.


OK.


gcc/cp/ChangeLog:

* mangle.cc (write_expression): Update mangling for lambdas.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling.
* g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/mangle.cc | 2 +-
  gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C  | 2 +-
  gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C | 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 9d457e2a2f3..e2e8fb2c71b 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -3769,7 +3769,7 @@ write_expression (tree expr)
 equivalent.
  
  	 So just use the closure type mangling.  */

-  write_string ("tl");
+  write_char ('L');
write_type (LAMBDA_EXPR_CLOSURE (expr));
write_char ('E');
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C
index 0051307f53d..306959a4f9f 100644
--- a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C
@@ -6,4 +6,4 @@ struct C {
void f(decltype([](T, auto) { return 0; })) {}
  };
  void g() { C().f({}); }
-// { dg-final { scan-assembler "_ZN1C1fIiEEvDTtlNS_UlT_TL0__E_EEE" } }
+// { dg-final { scan-assembler "_ZN1C1fIiEEvDTLNS_UlT_TL0__E_EEE" } }
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C
index dc7b0125631..b7bd4ecdd46 100644
--- a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C
@@ -7,4 +7,4 @@ struct C {
void f(decltype([](T, auto) { return 0; })) {}
  };
  void g() { C().f({}); }
-// { dg-final { scan-assembler "_ZN1C1fIiEEvDTtlNS_UlT_T_E_EEE" } }
+// { dg-final { scan-assembler "_ZN1C1fIiEEvDTLNS_UlT_T_E_EEE" } }




Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2025-01-22 Thread Damian Rouson
The first error message in my previous email led me to the following
constraint:

“C1130  A variable-name that appears in a LOCAL or LOCAL_INIT
locality-spec shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL
attribute, shall not be of finalizable type, shall not have an allocatable
ultimate component,...”

My first thought was, "Holy guacamole. That seems like such
a severe limitation that the feature seems almost useless."  Fortuitously,
however, it turns out that the code I sent a little while ago was missing
one important feature from my intended use case: associate.  As shown
below, using associate eliminates the first error, but I'm still confused
by the remaining error message.  Are locality specifiers actually supported
yet?

Damian

% cat locality.f90

program main
  implicit none
  integer pair
  integer :: mini_batch_size=1
  real, allocatable, dimension(:,:) :: a, dcdb

  allocate(a(1,1))
  allocate(dcdb(1,1))

  associate(a_ => a, dcdb_ => dcdb)
    do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
      a_ = 0.
      dcdb_ = 0.
    end do
  end associate
end program


% gfortran locality.f90

locality.f90:12:71:

   12 | do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
      |                                                                       1

Error: Sorry, LOCAL and LOCAL_INIT are not yet supported for ‘do concurrent’ constructs at (1)


% gfortran --version

GNU Fortran (GCC) 15.0.1 20250119 (experimental)

Copyright (C) 2025 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


[PATCH] LoongArch: Fix ICE caused by illegal calls to builtin functions [PR118561].

2025-01-22 Thread Lulu Cheng
PR target/118561

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc
(loongarch_expand_builtin_lsx_test_branch):
Return const0_rtx instead of NULL_RTX when an error is detected.
(loongarch_expand_builtin): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118561.c: New test.

---
 gcc/config/loongarch/loongarch-builtins.cc| 7 +--
 gcc/testsuite/gcc.target/loongarch/pr118561.c | 9 +
 2 files changed, 14 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118561.c

diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index 92d995a916a..1849b35357c 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -2996,7 +2996,10 @@ loongarch_expand_builtin_lsx_test_branch (enum insn_code 
icode, tree exp)
 ops[1].value = force_reg (ops[1].mode, ops[1].value);
 
   if ((cbranch = maybe_gen_insn (icode, 3, ops)) == NULL_RTX)
-error ("failed to expand built-in function");
+{
+  error ("failed to expand built-in function");
+  return const0_rtx;
+}
 
   cmp_result = gen_reg_rtx (SImode);
 
@@ -3036,7 +3039,7 @@ loongarch_expand_builtin (tree exp, rtx target, rtx 
subtarget ATTRIBUTE_UNUSED,
 {
   error_at (EXPR_LOCATION (exp),
"built-in function %qD is not enabled", fndecl);
-  return target;
+  return target ? target : const0_rtx;
 }
 
   switch (d->builtin_type)
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118561.c 
b/gcc/testsuite/gcc.target/loongarch/pr118561.c
new file mode 100644
index 000..81a776eada3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118561.c
@@ -0,0 +1,9 @@
+/* PR target/118561: ICE with -mfpu=none */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mfpu=none" } */
+
+int
+test (void)
+{
+  return __builtin_loongarch_movfcsr2gr (0); /* { dg-error "built-in function 
'__builtin_loongarch_movfcsr2gr' is not enabled" } */
+}
-- 
2.34.1



Re: [PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]

2025-01-22 Thread Lulu Cheng



On 2025/1/23 11:36 AM, Xi Ruoyao wrote:

On Thu, 2025-01-23 at 11:21 +0800, Lulu Cheng wrote:

On 2025/1/22 9:26 PM, Xi Ruoyao wrote:

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign3): Use
force_lowpart_subreg.
---

Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?

LGTM!

Here we try to use force_lowpart_subreg when can_create_pseudo_p () is
true, right?

As this is a define_expand, can_create_pseudo_p should be just true (it
turns to false when the reload pass starts, and reload is far after
expand).


Um, that's not what I meant. What I meant is that in future code
implementations, if can_create_pseudo_p is satisfied, we will not use
lowpart_subreg but use force_lowpart_subreg.





I'm pushing the patch into master now.

   gcc/config/loongarch/loongarch.md | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 0325994ebd6..04a9a79d548 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1343,8 +1343,8 @@ (define_expand "@xorsign<mode>3"
    machine_mode lsx_mode
  = <MODE>mode == SFmode ? V4SFmode : V2DFmode;
    rtx tmp = gen_reg_rtx (lsx_mode);
-  rtx op1 = lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
-  rtx op2 = lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
+  rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
+  rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
    emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
    emit_move_insn (operands[0],
    lowpart_subreg (<MODE>mode, tmp, lsx_mode));




[PATCH v2] libcpp: Fix incorrect line numbers in large files [PR108900]

2025-01-22 Thread Yash . Shinde
From: Yash Shinde 

This patch addresses an issue in the C preprocessor where incorrect line number 
information is generated when processing
files with a large number of lines. The problem arises from improper handling 
of location intervals in the line map,
particularly when locations exceed LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES.

By ensuring that the highest location is not decremented if it would move to a 
different ordinary map, this fix resolves
the line number discrepancies observed in certain test cases. This change 
improves the accuracy of line number reporting,
benefiting users relying on precise code coverage and debugging information.

Signed-off-by: Jeremy Bettis 
Signed-off-by: Yash Shinde 
---
 libcpp/files.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/libcpp/files.cc b/libcpp/files.cc
index 1ed19ca..3e6ca119ad5 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -1046,6 +1046,14 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, 
include_type type,
&& type < IT_DIRECTIVE_HWM
&& (pfile->line_table->highest_location
!= LINE_MAP_MAX_LOCATION - 1));
+
+  if (decrement && LINEMAPS_ORDINARY_USED (pfile->line_table))
+{
+  const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP 
(pfile->line_table);
+  if (map && map->start_location == pfile->line_table->highest_location)
+   decrement = false;
+}
+
   if (decrement)
 pfile->line_table->highest_location--;
 
-- 
2.43.0



Re: [PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]

2025-01-22 Thread Lulu Cheng



On 2025/1/22 9:26 PM, Xi Ruoyao wrote:

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign3): Use
force_lowpart_subreg.
---

Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?


LGTM!

Here we try to use force_lowpart_subreg when can_create_pseudo_p () is 
true, right?




  gcc/config/loongarch/loongarch.md | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 0325994ebd6..04a9a79d548 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1343,8 +1343,8 @@ (define_expand "@xorsign<mode>3"
machine_mode lsx_mode
  = <MODE>mode == SFmode ? V4SFmode : V2DFmode;
rtx tmp = gen_reg_rtx (lsx_mode);
-  rtx op1 = lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
-  rtx op2 = lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
+  rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
+  rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
emit_move_insn (operands[0],
lowpart_subreg (<MODE>mode, tmp, lsx_mode))




Re: [PATCH] LoongArch: Fix invalid subregs in xorsign [PR118501]

2025-01-22 Thread Xi Ruoyao
On Thu, 2025-01-23 at 11:21 +0800, Lulu Cheng wrote:
> 
> On 2025/1/22 9:26 PM, Xi Ruoyao wrote:
> > The test case added in r15-7073 now triggers an ICE, indicating we need
> > the same fix as AArch64.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/118501
> > * config/loongarch/loongarch.md (@xorsign3): Use
> > force_lowpart_subreg.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?
> 
> LGTM!
> 
> Here we try to use force_lowpart_subreg when can_create_pseudo_p () is
> true, right?

As this is a define_expand, can_create_pseudo_p should be just true (it
turns to false when the reload pass starts, and reload is far after
expand).

I'm pushing the patch into master now.

> >   gcc/config/loongarch/loongarch.md | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md 
> > b/gcc/config/loongarch/loongarch.md
> > index 0325994ebd6..04a9a79d548 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -1343,8 +1343,8 @@ (define_expand "@xorsign<mode>3"
> >     machine_mode lsx_mode
> >   = <MODE>mode == SFmode ? V4SFmode : V2DFmode;
> >     rtx tmp = gen_reg_rtx (lsx_mode);
> > -  rtx op1 = lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
> > -  rtx op2 = lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
> > +  rtx op1 = force_lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
> > +  rtx op2 = force_lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
> >     emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
> >     emit_move_insn (operands[0],
> >     lowpart_subreg (<MODE>mode, tmp, lsx_mode));
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2025-01-22 Thread Damian Rouson
I recently built the master branch of Iain's fork of gcc in order to test
the support for locality specifiers.   I have verified that the branch I
built contains the commit mentioned in this email thread
20b8500cfa522ebe0fcf756d5b32816da7f904dd.  The code below isn't designed to
do anything useful -- just to make sure the syntax works. Could someone
tell me why the second error message indicates that LOCAL is not supported
yet?  Also, the first error message is really interesting because if I'm
really not allowed to have anything allocatable inside LOCAL(), then I'm
going to have to go back to the drawing board regarding my intended use
case.  Any advice?

Damian

% cat locality.f90

program main
  implicit none
  integer pair
  integer :: mini_batch_size=1
  real, allocatable, dimension(:,:) :: a, dcdb

  allocate(a(1,1))
  allocate(dcdb(1,1))

  do concurrent (pair = 1:mini_batch_size) local(a) reduce(+: dcdb)
    a = 0.
    dcdb = 0.
  end do
end program


% gfortran locality.f90

locality.f90:8:50:

    8 |   do concurrent (pair = 1:mini_batch_size) local(a) reduce(+: dcdb)
      |                                                  1

Error: ALLOCATABLE attribute not permitted for ‘a’ in LOCAL locality-spec at (1)

locality.f90:8:67:

    8 |   do concurrent (pair = 1:mini_batch_size) local(a) reduce(+: dcdb)
      |                                                                   1

Error: Sorry, LOCAL and LOCAL_INIT are not yet supported for ‘do concurrent’ constructs at (1)


% gfortran --version

GNU Fortran (GCC) 15.0.1 20250119 (experimental)

Copyright (C) 2025 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.




On Mon, Jan 13, 2025 at 5:41 PM Jerry D  wrote:

> Committed as:
>
> commit 20b8500cfa522ebe0fcf756d5b32816da7f904dd (HEAD -> master,
> origin/master, origin/HEAD)
> Author: Anuj Mohite 
> Date:   Mon Jan 13 16:28:57 2025 -0800
>
>  Fortran: Add LOCALITY support for DO_CONCURRENT
>
>  This patch provided by Anuj Mohite as part of the GSoC
> project.
>  It is modified slightly by Jerry DeLisle for minor formatting.
>  The patch provides front-end parsing of the LOCALITY specs in
>  DO_CONCURRENT and adds numerous test cases.
>
>  gcc/fortran/ChangeLog:
>
>  * dump-parse-tree.cc (show_code_node):  Updated to use
>  c->ext.concur.forall_iterator instead of
> c->ext.forall_iterator.
>  * frontend-passes.cc (index_interchange): Updated to
>  use c->ext.concur.forall_iterator instead of
> c->ext.forall_iterator.
>  (gfc_code_walker): Likewise.
>  * gfortran.h (enum locality_type): Added new enum for
> locality types
>  in DO CONCURRENT constructs.
>  * match.cc (match_simple_forall): Updated to use
>  new_st.ext.concur.forall_iterator instead of
> new_st.ext.forall_iterator.
>  (gfc_match_forall): Likewise.
>  (gfc_match_do):  Implemented support for matching DO
> CONCURRENT locality
>  specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and
> REDUCE).
>  * parse.cc (parse_do_block): Updated to use
>  new_st.ext.concur.forall_iterator instead of
> new_st.ext.forall_iterator.
>  * resolve.cc (struct check_default_none_data): Added struct
>  check_default_none_data.
>  (do_concur_locality_specs_f2023): New function to check
> compliance
>  with F2023's C1133 constraint for DO CONCURRENT.
>  (check_default_none_expr): New function to check DEFAULT(NONE)
>  compliance.
>  (resolve_locality_spec): New function to resolve locality
> specs.
>  (gfc_count_forall_iterators): Updated to use
>  code->ext.concur.forall_iterator.
>  (gfc_resolve_forall): Updated to use
> code->ext.concur.forall_iterator.
>  * st.cc (gfc_free_statement): Updated to free locality
> specifications
>  and use p->ext.concur.forall_iterator.
>  * trans-stmt.cc (gfc_trans_forall_1): Updated to use
>  code->ext.concur.forall_iterator.
>
>  gcc/testsuite/ChangeLog:
>
>  * gfortran.dg/do_concurrent_10.f90: New test.
>  * gfortran.dg/do_concurrent_8_f2018.f90: New test.
>  * gfortran.dg/do_concurrent_8_f2023.f90: New test.
>  * gfortran.dg/do_concurrent_9.f90: New test.
>  * gfortran.dg/do_concurrent_all_clauses.f90: New test.
>  * gfortran.dg/do_concurrent_basic.f90: New test.
>  * gfortran.dg/do_concurrent_constraints.f90: New test.
>  * gfortran.dg/do_concurrent_local_init.f90: New test.
>  * gfortran.dg/do_concurrent_locality_specs.f90: New test.
>  * gfortran.dg/do_concurrent_multiple_reduce.f90: New test.
>  

[PATCH v2] [ifcombine] avoid dropping tree_could_trap_p [PR118514]

2025-01-22 Thread Alexandre Oliva
On Jan 22, 2025, Alexandre Oliva  wrote:

> I have another patch coming up that doesn't raise concerns for me, so
> I'll hold off from installing the above, in case you also prefer the
> other one.

Unlike other access patterns, BIT_FIELD_REFs aren't regarded as
possibly-trapping merely because they reference out-of-bounds bits.

So, if decode_field_reference finds a load that could trap, but whose
inner object can't, bail out if it accesses past the inner object's
size.

This may drop some optimizations in VLAs, that could be safe if we
managed to reuse the same pre-existing load for both compares, but
those get optimized later (as in the new testcase).  The cases of most
interest are those that merge separate fields, and they necessarily
involve new BIT_FIELD_REFs loads.
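As a hedged illustration (not taken from the patch or its testcase) of the field-merging this pass performs, two adjacent narrow compares like the ones below are the kind of pattern ifcombine can combine into a single wider load and compare, which is where the new BIT_FIELD_REF loads come from:

```c
#include <assert.h>
#include <stdint.h>

struct s
{
  uint8_t a;
  uint8_t b;
};

/* Two adjacent single-byte compares.  ifcombine can merge them into one
   16-bit load compared against a combined constant instead of two byte
   loads -- the merged access is the new BIT_FIELD_REF load mentioned
   above.  */
int
both_match (const struct s *p)
{
  return p->a == 1 && p->b == 2;
}
```

The source-level semantics are unchanged by the merge; only the number and width of the loads differ.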

Regstrapped on x86_64-linux-gnu.  Ok to install instead of the other
one?


for  gcc/ChangeLog

PR tree-optimization/118514
* gimple-fold.cc (decode_field_reference): Refuse to consider
BIT_FIELD_REF of non-trapping object if the original load
could trap for being out-of-bounds.
(make_bit_field_ref): Check that replacement loads could trap
as much as the original loads.
(fold_truth_andor_for_ifcombine): Rearrange testing of
reversep, and note that signbit needs not be concerned with
it.  Check that combinable loads have the same trapping
status.
* tree-eh.cc (bit_field_ref_in_bounds_p): New.
(tree_could_trap_p): Call it on DECLs.
* tree-eh.h (bit_field_ref_in_bounds_p): Declare.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118514
* gcc.dg/field-merge-23.c: New.
---
 gcc/gimple-fold.cc|   27 +--
 gcc/testsuite/gcc.dg/field-merge-23.c |   19 +++
 gcc/tree-eh.cc|   30 +-
 gcc/tree-eh.h |1 +
 4 files changed, 70 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-23.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index da3f505c3fcac..cd9d350349186 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7686,10 +7686,14 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
*pbitsize,
   || bs <= shiftrt
   || offset != 0
   || TREE_CODE (inner) == PLACEHOLDER_EXPR
-  /* Reject out-of-bound accesses (PR79731).  */
-  || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
- && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
-  bp + bs) < 0)
+  /* Reject out-of-bound accesses (PR79731).  BIT_FIELD_REFs aren't flagged
+as tree_could_trap_p just because of out-of-range bits, so don't even
+try to optimize loads that could trap if they access out-of-range bits
+of an object that could not trap (PR118514).  */
+  || ((! AGGREGATE_TYPE_P (TREE_TYPE (inner))
+  || (load && tree_could_trap_p (gimple_assign_rhs1 (load))
+  && !tree_could_trap_p (inner)))
+ && !bit_field_ref_in_bounds_p (inner, bs, bp))
   || (INTEGRAL_TYPE_P (TREE_TYPE (inner))
  && !type_has_mode_precision_p (TREE_TYPE (inner
 return NULL_TREE;
@@ -7859,6 +7863,11 @@ make_bit_field_load (location_t loc, tree inner, tree 
orig_inner, tree type,
   gimple *new_stmt = gsi_stmt (i);
   if (gimple_has_mem_ops (new_stmt))
gimple_set_vuse (new_stmt, reaching_vuse);
+  gcc_checking_assert (! (gimple_assign_load_p (point)
+ && gimple_assign_load_p (new_stmt))
+  || (tree_could_trap_p (gimple_assign_rhs1 (point))
+  == tree_could_trap_p (gimple_assign_rhs1
+(new_stmt;
 }
 
   gimple_stmt_iterator gsi = gsi_for_stmt (point);
@@ -8223,8 +8232,12 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
   std::swap (rl_loc, rr_loc);
 }
 
+  /* Check that the loads that we're trying to combine have the same vuse and
+ same trapping status.  */
   if ((ll_load && rl_load)
-  ? gimple_vuse (ll_load) != gimple_vuse (rl_load)
+  ? (gimple_vuse (ll_load) != gimple_vuse (rl_load)
+|| (tree_could_trap_p (gimple_assign_rhs1 (ll_load))
+!= tree_could_trap_p (gimple_assign_rhs1 (rl_load
   : (!ll_load != !rl_load))
 return 0;
 
@@ -8277,7 +8290,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   else if (lr_reversep != rr_reversep
   || ! operand_equal_p (lr_inner, rr_inner, 0)
   || ((lr_load && rr_load)
-  ? gimple_vuse (lr_load) != gimple_vuse (rr_load)
+  ? (gimple_vuse (lr_load) != gimple_vuse (rr_load)
+ || (tree_could_trap_p (gimple_assign_rhs1 (lr_load))
+ != tree_could_trap_p (gimple_assign_rhs1 (rr_load
   : (!lr_load != !rr_load)))
 re

Re: [patch,avr] ad PR117726: Tweak 32-bit logical shifts of 25...30 for -Oz

2025-01-22 Thread Denis Chertykov
ср, 22 янв. 2025 г. в 23:53, Georg-Johann Lay :
>
> As it turns out, logical 32-bit shifts with an offset of 25..30 can
> be performed in 7 instructions or less.  This beats the 7
> instructions required for the default code of a shift loop.
> Plus, with zero overhead, these cases can be 3-operand.
>
> This is only relevant for -Oz because with -Os, 3op shifts are
> split with -msplit-bit-shift (which is not performed with -Oz).
>
> Passes without new regressions.  Ok for trunk?

Ok. Please apply.

Denis


[PATCH] [ifcombine] improve reverse checking and operand swapping

2025-01-22 Thread Alexandre Oliva
On Jan 22, 2025, Alexandre Oliva  wrote:

> I have another patch coming up that doesn't raise concerns for me, so
> I'll hold off from installing the above, in case you also prefer the
> other one.

And here's an unrelated bit that came to mind while working on this, but
that I split out before posting.


Don't reject an ifcombine field-merging opportunity just because the
left-hand operands aren't both reversed, if the second compare needs
to be swapped for operands to match.

Also mention that reversep does NOT affect the turning of range tests
into bit tests.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gimple-fold.cc (fold_truth_andor_for_ifcombine): Document
reversep's absence of effects on range tests.  Don't reject
reversep mismatches before trying compare swapping.
---
 gcc/gimple-fold.cc |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 3c971a29ef045..da3f505c3fcac 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8090,8 +8090,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   return 0;
 }
 
-  /* Prepare to turn compares of signed quantities with zero into
- sign-bit tests.  */
+  /* Prepare to turn compares of signed quantities with zero into sign-bit
+ tests.  We need not worry about *_reversep here for these compare
+ rewrites: loads will have already been reversed before compares.  */
   bool lsignbit = false, rsignbit = false;
   if ((lcode == LT_EXPR || lcode == GE_EXPR)
   && integer_zerop (lr_arg)
@@ -8198,10 +8199,11 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
  the rhs's.  If one is a load and the other isn't, we have to be
  conservative and avoid the optimization, otherwise we could get
  SRAed fields wrong.  */
-  if (volatilep || ll_reversep != rl_reversep)
+  if (volatilep)
 return 0;
 
-  if (! operand_equal_p (ll_inner, rl_inner, 0))
+  if (ll_reversep != rl_reversep
+  || ! operand_equal_p (ll_inner, rl_inner, 0))
 {
   /* Try swapping the operands.  */
   if (ll_reversep != rr_reversep


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH v2 3/4] RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]

2025-01-22 Thread pan2 . li
From: Pan Li 

This patch would like to fix the wrong code generation for the scalar
signed SAT_SUB.  The input can be QI/HI/SI/DI, while ALU instructions like
sub can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before an ALU operation like sub.  The
gen_lowpart will generate something like lbu, which has all zeros in the
highest bits.

For example, when 0xff (-1 in QImode) subtracts 0x1 (1 in QImode), we
actually want -1 - 1 = -2, but without a sign extension (lbu zero-extends),
we will get 0xff - 1 = 0xfe, which is incorrect.  Thus, we have to sign
extend 0xff (QImode) to 0x... (assume Xmode is DImode) before the sub
in Xmode.
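A hedged C-level illustration of the failure mode (not part of the patch; the helper names are made up): widening the QImode inputs by zero extension, as a lowpart lbu load does, yields the wrong signed difference, while sign extension yields the intended one.

```c
#include <assert.h>
#include <stdint.h>

/* 0xff is -1 as a signed byte.  Widening it to a 64-bit "Xmode" value by
   zero extension keeps it as 255, so the subtraction goes wrong; sign
   extension keeps it as -1, giving the intended result.  */
int64_t
sub_zero_extended (uint8_t x, uint8_t y)
{
  return (int64_t) x - (int64_t) y;                    /* 0xff - 0x1 = 0xfe */
}

int64_t
sub_sign_extended (uint8_t x, uint8_t y)
{
  return (int64_t) (int8_t) x - (int64_t) (int8_t) y;  /* -1 - 1 = -2 */
}
```

This mirrors why the expander must use SIGN_EXTEND rather than a plain lowpart when moving sub-Xmode operands into Xmode registers.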

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688.h: Add test helper macro.
* gcc.target/riscv/pr117688-sub-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  4 ++--
 .../gcc.target/riscv/pr117688-sub-run-1-s16.c |  6 ++
 .../gcc.target/riscv/pr117688-sub-run-1-s32.c |  6 ++
 .../gcc.target/riscv/pr117688-sub-run-1-s64.c |  6 ++
 .../gcc.target/riscv/pr117688-sub-run-1-s8.c  |  6 ++
 gcc/testsuite/gcc.target/riscv/pr117688.h | 21 +++
 6 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 59148bfab91..4999044f198 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12903,8 +12903,8 @@ riscv_expand_sssub (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant ();
   rtx shift_bits = GEN_INT (bitsize - 1);
-  rtx xmode_x = gen_lowpart (Xmode, x);
-  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND);
   rtx xmode_minus = gen_reg_rtx (Xmode);
   rtx xmode_xor_0 = gen_reg_rtx (Xmode);
   rtx xmode_xor_1 = gen_reg_rtx (Xmode);
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c
new file mode 100644
index 000..7b375bb6c85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int16_t, INT16_MIN, INT16_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c
new file mode 100644
index 000..ba0e8fc8ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int32_t, INT32_MIN, INT32_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c
new file mode 100644
index 000..c24c549af30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int64_t, INT64_MIN, INT64_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c
new file mode 100644
index 000..67f9df179a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_SUB_RUN(int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h 
b/gcc/testsuite/gcc.target/riscv/pr117688.h
index 1013a8a8012..3b734ce62ed 100644
--- a/gcc/testsuite/gcc.target/riscv/pr117688.h
+++ b/gcc/testsuite/gcc.target/riscv/pr117688.h
@@ -24,4 +24,25 @@
 return 0; \
   }
 
+#define DEFINE_SIGNED_SAT_SUB_RUN(T, MIN, MAX)  \
+  T x, y, result;   

[PATCH v2 2/4] RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]

2025-01-22 Thread pan2 . li
From: Pan Li 

This patch would like to fix the wrong code generation for the scalar
signed SAT_ADD.  The input can be QI/HI/SI/DI, while ALU instructions like
add can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before an ALU operation like add.  The
gen_lowpart will generate something like lbu, which has all zeros in the
highest bits.

For example, when 0xff (-1 in QImode) plus 0x2 (2 in QImode), we actually
want -1 + 2 = 1, but without a sign extension (lbu zero-extends), we will
get 0xff + 2 = 0x101, which is incorrect.  Thus, we have to sign extend
0xff (QImode) to 0x... (assume Xmode is DImode) before the plus in Xmode.
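For context, a minimal C sketch of the signed saturating-add semantics the expander has to preserve (this assumes nothing about the actual RTL sequence, and the helper name is illustrative): the exact sum is formed after sign extension into a wider type standing in for Xmode, then clamped to the narrow range.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 8-bit signed saturating add: compute the exact sum in a
   wider type (standing in for Xmode) after sign extension, then clamp
   the result to the QImode range.  */
int8_t
ssadd8 (int8_t x, int8_t y)
{
  int64_t sum = (int64_t) x + (int64_t) y;  /* sign-extended "Xmode" add */
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}
```

With zero extension instead of sign extension, the -1 + 2 case in the commit message would compute 0xff + 2 and clamp to INT8_MAX rather than returning 1.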

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688-add-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s8.c: New test.
* gcc.target/riscv/pr117688.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  4 +--
 .../gcc.target/riscv/pr117688-add-run-1-s16.c |  6 +
 .../gcc.target/riscv/pr117688-add-run-1-s32.c |  6 +
 .../gcc.target/riscv/pr117688-add-run-1-s64.c |  6 +
 .../gcc.target/riscv/pr117688-add-run-1-s8.c  |  6 +
 gcc/testsuite/gcc.target/riscv/pr117688.h | 27 +++
 6 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688.h

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3b9a2c5500d..59148bfab91 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12799,8 +12799,8 @@ riscv_expand_ssadd (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant ();
   rtx shift_bits = GEN_INT (bitsize - 1);
-  rtx xmode_x = gen_lowpart (Xmode, x);
-  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND);
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_dest = gen_reg_rtx (Xmode);
   rtx xmode_xor_0 = gen_reg_rtx (Xmode);
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c
new file mode 100644
index 000..21ec107cbf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int16_t, INT16_MIN, INT16_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c
new file mode 100644
index 000..1f197d1280b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int32_t, INT32_MIN, INT32_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c
new file mode 100644
index 000..4903bc854d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int64_t, INT64_MIN, INT64_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c
new file mode 100644
index 000..a9f2fe7f192
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_ADD_RUN(int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h 
b/gcc/testsuite/gcc.target/riscv/pr117688.h
new file mode 100644
index 000..1013a8a8012
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_PR117688_H
+#define HAVE_DEFINED_PR117688_H
+
+#include 
+
+#define DEFINE_SIGNED_SAT_ADD_RUN(T, MIN, MAX)\
+  T x, y, re

[PATCH v2 1/4] RISC-V: Refactor SAT_* operand rtx extend to reg help func [NFC]

2025-01-22 Thread pan2 . li
From: Pan Li 

This patch would like to refactor the helper function of the scalar
SAT_* expanders.  The helper function converts the define_pattern
operands to Xmode registers for the underlying code-gen.  This patch adds
a new parameter selecting ZERO_EXTEND or SIGN_EXTEND for when the input
is a const_int or the mode is non-Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ...
(riscv_extend_to_xmode_reg): Rename to and add rtx_code for
zero/sign extend if non-Xmode.
(riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND.
(riscv_expand_ussub): Ditto.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 61 +++
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5a3a0504177..3b9a2c5500d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12639,17 +12639,29 @@ riscv_get_raw_result_mode (int regno)
   return default_get_reg_raw_mode (regno);
 }
 
-/* Generate a REG rtx of Xmode from the given rtx and mode.
+/* Generate a REG rtx of Xmode from the given rtx and rtx_code.
The rtx x can be REG (QI/HI/SI/DI) or const_int.
The machine_mode mode is the original mode from define pattern.
+   The rtx_code can be ZERO_EXTEND or SIGN_EXTEND.
 
-   If rtx is REG and Xmode, the RTX x will be returned directly.
+   If rtx is REG:
 
-   If rtx is REG and non-Xmode, the zero extended to new REG of Xmode will be
-   returned.
+   1. If x is already in Xmode, the RTX x will be returned directly.
+   2. If x is in non-Xmode, it will be extended to a new Xmode REG and returned.
 
-   If rtx is const_int, a new REG rtx will be created to hold the value of
-   const_int and then returned.
+   Scalar ALU insns like add do not support non-Xmode operands like QI/HI,
+   so gen_lowpart is problematic here.  For example, when we would like to
+   add -1 (0xff in QImode) and 2 (0x2 in QImode), the 0xff and 0x2 will be
+   loaded into registers for the add.  Aka:
+
+   0xff + 0x2 = 0x101 instead of -1 + 2 = 1.
+
+   Thus we need to sign extend 0xff to 0xffffffffffffffff if Xmode is DImode
+   for correctness.  Similarly, the unsigned case needs zero extension.
+
+   If rtx is const_int:
+
+   1. A new REG rtx will be created to hold the value of const_int.
 
According to the gccint doc, the constants generated for modes with fewer
bits than in HOST_WIDE_INT must be sign extended to full width.  Thus there
@@ -12667,28 +12679,27 @@ riscv_get_raw_result_mode (int regno)
the REG rtx of Xmode, instead of taking care of these in expand func.  */
 
 static rtx
-riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
+riscv_extend_to_xmode_reg (rtx x, machine_mode mode, enum rtx_code rcode)
 {
+  gcc_assert (rcode == ZERO_EXTEND || rcode == SIGN_EXTEND);
+
   rtx xmode_reg = gen_reg_rtx (Xmode);
 
-  if (!CONST_INT_P (x))
+  if (CONST_INT_P (x))
 {
   if (mode == Xmode)
-   return x;
-
-  riscv_emit_unary (ZERO_EXTEND, xmode_reg, x);
-  return xmode_reg;
+   emit_move_insn (xmode_reg, x);
+  else
+   {
+ rtx x_reg = gen_reg_rtx (mode);
+ emit_move_insn (x_reg, x);
+ riscv_emit_unary (rcode, xmode_reg, x_reg);
+   }
 }
-
-  if (mode == Xmode)
-emit_move_insn (xmode_reg, x);
+  else if (mode == Xmode)
+return x;
   else
-{
-  rtx reg_x = gen_reg_rtx (mode);
-
-  emit_move_insn (reg_x, x);
-  riscv_emit_unary (ZERO_EXTEND, xmode_reg, reg_x);
-}
+riscv_emit_unary (rcode, xmode_reg, x);
 
   return xmode_reg;
 }
@@ -12709,8 +12720,8 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
-  rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode);
-  rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, ZERO_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, ZERO_EXTEND);
   rtx xmode_dest = gen_reg_rtx (Xmode);
 
   /* Step-1: sum = x + y  */
@@ -12844,8 +12855,8 @@ void
 riscv_expand_ussub (rtx dest, rtx x, rtx y)
 {
   machine_mode mode = GET_MODE (dest);
-  rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode);
-  rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode);
+  rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, ZERO_EXTEND);
+  rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, ZERO_EXTEND);
   rtx xmode_lt = gen_reg_rtx (Xmode);
   rtx xmode_minus = gen_reg_rtx (Xmode);
   rtx xmode_dest = gen_reg_rtx (Xmode);
-- 
2.43.0



[PATCH v2 4/4] RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]

2025-01-22 Thread pan2 . li
From: Pan Li 

This patch would like to fix the wrong code generation for the scalar
signed SAT_TRUNC.  The input can be QI/HI/SI/DI while ALU insns like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before an ALU insn like add.  The
gen_lowpart will generate something like lbu, which has all zeros in
the highest bits.

For example, when 0xff7f (-129 in HImode) is truncated to QImode, we
actually want to compare -129 to -128, but if there is no sign extend,
we will compare 0xff7f to 0xffffffffffffff80 (assume Xmode is DImode).
Thus, we have to sign extend 0xff7f (HImode) to 0xffffffffffffff7f
(assume Xmode is DImode) before the compare in Xmode.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688.h: Add test helper macros.
* gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  2 +-
 .../riscv/pr117688-trunc-run-1-s16-to-s8.c|  6 +
 .../riscv/pr117688-trunc-run-1-s32-to-s16.c   |  6 +
 .../riscv/pr117688-trunc-run-1-s32-to-s8.c|  6 +
 .../riscv/pr117688-trunc-run-1-s64-to-s16.c   |  6 +
 .../riscv/pr117688-trunc-run-1-s64-to-s32.c   |  6 +
 .../riscv/pr117688-trunc-run-1-s64-to-s8.c|  6 +
 gcc/testsuite/gcc.target/riscv/pr117688.h | 22 +++
 8 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4999044f198..f7b00bb713f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13014,7 +13014,7 @@ riscv_expand_sstrunc (rtx dest, rtx src)
   rtx xmode_narrow_min = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
   rtx xmode_gt = gen_reg_rtx (Xmode);
-  rtx xmode_src = gen_lowpart (Xmode, src);
+  rtx xmode_src = riscv_extend_to_xmode_reg (src, GET_MODE (src), SIGN_EXTEND);
   rtx xmode_dest = gen_reg_rtx (Xmode);
   rtx xmode_mask = gen_reg_rtx (Xmode);
   rtx xmode_sat = gen_reg_rtx (Xmode);
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c
new file mode 100644
index 000..df84615a25f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_TRUNC_RUN(int16_t, int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c
new file mode 100644
index 000..1b0f860eb55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int16_t, INT16_MIN, INT16_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c
new file mode 100644
index 000..e412a29df36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "pr117688.h"
+
+DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int8_t, INT8_MIN, INT8_MAX)
diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c 
b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c
new file mode 100644
index 000..234d33b1895
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-a

[PATCH] [ifcombine] check for more zero-extension cases [PR118572]

2025-01-22 Thread Alexandre Oliva


When comparing a signed narrow variable with a wider constant that has
the bit corresponding to the variable's sign bit set, we would check
that the constant is a sign-extension from that sign bit, and conclude
that the compare fails if it isn't.

When the signed variable is masked without getting the [lr]l_signbit
variable set, or when the sign bit itself is masked out, we know the
sign-extension bits of the extended variable are going to be zero, so
the constant will only compare equal if it is a zero- rather than a
sign-extension of the narrow variable's precision.  Therefore, check
that it satisfies this property, and yield a false compare result
otherwise.


for  gcc/ChangeLog

PR tree-optimization/118572
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
unsigned the variables whose extension bits are masked out.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118572
* gcc.dg/field-merge-24.c: New.
---
 gcc/gimple-fold.cc|   20 --
 gcc/testsuite/gcc.dg/field-merge-24.c |   36 +
 2 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-24.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index cd9d350349186..13541fe1ef749 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8552,12 +8552,21 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
 {
   /* Before clipping upper bits of the right-hand operand of the compare,
 check that they're sign or zero extensions, depending on how the
-left-hand operand would be extended.  */
+left-hand operand would be extended.  If it is unsigned, or if there's
+a mask that zeroes out extension bits, whether because we've checked
+for upper bits in the mask and did not set ll_signbit, or because the
+sign bit itself is masked out, check that the right-hand operand is
+zero-extended.  */
   bool l_non_ext_bits = false;
   if (ll_bitsize < lr_bitsize)
{
  wide_int zext = wi::zext (l_const, ll_bitsize);
- if ((ll_unsignedp ? zext : wi::sext (l_const, ll_bitsize)) == l_const)
+ if ((ll_unsignedp
+  || (ll_and_mask.get_precision ()
+  && (!ll_signbit
+  || ((ll_and_mask & wi::mask (ll_bitsize - 1, true, 
ll_bitsize))
+  == 0)))
+  ? zext : wi::sext (l_const, ll_bitsize)) == l_const)
l_const = zext;
  else
l_non_ext_bits = true;
@@ -8583,7 +8592,12 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
   if (rl_bitsize < rr_bitsize)
{
  wide_int zext = wi::zext (r_const, rl_bitsize);
- if ((rl_unsignedp ? zext : wi::sext (r_const, rl_bitsize)) == r_const)
+ if ((rl_unsignedp
+  || (rl_and_mask.get_precision ()
+  && (!rl_signbit
+  || ((rl_and_mask & wi::mask (rl_bitsize - 1, true, 
rl_bitsize))
+  == 0)))
+  ? zext : wi::sext (r_const, rl_bitsize)) == r_const)
r_const = zext;
  else
r_non_ext_bits = true;
diff --git a/gcc/testsuite/gcc.dg/field-merge-24.c 
b/gcc/testsuite/gcc.dg/field-merge-24.c
new file mode 100644
index 0..ce5bce7d0b49c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-24.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+/* PR tree-optimization/118572 */
+/* Check that signed compares with constants that seem signed in the other
+   compare operand's width get treated as unsigned if its upper bits are masked
+   out.  */
+
+__attribute__((noipa))
+int test(signed char c)
+{
+return (((0x80 & (c&0xff)) != 0) && ((0xc0 & (c&0xff)) == 0x80));
+}
+
+__attribute__((noipa))
+int test2(signed char c)
+{
+return (((-128 & (c&-1)) != 0) && ((-64 & (c&-1)) == -128));
+}
+
+__attribute__((noipa))
+int test3(signed char c)
+{
+return (((0x80 & (c&-1)) != 0) && ((0x1248c0 & (c&-1)) == 0x124880));
+}
+
+__attribute__((noipa))
+int test4(signed char c)
+{
+return (((0x400 & (c&-1)) == 0) && ((0x40 & (c&-1)) == 0x40));
+}
+
+int main() {
+  if (test(0x80) == 0 || test2(-128) == 0 || test3(-128) == 0 || test4(64) == 
0)
+__builtin_abort();
+}


-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2025-01-22 Thread Andre Vehreschild
Hi Damian,

looking into the code, I find neither the keyword LOCAL nor REDUCE.  The match
routine also gives no hint that those keywords are supported in a DO
CONCURRENT loop.

And here is the existing bug
ticket: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101602

But I don't see a ticket for REDUCE yet.

Sorry for the bad news.

Regards,
Andre

On Wed, 22 Jan 2025 20:59:02 -0800
Damian Rouson  wrote:

> The first error message in my previous email led me to the following
> constraint:
> 
> “*C1130*  A *variable-name *that appears in a LOCAL or LOCAL_INIT
> *locality-spec *shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL
> attribute, shall not be of finalizable type, shall not have an allocatable
> ultimate component,...”
> 
> My first thought was, "Holy guacamole. That seems like such
> a severe limitation that the feature seems almost useless."  Fortuitously,
> however, it turns out that the code I sent a little while ago was missing
> one important feature from my intended use case: associate.  As shown
> below, using associate eliminates the first error, but I'm still confused
> by the remaining error message.  Are locality specifiers actually supported
> yet?
> 
> Damian
> 
> % cat locality.f90
> 
> program main
> 
>   implicit none
> 
>   integer pair
> 
>   integer :: mini_batch_size=1
> 
> 
>   real, allocatable, dimension(:,:) :: a, dcdb
> 
> 
> 
>   allocate(a(1,1))
> 
>   allocate(dcdb(1,1))
> 
> 
>   associate(a_ => a, dcdb_ => dcdb)
> 
> do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+: dcdb_)
> 
>   a_ = 0.
> 
>   dcdb_ = 0.
> 
> end do
> 
>   end associate
> 
> 
> end program
> 
> 
> % gfortran locality.f90
> 
> *locality.f90:12:71:*
> 
> 
>12 | do concurrent (pair = 1:mini_batch_size) local(a_) reduce(+:
> dcdb_)
> 
>   |
>   *1*
> 
> *Error:* Sorry, LOCAL and LOCAL_INIT are not yet supported for ‘*do
> concurrent*’ constructs at *(1)*
> 
> 
> % gfortran --version
> 
> GNU Fortran (GCC) 15.0.1 20250119 (experimental)
> 
> Copyright (C) 2025 Free Software Foundation, Inc.
> 
> This is free software; see the source for copying conditions.  There is NO
> 
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] RISC-V: Disable two-source permutes for now [PR117173].

2025-01-22 Thread Robin Dapp
> Could you show me a piece of the codegen difference in X264 that
> improves performance?

I have one ready from SATD (see PR117173), there are more.

"Before":
_838 = VEC_PERM_EXPR ;
_846 = VEC_PERM_EXPR ;

"After":
_42 = VEC_PERM_EXPR ;
...
_44 = VEC_PERM_EXPR ;
_45 = VEC_PERM_EXPR ;

"After" is matched in match.pd and converted to "before".  "Before" requires
two masked, complex gathers, while "after" needs no masking, just a vmerge
and single-source gathers.


??????[PATCH] RISC-V: Disable two-source permutes for now [PR117173].

2025-01-22 Thread ??????
lgtm.

 --Reply to Message--
 On Wed, Jan 22, 2025 16:04 PM Robin Dapp

Re: [PATCH] lra: emit caller-save register spills before call insn [PR116028]

2025-01-22 Thread Andrew Pinski
On Tue, Jan 21, 2025 at 5:45 PM Andrew Pinski  wrote:
>
> On Thu, Aug 8, 2024 at 2:07 PM Andrew Pinski  wrote:
> >
> > On Fri, Aug 2, 2024 at 7:30 AM Jeff Law  wrote:
> > >
> > >
> > >
> > > On 8/1/24 4:12 AM, Surya Kumari Jangala wrote:
> > > > lra: emit caller-save register spills before call insn [PR116028]
> > > >
> > > > LRA emits insns to save caller-save registers in the
> > > > inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
> > > > Basic Block) and traverses the insns in the EBBs in reverse order from
> > > > the last insn to the first insn. When LRA sees a write to a pseudo (that
> > > > has been assigned a caller-save register), and there is a read following
> > > > the write, with an intervening call insn between the write and read,
> > > > then LRA generates a spill immediately after the write and a restore
> > > > immediately before the read. The spill is needed because the call insn
> > > > will clobber the caller-save register.
> > > >
> > > > If there is a write insn and a call insn in two separate BBs but
> > > > belonging to the same EBB, the spill insn gets generated in the BB
> > > > containing the write insn. If the write insn is in the entry BB, then
> > > > the spill insn that is generated in the entry BB prevents shrink wrap
> > > > from happening. This is because the spill insn references the stack
> > > > pointer and hence the prolog gets generated in the entry BB itself.
> > > >
> > > > This patch ensures that the spill insn is generated before the call insn
> > > > instead of after the write. This is also more efficient as the spill now
> > > > occurs only in the path containing the call.
> > > >
> > > > 2024-08-01  Surya Kumari Jangala  
> > > >
> > > > gcc/
> > > >   PR rtl-optimization/116028
> > > >   * lra-constraints.cc (split_reg): Spill register before call
> > > >   insn.
> > > >   (latest_call_insn): New variable.
> > > >   (inherit_in_ebb): Track the latest call insn.
> > > >
> > > > gcc/testsuite/
> > > >   PR rtl-optimization/116028
> > > >   * gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
> > > >   * gcc.dg/pr10474.c: Remove xfail for powerpc.
> > > Implementation looks fine.  I would suggest a comment indicating why
> > > we're inserting before last_call_insn.  Otherwise someone in the future
> > > would have to find the patch submission to know why we're handling that
> > > case specially.
> > >
> > > OK with that additional comment.
> >
> > This causes bootstrap failure on aarch64-linux-gnu; self-tests fail at
> > stage 2. Looks to be wrong code is produced compiling stage 2
> > compiler.
> > I have not looked further than that right now.
>
> I decided to re-apply the patch to the trunk locally and see if I
> could debug what was going wrong. The good news is the bootstrap
> failure is gone.
> The bad news is I don't know why though. I am going to see if I can
> bisect where the failure mode I was getting disappears. That should
> help decide if the bug has got latent or really fixed.

Just a small update before I go to bed.

Here are some information on the revisions which work/don't work (with
the patch re-applied):
r15-5746-g0547dbb725b6d8 fails the way it was reported above
r15-6350-gbb829ce157f8b4 fails with a bootstrap comparison failure
r15-6351-g24df430108c0cd fails with a bootstrap comparison failure

r15-7077-g96f4ba4d19a765 works
r15-7116-g3f641a8f1d1fafc0c6531aee185d0e74998987d5 works

I will be testing more revisions tomorrow. But it looks like we are
down to ~700 revisions.

Thanks,
Andrew Pinski


>
> Thanks,
> Andrew Pinski
>
> >
> > Thanks,
> > Andrew
> >
> > >
> > > Thanks,
> > > jeff