[PATCH] MIPS: Add MSUBF.fmt instruction for MIPSr6

2024-09-14 Thread Jie Mei
GCC currently uses two instructions (NEG.fmt and MADDF.fmt) for
operations like `x - (y * z)' for MIPSr6. We can further tune this by
using only MSUBF.fmt instead of those two.

This patch adds MSUBF.fmt instrutions with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.md (fms4): Generates MSUBF.fmt
instructions.
(*fms4_msubf): Same as above.
(fnma4): Same as above.
(*fnma4_msubf): Same as above.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips-msubf.c: New tests for MIPSr6.
---
 gcc/config/mips/mips.md| 28 ---
 gcc/testsuite/gcc.target/mips/mips-msubf.c | 31 ++
 2 files changed, 56 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-msubf.c

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 737d2566ec8..e6f20af34f0 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -2609,7 +2609,8 @@
(fma:ANYF (match_operand:ANYF 1 "register_operand")
  (match_operand:ANYF 2 "register_operand")
  (neg:ANYF (match_operand:ANYF 3 "register_operand"]
-  "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4)")
+  "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4)
+   || ISA_HAS_FUSED_MADDF")
 
 (define_insn "*fms4_msub3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
@@ -2631,6 +2632,16 @@
   [(set_attr "type" "fmadd")
(set_attr "mode" "")])
 
+(define_insn "*fms4_msubf"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (fma:ANYF (match_operand:ANYF 1 "register_operand" "f")
+ (match_operand:ANYF 2 "register_operand" "f")
+ (neg:ANYF (match_operand:ANYF 3 "register_operand" "0"]
+  "ISA_HAS_FUSED_MADDF"
+  "msubf.\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "")])
+
 ;; fnma is defined in GCC as (fma (neg op1) op2 op3)
 ;; (-op1 * op2) + op3 ==> -(op1 * op2) + op3 ==> -((op1 * op2) - op3)
 ;; The mips nmsub instructions implement -((op1 * op2) - op3)
@@ -2642,8 +2653,9 @@
(fma:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand"))
  (match_operand:ANYF 2 "register_operand")
  (match_operand:ANYF 3 "register_operand")))]
-  "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4)
-   && !HONOR_SIGNED_ZEROS (mode)")
+  "((ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4)
+   && !HONOR_SIGNED_ZEROS (mode))
+   || ISA_HAS_FUSED_MADDF")
 
 (define_insn "*fnma4_nmsub3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
@@ -2665,6 +2677,16 @@
   [(set_attr "type" "fmadd")
(set_attr "mode" "")])
 
+(define_insn "*fnma4_msubf"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (fma:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
+ (match_operand:ANYF 2 "register_operand" "f")
+ (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FUSED_MADDF"
+  "msubf.\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "")])
+
 ;; fnms is defined as: (fma (neg op1) op2 (neg op3))
 ;; ((-op1) * op2) - op3 ==> -(op1 * op2) - op3 ==> -((op1 * op2) + op3)
 ;; The mips nmadd instructions implement -((op1 * op2) + op3)
diff --git a/gcc/testsuite/gcc.target/mips/mips-msubf.c 
b/gcc/testsuite/gcc.target/mips/mips-msubf.c
new file mode 100644
index 000..424ca21e7f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips-msubf.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-mhard-float -ffast-math -march=mips32r6" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" } { "" } } */
+/* { dg-final { scan-assembler-times "\tmsubf\\.s\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tmsubf\\.d\t" 2 } } */
+
+NOMIPS16 float
+test01 (float x, float y, float z)
+{
+  return x - (y * z);
+}
+
+NOMIPS16 double
+test02 (double x, double y, double z)
+{
+  return x - (y * z);
+}
+
+NOMIPS16 float
+test03 (float x, float y, float z)
+{
+  return (y * z) - x;
+}
+
+NOMIPS16 double
+test04 (double x, double y, double z)
+{
+  return (y * z) - x;
+}
+
+
-- 
2.42.0


[PATCH] Mark the copy/move constructor/operator= of auto_bitmap as delete

2024-09-14 Thread Andrew Pinski
Since we are written in C++11, these should be marked as delete rather
than just private.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* bitmap.h (class auto_bitmap): Mark copy/move constructor/operator=
as deleted.

Signed-off-by: Andrew Pinski 
---
 gcc/bitmap.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 4cad1b4d6c6..451edcfc590 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -959,10 +959,10 @@ class auto_bitmap
 
  private:
   // Prevent making a copy that references our bitmap.
-  auto_bitmap (const auto_bitmap &);
-  auto_bitmap &operator = (const auto_bitmap &);
-  auto_bitmap (auto_bitmap &&);
-  auto_bitmap &operator = (auto_bitmap &&);
+  auto_bitmap (const auto_bitmap &) = delete;
+  auto_bitmap &operator = (const auto_bitmap &) = delete;
+  auto_bitmap (auto_bitmap &&) = delete;
+  auto_bitmap &operator = (auto_bitmap &&) = delete;
 
   bitmap_head m_bits;
 };
-- 
2.43.0



[PATCH] vect: release defs of removed statement

2024-09-14 Thread Andrew Pinski
While trying to add use of simple_dce_from_worklist
to the vectorizer so we don't need to run a full blown
DCE pass after the vectorizer, there was a crash noticed
due to a ssa name which has a stmt without a bb. This was
due to not calling release_defs after the call to gsi_remove.

Note the code to remove zero use statements should be able to
remove once the use of simple_dce_from_worklist has been added.
But in the meantime, fixing this bug will also improve memory
usage and a few other things which look through all ssa names.

gcc/ChangeLog:

* tree-vect-loop.cc (optimize_mask_stores): Call release_defs
after the call to gsi_remove with last argument of true.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-vect-loop.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index cc15492f6a0..62c7f90779f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12803,6 +12803,7 @@ optimize_mask_stores (class loop *loop)
  if (has_zero_uses (lhs))
{
  gsi_remove (&gsi_from, true);
+ release_defs (stmt1);
  continue;
}
}
-- 
2.43.0



Re: [PATCH] c++: Don't mix timevar_start and auto_cond_timevar for TV_NAME_LOOKUP [PR116681]

2024-09-14 Thread Jason Merrill

On 9/13/24 1:31 PM, Simon Martin wrote:

We currently ICE upon the following testcase when using -ftime-report

=== cut here ===
template < int> using __conditional_t = int;
template < typename _Iter >
concept random_access_iterator = requires { new _Iter; };
template < typename _Iterator >
struct reverse_iterator {
   using iterator_concept =
 __conditional_t< random_access_iterator< _Iterator>>;
};
void RemoveBottom() {
   int iter;
   for (reverse_iterator< int > iter;;)
   ;
}
=== cut here ===

The problem is that qualified_namespace_lookup does a plain start() of
the TV_NAME_LOOKUP timer (that asserts that the timer is not already
started). However this timer has already been cond_start()'d in the call
stack - by pushdecl - so the assert fails.

This patch simply ensures that we always conditionally start this timer
(which is done in all other places that use it).


OK, thanks.


Successfully tested on x86_64-pc-linux-gnu.

PR c++/116681

gcc/cp/ChangeLog:

* name-lookup.cc (qualified_namespace_lookup): Use an
auto_cond_timer instead of using timevar_start and timevar_stop.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr116681.C: New test.

---
  gcc/cp/name-lookup.cc |  3 +--
  .../g++.dg/cpp2a/concepts-pr116681.C  | 20 +++
  2 files changed, 21 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr116681.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index bfe17b7cb2f..c7a693e02d5 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -7421,10 +7421,9 @@ tree lookup_qualified_name (tree t, const char *p, 
LOOK_want w, bool c)
  static bool
  qualified_namespace_lookup (tree scope, name_lookup *lookup)
  {
-  timevar_start (TV_NAME_LOOKUP);
+  auto_cond_timevar tv (TV_NAME_LOOKUP);
query_oracle (lookup->name);
bool found = lookup->search_qualified (ORIGINAL_NAMESPACE (scope));
-  timevar_stop (TV_NAME_LOOKUP);
return found;
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr116681.C b/gcc/testsuite/g++.dg/cpp2a/concepts-pr116681.C

new file mode 100644
index 000..f1b47797f1e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr116681.C
@@ -0,0 +1,20 @@
+// PR c++/116681
+// { dg-do compile { target c++20 } }
+// { dg-additional-options "-ftime-report" }
+// { dg-allow-blank-lines-in-output 1 }
+// { dg-prune-output "Time variable" }
+// { dg-prune-output "k" }
+// { dg-prune-output "\[0-9\]+%" }
+
+template < int> using __conditional_t = int;
+template < typename _Iter >
+concept random_access_iterator = requires { new _Iter; };
+template < typename _Iterator > struct reverse_iterator {
+  using iterator_concept = __conditional_t< random_access_iterator< _Iterator 
>>;
+};
+void RemoveBottom()
+{
+  int iter;
+  for (reverse_iterator< int > iter;;)
+  ;
+}




Re: [PATCH] testsuite: a few more hostedlib adjustments

2024-09-14 Thread Alexandre Oliva
On Sep 13, 2024, Mike Stump  wrote:

> Yeah, the hard issues are resolved, so follow on patches like this are
> now obvious. They follow the usual and standard practices now.

Thanks.

> So, we should expect them to punt to a hosted person to find, discover
> and fix the fallout. Kinda like canadian cross work, or int16
> work. Just a fair warning that you will be more an expert than most
> for these sort of issues.  :-)

Yeah, I sort of expected something like that.

I'm still trying to decide whether to automated testing to catch
regressions in this configuration, or existing ones will be able to
catch them from now on.  Presumably there are still arch-specific tests
that will have to be covered as well, so it would be good if other
platforms' automated testers were to cover them as well.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[patch,avr] copysign: Use copysign rtx code, allow const_double

2024-09-14 Thread Georg-Johann Lay

This uses new rtx code copysign instead of an unspec.
It also allows const_double as 2nd operand because gcc
does not optimize code like

__builtin_copysignf (x, -1.0f);

Ok for trunk?

Johann

--

AVR: Use rtx code copysign.

gcc/
* config/avr/avr.md (UNSPEC_COPYSIGN): Remove define_enum.
(copysignsf3): Use copysign instead of UNSPEC_COPYSIGN.
Allow const_double for operand 2.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 429f537b7d4..2abf3c38d83 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -87,7 +87,6 @@ (define_c_enum "unspec"
UNSPEC_FMUL
UNSPEC_FMULS
UNSPEC_FMULSU
-   UNSPEC_COPYSIGN
UNSPEC_INSERT_BITS
UNSPEC_ROUND
])
@@ -9272,12 +9271,18 @@ (define_insn "*ffssihi2.libgcc"
 ;; Copysign

 (define_insn "copysignsf3"
-  [(set (match_operand:SF 0 "register_operand" "=r")
-(unspec:SF [(match_operand:SF 1 "register_operand"  "0")
-(match_operand:SF 2 "register_operand"  "r")]
-   UNSPEC_COPYSIGN))]
+  [(set (match_operand:SF 0 "register_operand"  "=r")
+(copysign:SF (match_operand:SF 1 "register_operand"  "0")
+ (match_operand:SF 2 "nonmemory_operand" "rF")))]
   ""
-  "bst %D2,7\;bld %D0,7"
+  {
+if (const_double_operand (operands[2], SFmode))
+  {
+rtx xmsb = simplify_gen_subreg (QImode, operands[2], SFmode, 3);
+return INTVAL (xmsb) < 0 ? "set\;bld %D0,7" : "clt\;bld %D0,7";
+  }
+return "bst %D2,7\;bld %D0,7";
+  }
   [(set_attr "length" "2")])


Re: [PATCH] Try fixing RISC-V .SELECT_VL with SLP

2024-09-14 Thread Robin Dapp
> The following simply removes a seemingly bogus guard.
>
>   * tree-vect-loop.cc (vect_analyze_loop_1): Remove SLP guard
>   from .SELECT_VL disabling.
> ---
>  gcc/tree-vect-loop.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index cc15492f6a0..378e7c560bd 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3078,7 +3078,7 @@ start_over:
>if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
> OPTIMIZE_FOR_SPEED)
> && LOOP_VINFO_LENS (loop_vinfo).length () == 1
> -   && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
> +   && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1

I don't think it will just work like that.  The problem is that we adjust data
pointer increments according to the current vector length in
vect_get_loop_variant_data_ptr_increment.

To this end we use the SELECT_VL result and multiply it with the current
dataref's step.  Without having looked into it closer I suppose using step is
not suffcient or maybe even wrong in an SLP setting and I guess this
complication lead to SLP being disabled initially.

-- 
Regards
 Robin



Re: [PATCH] libstdc++: Enable most of for freestanding

2024-09-14 Thread Arsen Arsenović
Jonathan Wakely  writes:

> This restores support for most of  with -ffreestanding. In case
> there are users who want a minimal freestanding implementation that only
> provides what the standard guarantees, there's a new macro that disables
>  again. This can be used to write more portable freestanding
> code that doesn't rely on  being usable. As we add other things
> to the freestanding subset (e.g.  for PR 113398 and  for
> PR 109814) we can add other _GLIBCXX_NO_FREESTANDING_XXX macros, and a
> _GLIBCXX_NO_FREESTANDING_EXTRAS to define all of them at once. I haven't
> done that in this patch, because there's on the CHRONO one for now.

That seems reasonable to me.

Do we want to count the headers that we moved into the non-hosted
configuration into this 'extras' set also (such as tuple)?  Or just
things not installed in the --disable-hosted-libstdcxx configuration?

That also impacts what we do if/when we move the stuff that was made not
freestanding due to std::allocator (IIRC vector and string are in this
set) into the std_freestanding subset.

The patch seems OK either way.

> Tested x86_64-linux.
>
> -- >8 --
>
> This makes durations, time points and calendrical types available for
> freestanding. The clocks and time zone utilities are disabled for
> freestanding, as they require functions in the hosted lib.
>
> Add support for a new macro _GLIBCXX_NO_FREESTANDING_CHRONO which can be
> used to explicitly disable  for freestanding.
>
> libstdc++-v3/ChangeLog:
>
>   * doc/xml/manual/using.xml (_GLIBCXX_NO_FREESTANDING_CHRONO):
>   Document macro.
>   * doc/html/*: Regenerate.
>   * include/bits/chrono.h [_GLIBCXX_NO_FREESTANDING_CHRONO]:
>   Only include  when this macro is defined.
>   [_GLIBCXX_HOSTED]: Only define clocks for hosted.
>   * include/bits/version.def (chrono_udls): Remove hosted=yes.
>   * include/bits/version.h: Regenerate.
>   * include/std/chrono [_GLIBCXX_HOSTED]: Only define clocks and
>   time zone utilities for hosted.
>   * testsuite/std/time/freestanding.cc: New test.
> ---
>  .../doc/html/manual/using_macros.html |  7 +++
>  libstdc++-v3/doc/xml/manual/using.xml | 12 +
>  libstdc++-v3/include/bits/chrono.h| 24 ++---
>  libstdc++-v3/include/bits/version.def |  1 -
>  libstdc++-v3/include/bits/version.h   |  2 +-
>  libstdc++-v3/include/std/chrono   | 24 +++--
>  .../testsuite/std/time/freestanding.cc| 52 +++
>  7 files changed, 109 insertions(+), 13 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/freestanding.cc
>
> diff --git a/libstdc++-v3/doc/html/manual/using_macros.html 
> b/libstdc++-v3/doc/html/manual/using_macros.html
> index ae564692630..67623b5e2af 100644
> --- a/libstdc++-v3/doc/html/manual/using_macros.html
> +++ b/libstdc++-v3/doc/html/manual/using_macros.html
> @@ -124,4 +124,11 @@
>  must be present on all vector operations or none, so this macro must
>  be defined to the same value for all translation units that create,
>  destroy, or modify vectors.
> +   class="code">_GLIBCXX_NO_FREESTANDING_CHRONO
> + Undefined by default. When defined, the
> +  header cannot
> + be used with -ffreestanding.
> + When not defined, durations, time points, and calendar types are
> + available for freestanding, but the standard clocks and the time zone
> + database are not (because they require OS support).
> width="100%" summary="Navigation footer"> accesskey="p" href="using_headers.html">Prev  align="center">Up align="right">  href="using_dual_abi.html">Next valign="top">Headers  href="../index.html">Home 
> Dual ABI
> \ No newline at end of file
> diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
> b/libstdc++-v3/doc/xml/manual/using.xml
> index 6675359f3b3..4e1c70040b5 100644
> --- a/libstdc++-v3/doc/xml/manual/using.xml
> +++ b/libstdc++-v3/doc/xml/manual/using.xml
> @@ -1321,6 +1321,18 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
> hello.cc -o test.exe
>  destroy, or modify vectors.
>
>  
> +
> +_GLIBCXX_NO_FREESTANDING_CHRONO
> +
> +  
> + Undefined by default. When defined, the
> +  header cannot
> + be used with -ffreestanding.
> + When not defined, durations, time points, and calendar types are
> + available for freestanding, but the standard clocks and the time zone
> + database are not (because they require OS support).
> +  
> +
>  
>  
>
> diff --git a/libstdc++-v3/include/bits/chrono.h 
> b/libstdc++-v3/include/bits/chrono.h
> index 0773867da71..fd9c4642f4f 100644
> --- a/libstdc++-v3/include/bits/chrono.h
> +++ b/libstdc++-v3/include/bits/chrono.h
> @@ -37,7 +37,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#if _GLIBCXX_HOSTED
> +# include 
> +#endif
>  #include  // for literals support.
>  #if __cplusplus >= 202002L
>  # include 
> @@ -50

Re: [patch,avr] copysign: Use copysign rtx code, allow const_double

2024-09-14 Thread Denis Chertykov
сб, 14 сент. 2024 г. в 13:00, Georg-Johann Lay :
>
> This uses new rtx code copysign instead of an unspec.
> It also allows const_double as 2nd operand because gcc
> does not optimize code like
>
> __builtin_copysignf (x, -1.0f);
>
> Ok for trunk?

Ok. Please apply.

Denis.

>
> Johann
>
> --
>
>  AVR: Use rtx code copysign.
>
>  gcc/
>  * config/avr/avr.md (UNSPEC_COPYSIGN): Remove define_enum.
>  (copysignsf3): Use copysign instead of UNSPEC_COPYSIGN.
>  Allow const_double for operand 2.
>
> diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
> index 429f537b7d4..2abf3c38d83 100644
> --- a/gcc/config/avr/avr.md
> +++ b/gcc/config/avr/avr.md
> @@ -87,7 +87,6 @@ (define_c_enum "unspec"
>  UNSPEC_FMUL
>  UNSPEC_FMULS
>  UNSPEC_FMULSU
> -   UNSPEC_COPYSIGN
>  UNSPEC_INSERT_BITS
>  UNSPEC_ROUND
>  ])
> @@ -9272,12 +9271,18 @@ (define_insn "*ffssihi2.libgcc"
>   ;; Copysign
>
>   (define_insn "copysignsf3"
> -  [(set (match_operand:SF 0 "register_operand" "=r")
> -(unspec:SF [(match_operand:SF 1 "register_operand"  "0")
> -(match_operand:SF 2 "register_operand"  "r")]
> -   UNSPEC_COPYSIGN))]
> +  [(set (match_operand:SF 0 "register_operand"  "=r")
> +(copysign:SF (match_operand:SF 1 "register_operand"  "0")
> + (match_operand:SF 2 "nonmemory_operand" "rF")))]
> ""
> -  "bst %D2,7\;bld %D0,7"
> +  {
> +if (const_double_operand (operands[2], SFmode))
> +  {
> +rtx xmsb = simplify_gen_subreg (QImode, operands[2], SFmode, 3);
> +return INTVAL (xmsb) < 0 ? "set\;bld %D0,7" : "clt\;bld %D0,7";
> +  }
> +return "bst %D2,7\;bld %D0,7";
> +  }
> [(set_attr "length" "2")])


[patch,avr] Tidy up enum and struct tags and rtx_code.

2024-09-14 Thread Georg-Johann Lay

This patch tidies up enum and struct tags that are
not required in C++.  It also uses rtx_code for
RTX codes instead of RTX_CODE.  RTX_CODE is now only
used in #ifdef's in avr-protos.h.

Ok for trunk?

Johann

--

AVR: Tidy up enum and struct tags.

Use "rtx_code" for RTX codes, not "enum rtx_code" and not "RTX_CODE".
Drop enum and struct tags if possible.

gcc/
* avr.cc: Use rtx_code for RTX codes.  Drop enum and struct tags.
* avr.md: Same.
* avr-c.cc: Same.
* avr-dimode.md: Same.
* avr-passes.cc: Same.
* avr-protos.h: Same.AVR: Tidy up enum and struct tags.

Use "rtx_code" for RTX codes, not "enum rtx_code" and not "RTX_CODE".
Drop enum and struct tags if possible.

gcc/
* avr.cc: Use rtx_code for RTX codes.  Drop enum and struct tags.
* avr.md: Same.
* avr-c.cc: Same.
* avr-dimode.md: Same.
* avr-passes.cc: Same.
* avr-protos.h: Same.

diff --git a/gcc/config/avr/avr-c.cc b/gcc/config/avr/avr-c.cc
index 34c73c6bf55..81c91c38363 100644
--- a/gcc/config/avr/avr-c.cc
+++ b/gcc/config/avr/avr-c.cc
@@ -51,7 +51,7 @@ static tree
 avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
 {
   tree type0, type1, fold = NULL_TREE;
-  enum avr_builtin_id id = AVR_BUILTIN_COUNT;
+  avr_builtin_id id = AVR_BUILTIN_COUNT;
   location_t loc = (location_t) iloc;
   vec &args = * (vec*) vargs;
 
@@ -291,7 +291,7 @@ avr_toupper (char *up, const char *lo)
 /* Worker function for TARGET_CPU_CPP_BUILTINS.  */
 
 void
-avr_cpu_cpp_builtins (struct cpp_reader *pfile)
+avr_cpu_cpp_builtins (cpp_reader *pfile)
 {
   builtin_define_std ("AVR");
 
diff --git a/gcc/config/avr/avr-dimode.md b/gcc/config/avr/avr-dimode.md
index e6284ced45d..59337218783 100644
--- a/gcc/config/avr/avr-dimode.md
+++ b/gcc/config/avr/avr-dimode.md
@@ -459,11 +459,11 @@ (define_expand "cbranch4"
   (label_ref (match_operand 3))
   (pc)))]
   "avr_have_dimode"
-   {
+  {
 int icode = (int) GET_CODE (operands[0]);
 
 targetm.canonicalize_comparison (&icode, &operands[1], &operands[2], false);
-operands[0] = gen_rtx_fmt_ee ((enum rtx_code) icode,
+operands[0] = gen_rtx_fmt_ee ((rtx_code) icode,
   VOIDmode, operands[1], operands[2]);
 
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
@@ -488,7 +488,7 @@ (define_expand "cbranch4"
 emit_jump_insn (gen_cbranch_2_split (operands[0], operands[3]));
   }
 DONE;
-   })
+  })
 
 (define_insn_and_split "cbranch_2_split"
   [(set (pc)
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index 95b8e9a7529..ad913ddd4af 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -813,7 +813,7 @@ avr_optimize_casesi (rtx_insn *insns[5], rtx *xop)
 
   // How the original switch value was extended to SImode; this is
   // SIGN_EXTEND or ZERO_EXTEND.
-  enum rtx_code code = GET_CODE (xop[9]);
+  rtx_code code = GET_CODE (xop[9]);
 
   // Lower index, upper index (plus one) and range of case calues.
   HOST_WIDE_INT low_idx = -INTVAL (xop[1]);
@@ -1078,7 +1078,7 @@ public:
   {
 rtx reg_or_0, mem, addr, addr_reg;
 int addr_regno;
-enum rtx_code addr_code;
+rtx_code addr_code;
 machine_mode mode;
 addr_space_t addr_space;
 bool store_p, volatile_p;
@@ -1744,7 +1744,7 @@ avr_split_fake_addressing_move (rtx_insn * /*insn*/, rtx *xop)
 
   machine_mode mode = GET_MODE (mem);
   rtx base, addr = XEXP (mem, 0);
-  enum rtx_code addr_code = GET_CODE (addr);
+  rtx_code addr_code = GET_CODE (addr);
 
   if (REG_P (reg_or_0)
   && reg_overlap_mentioned_p (reg_or_0, addr))
@@ -1784,7 +1784,7 @@ avr_split_fake_addressing_move (rtx_insn * /*insn*/, rtx *xop)
 	mem_volatile_p = true;
 }
 
-  enum rtx_code new_code = UNKNOWN;
+  rtx_code new_code = UNKNOWN;
   HOST_WIDE_INT add = 0, sub = 0;
   int msize = GET_MODE_SIZE (mode);
 
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 71c110c791f..c8aa7c72df7 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -20,7 +20,7 @@
 
 
 extern int avr_function_arg_regno_p (int r);
-extern void avr_cpu_cpp_builtins (struct cpp_reader * pfile);
+extern void avr_cpu_cpp_builtins (cpp_reader * pfile);
 extern enum reg_class avr_regno_reg_class (int r);
 extern void asm_globalize_label (FILE *file, const char *name);
 extern void avr_adjust_reg_alloc_order (void);
@@ -66,7 +66,7 @@ extern const char *avr_out_plus_set_N (rtx*, int*);
 extern const char *avr_out_op8_set_ZN (rtx_code, rtx*, int*);
 extern int avr_len_op8_set_ZN (rtx_code, rtx*);
 extern bool avr_op8_ZN_operator (rtx);
-extern const char *avr_out_cmp_ext (rtx*, enum rtx_code, int*);
+extern const char *avr_out_cmp_ext (rtx*, rtx_code, int*);
 
 extern const char *ashlqi3_out (rtx_insn *insn, rtx operands[], int *len);
 extern const char *ashlhi3_out (rtx

Re: [PATCH] x86-64: Don't use temp for argument in a TImode register

2024-09-14 Thread H.J. Lu
On Sun, Sep 8, 2024 at 12:10 AM Uros Bizjak  wrote:
>
> On Fri, Sep 6, 2024 at 2:24 PM H.J. Lu  wrote:
> >
> > Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
> > in a TImode register.  Otherwise, the TImode variable will be put in
> > the GPR save area which guarantees only 8-byte alignment.
> >
> > gcc/
> >
> > PR target/116621
> > * config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
> > a PARALLEL BLKmode container of an EXPR_LIST expression in a
> > TImode register.
> >
> > gcc/testsuite/
> >
> > PR target/116621
> > * gcc.target/i386/pr116621.c: New test.
>
> LGTM.

OK to backport to release branches?

> Thanks,
> Uros.
>
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/config/i386/i386.cc  | 22 ++--
> >  gcc/testsuite/gcc.target/i386/pr116621.c | 43 
> >  2 files changed, 63 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr116621.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 707b75a6d5d..45320124b91 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -4908,13 +4908,31 @@ ix86_gimplify_va_arg (tree valist, tree type, 
> > gimple_seq *pre_p,
> >
> >examine_argument (nat_mode, type, 0, &needed_intregs, 
> > &needed_sseregs);
> >
> > -  need_temp = (!REG_P (container)
> > +  bool container_in_reg = false;
> > +  if (REG_P (container))
> > +   container_in_reg = true;
> > +  else if (GET_CODE (container) == PARALLEL
> > +  && GET_MODE (container) == BLKmode
> > +  && XVECLEN (container, 0) == 1)
> > +   {
> > + /* Check if it is a PARALLEL BLKmode container of an EXPR_LIST
> > +expression in a TImode register.  In this case, temp isn't
> > +needed.  Otherwise, the TImode variable will be put in the
> > +GPR save area which guarantees only 8-byte alignment.   */
> > + rtx x = XVECEXP (container, 0, 0);
> > + if (GET_CODE (x) == EXPR_LIST
> > + && REG_P (XEXP (x, 0))
> > + && XEXP (x, 1) == const0_rtx)
> > +   container_in_reg = true;
> > +   }
> > +
> > +  need_temp = (!container_in_reg
> >&& ((needed_intregs && TYPE_ALIGN (type) > 64)
> >|| TYPE_ALIGN (type) > 128));
> >
> >/* In case we are passing structure, verify that it is consecutive 
> > block
> >   on the register save area.  If not we need to do moves.  */
> > -  if (!need_temp && !REG_P (container))
> > +  if (!need_temp && !container_in_reg)
> > {
> >   /* Verify that all registers are strictly consecutive  */
> >   if (SSE_REGNO_P (REGNO (XEXP (XVECEXP (container, 0, 0), 0
> > diff --git a/gcc/testsuite/gcc.target/i386/pr116621.c 
> > b/gcc/testsuite/gcc.target/i386/pr116621.c
> > new file mode 100644
> > index 000..704266458a8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr116621.c
> > @@ -0,0 +1,43 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2" } */
> > +
> > +#include 
> > +#include 
> > +
> > +union S8302
> > +{
> > +  union
> > +  {
> > +double b;
> > +int c;
> > +  } a;
> > +  long double d;
> > +  unsigned short int f[5];
> > +};
> > +
> > +union S8302 s8302;
> > +extern void check8302va (int i, ...);
> > +
> > +int
> > +main (void)
> > +{
> > +  memset (&s8302, '\0', sizeof (s8302));
> > +  s8302.a.b = -221438.25;
> > +  check8302va (1, s8302);
> > +  return 0;
> > +}
> > +
> > +__attribute__((noinline, noclone))
> > +void
> > +check8302va (int z, ...)
> > +{
> > +  union S8302 arg, *p;
> > +  va_list ap;
> > +
> > +  __builtin_va_start (ap, z);
> > +  p = &s8302;
> > +  arg = __builtin_va_arg (ap, union S8302);
> > +  if (p->a.b != arg.a.b)
> > +__builtin_abort ();
> > +  __builtin_va_end (ap);
> > +}
> > --
> > 2.46.0
> >



-- 
H.J.


[Patch, Fortran] Implement Unsigned for SUM and PRODUCT

2024-09-14 Thread Thomas Koenig

Hello world,

this implements SUM and PRODUCT for UNSIGNED.  This is actually
missing from 24-116.txt, but that is due to an oversight.
This patch is on top of

https://gcc.gnu.org/pipermail/fortran/2024-September/060975.html

and uses the same method suggested by Richard: Use unsigned for the
data in SUM and PRODUCT in the integer library version, and call
the integer library version for unsigned.

Regression-tested. OK for trunk?

Best regards

Thomas

gcc/fortran/ChangeLog:

* gfortran.texi: Document SUM and PRODUCT.
* iresolve.cc (resolve_transformational): New argument,
use_integer, to translate calls to unsigned to calls to
integer.
(gfc_resolve_product): Use it
(gfc_resolve_sum): Use it.
* simplify.cc (init_result_expr): Handle BT_UNSIGNED.

libgfortran/ChangeLog:

* generated/product_c10.c: Regenerated.
* generated/product_c16.c: Regenerated.
* generated/product_c17.c: Regenerated.
* generated/product_c4.c: Regenerated.
* generated/product_c8.c: Regenerated.
* generated/product_i1.c: Regenerated.
* generated/product_i16.c: Regenerated.
* generated/product_i2.c: Regenerated.
* generated/product_i4.c: Regenerated.
* generated/product_i8.c: Regenarated.
* generated/product_r10.c: Regenerated.
* generated/product_r16.c: Regenerated.
* generated/product_r17.c: Regenerated.
* generated/product_r4.c: Regenerated.
* generated/product_r8.c: Regenarated.
* generated/sum_c10.c: Regenerated.
* generated/sum_c16.c: Regenerated.
* generated/sum_c17.c: Regenerated.
* generated/sum_c4.c: Regenerated.
* generated/sum_c8.c: Regenerated.
* generated/sum_i1.c: Regenerated.
* generated/sum_i16.c: Regenerated.
* generated/sum_i2.c: Regenerated.
* generated/sum_i4.c: Regenerated.
* generated/sum_i8.c: Regenerated.
* generated/sum_r10.c: Regenerated.
* generated/sum_r16.c: Regenerated.
* generated/sum_r17.c: Regenerated.
* generated/sum_r4.c: Regenerated.
* generated/sum_r8.c: Regenerated.
* m4/ifunction.m4: Whitespace fix.
* m4/product.m4: If type is integer, change to unsigned.
* m4/sum.m4: Likewise.



[pushed] testsuite: adjust pragma-diag-17.c diagnostics

2024-09-14 Thread Jason Merrill
Tested x86_64-pc-linux-gnu and armv8l-unknown-linuxgnueabihf, applying to
trunk.

-- 8< --

The Linaro CI runs of this testcase pointed out that I need to check for DFP
support, as well.

gcc/testsuite/ChangeLog:

* c-c++-common/pragma-diag-17.c: Handle !dfp targets.
---
 gcc/testsuite/c-c++-common/pragma-diag-17.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/pragma-diag-17.c 
b/gcc/testsuite/c-c++-common/pragma-diag-17.c
index a38841bc48d..a44ce90f98b 100644
--- a/gcc/testsuite/c-c++-common/pragma-diag-17.c
+++ b/gcc/testsuite/c-c++-common/pragma-diag-17.c
@@ -12,7 +12,7 @@ void f()
   0b0100; /* { dg-error "binary constant" "" { target { ! c++14 } } } 
*/
 #pragma GCC diagnostic ignored "-Wpedantic"
   2.0j;
-  1.0dd;
+  1.0dd; /* { dg-error "decimal floating-point" "" { target { ! dfp } } } */
   1.0d;
 
 #ifdef __cplusplus

base-commit: bec1f2ce4ad3fe56906d6429b542a290e574b3eb
-- 
2.46.0



[PATCH] tree-object-size: Fold PHI node offsets with constants [PR116556]

2024-09-14 Thread Siddhesh Poyarekar
In PTR + OFFSET cases, try harder to see if the target offset could
result in a constant.  Specifically, if the offset is a PHI node with
all constant branches, return the minimum (or maximum for OST_MINIMUM)
of the possible values.

gcc/ChangeLog:

PR tree-optimization/116556
* tree-object-size.cc (try_collapsing_offset): New function.
(plus_stmt_object_size): Use it.
* gcc/testsuite/gcc.dg/builtin-object-size-1.c (test12): New
function.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar 
---
Tests underway for x86_64 and i686.  OK if they pass?

 gcc/testsuite/gcc.dg/builtin-object-size-1.c | 25 
 gcc/tree-object-size.cc  | 41 
 2 files changed, 66 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
index d6d13c5ef7a..eed1653505c 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
@@ -712,6 +712,30 @@ test11 (void)
 }
 #endif
 
+void
+__attribute__ ((noinline))
+test12 (unsigned cond)
+{
+  char *buf2 = malloc (10);
+  char *p;
+  size_t t;
+
+  if (cond)
+t = 8;
+  else
+t = 4;
+
+  p = &buf2[t];
+
+#ifdef __builtin_object_size
+  if (__builtin_object_size (p, 0) != (cond ? 2 : 6))
+FAIL ();
+#else
+  if (__builtin_object_size (p, 0) != 6)
+FAIL ();
+#endif
+}
+
 int
 main (void)
 {
@@ -729,6 +753,7 @@ main (void)
   test10 ();
 #ifndef SKIP_STRNDUP
   test11 ();
+  test12 (1);
 #endif
   DONE ();
 }
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 4c1fa9b555f..d90cc43fa97 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -1467,6 +1467,44 @@ merge_object_sizes (struct object_size_info *osi, tree 
dest, tree orig)
   return bitmap_bit_p (osi->reexamine, SSA_NAME_VERSION (orig));
 }
 
+/* For constant sizes, try collapsing a non-constant offset to a constant if
+   possible.  The case handled at the moment is when the offset is a PHI node
+   with all of its targets are constants.  */
+
+static tree
+try_collapsing_offset (tree op, int object_size_type)
+{
+  gcc_assert (!(object_size_type & OST_DYNAMIC));
+
+  if (TREE_CODE (op) != SSA_NAME)
+return op;
+
+  gimple *stmt = SSA_NAME_DEF_STMT (op);
+
+  if (gimple_code (stmt) != GIMPLE_PHI)
+return op;
+
+  tree off = size_unknown (object_size_type);
+
+  for (unsigned i = 0; i < gimple_phi_num_args (stmt); i++)
+{
+  tree rhs = gimple_phi_arg (stmt, i)->def;
+
+  if (TREE_CODE (rhs) != INTEGER_CST)
+   return op;
+
+  /* Note that this is the *opposite* of what we usually do with sizes,
+because the maximum offset estimate here will give us a minimum size
+estimate and vice versa.  */
+  enum tree_code code = (object_size_type & OST_MINIMUM
+? MAX_EXPR : MIN_EXPR);
+
+  off = size_binop (code, off, rhs);
+}
+
+  gcc_assert (TREE_CODE (off) == INTEGER_CST);
+  return off;
+}
 
 /* Compute object_sizes for VAR, defined to the result of an assignment
with operator POINTER_PLUS_EXPR.  Return true if the object size might
@@ -1499,6 +1537,9 @@ plus_stmt_object_size (struct object_size_info *osi, tree 
var, gimple *stmt)
   if (object_sizes_unknown_p (object_size_type, varno))
 return false;
 
+  if (!(object_size_type & OST_DYNAMIC) && TREE_CODE (op1) != INTEGER_CST)
+op1 = try_collapsing_offset (op1, object_size_type);
+
   /* Handle PTR + OFFSET here.  */
   if (size_valid_p (op1, object_size_type)
   && (TREE_CODE (op0) == SSA_NAME || TREE_CODE (op0) == ADDR_EXPR))
-- 
2.45.1



[pushed] c++: avoid init_priority warning in system header

2024-09-14 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We don't want a warning about a reserved init_priority in a system header
even with -Wsystem-headers.

gcc/cp/ChangeLog:

* tree.cc (handle_init_priority_attribute): Check
in_system_header_at.
---
 gcc/cp/tree.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 99088da9cee..f43febed124 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5196,7 +5196,8 @@ handle_init_priority_attribute (tree* node,
 
   /* Check for init_priorities that are reserved for
  language and runtime support implementations.*/
-  if (pri <= MAX_RESERVED_INIT_PRIORITY)
+  if (pri <= MAX_RESERVED_INIT_PRIORITY
+  && !in_system_header_at (input_location))
 {
   warning
(0, "requested % %i is reserved for internal use",

base-commit: 005f7176e0f457a1e1a7398ddcb4a4972da28c62
-- 
2.46.0



Re: [PATCH] libstdc++: Enable most of for freestanding

2024-09-14 Thread Jonathan Wakely
On Sat, 14 Sept 2024 at 10:39, Arsen Arsenović  wrote:
>
> Jonathan Wakely  writes:
>
> > This restores support for most of  with -ffreestanding. In case
> > there are users who want a minimal freestanding implementation that only
> > provides what the standard guarantees, there's a new macro that disables
> >  again. This can be used to write more portable freestanding
> > code that doesn't rely on  being usable. As we add other things
> > to the freestanding subset (e.g.  for PR 113398 and  for
> > PR 109814) we can add other _GLIBCXX_NO_FREESTANDING_XXX macros, and a
> > _GLIBCXX_NO_FREESTANDING_EXTRAS to define all of them at once. I haven't
> > done that in this patch, because there's on the CHRONO one for now.
>
> That seems reasonable to me.
>
> Do we want to count the headers that we moved into the non-hosted
> configuration into this 'extras' set also (such as tuple)?  Or just
> things not installed in the --disable-hosted-libstdcxx configuration?

Ah yes, good point. I think logically it makes sense for the macro to
disable all extensions, including  etc.

That's assuming the macros will actually be used. Somebody emailed me
off-list saying maybe we should support "freestanding plus extras" and
"freestanding without extras" but I don't know if they actually want
to use the latter, or if it was just a thought they had.

>
> That also impacts what we do if/when we move the stuff that was made not
> freestanding due to std::allocator (IIRC vector and string are in this
> set) into the std_freestanding subset.

Yup.

> The patch seems OK either way.

Thanks for looking.


> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > This makes durations, time points and calendrical types available for
> > freestanding. The clocks and time zone utilities are disabled for
> > freestanding, as they require functions in the hosted lib.
> >
> > Add support for a new macro _GLIBCXX_NO_FREESTANDING_CHRONO which can be
> > used to explicitly disable  for freestanding.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * doc/xml/manual/using.xml (_GLIBCXX_NO_FREESTANDING_CHRONO):
> >   Document macro.
> >   * doc/html/*: Regenerate.
> >   * include/bits/chrono.h [_GLIBCXX_NO_FREESTANDING_CHRONO]:
> >   Only include  when this macro is defined.
> >   [_GLIBCXX_HOSTED]: Only define clocks for hosted.
> >   * include/bits/version.def (chrono_udls): Remove hosted=yes.
> >   * include/bits/version.h: Regenerate.
> >   * include/std/chrono [_GLIBCXX_HOSTED]: Only define clocks and
> >   time zone utilities for hosted.
> >   * testsuite/std/time/freestanding.cc: New test.
> > ---
> >  .../doc/html/manual/using_macros.html |  7 +++
> >  libstdc++-v3/doc/xml/manual/using.xml | 12 +
> >  libstdc++-v3/include/bits/chrono.h| 24 ++---
> >  libstdc++-v3/include/bits/version.def |  1 -
> >  libstdc++-v3/include/bits/version.h   |  2 +-
> >  libstdc++-v3/include/std/chrono   | 24 +++--
> >  .../testsuite/std/time/freestanding.cc| 52 +++
> >  7 files changed, 109 insertions(+), 13 deletions(-)
> >  create mode 100644 libstdc++-v3/testsuite/std/time/freestanding.cc
> >
> > diff --git a/libstdc++-v3/doc/html/manual/using_macros.html 
> > b/libstdc++-v3/doc/html/manual/using_macros.html
> > index ae564692630..67623b5e2af 100644
> > --- a/libstdc++-v3/doc/html/manual/using_macros.html
> > +++ b/libstdc++-v3/doc/html/manual/using_macros.html
> > @@ -124,4 +124,11 @@
> >  must be present on all vector operations or none, so this macro 
> > must
> >  be defined to the same value for all translation units that create,
> >  destroy, or modify vectors.
> > +   > class="code">_GLIBCXX_NO_FREESTANDING_CHRONO
> > + Undefined by default. When defined, the
> > +  header cannot
> > + be used with -ffreestanding.
> > + When not defined, durations, time points, and calendar types are
> > + available for freestanding, but the standard clocks and the time zone
> > + database are not (because they require OS support).
> > > width="100%" summary="Navigation footer"> > align="left">Prev  > width="20%" align="center"> > href="using.html">Up  > accesskey="n" href="using_dual_abi.html">Next > width="40%" align="left" valign="top">Headers  > align="center">Home > width="40%" align="right" valign="top"> Dual 
> > ABI
> > \ No newline at end of file
> > diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
> > b/libstdc++-v3/doc/xml/manual/using.xml
> > index 6675359f3b3..4e1c70040b5 100644
> > --- a/libstdc++-v3/doc/xml/manual/using.xml
> > +++ b/libstdc++-v3/doc/xml/manual/using.xml
> > @@ -1321,6 +1321,18 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
> > hello.cc -o test.exe
> >  destroy, or modify vectors.
> >
> >  
> > +
> > +_GLIBCXX_NO_FREESTANDING_CHRONO
> > +
> > +  
> > + Undefined by default. When defined, the
> > + 

Re: [PATCH] tree-object-size: Fold PHI node offsets with constants [PR116556]

2024-09-14 Thread Andrew Pinski
On Sat, Sep 14, 2024 at 5:31 AM Siddhesh Poyarekar  wrote:
>
> In PTR + OFFSET cases, try harder to see if the target offset could
> result in a constant.  Specifically, if the offset is a PHI node with
> all constant branches, return the minimum (or maximum for OST_MINIMUM)
> of the possible values.
>
> gcc/ChangeLog:
>
> PR tree-optimization/116556
> * tree-object-size.cc (try_collapsing_offset): New function.
> (plus_stmt_object_size): Use it.
> * gcc/testsuite/gcc.dg/builtin-object-size-1.c (test12): New
> function.
> (main): Call it.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
> Tests underway for x86_64 and i686.  OK if they pass?
>
>  gcc/testsuite/gcc.dg/builtin-object-size-1.c | 25 
>  gcc/tree-object-size.cc  | 41 
>  2 files changed, 66 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c 
> b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
> index d6d13c5ef7a..eed1653505c 100644
> --- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
> @@ -712,6 +712,30 @@ test11 (void)
>  }
>  #endif
>
> +void
> +__attribute__ ((noinline))
> +test12 (unsigned cond)
> +{
> +  char *buf2 = malloc (10);
> +  char *p;
> +  size_t t;
> +
> +  if (cond)
> +t = 8;
> +  else
> +t = 4;
> +
> +  p = &buf2[t];
> +
> +#ifdef __builtin_object_size
> +  if (__builtin_object_size (p, 0) != (cond ? 2 : 6))
> +FAIL ();
> +#else
> +  if (__builtin_object_size (p, 0) != 6)
> +FAIL ();
> +#endif
> +}
> +
>  int
>  main (void)
>  {
> @@ -729,6 +753,7 @@ main (void)
>test10 ();
>  #ifndef SKIP_STRNDUP
>test11 ();
> +  test12 (1);
>  #endif
>DONE ();
>  }
> diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
> index 4c1fa9b555f..d90cc43fa97 100644
> --- a/gcc/tree-object-size.cc
> +++ b/gcc/tree-object-size.cc
> @@ -1467,6 +1467,44 @@ merge_object_sizes (struct object_size_info *osi, tree 
> dest, tree orig)
>return bitmap_bit_p (osi->reexamine, SSA_NAME_VERSION (orig));
>  }
>
> +/* For constant sizes, try collapsing a non-constant offset to a constant if
> +   possible.  The case handled at the moment is when the offset is a PHI node
> +   with all of its targets are constants.  */
> +
> +static tree
> +try_collapsing_offset (tree op, int object_size_type)
> +{
> +  gcc_assert (!(object_size_type & OST_DYNAMIC));
> +
> +  if (TREE_CODE (op) != SSA_NAME)
> +return op;
> +
> +  gimple *stmt = SSA_NAME_DEF_STMT (op);
> +
> +  if (gimple_code (stmt) != GIMPLE_PHI)
> +return op;
> +
> +  tree off = size_unknown (object_size_type);
> +
> +  for (unsigned i = 0; i < gimple_phi_num_args (stmt); i++)
> +{
> +  tree rhs = gimple_phi_arg (stmt, i)->def;
> +
> +  if (TREE_CODE (rhs) != INTEGER_CST)
> +   return op;
> +
> +  /* Note that this is the *opposite* of what we usually do with sizes,
> +because the maximum offset estimate here will give us a minimum size
> +estimate and vice versa.  */
> +  enum tree_code code = (object_size_type & OST_MINIMUM
> +? MAX_EXPR : MIN_EXPR);
> +
> +  off = size_binop (code, off, rhs);


I suspect this won't work for integer constants which have the what
would be the sign bit set.

That is:
```

void
__attribute__ ((noinline))
test9 (unsigned cond)
{
  char *buf2 = __builtin_malloc (10);
  char *p;
  __SIZE_TYPE__ t;

  if (cond)
t = -4;
  else
t = 4;
  p = &buf2[4] + t;

  if (__builtin_object_size (&p[0], 0) != 10)
__builtin_abort ();
}
```

Since you do the MIN/MAX in unsigned.

Thanks,
Andrew Pinski

> +}
> +
> +  gcc_assert (TREE_CODE (off) == INTEGER_CST);
> +  return off;
> +}
>
>  /* Compute object_sizes for VAR, defined to the result of an assignment
> with operator POINTER_PLUS_EXPR.  Return true if the object size might
> @@ -1499,6 +1537,9 @@ plus_stmt_object_size (struct object_size_info *osi, 
> tree var, gimple *stmt)
>if (object_sizes_unknown_p (object_size_type, varno))
>  return false;
>
> +  if (!(object_size_type & OST_DYNAMIC) && TREE_CODE (op1) != INTEGER_CST)
> +op1 = try_collapsing_offset (op1, object_size_type);
> +
>/* Handle PTR + OFFSET here.  */
>if (size_valid_p (op1, object_size_type)
>&& (TREE_CODE (op0) == SSA_NAME || TREE_CODE (op0) == ADDR_EXPR))
> --
> 2.45.1
>


Re: [PATCH v2] c++: Don't crash when mangling member with anonymous union or template types [PR100632, PR109790]

2024-09-14 Thread Jason Merrill

On 9/13/24 11:06 AM, Simon Martin wrote:

Hi Jason,

On 12 Sep 2024, at 16:48, Jason Merrill wrote:


On 9/12/24 7:23 AM, Simon Martin wrote:

Hi,

While looking at more open PRs, I have discovered that the problem
reported in PR109790 is very similar to that in PR100632, so I’m
combining both in a single patch attached here. The fix is similar to
the one I initially submitted, only more general and I believe
better.



We currently crash upon mangling members that have an anonymous union
or a template type.

The problem is that before calling write_unqualified_name,
write_member_name has an assert that assumes that it has an
IDENTIFIER_NODE in its hand. However it's incorrect: it has an
anonymous union in PR100632, and a template in PR109790.


The assert does not assume it has an IDENTIFIER_NODE; it assumes it
has a _DECL, and expects its DECL_NAME to be an IDENTIFIER_NODE.

!identifier_p will always be true for a _DECL, making the assert useless.

Indeed, my bad. Thanks for catching and explaining this!


How about checking !DECL_NAME (member) instead of !identifier_p?

Unfortunately it does not fix PR100632, that actually involves
legitimate operators.

I checked why the assert was added in the first place (via r11-6301),

and the idea was to catch any case where we’d be missing the “on”
marker - PR100632 contains such cases.


I assume you mean 109790?


So I took the approach to refactor write_member_name a bit to first
write the marker in all the cases required, and then actually write the
member name; and the assert is not needed anymore there.


Refactoring code in mangle.cc is tricky given the intent to retain 
backward bug-compatibility.


Specifically, adding the "on" in ABI v11 is wrong since GCC 10 (ABI v10) 
didn't emit it for the 109790 testcase; we can add it for v16, since GCC 
11 ICEd on the testcase.


I would prefer to fix the bug locally rather than refactor.

Jason



Re: [PATCH] tree-object-size: Fold PHI node offsets with constants [PR116556]

2024-09-14 Thread Siddhesh Poyarekar

On 2024-09-14 12:10, Andrew Pinski wrote:

+  /* Note that this is the *opposite* of what we usually do with sizes,
+because the maximum offset estimate here will give us a minimum size
+estimate and vice versa.  */
+  enum tree_code code = (object_size_type & OST_MINIMUM
+? MAX_EXPR : MIN_EXPR);
+
+  off = size_binop (code, off, rhs);



I suspect this won't work for integer constants which have the what
would be the sign bit set.



Oops, yes, I'll send v2.

Thanks,
Sid


Re: [PATCH v2] c++: Don't crash when mangling member with anonymous union or template types [PR100632, PR109790]

2024-09-14 Thread Simon Martin
Hi Jason,

On 14 Sep 2024, at 18:11, Jason Merrill wrote:

> On 9/13/24 11:06 AM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 12 Sep 2024, at 16:48, Jason Merrill wrote:
>>
>>> On 9/12/24 7:23 AM, Simon Martin wrote:
 Hi,

 While looking at more open PRs, I have discovered that the problem
 reported in PR109790 is very similar to that in PR100632, so I’m
 combining both in a single patch attached here. The fix is similar 
 to
 the one I initially submitted, only more general and I believe
 better.
>>>
 We currently crash upon mangling members that have an anonymous 
 union
 or a template type.

 The problem is that before calling write_unqualified_name,
 write_member_name has an assert that assumes that it has an
 IDENTIFIER_NODE in its hand. However it's incorrect: it has an
 anonymous union in PR100632, and a template in PR109790.
>>>
>>> The assert does not assume it has an IDENTIFIER_NODE; it assumes it
>>> has a _DECL, and expects its DECL_NAME to be an IDENTIFIER_NODE.
>>>
>>> !identifier_p will always be true for a _DECL, making the assert 
>>> useless.
>> Indeed, my bad. Thanks for catching and explaining this!
>>>
>>> How about checking !DECL_NAME (member) instead of !identifier_p?
>> Unfortunately it does not fix PR100632, that actually involves
>> legitimate operators.
>>
>> I checked why the assert was added in the first place (via r11-6301),
>>
>> and the idea was to catch any case where we’d be missing the 
>> “on”
>> marker - PR100632 contains such cases.
>
> I assume you mean 109790?
Yes :-/
>
>> So I took the approach to refactor write_member_name a bit to first
>> write the marker in all the cases required, and then actually write 
>> the
>> member name; and the assert is not needed anymore there.
>
> Refactoring code in mangle.cc is tricky given the intent to retain 
> backward bug-compatibility.
>
> Specifically, adding the "on" in ABI v11 is wrong since GCC 10 (ABI 
> v10) didn't emit it for the 109790 testcase; we can add it for v16, 
> since GCC 11 ICEd on the testcase.
>
> I would prefer to fix the bug locally rather than refactor.
Understood, that makes sense.

I’ll work on a more local patch and resubmit (by the way you can also 
ignore 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662496.html, 
that is also “too wide”).

Thanks, Simon



Re: *PING* [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-09-14 Thread Steve Kargl
On Fri, Sep 13, 2024 at 12:27:07PM +0200, Mikael Morin wrote:
> > 
> > gcc/fortran/ChangeLog:
> > 
> > * invoke.texi(finline-intrinsics): Document new flag.
> > * lang.opt (finline-intrinsics, finline-intrinsics=,
> > fno-inline-intrinsics): New flags.
> > * options.cc (gfc_post_options): If the option variable controling

s/controling/controlling

> >   The default value for @var{n} is 30.
> > +@opindex @code{finline-intrinsics}
> > +@item -finline-intrinsics
> > +@itemx -finline-intrinsics=@var{intr1},@var{intr2},...
> > +Prefer generating inline code over calls to libgfortran functions to 
> > implement
> > +intrinscs.

s/intrinscs/intrinsics

> > +Usage of intrinsics can be implemented either by generating a call to the
> > +libgfortran library function implementing it, or by directly generating the
> > +implementation code inline.  For most intrinsics, only a single of those
> > +variants is available and there is no choice of implementation.  For some 
> > of
> > +them, however, both are available, and for them the 
> > @code{-finline-intrinsics}
> > +flag permits the selection of inline code generation in its positive form, 
> > or
> > +library call generation in its negative form @code{-fno-inline-intrinsics}.
> > +With @code{-finline-intrinsics=...} or @code{-fno-inline-intrinsics=...}, 
> > the
> > +choice applies only to the intrinsics present in the comma-separated list
> > +provided as argument.

While I understand the intent of 'positive form' vs 'negative form', the
above might be clearer as

   Usage of intrinsics can be implemented either by generating a call
   to the libgfortran library function or by directly generating inline
   code.  For most intrinsics, only a single variant is available, and
   there is no choice of implementation.  However, some intrinsics can
   use a library function or inline code, wher inline code typically offers
   opportunities for additional optimization over a library function.
   With @code{-finline-intrinsics=...} or @code{-fno-inline-intrinsics=...}, the
   choice applies only to the intrinsics present in the comma-separated list
   provided as argument.

> > +For each intrinsic, if no choice of implementation was made through either 
> > of
> > +the flag variants, a default behaviour is chosen depending on optimization:
> > +library calls are generated when not optimizing or when optimizing for 
> > size;
> > +otherwise inline code is preferred.
> > +


OK with consideration the above comments.

-- 
Steve


Re: [PATCH v2] Provide more contexts for -Warray-bounds warning messages

2024-09-14 Thread Joern Rennecke
> 1. Change the name of the option from:
>
> -fdiagnostic-try-to-explain-harder
> To
> -fdiagnostic-explain-harder
I can think of a lot of connotations for this name, but alas, they are
unfortunate, off-topic, or both.

Some more neutral ideas:

-fdiagnostics-verbose
-fdiagnostic-details

Or maybe a bit more specific:

-fdiagnostics-trace-source
-fdiagnostics-show-source-condition

if part of a group of diagnostic-augmenting options:

-fdiagnostic-details=condition

Or if you envisage this facility to be used with some debug info extension so 
that gdb can eventually also show where exactly the code comes from:

-frecord-source-condition
-frecord-code-duplication-condition


Although there is also some potential here to misconstrue 'condition' to be 
about the style etc of the source, so maybe someone can come up with a better 
name still?



Re: *PING* [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-09-14 Thread Steve Kargl
On Sat, Sep 14, 2024 at 11:02:42AM -0700, Steve Kargl wrote:
> 
> While I understand the intent of 'positive form' vs 'negative form', the
> above might be clearer as
> 
>Usage of intrinsics can be implemented either by generating a call
>to the libgfortran library function or by directly generating inline
>code.  For most intrinsics, only a single variant is available, and
>there is no choice of implementation.  However, some intrinsics can
>use a library function or inline code, wher inline code typically offers

Whoops.

s/wher/where

-- 
Steve


Re: [PATCH] Try fixing RISC-V .SELECT_VL with SLP

2024-09-14 Thread Richard Biener



> Am 14.09.2024 um 11:38 schrieb Robin Dapp :
> 
> 
>> 
>> The following simply removes a seemingly bogus guard.
>> 
>>* tree-vect-loop.cc (vect_analyze_loop_1): Remove SLP guard
>>from .SELECT_VL disabling.
>> ---
>> gcc/tree-vect-loop.cc | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index cc15492f6a0..378e7c560bd 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -3078,7 +3078,7 @@ start_over:
>>   if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
>>  OPTIMIZE_FOR_SPEED)
>>  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
>> -  && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
>> +  && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1
> 
> I don't think it will just work like that.  The problem is that we adjust data
> pointer increments according to the current vector length in
> vect_get_loop_variant_data_ptr_increment.
> 
> To this end we use the SELECT_VL result and multiply it with the current
> dataref's step.  Without having looked into it closer I suppose using step is
> not suffcient or maybe even wrong in an SLP setting and I guess this
> complication lead to SLP being disabled initially.

Yes, that’s one complication.  Another is that the length would need to be 
scaled by the SLP group size - but maybe one could use lmul for that…  
(assuming an uniform loop)

Restricting SLP to the case of all single-lane nodes should work though.

Richard.

> --
> Regards
> Robin
> 


Re: [Patch, Fortran] Implement Unsigned for SUM and PRODUCT

2024-09-14 Thread Thomas Koenig

As Jerry wrote (thanks!), this was missing the attached patch.

Best regards

Thomasdiff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 829ab00c665..e5ffe67 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -2788,7 +2788,7 @@ As of now, the following intrinsics take unsigned arguments:
 @item @code{MVBITS}
 @item @code{RANGE}
 @item @code{TRANSFER}
-@item @code{MATMUL} and @code{DOT_PRODUCT}
+@item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT}
 @end itemize
 This list will grow in the near future.
 @c -
diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc
index 32b31432e58..92a591cf6d7 100644
--- a/gcc/fortran/iresolve.cc
+++ b/gcc/fortran/iresolve.cc
@@ -175,9 +175,11 @@ resolve_bound (gfc_expr *f, gfc_expr *array, gfc_expr *dim, gfc_expr *kind,
 
 static void
 resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array,
-			  gfc_expr *dim, gfc_expr *mask)
+			  gfc_expr *dim, gfc_expr *mask,
+			  bool use_integer = false)
 {
   const char *prefix;
+  bt type;
 
   f->ts = array->ts;
 
@@ -200,9 +202,18 @@ resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array,
   gfc_resolve_dim_arg (dim);
 }
 
+  /* For those intrinsic like SUM where we the integer version
+ actually uses unsigned, but we call it as the integer
+ version.  */
+
+  if (use_integer && array->ts.type == BT_UNSIGNED)
+type = BT_INTEGER;
+  else
+type = array->ts.type;
+
   f->value.function.name
 = gfc_get_string (PREFIX ("%s%s_%c%d"), prefix, name,
-		  gfc_type_letter (array->ts.type),
+		  gfc_type_letter (type),
 		  gfc_type_abi_kind (&array->ts));
 }
 
@@ -2333,7 +2344,7 @@ void
 gfc_resolve_product (gfc_expr *f, gfc_expr *array, gfc_expr *dim,
 		 gfc_expr *mask)
 {
-  resolve_transformational ("product", f, array, dim, mask);
+  resolve_transformational ("product", f, array, dim, mask, true);
 }
 
 
@@ -2881,7 +2892,7 @@ gfc_resolve_storage_size (gfc_expr *f, gfc_expr *a ATTRIBUTE_UNUSED,
 void
 gfc_resolve_sum (gfc_expr *f, gfc_expr *array, gfc_expr *dim, gfc_expr *mask)
 {
-  resolve_transformational ("sum", f, array, dim, mask);
+  resolve_transformational ("sum", f, array, dim, mask, true);
 }
 
 
diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index 83d0fdc9ea9..e5681c42a48 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -359,7 +359,16 @@ init_result_expr (gfc_expr *e, int init, gfc_expr *array)
 	  mpz_set_si (e->value.integer, init);
 	break;
 
-	  case BT_REAL:
+	  case BT_UNSIGNED:
+	if (init == INT_MIN)
+	  mpz_set_ui (e->value.integer, 0);
+	else if (init == INT_MAX)
+	  mpz_set (e->value.integer, gfc_unsigned_kinds[i].huge);
+	else
+	  mpz_set_ui (e->value.integer, init);
+	break;
+
+	case BT_REAL:
 	if (init == INT_MIN)
 	  {
 		mpfr_set (e->value.real, gfc_real_kinds[i].huge, GFC_RND_MODE);
diff --git a/libgfortran/generated/product_c10.c b/libgfortran/generated/product_c10.c
index 9438f5d7316..c57f1642b65 100644
--- a/libgfortran/generated/product_c10.c
+++ b/libgfortran/generated/product_c10.c
@@ -29,13 +29,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #if defined (HAVE_GFC_COMPLEX_10) && defined (HAVE_GFC_COMPLEX_10)
 
 
-extern void product_c10 (gfc_array_c10 * const restrict, 
+extern void product_c10 (gfc_array_c10 * const restrict,
 	gfc_array_c10 * const restrict, const index_type * const restrict);
 export_proto(product_c10);
 
 void
-product_c10 (gfc_array_c10 * const restrict retarray, 
-	gfc_array_c10 * const restrict array, 
+product_c10 (gfc_array_c10 * const restrict retarray,
+	gfc_array_c10 * const restrict array,
 	const index_type * const restrict pdim)
 {
   index_type count[GFC_MAX_DIMENSIONS];
@@ -188,15 +188,15 @@ product_c10 (gfc_array_c10 * const restrict retarray,
 }
 
 
-extern void mproduct_c10 (gfc_array_c10 * const restrict, 
+extern void mproduct_c10 (gfc_array_c10 * const restrict,
 	gfc_array_c10 * const restrict, const index_type * const restrict,
 	gfc_array_l1 * const restrict);
 export_proto(mproduct_c10);
 
 void
-mproduct_c10 (gfc_array_c10 * const restrict retarray, 
-	gfc_array_c10 * const restrict array, 
-	const index_type * const restrict pdim, 
+mproduct_c10 (gfc_array_c10 * const restrict retarray,
+	gfc_array_c10 * const restrict array,
+	const index_type * const restrict pdim,
 	gfc_array_l1 * const restrict mask)
 {
   index_type count[GFC_MAX_DIMENSIONS];
@@ -378,15 +378,15 @@ mproduct_c10 (gfc_array_c10 * const restrict retarray,
 }
 
 
-extern void sproduct_c10 (gfc_array_c10 * const restrict, 
+extern void sproduct_c10 (gfc_array_c10 * const restrict,
 	gfc_array_c10 * const restrict, const index_type * const restrict,
 	GFC_LOGICAL_4 *);
 export_proto(sproduct_c10);
 
 void
-sproduct_c10 (gfc_array_c10 * const restrict retarray,

Re: [PATCH v5] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-14 Thread Jakub Jelinek
On Wed, Sep 11, 2024 at 09:13:40PM +, Qing Zhao wrote:
> @@ -11741,6 +11770,55 @@ c_parser_postfix_expression (c_parser *parser)
>   set_c_expr_source_range (&expr, loc, close_paren_loc);
>   break;
> }
> + case RID_BUILTIN_COUNTED_BY_REF:
> +   {
> + vec *cexpr_list;
> + c_expr_t *e_p;
> + location_t close_paren_loc;
> +
> + in_builtin_counted_by_ref++;

I think this in_builtin_counted_by_ref stuff here plus the whole c-typeck.cc
hunk should be replaced with:

> + c_parser_consume_token (parser);
> + if (!c_parser_get_builtin_args (parser,
> + "__builtin_counted_by_ref",
> + &cexpr_list, false,
> + &close_paren_loc))
> +   {
> + expr.set_error ();
> + in_builtin_counted_by_ref--;
> + break;
> +   }
> + if (vec_safe_length (cexpr_list) != 1)
> +   {
> + error_at (loc, "wrong number of arguments to "
> +"%<__builtin_counted_by_ref%>");
> + expr.set_error ();
> + in_builtin_counted_by_ref--;
> + break;
> +   }
> +
> + e_p = &(*cexpr_list)[0];
> +
> + if (TREE_CODE (TREE_TYPE (e_p->value)) != ARRAY_TYPE)
> +   {
> + error_at (loc, "the argument must be an array"
> +"%<__builtin_counted_by_ref%>");
> + expr.set_error ();
> + in_builtin_counted_by_ref--;
> + break;
> +   }
> +

if (has_counted_by_object (e_p->value))
  expr.value = get_counted_by_ref (e_p->value);
else if (in_typeof && TREE_CODE (e_p->value) == COMPONENT_REF)
  {
tree counted_by_type = NULL_TREE;
tree arg = (*cexpr_
if (build_counted_by_ref (TREE_OPERAND (e_p->value, 0),
  TREE_OPERAND (e_p->value, 1),
  &counted_by_type))
  expr.value
= build_zero_cst (build_pointer_type (counted_by_type));
else
  expr.value = null_pointer_node;
  }
else
  expr.value = null_pointer_node;

(plus make build_counted_by_ref non-static and add prototype).

Completely untested.

> + if (has_counted_by_object ((*cexpr_list)[0].value))
> +   expr.value
> + = get_counted_by_ref ((*cexpr_list)[0].value);
> + else
> +   expr.value
> + = build_int_cst (build_pointer_type (void_type_node), 0);

Jakub



[RFC PATCH] Allow functions with target_clones attribute to be inlined

2024-09-14 Thread Yangyu Chen
I recently found that target_clones functions cannot inline even when
the caller has exactly the same target. However, if we only use target
attributes in C++ and let the compiler generate IFUNC for us, the
functions with the same target will be inlined.

For example, the following code compiled on x86-64 target with -O3 will
generate IFUNC for foo and bar and inline foo into the bar:

```cpp
__attribute__((target("default")))
int foo(int *arr) {
int sum = 0;
for (int i=0;i<16;i++) sum += arr[i];
return sum;
}

__attribute__((target("avx2")))
int foo(int *arr) {
int sum = 0;
for (int i=0;i<16;i++) sum += arr[i];
return sum;
}

__attribute__((target("default")))
int bar(int *arr) {
return foo(arr);
}

__attribute__((target("avx2")))
int bar(int *arr) {
return foo(arr);
}
```

However, if we use target_clones attribute, the target_clones functions
will not be inlined:

```cpp
__attribute__((target_clones("default","avx2")))
int foo(int *arr) {
int sum = 0;
for (int i=0;i<16;i++) sum += arr[i];
return sum;
}

__attribute__((target_clones("default","avx2")))
int bar(int *arr) {
return foo(arr);
}
```

This behavior may negatively impact performance since the target_clones
functions are not inlined. And since we didn't jump to the target_clones
functions based on PLT but used the same target as the caller's target.
I think it's better to allow the target_clones functions to be inlined.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (handle_target_clones_attribute):
Allow functions with target_clones attribute to be inlined.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute):
Allow functions with target_clones attribute to be inlined.

gcc/d/ChangeLog:

* d-attribs.cc (d_handle_target_clones_attribute):
Allow functions with target_clones attribute to be inlined.

Signed-off-by: Yangyu Chen 
---
 gcc/ada/gcc-interface/utils.cc | 5 +
 gcc/c-family/c-attribs.cc  | 3 ---
 gcc/d/d-attribs.cc | 5 -
 3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 60f36b1e50d..d010b684177 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, 
tree ARG_UNUSED (args),
  int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   /* Ensure we have a function type.  */
-  if (TREE_CODE (*node) == FUNCTION_DECL)
-/* Do not inline functions with multiple clone targets.  */
-DECL_UNINLINABLE (*node) = 1;
-  else
+  if (TREE_CODE (*node) != FUNCTION_DECL)
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 4dd2eecbea5..f8759bb1908 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, 
tree ARG_UNUSED (args),
   "single % attribute is ignored");
  *no_add_attrs = true;
}
-  else
-  /* Do not inline functions with multiple clone targets.  */
-   DECL_UNINLINABLE (*node) = 1;
 }
   else
 {
diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
index 0f7ca10e017..9f67415adb1 100644
--- a/gcc/d/d-attribs.cc
+++ b/gcc/d/d-attribs.cc
@@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, 
tree, int,
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
 }
-  else
-{
-  /* Do not inline functions with multiple clone targets.  */
-  DECL_UNINLINABLE (*node) = 1;
-}
 
   return NULL_TREE;
 }
-- 
2.45.2



[PUSHED] testsuite; Fix execute/pr52286.c for 16bit

2024-09-14 Thread Andrew Pinski
The code path which was added for 16bit had a broken inline-asm which would
only assign maybe half of the registers for the `long` type to 0.

Adding L to the input operand of the inline-asm fixes the issue by now assigning
the full 32bit value of the input register that would match up with the output 
register.

Fixes r0-115223-gb0408f13d4b317 which added the 16bit code path to fix the 
testcase for 16bit.

Pushed as obvious.

PR testsuite/116716

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr52286.c: Fix inline-asm for 16bit case.
---
 gcc/testsuite/gcc.c-torture/execute/pr52286.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr52286.c 
b/gcc/testsuite/gcc.c-torture/execute/pr52286.c
index bb56295ab52..4fd5d6ac813 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr52286.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr52286.c
@@ -11,7 +11,7 @@ main ()
   b = (~a | 1) & -2038094497;
 #else
   long a, b;
-  asm ("" : "=r" (a) : "0" (0));
+  asm ("" : "=r" (a) : "0" (0L));
   b = (~a | 1) & -2038094497L;
 #endif
   if (b >= 0)
-- 
2.43.0



libgomp: with USM, init 'link' variables with host address

2024-09-14 Thread Tobias Burnus
The idea of link variables is to replace he full device variable by a 
pointer, permitting to map only parts of the variable to the device, 
saving memory.


However, having a pointer permits for (unified) shared memory to point 
to the host variable.


That's what this patch does: instead of having a dangling pointer, upon 
loading the image, the device side pointers are updated to point to the 
host. With the current patch, this is only done when explicitly 
requesting unified-shared memory.


Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

Remarks/comments/suggestions before I commit it?

Tobias

PS: I intent to do some additional changes for improved USM handling. 
Once done, I intent to look into (a) given the user a bit more power on 
mapping vs. not mapping and (b) to use for APUs by default USM, even 
without 'requires unified_shared_memory'.
libgomp: with USM, init 'link' variables with host address

If requires unified_shared_memory is set, make 'declare target link'
variables to point initially to the host pointer.

libgomp/ChangeLog:

	* target.c (gomp_load_image_to_device): For requires
	unified_shared_memory, update 'link' vars to point to the host var.
	* testsuite/libgomp.c-c++-common/target-link-3.c: New test.

 libgomp/target.c   |  5 +++
 .../testsuite/libgomp.c-c++-common/target-link-3.c | 52 ++
 2 files changed, 57 insertions(+)

diff --git a/libgomp/target.c b/libgomp/target.c
index 47ec36928a6..66b54fd2ab8 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2451,6 +2451,11 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
   array++;
+
+  if (is_link_var
+	  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+	gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
+			&k->host_start, sizeof (void *), false, NULL);
 }
 
   /* Last entry is for the ICV struct variable; if absent, start = end = 0.  */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
new file mode 100644
index 000..c707b38b7d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
@@ -0,0 +1,52 @@
+/* { dg-do run }  */
+
+#include 
+#include 
+
+#pragma omp requires unified_shared_memory
+
+int A[3] = {-3,-4,-5};
+static int q = -401;
+#pragma omp declare target link(A, q)
+
+#pragma omp begin declare target
+void
+f (uintptr_t *pA, uintptr_t *pq)
+{
+  if (A[0] != 1 || A[1] != 2 || A[2] != 3 || q != 42)
+__builtin_abort ();
+  A[0] = 13;
+  A[1] = 14;
+  A[2] = 15;
+  q = 23;
+  *pA = (uintptr_t) &A[0];
+  *pq = (uintptr_t) &q;
+}
+#pragma omp end declare target
+
+int
+main ()
+{
+  uintptr_t hpA = (uintptr_t) &A[0];
+  uintptr_t hpq = (uintptr_t) &q;
+  uintptr_t dpA, dpq;
+
+  A[0] = 1;
+  A[1] = 2;
+  A[2] = 3;
+  q = 42;
+
+  for (int i = 0; i <= omp_get_num_devices (); ++i)
+{
+  #pragma omp target device(device_num: i) map(dpA, dpq)
+	f (&dpA, &dpq);
+  if (hpA != dpA || hpq != dpq)
+	__builtin_abort ();
+  if (A[0] != 13 || A[1] != 14 || A[2] != 15 || q != 23)
+	__builtin_abort ();
+  A[0] = 1;
+  A[1] = 2;
+  A[2] = 3;
+  q = 42;
+}
+}


Re: RFC PATCH: contrib/test_summary mode for submitting testsuite results to bunsen

2024-09-14 Thread Eric Gallager
On Fri, Sep 13, 2024 at 4:19 PM Frank Ch. Eigler  wrote:
>
> Hi -
>
> The gcc-testresults mailing list is a well-established place to plop
> snippets of testsuite results.  It's an okay way to archive and
> distribute overall counts, but it's not machine readable, and it's way
> incomplete (lacks .log content, a lot of metadata, barely meaningfully
> searchable) for trying to understand why something failed.
>
> A few years ago, our team @ RH built the bunsen system to serve as a
> structured repository for dejagnu (and other) testsuite types. [1] An
> instance running here [2] has been collecting data from the sourceware
> buildbots and individuals (hi iains!) for years.  It convers all the
> toolchain pieces and then some, including gcc.  It features machine
> readable indexing, cross-referenced test case browsing, regression
> analysis, old data aging, and many other capabilities.  (It even has a
> piece that extracts data from the gcc-testresults@ mailing list
> archives, but there's too little there to bother. [4])
>
> This patch attempts to make it easy for gcc developers who use the
> contrib/test_summary script today to opt in to contributing their
> results to a bunsen server (defaulting to the sourceware one [2]).
>
> The prerequites are:
> - git
> - the bunsen t-upload-git-push script [3], feel free to grab just that
>   into your $PATH
> - sourceware account with bunsendb commit access [ask on
>   admin-reque...@sourceware.org]
> - pretty much nothing else - specifically, gcc commit access is not
>   necessary!
>
> Here's how it looks when it's run after a random small gcc build/test:
> .../contrib/test_summary -h
>   [...]
>  -b: instead of emailing, push test logs into a bunsen git repo
>  -bg REPO: specify the bunsen git repo to override default
>  -bt TAG: specify the bunsen git commit tag to override default
>
> .../contrib/test_summary -b | env PATH=$BUNSEN/INST/bin:$PATH sh -x
> + echo master
> + echo basepoints/gcc-15-3524-ga523c2ba5862
> + echo a523c2ba58621c3630a1cd890d6db82879f92c90
> + echo git://gcc.gnu.org/git/gcc.git
> + find . -name '*.log' -o -name '*.sum' -o -name '.bunsen.*'
> + t-upload-git-push ssh://sourceware.org/git/bunsendb.git/ 
> fche/gcc/x86_64-20240913-1516
> ec57fb8ee928e341d1f0d1b09c1d571fb590bd2b 
> refs/tags/fche/gcc/x86_64-20240913-1516
>
> And here's what that dataset looks like on bunsen a few minutes later:
>
> https://builder.sourceware.org/testrun/ec57fb8ee928e341d1f0d1b09c1d571fb590bd2b
>
> If this is of any interest, I'd be glad to hack on this script further
> to make it acceptable.
>

I pretty much only use `contrib/test_summary` via the
`mail-report.log` target in the top-level Makefile; maybe add a
`bunsen` target to the top-level Makefile, too, to simplify invoking
it?

>
> [1] https://sourceware.org/bunsen/
> [2] https://builder.sourceware.org/testruns/
> [3] https://sourceware.org/git/?p=bunsen.git;a=blob;f=bin/t-upload-git-push
> [4] 
> https://sourceware.org/git/?p=bunsen.git;a=blob;f=bin/t-sourceware-mails-import
>
>
> diff --git a/contrib/ChangeLog b/contrib/ChangeLog
> index 9b36caf02bb1..def8dd8a8a73 100644
> --- a/contrib/ChangeLog
> +++ b/contrib/ChangeLog
> @@ -1,3 +1,9 @@
> +2024-09-13  Frank Ch. Eigler  
> +
> +   * test_summary: Add -b (bunsen) mode to report all test results
> +   into a https://sourceware.org/bunsen/ system instead of emailing
> +   extracts.
> +
>  2024-08-01  Thomas Schwinge  
>
> * gcc_update (files_and_dependencies): Update for
> diff --git a/contrib/test_summary b/contrib/test_summary
> index 5760b053ec27..867ada4d6b81 100755
> --- a/contrib/test_summary
> +++ b/contrib/test_summary
> @@ -39,6 +39,9 @@ if test x"$1" = "x-h"; then
>   should be selected from the log files.
>   -f: force reports to be mailed; if omitted, only reports that differ
>   from the sent.* version are sent.
> + -b: instead of emailing, push test logs into a bunsen git repo
> + -bg REPO: specify the bunsen git repo to override default
> + -bt TAG: specify the bunsen git commit tag to override default
>  _EOF
>exit 0
>  fi
> @@ -57,6 +60,9 @@ fi
>  : ${filesuffix=}; export filesuffix
>  : ${move=true}; export move
>  : ${forcemail=false}; export forcemail
> +: ${bunsen=false};
> +: ${bunsengit=ssh://sourceware.org/git/bunsendb.git/};
> +: ${bunsentag=`whoami`/gcc/`uname -m`-`date +%Y%m%d-%H%M`};
>  while true; do
>  case "$1" in
>-o) filesuffix=.sent; move=false; : ${mailto=nobody}; shift;;
> @@ -64,10 +70,30 @@ while true; do
>-p) prepend_logs=${prepend_logs+"$prepend_logs "}"$2"; shift 2;;
>-i) append_logs=${append_logs+"$append_logs "}"$2"; shift 2;;
>-m) mailto=$2; forcemail=true; shift 2;;
> +  -b) bunsen=true; shift;;
> +  -bg) bunsengit=$2; shift 2;;
> +  -bt) bunsentag=$2; shift 2;;
>-f) unset mailto; forcemail=true; shift;;
>*) break;;
>  esac
>  done
> +if [ "x$bunsen" = "xtrue" ]; then
> +gitsrcdir=`dirname "$0"` # this script,

New Chinese (simplified) PO file for 'gcc' (version 14.2.0)

2024-09-14 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

https://translationproject.org/latest/gcc/zh_CN.po

(This file, 'gcc-14.2.0.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] phi-opt: Improve heuristics for factoring out with constant (again) [PR116699]

2024-09-14 Thread Richard Biener
On Sat, Sep 14, 2024 at 1:02 AM Andrew Pinski  wrote:
>
> The heuristics for factoring out with a constant checks that the assignment 
> statement
> is the last statement of the basic block but sometimes there is a predicate 
> or a nop statement
> after the assignment. Rejecting this case does not make sense since both 
> predicates and nop
> statements are removed and don't contribute any instructions. So we should 
> skip over them
> when checking if the assignment statement was the last statement in the basic 
> block.
>
> phi-opt-factor-1.c's f0 is such an example where it should catch it at 
> phiopt1 (before predicates are removed)
> and should happen in a similar way as f1 (which uses a temporary variable 
> rather than return).


OK

> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/116699
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (factor_out_conditional_operation): Skip over 
> nop/predicates
> for seeing the assignment is the last statement.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-factor-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  .../gcc.dg/tree-ssa/phi-opt-factor-1.c| 26 +++
>  gcc/tree-ssa-phiopt.cc|  6 +
>  2 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c
> new file mode 100644
> index 000..12b846b9337
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -fdump-tree-phiopt" } */
> +
> +/* PR tree-optimization/116699
> +   Make sure the return PREDICT has no factor in deciding
> +   if we factor out the conversion. */
> +
> +short f0(int a, int b, int c)
> +{
> +  int t1 = 4;
> +  if (c < t1)  return (c > -1 ? c : -1);
> +  return t1;
> +}
> +
> +
> +short f1(int a, int b, int c)
> +{
> +  int t1 = 4;
> +  short t = t1;
> +  if (c < t1)  t = (c > -1 ? c : -1);
> +  return t;
> +}
> +
> +/* Both f1 and f0  should be optimized at phiopt1 to the same thing. */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR " 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR " 2  "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index e5413e40572..7b12692237e 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -360,6 +360,12 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
> }
>   gsi = gsi_for_stmt (arg0_def_stmt);
>   gsi_next_nondebug (&gsi);
> + /* Skip past nops and predicates. */
> + while (!gsi_end_p (gsi)
> + && (gimple_code (gsi_stmt (gsi)) == GIMPLE_NOP
> + || gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT))
> +   gsi_next_nondebug (&gsi);
> + /* Reject if the statement was not at the end of the block. */
>   if (!gsi_end_p (gsi))
> return NULL;
> }
> --
> 2.43.0
>


Re: [PATCH] Mark the copy/move constructor/operator= of auto_bitmap as delete

2024-09-14 Thread Richard Biener
On Sat, Sep 14, 2024 at 9:24 AM Andrew Pinski  wrote:
>
> Since we are written in C++11, these should be marked as delete rather
> than just private.

OK

> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * bitmap.h (class auto_bitmap): Mark copy/move constructor/operator=
> as deleted.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/bitmap.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/bitmap.h b/gcc/bitmap.h
> index 4cad1b4d6c6..451edcfc590 100644
> --- a/gcc/bitmap.h
> +++ b/gcc/bitmap.h
> @@ -959,10 +959,10 @@ class auto_bitmap
>
>   private:
>// Prevent making a copy that references our bitmap.
> -  auto_bitmap (const auto_bitmap &);
> -  auto_bitmap &operator = (const auto_bitmap &);
> -  auto_bitmap (auto_bitmap &&);
> -  auto_bitmap &operator = (auto_bitmap &&);
> +  auto_bitmap (const auto_bitmap &) = delete;
> +  auto_bitmap &operator = (const auto_bitmap &) = delete;
> +  auto_bitmap (auto_bitmap &&) = delete;
> +  auto_bitmap &operator = (auto_bitmap &&) = delete;
>
>bitmap_head m_bits;
>  };
> --
> 2.43.0
>


Re: [PATCH] vect: release defs of removed statement

2024-09-14 Thread Richard Biener
On Sat, Sep 14, 2024 at 9:24 AM Andrew Pinski  wrote:
>
> While trying to add use of simple_dce_from_worklist
> to the vectorizer so we don't need to run a full blown
> DCE pass after the vectorizer, there was a crash noticed
> due to a ssa name which has a stmt without a bb. This was
> due to not calling release_defs after the call to gsi_remove.
>
> Note the code to remove zero use statements should be able to
> remove once the use of simple_dce_from_worklist has been added.
> But in the meantime, fixing this bug will also improve memory
> usage and a few other things which look through all ssa names.

OK

> gcc/ChangeLog:
>
> * tree-vect-loop.cc (optimize_mask_stores): Call release_defs
> after the call to gsi_remove with last argument of true.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-vect-loop.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index cc15492f6a0..62c7f90779f 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12803,6 +12803,7 @@ optimize_mask_stores (class loop *loop)
>   if (has_zero_uses (lhs))
> {
>   gsi_remove (&gsi_from, true);
> + release_defs (stmt1);
>   continue;
> }
> }
> --
> 2.43.0
>


Re: [PATCH] c++, coroutines: Fix handling of bool await_suspend() [PR115905].

2024-09-14 Thread Iain Sandoe
Hi Jason,

> On 10 Sep 2024, at 20:22, Jason Merrill  wrote:
> 
> On 9/7/24 6:45 AM, Iain Sandoe wrote:
>> As noted in the PR the action of the existing implementation was to
>> treat a false value from await_suspend () as equivalent to "do not
>> suspend".  Actually it needs to be the equivalent of "resume" - and
>> we need to restart the dispatcher - since the await_suspend() body
>> could have already resumed the coroutine.
>> See also https://github.com/cplusplus/CWG/issues/601 (NAD) for more
>> discussion.
> 
> I'm having trouble wrapping my head around this stuff, so I'll take your word 
> for it.  :)  It would be nice to have more full pseudocode.

It was not 100% clear to any of the implementers it seems - since all of them 
got this wrong.

There are two pseudo-codes, I suppose:  one that reflects the compile-time 
program flow (since we have to react differently to various of the 
customisation points) and one that reflects the user’s runtime flow (which also 
depends on the customisation points).

Will think about how to expand this some more - maybe in a larger block at the 
start of the file.

>> + /* Finish the destory dispatcher. */
> 
> Typo.
fixed,
>> 
>> +  /* Build the dispatcher:
>> + if (resume index is odd)
>> +{
>> +  switch (resume index)
>> +   case 1:
>> + goto cleanup.
>> +   case ... odd suspension point number
>> + .CO_ACTOR (... odd suspension point number)
>> + break;
>> +   default:
>> + break;
>> +}
>> +  else
>> +{
>> +  coro.restart.dispatch:
>> +   case 0:
>> + goto start.
>> +   case ... even suspension point number
>> + .CO_ACTOR (... even suspension point number)
>> + break;
>> +   default:
>> + break;
>> +}
>> +we should not get here unless something is broken badly.
>> + __builtin_trap ();
>> +*/
> 
> Maybe mention in this that the odd case is destroy and the even case is 
> resume?

done.

>  Why is it useful to have two separate switches when they have the same 
> default: behavior?

At some (reasonably soon) point I want to try adjusting the call to the actor  
so that it takes
a “destroy” parameter (and then to have a small shim for the resume() call - 
like the one we
have for the destroy()).  The investigation there is to whether we can get 
better inlining
when the compiler does not have to look through the indirection.  It does not 
work well at the
moment (too little inlining) and neither does it work well if we disable 
inlining limits (too much).
> 
>> +  /* For the case of a boolean await_resume () that returns 'true' we should
>> + restart the dispatch, since we cannot know if additional resumes were
>> + executed from within the await_resume function.  */
> 
> Do you mean await_suspend here?

thanks, fixed.
> 
> OK with at least the typo fixed.

Pushed as attached.
- some expanded pseudo-code/description to follow in due course.

thanks
Iain


0001-c-coroutines-Fix-handling-of-bool-await_suspend-PR11.patch
Description: Binary data