[PATCH] range-op-float: Fix up -ffinite-math-only range extension and don't extend into infinities [PR109008]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch does two things (both related to range extension
around the boundaries).

The first part (in the 2 real_isfinite blocks) makes the ranges
narrower when the old boundaries are the minimum and/or maximum
representable finite numbers.  In that case frange_nextafter gives -Inf
or +Inf, but then the computed reverse range is very far from the range
actually needed: it usually extends up to infinity or could even result
in NaNs.
While infinities are really the next representable numbers in the
corresponding mode, REAL_VALUE_TYPE is actually a type with wider range
for exponent and 160 bit precision, so the patch instead uses
nextafter number in a hypothetical floating point format with the same
mantissa precision but wider range of exponents.  This significantly
improves the actual ranges of the reverse operations, while still making
them conservatively correct.
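
To illustrate the idea outside of GCC, here is a minimal sketch that uses
float as the mode and double as a stand-in for the wider REAL_VALUE_TYPE
(both are assumptions made for portability; the function names are
hypothetical and none of this is GCC code).  One ulp above FLT_MAX in a
format with the same 24-bit mantissa but a wider exponent range is 2^128,
which is 0x0.8p+(EMAX+1) if EMAX is taken to be 128 for IEEE single, as I
believe GCC's convention has it.

```cpp
#include <cfloat>
#include <cmath>

// What plain nextafter gives at the top of the mode's range: +Inf.
float next_in_mode(float x) {
  return std::nextafter(x, INFINITY);
}

// Next value above FLT_MAX in a hypothetical format with float's 24-bit
// mantissa but a wider exponent range: FLT_MAX plus one ulp, i.e. 2^128.
// The sum is exact in double.
double next_in_wider_exponent_format() {
  const double ulp_of_flt_max = std::ldexp(1.0, 127 - 23);  // 2^104
  return static_cast<double>(FLT_MAX) + ulp_of_flt_max;
}
```

The first function jumps an infinite distance, the second only one ulp,
which is what keeps the reverse ranges tight.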

The second part is a fix for miscompilation of the new testcase below.
With -ffinite-math-only, without this patch we extend the minimum and/or
maximum representable finite number to -Inf or +Inf (with the patch, to
some number outside of the normal exponent range of the mode), but then
we use set, which canonicalizes the range and turns the boundaries back
into the minimum and/or maximum representable finite numbers.  That is
too narrow: in say [__DBL_MAX__, __DBL_MAX__] = op1 + [__DBL_MAX__, __DBL_MAX__]
op1 can be larger than 0, up to the largest number for which the sum
still rounds (to even) back down to __DBL_MAX__, and since no infinities
are involved, this needs to work even with -ffinite-math-only.  So, we
really need to widen the lhs range a little bit even in that case.

The patch does that by temporarily clearing -ffinite-math-only, so that
the value with infinities or with the out of bounds values passes the
setting and verification.  (The VR_VARYING case is needed because we get
ICEs otherwise; when lhs is VR_VARYING in -ffast-math, i.e. minimum to
maximum representable finite and both signs of NaN, set does all we need
and we don't need to or in a NaN range.)  Later on we don't use the range
in a way where it being wider than varying would be a problem; we just
perform maths on the two boundaries.
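
A minimal sketch of why op1 is not just 0 in that equation, assuming IEEE
double and the default round-to-nearest mode (the helper names are
hypothetical, this is not the testcase from the patch):

```cpp
#include <cfloat>
#include <cmath>

// Returns DBL_MAX + op1 rounded to double.  The volatile store is only
// there to discard any excess precision the target might keep in registers.
double add_to_dbl_max(double op1) {
  volatile double sum = DBL_MAX + op1;
  return sum;
}

// True if op1 is a solution of DBL_MAX == op1 + DBL_MAX.
bool keeps_dbl_max(double op1) {
  return add_to_dbl_max(op1) == DBL_MAX;
}
```

For instance keeps_dbl_max (1.0) and keeps_dbl_max (0x1p969) should both
hold, since those addends are below half an ulp of DBL_MAX, and no
infinity ever appears in the computation.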

As I said in the PR, this doesn't fix the !MODE_HAS_INFINITIES case.
I believe we actually need to treat the boundary values as infinities
in that case, because they (probably) work like that, but it is unclear
whether it is just the reverse operation lhs widening that is a problem
there or whether it is a general problem.  I have zero experience with
floating point formats without infinities (PDP11, some ARM half type?,
what else?).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-10  Jakub Jelinek  

PR tree-optimization/109008
* range-op-float.cc (float_widen_lhs_range): If lb is
minimum representable finite number or ub is maximum
representable finite number, instead of widening it to
-inf or inf widen it to negative or positive 0x0.8p+(EMAX+1).
Temporarily clear flag_finite_math_only when canonicalizing
the widened range.

* gcc.dg/pr109008.c: New test.

--- gcc/range-op-float.cc.jj 2023-03-09 09:54:53.880453046 +0100
+++ gcc/range-op-float.cc   2023-03-09 20:52:07.456284507 +0100
@@ -2217,12 +2217,42 @@ float_widen_lhs_range (tree type, const
   REAL_VALUE_TYPE lb = lhs.lower_bound ();
   REAL_VALUE_TYPE ub = lhs.upper_bound ();
   if (real_isfinite (&lb))
-    frange_nextafter (TYPE_MODE (type), lb, dconstninf);
+    {
+      frange_nextafter (TYPE_MODE (type), lb, dconstninf);
+      if (real_isinf (&lb))
+        {
+          /* For -DBL_MAX, instead of -Inf use
+             nexttoward (-DBL_MAX, -LDBL_MAX) in a hypothetical
+             wider type with the same mantissa precision but larger
+             exponent range; it is outside of range of double values,
+             but makes it clear it is just one ulp larger rather than
+             infinite amount larger.  */
+          lb = dconstm1;
+          SET_REAL_EXP (&lb, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
+        }
+    }
   if (real_isfinite (&ub))
-    frange_nextafter (TYPE_MODE (type), ub, dconstinf);
+    {
+      frange_nextafter (TYPE_MODE (type), ub, dconstinf);
+      if (real_isinf (&ub))
+        {
+          /* For DBL_MAX similarly.  */
+          ub = dconst1;
+          SET_REAL_EXP (&ub, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
+        }
+    }
+  /* Temporarily disable -ffinite-math-only, so that frange::set doesn't
+     reduce the range back to real_min_representable (type) as lower bound
+     or real_max_representable (type) as upper bound.  */
+  bool save_flag_finite_math_only = flag_finite_math_only;
+  flag_finite_math_only = false;
   ret.set (type, lb, ub);
-  ret.clear_nan ();
-  ret.union_ (lhs);
+  if (lhs.kind () != VR_VARYING)
+    {
+      ret.clear_nan ();
+      ret.union_ (lhs);
+    }
+  flag_finite_math_only = save_flag_fini

[PATCH] RISC-V: Fix ICE of RVV compare intrinsic

2023-03-10 Thread juzhe . zhong
From: Ju-Zhe Zhong 

vfrsub_vf_m.cpp: In function 'int main()': 
vfrsub_vf_m.cpp:5:43: error: invalid argument to built-in function 
   5 |   vbool32_t d = __riscv_vmflt_vf_f32m1_b32(c, b, 8); 
 | ~~^ 
during RTL pass: expand 
vfrsub_vf_m.cpp:5:43: internal compiler error: Segmentation fault 
0x19f1b89 crash_signal 
   ../../../../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/toplev.cc:314 
0x1472e2f store_expr(tree_node*, rtx_def*, int, bool, bool) 
   ../../../../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/expr.cc:6348

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc 
(function_expander::use_compare_insn): Add operand predicate check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-1.c: New test.

---
 gcc/config/riscv/riscv-vector-builtins.cc |  9 +++
 .../gcc.target/riscv/rvv/base/bug-1.c | 79 +++
 2 files changed, 88 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-1.c

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index fcda3863576..75e65091db3 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3084,6 +3084,15 @@ function_expander::use_compare_insn (rtx_code rcode, 
insn_code icode)
 
   rtx op1 = expand_normal (CALL_EXPR_ARG (exp, arg_offset++));
   rtx op2 = expand_normal (CALL_EXPR_ARG (exp, arg_offset++));
+  if (!insn_operand_matches (icode, opno + 1, op1))
+    op1 = force_reg (mode, op1);
+  if (!insn_operand_matches (icode, opno + 2, op2))
+    {
+      if (VECTOR_MODE_P (GET_MODE (op2)))
+        op2 = force_reg (mode, op2);
+      else
+        op2 = force_reg (GET_MODE_INNER (mode), op2);
+    }
   rtx comparison = gen_rtx_fmt_ee (rcode, mask_mode, op1, op2);
   if (!VECTOR_MODE_P (GET_MODE (op2)))
     comparison = gen_rtx_fmt_ee (rcode, mask_mode, op1,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-1.c
new file mode 100644
index 000..a8843674e31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-1.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O0" } */
+
+#include "riscv_vector.h"
+
+int
+f0 ()
+{
+  float b;
+  vfloat32m1_t c;
+  vbool32_t d = __riscv_vmflt_vf_f32m1_b32 (c, b, 8);
+  return 0;
+}
+
+int
+f1 ()
+{
+  vfloat32m1_t c;
+  vbool32_t d = __riscv_vmflt_vf_f32m1_b32 (c, 0, 8);
+  return 0;
+}
+
+int
+f2 ()
+{
+  vfloat32m1_t c;
+  vbool32_t d = __riscv_vmflt_vf_f32m1_b32 (c, 55.55, 8);
+  return 0;
+}
+
+int
+f3 ()
+{
+  int32_t b;
+  vint32m1_t c;
+  vbool32_t d = __riscv_vmseq_vx_i32m1_b32 (c, b, 8);
+  return 0;
+}
+
+int
+f4 ()
+{
+  vint32m1_t c;
+  vbool32_t d = __riscv_vmseq_vx_i32m1_b32 (c, 11, 8);
+  return 0;
+}
+
+int
+f5 ()
+{
+  int64_t b;
+  vint64m1_t c;
+  vbool64_t d = __riscv_vmseq_vx_i64m1_b64 (c, b, 8);
+  return 0;
+}
+
+int
+f6 ()
+{
+  vint64m1_t c;
+  vbool64_t d = __riscv_vmseq_vx_i64m1_b64 (c, 11, 8);
+  return 0;
+}
+
+int
+f7 ()
+{
+  vint64m1_t c;
+  vbool64_t d = __riscv_vmseq_vx_i64m1_b64 (c, 0x, 8);
+  return 0;
+}
+
+int
+f8 ()
+{
+  vint64m1_t c;
+  vbool64_t d = __riscv_vmseq_vx_i64m1_b64 (c, 0xAA, 8);
+  return 0;
+}
-- 
2.36.3



[PATCH] range-op-float: Extend lhs by 0.5ulp rather than 1ulp if not -frounding-math [PR109008]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch, incremental to the just posted one, improves the reverse
operation ranges significantly by widening by just 0.5ulp in each
direction rather than 1ulp.  Again, REAL_VALUE_TYPE has both a wider
exponent range and a wider mantissa precision (160 bits) than any
supported type; this patch uses the latter property.

The patch doesn't do it for -frounding-math, because then the rounding
can be +-1ulp in each direction depending on the rounding mode, which
we don't know, nor for IBM double double, because that type is just weird
and we can't rely on it having sane properties.
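
As a rough illustration of the 0.5ulp widening (a sketch only, with float
as the mode and double standing in for the wider-precision
REAL_VALUE_TYPE; the function names are hypothetical), the new bound is
the midpoint between the old bound and its neighbour:

```cpp
#include <cmath>

// Lower bound widened by 0.5ulp: midpoint of x and its predecessor.
// Both operands are floats, so the sum and the halving are exact in double.
double widen_lower_half_ulp(float x) {
  double prev = std::nextafter(x, -INFINITY);
  return (static_cast<double>(x) + prev) / 2.0;
}

// Upper bound widened by 0.5ulp: midpoint of x and its successor.
double widen_upper_half_ulp(float x) {
  double next = std::nextafter(x, INFINITY);
  return (static_cast<double>(x) + next) / 2.0;
}
```

For x = 1.0f this gives [1 - 0x1p-25, 1 + 0x1p-24], so the reverse range
of op1 in [1., 1.] = op1 + [1., 1.] becomes [-0x1p-25, 0x1p-24], the float
analogue of the [-0x1.0p-54, 0x1.0p-53] range for double.  The sketch
ignores the FLT_MAX boundary, which the patch handles separately.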

I've performed testing of these 2 patches on 30 random tests as with
yesterday's patch, exact numbers are in the PR, but I see very significant
improvement in the precision of the ranges while keeping it conservatively
correct.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-10  Jakub Jelinek  

PR tree-optimization/109008
* range-op-float.cc (float_widen_lhs_range): If not
-frounding-math and not IBM double double format, extend lhs
range just by 0.5ulp rather than 1ulp in each direction.

--- gcc/range-op-float.cc.jj 2023-03-09 12:13:57.189790814 +0100
+++ gcc/range-op-float.cc   2023-03-09 13:12:05.248873234 +0100
@@ -2205,8 +2205,8 @@ zero_to_inf_range (REAL_VALUE_TYPE &lb,
[1., 1.] = op1 + [1., 1.].  op1's range is not [0., 0.], but
[-0x1.0p-54, 0x1.0p-53] (when not -frounding-math), any value for
which adding 1. to it results in 1. after rounding to nearest.
-   So, for op1_range/op2_range extend the lhs range by 1ulp in each
-   direction.  See PR109008 for more details.  */
+   So, for op1_range/op2_range extend the lhs range by 1ulp (or 0.5ulp)
+   in each direction.  See PR109008 for more details.  */
 
 static frange
 float_widen_lhs_range (tree type, const frange &lhs)
@@ -2230,6 +2230,14 @@ float_widen_lhs_range (tree type, const
           lb = dconstm1;
           SET_REAL_EXP (&lb, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
         }
+      if (!flag_rounding_math && !MODE_COMPOSITE_P (TYPE_MODE (type)))
+        {
+          /* If not -frounding-math nor IBM double double, actually widen
+             just by 0.5ulp rather than 1ulp.  */
+          REAL_VALUE_TYPE tem;
+          real_arithmetic (&tem, PLUS_EXPR, &lhs.lower_bound (), &lb);
+          real_arithmetic (&lb, RDIV_EXPR, &tem, &dconst2);
+        }
     }
   if (real_isfinite (&ub))
     {
@@ -2240,6 +2248,14 @@ float_widen_lhs_range (tree type, const
           ub = dconst1;
           SET_REAL_EXP (&ub, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
         }
+      if (!flag_rounding_math && !MODE_COMPOSITE_P (TYPE_MODE (type)))
+        {
+          /* If not -frounding-math nor IBM double double, actually widen
+             just by 0.5ulp rather than 1ulp.  */
+          REAL_VALUE_TYPE tem;
+          real_arithmetic (&tem, PLUS_EXPR, &lhs.upper_bound (), &ub);
+          real_arithmetic (&ub, RDIV_EXPR, &tem, &dconst2);
+        }
     }
   /* Temporarily disable -ffinite-math-only, so that frange::set doesn't
  reduce the range back to real_min_representable (type) as lower bound

Jakub



Re: [PATCH] Extend nops num in "maybe_gen_insn" for RISC-V Vector intrinsics

2023-03-10 Thread Kito Cheng via Gcc-patches
Committed to trunk, thanks :)

On Wed, Mar 8, 2023 at 3:49 PM Richard Biener  wrote:
>
> On Wed, 8 Mar 2023, juzhe.zh...@rivai.ai wrote:
>
> > From: Ju-Zhe Zhong 
> >
> > Hi, currently maybe_gen_insn can only expand up to 9 operands (nops).
> > For RVV intrinsics I need to extend that to 10; otherwise I have to
> > use GEN_FCN directly.
> > This patch is a fairly obvious change.  OK for trunk?
>
> The optabs.cc change is OK.
>
> Thanks,
> Richard.
>
> > Thanks.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-vector-builtins.cc 
> > (function_expander::use_ternop_insn): Use maybe_gen_insn instead.
> > (function_expander::use_widen_ternop_insn): Ditto.
> > * optabs.cc (maybe_gen_insn): Extend nops handling.
> >
> > ---
> >  gcc/config/riscv/riscv-vector-builtins.cc | 24 ++-
> >  gcc/optabs.cc |  5 +
> >  2 files changed, 7 insertions(+), 22 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> > b/gcc/config/riscv/riscv-vector-builtins.cc
> > index 60381cfe98f..fcda3863576 100644
> > --- a/gcc/config/riscv/riscv-vector-builtins.cc
> > +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> > @@ -3154,17 +3154,7 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
> > insn_code icode)
> >add_input_operand (Pmode, get_tail_policy_for_pred (pred));
> >add_input_operand (Pmode, get_mask_policy_for_pred (pred));
> >add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
> > -
> > -  /* See optabs.cc, the maximum nops is 9 for using 'maybe_gen_insn'.
> > - We temporarily use GCN directly. We will change it back it we
> > - can support nops >= 10.  */
> > -  gcc_assert (maybe_legitimize_operands (icode, 0, opno, m_ops));
> > -  rtx_insn *pat = GEN_FCN (
> > -icode) (m_ops[0].value, m_ops[1].value, m_ops[2].value, m_ops[3].value,
> > - m_ops[4].value, m_ops[5].value, m_ops[6].value, m_ops[7].value,
> > - m_ops[8].value, m_ops[9].value);
> > -  emit_insn (pat);
> > -  return m_ops[0].value;
> > +  return generate_insn (icode);
> >  }
> >
> >  /* Implement the call using instruction ICODE, with a 1:1 mapping between
> > @@ -3196,17 +3186,7 @@ function_expander::use_widen_ternop_insn (insn_code 
> > icode)
> >add_input_operand (Pmode, get_tail_policy_for_pred (pred));
> >add_input_operand (Pmode, get_mask_policy_for_pred (pred));
> >add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
> > -
> > -  /* See optabs.cc, the maximum nops is 9 for using 'maybe_gen_insn'.
> > - We temporarily use GCN directly. We will change it back it we
> > - can support nops >= 10.  */
> > -  gcc_assert (maybe_legitimize_operands (icode, 0, opno, m_ops));
> > -  rtx_insn *pat = GEN_FCN (
> > -icode) (m_ops[0].value, m_ops[1].value, m_ops[2].value, m_ops[3].value,
> > - m_ops[4].value, m_ops[5].value, m_ops[6].value, m_ops[7].value,
> > - m_ops[8].value, m_ops[9].value);
> > -  emit_insn (pat);
> > -  return m_ops[0].value;
> > +  return generate_insn (icode);
> >  }
> >
> >  /* Implement the call using instruction ICODE, with a 1:1 mapping between
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > index cf22bfec3f5..4c641cab192 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -8091,6 +8091,11 @@ maybe_gen_insn (enum insn_code icode, unsigned int 
> > nops,
> >return GEN_FCN (icode) (ops[0].value, ops[1].value, ops[2].value,
> > ops[3].value, ops[4].value, ops[5].value,
> > ops[6].value, ops[7].value, ops[8].value);
> > +case 10:
> > +  return GEN_FCN (icode) (ops[0].value, ops[1].value, ops[2].value,
> > +   ops[3].value, ops[4].value, ops[5].value,
> > +   ops[6].value, ops[7].value, ops[8].value,
> > +   ops[9].value);
> >  }
> >gcc_unreachable ();
> >  }
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)


Re: [PATCH] testsuite, rs6000: Adjust ppc-fortran.exp to support dg-{warning,error}

2023-03-10 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen,
  I tested it with my fortran test case. It works. Thanks a lot.

Gui Haochen

On 2023/3/6 17:27, Kewen.Lin wrote:
> Hi,
> 
> According to Haochen's finding in [1], ppc-fortran.exp currently
> doesn't support Fortran specific warning or error messages well.
> Looking into it, this is because gfortran uses different
> warning/error prefixes, as follows:
> 
> set gcc_warning_prefix "\[Ww\]arning:"
> set gcc_error_prefix "(Fatal )?\[Ee\]rror:"
> 
> compared to:
> 
> set gcc_warning_prefix "warning:"
> set gcc_error_prefix "(fatal )?error:"
> 
> So this patch overrides these two prefixes so that the
> dg-{warning,error} checks are supported.
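
To make the difference between the two sets of prefixes concrete, here is
a small sketch using std::regex in place of Tcl's regexp (an assumption;
for patterns this simple the two should behave the same).  The backslashes
in the .exp file only protect the brackets from Tcl command substitution,
so the regexps themselves are [Ww]arning: and (Fatal )?[Ee]rror:.  The
helper and constant names are hypothetical.

```cpp
#include <regex>
#include <string>

// True if `line` contains a diagnostic prefix matching `prefix_pattern`.
bool has_prefix(const std::string& prefix_pattern, const std::string& line) {
  return std::regex_search(line, std::regex(prefix_pattern));
}

// Prefix patterns as the regexp engine sees them.
const std::string kFortranWarning = "[Ww]arning:";
const std::string kFortranError = "(Fatal )?[Ee]rror:";
const std::string kDefaultWarning = "warning:";
```

With these, a capitalized "Warning: ..." line matches kFortranWarning but
not kDefaultWarning, which is why the checks needed the override.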
> 
> Tested on powerpc64-linux-gnu P7/P8/P9 and
> powerpc64le-linux-gnu P9/P10.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613302.html
> 
> BR,
> Kewen
> -
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: Override
>   gcc_{warning,error}_prefix with Fortran specific one used in
>   gfortran_init.
> ---
>  gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp 
> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
> index a556d7b48a3..f7e99ac8487 100644
> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
> @@ -58,6 +58,11 @@ proc dg-compile-aux-modules { args } {
>  }
>  }
> 
> +# Override gcc_{warning,error}_prefix with Fortran specific prefixes used
> +# in gfortran_init to support dg-{warning,error} checks.
> +set gcc_warning_prefix "\[Ww\]arning:"
> +set gcc_error_prefix "(Fatal )?\[Ee\]rror:"
> +
>  # Main loop.
>  gfortran-dg-runtest [lsort \
> [glob -nocomplain $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ] ] "" 
> $DEFAULT_FFLAGS
> --
> 2.39.1


Re: AArch64 bfloat16 mangling

2023-03-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 09, 2023 at 05:14:11PM +, Richard Sandiford wrote:
> We decided to keep the current mangling of __bf16 and use it for
> std::bfloat16_t too.  __bf16 will become a non-standard arithmetic type.
> This will be an explicit diversion from the Itanium ABI.
> 
> I think that's equivalent to your (2) without the part about following
> the Itanium ABI.

I'm afraid I have no idea how the above can work though.

Diversion from the Itanium ABI is doable, we have various examples
where we mangle things differently, say on powerpc* where
long double is mangled as g if it is IBM double double format (i.e.
Itanium __float128) while for long double the spec says e, and
as u9__ieee128 if it is IEEE quad.  __float128 also mangles as
u9__ieee128 and so does __ieee128.

The problem is that if __bf16 needs to be treated differently from
decltype (0.0bf16) aka std::bfloat16_t (the former being a non-standard
arithmetic type, the latter being a C++23 extended floating-point type),
then they need to be distinct types.  And distinct types need to
mangle differently.  Consider
#include <stdfloat>
template <typename T>
void bar () {}
void baz ()
{
  bar<__bf16> ();
  bar<decltype (0.0bf16)> ();
  bar<std::bfloat16_t> ();
}
If __bf16 is distinct from the latter two which are the same type,
then it will instantiate bar twice, for both of those types, but
if they are mangled the same, will emit two functions with the same
name and assembler will reject it (or LTO might ICE etc.).
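
The instantiation behaviour itself can be sketched with ordinary types
(float and double as stand-ins for the bf16 types, which is an assumption
made so the example compiles everywhere; the names are hypothetical):

```cpp
// Each distinct type T gets its own instantiation, and therefore its own
// function-local static; an alias of the same type shares it.
template <typename T>
int* instantiation_slot() {
  static int slot = 0;
  return &slot;
}

using alias_of_float = float;  // same type as float, not a distinct one
```

instantiation_slot<float> and instantiation_slot<alias_of_float> are one
function, while instantiation_slot<double> is a second one that needs its
own symbol name, hence the requirement for distinct mangling.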

Note, e.g.
void foo (__float128, __ieee128, long double, _Float128) {}
template <typename T>
void bar () {}
void baz ()
{
  bar <__float128> ();
  bar <__ieee128> ();
  bar <long double> ();
}
works on powerpc64le-linux with -mlong-double-128 -mabi=ieeelongdouble
because __float128, __ieee128 and long double types are in that case
the same type, not distinct, so e.g. bar is instantiated just once
(only _Float128 mangles differently above).  With
-mlong-double-128 -mabi=ibmlongdouble __float128 and __ieee128 are
the same (non-standard) type, while long double mangles differently
(g) and _Float128 too, so bar is instantiated twice.

So, either __bf16 should be also extended floating-point type
like decltype (0.0bf16) and std::bfloat16_t and in that case
it is fine if it mangles u6__bf16, or __bf16 will be a distinct
type from the latter two, __bf16 non-standard arithmetic type
while the latter two extended floating-point types, but then
they need to mangle differently, most likely u6__bf16 vs. DF16b.

Jakub



Re: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-03-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
> Here's the respun patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/108583
>   * target.def (preferred_div_as_shifts_over_mult): New.
>   * doc/tm.texi.in: Document it.
>   * doc/tm.texi: Regenerate.
>   * targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
>   * targhooks.h (default_preferred_div_as_shifts_over_mult): New.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/108583
>   * gcc.dg/vect/vect-div-bitmask-4.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-5.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> 50a8872a6695b18b9bed0d393bacf733833633db..bf7269e323de1a065d4d04376e5a2703cbb0f9fa
>  100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6137,6 +6137,12 @@ instruction pattern.  There is no need for the hook to 
> handle these two
>  implementation approaches itself.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool 
> TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT (const_tree @var{type})
> +Sometimes it is possible to implement a vector division using a sequence
> +of two addition-shift pairs, giving four instructions in total.
> +Return true if taking this approach for @var{vectype} is likely
> +to be better than using a sequence involving highpart multiplication.
> +Default is false if @code{can_mult_highpart_p}, otherwise true.
>  @end deftypefn
>  
>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
> (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 
> 3e07978a02f4e6077adae6cadc93ea4273295f1f..0051017a7fd67691a343470f36ad4fc32c8e7e15
>  100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4173,6 +4173,7 @@ address;  but often a machine-dependent strategy can 
> generate better code.
>  
>  @hook TARGET_VECTORIZE_VEC_PERM_CONST
>  
> +@hook TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
>  
>  @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>  
> diff --git a/gcc/target.def b/gcc/target.def
> index 
> e0a5c7adbd962f5d08ed08d1d81afa2c2baa64a5..e4474a3ed6bd2f5f5c010bf0d40c2a371370490c
>  100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -1868,6 +1868,18 @@ correct for most targets.",
>   poly_uint64, (const_tree type),
>   default_preferred_vector_alignment)
>  
> +/* Returns whether the target has a preference for decomposing divisions 
> using
> +   shifts rather than multiplies.  */
> +DEFHOOK
> +(preferred_div_as_shifts_over_mult,
> + "Sometimes it is possible to implement a vector division using a sequence\n\
> +of two addition-shift pairs, giving four instructions in total.\n\
> +Return true if taking this approach for @var{vectype} is likely\n\
> +to be better than using a sequence involving highpart multiplication.\n\
> +Default is false if @code{can_mult_highpart_p}, otherwise true.",
> + bool, (const_tree type),
> + default_preferred_div_as_shifts_over_mult)
> +
>  /* Return true if vector alignment is reachable (by peeling N
> iterations) for the given scalar type.  */
>  DEFHOOK
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 
> a6a4809ca91baa5d7fad2244549317a31390f0c2..a207963b9e6eb9300df0043e1b79aa6c941d0f7f
>  100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -53,6 +53,8 @@ extern scalar_int_mode default_unwind_word_mode (void);
>  extern unsigned HOST_WIDE_INT default_shift_truncation_mask
>(machine_mode);
>  extern unsigned int default_min_divisions_for_recip_mul (machine_mode);
> +extern bool default_preferred_div_as_shifts_over_mult
> +  (const_tree);
>  extern int default_mode_rep_extended (scalar_int_mode, scalar_int_mode);
>  
>  extern tree default_stack_protect_guard (void);
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index 
> 211525720a620d6f533e2da91e03877337a931e7..7f39ff9b7ec2bf66625d48a47bb76e96c05a3233
>  100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -1483,6 +1483,15 @@ default_preferred_vector_alignment (const_tree type)
>return TYPE_ALIGN (type);
>  }
>  
> +/* The default implementation of
> +   TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT.  */
> +
> +bool
> +default_preferred_div_as_shifts_over_mult (const_tree type)
> +{
> +  return can_mult_highpart_p (TYPE_MODE (type), TYPE_UNSIGNED (type));

The return value should be inverted.
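
In other words, going by the documentation text ("Default is false if
can_mult_highpart_p, otherwise true"), the default would look roughly like
the sketch below.  This is a standalone model, not the targhooks.cc code:
the bool parameter stands in for the can_mult_highpart_p query, and the
division by 0xff is only an assumed example of what "two addition-shift
pairs" can look like for 16-bit unsigned values.

```cpp
#include <cstdint>

// Hypothetical stand-in for the default hook: shifts are preferred only
// when highpart multiplication is NOT available.
bool default_prefers_shifts(bool can_mult_highpart) {
  return !can_mult_highpart;
}

// Illustrative "two addition-shift pairs": unsigned 16-bit division by
// 0xff, computed in a wider type so that x + 257 cannot overflow.
std::uint32_t div_by_255_with_shifts(std::uint16_t x) {
  std::uint32_t wide = x;
  std::uint32_t t = (wide + 257u) >> 8;  // first add + shift
  return (wide + t) >> 8;                // second add + shift
}

// Exhaustively compares the shift sequence against plain division.
bool shifts_match_division_everywhere() {
  for (std::uint32_t x = 0; x <= 0xffffu; ++x)
    if (div_by_255_with_shifts(static_cast<std::uint16_t>(x)) != x / 255u)
      return false;
  return true;
}
```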

> +}
> +
>  /* By default assume vectors of element TYPE require a multiple of the 
> natural
> alignment of TYPE.  TYPE is naturally aligned if IS_PACKED is false.  */
>  bool
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c 
> b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
> new file mode 100644
> index 
> ..c81f8946922250234bf759e0a0a04ea8c1f73e3c
> --- /de

Re: AArch64 bfloat16 mangling

2023-03-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Thu, Mar 09, 2023 at 05:14:11PM +, Richard Sandiford wrote:
>> We decided to keep the current mangling of __bf16 and use it for
>> std::bfloat16_t too.  __bf16 will become a non-standard arithmetic type.
>> This will be an explicit diversion from the Itanium ABI.
>> 
>> I think that's equivalent to your (2) without the part about following
>> the Itanium ABI.
>
> I'm afraid I have no idea how the above can work though.
>
> Diversion from the Itanium ABI is doable, we have various examples
> where we mangle things differently, say on powerpc* where
> long double is mangled as g if it is IBM double double format (i.e.
> Itanium __float128) while for long double the spec says e, and
> as u9__ieee128 if it is IEEE quad.  __float128 also mangles as
> u9__ieee128 and so does __ieee128.
>
> The problem is that if __bf16 needs to be treated differently from
> decltype (0.0bf16) aka std::bfloat16_t (the former being a non-standard
> arithmetic type, the latter being a C++23 extended floating-point type),
> then they need to be distinct types.  And distinct types need to
> mangle differently.  Consider
> #include <stdfloat>
> template <typename T>
> void bar () {}
> void baz ()
> {
>   bar<__bf16> ();
>   bar<decltype (0.0bf16)> ();
>   bar<std::bfloat16_t> ();
> }
> If __bf16 is distinct from the latter two which are the same type,
> then it will instantiate bar twice, for both of those types, but
> if they are mangled the same, will emit two functions with the same
> name and assembler will reject it (or LTO might ICE etc.).
>
> Note, e.g.
> void foo (__float128, __ieee128, long double, _Float128) {}
> template <typename T>
> void bar () {}
> void baz ()
> {
>   bar <__float128> ();
>   bar <__ieee128> ();
>   bar <long double> ();
> }
> works on powerpc64le-linux with -mlong-double-128 -mabi=ieeelongdouble
> because __float128, __ieee128 and long double types are in that case
> the same type, not distinct, so e.g. bar is instantiated just once
> (only _Float128 mangles differently above).  With
> -mlong-double-128 -mabi=ibmlongdouble __float128 and __ieee128 are
> the same (non-standard) type, while long double mangles differently
> (g) and _Float128 too, so bar is instantiated twice.
>
> So, either __bf16 should be also extended floating-point type
> like decltype (0.0bf16) and std::bfloat16_t and in that case
> it is fine if it mangles u6__bf16, or __bf16 will be a distinct
> type from the latter two,

Yeah, the former is what I meant.  The intention is that __bf16 and
std::bfloat16_t are the same type, not distinct types.

Richard

> __bf16 non-standard arithmetic type
> while the latter two extended floating-point types, but then
> they need to mangle differently, most likely u6__bf16 vs. DF16b.
>
>   Jakub


Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-10 Thread Iain Sandoe via Gcc-patches
Hi all,

> On 9 Mar 2023, at 23:35, Sandra Loosemore via Gcc-patches 
>  wrote:
> 
> On 3/9/23 01:26, Richard Biener wrote:
> 
>> SLES 12 has texinfo 4.13a, SLES 15 has texinfo 6.5.  We still provide
>> up-to-date GCC for SLES 12 but we can probably manage in some ways
>> when the texinfo requirement gets bumped.
> 
> OK, this seems to be the oldest version anyone admits to actually using.  I 
> built the manual with Arsen's patches using 4.13a; the build was successful, 
> and I didn't see any obvious issues with the @gol removal in either the PDF 
> or HTML output, so I think we are OK for backward compatibility.

FWIW macOS/Darwin (as delivered by Apple) is stuck on 4.8 (and, presumably, 
very unlikely to advance), but I would expect most macOS FOSS users have 
something newer installed, either self-built or via MacPorts/Homebrew etc., so 
the “admits to actually using” applies here too I think (personally, I am using 
6.7 but not for any special reason other than it was current when I updated my 
local toolset).  So I think Darwin can also manage with a newer requirement.

thanks
Iain

> I will work up a patch to remove the references to version 4.7 and replace it 
> with some generic language as I suggested earlier, that won't be so prone to 
> bit rot.
> 
> -Sandra



Patch ping - [PATCH] file-prefix-map: Fix up -f*-prefix-map= [PR108464]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping these patches.  All 3 variants have been
bootstrapped/regtested on x86_64-linux and i686-linux, the last
one is my preference I guess.  The current state breaks e.g. ccache.

https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610285.html
  - PR108464 - P1 - file-prefix-map: Fix up -f*-prefix-map= (3 variants)

Thanks
Jakub

On Fri, Jan 20, 2023 at 04:05:55PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 01, 2022 at 01:46:20PM -0600, Jeff Law via Gcc-patches wrote:
> > > This does cause a change of behaviour if users were previously relying 
> > > upon
> > > symlinks or absolute paths not being resolved.
> > 
> > I'm not too worried about this scenario.
> 
> As mentioned in the PR, this patch breaks e.g. ccache testsuite.
> 
> I strongly doubt most of the users want such a behavior, because it
> makes all filenames absolute when -f*-prefix-map= options remap one
> absolute path to another one.
> Say if I'm in /tmp and /tmp is the canonical path and there is
> src/test.c file, with -fdebug-prefix-map=/tmp=/blah
> previously there would be DW_AT_comp_dir "/blah" and it is still there,
> but DW_AT_name which was previously "src/test.c" (relative against
> DW_AT_comp_dir) is now "/blah/src/test.c" instead.
> 
> Even worse, the canonicalization is only done on the remap_filename
> argument, but not on the old_prefix side.  That is e.g. what breaks
> ccache.  If there is
> /tmp/foobar1 directory and
> ln -sf foobar1 /tmp/foobar2
> cd /tmp/foobar2
> then -fdebug-prefix-map=`pwd`=/blah will just not work, while
> src/test.c will be canonicalized to /tmp/foobar1/src/test.c,
> old_prefix is still what the user provided which is /tmp/foobar2.
> User would need to change their uses to use -fdebug-prefix-map=`realpath 
> $(pwd)`=/blah
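
The mismatch can be modelled without touching the filesystem.  The sketch
below is a simplified stand-in for the prefix comparison (plain string
comparison rather than filename_ncmp, no lrealpath call; the function name
is hypothetical), fed with the paths from the example above:

```cpp
#include <string>

// Simplified model of the prefix remapping: if `filename` starts with
// `old_prefix`, replace that prefix with `new_prefix`; otherwise return
// the filename unchanged.
std::string remap(const std::string& filename, const std::string& old_prefix,
                  const std::string& new_prefix) {
  if (filename.compare(0, old_prefix.size(), old_prefix) != 0)
    return filename;
  return new_prefix + filename.substr(old_prefix.size());
}
```

Once the filename has been canonicalized to /tmp/foobar1/src/test.c, an
old prefix of /tmp/foobar2 no longer matches and nothing is remapped; only
a canonicalized old prefix (/tmp/foobar1) gives /blah/src/test.c.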
> 
> I'm attaching 3 so far just compile tested patches.
> 
> The first patch just reverts the patch (and its follow-up patch).
> 
> The second introduces a new option, -f{,no}-canon-prefix-map which affects
> the behavior of -f{file,macro,debug,profile}-prefix-map=, if on it
> canonicalizes the old path of the prefix map option and compares that
> against the canonicalized filename for absolute paths but not relative.
> 
> And last is like the second, but does that also for relative paths except
> for filenames with no / (or / or \ on DOS based fs).  So, the third patch
> gets an optional behavior of what has been on the trunk lately with the
> difference that the old_prefix is canonicalized by the compiler.
> 
> Initially I've thought I'd just add some magic syntax to the OLD=NEW
> argument of those options (because there are 4 of them), but as noted
> in the comments, = is valid char in OLD (just not new), so it would
> be hard to figure out some syntax.  So instead a new option, which one
> can turn on and off for different -f*-prefix-map= options if needed.
> 
> -fdebug-prefix-map=/path1=/mypath1 -fcanon-prefix-map \
> -fdebug-prefix-map=/path2=/mypath2 -fno-canon-prefix-map \
> -fdebug-prefix-map=/path3=/mypath3
> 
> will use the old behavior for the /path1 and /path3 handling and
> the new one only for /path2 handling.
> 
> Thoughts on this?
> 
>   Jakub

> 2023-01-20  Jakub Jelinek  
> 
>   PR other/108464
>   * file-prefix-map.cc (remap_filename): Revert 2022-11-01 and 2022-11-07
>   changes.
> 
> --- gcc/file-prefix-map.cc
> +++ gcc/file-prefix-map.cc
> @@ -70,29 +70,19 @@ remap_filename (file_prefix_map *maps, const char *filename)
>    file_prefix_map *map;
>    char *s;
>    const char *name;
> -  char *realname;
>    size_t name_len;
>  
> -  if (!filename || lbasename (filename) == filename)
> -    return filename;
> -
> -  realname = lrealpath (filename);
> -
>    for (map = maps; map; map = map->next)
> -    if (filename_ncmp (realname, map->old_prefix, map->old_len) == 0)
> +    if (filename_ncmp (filename, map->old_prefix, map->old_len) == 0)
>        break;
>    if (!map)
> -    {
> -      free (realname);
> -      return filename;
> -    }
> -  name = realname + map->old_len;
> +    return filename;
> +  name = filename + map->old_len;
>    name_len = strlen (name) + 1;
>  
>    s = (char *) ggc_alloc_atomic (name_len + map->new_len);
>    memcpy (s, map->new_prefix, map->new_len);
>    memcpy (s + map->new_len, name, name_len);
> -  free (realname);
>    return s;
>  }
>  

> 2023-01-20  Jakub Jelinek  
> 
>   PR other/108464
>   * common.opt (fcanon-prefix-map): New option.
>   * opts.cc: Include file-prefix-map.h.
>   (flag_canon_prefix_map): New variable.
>   (common_handle_option): Handle OPT_fcanon_prefix_map.
>   (gen_command_line_string): Ignore OPT_fcanon_prefix_map.
>   * file-prefix-map.h (flag_canon_prefix_map): Declare.
>   * file-prefix-map.cc (struct file_prefix_map): Add canonicalize
>   member.
>   (add_prefix_map): Initialize canonicalize member from
>   flag_canon_prefix_map, and if true and old_prefix is absolute
>   pathname, canonicalize

Re: [PATCH] range-op-float: Fix up -ffinite-math-only range extension and don't extend into infinities [PR109008]

2023-03-10 Thread Richard Biener via Gcc-patches
On Fri, 10 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch does two things (both related to range extension
> around the boundaries).
> 
> The first part (in the 2 real_isfinite blocks) is to make the ranges
> narrower when the old boundaries are minimum and/or maximum representable
> finite number.  In that case frange_nextafter gives -Inf or +Inf,
> but then the resulting computed reverse range is very far from the actually
> needed range, usually extends up to infinity or could even result in NaNs.
> While infinities are really the next representable numbers in the
> corresponding mode, REAL_VALUE_TYPE is actually a type with wider range
> for exponent and 160 bit precision, so the patch instead uses
> nextafter number in a hypothetical floating point format with the same
> mantissa precision but wider range of exponents.  This significantly
> improves the actual ranges of the reverse operations, while still making
> them conservatively correct.
> 
> The second part is a fix for miscompilation of the new testcase below.
> For -ffinite-math-only, without this patch we extend the minimum and/or
> maximum representable finite number to -Inf or +Inf, with the patch to
> some number outside of the normal exponent range of the mode, but then
> we use set which canonicalizes it and turns the boundaries back to
> the minimum and/or maximum representable finite numbers, but because
> in say [__DBL_MAX__, __DBL_MAX__] = op1 + [__DBL_MAX__, __DBL_MAX__]
> op1 can be larger than 0, up to the largest number which rounds to even
> down back to __DBL_MAX__ and there are still no infinities involved,
> it needs to work even with -ffinite-math-only.  So, we really need to
> widen the lhs range a little bit even in that case.  The patch does
> that through temporarily clearing -ffinite-math-only, such that the
> value with infinities or the outside of bounds values passes the
> setting and verification (the VR_VARYING case is needed because
> we get ICEs otherwise, but when lhs is VR_VARYING in -ffast-math,
> i.e. minimum to maximum representable finite and both signs of NaN,
> then set does all we need, we don't need to or in a NaN range).
> We don't really later use the range in a way that would become a problem
> that it is wider than varying, we actually just perform maths on the
> two boundaries.
> 
> As I said in the PR, this doesn't fix the !MODE_HAS_INFINITIES case,
> I believe we actually need to treat the boundary values as infinities
> in that case because they (probably) work like that, but it is unclear
> if it is just the reverse operation lhs widening that is a problem there,
> or whether it is a general problem.  I have zero experience with
> floating points without infinities (PDP11, some ARM half type?,
> what else?).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2023-03-10  Jakub Jelinek  
> 
>   PR tree-optimization/109008
>   * range-op-float.cc (float_widen_lhs_range): If lb is
>   minimum representable finite number or ub is maximum
>   representable finite number, instead of widening it to
>   -inf or inf widen it to negative or positive 0x0.8p+(EMAX+1).
>   Temporarily clear flag_finite_math_only when canonicalizing
>   the widened range.
> 
>   * gcc.dg/pr109008.c: New test.
> 
> --- gcc/range-op-float.cc.jj  2023-03-09 09:54:53.880453046 +0100
> +++ gcc/range-op-float.cc 2023-03-09 20:52:07.456284507 +0100
> @@ -2217,12 +2217,42 @@ float_widen_lhs_range (tree type, const
>    REAL_VALUE_TYPE lb = lhs.lower_bound ();
>    REAL_VALUE_TYPE ub = lhs.upper_bound ();
>    if (real_isfinite (&lb))
> -    frange_nextafter (TYPE_MODE (type), lb, dconstninf);
> +    {
> +      frange_nextafter (TYPE_MODE (type), lb, dconstninf);
> +      if (real_isinf (&lb))
> +        {
> +          /* For -DBL_MAX, instead of -Inf use
> +             nexttoward (-DBL_MAX, -LDBL_MAX) in a hypothetical
> +             wider type with the same mantissa precision but larger
> +             exponent range; it is outside of range of double values,
> +             but makes it clear it is just one ulp larger rather than
> +             infinite amount larger.  */
> +          lb = dconstm1;
> +          SET_REAL_EXP (&lb, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
> +        }
> +    }
>    if (real_isfinite (&ub))
> -    frange_nextafter (TYPE_MODE (type), ub, dconstinf);
> +    {
> +      frange_nextafter (TYPE_MODE (type), ub, dconstinf);
> +      if (real_isinf (&ub))
> +        {
> +          /* For DBL_MAX similarly.  */
> +          ub = dconst1;
> +          SET_REAL_EXP (&ub, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
> +        }
> +    }
> +  /* Temporarily disable -ffinite-math-only, so that frange::set doesn't
> +     reduce the range back to real_min_representable (type) as lower bound
> +     or real_max_representable (type) as upper bound.  */
> +  bool save_flag_finite_math_only = flag_finite_math_only;
> +  flag_finite_math_only = false;
>    ret.set (type,

Patch ping: Re: [PATCH] libgcc, i386, optabs, v2: Add __float{, un}tibf to libgcc and expand BF -> integral through SF intermediate [PR107703]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Mar 01, 2023 at 01:32:43PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Wed, Nov 16, 2022 at 12:51:14PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > On Wed, Nov 16, 2022 at 10:06:17AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > > Thoughts on this?  I guess my preference would be the BF -> SF -> TI
> > > path because we won't need to waste
> > >     32: 00015e10   321 FUNC    GLOBAL DEFAULT   13 __fixbfti@@GCC_13.0.0
> > >     89: 00015f60   299 FUNC    GLOBAL DEFAULT   13 __fixunsbfti@@GCC_13.0.0
> > > If so, I'd need to cut the fix parts of the patch below and
> > > do something in the middle-end.
> > 
> > Here is adjusted patch that does that.
> > 
> > 2022-11-16  Jakub Jelinek  
> > 
> > PR target/107703
> > * optabs.cc (expand_fix): For conversions from BFmode to integral,
> > use shifts to convert it to SFmode first and then convert SFmode
> > to integral.
> > 
> > * soft-fp/floattibf.c: New file.
> > * soft-fp/floatuntibf.c: New file.
> > * config/i386/libgcc-glibc.ver: Export __float{,un}tibf @ GCC_13.0.0.
> > * config/i386/64/t-softfp (softfp_extras): Add floattibf and
> > floatuntibf.
> > (CFLAGS-floattibf.c, CFLAGS-floatunstibf.c): Add -msse2.
> 
> I'd like to ping the libgcc non-i386 part of this patch, Uros said the i386
> part is ok but that one depends on the generic libgcc changes.
> I'll ping the optabs.cc change separately.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
> with more info in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606382.html

I'd like to ping this again.  I've posted the previously added
bfloat16 changes as well as the above 2 new files to libc-alpha as well
https://sourceware.org/pipermail/libc-alpha/2023-March/146246.html
if it makes the review easier.

Thanks

Jakub
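
The "use shifts to convert it to SFmode first" step in the ChangeLog above can be illustrated outside the compiler. The following is only a sketch under stated assumptions, not the optabs.cc expansion itself: it assumes float is IEEE binary32, and the function names bf16_bits_to_float and bf16_bits_to_integer are hypothetical. A bfloat16 value is the upper 16 bits of a binary32 value, so widening is a 16-bit left shift of the raw bits, after which the ordinary float to integer conversion applies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Hypothetical helper: widen raw bfloat16 bits to float by placing them
// in the upper half of a 32-bit pattern (the low 16 mantissa bits are 0).
float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);  // reinterpret the bits without aliasing issues
  return f;
}

// Hypothetical helper: BF -> SF -> integer, truncating toward zero.
// The assert rejects NaN, infinities and values that do not fit.
long long bf16_bits_to_integer(std::uint16_t bits) {
  float f = bf16_bits_to_float(bits);
  assert(f > -9.2e18f && f < 9.2e18f);
  return static_cast<long long>(f);
}
```

For instance, the bit pattern 0x42F7 widens to 123.5f and converts to 123.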



Patch ping: [PATCH] c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates [PR107558]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607145.html
  - PR107558 - P2 - c++: Don't clear TREE_READONLY for -fmerge-all-constants 
for non-aggregates

Thanks
Jakub

On Thu, Nov 24, 2022 at 10:13:55AM +0100, Jakub Jelinek via Gcc-patches wrote:
> The following testcase ICEs, because OpenMP lowering for shared clause
> on l variable with REFERENCE_TYPE creates POINTER_TYPE to REFERENCE_TYPE.
> The reason is that the automatic variable has non-trivial construction
> (reference to a lambda) and -fmerge-all-constants is on and so TREE_READONLY
> isn't set - omp-low will handle automatic TREE_READONLY vars in shared
> specially and only copy to the construct and not back, while !TREE_READONLY
> are assumed to be changeable.
> The PR91529 change rationale was that the gimplification can change
> some non-addressable automatic variables to TREE_STATIC with
> -fmerge-all-constants and therefore TREE_READONLY on them is undesirable.
> But, the gimplifier does that only for aggregate variables:
>   switch (TREE_CODE (type))
> {  
> case RECORD_TYPE:
> case UNION_TYPE:
> case QUAL_UNION_TYPE:
> case ARRAY_TYPE:
> and not for anything else.  So, I think clearing TREE_READONLY for
> automatic integral or reference or pointer etc. vars for
> -fmerge-all-constants only is unnecessary.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2022-11-24  Jakub Jelinek  
> 
>   PR c++/107558
>   * decl.cc (cp_finish_decl): Don't clear TREE_READONLY on
>   automatic non-aggregate variables just because of
>   -fmerge-all-constants.
> 
>   * g++.dg/gomp/pr107558.C: New test.
> 
> --- gcc/cp/decl.cc.jj 2022-11-19 09:21:14.662439877 +0100
> +++ gcc/cp/decl.cc    2022-11-23 13:12:31.866553152 +0100
> @@ -8679,8 +8679,10 @@ cp_finish_decl (tree decl, tree init, bo
>  
>        if (var_definition_p
>            /* With -fmerge-all-constants, gimplify_init_constructor
> -             might add TREE_STATIC to the variable.  */
> -          && (TREE_STATIC (decl) || flag_merge_constants >= 2))
> +             might add TREE_STATIC to aggregate variables.  */
> +          && (TREE_STATIC (decl)
> +              || (flag_merge_constants >= 2
> +                  && AGGREGATE_TYPE_P (type))))
>          {
>            /* If a TREE_READONLY variable needs initialization
>               at runtime, it is no longer readonly and we need to
> --- gcc/testsuite/g++.dg/gomp/pr107558.C.jj   2022-11-23 13:13:27.260736525 +0100
> +++ gcc/testsuite/g++.dg/gomp/pr107558.C  2022-11-23 13:15:22.271041005 +0100
> @@ -0,0 +1,14 @@
> +// PR c++/107558
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-fmerge-all-constants" }
> +// { dg-additional-options "-flto" { target lto } }
> +
> +int a = 15;
> +
> +void
> +foo ()
> +{
> +  auto &&l = [&]() { return a; };
> +#pragma omp target parallel
> +  l ();
> +}



Patch ping - [PATCH] tree: Use comdat tree_code_{type,length} even for C++11/14 [PR108634]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch, which has been successfully
bootstrapped/regtested on x86_64-linux and i686-linux:

https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611180.html
  - PR108634 - P3 - tree: Use comdat tree_code_{type,length} even for C++11/14

Thanks
Jakub

On Thu, Feb 02, 2023 at 03:30:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
> The recent change to undo the tree_code_type/tree_code_length
> excessive duplication apparently broke building the Linux kernel
> plugin.  While it is certainly desirable that GCC plugins are built
> with the same compiler as GCC has been built and with the same options
> (at least the important ones), it might be hard to arrange that,
> e.g. if gcc is built using a cross-compiler but the plugin then built
> natively, or GCC isn't bootstrapped for other reasons, or just as in
> the kernel case they were building the plugin with -std=gnu++11 while
> the bootstrapped GCC has been built without any such option and so with
> whatever the compiler defaulted to.
> 
> For C++17 and later tree_code_{type,length} are UNIQUE symbols with
> those assembler names, while for C++11/14 they were
> _ZL14tree_code_type and _ZL16tree_code_length.
> 
> The following patch uses a comdat var for those even for C++11/14
> as suggested by Maciej Cencora.  Relying on weak attribute is not an
> option because not all hosts support it and there are non-GNU system
> compilers.  While we could use it unconditionally,
> I think defining a template just to make it comdat is weird, and
> the compiler itself is always built with the same compiler.
> Plugins, being separate shared libraries, will have a separate copy of
> the arrays if they are ODR-used in the plugin, so there is not a big
> deal if e.g. cc1plus uses tree_code_type while plugin uses
> _ZN19tree_code_type_tmplILi0EE14tree_code_typeE or vice versa.
> 
> Tested in non-bootstrapped build with both -std=gnu++17 and -std=gnu++11,
> ok for trunk if it passes full bootstrap/regtest?
> 
> 2023-02-02  Jakub Jelinek  
> 
>   PR plugins/108634
>   * tree-core.h (tree_code_type, tree_code_length): For C++11 or
>   C++14, don't declare as extern const arrays.
>   (tree_code_type_tmpl, tree_code_length_tmpl): New types with
>   static constexpr member arrays for C++11 or C++14.
>   * tree.h (TREE_CODE_CLASS): For C++11 or C++14 use
>   tree_code_type_tmpl <0>::tree_code_type instead of tree_code_type.
>   (TREE_CODE_LENGTH): For C++11 or C++14 use
>   tree_code_length_tmpl <0>::tree_code_length instead of
>   tree_code_length.
>   * tree.cc (tree_code_type, tree_code_length): Remove.
> 
> --- gcc/tree-core.h.jj    2023-01-27 10:51:27.575399052 +0100
> +++ gcc/tree-core.h   2023-02-02 15:06:05.048665279 +0100
> @@ -2285,19 +2285,27 @@ struct floatn_type_info {
>  extern bool tree_contains_struct[MAX_TREE_CODES][64];
>  
>  /* Class of tree given its code.  */
> -#if __cpp_inline_variables >= 201606L
>  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
>  #define END_OF_BASE_TREE_CODES tcc_exceptional,
>  
> +#if __cpp_inline_variables < 201606L
> +template <int N>
> +struct tree_code_type_tmpl {
> +  static constexpr enum tree_code_class tree_code_type[] = {
> +#include "all-tree.def"
> +  };
> +};
> +
> +template <int N>
> +constexpr enum tree_code_class tree_code_type_tmpl<N>::tree_code_type[];
> +#else
>  constexpr inline enum tree_code_class tree_code_type[] = {
>  #include "all-tree.def"
>  };
> +#endif
>  
>  #undef DEFTREECODE
>  #undef END_OF_BASE_TREE_CODES
> -#else
> -extern const enum tree_code_class tree_code_type[];
> -#endif
>  
>  /* Each tree code class has an associated string representation.
> These must correspond to the tree_code_class entries.  */
> @@ -2305,18 +2313,27 @@ extern const char *const tree_code_class
>  
>  /* Number of argument-words in each kind of tree-node.  */
>  
> -#if __cpp_inline_variables >= 201606L
>  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
>  #define END_OF_BASE_TREE_CODES 0,
> +
> +#if __cpp_inline_variables < 201606L
> +template <int N>
> +struct tree_code_length_tmpl {
> +  static constexpr unsigned char tree_code_length[] = {
> +#include "all-tree.def"
> +  };
> +};
> +
> +template <int N>
> +constexpr unsigned char tree_code_length_tmpl<N>::tree_code_length[];
> +#else
>  constexpr inline unsigned char tree_code_length[] = {
>  #include "all-tree.def"
>  };
> +#endif
>  
>  #undef DEFTREECODE
>  #undef END_OF_BASE_TREE_CODES
> -#else
> -extern const unsigned char tree_code_length[];
> -#endif
>  
>  /* Vector of all alias pairs for global symbols.  */
>  extern GTY(()) vec<alias_pair, va_gc> *alias_pairs;
> --- gcc/tree.h.jj 2023-01-27 20:09:16.183970583 +0100
> +++ gcc/tree.h    2023-02-02 14:37:17.255004291 +0100
> @@ -177,7 +177,12 @@ code_helper::is_builtin_fn () const
>  #define TREE_CODE_CLASS_STRING(CLASS)\
>  tree_code_class_strings[(int) (CLASS)]
>  
> +#if __cpp_inline_variables < 201606L
> +#define TREE_CODE_CLASS(CODE)\
> +  t
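
The comdat trick described in the patch above can be shown in isolation. This is a minimal sketch, not the tree-core.h code: the enum, the table contents and the names code_class_tmpl, code_class_table, CODE_CLASS and lookup are all hypothetical. Before C++17 a namespace-scope constexpr array has internal linkage, whereas a static data member of a class template gets vague (comdat) linkage, so every translation unit refers to one merged copy. From C++17 on, an inline variable gives the same effect directly.

```cpp
#include <cassert>

// Hypothetical three-entry table standing in for the generated one.
enum code_class { cc_exceptional, cc_constant, cc_type };

#if __cpp_inline_variables < 201606L
// C++11/14: wrap the array in a class template so that it is emitted
// as a comdat symbol rather than as a per-TU static.
template <int N>
struct code_class_tmpl {
  static constexpr code_class table[] = { cc_exceptional, cc_constant, cc_type };
};

// Out-of-class definition, needed before C++17 when the member is ODR-used.
template <int N>
constexpr code_class code_class_tmpl<N>::table[];

#define CODE_CLASS(CODE) code_class_tmpl<0>::table[(int) (CODE)]
#else
// C++17 and later: an inline variable is already a single shared entity.
constexpr inline code_class code_class_table[] = {
  cc_exceptional, cc_constant, cc_type
};

#define CODE_CLASS(CODE) code_class_table[(int) (CODE)]
#endif

// Indexing with a runtime value ODR-uses the array in either branch.
code_class lookup(int code) {
  assert(code >= 0 && code < 3);
  return CODE_CLASS(code);
}
```

Callers only ever go through the macro, so the same source compiles with -std=gnu++11 and -std=gnu++17 even though the underlying symbol differs.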

Re: Patch ping - [PATCH] file-prefix-map: Fix up -f*-prefix-map= [PR108464]

2023-03-10 Thread Richard Biener via Gcc-patches
On Fri, 10 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> I'd like to ping these patches.  All 3 variants have been
> bootstrapped/regtested on x86_64-linux and i686-linux, the last
> one is my preference I guess.  The current state breaks e.g. ccache.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610285.html
>   - PR108464 - P1 - file-prefix-map: Fix up -f*-prefix-map= (3 variants)

Let's go with your preference - I was hoping for comments from Richard
Purdie on his preference, but then ..

Thanks,
Richard.

> Thanks
>   Jakub
> 
> On Fri, Jan 20, 2023 at 04:05:55PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Nov 01, 2022 at 01:46:20PM -0600, Jeff Law via Gcc-patches wrote:
> > > > This does cause a change of behaviour if users were previously relying 
> > > > upon
> > > > symlinks or absolute paths not being resolved.
> > > 
> > > I'm not too worried about this scenario.
> > 
> > As mentioned in the PR, this patch breaks e.g. ccache testsuite.
> > 
> > I strongly doubt most of the users want such a behavior, because it
> > makes all filenames absolute when -f*-prefix-map= options remap one
> > absolute path to another one.
> > Say if I'm in /tmp and /tmp is the canonical path and there is
> > src/test.c file, with -fdebug-prefix-map=/tmp=/blah
> > previously there would be DW_AT_comp_dir "/blah" and it is still there,
> > but DW_AT_name which was previously "src/test.c" (relative against
> > DW_AT_comp_dir) is now "/blah/src/test.c" instead.
> > 
> > Even worse, the canonicalization is only done on the remap_filename
> > argument, but not on the old_prefix side.  That is e.g. what breaks
> > ccache.  If there is
> > /tmp/foobar1 directory and
> > ln -sf foobar1 /tmp/foobar2
> > cd /tmp/foobar2
> > then -fdebug-prefix-map=`pwd`=/blah will just not work: while
> > src/test.c will be canonicalized to /tmp/foobar1/src/test.c,
> > old_prefix is still what the user provided, which is /tmp/foobar2.
> > User would need to change their uses to use
> > -fdebug-prefix-map=`realpath $(pwd)`=/blah
> > 
> > I'm attaching 3 so far just compile tested patches.
> > 
> > The first patch just reverts the patch (and its follow-up patch).
> > 
> > The second introduces a new option, -f{,no}-canon-prefix-map which affects
> > the behavior of -f{file,macro,debug,profile}-prefix-map=, if on it
> > canonicalizes the old path of the prefix map option and compares that
> > against the canonicalized filename for absolute paths but not relative.
> > 
> > And last is like the second, but does that also for relative paths except
> > for filenames with no / (or / or \ on DOS based fs).  So, the third patch
> > gets an optional behavior of what has been on the trunk lately with the
> > difference that the old_prefix is canonicalized by the compiler.
> > 
> > Initially I've thought I'd just add some magic syntax to the OLD=NEW
> > argument of those options (because there are 4 of them), but as noted
> > in the comments, = is valid char in OLD (just not new), so it would
> > be hard to figure out some syntax.  So instead a new option, which one
> > can turn on and off for different -f*-prefix-map= options if needed.
> > 
> > -fdebug-prefix-map=/path1=/mypath1 -fcanon-prefix-map \
> > -fdebug-prefix-map=/path2=/mypath2 -fno-canon-prefix-map \
> > -fdebug-prefix-map=/path3=/mypath3
> > 
> > will use the old behavior for the /path1 and /path3 handling and
> > the new one only for /path2 handling.
> > 
> > Thoughts on this?
> > 
> > Jakub
> 
> > 2023-01-20  Jakub Jelinek  
> > 
> > PR other/108464
> > * file-prefix-map.cc (remap_filename): Revert 2022-11-01 and 2022-11-07
> > changes.
> > 
> > --- gcc/file-prefix-map.cc
> > +++ gcc/file-prefix-map.cc
> > @@ -70,29 +70,19 @@ remap_filename (file_prefix_map *maps, const char *filename)
> >    file_prefix_map *map;
> >    char *s;
> >    const char *name;
> > -  char *realname;
> >    size_t name_len;
> >  
> > -  if (!filename || lbasename (filename) == filename)
> > -    return filename;
> > -
> > -  realname = lrealpath (filename);
> > -
> >    for (map = maps; map; map = map->next)
> > -    if (filename_ncmp (realname, map->old_prefix, map->old_len) == 0)
> > +    if (filename_ncmp (filename, map->old_prefix, map->old_len) == 0)
> >        break;
> >    if (!map)
> > -    {
> > -      free (realname);
> > -      return filename;
> > -    }
> > -  name = realname + map->old_len;
> > +    return filename;
> > +  name = filename + map->old_len;
> >    name_len = strlen (name) + 1;
> >  
> >    s = (char *) ggc_alloc_atomic (name_len + map->new_len);
> >    memcpy (s, map->new_prefix, map->new_len);
> >    memcpy (s + map->new_len, name, name_len);
> > -  free (realname);
> >    return s;
> >  }
> >  
> 
> > 2023-01-20  Jakub Jelinek  
> > 
> > PR other/108464
> > * common.opt (fcanon-prefix-map): New option.
> > * opts.cc: Include file-prefix-map.h.
> > (flag_canon_prefix_map): New variable.
> > (common_handle_option): Handle O

Re: Patch ping - [PATCH] tree: Use comdat tree_code_{type,length} even for C++11/14 [PR108634]

2023-03-10 Thread Richard Biener via Gcc-patches
On Fri, 10 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> I'd like to ping this patch, which has been successfully
> bootstrapped/regtested on x86_64-linux and i686-linux:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611180.html
>   - PR108634 - P3 - tree: Use comdat tree_code_{type,length} even for C++11/14

OK.

Thanks,
Richard.

> Thanks
>   Jakub
> 
> On Thu, Feb 02, 2023 at 03:30:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > The recent change to undo the tree_code_type/tree_code_length
> > excessive duplication apparently broke building the Linux kernel
> > plugin.  While it is certainly desirable that GCC plugins are built
> > with the same compiler as GCC has been built and with the same options
> > (at least the important ones), it might be hard to arrange that,
> > e.g. if gcc is built using a cross-compiler but the plugin then built
> > natively, or GCC isn't bootstrapped for other reasons, or just as in
> > the kernel case they were building the plugin with -std=gnu++11 while
> > the bootstrapped GCC has been built without any such option and so with
> > whatever the compiler defaulted to.
> > 
> > For C++17 and later tree_code_{type,length} are UNIQUE symbols with
> > those assembler names, while for C++11/14 they were
> > _ZL14tree_code_type and _ZL16tree_code_length.
> > 
> > The following patch uses a comdat var for those even for C++11/14
> > as suggested by Maciej Cencora.  Relying on weak attribute is not an
> > option because not all hosts support it and there are non-GNU system
> > compilers.  While we could use it unconditionally,
> > I think defining a template just to make it comdat is weird, and
> > the compiler itself is always built with the same compiler.
> > Plugins, being separate shared libraries, will have a separate copy of
> > the arrays if they are ODR-used in the plugin, so there is not a big
> > deal if e.g. cc1plus uses tree_code_type while plugin uses
> > _ZN19tree_code_type_tmplILi0EE14tree_code_typeE or vice versa.
> > 
> > Tested in non-bootstrapped build with both -std=gnu++17 and -std=gnu++11,
> > ok for trunk if it passes full bootstrap/regtest?
> > 
> > 2023-02-02  Jakub Jelinek  
> > 
> > PR plugins/108634
> > * tree-core.h (tree_code_type, tree_code_length): For C++11 or
> > C++14, don't declare as extern const arrays.
> > (tree_code_type_tmpl, tree_code_length_tmpl): New types with
> > static constexpr member arrays for C++11 or C++14.
> > * tree.h (TREE_CODE_CLASS): For C++11 or C++14 use
> > tree_code_type_tmpl <0>::tree_code_type instead of tree_code_type.
> > (TREE_CODE_LENGTH): For C++11 or C++14 use
> > tree_code_length_tmpl <0>::tree_code_length instead of
> > tree_code_length.
> > * tree.cc (tree_code_type, tree_code_length): Remove.
> > 
> > --- gcc/tree-core.h.jj  2023-01-27 10:51:27.575399052 +0100
> > +++ gcc/tree-core.h 2023-02-02 15:06:05.048665279 +0100
> > @@ -2285,19 +2285,27 @@ struct floatn_type_info {
> >  extern bool tree_contains_struct[MAX_TREE_CODES][64];
> >  
> >  /* Class of tree given its code.  */
> > -#if __cpp_inline_variables >= 201606L
> >  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> >  #define END_OF_BASE_TREE_CODES tcc_exceptional,
> >  
> > +#if __cpp_inline_variables < 201606L
> > +template <int N>
> > +struct tree_code_type_tmpl {
> > +  static constexpr enum tree_code_class tree_code_type[] = {
> > +#include "all-tree.def"
> > +  };
> > +};
> > +
> > +template <int N>
> > +constexpr enum tree_code_class tree_code_type_tmpl<N>::tree_code_type[];
> > +#else
> >  constexpr inline enum tree_code_class tree_code_type[] = {
> >  #include "all-tree.def"
> >  };
> > +#endif
> >  
> >  #undef DEFTREECODE
> >  #undef END_OF_BASE_TREE_CODES
> > -#else
> > -extern const enum tree_code_class tree_code_type[];
> > -#endif
> >  
> >  /* Each tree code class has an associated string representation.
> > These must correspond to the tree_code_class entries.  */
> > @@ -2305,18 +2313,27 @@ extern const char *const tree_code_class
> >  
> >  /* Number of argument-words in each kind of tree-node.  */
> >  
> > -#if __cpp_inline_variables >= 201606L
> >  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> >  #define END_OF_BASE_TREE_CODES 0,
> > +
> > +#if __cpp_inline_variables < 201606L
> > +template <int N>
> > +struct tree_code_length_tmpl {
> > +  static constexpr unsigned char tree_code_length[] = {
> > +#include "all-tree.def"
> > +  };
> > +};
> > +
> > +template <int N>
> > +constexpr unsigned char tree_code_length_tmpl<N>::tree_code_length[];
> > +#else
> >  constexpr inline unsigned char tree_code_length[] = {
> >  #include "all-tree.def"
> >  };
> > +#endif
> >  
> >  #undef DEFTREECODE
> >  #undef END_OF_BASE_TREE_CODES
> > -#else
> > -extern const unsigned char tree_code_length[];
> > -#endif
> >  
> >  /* Vector of all alias pairs for global symbols.  */
> >  extern GTY(()) vec<alias_pair, va_gc> *alias_pairs;
> > --- gcc/tree.h.jj   2023-01-27 20:09:16.183970583 +0100
> > +++ gcc/tree.h  2

Patch ping: [PATCH] cygwin: Don't try to support multilibs [PR107998]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch (as I wrote a week ago, NightStrike has tested
it):

On Fri, Mar 03, 2023 at 07:44:47PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > > 2023-02-22  Jakub Jelinek  
> > > 
> > >   PR target/107998
> > >   * config.gcc (x86_64-*-cygwin*): Don't add i386/t-cygwin-w64 into
> > >   $tmake_file.
> > >   * config/i386/t-cygwin-w64: Remove.
> > > 
> > > --- gcc/config.gcc.jj 2023-02-18 12:38:30.803025062 +0100
> > > +++ gcc/config.gcc    2023-02-21 17:07:12.143164563 +0100
> > > @@ -2105,7 +2105,7 @@ x86_64-*-cygwin*)
> > >   need_64bit_isa=yes
> > >   tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
> > >   xm_file=i386/xm-cygwin.h
> > > - tmake_file="${tmake_file} i386/t-cygming t-slibgcc i386/t-cygwin-w64"
> > > + tmake_file="${tmake_file} i386/t-cygming t-slibgcc"
> > >   target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/winnt.cc"
> > >   extra_options="${extra_options} i386/cygming.opt i386/cygwin.opt"
> > >   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> > > --- gcc/config/i386/t-cygwin-w64.jj   2020-01-12 11:54:36.333414616 +0100
> > > +++ gcc/config/i386/t-cygwin-w64  2023-02-21 17:06:44.121572616 +0100
> > > @@ -1,3 +0,0 @@
> > > -MULTILIB_OPTIONS = m64/m32
> > > -MULTILIB_DIRNAMES = 64
> > > -MULTILIB_OSDIRNAMES = ../lib ../lib32
> > 
> > Achim, mind looking at this?
> > Resending due to mail client problems, hopefully not a duplicate.
> 
> NightStrike on IRC said he has tested the patch and it worked fine.
> 
> Is the patch ok for trunk then?

Jakub



Re: Patch ping - [PATCH] file-prefix-map: Fix up -f*-prefix-map= [PR108464]

2023-03-10 Thread Richard Purdie via Gcc-patches
On Fri, 2023-03-10 at 09:05 +0000, Richard Biener wrote:
> On Fri, 10 Mar 2023, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > I'd like to ping these patches.  All 3 variants have been
> > bootstrapped/regtested on x86_64-linux and i686-linux, the last
> > one is my preference I guess.  The current state breaks e.g. ccache.
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610285.html
> >   - PR108464 - P1 - file-prefix-map: Fix up -f*-prefix-map= (3 variants)
> 
> Let's go with your preference - I was hoping for comments from Richard
> Purdie on his preference, but then ..
> > 

Sorry, I hadn't realised you were waiting on me :(. 

I'd commented on the bug and thought that covered things. We should be
fine with the third option, sorry about the issue and thanks for
resolving it!

Cheers,

Richard
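
The positional behaviour of -fcanon-prefix-map described in the quoted proposal (each -f*-prefix-map= option picks up whatever state the flag has when that option is seen) can be sketched as follows. This is an illustration under stated assumptions, not the file-prefix-map.cc code: the types prefix_map and prefix_maps and their members are hypothetical, std::string replaces the GC-allocated buffers, and the canonicalization itself (lrealpath in GCC) is deliberately left out, so only the flag capture and the plain prefix replacement are modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one OLD=NEW mapping.
struct prefix_map {
  std::string old_prefix;
  std::string new_prefix;
  bool canonicalize;  // flag state captured when the option was parsed
};

struct prefix_maps {
  bool canon_flag = false;  // models -fcanon-prefix-map / -fno-canon-prefix-map
  std::vector<prefix_map> maps;

  // Parse an OLD=NEW argument.  '=' may appear in OLD but not in NEW,
  // so the split is at the last '='.
  void add(const std::string &arg) {
    std::string::size_type eq = arg.rfind('=');
    assert(eq != std::string::npos);
    maps.push_back({arg.substr(0, eq), arg.substr(eq + 1), canon_flag});
  }

  // Plain textual remapping: replace the first matching OLD prefix.
  // Filenames that match no prefix (e.g. relative ones) are returned as-is.
  std::string remap(const std::string &filename) const {
    for (const prefix_map &m : maps)
      if (filename.compare(0, m.old_prefix.size(), m.old_prefix) == 0)
        return m.new_prefix + filename.substr(m.old_prefix.size());
    return filename;
  }
};
```

Feeding it the three options from the example command line, with canon_flag toggled between the calls to add, leaves only the /path2 entry marked for canonicalization, which mirrors the behaviour described in the proposal.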


Re: Patch ping: [PATCH] cygwin: Don't try to support multilibs [PR107998]

2023-03-10 Thread Jonathan Yong via Gcc-patches

On 3/10/23 09:37, Jakub Jelinek wrote:
> Hi!
>
> I'd like to ping this patch (as I wrote a week ago, NightStrike has tested
> it):

Thanks, pushed to master branch.




Re: [PATCH] range-op-float: Fix up -ffinite-math-only range extension and don't extend into infinities [PR109008]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 10, 2023 at 08:53:37AM +0000, Richard Biener wrote:
> Meh - I wonder if we can avoid all this by making float_widen_lhs_range
> friend of frange and simply access m_min/m_max directly and use the
> copy-CTOR to copy bounds and nan state ... after all verify_range
> will likely fail after you restore flag_finite_math_only ...

I'll defer such changes to Aldy.

As for verification, I think verify_range will not fail on it, it mainly
checks whether it is normalized (e.g. if minimum is frange_val_min and
maximum is frange_val_max and NaNs are possible with both signs (if NaNs
are supported) then it is VR_VARYING etc.).  It doesn't check if the actual
VR_RANGE bounds are smaller or larger than the VR_VARYING bounds, there is
just equality check.
Of course, behavior of wider than varying ranges is still unexpected in
many ways, say the union_ of such a range and VR_VARYING will ICE etc.

Now, I guess another possibility for the reverse ops over these wider ranges
would be avoid calling fold_range in the reverse ops, but call rv_fold
directly or have fold_range variant which would instead of the op1, op2
argument have 2 triplets, op1, op1lb, op1ub, op2, op2lb, op2ub, and it
would use those const REAL_VALUE_TYPE &op??b in preference to
op?.{lower,upper}_bound () or perhaps normal fold_range be implemented
in terms of this extended fold_range.  Then we wouldn't need to bother with
these non-standard franges...

> But OK for the moment.

Thanks, committed.

Jakub
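
The reason the lhs range has to be widened even with -ffinite-math-only can be checked with ordinary double arithmetic. The sketch below is independent of the frange code and assumes IEEE binary64 doubles with the default round-to-nearest-even mode; the helper names absorbed_by_dbl_max and next_above_dbl_max are hypothetical. One ulp of DBL_MAX is 2^971, so any positive addend strictly below half of that (2^970) rounds back down to DBL_MAX without an infinity ever appearing, while the next representable value above DBL_MAX in the mode itself is +Inf.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: true if DBL_MAX + op1 rounds back to exactly DBL_MAX.
bool absorbed_by_dbl_max(double op1) {
  assert(!std::isnan(op1));
  volatile double sum = DBL_MAX + op1;  // volatile forces rounding to double
  return sum == DBL_MAX;
}

// Hypothetical helper: the next double above DBL_MAX, which is +Inf.
double next_above_dbl_max() {
  return std::nextafter(DBL_MAX, INFINITY);
}
```

So in [DBL_MAX, DBL_MAX] = op1 + [DBL_MAX, DBL_MAX], op1 = 2^969 is a valid finite solution, and a reverse operation that only considers op1 = 0 would be too narrow.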



Re: [PATCH v2 4/5] Update texinfo.tex, remove the @gol macro/alias

2023-03-10 Thread Thomas Schwinge
Hi!

On 2023-03-10T01:17:04+0100, Gerald Pfeifer  wrote:
> On Thu, 9 Mar 2023, Sandra Loosemore wrote:
>> This is OK, but I'd like to see this patch split into two separate
>> commits as well -- one for the texinfo.tex import, and one for the @gol
>> changes.
>
> I believe Arsen does not have git write access.

Well, I guess we should fix that?  :-) I've met Arsen in person at
FOSDEM 2023, and I'm happy to vouch for his reasonable behavior.
He does have a FSF Copyright Assignment for GCC.  I'm happy to act
as sponsor.


Regards
 Thomas


> Arsen, if that is indeed the case, I offer to push these two commits for
> you if you send them by e-mail (as two attachments).
>
> Gerald
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] RISC-V: Add fault first load C/C++ support

2023-03-10 Thread Richard Sandiford via Gcc-patches
Bernhard Reutner-Fischer via Gcc-patches  writes:
> On 7 March 2023 07:21:23 CET, juzhe.zh...@rivai.ai wrote:
>>From: Ju-Zhe Zhong 
>>
>
>>+class vleff : public function_base
>>+{
>>+public:
>>+  unsigned int call_properties (const function_instance &) const override
>>+  {
>>+return CP_READ_MEMORY | CP_WRITE_CSR;
>>+  }
>>+
>>+  gimple *fold (gimple_folder &f) const override
>>+  {
>>+/* fold vleff (const *base, size_t *new_vl, size_t vl)
>>+
>>+   > vleff (const *base, size_t vl)
>>+  new_vl = MEM_REF[read_vl ()].  */
>>+
>>+auto_vec<tree, 8> vargs;
>
> Where is that magic 8 coming from?

I'm probably not saying anything you don't already know, but:

The second template parameter is just an optimisation.  It reserves a
"small" amount of stack space for the vector, to reduce the likelihood
that a full malloc/free will be needed.  The vector can still grow
arbitrarily large.

So these numbers are always just gut instinct for what a reasonable
common case would be.  There's no particular science to it, and no
particular need to explain away the value.

The second parameter is still useful if the vector size is known at
construction time.

When I've looked at cc1 and cc1plus profiles in the past, malloc has
often been a significant contributor.  Trying to avoid malloc/free
cycles for "petty" arrays seems like a worthwhile thing to do.
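
To make the point concrete, here is a minimal sketch of the small-buffer idea.
It assumes a hypothetical small_vec class; this is not GCC's auto_vec, whose
real implementation differs, and it only handles trivially copyable elements.
The second template argument sizes the inline storage, and the heap is only
touched once that storage is exhausted:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical illustration only; NOT GCC's auto_vec.
template <typename T, std::size_t N>
class small_vec
{
  static_assert (N > 0, "inline capacity must be non-zero");
  static_assert (std::is_trivially_copyable<T>::value,
                 "sketch only handles trivially copyable elements");

public:
  small_vec () : m_data (m_inline), m_size (0), m_capacity (N) {}
  ~small_vec () { if (on_heap ()) std::free (m_data); }
  small_vec (const small_vec &) = delete;
  small_vec &operator= (const small_vec &) = delete;

  // Append VALUE, spilling to the heap only when the inline space is full.
  void push (const T &value)
  {
    if (m_size == m_capacity)
      grow ();
    m_data[m_size++] = value;
  }

  const T &operator[] (std::size_t i) const { return m_data[i]; }
  std::size_t size () const { return m_size; }
  bool on_heap () const { return m_data != m_inline; }

private:
  // Double the capacity, moving the elements into malloc'ed storage.
  void grow ()
  {
    std::size_t new_capacity = m_capacity * 2;
    T *fresh = static_cast<T *> (std::malloc (new_capacity * sizeof (T)));
    if (!fresh)
      std::abort ();
    std::memcpy (fresh, m_data, m_size * sizeof (T));
    if (on_heap ())
      std::free (m_data);
    m_data = fresh;
    m_capacity = new_capacity;
  }

  T m_inline[N];          // stack space reserved by the second parameter
  T *m_data;              // points at m_inline until the first grow ()
  std::size_t m_size;
  std::size_t m_capacity;
};
```

With small_vec<int, 8>, the first eight push calls never call malloc; the
ninth one does, and the vector keeps growing from there.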

Thanks,
Richard


Re: [PATCH] Fix PR 108874: aarch64 code regression with shift and ands

2023-03-10 Thread Richard Sandiford via Gcc-patches
Andrew Pinski via Gcc-patches  writes:
> After r6-2044-g98e30e515f184b, code like "((x & 0xff00ff00U) >> 8)"
> would be optimized like (x >> 8) & 0xff00ffU which is normally better
> except on aarch64, the shift right could be combined with another
> operation in some cases. So we need to add a few define_splits
> to the aarch64 backends that match "((x >> shift) & CST0) OP Y"
> and splits it to:
> TMP = X & CST1
> (TMP >> shift) OP Y
>
> Note this also gets us to matching rev16 back too so I added a
> testcase to make sure we don't lose that matching any more.
> Note when the generic patch to recognize those as bswap ROT 16,
> we might regress again and need to add a few more patterns to
> the aarch64 backend but will deal with that once that happens.
>
> OK? Bootstrapped and tested on aarch64 with no regressions.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md: Add a new define_split
>   to help combine.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/rev16_2.c: New test.
>   * gcc.target/aarch64/shift_and_operator-1.c: New test.
> ---
>  gcc/config/aarch64/aarch64.md | 21 ++
>  gcc/testsuite/gcc.target/aarch64/rev16_2.c| 39 +++
>  .../gcc.target/aarch64/shift_and_operator-1.c | 22 +++
>  3 files changed, 82 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rev16_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index af9087508ac..41cc563f10c 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4656,6 +4656,27 @@ (define_insn "*_3"
>[(set_attr "type" "logic_shift_imm")]
>  )
>  
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> + (LOGICAL_OR_PLUS:GPI
> +   (and:GPI
> + (lshiftrt:GPI (match_operand:GPI 1 "register_operand")
> +   (match_operand:QI 2 "aarch64_shift_imm_"))
> + (match_operand:GPI 3 "aarch64_logical_immediate"))
> +   (match_operand:GPI 4 "register_operand")))]
> +  "can_create_pseudo_p ()
> +   && aarch64_bitmask_imm (UINTVAL (operands[3]) << UINTVAL (operands[2]), 
> mode)"

Formatting nit: long line

> +  [(set (match_dup 5) (and:GPI (match_dup 1) (match_dup 6)))
> +   (set (match_dup 0) (match_dup 7))]
> +  {
> +operands[5] = gen_reg_rtx (mode);
> +operands[6] = gen_int_mode (UINTVAL (operands[3]) << UINTVAL 
> (operands[2]), mode);

Here too.

> +rtx shift = gen_rtx_LSHIFTRT (mode, operands[5], operands[2]);
> +rtx_code new_code = ;
> +operands[7] = gen_rtx_fmt_ee (new_code, mode, shift, operands[4]);

It should be possible to do the last three statements in the
rtl pattern, e.g. as:

  [(set (match_dup 5) (and:GPI (match_dup 1) (match_dup 6)))
   (set (match_dup 0) (LOGICAL_OR_PLUS:GPI
(lshiftrt:GPI (match_dup 5) (match_dup 2))
(match_dup 4)))]
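
For reference, the arithmetic identity behind the split can be checked in
plain C++.  This is only a sketch under assumptions: the function names are
made up for illustration, it uses 32-bit unsigned values with a shift below
32, and it shows IOR as the outer operation although the pattern also covers
XOR and PLUS:

```cpp
#include <cstdint>

// Form after the generic folding: shift first, then mask with CST0.
inline std::uint32_t
shift_then_mask (std::uint32_t x, unsigned shift, std::uint32_t cst0,
                 std::uint32_t y)
{
  return ((x >> shift) & cst0) | y;
}

// Form produced by the split: TMP = X & CST1 with CST1 = CST0 << shift,
// followed by (TMP >> shift) OP Y.
inline std::uint32_t
mask_then_shift (std::uint32_t x, unsigned shift, std::uint32_t cst0,
                 std::uint32_t y)
{
  std::uint32_t cst1 = cst0 << shift;
  std::uint32_t tmp = x & cst1;
  return (tmp >> shift) | y;
}
```

Both forms agree for any x because a logical right shift leaves the top
"shift" bits zero, so the bits of CST0 that are lost when forming CST1 never
matter.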

OK with those changes, thanks.

Richard

> +  }
> +)
> +
>  (define_split
>[(set (match_operand:GPI 0 "register_operand")
>   (LOGICAL_OR_PLUS:GPI
> diff --git a/gcc/testsuite/gcc.target/aarch64/rev16_2.c b/gcc/testsuite/gcc.target/aarch64/rev16_2.c
> new file mode 100644
> index 000..621eb5dfbf0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/rev16_2.c
> @@ -0,0 +1,39 @@
> +/* { dg-options "-O2" } */
> +/* { dg-do compile } */
> +
> +extern void abort (void);
> +
> +typedef unsigned int __u32;
> +
> +__u32
> +__rev16_32_alt (__u32 x)
> +{
> +  return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
> + | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
> +}
> +
> +__u32
> +__rev16_32 (__u32 x)
> +{
> +  return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
> + | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
> +}
> +
> +typedef unsigned long long __u64;
> +
> +__u64
> +__rev16_64_alt (__u64 x)
> +{
> +  return (((__u64)(x) & (__u64)0xff00ff00ff00ff00UL) >> 8)
> + | (((__u64)(x) & (__u64)0x00ff00ff00ff00ffUL) << 8);
> +}
> +
> +__u64
> +__rev16_64 (__u64 x)
> +{
> +  return (((__u64)(x) & (__u64)0x00ff00ff00ff00ffUL) << 8)
> + | (((__u64)(x) & (__u64)0xff00ff00ff00ff00UL) >> 8);
> +}
> +
> +/* { dg-final { scan-assembler-times "rev16\\tx\[0-9\]+" 2 } } */
> +/* { dg-final { scan-assembler-times "rev16\\tw\[0-9\]+" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c b/gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c
> new file mode 100644
> index 000..49152c5495a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-options "-O2" } */
> +/* { dg-do compile } */
> +
> +unsigned f(unsigned x, unsigned b)
> +{
> +  return ((x & 0xff00ff00U) >> 8) | b;
> +}
> +
> +unsigned f0(unsigned x, unsigned b)
> +{
> +  return ((x & 0xff00ff00U) >> 8) ^ b;
> +}
> +unsigned f1(unsigned x, unsigned b)
> +{
> +  return ((x & 0xff00ff00U) >> 8) 

[PATCH] Shrink points-to analysis dumps when not dumping with -details

2023-03-10 Thread Richard Biener via Gcc-patches
The following allows getting PTA stats with -stats without blowing
up your filesystem, by guarding constraint and solution dumping
with TDF_DETAILS and the SSA points-to info with TDF_DETAILS
or TDF_ALIAS.

Queued for stage1.

* tree-ssa-structalias.cc (dump_sa_stats): Split out from...
(dump_sa_points_to_info): ... this function.
(compute_points_to_sets): Guard large dumps with TDF_DETAILS,
and call dump_sa_stats guarded with TDF_STATS.
(ipa_pta_execute): Likewise.
(compute_may_aliases): Guard dump_alias_info with
TDF_DETAILS|TDF_ALIAS.
---
 gcc/tree-ssa-structalias.cc | 63 -
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 07e0fd6827a..fd7450b9477 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -7130,33 +7130,33 @@ pt_solutions_intersect (struct pt_solution *pt1, struct pt_solution *pt2)
   return res;
 }
 
+/* Dump stats information to OUTFILE.  */
+
+static void
+dump_sa_stats (FILE *outfile)
+{
+  fprintf (outfile, "Points-to Stats:\n");
+  fprintf (outfile, "Total vars:   %d\n", stats.total_vars);
+  fprintf (outfile, "Non-pointer vars:  %d\n",
+  stats.nonpointer_vars);
+  fprintf (outfile, "Statically unified vars:  %d\n",
+  stats.unified_vars_static);
+  fprintf (outfile, "Dynamically unified vars: %d\n",
+  stats.unified_vars_dynamic);
+  fprintf (outfile, "Iterations:   %d\n", stats.iterations);
+  fprintf (outfile, "Number of edges:  %d\n", stats.num_edges);
+  fprintf (outfile, "Number of implicit edges: %d\n",
+  stats.num_implicit_edges);
+}
 
 /* Dump points-to information to OUTFILE.  */
 
 static void
 dump_sa_points_to_info (FILE *outfile)
 {
-  unsigned int i;
-
   fprintf (outfile, "\nPoints-to sets\n\n");
 
-  if (dump_flags & TDF_STATS)
-{
-  fprintf (outfile, "Stats:\n");
-  fprintf (outfile, "Total vars:   %d\n", stats.total_vars);
-  fprintf (outfile, "Non-pointer vars:  %d\n",
-  stats.nonpointer_vars);
-  fprintf (outfile, "Statically unified vars:  %d\n",
-  stats.unified_vars_static);
-  fprintf (outfile, "Dynamically unified vars: %d\n",
-  stats.unified_vars_dynamic);
-  fprintf (outfile, "Iterations:   %d\n", stats.iterations);
-  fprintf (outfile, "Number of edges:  %d\n", stats.num_edges);
-  fprintf (outfile, "Number of implicit edges: %d\n",
-  stats.num_implicit_edges);
-}
-
-  for (i = 1; i < varmap.length (); i++)
+  for (unsigned i = 1; i < varmap.length (); i++)
 {
   varinfo_t vi = get_varinfo (i);
   if (!vi->may_have_pointers)
@@ -7537,7 +7537,7 @@ compute_points_to_sets (void)
}
 }
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Points-to analysis\n\nConstraints:\n\n");
   dump_constraints (dump_file, 0);
@@ -7610,7 +7610,10 @@ compute_points_to_sets (void)
BITMAP_FREE (new_delta);
   }
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_STATS))
+dump_sa_stats (dump_file);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
 dump_sa_points_to_info (dump_file);
 
   /* Compute the points-to set for ESCAPED used for call-clobber analysis.  */
@@ -8032,7 +8035,8 @@ compute_may_aliases (void)
   "because IPA points-to information is available.\n\n");
 
  /* But still dump what we have remaining it.  */
- dump_alias_info (dump_file);
+ if (dump_flags & (TDF_DETAILS|TDF_ALIAS))
+   dump_alias_info (dump_file);
}
 
   return 0;
@@ -8044,7 +8048,7 @@ compute_may_aliases (void)
   compute_points_to_sets ();
 
   /* Debugging dumps.  */
-  if (dump_file)
+  if (dump_file && (dump_flags & (TDF_DETAILS|TDF_ALIAS)))
 dump_alias_info (dump_file);
 
   /* Compute restrict-based memory disambiguations.  */
@@ -8305,7 +8309,7 @@ ipa_pta_execute (void)
   fprintf (dump_file, "\n");
 }
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Generating generic constraints\n\n");
   dump_constraints (dump_file, from);
@@ -8344,7 +8348,7 @@ ipa_pta_execute (void)
   vi = create_function_info_for (node->decl,
 alias_get_name (node->decl), false,
 nonlocal_p);
-  if (dump_file
+  if (dump_file && (dump_flags & TDF_DETAILS)
  && from != constraints.length ())
{
  fprintf (dump_file,
@@ -8385,7 +8389,7 @@ ipa_pta_execute (void)
vi->is_ipa_escape_point = true;
 }
 
-  if (dump_file
+  if (dump_file && (dump_flags & TDF_DETAILS)
   && from != constraints.length ())
 {
   fprintf (dump_file,
@@ -8442,7 +8446,7 @@ ipa_pta_execute (void)

Re: AArch64 bfloat16 mangling

2023-03-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 10, 2023 at 08:43:02AM +, Richard Sandiford wrote:
> > So, either __bf16 should be also extended floating-point type
> > like decltype (0.0bf16) and std::bfloat16_t and in that case
> > it is fine if it mangles u6__bf16, or __bf16 will be a distinct
> > type from the latter two,
> 
> Yeah, the former is what I meant.  The intention is that __bf16 and
> std::bfloat16_t are the same type, not distinct types.

Ok, in that case here is a totally untested patch on top of
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
which is also needed (for aarch64 of course the i386 parts of that
patch, which have been acked already, don't matter, but the 2 new libgcc
files are needed and the optabs change is too).

The reason why __floatdibf and __floatundibf are needed on aarch64
and not on x86 is that the latter has optabs for DI -> XF conversions
and so for DI -> BF uses DI -> XF -> BF, where the first conversion
doesn't round/truncate anything.  On aarch64, by contrast, the DI -> TF
conversion (TF being the narrowest mode which can hold all DI values
exactly) is done using a libcall, and so GCC emits direct DI -> BF
conversions.

Will test it momentarily (including the patch it depends on):

2023-03-10  Jakub Jelinek  

gcc/
* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
(aarch64_bf16_ptr_type_node): Adjust comment.
* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
bfloat16_type_node rather than aarch64_bf16_type_node.
(aarch64_libgcc_floating_mode_supported_p,
aarch64_scalar_mode_supported_p): Also support BFmode.
(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
(aarch64_invalid_binary_op): Remove BFmode related rejections.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
aarch64_bf16_type_node.
(aarch64_init_simd_builtin_types): Likewise.
(aarch64_init_bf16_types): Likewise.  Don't create bfloat16_type_node,
which is created in tree.cc already.
* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
Don't expect one __bf16 related error.
libgcc/
* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf dfbf sfbf hfbf.
(softfp_extras): Add floatdibf floatundibf floattibf floatuntibf.
* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
__extendbfsf2 and __trunc{s,d,t,h}fbf2.
* config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define.
* soft-fp/floatundibf.c: New file.
* soft-fp/floatdibf.c: New file.
libstdc++-v3/
* config/abi/pre/gnu.ver (CXXABI_1.3.14): Also export __bf16 tinfos
if it isn't mangled as DF16b but u6__bf16.

--- gcc/config/aarch64/aarch64.h.jj 2023-01-16 11:52:15.923736422 +0100
+++ gcc/config/aarch64/aarch64.h    2023-03-10 11:49:35.941436327 +0100
@@ -1237,9 +1237,8 @@ extern const char *aarch64_rewrite_mcpu
 extern GTY(()) tree aarch64_fp16_type_node;
 extern GTY(()) tree aarch64_fp16_ptr_type_node;
 
-/* This type is the user-visible __bf16, and a pointer to that type.  Defined
-   in aarch64-builtins.cc.  */
-extern GTY(()) tree aarch64_bf16_type_node;
+/* Pointer to the user-visible __bf16 type.  __bf16 itself is generic
+   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
 extern GTY(()) tree aarch64_bf16_ptr_type_node;
 
 /* The generic unwind code in libgcc does not initialize the frame pointer.
--- gcc/config/aarch64/aarch64-builtins.cc.jj   2023-01-16 11:52:15.913736570 +0100
+++ gcc/config/aarch64/aarch64-builtins.cc  2023-03-10 11:49:35.942436313 +0100
@@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree aarch64_bf16_type_node = NULL_TREE;
 tree aarch64_bf16_ptr_type_node = NULL_TREE;
 
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
@@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
 case E_DFmode:
   return double_type_node;
 case E_BFmode:
-  return aarch64_bf16_type_node;
+  return bfloat16_type_node;
 default:
   gcc_unreachable ();
 }
@@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 type.  */
-  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
-  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; 

[committed] libstdc++: Fix GDB Xmethod for std::shared_ptr::use_count() [PR109064]

2023-03-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (GDB 13.1).

Pushed to trunk for now, will do backports too.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/109064
* python/libstdcxx/v6/xmethods.py (SharedPtrUseCountWorker):
Remove self-recursion in __init__. Add missing _supports.
* testsuite/libstdc++-xmethods/shared_ptr.cc: Check use_count()
and unique().
---
 libstdc++-v3/python/libstdcxx/v6/xmethods.py| 5 -
 libstdc++-v3/testsuite/libstdc++-xmethods/shared_ptr.cc | 7 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index 6af7a3dcfe3..18a165f425e 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -730,7 +730,7 @@ class SharedPtrUseCountWorker(gdb.xmethod.XMethodWorker):
 "Implements std::shared_ptr::use_count()"
 
 def __init__(self, elem_type):
-SharedPtrUseCountWorker.__init__(self, elem_type)
+pass
 
 def get_arg_types(self):
 return None
@@ -738,6 +738,9 @@ class SharedPtrUseCountWorker(gdb.xmethod.XMethodWorker):
 def get_result_type(self, obj):
 return gdb.lookup_type('long')
 
+def _supports(self, method_name):
+return True
+
 def __call__(self, obj):
 refcounts = obj['_M_refcount']['_M_pi']
 return refcounts['_M_use_count'] if refcounts else 0
diff --git a/libstdc++-v3/testsuite/libstdc++-xmethods/shared_ptr.cc b/libstdc++-v3/testsuite/libstdc++-xmethods/shared_ptr.cc
index c89f1d05aa3..c228d3ad26c 100644
--- a/libstdc++-v3/testsuite/libstdc++-xmethods/shared_ptr.cc
+++ b/libstdc++-v3/testsuite/libstdc++-xmethods/shared_ptr.cc
@@ -37,6 +37,8 @@ main ()
 
   std::shared_ptr s(new x_struct[2]{ {92}, {115} });
 
+  auto qq = q;
+
 // { dg-final { note-test *p 10 } }
 // { dg-final { regexp-test p.get() 0x.* } }
 
@@ -67,6 +69,11 @@ main ()
 // { dg-final { whatis-test s.get() "x_struct \*" } }
 // { dg-final { whatis-test s\[1].y int } }
 
+// { dg-final { note-test p.use_count() 1 } }
+// { dg-final { note-test p.unique() true } }
+// { dg-final { note-test q.use_count() 2 } }
+// { dg-final { note-test q.unique() false } }
+
   return 0;  // Mark SPOT
 }
 
-- 
2.39.2



Re: [PATCH 0/4] openacc: Async fixes

2023-03-10 Thread Thomas Schwinge
Hi!

On 2021-06-30T10:28:00+0200, I wrote:
> On 2021-06-29T16:42:00-0700, Julian Brown  wrote:
>>  - The OpenACC profiling-interface implementation did not measure
>>asynchronous operations properly.
>
> We'll need to be careful: (possibly, an older version of) that one we
> internally had identified to be causing some issues; see the
> "acc_prof-parallel-1.c intermittent failure on og10 branch" emails,
> 2020-07.

That's still unresolved (not blaming you!); those intermittent failures
are still seen.  I've not yet been able to look into your follow-on
discussion and WIP patch 'acc_prof-parallel-barrier-1.diff'
"Add barrier, hack" in detail.


As part of the og12 branch setup, Kwok then had to put
og12 commit b845d2f62e7da1c4cfdfee99690de94b648d076d
"Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c" on top of
your og12 commit 719f93c8618a134f90b5b661ab70c918d659ad05
"OpenACC profiling-interface fixes for asynchronous operations", and that
stuff is now again conflicting with GCC master branch work that I need to
cherry-pick into og12 branch.

Therefore, I'm now reverting this on og12 branch -- with the intention to
resolve that issue on master branch, eventually (but no promises, when).
Pushed to devel/omp/gcc-12 branch
commit 1818bab2ce9f11d8dde5b378f580971b87a5c4ff
'Revert "Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c"', and
commit b8beaa8447ed3c1637e8f93a08c0e47b5709290f
'Revert "OpenACC profiling-interface fixes for asynchronous operations"',
see attached.


Regards
 Thomas


From 1818bab2ce9f11d8dde5b378f580971b87a5c4ff Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 2 Mar 2023 11:24:28 +0100
Subject: [PATCH 1/2] Revert "Revert changes to acc_prof-init-1.c and
 acc_prof-parallel-1.c"

... as a prerequisite for reverting
"OpenACC profiling-interface fixes for asynchronous operations".

This reverts og12 commit b845d2f62e7da1c4cfdfee99690de94b648d076d.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Revert
	"Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c"
	changes.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Likewise.
---
 libgomp/ChangeLog.omp |  8 
 .../acc_prof-init-1.c | 17 
 .../acc_prof-parallel-1.c | 20 +++
 3 files changed, 45 insertions(+)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 3ed90bb38f2..d55b0503920 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,11 @@
+2023-03-10  Thomas Schwinge  
+
+	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Revert
+	"Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c"
+	changes.
+	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
+	Likewise.
+
 2023-03-01  Tobias Burnus  
 
 	Backported from master:
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
index 6bbe99df1ff..a33fac7556c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
@@ -208,6 +208,21 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info *
 
   assert (state == 11
 	  || state == 111);
+#if defined COPYIN
+  /* In an 'async' setting, this event may be triggered before actual 'async'
+ data copying has completed.  Given that 'state' appears in 'COPYIN', we
+ first have to synchronize (that is, let the 'async' 'COPYIN' read the
+ current 'state' value)...  */
+  if (acc_async != acc_async_sync)
+{
+  /* "We're not yet accounting for the fact that _OpenACC events may occur
+	 during event processing_"; temporarily disable to avoid deadlock.  */
+  unreg (acc_ev_none, NULL, acc_toggle_per_thread);
+  acc_wait (acc_async);
+  reg (acc_ev_none, NULL, acc_toggle_per_thread);
+}
+  /* ... before modifying it in the following.  */
+#endif
   STATE_OP (state, ++);
 
   assert (tool_info != NULL);
@@ -280,6 +295,7 @@ int main()
 {
   state_init = state;
 }
+acc_async = acc_async_sync;
 #pragma acc wait
 assert (state_init == 11);
   }
@@ -306,6 +322,7 @@ int main()
 {
   state_init = state;
 }
+acc_async = acc_async_sync;
 #pragma acc wait
 assert (state_init == 111);
   }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
index 9a542b56fe5..663f7f724d5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-

Re: [PATCH] range-op-float: Extend lhs by 0.5ulp rather than 1ulp if not -frounding-math [PR109008]

2023-03-10 Thread Richard Biener via Gcc-patches
On Fri, 10 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> This patch, incremental to the just posted one, improves the reverse
> operation ranges significantly by widening just by 0.5ulp in each
> direction rather than 1ulp.  Again, REAL_VALUE_TYPE has both wider
> exponent range and wider mantissa precision (160 bits) than any
> supported type, this patch uses the latter property.
> 
> The patch doesn't do it if -frounding-math, because then the rounding
> can be +-1ulp in each direction depending on the rounding mode which
> we don't know, or for IBM double double because that type is just weird
> and we can't trust in sane properties.
> 
> I've performed testing of these 2 patches on 30 random tests as with
> yesterday's patch, exact numbers are in the PR, but I see very significant
> improvement in the precision of the ranges while keeping it conservatively
> correct.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The flow is a little bit obfuscated, but OK.
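
For anyone following along, the half-ulp claim in the quoted comment (op1 in
[-0x1.0p-54, 0x1.0p-53] for [1., 1.] = op1 + [1., 1.]) can be checked with a
small C++ sketch.  The helper name is made up for illustration, and the sketch
assumes IEEE binary64 doubles, round-to-nearest-even and no excess precision:

```cpp
#include <cmath>

// Return true if 1.0 + x rounds back to exactly 1.0.
inline bool
absorbed_by_one (double x)
{
  // volatile forces the sum to be stored as a double before comparing.
  volatile double sum = 1.0 + x;
  return sum == 1.0;
}
```

Under those assumptions, absorbed_by_one holds for 0x1.0p-53 and -0x1.0p-54
(both are ties that round to the even value 1.0), but not for -0x1.0p-53 or
for the next double above 0x1.0p-53, which is why widening by a full ulp in
each direction is more conservative than necessary.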

Thanks,
Richard.

> 2023-03-10  Jakub Jelinek  
> 
>   PR tree-optimization/109008
>   * range-op-float.cc (float_widen_lhs_range): If not
>   -frounding-math and not IBM double double format, extend lhs
>   range just by 0.5ulp rather than 1ulp in each direction.
> 
> --- gcc/range-op-float.cc.jj  2023-03-09 12:13:57.189790814 +0100
> +++ gcc/range-op-float.cc 2023-03-09 13:12:05.248873234 +0100
> @@ -2205,8 +2205,8 @@ zero_to_inf_range (REAL_VALUE_TYPE &lb,
> [1., 1.] = op1 + [1., 1.].  op1's range is not [0., 0.], but
> [-0x1.0p-54, 0x1.0p-53] (when not -frounding-math), any value for
> which adding 1. to it results in 1. after rounding to nearest.
> -   So, for op1_range/op2_range extend the lhs range by 1ulp in each
> -   direction.  See PR109008 for more details.  */
> +   So, for op1_range/op2_range extend the lhs range by 1ulp (or 0.5ulp)
> +   in each direction.  See PR109008 for more details.  */
>  
>  static frange
>  float_widen_lhs_range (tree type, const frange &lhs)
> @@ -2230,6 +2230,14 @@ float_widen_lhs_range (tree type, const
> lb = dconstm1;
> SET_REAL_EXP (&lb, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
>   }
> +  if (!flag_rounding_math && !MODE_COMPOSITE_P (TYPE_MODE (type)))
> + {
> +   /* If not -frounding-math nor IBM double double, actually widen
> +  just by 0.5ulp rather than 1ulp.  */
> +   REAL_VALUE_TYPE tem;
> +   real_arithmetic (&tem, PLUS_EXPR, &lhs.lower_bound (), &lb);
> +   real_arithmetic (&lb, RDIV_EXPR, &tem, &dconst2);
> + }
>  }
>if (real_isfinite (&ub))
>  {
> @@ -2240,6 +2248,14 @@ float_widen_lhs_range (tree type, const
> ub = dconst1;
> SET_REAL_EXP (&ub, FLOAT_MODE_FORMAT (TYPE_MODE (type))->emax + 1);
>   }
> +  if (!flag_rounding_math && !MODE_COMPOSITE_P (TYPE_MODE (type)))
> + {
> +   /* If not -frounding-math nor IBM double double, actually widen
> +  just by 0.5ulp rather than 1ulp.  */
> +   REAL_VALUE_TYPE tem;
> +   real_arithmetic (&tem, PLUS_EXPR, &lhs.upper_bound (), &ub);
> +   real_arithmetic (&ub, RDIV_EXPR, &tem, &dconst2);
> + }
>  }
>/* Temporarily disable -ffinite-math-only, so that frange::set doesn't
>   reduce the range back to real_min_representable (type) as lower bound
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: AArch64 bfloat16 mangling

2023-03-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Fri, Mar 10, 2023 at 08:43:02AM +, Richard Sandiford wrote:
>> > So, either __bf16 should be also extended floating-point type
>> > like decltype (0.0bf16) and std::bfloat16_t and in that case
>> > it is fine if it mangles u6__bf16, or __bf16 will be a distinct
>> > type from the latter two,
>> 
>> Yeah, the former is what I meant.  The intention is that __bf16 and
>> std::bfloat16_t are the same type, not distinct types.
>
> Ok, in that case here is totally untested patch on top of
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
> which is also needed (for aarch64 of course the i386 parts of the
> patch which have been acked already don't matter but the 2 libgcc
> new files are needed and the optabs change is too).

OK for the rest of that.

> The reason why __floatdibf and __floatundibf are needed on aarch64
> and not on x86 is that the latter has optabs for DI -> XF conversions
> and so for DI -> BF uses DI -> XF -> BF where the first conversion
> doesn't round/truncate anything.  While on aarch64 DI -> TF conversion
> where TF is the narrowed mode which can hold all DI values exactly
> is done using a libcall and so GCC emits direct DI -> BF conversions.
>
> Will test it momentarily (including the patch it depends on):
>
> 2023-03-10  Jakub Jelinek  
>
> gcc/
>   * config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
>   (aarch64_bf16_ptr_type_node): Adjust comment.
>   * config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
>   bfloat16_type_node rather than aarch64_bf16_type_node.
>   (aarch64_libgcc_floating_mode_supported_p,
>   aarch64_scalar_mode_supported_p): Also support BFmode.
>   (aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
>   aarch64_invalid_binary_op): Remove BFmode related rejections.
>   (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
>   * config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
>   (aarch64_int_or_fp_type): Use bfloat16_type_node rather than
>   aarch64_bf16_type_node.
>   (aarch64_init_simd_builtin_types): Likewise.
>   (aarch64_init_bf16_types): Likewise.  Don't create bfloat16_type_node,
>   which is created in tree.cc already.
>   * config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
> gcc/testsuite/
>   * gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
>   Don't expect one __bf16 related error.
> libgcc/
>   * config/aarch64/t-softfp (softfp_extensions): Add bfsf.
>   (softfp_truncations): Add tfbf dfbf sfbf hfbf.
>   (softfp_extras): Add floatdibf floatundibf floattibf floatuntibf.
>   * config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
>   __extendbfsf2 and __trunc{s,d,t,h}fbf2.
>   * config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define.
>   * soft-fp/floatundibf.c: New file.
>   * soft-fp/floatdibf.c: New file.
> libstdc++-v3/
>   * config/abi/pre/gnu.ver (CXXABI_1.3.14): Also export __bf16 tinfos
>   if it isn't mangled as DF16b but u6__bf16.

Thanks, looks great.  Nice to see all the - lines. :)

A naive question:

> --- libgcc/config/aarch64/t-softfp.jj 2022-11-14 13:35:34.527155682 +0100
> +++ libgcc/config/aarch64/t-softfp    2023-03-10 12:19:58.668882041 +0100
> @@ -1,9 +1,10 @@
>  softfp_float_modes := tf
>  softfp_int_modes := si di ti
> -softfp_extensions := sftf dftf hftf
> -softfp_truncations := tfsf tfdf tfhf
> +softfp_extensions := sftf dftf hftf bfsf
> +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf

Is bfsf used for conversions in which sf is the ultimate target,
as opposed to operations that convert bf to sf and then do something
with the sf?  And so the libfunc is needed to raise exceptions, which in
more complex operations can be left to the following sf operation?

Do we still optimise to a shift for -ffinite-math-only?
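
(To spell out what "a shift" means here: a bfloat16 value has the same layout
as the top 16 bits of an IEEE single, so a pure bit-level extension is just a
16-bit left shift.  The sketch below assumes a 32-bit IEEE float; the function
name is made up for illustration and is not a libgcc entry point.

```cpp
#include <cstdint>
#include <cstring>

// Widen the raw bits of a bfloat16 value to float by shifting them into
// the upper half of a 32-bit word.
inline float
bf16_bits_to_float (std::uint16_t bits)
{
  std::uint32_t widened = static_cast<std::uint32_t> (bits) << 16;
  float result;
  static_assert (sizeof result == sizeof widened, "float must be 32 bits");
  std::memcpy (&result, &widened, sizeof result);
  return result;
}
```

Such a shift never raises a floating-point exception and passes a signalling
NaN through unquieted, which is the difference from a libfunc that the
question above is about.)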

Assuming so, the patch LGTM.  I'm not familiar enough with softfloat
to do a meaningful review of those parts, and I'm taking the versioning
changes on faith. :)

Thanks,
Richard

>  softfp_exclude_libgcc2 := n
> -softfp_extras := fixhfti fixunshfti floattihf floatuntihf
> +softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
> +  floatdibf floatundibf floattibf floatuntibf
>  
>  TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
>  
> --- libgcc/config/aarch64/libgcc-softfp.ver.jj    2023-01-16 11:52:16.633725959 +0100
> +++ libgcc/config/aarch64/libgcc-softfp.ver   2023-03-10 12:11:44.144082714 +0100
> @@ -26,3 +26,16 @@ GCC_11.0 {
>__mulhc3
>__trunctfhf2
>  }
> +
> +%inherit GCC_13.0.0 GCC_11.0.0
> +GCC_13.0.0 {
> +  __extendbfsf2
> +  __floatdibf
> +  __floattibf
> +  __floatundibf
> +  __floatuntibf
> +  __truncdfbf2
> +  __truncsfbf2
> +  __trunctfbf2
> +  __trunchfbf2
> +}
> --- libgcc/config/aarch64/sfp-machine.h.jj2023-01-16 11:52:16.633725959 
> +0100
> +++ libgcc/config/aarch64/sfp-mach

[PATCH v6] RISC-V: Add support for experimental zfa extension.

2023-03-10 Thread Jin Ma via Gcc-patches
This patch adds the 'Zfa' extension for RISC-V, based on the latest 'Zfa'
change on the master branch of the RISC-V ISA Manual as of this writing:
 https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7

The Wiki Page (details):
 https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa

The binutils-gdb for 'Zfa' extension:
 https://sourceware.org/pipermail/binutils/2022-September/122938.html

Implementation of zfa extension on LLVM:
  https://reviews.llvm.org/rGc0947dc44109252fcc0f68a542fc6ef250d4d3a9

There are three points that need to be discussed here.
1. According to riscv-spec, "The FCVTMOD.W.D instruction was added principally
  to accelerate the processing of JavaScript Numbers.", so it seems that no
  implementation is required in the compiler.
2. The FROUND and FROUNDNX instructions in this patch use related functions in
  the math library, such as round, floor, ceil, etc. Since there is no
  interface for half-precision in the math library, the instructions FROUND.H
  and FROUNDNX.H have not been implemented for the time being. Is it necessary
  to add a built-in interface belonging to riscv such as __builtin_roundhf or
  __builtin_roundf16 to generate half floating point instructions?
3. As far as I know, the FMINM and FMAXM instructions correspond to the C23
  library functions fminimum and fmaximum. Therefore, I have not dealt with
  such instructions for the time being, but have simply implemented the
  pattern of fminm3 and fmaxm3. Is it necessary to add a built-in interface
  belonging to riscv such as __builtin_fminm to generate half floating-point
  instructions?
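
To make point 3 concrete, here is a rough sketch of the semantics usually described for C23 fminimum (and, as far as I understand, for FMINM): a NaN in either operand yields a NaN, and -0.0 is ordered below +0.0. The function name `fminimum_sketch` is made up for illustration; this is neither the patch's pattern nor a libm implementation.

```c
#include <math.h>

/* Illustrative only: minimum with NaN propagation and signed-zero
   ordering, unlike fmin, which returns the non-NaN operand.  */
static double
fminimum_sketch (double a, double b)
{
  if (isnan (a) || isnan (b))
    return NAN;
  if (a == b)                       /* Also covers the +0.0 / -0.0 pair.  */
    return signbit (a) ? a : b;
  return a < b ? a : b;
}
```

The contrast with fmin is the first branch: fmin (NAN, 1.0) gives 1.0, whereas the sketch above gives a NaN.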

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension.
* config/riscv/constraints.md (Zf): Constrain the floating point number
that the FLI instruction can load.
* config/riscv/iterators.md (round_pattern): New.
* config/riscv/predicates.md: Predicate the floating point number that
the FLI instruction can load.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
Get the index of the floating-point number that the FLI instruction
can load.
* config/riscv/riscv.cc (find_index_in_array): New.
(riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): Likewise.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): Exclude floating point numbers that can be
loaded by FLI instructions.
(riscv_output_move): Likewise.
(riscv_memmodel_needs_release_fence): Likewise.
(riscv_print_operand): Likewise.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.h (GP_REG_RTX_P): New.
* config/riscv/riscv.md (fminm3): New.
(fmaxm3): New.
(2): New.
(rint2): New.
(f_quiet4_zfa): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
* gcc.target/riscv/zfa-fround-rv32.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/constraints.md   |   7 +
 gcc/config/riscv/iterators.md |   5 +
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.cc | 168 +-
 gcc/config/riscv/riscv.h  |   1 +
 gcc/config/riscv/riscv.md | 112 +---
 .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 
 .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 +
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 +
 gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 
 .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 ++
 .../gcc.target/riscv/zfa-fround-rv32.c|  42 +
 gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 +
 18 files changed, 654 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.t

Re: [PATCH v2 4/5] Update texinfo.tex, remove the @gol macro/alias

2023-03-10 Thread Arsen Arsenović via Gcc-patches
Morning,

Sandra Loosemore  writes:

> On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:
>> The @gol macro appears to have existed as a workaround for a bug in old
>> versions of makeinfo and/or texinfo.tex, where they would, in some types
>> of output, fail to emit line breaks in @gccoptlists.  After updating
>> texinfo.tex, I noticed that this behavior appears to no longer be
>> exhibited, instead, both acted correctly and inserted newlines.  The
>> (groff) manual output also appears unaffected.
>> gcc/ChangeLog:
>>  * doc/include/texinfo.tex: Update to 2023-01-17.19.
>>  * doc/implement-c.texi: Remove usage of @gol.
>>  * doc/invoke.texi: Ditto.
>>  * doc/sourcebuild.texi: Ditto.
>>  * doc/include/gcc-common.texi: Remove @gol.  In new Makeinfo and
>>  texinfo.tex versions, the bug it was working around appears to
>>  be gone.
>> gcc/fortran/ChangeLog:
>>  * invoke.texi: Remove usages of @gol.
>>  * intrinsic.texi: Ditto.
>
> This is OK, but I'd like to see this patch split into two separate commits as
> well -- one for the texinfo.tex import, and one for the @gol changes.

Sure thing.  I'll send the updated git log in a few hours, when I split
the commits, for your review (for both of the commits).

Thanks, have a great day.

> -Sandra


-- 
Arsen Arsenović




Re: [PATCH v2 4/5] Update texinfo.tex, remove the @gol macro/alias

2023-03-10 Thread Arsen Arsenović via Gcc-patches
Hi Gerald, Thomas,

Thomas Schwinge  writes:

> Hi!
>
> On 2023-03-10T01:17:04+0100, Gerald Pfeifer  wrote:
>> On Thu, 9 Mar 2023, Sandra Loosemore wrote:
>>> This is OK, but I'd like to see this patch split into two separate
>>> commits as well -- one for the texinfo.tex import, and one for the @gol
>>> changes.
>>
>> I believe Arsen does not have git write access.
>
> Well, I guess we should fix that?  :-) I've met Arsen in person at
> FOSDEM 2023, and I'm happy to vouch for his reasonable behavior.
> He does have a FSF Copyright Assignment for GCC; I'm happy to act
> as sponsor.

Thanks, Thomas.  I'd be happy to undergo this process later today.  If I
understood right, I should fill out
https://sourceware.org/cgi-bin/pdw/ps_form.cgi and name you, right?

Thanks again.

>
> Grüße
>  Thomas
>
>
>> Arsen, if that is indeed the case, I offer to push these two commits for
>> you if you send them by e-mail (as two attachments).

Thanks!  Either approach works for me :)

>> Gerald
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

Have a lovely day!
-- 
Arsen Arsenović




Re: [PATCH v2 5/5] update_web_docs_git: Update CSS reference to new manual CSS

2023-03-10 Thread Arsen Arsenović via Gcc-patches

Sandra Loosemore  writes:

> On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:
>> maintainer-scripts/ChangeLog:
>>  * update_web_docs_git (CSS): Update CSS reference to point to
>>  /texinfo-manuals.css.
>
> I'm going to defer to Gerald on this one, since I am ignorant of how documents
> are produced for the GCC web site.  IIUC the online docs are built on a system
> with Texinfo 6.5; I don't know if it's reasonable to update that, otherwise I
> think somebody ought to give it a dry run to make sure that the style sheet
> does reasonable things with Texinfo 6.5 output.

ISTR asking Mark about updating that system for the purposes of this
change and release, and him saying that it is possible to do so, should
the GCC admins agree, so I think we should be okay.

Gerald, there's one more patch for update_web_docs_git that you'll be
interested in, see:
https://inbox.sourceware.org/gcc-patches/86sfed63l6@aarsen.me/

This addresses the @shortcontents coming after the @contents even with
the Texinfo sources specifying otherwise.

Thanks in advance.

> -Sandra


-- 
Arsen Arsenović




Re: [PATCH, OpenACC, v3] Non-contiguous array support for OpenACC data clauses

2023-03-10 Thread Thomas Schwinge
Hi!

On 2019-11-26T22:49:21+0800, Chung-Lin Tang  wrote:
> this is a reorg of the last non-contiguous arrays patch.

(Sorry, this is still not the master branch integration email...)


Just a small clean-up, to simplify other changes that I'm working on:

> (4) Along the way, I've added a 'gomp_map_vars_openacc' for specializing our
> uses, which should shave off quite some code through inlining.

> --- libgomp/libgomp.h (revision 278656)
> +++ libgomp/libgomp.h (working copy)
> @@ -1167,6 +1167,10 @@ extern struct target_mem_desc *gomp_map_vars_async
>   size_t, void **, void **,
>   size_t *, void *, bool,
>   enum gomp_map_vars_kind);
> +extern struct target_mem_desc *gomp_map_vars_openacc (struct gomp_device_descr *,
> +   struct goacc_asyncqueue *,
> +   size_t, void **, size_t *,
> +   unsigned short *, void *);
>  extern void gomp_unmap_tgt (struct target_mem_desc *);
>  extern void gomp_unmap_vars (struct target_mem_desc *, bool);
>  extern void gomp_unmap_vars_async (struct target_mem_desc *, bool,

> --- libgomp/target.c  (revision 278656)
> +++ libgomp/target.c  (working copy)

> @@ -1086,12 +1248,25 @@ gomp_map_vars_internal (struct gomp_device_descr *
>  }
>
>  attribute_hidden struct target_mem_desc *
> +gomp_map_vars_openacc (struct gomp_device_descr *devicep,
> +struct goacc_asyncqueue *aq, size_t mapnum,
> +void **hostaddrs, size_t *sizes, unsigned short *kinds,
> +void *nca_info)
> +{
> +  return gomp_map_vars_internal (devicep, aq, mapnum, hostaddrs, NULL,
> +  sizes, (void *) kinds,
> +  (struct goacc_ncarray_info *) nca_info,
> +  true, GOMP_MAP_VARS_OPENACC);
> +}

Pushed to devel/omp/gcc-12 branch
commit 5ea330fdc918e6731c5b706715a18470909247bf
"libgomp: Merge 'gomp_map_vars_openacc' into 'goacc_map_vars' [PR76739]",
see attached.


Grüße
 Thomas


From 5ea330fdc918e6731c5b706715a18470909247bf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 2 Mar 2023 18:36:47 +0100
Subject: [PATCH] libgomp: Merge 'gomp_map_vars_openacc' into 'goacc_map_vars'
 [PR76739]

Upstream has 'goacc_map_vars'; merge the new 'gomp_map_vars_openacc' into it.
(Maybe the latter didn't exist yet when the former was originally added?)
No functional change.

Clean-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".

	PR other/76739
	libgomp/
	* libgomp.h (goacc_map_vars): Add 'struct goacc_ncarray_info *'
	formal parameter.
	(gomp_map_vars_openacc): Remove.
	* target.c (goacc_map_vars): Adjust.
	(gomp_map_vars_openacc): Remove.
	* oacc-mem.c (acc_map_data, goacc_enter_datum)
	(goacc_enter_data_internal): Adjust.
	* oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start):
	Adjust.
---
 libgomp/ChangeLog.omp   | 11 +++
 libgomp/libgomp.h   |  9 -
 libgomp/oacc-mem.c  |  8 
 libgomp/oacc-parallel.c | 10 +-
 libgomp/target.c| 17 +++--
 5 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 0e984754bb0..be21ec39428 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,16 @@
 2023-03-10  Thomas Schwinge  
 
+	PR other/76739
+	* libgomp.h (goacc_map_vars): Add 'struct goacc_ncarray_info *'
+	formal parameter.
+	(gomp_map_vars_openacc): Remove.
+	* target.c (goacc_map_vars): Adjust.
+	(gomp_map_vars_openacc): Remove.
+	* oacc-mem.c (acc_map_data, goacc_enter_datum)
+	(goacc_enter_data_internal): Adjust.
+	* oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start):
+	Adjust.
+
 	* oacc-host.c: Revert
 	"OpenACC profiling-interface fixes for asynchronous operations"
 	changes.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index ba12d558465..92f6f14960f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1445,15 +1445,14 @@ extern void gomp_attach_pointer (struct gomp_device_descr *,
 extern void gomp_detach_pointer (struct gomp_device_descr *,
  struct goacc_asyncqueue *, splay_tree_key,
  uintptr_t, bool, struct gomp_coalesce_buf *);
+struct goacc_ncarray_info;
 extern struct target_mem_desc *goacc_map_vars (struct gomp_device_descr *,
 	   struct goacc_asyncqueue *,
 	   size_t, void **, void **,
-	   size_t *, void *, bool,
+	   size_t *, void *,
+	   struct goacc_ncarray_in

Re: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-10 Thread Andrew MacLeod via Gcc-patches

On 3/9/23 14:37, Tamar Christina wrote:

Cheers,

Thanks! I'll wait for him to come back then 😊

Thanks,
Tamar


-Original Message-
From: Aldy Hernandez 
Sent: Wednesday, March 8, 2023 8:57 AM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; amacl...@redhat.com
Subject: Re: [PATCH 2/4][ranger]: Add range-ops for widen addition and
widen multiplication [PR108583]

As Andrew has been advising on this one, I'd prefer for him to review it.
However, he's on vacation this week.  FYI...

Aldy

On Mon, Mar 6, 2023 at 12:22 PM Tamar Christina
 wrote:

Ping.

And updated the patch to reject cases that we don't expect or can handle
cleanly for now.

It's OK by me...  but I think a release manager has to sign off on it
for this stage. In the next stage 1 I will formalize the process a bit more
for nonstandard range-ops.


Andrew



[pushed] analyzer: fix deref-before-check false +ves seen in haproxy [PR108475, PR109060]

2023-03-10 Thread David Malcolm via Gcc-patches
Integration testing showed various false positives from
-Wanalyzer-deref-before-check where the expression that's dereferenced
is different from the one that's checked, but the diagnostic is emitted
because they both evaluate to the same symbolic value.

This patch rejects such warnings unless we have tree expressions for
both and both tree expressions are "spelled the same way", i.e.
would be printed to the same user-facing string.
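
As a hypothetical illustration of the kind of pattern affected (this is not one of the haproxy or coreutils cases, and the names are made up): below, the dereference is spelled "it" while the later NULL check is spelled "h->first". Both presumably evaluate to the same symbolic value, but since the two expressions print differently, the diagnostic would no longer be emitted for such a pair. Whether a given instance is a real bug still depends on context.

```c
#include <stddef.h>

struct item { int value; };
struct holder { struct item *first; };

/* Hypothetical example: the pointer is dereferenced under one spelling
   and checked for NULL under another.  */
static int
first_value_or_default (struct holder *h, int dflt)
{
  struct item *it = h->first;
  int v = it->value;          /* Dereference, spelled "it".  */
  if (h->first == NULL)       /* Check, spelled "h->first".  */
    return dflt;
  return v;
}
```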

The patch has this effect on my integration tests of -fanalyzer:

  Comparison: 
GOOD: 129(19.20% -> 19.40%)
 BAD: 543 -> 536 (-7)

where the only affected warning is:

  -Wanalyzer-deref-before-check:
GOOD:   1(5.00% -> 8.33%)
 BAD:  19 ->  11 (-8)

   Known false positives: 15 -> 10 (-5)
  coreutils-9.1: 1 -> 0 (-1)
  haproxy-2.7.1: 4 -> 0 (-4)
   Suspected false positives: 4 -> 1 (-3)
  haproxy-2.7.1: 2 -> 0 (-2)
 qemu-7.2.0: 1 -> 0 (-1)

whilst retaining the known true positive.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-6581-gc4fd232f9843bb.

gcc/analyzer/ChangeLog:
PR analyzer/108475
PR analyzer/109060
* sm-malloc.cc (deref_before_check::deref_before_check):
Initialize new field m_deref_expr.  Assert that arg is non-NULL.
(deref_before_check::emit): Reject cases where the spelling of the
thing that was dereferenced differs from that of what is checked,
or if the dereference expression was not found.  Remove code to
handle NULL m_arg.
(deref_before_check::describe_state_change): Remove code to handle
NULL m_arg.
(deref_before_check::describe_final_event): Likewise.
(deref_before_check::sufficiently_similar_p): New.
(deref_before_check::m_deref_expr): New field.
(malloc_state_machine::maybe_complain_about_deref_before_check):
Don't warn if the diag_ptr is NULL.

gcc/testsuite/ChangeLog:
PR analyzer/108475
PR analyzer/109060
* gcc.dg/analyzer/deref-before-check-pr108475-1.c: New test.
* gcc.dg/analyzer/deref-before-check-pr108475-haproxy-tcpcheck.c:
New test.
* gcc.dg/analyzer/deref-before-check-pr109060-haproxy-cfgparse.c:
New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-malloc.cc |  81 +
 .../analyzer/deref-before-check-pr108475-1.c  |  51 ++
 ...f-before-check-pr108475-haproxy-tcpcheck.c | 169 ++
 ...f-before-check-pr109060-haproxy-cfgparse.c |  92 ++
 4 files changed, 356 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/deref-before-check-pr108475-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/deref-before-check-pr108475-haproxy-tcpcheck.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/deref-before-check-pr109060-haproxy-cfgparse.c

diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 1ea9b30fa13..16883d301d5 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -1498,8 +1498,11 @@ public:
   deref_before_check (const malloc_state_machine &sm, tree arg)
   : malloc_diagnostic (sm, arg),
 m_deref_enode (NULL),
+m_deref_expr (NULL),
 m_check_enode (NULL)
-  {}
+  {
+gcc_assert (arg);
+  }
 
   const char *get_kind () const final override { return "deref_before_check"; }
 
@@ -1560,6 +1563,15 @@ public:
 if (linemap_location_from_macro_definition_p (line_table, check_loc))
   return false;
 
+/* Reject if m_deref_expr is sufficiently different from m_arg
+   for cases where the dereference is spelled differently from
+   the check, which is probably two different ways to get the
+   same svalue, and thus not worth reporting.  */
+if (!m_deref_expr)
+  return false;
+if (!sufficiently_similar_p (m_deref_expr, m_arg))
+  return false;
+
 /* Reject the warning if the deref's BB doesn't dominate that
of the check, so that we don't warn e.g. for shared cleanup
code that checks a pointer for NULL, when that code is sometimes
@@ -1572,15 +1584,10 @@ public:
 m_deref_enode->get_supernode ()->m_bb))
   return false;
 
-if (m_arg)
-  return warning_at (rich_loc, get_controlling_option (),
-"check of %qE for NULL after already"
-" dereferencing it",
-m_arg);
-else
-  return warning_at (rich_loc, get_controlling_option (),
-"check of pointer for NULL after already"
-" dereferencing it");
+return warning_at (rich_loc, get_controlling_option (),
+  "check of %qE for NULL after already"
+  " dereferencing it",
+  m_arg);
   }
 
   label_text describe_state_change (const evdesc::state_change &change)
@@ -1591,11 +1598,9 @@ public:
   {
m_first_deref_event 

[PATCH] Speedup PTA solving for call constraint sets

2023-03-10 Thread Richard Biener via Gcc-patches
With calls we now often get constraints like

  callarg = *callarg + UNKNOWN

or similar cases.  The important thing to note is that this
complex constraint changes the node solution itself, so when
solving the node is marked as changed immediately again.  When
that happens it's profitable to iterate that self-cycle immediately
so we maximize cache reuse and build up the successor graph quickly
to get better topological ordering and reduce the number of
iterations of the solving.
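
The effect can be seen in a toy model. The sketch below is not solve_graph; it is a made-up fixed-point loop (`toy_solve`, with a single self-modifying rule per node) that only illustrates the difference between processing a changed node once per pass and re-processing it immediately while it keeps changing itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy fixed-point solver, not GCC's solve_graph.  Each node carries one
   self-modifying rule: sol |= sol << 1, limited to MASK.  Returns the
   number of passes over all nodes until nothing changes, or -1 if N is
   larger than this sketch supports.  */
static int
toy_solve (uint64_t *sol, size_t n, uint64_t mask, bool iterate_self_cycles)
{
  bool changed[64];
  int passes = 0;
  bool any = true;

  if (n > 64)
    return -1;
  for (size_t i = 0; i < n; i++)
    changed[i] = true;

  while (any)
    {
      any = false;
      passes++;
      for (size_t i = 0; i < n; i++)
        while (changed[i])
          {
            changed[i] = false;
            uint64_t next = (sol[i] | (sol[i] << 1)) & mask;
            if (next != sol[i])
              {
                sol[i] = next;
                changed[i] = true;   /* The rule modified its own node.  */
                any = true;
              }
            if (!iterate_self_cycles)
              break;                 /* One step per pass, like the old 'if'.  */
          }
    }
  return passes;
}
```

Both modes reach the same solution; with immediate iteration the self-cycle is exhausted inside one visit of the node, so far fewer outer passes are needed.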

For a testcase derived from ceph this reduces the time spent in
PTA solving from 453s to 92s which is quite significant.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.  For
the testcase I verified we create identical points-to solutions
before and after the change.  There are regression bugs complaining
about high PTA time (often only as part of an overall slow compile).
I did not verify whether this improves any of those, but I consider the
change a regression fix.

* tree-ssa-structalias.cc (solve_graph): Immediately
iterate self-cycles.
---
 gcc/tree-ssa-structalias.cc | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index fd7450b9477..fa3a2e4e1f9 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -2775,8 +2775,15 @@ solve_graph (constraint_graph_t graph)
continue;
 
  /* If the node has changed, we need to process the
-complex constraints and outgoing edges again.  */
- if (bitmap_clear_bit (changed, i))
+complex constraints and outgoing edges again.  For complex
+constraints that modify i itself, like the common group of
+  callarg = callarg + UNKNOWN;
+  callarg = *callarg + UNKNOWN;
+  *callarg = callescape;
+make sure to iterate immediately because that maximizes
+cache reuse and expands the graph quickest, leading to
+better visitation order in the next iteration.  */
+ while (bitmap_clear_bit (changed, i))
{
  unsigned int j;
  constraint_t c;
@@ -2794,7 +2801,7 @@ solve_graph (constraint_graph_t graph)
 ???  But we shouldn't ended up with "changed" set ...  */
  if (vi->oldsolution
  && bitmap_bit_p (vi->oldsolution, anything_id))
-   continue;
+   break;
  bitmap_copy (pts, get_varinfo (find (anything_id))->solution);
}
  else if (vi->oldsolution)
@@ -2803,7 +2810,7 @@ solve_graph (constraint_graph_t graph)
bitmap_copy (pts, vi->solution);
 
  if (bitmap_empty_p (pts))
-   continue;
+   break;
 
  if (vi->oldsolution)
bitmap_ior_into (vi->oldsolution, pts);
-- 
2.35.3


Re: [v4][PATCH 1/2] Handle component_ref to a structre/union field including C99 FAM [PR101832]

2023-03-10 Thread Qing Zhao via Gcc-patches


> On Mar 10, 2023, at 2:54 AM, Richard Biener  wrote:
> 
> On Thu, 9 Mar 2023, Qing Zhao wrote:
> 
>> 
>> 
>>> On Mar 9, 2023, at 7:20 AM, Richard Biener  wrote:
>>> 
>>> On Fri, 24 Feb 2023, Qing Zhao wrote:
>>> 
 GCC extension accepts the case when a struct with a C99 flexible array 
 member
 is embedded into another struct or union (possibly recursively).
 __builtin_object_size should treat such struct as flexible size.
 
 gcc/c/ChangeLog:
 
PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
struct/union type.
>>> 
>>> I can't really comment on the correctness of this part but since
>>> only the C frontend will ever set this and you are using it from
>>> addr_object_size which is also used for other C family languages
>>> (at least), I wonder if you can really test
>>> 
>>> +   if (!TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (v)))
>>> 
>>> there.
>> 
>> You mean for C++ and also other C family languages (other than C), the above 
>> bit cannot be set?
>> Yes, that’s true. The bit is only set for C. So is the bit 
>> DECL_NOT_FLEXARRAY, which is only set for C too. 
>> So, I am wondering whether the bit DECL_NOT_FLEXARRAY should be also set in 
>> middle end? Or we can set DECL_NOT_FLEXARRAY in C++ FE too? And then we can 
>> set TYPE_INCLUDE_FLEXARRAY also in C++ FE?
>> What’s your suggestion?
>> 
>> (I personally feel that DECL_NOT_FLEXARRAY and TYPE_INCLUDE_FLEXARRAY should 
>> be set in the same places).
> 
> I was wondering if the above test errors on the conservative side
> correctly - it will now, for all but C, cut off some thing where it
> didn't before?

As long as the default value of TYPE_INCLUDE_FLEXARRAY reflects the correct
conservative behavior, the test should be correct, right?

The default value of TYPE_INCLUDE_FLEXARRAY is 0, i.e., FALSE, which means that
the TYPE does not include a flexible array by default; this is the correct
behavior. It is set to TRUE only when the TYPE does include a flexible array.
So, I think it’s correct.

The situation is different for DECL_NOT_FLEXARRAY: by default, the compiler
treats ALL trailing arrays as FAMs, so in order to keep the correct
conservative behavior, the default value of DECL_NOT_FLEXARRAY should say that
the array is a FAM, i.e., DECL_NOT_FLEXARRAY is FALSE by default. We set it to
TRUE only when the array is NOT a FAM.

So, the default value for TYPE_INCLUDE_FLEXARRAY is correct.
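
For reference, the case under discussion looks roughly like the sketch below. The struct and function names are made up; embedding a struct that ends in a flexible array member inside another struct is the GNU C extension mentioned in the patch description, and the value returned by __builtin_object_size here depends on the compiler version and on whether the patch is applied, so the sketch makes no claim about it.

```c
#include <stddef.h>

/* C99 flexible array member at the end of a struct.  */
struct inner
{
  size_t len;
  char data[];
};

/* GNU C extension: a struct ending in a FAM embedded as the last
   member of another struct.  */
struct outer
{
  int tag;
  struct inner payload;
};

/* Ask the compiler for the size of the p->payload subobject.  With the
   proposed change, such a member is meant to be treated as having
   flexible size rather than sizeof (struct inner).  */
static size_t
payload_size (struct outer *p)
{
  return __builtin_object_size (&p->payload, 1);
}
```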
> 
>>> 
>>> Originally I was suggesting to set this flag in stor-layout.cc
>>> which eventually all languages funnel their types through and
>>> if there's language specific handling use a langhook (with the
>>> default implementation preserving the status quo).
>> 
>> If we decide to set the bits in stor-layout.cc, where is the best place to 
>> do it? I checked the stor-layout.cc code, looks like “layout_type” might be 
>> the place where we can set these bits for RECORD_TYPE, UNION_TYPE? 
> 
> Yes, it would be layout_type.
> 
>>> 
>>> Some more comments below ...
>>> 
 gcc/cp/ChangeLog:
 
PR tree-optimization/101832
* module.cc (trees_out::core_bools): Stream out new bit
type_include_flexarray.
(trees_in::core_bools): Stream in new bit type_include_flexarray.
 
 gcc/ChangeLog:
 
PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): New bit
type_include_flexarray.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in new bit type_include_flexarray.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out new bit type_include_flexarray.
* tree.h (TYPE_INCLUDE_FLEXARRAY): New macro
TYPE_INCLUDE_FLEXARRAY.
 
 gcc/testsuite/ChangeLog:
 
PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
 ---
 gcc/c/c-decl.cc   |  12 ++
 gcc/cp/module.cc  |   2 +
 gcc/print-tree.cc |   5 +
 .../gcc.dg/builtin-object-size-pr101832.c | 134 ++
 gcc/tree-core.h   |   4 +-
 gcc/tree-object-size.cc   |  79 +++
 gcc/tree-streamer-in.cc   |   1 +
 gcc/tree-streamer-out.cc  |   1 +
 gcc/tree.h|   6 +
 9 files changed, 215 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
 
 diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
 index 08078eadeb8..f589a2f5192 100644
 --- a/gcc/c/c-decl.cc
 +++ b/gcc/c/c-decl.cc
 @@ -9284,6 +

[pushed] c++: class NTTP and nested anon union [PR108566]

2023-03-10 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We were failing to come up with the name for the anonymous union.  It seems
like unfortunate redundancy, but the ABI does say that the name of an
anonymous union is its first named member.
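
The ABI rule is a pre-order, depth-first walk in declaration order. As a rough illustration only, here is that walk over a toy member tree in plain C; `struct member` and `first_named_member` are made-up names and have nothing to do with GCC's tree representation.

```c
#include <stddef.h>

/* Toy model of a data member: NAME is NULL for an unnamed member, and
   MEMBERS/N_MEMBERS are non-empty only for an anonymous aggregate.  */
struct member
{
  const char *name;
  const struct member *members;
  size_t n_members;
};

/* Pre-order, depth-first, declaration-order search for the first named
   member; returns NULL if every member is unnamed.  */
static const char *
first_named_member (const struct member *members, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (members[i].name)
        return members[i].name;
      const char *sub
        = first_named_member (members[i].members, members[i].n_members);
      if (sub)
        return sub;
    }
  return NULL;
}
```

For a union nested inside another anonymous union, the walk descends through the unnamed levels until it reaches the first named field.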

PR c++/108566

gcc/cp/ChangeLog:

* mangle.cc (anon_aggr_naming_decl): New.
(write_unqualified_name): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/abi/anon6.C: New test.
---
 gcc/cp/mangle.cc | 27 ++-
 gcc/testsuite/g++.dg/abi/anon6.C | 19 +++
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/anon6.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 242b3f31cba..a235f23459d 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -1389,6 +1389,28 @@ find_decomp_unqualified_name (tree decl, size_t *len)
   return p;
 }
 
+/* "For the purposes of mangling, the name of an anonymous union is considered
+   to be the name of the first named data member found by a pre-order,
+   depth-first, declaration-order walk of the data members of the anonymous
+   union. If there is no such data member (i.e., if all of the data members in
+   the union are unnamed), then there is no way for a program to refer to the
+   anonymous union, and there is therefore no need to mangle its name."  */
+
+static tree
+anon_aggr_naming_decl (tree type)
+{
+  tree field = next_aggregate_field (TYPE_FIELDS (type));
+  for (; field; field = next_aggregate_field (DECL_CHAIN (field)))
+{
+  if (DECL_NAME (field))
+   return field;
+  if (ANON_AGGR_TYPE_P (TREE_TYPE (field)))
+   if (tree sub = anon_aggr_naming_decl (TREE_TYPE (field)))
+ return sub;
+}
+  return NULL_TREE;
+}
+
 /* We don't need to handle thunks, vtables, or VTTs here.  Those are
mangled through special entry points.
 
@@ -1432,7 +1454,10 @@ write_unqualified_name (tree decl)
 
   bool found = false;
 
-  if (DECL_NAME (decl) == NULL_TREE)
+  if (DECL_NAME (decl) == NULL_TREE
+  && ANON_AGGR_TYPE_P (TREE_TYPE (decl)))
+decl = anon_aggr_naming_decl (TREE_TYPE (decl));
+  else if (DECL_NAME (decl) == NULL_TREE)
 {
   found = true;
   gcc_assert (DECL_ASSEMBLER_NAME_SET_P (decl));
diff --git a/gcc/testsuite/g++.dg/abi/anon6.C b/gcc/testsuite/g++.dg/abi/anon6.C
new file mode 100644
index 000..7be0b0bbdb7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/anon6.C
@@ -0,0 +1,19 @@
+// PR c++/108566
+// { dg-do compile { target c++20 } }
+
+template<typename T>
+struct wrapper1 {
+  union {
+union {
+  T RightName;
+};
+  };
+};
+
+template void dummy(){}
+
+void uses() {
+  dummy<wrapper1<double>{123.0}>();
+}
+
+// { dg-final { scan-assembler "_Z5dummyIXtl8wrapper1IdEtlNS1_Ut_Edi9RightNametlNS2_Ut_Edi9RightNameLd405ec000EEvv" } }

base-commit: 2fc55f51f9953b451d6d6ddfae23379001e6ac95
-- 
2.31.1



Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position (was: [PATCH] [og9] OpenACC profiling support for AMD GCN)

2023-03-10 Thread Thomas Schwinge
Hi!

On 2019-09-06T09:02:13-0700, Julian Brown  wrote:
> This patch adds profiling support to the AMD GCN libgomp plugin, modeled
> after the equivalent support in the NVPTX plugin. This gives a positive
> test delta in AMD GCN offload testing.

Yay!  \o/

> I will apply to the openacc-gcc-9-branch shortly.

..., and later these changes got into master branch, via integration into
"[PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin".

> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c

|  static void
|  gomp_offload_free (void *ptr)
|  {
|GCN_DEBUG ("Async thread ?:?: Freeing %p\n", ptr);
|GOMP_OFFLOAD_free (0, ptr);
|  }

> @@ -3046,6 +3075,35 @@ GOMP_OFFLOAD_free (int device, void *ptr)
>return false;
>  }
>
> +  struct goacc_thread *thr = GOMP_PLUGIN_goacc_thread ();
> +  bool profiling_dispatch_p
> += __builtin_expect (thr != NULL && thr->prof_info != NULL, false);
> +  if (profiling_dispatch_p)
> +{
> +  [...]
> +  prof_info->event_type = acc_ev_free;
> +
> +  [...]
> +  GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &data_event_info,
> + api_info);
> +}
> +
>return true;
>  }
>

> @@ -3276,6 +3334,35 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
>   {1,   64, 16}
>  };
>
> +  struct goacc_thread *thr = GOMP_PLUGIN_goacc_thread ();
> +  acc_prof_info *prof_info = thr->prof_info;
> +  acc_event_info enqueue_launch_event_info;
> +  acc_api_info *api_info = thr->api_info;
> +  bool profiling_dispatch_p = __builtin_expect (prof_info != NULL, false);
> +  if (profiling_dispatch_p)
> +{
> +  prof_info->event_type = acc_ev_enqueue_launch_start;
> +
> +  [...]
> +  GOMP_PLUGIN_goacc_profiling_dispatch (prof_info,
> + &enqueue_launch_event_info, api_info);
> +}
> +
>if (!async)
>  {
>run_kernel (kernel, ind_da, &kla, NULL, false);
|gomp_offload_free (ind_da);
|  }
|else
|  {
|queue_push_launch (aq, kernel, ind_da, &kla);
|if (DEBUG_QUEUES)
|   GCN_DEBUG ("queue_push_callback %d:%d gomp_offload_free, %p\n",
>  aq->agent->device_id, aq->id, ind_da);
>queue_push_callback (aq, gomp_offload_free, ind_da);
>  }
> +
> +  if (profiling_dispatch_p)
> +{
> +  prof_info->event_type = acc_ev_enqueue_launch_end;
> +  enqueue_launch_event_info.launch_event.event_type = prof_info->event_type;
> +  GOMP_PLUGIN_goacc_profiling_dispatch (prof_info,
> + &enqueue_launch_event_info,
> + api_info);
> +}
>  }

Per that, we've currently got:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - free memory
  - acc_ev_free
  - acc_ev_enqueue_launch_end

This confused another thing that I'm working on, so I adjusted that to:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - acc_ev_enqueue_launch_end
  - free memory
  - acc_ev_free

Pushed to master branch commit 649f1939baf11f45fd3579b8b9601c7840a097b3
"Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position", see attached.


Grüße
 Thomas


>From 649f1939baf11f45fd3579b8b9601c7840a097b3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 2 Mar 2023 10:39:09 +0100
Subject: [PATCH] Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position

For an OpenACC compute construct, we've currently got:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - free memory
  - acc_ev_free
  - acc_ev_enqueue_launch_end

This confused another thing that I'm working on, so I adjusted that to:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - acc_ev_enqueue_launch_end
  - free memory
  - acc_ev_free

Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in
'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.

	libgomp/
	* plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end'
	position.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Verify 'acc_ev_alloc', 'acc_ev_free'.
---
 libgomp/plugin/plugin-gcn.c   |  23 +-
 .../acc_prof-parallel-1.c | 202 --
 2 files changed, 195 insertions(+), 30 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 11ce6b0fa8d..96920a48d2e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3192,18 +3192,9 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
 }
 
   if (!async)
-{
-  run_kernel (kernel, ind_da, &kla, NULL, false);
-  gomp_offload_free (ind_da);
-}
+run_kernel (kernel, ind_da, &kla, NULL, false);
   else
-{
-  

RE: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-10 Thread Tamar Christina via Gcc-patches
> >> As Andrew has been advising on this one, I'd prefer for him to review it.
> >> However, he's on vacation this week.  FYI...
> >>
> >> Aldy
> >>
> >> On Mon, Mar 6, 2023 at 12:22 PM Tamar Christina wrote:
> >>> Ping.
> >>>
> >>> And updated the patch to reject cases that we don't expect or can
> >>> handle cleanly for now.
> >>
> It's OK by me... but I think a release manager has to sign off on it for this
> stage. Next stage 1 I will formalize the process a bit more for nonstandard
> rangeops.
> 

Thanks!

Richi is this change OK with you?

Thanks,
Tamar
> Andrew



Document/verify another aspect of OpenACC 'async' semantics in 'libgomp.oacc-c-c++-common/data-3.c'

2023-03-10 Thread Thomas Schwinge
Hi!

In order to document/verify one aspect of OpenACC 'async' semantics, I've
pushed to master branch commit 442d51a20ef13a8e6c080ca30bc37fc93b6bfac4
"Document/verify another aspect of OpenACC 'async' semantics in 
'libgomp.oacc-c-c++-common/data-3.c'",
see attached.


Grüße
 Thomas


>From 442d51a20ef13a8e6c080ca30bc37fc93b6bfac4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 24 Feb 2023 16:21:31 +0100
Subject: [PATCH] Document/verify another aspect of OpenACC 'async' semantics
 in 'libgomp.oacc-c-c++-common/data-3.c'

... that I almost broke with later implementation changes.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Document/verify
	another aspect of OpenACC 'async' semantics.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
index 5ec50b808a7..c422cbcd325 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
@@ -144,8 +144,8 @@ main (int argc, char **argv)
 
 #pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) copyout (c[0:N]) \
   copyout (d[0:N]) copyout (e[0:N]) wait (1, 2, 3, 4) async (1)
-#pragma acc exit data delete (N)
-#pragma acc wait (1)
+#pragma acc exit data delete (N) wait(1) async(2)
+#pragma acc wait (2)
 
   for (i = 0; i < N; i++)
 {
-- 
2.25.1



Re: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-10 Thread Richard Biener via Gcc-patches



> On 10.03.2023 at 15:12, Tamar Christina via Gcc-patches wrote:
> 
>>>> As Andrew has been advising on this one, I'd prefer for him to review it.
>>>> However, he's on vacation this week.  FYI...
>>>>
>>>> Aldy
>>>>
>>>> On Mon, Mar 6, 2023 at 12:22 PM Tamar Christina wrote:
>>>>> Ping.
>>>>>
>>>>> And updated the patch to reject cases that we don't expect or can
>>>>> handle cleanly for now.
>>>>
>> It's OK by me... but I think a release manager has to sign off on it for
>> this stage. Next stage 1 I will formalize the process a bit more for
>> nonstandard rangeops.
>> 
> 
> Thanks!
> 
> Richi is this change OK with you?

Yes.

Richard 

> Thanks,
> Tamar
>> Andrew
> 


OpenACC: Remove 'acc_async_test' -> skip shortcut in 'libgomp/oacc-async.c:goacc_wait'

2023-03-10 Thread Thomas Schwinge
Hi!

Pushed to master branch commit b5037d4a073f2e4625afab5ec1f35624d9f9eba1
"OpenACC: Remove 'acc_async_test' -> skip shortcut in 
'libgomp/oacc-async.c:goacc_wait'",
see attached.

Chung-Lin, in case you did "worry" ;-) -- no need to, this code dates
back way before your "async re-work".


Grüße
 Thomas


>From b5037d4a073f2e4625afab5ec1f35624d9f9eba1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 24 Feb 2023 16:17:57 +0100
Subject: [PATCH] OpenACC: Remove 'acc_async_test' -> skip shortcut in
 'libgomp/oacc-async.c:goacc_wait'

We're not taking such a shortcut anywhere else, and (with future changes) it
has potential to confuse things if synchronization in a libgomp plugin happens
to have side effects even if an async queue currently is empty.
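
As an illustration of that concern, the following sketch models a wait loop with and without the shortcut. All names here ('Queue', 'queue_is_idle', 'synchronize', 'wait_all') are hypothetical and not taken from libgomp; 'sync_calls' merely stands in for whatever side effect a plugin's synchronization might have.

```cpp
#include <vector>

// Hypothetical queue model: 'sync_calls' counts how often the plugin-level
// synchronization hook ran, which stands in for any side effect it may have.
struct Queue
{
  int pending = 0;
  int sync_calls = 0;
};

bool queue_is_idle(const Queue &q)
{
  return q.pending == 0;
}

void synchronize(Queue &q)
{
  ++q.sync_calls;
  q.pending = 0;
}

// With 'use_shortcut' set, idle queues are skipped, similar to the removed
// 'acc_async_test' check; without it, every queue is synchronized.
void wait_all(std::vector<Queue> &queues, bool use_shortcut)
{
  for (Queue &q : queues)
    {
      if (use_shortcut && queue_is_idle(q))
        continue;
      synchronize(q);
    }
}
```

In this model, a queue that is already idle never reaches 'synchronize' when the shortcut is enabled, which is the behaviour the commit removes.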

	libgomp/
	* oacc-async.c (goacc_wait): Remove 'acc_async_test' -> skip
	shortcut.
---
 libgomp/oacc-async.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 2562afbd753..82d00b64b50 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -410,9 +410,6 @@ goacc_wait (int async, int num_waits, va_list *ap)
 	  break;
 	}
 
-  if (acc_async_test (qid))
-	continue;
-
   if (async == acc_async_sync)
 	acc_wait (qid);
   else if (qid == async)
-- 
2.25.1



Simplify OpenACC 'no_create' clause implementation

2023-03-10 Thread Thomas Schwinge
Hi!

Pushed to master branch commit 199867d07be65cb0227a318ebf42b8376ca09313
"Simplify OpenACC 'no_create' clause implementation", see attached.


Grüße
 Thomas


>From 199867d07be65cb0227a318ebf42b8376ca09313 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 27 Feb 2023 12:02:02 +0100
Subject: [PATCH] Simplify OpenACC 'no_create' clause implementation

For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then
simplify the device plugins accordingly.
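
A rough sketch of the idea, as I read this description and the plugin-gcn.c hunk below: if the mapping step already yields the host address for an argument that is not present on the device, the plugin can consume one complete array and no longer needs a per-element host fallback. 'MapEntry' and 'resolve_args' are hypothetical names for illustration only, not libgomp interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-argument mapping record.
struct MapEntry
{
  bool present;             // true if the host object has a device copy
  std::uintptr_t dev_addr;  // meaningful only when 'present' is true
};

// Resolve the value handed to the kernel for each argument.  An entry that
// is not present keeps its host address, so the caller gets one complete
// array and needs no per-element host fallback.
std::vector<std::uintptr_t>
resolve_args(const std::vector<MapEntry> &entries,
             const std::vector<std::uintptr_t> &host_addrs)
{
  std::vector<std::uintptr_t> out(entries.size());
  for (std::size_t i = 0; i < entries.size(); ++i)
    out[i] = entries[i].present ? entries[i].dev_addr : host_addrs[i];
  return out;
}
```

The design point is that the decision is made once, in the common mapping code, rather than separately in each device plugin.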

This is a follow-up to
Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1)
"Add OpenACC 2.6's no_create",
Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090)
"Use gomp_map_val for OpenACC host-to-device address translation".

	libgomp/
	* target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for
	'GOMP_MAP_IF_PRESENT'.
	* plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec)
	(GOMP_OFFLOAD_openacc_async_exec): Adjust.
	* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
	(GOMP_OFFLOAD_openacc_async_exec): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async'
	testing.
	* testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 18 +--
 libgomp/plugin/plugin-nvptx.c | 19 ++--
 libgomp/target.c  |  2 +-
 .../libgomp.oacc-c-c++-common/no_create-1.c   | 30 +++
 .../libgomp.oacc-c-c++-common/no_create-2.c   | 12 +++-
 5 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 96920a48d2e..954a140ba5e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3064,7 +3064,7 @@ wait_queue (struct goacc_asyncqueue *aq)
 /* Execute an OpenACC kernel, synchronously or asynchronously.  */
 
 static void
-gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
+gcn_exec (struct kernel_info *kernel, size_t mapnum,
 	  void **devaddrs, unsigned *dims, void *targ_mem_desc, bool async,
 	  struct goacc_asyncqueue *aq)
 {
@@ -3077,9 +3077,7 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
   /* devaddrs must be double-indirect on the target.  */
   void **ind_da = alloc_by_agent (kernel->agent, sizeof (void*) * mapnum);
   for (size_t i = 0; i < mapnum; i++)
-hsa_fns.hsa_memory_copy_fn (&ind_da[i],
-devaddrs[i] ? &devaddrs[i] : &hostaddrs[i],
-sizeof (void *));
+hsa_fns.hsa_memory_copy_fn (&ind_da[i], &devaddrs[i], sizeof (void *));
 
   struct hsa_kernel_description *hsa_kernel_desc = NULL;
   for (unsigned i = 0; i < kernel->module->image_desc->kernel_count; i++)
@@ -3887,27 +3885,27 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
 
 void
 GOMP_OFFLOAD_openacc_exec (void (*fn_ptr) (void *), size_t mapnum,
-			   void **hostaddrs, void **devaddrs, unsigned *dims,
+			   void **hostaddrs __attribute__((unused)),
+			   void **devaddrs, unsigned *dims,
 			   void *targ_mem_desc)
 {
   struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
 
-  gcn_exec (kernel, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc, false,
-	NULL);
+  gcn_exec (kernel, mapnum, devaddrs, dims, targ_mem_desc, false, NULL);
 }
 
 /* Run an asynchronous OpenACC kernel on the specified queue.  */
 
 void
 GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *), size_t mapnum,
- void **hostaddrs, void **devaddrs,
+ void **hostaddrs __attribute__((unused)),
+ void **devaddrs,
  unsigned *dims, void *targ_mem_desc,
  struct goacc_asyncqueue *aq)
 {
   struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
 
-  gcn_exec (kernel, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc, true,
-	aq);
+  gcn_exec (kernel, mapnum, devaddrs, dims, targ_mem_desc, true, aq);
 }
 
 /* Create a new asynchronous thread and queue for running future kernels.  */
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1166807f68f..13e31156d36 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -742,8 +742,7 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 }
 
 static void
-nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
-	unsigned *dims, void *targ_mem_desc,
+nvptx_exec (void (*fn), size_t mapnum, unsigned *dims, void *targ_mem_desc,
 	CUdeviceptr dp, CUstream stream)
 {
   struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
@@ -1530,7 +1529,8 @@ GOMP_OFFLOAD_free (int ord, void *ptr)
 
 void
 GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
-			   void **hostaddrs, void **devaddrs,
+			  

Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data (was: [PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime)

2023-03-10 Thread Thomas Schwinge
Hi!

On 2021-07-27T12:01:18+0200, I wrote:
> On 2021-06-29T16:42:03-0700, Julian Brown  wrote:
>> This patch fixes several places in libgomp/target.c where "ephemeral" data
>> (on the stack or in temporary heap locations) may be used as the source of
>> an asynchronous host-to-device copy that may not complete before the host
>> data disappears.  Versions of the patch have been posted several times
>> before, but this one (at Chung-Lin Tang's prior suggestion, IIRC) moves
>> all logic into target.c rather than pushing it out to each target plugin.
>
> Thanks for the re-work!

>> +/* Copy host memory to an offload device.  In asynchronous mode (if AQ is
>> +   non-NULL), when the source data is stack or may otherwise be deallocated
>> +   before the asynchronous copy takes place, EPHEMERAL must be passed as
>> +   TRUE.  The CBUF isn't used for non-ephemeral asynchronous copies, because
>> +   the host data might not be computed yet (by an earlier asynchronous compute
>> +   region).  */
>> +
>>  [gomp_copy_host2dev]
>
> Code changes related to the latter sentence have moved into a separate
> "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to
> master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, [...]

Re this TODO comment:

> +   TODO ... but we could allow CBUF usage for EPHEMERAL data?  (Open question:
> +   is it more performant to use libgomp CBUF buffering or individual device
> +   asyncronous copying?)  */

Pushed to master branch commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
see attached.


Grüße
 Thomas


>From 2b2340e236c0bba8aaca358ea25a5accd8249fbd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 27 Feb 2023 16:41:17 +0100
Subject: [PATCH] Allow libgomp 'cbuf' buffering with OpenACC 'async' for
 'ephemeral' data

This does *allow*, but under no circumstances is this currently going to be
used: all potentially applicable data is non-'ephemeral', and thus not
considered for 'gomp_coalesce_buf_add' for OpenACC 'async'.  (But a use will
emerge later.)

Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe
"Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this
TODO comment:

TODO ... but we could allow CBUF usage for EPHEMERAL data?  (Open question:
is it more performant to use libgomp CBUF buffering or individual device
asyncronous copying?)

Ephemeral data is small, and therefore individual device asynchronous copying
does seem dubious -- in particular given that for all those, we'd individually
have to allocate and queue for deallocation a temporary buffer to capture the
ephemeral data.  Instead, just let the 'cbuf' *be* the temporary buffer.
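
The reasoning can be sketched as follows. This is a simplified model under my own assumptions, not the libgomp 'gomp_coalesce_buf' code: 'StagingBuffer', 'stage_ephemeral' and 'read_staged' are hypothetical names. The point it shows is that copying into the staging area at request time already captures short-lived host data, so no separate temporary buffer is needed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified coalescing buffer: host data is copied into one
// staging area at the time of the request and sent to the device later.
struct StagingBuffer
{
  std::vector<unsigned char> bytes;

  // Copy SZ bytes from H now; returns the offset inside the staging area.
  std::size_t add(const void *h, std::size_t sz)
  {
    std::size_t off = bytes.size();
    bytes.resize(off + sz);
    std::memcpy(bytes.data() + off, h, sz);
    return off;
  }
};

// Stage a value that lives only on this function's stack ("ephemeral").
std::size_t stage_ephemeral(StagingBuffer &buf, int value)
{
  int local = value;  // gone once this function returns
  return buf.add(&local, sizeof local);
}

// Read back a staged int, as a later transfer would see it.
int read_staged(const StagingBuffer &buf, std::size_t off)
{
  int out;
  std::memcpy(&out, buf.bytes.data() + off, sizeof out);
  return out;
}
```

After 'stage_ephemeral' returns, the stack variable is gone, but the staged bytes remain valid until the buffer itself is released.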

	libgomp/
	* target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow
	libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral'
	data.
---
 libgomp/target.c | 70 +---
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 0344f68a936..074caa6a4dc 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -310,10 +310,8 @@ struct gomp_coalesce_buf
 
This must not be used for asynchronous copies, because the host data might
not be computed yet (by an earlier asynchronous compute region, for
-   example).
-   TODO ... but we could allow CBUF usage for EPHEMERAL data?  (Open question:
-   is it more performant to use libgomp CBUF buffering or individual device
-   asyncronous copying?)  */
+   example).  The exception is for EPHEMERAL data, that we know is available
+   already "by construction".  */
 
 static inline void
 gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
@@ -377,30 +375,6 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
 		void *d, const void *h, size_t sz,
 		bool ephemeral, struct gomp_coalesce_buf *cbuf)
 {
-  if (__builtin_expect (aq != NULL, 0))
-{
-  /* See 'gomp_coalesce_buf_add'.  */
-  assert (!cbuf);
-
-  void *h_buf = (void *) h;
-  if (ephemeral)
-	{
-	  /* We're queueing up an asynchronous copy from data that may
-	 disappear before the transfer takes place (i.e. because it is a
-	 stack local in a function that is no longer executing).  Make a
-	 copy of the data into a temporary buffer in those cases.  */
-	  h_buf = gomp_malloc (sz);
-	  memcpy (h_buf, h, sz);
-	}
-  goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func,
-			   "dev", d, "host", h_buf, h, sz, aq);
-  if (ephemeral)
-	/* Free temporary buffer once the transfer has completed.  */
-	devicep->openacc.async.queue_callback_

[PATCH] gcc: Add deleted assignment operators to non-copyable types

2023-03-10 Thread Jonathan Wakely via Gcc-patches
Bootstrapped and regtested on powerpc64le-linux.

OK for trunk?

It's safe to do now rather than waiting for Stage 1, because if we were
actually relying on copy-assigning these types it would have failed to
compile with this change. So it has no functional change, but will help
prevent any future misuse of these types.

-- >8 --

The auto_timevar and auto_cond_timevar classes are supposed to be
non-copyable, but they have implicit assignment operators. Define their
assignment operators as deleted.

The auto_bitmap declares private copy/move constructors/assignments,
which can be replaced with deleted copies to get the same effect but
using more idiomatic C++11 style.
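
The language rule behind this can be checked with a small standalone example. 'guard_before' and 'guard_after' are hypothetical types, not the GCC classes; they only mirror the two states described above.

```cpp
#include <type_traits>

// Hypothetical RAII type in the "before" state: the copy constructor is
// deleted, but the copy assignment operator is still implicitly declared.
struct guard_before
{
  guard_before() = default;
  guard_before(const guard_before &) = delete;
};

// "After" state: both copy operations are deleted.  Declaring them also
// suppresses the implicit move operations.
struct guard_after
{
  guard_after() = default;
  guard_after(const guard_after &) = delete;
  guard_after &operator=(const guard_after &) = delete;
};

static_assert(!std::is_copy_constructible<guard_before>::value, "");
static_assert(std::is_copy_assignable<guard_before>::value, "");
static_assert(!std::is_copy_assignable<guard_after>::value, "");
static_assert(!std::is_move_constructible<guard_after>::value, "");
static_assert(!std::is_move_assignable<guard_after>::value, "");
```

The last two assertions show why deleting only the copy operations is enough: an rvalue argument falls back to the deleted copy operation, so moves are rejected as well.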

gcc/ChangeLog:

* bitmap.h (class auto_bitmap): Replace private-and-undefined
copy and move special member functions with deleted copies.
* timevar.h (class auto_timevar): Delete assignment operator.
(class auto_cond_timevar): Likewise.
---
 gcc/bitmap.h  | 11 ---
 gcc/timevar.h |  6 --
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 43337d2e9d9..ccb484651ab 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -945,7 +945,7 @@ bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no)
 /* A class that ties the lifetime of a bitmap to its scope.  */
 class auto_bitmap
 {
- public:
+public:
   auto_bitmap (ALONE_CXX_MEM_STAT_INFO)
 { bitmap_initialize (&m_bits, &bitmap_default_obstack PASS_MEM_STAT); }
   explicit auto_bitmap (bitmap_obstack *o CXX_MEM_STAT_INFO)
@@ -954,12 +954,9 @@ class auto_bitmap
   // Allow calling bitmap functions on our bitmap.
   operator bitmap () { return &m_bits; }
 
- private:
-  // Prevent making a copy that references our bitmap.
-  auto_bitmap (const auto_bitmap &);
-  auto_bitmap &operator = (const auto_bitmap &);
-  auto_bitmap (auto_bitmap &&);
-  auto_bitmap &operator = (auto_bitmap &&);
+  // Prevent shallow copies.
+  auto_bitmap (const auto_bitmap &) = delete;
+  auto_bitmap &operator = (const auto_bitmap &) = delete;
 
   bitmap_head m_bits;
 };
diff --git a/gcc/timevar.h b/gcc/timevar.h
index ad465731609..b2d13d44190 100644
--- a/gcc/timevar.h
+++ b/gcc/timevar.h
@@ -247,8 +247,9 @@ class auto_timevar
   m_timer->pop (m_tv);
   }
 
-  // Disallow copies.
+  // Prevent shallow copies.
   auto_timevar (const auto_timevar &) = delete;
+  auto_timevar &operator= (const auto_timevar &) = delete;
 
  private:
   timer *m_timer;
@@ -279,8 +280,9 @@ class auto_cond_timevar
   m_timer->cond_stop (m_tv);
   }
 
-  // Disallow copies.
+  // Prevent shallow copies.
   auto_cond_timevar (const auto_cond_timevar &) = delete;
+  auto_cond_timevar &operator= (const auto_cond_timevar &) = delete;
 
  private:
   void start()
-- 
2.39.2



Re: AArch64 bfloat16 mangling

2023-03-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 10, 2023 at 11:50:39AM +, Richard Sandiford wrote:
> > Will test it momentarily (including the patch it depends on):

Note, testing still pending, I'm testing in a Fedora scratch build
and that is quite slow (lto bootstrap and the like).

> A naive question:
> 
> > --- libgcc/config/aarch64/t-softfp.jj   2022-11-14 13:35:34.527155682 +0100
> > +++ libgcc/config/aarch64/t-softfp  2023-03-10 12:19:58.668882041 +0100
> > @@ -1,9 +1,10 @@
> >  softfp_float_modes := tf
> >  softfp_int_modes := si di ti
> > -softfp_extensions := sftf dftf hftf
> > -softfp_truncations := tfsf tfdf tfhf
> > +softfp_extensions := sftf dftf hftf bfsf
> > +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
> 
> Is bfsf used for conversions in which sf is the ultimate target,
> as opposed to operations that convert bf to sf and then do something
> with the sf?  And so the libfunc is needed to raise exceptions, which in
> more complex operations can be left to the following sf operation?
> 
> Do we still optimise to a shift for -ffinite-math-only?

Reminds me I should have added testcase coverage for PR107703, will post
it momentarily.

But, consider say:
template <typename T, typename F>
[[gnu::noipa]] T cvt (F f)
{
  return T (F (f));
}

void
foo ()
{
  cvt <_Float32, __bf16> (0.0bf16);
  cvt <_Float64, __bf16> (0.0bf16);
  cvt <_Float128, __bf16> (0.0bf16);
  cvt  (0.0bf16);
  cvt  (0.0bf16);
  cvt  (0.0bf16);
  cvt  (0.0bf16);
  cvt <__int128, __bf16> (0.0bf16);
}

This emits on x86_64 -O2:
/usr/src/gcc/obj/gcc/cc1plus -quiet -O2 .C; grep call.*__ .s
call    __extendbfsf2
call    __extendbfsf2
call    __extendbfsf2
call    __extendsftf2
call    __fixsfti
where the first call is in cvt <_Float32, __bf16> is really needed,
admittedly the second 2 calls could be replaced by shifts but aren't right
now (we expand BF -> DF as BF -> SF -> DF and because sNaN would be already
diagnosed on the SF -> DF conversion if BF -> SF is done with shift, I think
it would be ok; similarly for BF -> TF).  All the others (BF -> ?I) are
expanded as BF -> SF using shift and then SF -> ?I.  With -O2 -ffast-math
/usr/src/gcc/obj/gcc/cc1plus -quiet -O2 -ffast-math .C; grep call.*__ .s
call    __extendsftf2
call    __fixsfti
so all the BF -> SF conversions are then done using shifts.
And aarch64 is exactly the same:
./cc1plus -quiet -nostdinc -O2 .C; grep bl.*__[ef] .s
bl  __extendbfsf2
bl  __extendbfsf2
bl  __extendbfsf2
bl  __extendsftf2
bl  __fixsfti
./cc1plus -quiet -nostdinc -O2 -ffast-math .C; grep bl.*__[ef] .s
bl  __extendsftf2
bl  __fixsfti
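
For reference, the "shift" form of the BF -> SF conversion can be sketched in portable C++ on the raw bit pattern. This is an illustration only, not what the compiler emits; 'bf16_bits_to_float' is a hypothetical helper, and it assumes 'float' is IEEE binary32.

```cpp
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to float by placing it in the upper 16 bits
// of an IEEE binary32 pattern.  No floating-point exception is raised here.
float bf16_bits_to_float(std::uint16_t bits)
{
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  static_assert(sizeof out == sizeof wide, "float is assumed to be 32 bits");
  std::memcpy(&out, &wide, sizeof out);
  return out;
}
```

Because this is pure bit manipulation, a signaling NaN input passes through unchanged and no exception is raised, which, as I read the discussion above, is why the libcall is still used for a plain BF -> SF conversion outside of -ffast-math.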

> Assuming so, the patch LGTM.  I'm not familiar enough with softfloat
> to do a meaningful review of those parts, and I'm taking the versioning
> changes on faith. :)

The soft-fp new files (in both patches) are fairly mechanical:
for i in float{,un}{d,t}isf.c; do \
  sed 's/IEEE single/bfloat16/;s/single/brain/;s/SFtype/BFtype/;s/_S /_B /;s/sf /bf /' \
    $i > `echo $i | sed 's/sf.c/bf.c/'`
done
(well, I've created them by hand, so the Copyright lines differ, but
otherwise they are identical to what the above script would create).
So, there are no smarts in those, the soft-fp library already can handle
those formats.

Jakub



[PATCH] c++ testsuite: Add test for PR107703

2023-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

This is on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
and
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613724.html
patches (to be precise, the latter isn't essential for it), I've
realized that for the PR107703 bugfix in the first patch I haven't
added some test coverage that the extended floating vs. integral
or vice versa conversions work correctly.

This new testcase adds such checks.  And when writing it I've
found that in ext-floating.h header in the testsuite I forgot back
in November to remove #undef __STDCPP_BFLOAT16_T__ which was left
there because the bfloat16 support wasn't in yet.

The new testcase (and all older ext-floating*.C tests too) passes
on vanilla trunk without the ext-floating.h change (x86_64-linux
-m32/-m64) and with the PR107703 fix also with the ext-floating.h
change.

Ok for trunk?

2023-03-10  Jakub Jelinek  

PR target/107703
* g++.dg/cpp23/ext-floating.h (__STDCPP_BFLOAT16_T__): Don't undefine
it.
(std::bfloat16_t): Use decltype (0.0bf16) like libstdc++, rather than
__bf16.
* g++.dg/cpp23/ext-floating14.C: New test.

--- gcc/testsuite/g++.dg/cpp23/ext-floating.h.jj   2022-09-27 08:03:27.118982749 +0200
+++ gcc/testsuite/g++.dg/cpp23/ext-floating.h   2023-03-10 15:04:01.647824767 +0100
@@ -14,9 +14,8 @@ namespace std
   #ifdef __STDCPP_FLOAT128_T__
   using float128_t = _Float128;
   #endif
-  #undef __STDCPP_BFLOAT16_T__
   #ifdef __STDCPP_BFLOAT16_T__
-  using bfloat16_t = __bf16; // ???
+  using bfloat16_t = decltype (0.0bf16);
   #endif
   template struct integral_constant {
 static constexpr T value = v;
--- gcc/testsuite/g++.dg/cpp23/ext-floating14.C.jj  2023-03-10 14:12:17.658925358 +0100
+++ gcc/testsuite/g++.dg/cpp23/ext-floating14.C 2023-03-10 15:32:26.912057825 +0100
@@ -0,0 +1,585 @@
+// P1467R9 - Extended floating-point types and standard names.
+// PR target/107703
+// { dg-do run { target c++23 } }
+// { dg-options "-fexcess-precision=standard" }
+
+#include "ext-floating.h"
+
+#ifdef __SIZEOF_INT128__
+#define INT128_MAX ((signed __int128) ((~(unsigned __int128) 0) >> 1))
+#endif
+
+template 
+[[gnu::noipa]] T cvt (F f)
+{
+  return T (F (f));
+}
+
+int
+main ()
+{
+  // __FLT32_MAX_EXP__ is 128, so make sure all unsigned long long and unsigned __int128
+  // values fit into it.  __FLT16_MAX__ is 65504.0f16, so we need to be
+  // careful for that.
+#if __SIZEOF_LONG_LONG__ * __CHAR_BIT__ <= 128
+#if !defined(__SIZEOF_INT128__) || __SIZEOF_INT128__ * __CHAR_BIT__ == 128
+#ifdef __STDCPP_FLOAT16_T__
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (-42) != (std::float16_t) -42
+#if __SCHAR_MAX__ < 65504
+  || cvt  (__SCHAR_MAX__) != (std::float16_t) 
__SCHAR_MAX__
+  || cvt  (-__SCHAR_MAX__ - 1) != 
(std::float16_t) (-__SCHAR_MAX__ - 1)
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+#if __SCHAR_MAX__ * 2 + 1 < 65504
+  || cvt  ((unsigned char) ~0) != 
(std::float16_t) ((unsigned char) ~0)
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (-42) != (std::float16_t) -42
+#if __SHRT_MAX__ < 65504
+  || cvt  (__SHRT_MAX__) != (std::float16_t) 
__SHRT_MAX__
+  || cvt  (-__SHRT_MAX__ - 1) != 
(std::float16_t) (-__SHRT_MAX__ - 1)
+#else
+  || cvt  (65504) != (std::float16_t) 65504
+  || cvt  (-65504) != (std::float16_t) -65504
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (65504U) != (std::float16_t) 
65504U)
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (-42) != (std::float16_t) -42
+#if __INT_MAX__ < 65504
+  || cvt  (__INT_MAX__) != (std::float16_t) 
__INT_MAX__
+  || cvt  (-__INT_MAX__ - 1) != 
(std::float16_t) (-__INT_MAX__ - 1)
+#else
+  || cvt  (65504) != (std::float16_t) 65504
+  || cvt  (-65504) != (std::float16_t) -65504
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42U
+  || cvt  (65504U) != (std::float16_t) 
65504U)
+__builtin_abort ();
+  if (cvt  (42L) != (std::float16_t) 42L
+  || cvt  (-42L) != (std::float16_t) -42L
+  || cvt  (65504L) != (std::float16_t) 
65504L
+  || cvt  (-65504L) != (std::float16_t) 
-65504L)
+__builtin_abort ();
+  if (cvt  (42UL) != (std::float16_t) 42UL
+  || cvt  (65504UL) != (std::float16_t) 
65504UL)
+__builtin_abort ();
+  if (cvt  (42LL) != (std::float16_t) 
42LL
+  || cvt  (-42LL) != 
(std::float16_t) -42LL
+  || cvt  (65504LL) != 
(std::float16_t) 65504LL
+  || cvt  (-65504LL) != 
(std::float16_t) -65504LL)
+__builtin_abort ();
+  if (cvt  (42ULL) != (std::float16_t) 
42ULL
+  || cvt  (65504ULL) != 
(std::float16_t) 65504ULL)
+__builtin_abort ();
+#ifdef __SIZEOF_INT128__
+  if (cvt  (42LL) != (std::float16_t) (signed 
__int128) 42LL
+  || cvt  (-42LL) != (std::float16_t) 
(signed __int128) -42LL
+  || cvt

Re: [PATCH] gcc: Add deleted assignment operators to non-copyable types

2023-03-10 Thread Richard Biener via Gcc-patches



> On 10.03.2023 at 16:36, Jonathan Wakely via Gcc-patches wrote:
> 
> Bootstrapped and regtested on powerpc64le-linux.
> 
> OK for trunk?

Ok.

Thanks,
Richard 



Re: [PATCH v2 4/5] Update texinfo.tex, remove the @gol macro/alias

2023-03-10 Thread Gerald Pfeifer
On Fri, 10 Mar 2023, Arsen Arsenović wrote:
> Thanks, Thomas.  I'd be happy to undergo this process later today.  If I
> understood right, I should fill out
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi and name you, right?

Yes. 

(Thomas, you, and me actually could have met at FOSDEM. Next year we should
send a note to the gcc@ list and arrange for something?)

>>> Arsen, if that is indeed the case, I offer to push these two commits for
>>> you if you send them by e-mail (as two attachments).
> Thanks!  Either approach works for me :)

Happy to go the route Thomas suggested (though available to help, too).

Gerald


Re: [PATCH v2 4/5] Update texinfo.tex, remove the @gol macro/alias

2023-03-10 Thread Arsen Arsenović via Gcc-patches
Afternoon,

Sandra Loosemore  writes:

> On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:
>> The @gol macro appears to have existed as a workaround for a bug in old
>> versions of makeinfo and/or texinfo.tex, where they would, in some types
>> of output, fail to emit line breaks in @gccoptlists.  After updating
>> texinfo.tex, I noticed that this behavior appears to no longer be
>> exhibited, instead, both acted correctly and inserted newlines.  The
>> (groff) manual output also appears unaffected.
>> gcc/ChangeLog:
>>  * doc/include/texinfo.tex: Update to 2023-01-17.19.
>>  * doc/implement-c.texi: Remove usage of @gol.
>>  * doc/invoke.texi: Ditto.
>>  * doc/sourcebuild.texi: Ditto.
>>  * doc/include/gcc-common.texi: Remove @gol.  In new Makeinfo and
>>  texinfo.tex versions, the bug it was working around appears to
>>  be gone.
>> gcc/fortran/ChangeLog:
>>  * invoke.texi: Remove usages of @gol.
>>  * intrinsic.texi: Ditto.
>
> This is OK, but I'd like to see this patch split into two separate commits as
> well -- one for the texinfo.tex import, and one for the @gol changes.

The full log is below.  This branch is also, as usual, visible on my
fork at:

  https://git.sr.ht/~arsen/gcc texinfo_improvements

... and is visible in web form at:

  https://git.sr.ht/~arsen/gcc/log/texinfo_improvements

I rebased the patchset to today's revision again.

The log:

commit 63fcce9b7d7af55fd73024fa42cb44fde9063c3a
Author: Arsen Arsenović 
Date:   Thu Mar 9 21:44:29 2023 +0100

update_web_docs_git: Set CONTENTS_OUTPUT_LOCATION=inline

maintainer-scripts/ChangeLog:

* update_web_docs_git: Set CONTENTS_OUTPUT_LOCATION=inline in
order to put @shortcontents above contents. See
9dd976a4-4e09-d901-b949-6d5037567...@codesourcery.com on
gcc-patches.

commit 04c696ed31e623d003a0472f83d6a93c53541790
Author: Arsen Arsenović 
Date:   Tue Feb 28 11:40:56 2023 +0100

docs: Fix up new instances of index reordering

This commit fixes up an instance of the index entry mis-ordering that
occurred between the formulation and application of commit
r13-6310-gf33d7a88d069d1.

gcc/ChangeLog:

* doc/extend.texi: Associate use_hazard_barrier_return index
entry with its attribute.
* doc/invoke.texi: Associate -fcanon-prefix-map index entry with
its attribute

commit cfc401f06bf46fd98ee1272e0204155d635b3c42
Author: Arsen Arsenović 
Date:   Thu Jan 26 18:50:38 2023 +0100

update_web_docs_git: Update CSS reference to new manual CSS

maintainer-scripts/ChangeLog:

* update_web_docs_git (CSS): Update CSS reference to point to
/texinfo-manuals.css.

commit d3c953045f99523a726d1d2b2d25bc9e9352fdfd
Author: Arsen Arsenović 
Date:   Wed Jan 25 23:33:03 2023 +0100

doc: Remove the @gol macro/alias

The @gol macro appears to have existed as a workaround for a bug in old
versions of makeinfo and/or texinfo.tex, where they would, in some types
of output, fail to emit line breaks in @gccoptlists.  After updating
texinfo.tex, I noticed that this behavior appears to no longer be
exhibited, instead, both acted correctly and inserted newlines.  The
(groff) manual output also appears unaffected.

gcc/ChangeLog:

* doc/implement-c.texi: Remove usage of @gol.
* doc/invoke.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
* doc/include/gcc-common.texi: Remove @gol.  In new Makeinfo and
texinfo.tex versions, the bug it was working around appears to
be gone.

gcc/fortran/ChangeLog:

* invoke.texi: Remove usages of @gol.
* intrinsic.texi: Ditto.

commit e543f4acda2573331d02a99a779f8710abfb14c6
Author: Arsen Arsenović 
Date:   Fri Mar 10 16:21:33 2023 +0100

doc: Update texinfo.tex

gcc/ChangeLog:

* doc/include/texinfo.tex: Update to 2023-01-17.19.

commit 5f093996a453656755a0209271d01392d05b849e
Author: Arsen Arsenović 
Date:   Fri Mar 10 16:13:28 2023 +0100

docs: Add @defbuiltin family of helpers

The @defbuiltin{,x} macros are convenience macros for the often-repeated
task of defining a built-in function in extend.texi.  Usage of this
macro should lead to a higher degree of consistency across pieces of
text written by different people, and provide a better reading
experience, as they prevent easy-to-make errors, like forgetting index
entries for these functions.

gcc/ChangeLog:

* doc/include/gcc-common.texi: Add @defbuiltin{,x} and
@enddefbuiltin for defining built-in functions.
* doc/extend.texi: Apply @defbuiltin{,x} to many, but not all,
places where it should be used.

commit a867275185f2f2d5d654394a406ae28f16a4f26a
Author: Arsen Arsenović 
Date:   Fri Mar 10 16:08:19 2023 +0

Re: [PATCH v2 4/5] Update texinfo.tex, remove the @gol macro/alias

2023-03-10 Thread Arsen Arsenović via Gcc-patches
Hi Gerald,

Gerald Pfeifer  writes:

> On Fri, 10 Mar 2023, Arsen Arsenović wrote:
>> Thanks, Thomas.  I'd be happy to undergo this process later today.  If I
>> understood right, I should fill out
>> https://sourceware.org/cgi-bin/pdw/ps_form.cgi and name you, right?
>
> Yes. 

Thanks, done.  Exciting :-)

> (Thomas, you, and I actually could have met at FOSDEM. Next year we should
> send a note to the gcc@ list and arrange for something?)
>
>>>> Arsen, if that is indeed the case, I offer to push these two commits for
>>>> you if you send them by e-mail (as two attachments).
>> Thanks!  Either approach works for me :)
>
> Happy to go the route Thomas suggested (though available to help, too).

Thank you!  I'll keep you posted.  For now, let's get a final review of
the remaining bits and I'll try pushing myself should that become
possible.

> Gerald

Have a great day :-)
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


[PATCH] c++: ICE with constexpr lambda [PR107280]

2023-03-10 Thread Marek Polacek via Gcc-patches
We crash here since r10-3661, the store_init_value hunk in particular.
Before, we called cp_fully_fold_init, so e.g.

  {.str=VIEW_CONVERT_EXPR("")}

was folded into

  {.str=""}

but now we don't fold and keep the VCE around, and it causes trouble in
cxx_eval_store_expression: in the !refs->is_empty () loop we descend on
.str's initializer but since it's wrapped in a VCE, we skip the STRING_CST
check and then crash on the CONSTRUCTOR_NO_CLEARING.
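
To illustrate the shape of the fix (strip the wrapper first, then test the code), here is a toy sketch. The node type, the kind enum and both helpers are hypothetical and far simpler than GCC's tree representation; in particular the real STRIP_ANY_LOCATION_WRAPPER only removes location wrappers, whereas this toy peels every view_convert node.

```cpp
#include <cassert>

// Hypothetical node kinds; GCC's real tree codes are far richer.
enum class kind { string_cst, view_convert, constructor };

struct node
{
  kind code;
  node *operand;   // wrapped node for view_convert, otherwise null
};

// Peel any wrappers so the caller sees the underlying node.
inline node *
strip_wrappers (node *n)
{
  while (n->code == kind::view_convert)
    {
      assert (n->operand != nullptr);
      n = n->operand;
    }
  return n;
}

// Same shape as the patched condition: a comma expression that strips
// in place and then tests the code of what is left.
inline bool
is_string_after_strip (node *&n)
{
  return (n = strip_wrappers (n), n->code == kind::string_cst);
}
```

Without the stripping step the check sees the wrapper's code and falls through, which is the situation described above.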

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR c++/107280

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_store_expression): Strip location wrappers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-lambda28.C: New test.
---
 gcc/cp/constexpr.cc |  3 ++-
 gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C | 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8683c00596a..abf6ee560c5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -6033,7 +6033,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
  *valp = build_constructor (type, NULL);
  CONSTRUCTOR_NO_CLEARING (*valp) = no_zero_init;
}
-  else if (TREE_CODE (*valp) == STRING_CST)
+  else if (STRIP_ANY_LOCATION_WRAPPER (*valp),
+  TREE_CODE (*valp) == STRING_CST)
{
  /* An array was initialized with a string constant, and now
 we're writing into one of its elements.  Explode the
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C
new file mode 100644
index 000..aafbfddd8b9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C
@@ -0,0 +1,15 @@
+// PR c++/107280
+// { dg-do compile { target c++17 } }
+
+struct string {
+  char str[8] = "";
+};
+template  constexpr void
+test ()
+{
+  string str{};
+  auto append = [&](const char *s) { *str.str = *s; };
+  append("");
+}
+
+static_assert ((test(), true), "");

base-commit: 2b2340e236c0bba8aaca358ea25a5accd8249fbd
-- 
2.39.2



Re: AArch64 bfloat16 mangling

2023-03-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Fri, Mar 10, 2023 at 11:50:39AM +, Richard Sandiford wrote:
>> > Will test it momentarily (including the patch it depends on):
>
> Note, testing still pending, I'm testing in a Fedora scratch build
> and that is quite slow (lto bootstrap and the like).
>
>> A naive question:
>> 
>> > --- libgcc/config/aarch64/t-softfp.jj  2022-11-14 13:35:34.527155682 +0100
>> > +++ libgcc/config/aarch64/t-softfp 2023-03-10 12:19:58.668882041 +0100
>> > @@ -1,9 +1,10 @@
>> >  softfp_float_modes := tf
>> >  softfp_int_modes := si di ti
>> > -softfp_extensions := sftf dftf hftf
>> > -softfp_truncations := tfsf tfdf tfhf
>> > +softfp_extensions := sftf dftf hftf bfsf
>> > +softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
>> 
>> Is bfsf used for conversions in which sf is the ultimate target,
>> as opposed to operations that convert bf to sf and then do something
>> with the sf?  And so the libfunc is needed to raise exceptions, which in
>> more complex operations can be left to the following sf operation?
>> 
>> Do we still optimise to a shift for -ffinite-math-only?
>
> Reminds me I should have added testcase coverage for PR107703, will post
> it momentarily.
>
> But, consider say:
> template 
> [[gnu::noipa]] T cvt (F f)
> {
>   return T (F (f));
> }
>
> void
> foo ()
> {
>   cvt <_Float32, __bf16> (0.0bf16);
>   cvt <_Float64, __bf16> (0.0bf16);
>   cvt <_Float128, __bf16> (0.0bf16);
>   cvt  (0.0bf16);
>   cvt  (0.0bf16);
>   cvt  (0.0bf16);
>   cvt  (0.0bf16);
>   cvt <__int128, __bf16> (0.0bf16);
> }
>
> This emits on x86_64 -O2:
> /usr/src/gcc/obj/gcc/cc1plus -quiet -O2 .C; grep call.*__ .s
>   call  __extendbfsf2
>   call  __extendbfsf2
>   call  __extendbfsf2
>   call  __extendsftf2
>   call  __fixsfti
> where the first call, in cvt <_Float32, __bf16>, is really needed,
> admittedly the second 2 calls could be replaced by shifts but aren't right
> now (we expand BF -> DF as BF -> SF -> DF and because sNaN would be already
> diagnosed on the SF -> DF conversion if BF -> SF is done with shift, I think
> it would be ok; similarly for BF -> TF).  All the others (BF -> ?I) are
> expanded as BF -> SF using shift and then SF -> ?I.  With -O2 -ffast-math
> /usr/src/gcc/obj/gcc/cc1plus -quiet -O2 -ffast-math .C; grep call.*__ .s
>   call  __extendsftf2
>   call  __fixsfti
> so all the BF -> SF conversions are then done using shifts.
> And aarch64 is exactly the same:
> ./cc1plus -quiet -nostdinc -O2 .C; grep bl.*__[ef] .s
>   bl  __extendbfsf2
>   bl  __extendbfsf2
>   bl  __extendbfsf2
>   bl  __extendsftf2
>   bl  __fixsfti
> ./cc1plus -quiet -nostdinc -O2 -ffast-math .C; grep bl.*__[ef] .s
>   bl  __extendsftf2
>   bl  __fixsfti

Thanks, sounds good.  In some ways it's ironic that, in a bf->df
conversion, it's the bf->sf that needs a call, and the sf->df can
be done inline, given that one of the purposes of bf16 was to provide
cheap conversions to float.  And similarly that bf->sf is more expensive
than sf->df.  But that's not the patch's fault.
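
For reference, the "shift" form of bf->sf mentioned above can be sketched as follows. This is only an illustration of the bit-level widening, assuming a 32-bit IEEE float; bf16_bits_to_float is a hypothetical helper name, not a libgcc entry point. As discussed in the thread, a plain shift like this does not raise an exception for a signalling NaN, which is why the libcall is still used outside -ffast-math.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to float by placing it in the top 16 bits
// of a 32-bit word.  The low 16 mantissa bits are zero-filled.
inline float
bf16_bits_to_float (std::uint16_t bits)
{
  std::uint32_t wide = static_cast<std::uint32_t> (bits) << 16;
  assert ((wide & 0xffffu) == 0);

  float result;
  static_assert (sizeof result == sizeof wide, "float must be 32 bits");
  std::memcpy (&result, &wide, sizeof result);
  return result;
}
```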

Rather than have an out-of-line call, would it be possible to synthesise
the checking inline by making bf->sf do a following sf->df conversion,
even when the df result is not used?  It would obviously need to be kept
alive somehow (not sure how).

Richard


[pushed] analyzer: fix leak false +ve seen in haproxy's cfgparse.c [PR109059]

2023-03-10 Thread David Malcolm via Gcc-patches
If a bound region gets overwritten with UNKNOWN due to being
possibly-aliased during a write, that could have been the only
region keeping its value live, in which case we could falsely report
a leak.  This is hidden somewhat by the "uncertainty" mechanism for
cases where the write happens in the same stmt as the last reference
to the value goes away, but not in the general case, which occurs
in PR analyzer/109059, which falsely complains about a leak whilst
haproxy updates a doubly-linked list.
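
As an illustration of the kind of code involved, here is a hypothetical doubly-linked list append. It is neither the haproxy source nor the new testcase, just a sketch of the pattern: once list_append_new returns, the only references to the allocation live in fields reached through possibly-aliased pointers.

```cpp
#include <cassert>
#include <cstdlib>

struct list_node
{
  list_node *next;
  list_node *prev;
};

// Initialise an empty circular list: the head points at itself.
inline void
list_init (list_node *head)
{
  head->next = head;
  head->prev = head;
}

// Append a freshly allocated node.  The writes through head->prev may
// alias other regions as far as a static analyzer can tell.
inline list_node *
list_append_new (list_node *head)
{
  assert (head && head->prev);
  list_node *n = static_cast<list_node *> (std::malloc (sizeof (list_node)));
  if (!n)
    return nullptr;
  n->next = head;
  n->prev = head->prev;
  head->prev->next = n;
  head->prev = n;
  return n;
}

// Free every node except the head and reset the list to empty.
inline void
list_free_all (list_node *head)
{
  list_node *cur = head->next;
  while (cur != head)
    {
      list_node *next = cur->next;
      std::free (cur);
      cur = next;
    }
  list_init (head);
}
```

The allocation is not leaked here, since it stays reachable from the list head, but proving that requires tracking values through writes that may overwrite other bindings.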

The whole "uncertainty_t" class seems broken to me now; I think we need
to track (in the store) what values could have escaped to the external
part of the program.  We do this to some extent for pointers by tracking
the region as escaped, though we're failing to do this for this case:
even though there could still be other pointers to the region,
eventually they go away; we want to capture the fact that the external
part of the state is still keeping it live.  Also, this doesn't work for
non-pointer svalues, such as for detecting file-descriptor leaks.

As both a workaround and a step towards eventually removing
"class uncertainty_t" this patch updates the "mark_region_as_unknown"
code called by possibly-aliased set_value so that when old values are
removed, any base region pointed to by them is marked as escaped, fixing
the leak false positive.

The patch has this effect on my integration tests of -fanalyzer:

  Comparison: 
GOOD: 129(19.20% -> 20.22%)
 BAD: 543 -> 509 (-34)

where there's a big improvement in -Wanalyzer-malloc-leak:

  -Wanalyzer-malloc-leak: 
GOOD: 61   (45.19% -> 54.95%)
 BAD: 74 -> 50 (-24)
 Known false positives: 25 -> 2 (-23)
   haproxy-2.7.1: 24 ->  1 (-23)
 Suspected false positives: 49 -> 48 (-1)
   coreutils-9.1: 32 -> 31 (-1)

and some churn in the other warnings:

  -Wanalyzer-use-of-uninitialized-value:
 GOOD: 0
  BAD: 81 -> 80 (-1)
  -Wanalyzer-file-leak:
 GOOD: 0
  BAD: 10 -> 11 (+1)
  -Wanalyzer-out-of-bounds:
 GOOD: 0
  BAD: 24 -> 22 (-2)

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-6589-g14f5e56a8a766c.

gcc/analyzer/ChangeLog:
PR analyzer/109059
* region-model.cc (region_model::mark_region_as_unknown): Gather a
set of maybe-live svalues and call on_maybe_live_values with it.
* store.cc (binding_map::remove_overlapping_bindings): Add new
"maybe_live_values" param; add any removed svalues to it.
(binding_cluster::clobber_region): Add NULL as new param of
remove_overlapping_bindings.
(binding_cluster::mark_region_as_unknown): Add "maybe_live_values"
param and pass it to remove_overlapping_bindings.
(binding_cluster::maybe_get_compound_binding): Add NULL for new
param of binding_map::remove_overlapping_bindings.
(binding_cluster::remove_overlapping_bindings): Add
"maybe_live_values" param and pass to
binding_map::remove_overlapping_bindings.
(store::set_value): Capture a set of maybe-live svalues, and call
on_maybe_live_values with it.
(store::on_maybe_live_values): New.
(store::mark_region_as_unknown): Add "maybe_live_values" param
and pass it to binding_cluster::mark_region_as_unknown.
(store::remove_overlapping_bindings): Pass NULL for new param of
binding_cluster::remove_overlapping_bindings.
* store.h (binding_map::remove_overlapping_bindings): Add
"maybe_live_values" param.
(binding_cluster::mark_region_as_unknown): Likewise.
(binding_cluster::remove_overlapping_bindings): Likewise.
(store::mark_region_as_unknown): Likewise.
(store::on_maybe_live_values): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/109059
* gcc.dg/analyzer/flex-with-call-summaries.c: Remove xfail.
* gcc.dg/analyzer/leak-pr109059-1.c: New test.
* gcc.dg/analyzer/leak-pr109059-2.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  |  4 +-
 gcc/analyzer/store.cc | 70 +++
 gcc/analyzer/store.h  | 11 ++-
 .../analyzer/flex-with-call-summaries.c   |  3 +-
 .../gcc.dg/analyzer/leak-pr109059-1.c | 46 
 .../gcc.dg/analyzer/leak-pr109059-2.c | 42 +++
 6 files changed, 158 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/leak-pr109059-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/leak-pr109059-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index bf07cec2884..56beaa82f95 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3296,8 +3296,10 @@ void
 region_model::mark_region_as_unknown (const region *reg,
  uncertainty_t *uncertainty)
 {
+  svalue_set maybe_live_values;
   m_store.mark_region_as

Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]

2023-03-10 Thread Thomas Schwinge
Hi!

Pushed to master branch commit f8332e52a498df480f72303de32ad0751ad899fe
"Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]",
see attached.

libgomp/oacc-parallel.c|  13 +-
libgomp/plugin/plugin-gcn.c|  47 ++-
libgomp/plugin/plugin-nvptx.c  | 154 ++---
libgomp/target.c   |  10 +-
.../acc_prof-parallel-1.c  |  58 ++--
5 files changed, 44 insertions(+), 238 deletions(-)

I like it.  :-)


Grüße
 Thomas


>From f8332e52a498df480f72303de32ad0751ad899fe Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 27 Feb 2023 15:56:18 +0100
Subject: [PATCH] Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs
 [PR90596]

Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.

This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.

	PR libgomp/90596
	libgomp/
	* target.c (gomp_map_vars_internal): Allow for
	'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'.
	* oacc-parallel.c (GOACC_parallel_keyed): Pass
	'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'.
	* plugin/plugin-gcn.c (alloc_by_agent, gcn_exec)
	(GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec):
	Adjust, simplify.
	(gomp_offload_free): Remove.
	* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
	(GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify.
	(cuda_free_argmem): Remove.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Adjust.
---
 libgomp/oacc-parallel.c   |  13 +-
 libgomp/plugin/plugin-gcn.c   |  47 +-
 libgomp/plugin/plugin-nvptx.c | 154 ++
 libgomp/target.c  |  10 +-
 .../acc_prof-parallel-1.c |  58 ++-
 5 files changed, 44 insertions(+), 238 deletions(-)

diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 687edf898fc..363e6656982 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -108,8 +108,6 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
   va_list ap;
   struct goacc_thread *thr;
   struct gomp_device_descr *acc_dev;
-  struct target_mem_desc *tgt;
-  void **devaddrs;
   unsigned int i;
   struct splay_tree_key_s k;
   splay_tree_key tgt_fn_key;
@@ -290,8 +288,10 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
 
   goacc_aq aq = get_goacc_asyncqueue (async);
 
-  tgt = goacc_map_vars (acc_dev, aq, mapnum, hostaddrs, NULL, sizes, kinds,
-			true, 0);
+  struct target_mem_desc *tgt
+= goacc_map_vars (acc_dev, aq, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		  GOMP_MAP_VARS_TARGET);
+
   if (profiling_p)
 {
   prof_info.event_type = acc_ev_enter_data_end;
@@ -301,10 +301,7 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
 &api_info);
 }
 
-  devaddrs = gomp_alloca (sizeof (void *) * mapnum);
-  for (i = 0; i < mapnum; i++)
-devaddrs[i] = (void *) gomp_map_val (tgt, hostaddrs, i);
-
+  void **devaddrs = (void **) tgt->tgt_start;
   if (aq == NULL)
 acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims,
 tgt);
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 954a140ba5e..347803762eb 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1833,13 +1833,6 @@ alloc_by_agent (struct agent_info *agent, size_t size)
 {
   GCN_DEBUG ("Allocating %zu bytes on device %d\n", size, agent->device_id);
 
-  /* Zero-size allocations are invalid, so in order to return a valid pointer
- we need to pass a valid size.  One source of zero-size allocations is
- kernargs for kernels that have no inputs or outputs (the kernel may
- only use console output, for example).  */
-  if (size == 0)
-size = 4;
-
   void *ptr;
   hsa_status_t status = hsa_fns.hsa_memory_allocate_fn (agent->data_region,
 			size, &ptr);
@@ -2989,15 +2982,6 @@ copy_data (void *data_)
   free (data);
 }
 
-/* Free device data.  This is intended for use as an async callback event.  */
-
-static void
-gomp_offload_free (void *ptr)
-{
-  GCN_DEBUG ("Async thread ?:?: Freeing %p\n", ptr);
-  GOMP_OFFLOAD_free (0, ptr);
-}
-
 /* Request an asynchronous data copy, to or from a device, on a given queue.
The event will be registered as a callback.  */
 
@@ -3064,7 +3048,7 @@

Re: [PATCH 1/2] ipa-cp: Fix various issues in update_specialized_profile (PR 107925)

2023-03-10 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> the patch below fixes various issues in function
> update_specialized_profile.  The main is removal of the assert which
> is bogus in the case of recursive cloning.  The division of
> unexplained counts is guesswork, which then leads to updates of counts
> of recursive edges, which then can be redirected to the new clone and
> their count subtracted from the count and there simply may not be
> enough left in the count of the original node - especially when we
> clone a lot because of using --param ipa-cp-eval-threshold=1.
> 
> The other issue was omission to drop the count of the original node to
> ipa count.  And when calculating the remainder, we should use
> lenient_count_portion_handling to account for partial train runs.
> Finally, the patch adds dumping of the original count which I think
> is useful.
> 
> Profiled-LTO-bootstrapped on its own and also normally bootstrapped and
> tested together with the subsequent patch on an x86_64-linux.  OK for
> master and the 12 branch - assuming it is also affected?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2023-02-17  Martin Jambor  
> 
>   PR ipa/107925
>   * ipa-cp.cc (update_specialized_profile): Drop orig_node_count to
>   ipa count, remove assert, lenient_count_portion_handling, dump
>   also orig_node_count.

OK,
thanks!
Honza


Re: [PATCH 2/2] ipa-cp: Improve updating behavior when profile counts have gone bad

2023-03-10 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> Looking into the behavior of profile count updating in PR 107925, I
> noticed that an option not considered possible was actually happening,
> and - with the guesswork in place to distribute unexplained counts -
> it simply can happen.  Currently it is handled by dropping the counts
> to local estimated zero, whereas it is probably better to leave the
> count as they are but drop the category to GUESSED_GLOBAL0 - which is
> what profile_count::combine_with_ipa_count in a similar case (or so I
> hope :-)
> 
> Profiled-LTO-bootstrapped and normally bootstrapped and tested on an
> x86_64-linux.  OK for master once stage1 opens up?  Or perhaps even now?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2023-02-20  Martin Jambor  
> 
>   PR ipa/107925
>   * ipa-cp.cc (update_profiling_info): Drop counts of orig_node to
>   global0 instead of zeroing when it does not have as many counts as
>   it should.

OK,
thanks!
Honza
> ---
>  gcc/ipa-cp.cc | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 5a6b41cf2d6..6477bb840e5 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -4969,10 +4969,20 @@ update_profiling_info (struct cgraph_node *orig_node,
> false);
>new_sum = stats.count_sum;
>  
> +  bool orig_edges_processed = false;
>if (new_sum > orig_node_count)
>  {
> -  /* TODO: Perhaps this should be gcc_unreachable ()?  */
> -  remainder = profile_count::zero ().guessed_local ();
> +  /* TODO: Profile has already gone astray, keep what we have but lower it
> +  to global0 category.  */
> +  remainder = orig_node->count.global0 ();
> +
> +  for (cgraph_edge *cs = orig_node->callees; cs; cs = cs->next_callee)
> + cs->count = cs->count.global0 ();
> +  for (cgraph_edge *cs = orig_node->indirect_calls;
> +cs;
> +cs = cs->next_callee)
> + cs->count = cs->count.global0 ();
> +  orig_edges_processed = true;
>  }
>else if (stats.rec_count_sum.nonzero_p ())
>  {
> @@ -5070,11 +5080,16 @@ update_profiling_info (struct cgraph_node *orig_node,
>for (cgraph_edge *cs = new_node->indirect_calls; cs; cs = cs->next_callee)
>  cs->count = cs->count.apply_scale (new_sum, orig_new_node_count);
>  
> -  profile_count::adjust_for_ipa_scaling (&remainder, &orig_node_count);
> -  for (cgraph_edge *cs = orig_node->callees; cs; cs = cs->next_callee)
> -cs->count = cs->count.apply_scale (remainder, orig_node_count);
> -  for (cgraph_edge *cs = orig_node->indirect_calls; cs; cs = cs->next_callee)
> -cs->count = cs->count.apply_scale (remainder, orig_node_count);
> +  if (!orig_edges_processed)
> +{
> +  profile_count::adjust_for_ipa_scaling (&remainder, &orig_node_count);
> +  for (cgraph_edge *cs = orig_node->callees; cs; cs = cs->next_callee)
> + cs->count = cs->count.apply_scale (remainder, orig_node_count);
> +  for (cgraph_edge *cs = orig_node->indirect_calls;
> +cs;
> +cs = cs->next_callee)
> + cs->count = cs->count.apply_scale (remainder, orig_node_count);
> +}
>  
>if (dump_file)
>  {
> -- 
> 2.39.1
> 
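
The fallback in the quoted hunk can be modelled with a toy count type. Everything here is hypothetical (toy_count, the two-value quality enum, remainder_for_original) and much simpler than GCC's profile_count; the ordinary branch in particular ignores the recursive-count handling of the real function. It only shows the idea of keeping the number while lowering how much it is trusted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-level quality; the real profile_count has more levels.
enum class quality { guessed_global0, precise };

struct toy_count
{
  std::uint64_t value;
  quality q;
};

// Keep the value but mark it as a local guess.
inline toy_count
to_global0 (toy_count c)
{
  return { c.value, quality::guessed_global0 };
}

// Count left for the original node after clones took NEW_SUM.
inline toy_count
remainder_for_original (toy_count orig, std::uint64_t new_sum)
{
  if (new_sum > orig.value)
    // The profile is already inconsistent: do not zero it, downgrade it.
    return to_global0 (orig);

  assert (new_sum <= orig.value);
  return { orig.value - new_sum, orig.q };
}
```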


[pushed] wwwdocs: gcc-13: Escape < and > as &lt; and &gt;

2023-03-10 Thread Gerald Pfeifer
Note that in HTML < and > have a special meaning, so we cannot simply
write "<* noreturn *>", but need to escape it as "&lt;* noreturn *&gt;".
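
As a small aside, the escaping rule can be sketched in code. html_escape is a hypothetical helper written for this note, not part of the wwwdocs tooling; it also escapes &, which has to be handled alongside < and > in HTML text.

```cpp
#include <cassert>
#include <string>

// Escape the characters that have special meaning in HTML text content.
inline std::string
html_escape (const std::string &in)
{
  std::string out;
  out.reserve (in.size ());
  for (char c : in)
    switch (c)
      {
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '&': out += "&amp;"; break;
      default:  out += c; break;
      }
  // Escaping only ever replaces one character with several.
  assert (out.size () >= in.size ());
  return out;
}
```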

Pushed.

Gerald

---
 htdocs/gcc-13/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 17be7e7c..209c13cd 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -383,7 +383,7 @@ a work-in-progress.
 support for the ISO/IEC 10514-1, PIM2, PIM3, PIM4 dialects
 together with a complete set of ISO/IEC 10514-1 and PIM
 libraries.
-  The <* noreturn *> attribute is supported
+  The &lt;* noreturn *&gt; attribute is supported
 with the -Wreturn-type
 https://gcc.gnu.org/onlinedocs/m2/Compiler-options.html";>
   option.
-- 
2.39.2


[COMMITTED] Fix PR 108874: aarch64 code regression with shift and ands

2023-03-10 Thread Andrew Pinski via Gcc-patches
After r6-2044-g98e30e515f184b, code like "((x & 0xff00ff00U) >> 8)"
is optimized to "(x >> 8) & 0xff00ffU".  That is normally better, but
on aarch64 the right shift could, in some cases, have been combined
with another operation.  So we need to add a few define_splits to the
aarch64 backend that match "((x >> shift) & CST0) OP Y" and split it
into:
TMP = X & CST1
(TMP >> shift) OP Y
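
A quick way to convince oneself that the two forms agree is the sketch below. The function names are made up for this note; CST1 is taken to be CST0 << shift, following how the split computes operands[6] in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Form produced by the generic optimization: shift first, then mask (CST0).
inline std::uint32_t
shift_then_mask (std::uint32_t x, unsigned shift, std::uint32_t cst0)
{
  assert (shift < 32);
  return (x >> shift) & cst0;
}

// Form the split recreates: mask with CST1 = CST0 << shift, then shift.
inline std::uint32_t
mask_then_shift (std::uint32_t x, unsigned shift, std::uint32_t cst0)
{
  assert (shift < 32);
  std::uint32_t cst1 = cst0 << shift;
  return (x & cst1) >> shift;
}
```

Any bits of CST0 that fall off the top when forming CST1 correspond to result bits that the right shift has already cleared, so both functions return the same value for every input.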

Note this also gets us back to matching rev16, so I added a
testcase to make sure we don't lose that matching any more.
Note that when the generic patch to recognize those as bswap ROT 16
lands, we might regress again and need to add a few more patterns to
the aarch64 backend, but we will deal with that once that happens.
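
For readers unfamiliar with rev16, here is a sketch of what the mask-and-shift idiom computes, checked against a straightforward per-halfword byte swap. Both function names are hypothetical; the first mirrors the expression used in the new rev16_2.c test.

```cpp
#include <cassert>
#include <cstdint>

// Mask-and-shift form, mirroring the expression in the rev16_2.c testcase.
inline std::uint32_t
rev16_masks (std::uint32_t x)
{
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}

// Reference: swap the two bytes of each 16-bit halfword separately.
inline std::uint32_t
rev16_reference (std::uint32_t x)
{
  std::uint32_t result = 0;
  for (unsigned half = 0; half < 2; ++half)
    {
      std::uint32_t h = (x >> (16 * half)) & 0xffffu;
      std::uint32_t swapped = ((h & 0xffu) << 8) | (h >> 8);
      assert (swapped <= 0xffffu);
      result |= swapped << (16 * half);
    }
  return result;
}
```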

Committed as approved after a bootstrap/test on aarch64-linux-gnu with no
regressions.

gcc/ChangeLog:

* config/aarch64/aarch64.md: Add a new define_split
to help combine.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/rev16_2.c: New test.
* gcc.target/aarch64/shift_and_operator-1.c: New test.
---
 gcc/config/aarch64/aarch64.md | 23 +++
 gcc/testsuite/gcc.target/aarch64/rev16_2.c| 39 +++
 .../gcc.target/aarch64/shift_and_operator-1.c | 22 +++
 3 files changed, 84 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rev16_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index af9087508ac..022eef80bc1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4656,6 +4656,29 @@ (define_insn "*_3"
   [(set_attr "type" "logic_shift_imm")]
 )
 
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+   (LOGICAL_OR_PLUS:GPI
+ (and:GPI
+   (lshiftrt:GPI (match_operand:GPI 1 "register_operand")
+ (match_operand:QI 2 "aarch64_shift_imm_"))
+   (match_operand:GPI 3 "aarch64_logical_immediate"))
+ (match_operand:GPI 4 "register_operand")))]
+  "can_create_pseudo_p ()
+   && aarch64_bitmask_imm (UINTVAL (operands[3]) << UINTVAL (operands[2]),
+  mode)"
+  [(set (match_dup 5) (and:GPI (match_dup 1) (match_dup 6)))
+   (set (match_dup 0) (LOGICAL_OR_PLUS:GPI
+  (lshiftrt:GPI (match_dup 5) (match_dup 2))
+   (match_dup 4)))]
+  {
+operands[5] = gen_reg_rtx (mode);
+operands[6]
+  = gen_int_mode (UINTVAL (operands[3]) << UINTVAL (operands[2]),
+ mode);
+  }
+)
+
 (define_split
   [(set (match_operand:GPI 0 "register_operand")
(LOGICAL_OR_PLUS:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/rev16_2.c b/gcc/testsuite/gcc.target/aarch64/rev16_2.c
new file mode 100644
index 000..621eb5dfbf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/rev16_2.c
@@ -0,0 +1,39 @@
+/* { dg-options "-O2" } */
+/* { dg-do compile } */
+
+extern void abort (void);
+
+typedef unsigned int __u32;
+
+__u32
+__rev16_32_alt (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
+ | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
+}
+
+__u32
+__rev16_32 (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
+ | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
+}
+
+typedef unsigned long long __u64;
+
+__u64
+__rev16_64_alt (__u64 x)
+{
+  return (((__u64)(x) & (__u64)0xff00ff00ff00ff00UL) >> 8)
+ | (((__u64)(x) & (__u64)0x00ff00ff00ff00ffUL) << 8);
+}
+
+__u64
+__rev16_64 (__u64 x)
+{
+  return (((__u64)(x) & (__u64)0x00ff00ff00ff00ffUL) << 8)
+ | (((__u64)(x) & (__u64)0xff00ff00ff00ff00UL) >> 8);
+}
+
+/* { dg-final { scan-assembler-times "rev16\\tx\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "rev16\\tw\[0-9\]+" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c b/gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c
new file mode 100644
index 000..49152c5495a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shift_and_operator-1.c
@@ -0,0 +1,22 @@
+/* { dg-options "-O2" } */
+/* { dg-do compile } */
+
+unsigned f(unsigned x, unsigned b)
+{
+  return ((x & 0xff00ff00U) >> 8) | b;
+}
+
+unsigned f0(unsigned x, unsigned b)
+{
+  return ((x & 0xff00ff00U) >> 8) ^ b;
+}
+unsigned f1(unsigned x, unsigned b)
+{
+  return ((x & 0xff00ff00U) >> 8) + b;
+}
+
+/* { dg-final { scan-assembler-times "lsr\\tw\[0-9\]+" 0 } } */
+/* { dg-final { scan-assembler-times "lsr 8" 3 } } */
+/* { dg-final { scan-assembler-times "eor\\tw\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "add\\tw\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "orr\\tw\[0-9\]+" 1 } } */
-- 
2.31.1



Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-10 Thread Sandra Loosemore via Gcc-patches

On 3/10/23 01:50, Iain Sandoe wrote:
> Hi all,
>
>> On 9 Mar 2023, at 23:35, Sandra Loosemore via Gcc-patches wrote:
>>
>> On 3/9/23 01:26, Richard Biener wrote:
>>>
>>> SLES 12 has texinfo 4.13a, SLES 15 has texinfo 6.5.  We still
>>> provide up-to-date GCC for SLES 12 but we can probably manage in
>>> some ways when the texinfo requirement gets bumped.
>>
>> OK, this seems to be the oldest version anyone admits to actually
>> using.  I built the manual with Arsen's patches using 4.13a; the
>> build was successful, and I didn't see any obvious issues with the
>> @gol removal in either the PDF or HTML output, so I think we are OK
>> for backward compatibility.
>
> FWIW macOS/Darwin (as delivered by Apple) is stuck on 4.8 (and,
> presumably, very unlikely to advance), but I would expect most macOS
> FOSS users have something newer installed, either self-built or via
> macports/homebrew etc. so the “admits to actually using” applies
> here too I think (personally, I am using 6.7 but not for any special
> reason other than it was current when I updated my local toolset).
> So I think Darwin can also manage with a newer requirement.


Well, with 4.8 being too old to produce PDF output, that does kind of 
kill my idea of replacing the existing requirement for a specific 
Texinfo version with "the version that comes with your OS distribution 
is good enough".  :-(


AFAIK we have not knowingly changed any specific requirements beyond the 
stated 4.7 and 4.9 for PDF output, but it concerns me that nobody is 
likely to be using versions that old on a regular basis to make sure 
they continue to work and we haven't unknowingly introduced dependencies 
on newer Texinfo features.


Anyway, I think I will leave the existing requirement alone for now, and 
just add a note that newer versions produce better output.


-Sandra



[patch, Fortran] Enable -fwrapv for -std=legacy

2023-03-10 Thread Thomas Koenig via Gcc-patches

Hello world, here's the patch that was discussed.

Regression-tested. OK for trunk?

Since this appeared only in gcc13, I see no need for a backport.
I will also document this in the changes file.

Best regards

Thomas

Set -frapv if -std=legacy is set.

Fortran legacy codes sometimes contain linear congruential
pseudorandom number generators.  These generators implicitly depend
on wrapping behavior on integer overflow, which is illegal Fortran,
but the best they could do at the time.

A gcc13 change exposed this in rnflow, part of the Polyhedron
benchmark, with -O3.  Rather than "regress" on such code, this patch
enables -fwrapv if -std=legacy is enabled.  This allows the benchmark
to run successfully, and presumably lots of other code as well.
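
For readers unfamiliar with the idiom, here is a minimal sketch of the kind
of generator meant, written in C++ rather than Fortran.  The constants and
the name lcg_step are illustrative assumptions and are not taken from
rnflow.  The update is done in unsigned arithmetic, where wraparound modulo
2^32 is defined; legacy Fortran code computing the same thing in a default
INTEGER only gets this result if signed overflow wraps, which is what
-fwrapv provides.

```cpp
#include <cstdint>

// Illustrative (hypothetical) multiplier and increment; any legacy code
// will have its own values.
constexpr std::uint32_t lcg_a = 1664525u;
constexpr std::uint32_t lcg_c = 1013904223u;

// One step of a 32-bit linear congruential generator:
//   x' = (a * x + c) mod 2^32
// Unsigned overflow is defined to wrap, so the reduction modulo 2^32
// happens implicitly in the multiplication and addition.
constexpr std::uint32_t lcg_step (std::uint32_t x)
{
  return lcg_a * x + lcg_c;
}
```

With a seed of 0xFFFFFFFF the product overflows 32 bits, so the result
depends on the wraparound.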

gcc/fortran/ChangeLog:

PR fortran/109075
* options.cc (gfc_handle_option):  If -std=legacy is set,
also set -frwapv.
* invoke.texi: Document the change.

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 5679e2f2650..4f4950dad41 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -549,15 +549,16 @@ Fortran standard that includes all of the extensions supported by GNU
 Fortran, although warnings will be given for obsolete extensions not
 recommended for use in new code.  The @samp{legacy} value is
 equivalent but without the warnings for obsolete extensions, and may
-be useful for old non-standard programs.  The @samp{f95},
-@samp{f2003}, @samp{f2008}, and @samp{f2018} values specify strict
-conformance to the Fortran 95, Fortran 2003, Fortran 2008 and Fortran
-2018 standards, respectively; errors are given for all extensions
-beyond the relevant language standard, and warnings are given for the
-Fortran 77 features that are permitted but obsolescent in later
-standards. The deprecated option @samp{-std=f2008ts} acts as an alias for
-@samp{-std=f2018}. It is only present for backwards compatibility with
-earlier gfortran versions and should not be used any more.
+be useful for old non-standard programs.  It also sets
+@option{-fwrapv}.  The @samp{f95}, @samp{f2003}, @samp{f2008}, and
+@samp{f2018} values specify strict conformance to the Fortran 95,
+Fortran 2003, Fortran 2008 and Fortran 2018 standards, respectively;
+errors are given for all extensions beyond the relevant language
+standard, and warnings are given for the Fortran 77 features that are
+permitted but obsolescent in later standards. The deprecated option
+@samp{-std=f2008ts} acts as an alias for @samp{-std=f2018}. It is only
+present for backwards compatibility with earlier gfortran versions and
+should not be used any more.
 
 @opindex @code{ftest-forall-temp}
 @item -ftest-forall-temp
diff --git a/gcc/fortran/options.cc b/gcc/fortran/options.cc
index 27311961325..76166ac69aa 100644
--- a/gcc/fortran/options.cc
+++ b/gcc/fortran/options.cc
@@ -797,6 +797,8 @@ gfc_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
 case OPT_std_legacy:
   set_default_std_flags ();
   gfc_option.warn_std = 0;
+  /* -std=legacy implies -fwapv, but the user can override it.  */
+  flag_wrapv = 1;
   break;
 
 case OPT_fshort_enums:


Re: [patch, Fortran] Enable -fwrapv for -std=legacy

2023-03-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 10, 2023 at 06:54:10PM +0100, Thomas Koenig via Gcc-patches wrote:
> Hello world, here's the patch that was discussed.
> 
> Regression-tested. OK for trunk?
> 
> Since this appeared only in gcc13, I see no need for a backport.
> I will also document this in the changes file.
> 
> Best regards
> 
>   Thomas
> 
> Set -frapv if -std=legacy is set.

s/frapv/fwrapv/

> Fortran legacy codes sometimes contain linear congruential
> seudorandom number generators.  These generators implicitly depend
> on wrapping behavior on integer overflow, which is illegal Fortran,
> but the best they could to at the time.
> 
> A gcc13 change exposed this in rnflow, part of the Polyhedron
> benchmark, with -O3.  Rather than "regress" on such code, this patch
> enables -fwrapv if -std=legacy is enabled.  This allows the benchmark
> to run successfully, and presumably lots of other code as well.

I think it certainly shouldn't overwrite it, it can adjust the default,
but if user asks for -std=legacy -fno-wrapv or
-fno-wrapv -std=legacy, it should honor that.

> gcc/fortran/ChangeLog:
> 
>   PR fortran/109075
>   * options.cc (gfc_handle_option):  If -std=legacy is set,

s/  / /

>   also set -frwapv.

s/rwapv/wrapv/

>   * invoke.texi: Document the change.

> --- a/gcc/fortran/options.cc
> +++ b/gcc/fortran/options.cc
> @@ -797,6 +797,8 @@ gfc_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
>  case OPT_std_legacy:
>set_default_std_flags ();
>gfc_option.warn_std = 0;
> +  /* -std=legacy implies -fwapv, but the user can override it.  */
> +  flag_wrapv = 1;
>break;
>  
>  case OPT_fshort_enums:

So, I think it should be done later, after option processing, say
in gfc_post_options if it is possible to determine if -std=legacy
has been specified (and not say -std=legacy -std=f2018 etc.),
and using SET_OPTION_IF_UNSET.

Jakub
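
A toy model of the behaviour asked for above, assuming only what the
review states: the last -std= option wins, and an explicit -fwrapv or
-fno-wrapv is honoured regardless of where it appears on the command line.
This is not GCC code.  Arg and resolve_wrapv are hypothetical names; the
real change would use SET_OPTION_IF_UNSET in gfc_post_options as suggested.
The sketch needs C++14 or later.

```cpp
#include <initializer_list>

// Hypothetical stand-ins for the relevant command-line options.
enum class Arg { StdLegacy, StdF2018, Wrapv, NoWrapv };

// Decide the final wrapv setting only after all options have been seen.
constexpr bool resolve_wrapv (std::initializer_list<Arg> args)
{
  bool legacy = false;     // is -std=legacy the last -std= seen?
  bool wrapv = false;      // value of an explicit -f[no-]wrapv
  bool wrapv_set = false;  // was -f[no-]wrapv given at all?
  for (Arg a : args)
    switch (a)
      {
      case Arg::StdLegacy: legacy = true; break;
      case Arg::StdF2018:  legacy = false; break;
      case Arg::Wrapv:     wrapv = true;  wrapv_set = true; break;
      case Arg::NoWrapv:   wrapv = false; wrapv_set = true; break;
      }
  // Post-processing step: adjust the default only if the user said nothing.
  if (!wrapv_set)
    wrapv = legacy;
  return wrapv;
}
```

Setting the flag directly while handling -std=legacy, as in the posted
patch, would instead make "-fno-wrapv -std=legacy" end up with wrapping
enabled.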



Re: [patch, Fortran] Enable -fwrapv for -std=legacy

2023-03-10 Thread Richard Biener via Gcc-patches



> Am 10.03.2023 um 18:54 schrieb Thomas Koenig via Fortran 
> :
> 
> Hello world, here's the patch that was discussed.
> 
> Regression-tested. OK for trunk?
> 
> Since this appeared only in gcc13, I see no need for a backport.
> I will also document this in the changes file.

The „problem“ is latent forever, I’m not sure it’s good to amend the 
kitchen-sink std=legacy option with -fwrapv since that has quite some negative 
effects on optimization.

Richard 

> Best regards
> 
>Thomas
> 
> Set -frapv if -std=legacy is set.
> 
> Fortran legacy codes sometimes contain linear congruential
> seudorandom number generators.  These generators implicitly depend
> on wrapping behavior on integer overflow, which is illegal Fortran,
> but the best they could to at the time.
> 
> A gcc13 change exposed this in rnflow, part of the Polyhedron
> benchmark, with -O3.  Rather than "regress" on such code, this patch
> enables -fwrapv if -std=legacy is enabled.  This allows the benchmark
> to run successfully, and presumably lots of other code as well.
> 
> gcc/fortran/ChangeLog:
> 
>PR fortran/109075
>* options.cc (gfc_handle_option):  If -std=legacy is set,
>also set -frwapv.
>* invoke.texi: Document the change.
> 


Re: [PATCH v2] ubsan: missed -fsanitize=bounds for compound ops [PR108060]

2023-03-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 09, 2023 at 07:44:53PM -0500, Marek Polacek wrote:
> On Thu, Mar 09, 2023 at 09:44:49AM +0100, Jakub Jelinek wrote:
> > On Thu, Mar 09, 2023 at 08:12:47AM +, Richard Biener wrote:
> > > I think this is a reasonable way to address the regression, so OK.
> > 
> > It is true that both C and C++ (including c++14_down and c++17 and later
> > where the latter have different ordering rules) evaluate the lhs of
> > MODIFY_EXPR after rhs, so conceptually this patch makes sense.
> 
> Thank you both for taking a look.
> 
> > But I wonder why we do in ubsan_maybe_instrument_array_ref:
> >   if (e != NULL_TREE)
> > {
> >   tree t = copy_node (*expr_p);
> >   TREE_OPERAND (t, 1) = build2 (COMPOUND_EXPR, TREE_TYPE (op1),
> > e, op1);
> >   *expr_p = t;
> > }
> > rather than modification of the ARRAY_REF's operand in place.  If we
> > did that, we wouldn't really care about the order, shared tree would
> > be instrumented once, with SAVE_EXPR in there making sure we don't
> > compute that multiple times.  Is that because the 2 copies could
> > have side-effects and we do want to evaluate those multiple times?
> 
> I'd assumed that that was the point of the copy_node.  But now that
> I'm actually experimenting with it, I can't trigger any problems
> without the copy_node.  So maybe we can use the following patch, which
> also adds a new test, bounds-21.c, to check that side-effects are
> evaluated correctly.  I didn't bother writing a description for this
> patch yet because I sort of think we should apply both patches at the
> same time.  

Perhaps it would be safer to apply for GCC 13 just your first patch
and maybe the testsuite coverage from this one and defer this change
for GCC 14?

> Regtested on x86_64-pc-linux-gnu.
> 
> -- >8 --
>   PR sanitizer/108060
>   PR sanitizer/109050
> 
> gcc/c-family/ChangeLog:
> 
>   * c-ubsan.cc (ubsan_maybe_instrument_array_ref): Don't copy_node.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/ubsan/bounds-17.c: New test.
>   * c-c++-common/ubsan/bounds-18.c: New test.
>   * c-c++-common/ubsan/bounds-19.c: New test.
>   * c-c++-common/ubsan/bounds-20.c: New test.
>   * c-c++-common/ubsan/bounds-21.c: New test.

Jakub



Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-03-10 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply.

Prathamesh Kulkarni  writes:
> Unfortunately it regresses code-gen for the following case:
>
> svint32_t f(int32x4_t x)
> {
>   return svdupq_s32 (x[0], x[1], x[2], x[3]);
> }
>
> -O2 code-gen with trunk:
> f:
> dup z0.q, z0.q[0]
> ret
>
> -O2 code-gen with patch:
> f:
> dup s1, v0.s[1]
> mov v2.8b, v0.8b
> ins v1.s[1], v0.s[3]
> ins v2.s[1], v0.s[2]
> zip1 v0.4s, v2.4s, v1.4s
> dup z0.q, z0.q[0]
> ret
>
> IIUC, svdupq_impl::expand uses aarch64_expand_vector_init
> to initialize the "base 128-bit vector" and then use dupq to replicate it.
>
> Without patch, aarch64_expand_vector_init generates fallback code, and then
> combine optimizes a sequence of vec_merge/vec_select pairs into an assignment:
>
> (insn 7 3 8 2 (set (reg:SI 99)
> (vec_select:SI (reg/v:V4SI 97 [ x ])
> (parallel [
> (const_int 1 [0x1])
> ]))) "bar.c":6:10 2592 {aarch64_get_lanev4si}
>  (nil))
>
> (insn 13 9 15 2 (set (reg:V4SI 102)
> (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 99))
> (reg/v:V4SI 97 [ x ])
> (const_int 2 [0x2]))) "bar.c":6:10 1794 {aarch64_simd_vec_setv4si}
>  (expr_list:REG_DEAD (reg:SI 99)
> (expr_list:REG_DEAD (reg/v:V4SI 97 [ x ])
> (nil))))
>
> into:
> Trying 7 -> 13:
> 7: r99:SI=vec_select(r97:V4SI,parallel)
>13: r102:V4SI=vec_merge(vec_duplicate(r99:SI),r97:V4SI,0x2)
>   REG_DEAD r99:SI
>   REG_DEAD r97:V4SI
> Successfully matched this instruction:
> (set (reg:V4SI 102)
> (reg/v:V4SI 97 [ x ]))
>
> which eventually results into:
> (note 2 25 3 2 NOTE_INSN_DELETED)
> (note 3 2 7 2 NOTE_INSN_FUNCTION_BEG)
> (note 7 3 8 2 NOTE_INSN_DELETED)
> (note 8 7 9 2 NOTE_INSN_DELETED)
> (note 9 8 13 2 NOTE_INSN_DELETED)
> (note 13 9 15 2 NOTE_INSN_DELETED)
> (note 15 13 17 2 NOTE_INSN_DELETED)
> (note 17 15 18 2 NOTE_INSN_DELETED)
> (note 18 17 22 2 NOTE_INSN_DELETED)
> (insn 22 18 23 2 (parallel [
> (set (reg/i:VNx4SI 32 v0)
> (vec_duplicate:VNx4SI (reg:V4SI 108)))
> (clobber (scratch:VNx16BI))
> ]) "bar.c":7:1 5202 {aarch64_vec_duplicate_vqvnx4si_le}
>  (expr_list:REG_DEAD (reg:V4SI 108)
> (nil)))
> (insn 23 22 0 2 (use (reg/i:VNx4SI 32 v0)) "bar.c":7:1 -1
>  (nil))
>
> I was wondering if we should add the above special case, of assigning
> target = vec in aarch64_expand_vector_init, if initializer is {
> vec[0], vec[1], ... } ?

I'm not sure it will be easy to detect that.  Won't the inputs to
aarch64_expand_vector_init just be plain registers?  It's not a
good idea in general to search for definitions of registers
during expansion.

It would be nice to fix this by lowering svdupq into:

(a) a constructor for a 128-bit vector
(b) a duplication of the 128-bit vector to fill an SVE vector

But I'm not sure what the best way of doing (b) would be.
In RTL we can use vec_duplicate, but I don't think gimple
has an equivalent construct.  Maybe Richi has some ideas.

We're planning to implement the ACLE's Neon-SVE bridge:
https://github.com/ARM-software/acle/blob/main/main/acle.md#neon-sve-bridge
and so we'll need (b) to implement the svdup_neonq functions.

Thanks,
Richard


Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-10 Thread Sandra Loosemore via Gcc-patches

On 3/10/23 10:51, Sandra Loosemore wrote:

On 3/10/23 01:50, Iain Sandoe wrote:

Hi all,


On 9 Mar 2023, at 23:35, Sandra Loosemore via Gcc-patches
 wrote:

On 3/9/23 01:26, Richard Biener wrote:


SLES 12 has texinfo 4.13a, SLES 15 has texinfo 6.5.  We still
provide up-to-date GCC for SLES 12 but we can probably manage in
some ways when the texinfo requirement gets bumped.


OK, this seems to be the oldest version anyone admits to actually
using.  I built the manual with Arsen's patches using 4.13a; the
build was successful, and I didn't see any obvious issues with the
@gol removal in either the PDF or HTML output, so I think we are OK
for backward compatibility.


FWIW macOS/Darwin (as delivered by Apple) is stuck on 4.8 (and,
presumably, very unlikely to advance), but I would expect most macOS
FOSS users have something newer installed, either self-built or via
MacPorts/Homebrew etc. so the “admits to actually using” applies
here too I think (personally, I am using 6.7 but not for any special
reason other than it was current when I updated my local toolset).
So I think Darwin can also manage with a newer requirement.


Well, with 4.8 being too old to produce PDF output, that does kind of 
kill my idea of replacing the existing requirement for a specific 
Texinfo version with "the version that comes with your OS distribution 
is good enough".  :-(


AFAIK we have not knowingly changed any specific requirements beyond the 
stated 4.7 and 4.9 for PDF output, but it concerns me that nobody is 
likely to be using versions that old on a regular basis to make sure 
they continue to work and we haven't unknowingly introduced dependencies 
on newer Texinfo features.


Anyway, I think I will leave the existing requirement alone for now, and 
just add a note that newer versions produce better output.


Oh, I need to take that back -- looking now, it's 4.8 for PDF output, 
not 4.9 like I was thinking before I had enough caffeine in my 
bloodstream.  I'll leave the version check in place but add some text 
suggesting a more recent version, anyway.


-Sandra



Re: [PATCH v2] ubsan: missed -fsanitize=bounds for compound ops [PR108060]

2023-03-10 Thread Marek Polacek via Gcc-patches
On Fri, Mar 10, 2023 at 07:07:36PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 09, 2023 at 07:44:53PM -0500, Marek Polacek wrote:
> > On Thu, Mar 09, 2023 at 09:44:49AM +0100, Jakub Jelinek wrote:
> > > On Thu, Mar 09, 2023 at 08:12:47AM +, Richard Biener wrote:
> > > > I think this is a reasonable way to address the regression, so OK.
> > > 
> > > It is true that both C and C++ (including c++14_down and c++17 and later
> > > where the latter have different ordering rules) evaluate the lhs of
> > > MODIFY_EXPR after rhs, so conceptually this patch makes sense.
> > 
> > Thank you both for taking a look.
> > 
> > > But I wonder why we do in ubsan_maybe_instrument_array_ref:
> > >   if (e != NULL_TREE)
> > > {
> > >   tree t = copy_node (*expr_p);
> > >   TREE_OPERAND (t, 1) = build2 (COMPOUND_EXPR, TREE_TYPE (op1),
> > > e, op1);
> > >   *expr_p = t;
> > > }
> > > rather than modification of the ARRAY_REF's operand in place.  If we
> > > did that, we wouldn't really care about the order, shared tree would
> > > be instrumented once, with SAVE_EXPR in there making sure we don't
> > > compute that multiple times.  Is that because the 2 copies could
> > > have side-effects and we do want to evaluate those multiple times?
> > 
> > I'd assumed that that was the point of the copy_node.  But now that
> > I'm actually experimenting with it, I can't trigger any problems
> > without the copy_node.  So maybe we can use the following patch, which
> > also adds a new test, bounds-21.c, to check that side-effects are
> > evaluated correctly.  I didn't bother writing a description for this
> > patch yet because I sort of think we should apply both patches at the
> > same time.  
> 
> Perhaps it would be safer to apply for GCC 13 just your first patch
> and maybe the testsuite coverage from this one and defer this change
> for GCC 14?

That sounds good, I'll push the original patch with the new test now.
Thanks.
 
> > Regtested on x86_64-pc-linux-gnu.
> > 
> > -- >8 --
> > PR sanitizer/108060
> > PR sanitizer/109050
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c-ubsan.cc (ubsan_maybe_instrument_array_ref): Don't copy_node.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * c-c++-common/ubsan/bounds-17.c: New test.
> > * c-c++-common/ubsan/bounds-18.c: New test.
> > * c-c++-common/ubsan/bounds-19.c: New test.
> > * c-c++-common/ubsan/bounds-20.c: New test.
> > * c-c++-common/ubsan/bounds-21.c: New test.
> 
>   Jakub
> 

Marek
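
For illustration only, a small sketch of the property under discussion:
an index expression with a side effect in a compound assignment must be
evaluated exactly once, with or without instrumentation.  This is a guess
at the kind of check meant, not the content of bounds-21.c; next_index and
compound_assign_demo are hypothetical names.  It needs C++14 or later.

```cpp
// Index expression with a visible side effect: counts how often it runs.
constexpr int next_index (int &calls)
{
  ++calls;
  return 2;
}

// Performs a[next_index ()] += 5 and encodes both the call count and the
// stored value in the result: calls * 100 + a[2].
constexpr int compound_assign_demo ()
{
  int a[4] = { 0, 0, 0, 0 };
  int calls = 0;
  a[next_index (calls)] += 5;
  return calls * 100 + a[2];
}
```

A result of 105 means one call to next_index and one update of a[2].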



Re: Patch ping: Re: [PATCH] libgcc, i386, optabs, v2: Add __float{, un}tibf to libgcc and expand BF -> integral through SF intermediate [PR107703]

2023-03-10 Thread Ian Lance Taylor
Jakub Jelinek  writes:

> On Wed, Mar 01, 2023 at 01:32:43PM +0100, Jakub Jelinek via Gcc-patches wrote:
>> On Wed, Nov 16, 2022 at 12:51:14PM +0100, Jakub Jelinek via Gcc-patches 
>> wrote:
>> > On Wed, Nov 16, 2022 at 10:06:17AM +0100, Jakub Jelinek via
>> > Gcc-patches wrote:
>> > > Thoughts on this?  I guess my preference would be the BF -> SF -> TI
>> > > path because we won't need to waste
>> > > 32: 00015e10 321 FUNC GLOBAL DEFAULT 13
>> > > __fixbfti@@GCC_13.0.0
>> > > 89: 00015f60 299 FUNC GLOBAL DEFAULT 13
>> > > __fixunsbfti@@GCC_13.0.0
>> > > If so, I'd need to cut the fix parts of the patch below and
>> > > do something in the middle-end.
>> > 
>> > Here is adjusted patch that does that.
>> > 
>> > 2022-11-16  Jakub Jelinek  
>> > 
>> >PR target/107703
>> >* optabs.cc (expand_fix): For conversions from BFmode to integral,
>> >use shifts to convert it to SFmode first and then convert SFmode
>> >to integral.
>> > 
>> >* soft-fp/floattibf.c: New file.
>> >* soft-fp/floatuntibf.c: New file.
>> >* config/i386/libgcc-glibc.ver: Export __float{,un}tibf @ GCC_13.0.0.
>> >* config/i386/64/t-softfp (softfp_extras): Add floattibf and
>> >floatuntibf.
>> >(CFLAGS-floattibf.c, CFLAGS-floatunstibf.c): Add -msse2.
>> 
>> I'd like to ping the libgcc non-i386 part of this patch, Uros said the i386
>> part is ok but that one depends on the generic libgcc changes.
>> I'll ping the optabs.cc change separately.
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
>> with more info in
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606382.html
>
> I'd like to ping this again.  I've posted the previously added
> bfloat16 changes as well as the above 2 new files to libc-alpha as well
> https://sourceware.org/pipermail/libc-alpha/2023-March/146246.html
> if it makes the review easier.


The libgcc parts of this are fine.  Thanks.

Ian
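
As a sketch of the BF -> SF -> integral path mentioned in the quoted
ChangeLog: bfloat16 has the same sign and exponent layout as IEEE binary32
and simply keeps the top 16 bits, so widening is a 16-bit left shift of
the bit pattern.  The function names below are hypothetical, and the code
assumes float is binary32; it is not the optabs.cc implementation.

```cpp
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to the bit pattern of the equal binary32
// value: the 16 missing low mantissa bits are zero.
constexpr std::uint32_t bf16_bits_to_f32_bits (std::uint16_t bits)
{
  return static_cast<std::uint32_t> (bits) << 16;
}

// BF -> SF -> integral: widen first, then use the ordinary float to
// integer conversion.  Like any such conversion, this is only defined
// when the value is finite and fits in the target type.
inline long long bf16_bits_to_ll (std::uint16_t bits)
{
  std::uint32_t wide = bf16_bits_to_f32_bits (bits);
  float f;
  static_assert (sizeof f == sizeof wide, "binary32 float assumed");
  std::memcpy (&f, &wide, sizeof f);
  return static_cast<long long> (f);
}
```

For example, 0x4228 is the bfloat16 pattern for 42.0, and widening it
gives 0x42280000, the binary32 pattern for 42.0f.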


Re: [PATCH] c++, abi: Fix up class layout with bitfields [PR109039]

2023-03-10 Thread Jason Merrill via Gcc-patches

On 3/9/23 14:40, Jakub Jelinek wrote:

Hi!

The following testcase FAILs, because starting with r12-6028
the S class has only 2 bytes, not enough to hold one 7-bit bitfield, one 8-bit
bitfield and one 8-bit char field.

The reason is that when end_of_class attempts to compute dsize, it simply
adds byte_position of the field and DECL_SIZE_UNIT (and uses maximum from
those offsets).
The problematic bit-field in question has bit_position 7, byte_position 0,
DECL_SIZE 8 and DECL_SIZE_UNIT 1.  So, byte_position + DECL_SIZE_UNIT is
1, even when the bitfield only has a single bit in the first byte and 7
further bits in the second byte, so per the Itanium ABI it should be 2:
"In either case, update dsize(C) to include the last byte
containing (part of) the bit-field, and update sizeof(C) to
max(sizeof(C),dsize(C))."

The following patch fixes it by computing bitsize of the end and using
CEIL_DIV_EXPR division to round it to next byte boundary and convert
from bits to bytes.
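
The arithmetic for the field described above can be sketched as follows.
The function names are hypothetical; the parameters mirror the values of
byte_position, DECL_SIZE_UNIT, bit_position and DECL_SIZE quoted above.

```cpp
constexpr unsigned bits_per_unit = 8;

// Old computation: byte_position + DECL_SIZE_UNIT.
constexpr unsigned end_offset_old (unsigned byte_position, unsigned size_unit)
{
  return byte_position + size_unit;
}

// New computation for bit-fields:
//   ceil ((bit_position + DECL_SIZE) / BITS_PER_UNIT)
constexpr unsigned end_offset_new (unsigned bit_position, unsigned size_bits)
{
  return (bit_position + size_bits + bits_per_unit - 1) / bits_per_unit;
}
```

For the 8-bit field at bit position 7, the old formula gives 0 + 1 = 1
byte, while the new one gives ceil (15 / 8) = 2 bytes.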

While this is an ABI change, classes with such incorrect layout couldn't
have worked properly, so I doubt anybody is actually running it often
in the wild.  Thus I think adding some ABI warning for it is unnecessary.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
(and after a while for GCC 12)?


OK.


2023-03-09  Jakub Jelinek  

PR c++/109039
* class.cc (end_of_class): For bit-fields, instead of computing
offset as sum of byte_position (field) and DECL_SIZE_UNIT (field),
compute it as sum of bit_position (field) and DECL_SIZE (field)
divided by BITS_PER_UNIT rounded up.

* g++.dg/abi/no_unique_address7.C: New test.

--- gcc/cp/class.cc.jj  2023-02-04 06:22:17.053407477 +0100
+++ gcc/cp/class.cc 2023-03-09 18:02:43.967815721 +0100
@@ -6476,7 +6476,15 @@ end_of_class (tree t, eoc_mode mode)
 size of the type (usually 1) for computing nvsize.  */
  size = TYPE_SIZE_UNIT (TREE_TYPE (field));
  
-	offset = size_binop (PLUS_EXPR, byte_position (field), size);

+   if (DECL_BIT_FIELD_TYPE (field))
+ {
+   offset = size_binop (PLUS_EXPR, bit_position (field),
+DECL_SIZE (field));
+   offset = size_binop (CEIL_DIV_EXPR, offset, bitsize_unit_node);
+   offset = fold_convert (sizetype, offset);
+ }
+   else
+ offset = size_binop (PLUS_EXPR, byte_position (field), size);
if (tree_int_cst_lt (result, offset))
  result = offset;
}
--- gcc/testsuite/g++.dg/abi/no_unique_address7.C.jj    2023-03-09 18:09:08.397205087 +0100
+++ gcc/testsuite/g++.dg/abi/no_unique_address7.C   2023-03-09 18:08:56.439379395 +0100
@@ -0,0 +1,33 @@
+// PR c++/109039
+// { dg-do run { target c++11 } }
+
+struct X {
+  signed short x0 : 7;
+  signed short x1 : 8;
+  X () : x0 (1), x1 (2) {}
+  int get () { return x0 + x1; }
+};
+
+struct S {
+  [[no_unique_address]] X x;
+  signed char c;
+  S () : c (0) {}
+};
+
+S s;
+
+int
+main ()
+{
+  if (s.x.x0 != 1 || s.x.x1 != 2 || s.c != 0)
+__builtin_abort ();
+  s.x.x0 = -1;
+  s.x.x1 = -1;
+  if (s.x.x0 != -1 || s.x.x1 != -1 || s.c != 0)
+__builtin_abort ();
+  s.c = -1;
+  s.x.x0 = 0;
+  s.x.x1 = 0;
+  if (s.x.x0 != 0 || s.x.x1 != 0 || s.c != -1)
+__builtin_abort ();
+}

Jakub





Re: [PATCH] c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates [PR107558]

2023-03-10 Thread Jason Merrill via Gcc-patches

On 11/24/22 04:13, Jakub Jelinek wrote:

Hi!

The following testcase ICEs, because OpenMP lowering for shared clause
on l variable with REFERENCE_TYPE creates POINTER_TYPE to REFERENCE_TYPE.
The reason is that the automatic variable has non-trivial construction
(reference to a lambda) and -fmerge-all-constants is on and so TREE_READONLY
isn't set - omp-low will handle automatic TREE_READONLY vars in shared
specially and only copy to the construct and not back, while !TREE_READONLY
are assumed to be changeable.
The PR91529 change rationale was that the gimplification can change
some non-addressable automatic variables to TREE_STATIC with
-fmerge-all-constants and therefore TREE_READONLY on them is undesirable.
But, the gimplifier does that only for aggregate variables:
   switch (TREE_CODE (type))
 {
 case RECORD_TYPE:
 case UNION_TYPE:
 case QUAL_UNION_TYPE:
 case ARRAY_TYPE:
and not for anything else.  So, I think clearing TREE_READONLY for
automatic integral or reference or pointer etc. vars for
-fmerge-all-constants only is unnecessary.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-11-24  Jakub Jelinek  

PR c++/107558
* decl.cc (cp_finish_decl): Don't clear TREE_READONLY on
automatic non-aggregate variables just because of
-fmerge-all-constants.

* g++.dg/gomp/pr107558.C: New test.

--- gcc/cp/decl.cc.jj   2022-11-19 09:21:14.662439877 +0100
+++ gcc/cp/decl.cc  2022-11-23 13:12:31.866553152 +0100
@@ -8679,8 +8679,10 @@ cp_finish_decl (tree decl, tree init, bo
  
if (var_definition_p

  /* With -fmerge-all-constants, gimplify_init_constructor
-might add TREE_STATIC to the variable.  */
- && (TREE_STATIC (decl) || flag_merge_constants >= 2))
+might add TREE_STATIC to aggregate variables.  */
+ && (TREE_STATIC (decl)
+ || (flag_merge_constants >= 2
+ && AGGREGATE_TYPE_P (type))))
{
  /* If a TREE_READONLY variable needs initialization
 at runtime, it is no longer readonly and we need to
--- gcc/testsuite/g++.dg/gomp/pr107558.C.jj 2022-11-23 13:13:27.260736525 +0100
+++ gcc/testsuite/g++.dg/gomp/pr107558.C    2022-11-23 13:15:22.271041005 +0100
@@ -0,0 +1,14 @@
+// PR c++/107558
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fmerge-all-constants" }
+// { dg-additional-options "-flto" { target lto } }
+
+int a = 15;
+
+void
+foo ()
+{
+  auto &&l = [&]() { return a; };
+#pragma omp target parallel
+  l ();
+}

Jakub





Re: [PATCH] c++ testsuite: Add test for PR107703

2023-03-10 Thread Jason Merrill via Gcc-patches

On 3/10/23 10:43, Jakub Jelinek wrote:

Hi!

This is on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606398.html
and
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613724.html
patches (to be precise, the latter isn't essential for it), I've
realized that for the PR107703 bugfix in the first patch I haven't
added some test coverage that the extended floating vs. integral
or vice versa conversions work correctly.

This new testcase adds such checks.  And when writing it I've
found that in ext-floating.h header in the testsuite I forgot back
in November to remove #undef __STDCPP_BFLOAT16_T__ which was left
there because the bfloat16 support wasn't in yet.

The new testcase (and all older ext-floating*.C tests too) passes
on vanilla trunk without the ext-floating.h change (x86_64-linux
-m32/-m64) and with the PR107703 fix also with the ext-floating.h
change.

Ok for trunk?


OK.


2023-03-10  Jakub Jelinek  

PR target/107703
* g++.dg/cpp23/ext-floating.h (__STDCPP_BFLOAT16_T__): Don't undefine
it.
(std::bfloat16_t): Use decltype (0.0bf16) like libstdc++, rather than
__bf16.
* g++.dg/cpp23/ext-floating14.C: New test.

--- gcc/testsuite/g++.dg/cpp23/ext-floating.h.jj    2022-09-27 08:03:27.118982749 +0200
+++ gcc/testsuite/g++.dg/cpp23/ext-floating.h   2023-03-10 15:04:01.647824767 +0100
@@ -14,9 +14,8 @@ namespace std
#ifdef __STDCPP_FLOAT128_T__
using float128_t = _Float128;
#endif
-  #undef __STDCPP_BFLOAT16_T__
#ifdef __STDCPP_BFLOAT16_T__
-  using bfloat16_t = __bf16; // ???
+  using bfloat16_t = decltype (0.0bf16);
#endif
template struct integral_constant {
  static constexpr T value = v;
--- gcc/testsuite/g++.dg/cpp23/ext-floating14.C.jj  2023-03-10 14:12:17.658925358 +0100
+++ gcc/testsuite/g++.dg/cpp23/ext-floating14.C 2023-03-10 15:32:26.912057825 +0100
@@ -0,0 +1,585 @@
+// P1467R9 - Extended floating-point types and standard names.
+// PR target/107703
+// { dg-do run { target c++23 } }
+// { dg-options "-fexcess-precision=standard" }
+
+#include "ext-floating.h"
+
+#ifdef __SIZEOF_INT128__
+#define INT128_MAX ((signed __int128) ((~(unsigned __int128) 0) >> 1))
+#endif
+
+template <typename T, typename F>
+[[gnu::noipa]] T cvt (F f)
+{
+  return T (F (f));
+}
+
+int
+main ()
+{
+  // __FLT32_MAX_EXP__ is 128, so make sure all unsigned long long and unsigned __int128
+  // values fit into it.  __FLT16_MAX__ is 65504.0f16, so we need to be
+  // careful for that.
+#if __SIZEOF_LONG_LONG__ * __CHAR_BIT__ <= 128
+#if !defined(__SIZEOF_INT128__) || __SIZEOF_INT128__ * __CHAR_BIT__ == 128
+#ifdef __STDCPP_FLOAT16_T__
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (-42) != (std::float16_t) -42
+#if __SCHAR_MAX__ < 65504
+  || cvt  (__SCHAR_MAX__) != (std::float16_t) 
__SCHAR_MAX__
+  || cvt  (-__SCHAR_MAX__ - 1) != 
(std::float16_t) (-__SCHAR_MAX__ - 1)
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+#if __SCHAR_MAX__ * 2 + 1 < 65504
+  || cvt  ((unsigned char) ~0) != 
(std::float16_t) ((unsigned char) ~0)
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (-42) != (std::float16_t) -42
+#if __SHRT_MAX__ < 65504
+  || cvt  (__SHRT_MAX__) != (std::float16_t) 
__SHRT_MAX__
+  || cvt  (-__SHRT_MAX__ - 1) != 
(std::float16_t) (-__SHRT_MAX__ - 1)
+#else
+  || cvt  (65504) != (std::float16_t) 65504
+  || cvt  (-65504) != (std::float16_t) -65504
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (65504U) != (std::float16_t) 
65504U)
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42
+  || cvt  (-42) != (std::float16_t) -42
+#if __INT_MAX__ < 65504
+  || cvt  (__INT_MAX__) != (std::float16_t) 
__INT_MAX__
+  || cvt  (-__INT_MAX__ - 1) != 
(std::float16_t) (-__INT_MAX__ - 1)
+#else
+  || cvt  (65504) != (std::float16_t) 65504
+  || cvt  (-65504) != (std::float16_t) -65504
+#endif
+ )
+__builtin_abort ();
+  if (cvt  (42) != (std::float16_t) 42U
+  || cvt  (65504U) != (std::float16_t) 
65504U)
+__builtin_abort ();
+  if (cvt  (42L) != (std::float16_t) 42L
+  || cvt  (-42L) != (std::float16_t) -42L
+  || cvt  (65504L) != (std::float16_t) 
65504L
+  || cvt  (-65504L) != (std::float16_t) 
-65504L)
+__builtin_abort ();
+  if (cvt  (42UL) != (std::float16_t) 42UL
+  || cvt  (65504UL) != (std::float16_t) 
65504UL)
+__builtin_abort ();
+  if (cvt  (42LL) != (std::float16_t) 
42LL
+  || cvt  (-42LL) != 
(std::float16_t) -42LL
+  || cvt  (65504LL) != 
(std::float16_t) 65504LL
+  || cvt  (-65504LL) != 
(std::float16_t) -65504LL)
+__builtin_abort ();
+  if (cvt  (42ULL) != (std::float16_t) 
42ULL
+  || cvt  (65504ULL) != 
(std::float16_t) 65504ULL)
+__builtin_abort ();
+#ifdef __SIZEOF_INT128__
+  if (cvt  (42LL) != (std::float16_t) (signed 
__int128) 42LL
+  || cvt  (-42LL) != 

Re: [PATCH] c++: ICE with constexpr lambda [PR107280]

2023-03-10 Thread Jason Merrill via Gcc-patches

On 3/10/23 11:17, Marek Polacek wrote:

We crash here since r10-3661, the store_init_value hunk in particular.
Before, we called cp_fully_fold_init, so e.g.

   {.str=VIEW_CONVERT_EXPR("")}

was folded into

   {.str=""}

but now we don't fold and keep the VCE around, and it causes trouble in
cxx_eval_store_expression: in the !refs->is_empty () loop we descend on
.str's initializer but since it's wrapped in a VCE, we skip the STRING_CST
check and then crash on the CONSTRUCTOR_NO_CLEARING.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR c++/107280

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_store_expression): Strip location wrappers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-lambda28.C: New test.
---
  gcc/cp/constexpr.cc |  3 ++-
  gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C | 15 +++
  2 files changed, 17 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-lambda28.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8683c00596a..abf6ee560c5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -6033,7 +6033,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
  *valp = build_constructor (type, NULL);
  CONSTRUCTOR_NO_CLEARING (*valp) = no_zero_init;
}
-  else if (TREE_CODE (*valp) == STRING_CST)
+  else if (STRIP_ANY_LOCATION_WRAPPER (*valp),
+  TREE_CODE (*valp) == STRING_CST)


Seems like this is stripping the location wrapper when we try to modify 
the string; I think we want to strip it earlier, when we first 
initialize the array member.


Jason



[pushed] c++: constrained lambda error-recovery [PR108972]

2023-03-10 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Better not to ICE after various valid errors.

PR c++/108972

gcc/cp/ChangeLog:

* lambda.cc (compare_lambda_template_head): Check more
for error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-lambda3.C: Run at lower std levels,
but expect errors.
---
 gcc/cp/lambda.cc  | 4 
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index c752622816d..212990a21bf 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1537,6 +1537,8 @@ compare_lambda_template_head (tree tmpl_a, tree tmpl_b)
  if (parm_a == error_mark_node)
return false;
  parm_a = TREE_VALUE (parm_a);
+ if (parm_a == error_mark_node)
+   return false;
  if (DECL_VIRTUAL_P (parm_a))
parm_a = NULL_TREE;
}
@@ -1548,6 +1550,8 @@ compare_lambda_template_head (tree tmpl_a, tree tmpl_b)
  if (parm_b == error_mark_node)
return false;
  parm_b = TREE_VALUE (parm_b);
+ if (parm_b == error_mark_node)
+   return false;
  if (DECL_VIRTUAL_P (parm_b))
parm_b = NULL_TREE;
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C
index 291e451ca1a..b18e6b62aa4 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C
@@ -1,4 +1,5 @@
-// { dg-do run { target c++20 } }
+// { dg-do run }
+// { dg-excess-errors "" { target { ! concepts } } } (PR108972)
 
 template
 concept C1 = __is_same_as(T, int)
@@ -61,4 +62,3 @@ int main(int, char**)
 
   return 0;
 }
-

base-commit: e1c8cf9006bd278e969ab7ed35178067ce128f32
-- 
2.31.1



[pushed] MAINTAINERS: add myself to write after approval

2023-03-10 Thread Arsen Arsenović via Gcc-patches
ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a61d3ae06df..3c533cb651d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -318,6 +318,7 @@ from other maintainers or reviewers.
 Mark G. Adams  
 Pedro Alves
 Paul-Antoine Arras 
+Arsen Arsenović
 Raksit Ashok   
 Matt Austern   
 David Ayers
-- 
2.39.2



Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-10 Thread Gerald Pfeifer
On Fri, 10 Mar 2023, Sandra Loosemore wrote:
> AFAIK we have not knowingly changed any specific requirements beyond the 
> stated 4.7 and 4.9 for PDF output, but it concerns me that nobody is 
> likely to be using versions that old on a regular basis to make sure 
> they continue to work and we haven't unknowingly introduced dependencies 
> on newer Texinfo features.

I'm generally very interested in ensuring we do not hurt users who do not 
have the latest and greatest of the day. On the other hand, if there are a 
few people using what is (more or less deliberately) abandonware, we should 
not feel too bad if something breaks.

> Anyway, I think I will leave the existing requirement alone for now, and 
> just add a note that newer versions produce better output.

With Richi mentioning that SLE 12 (which was first released 9 years ago) 
uses texinfo 4.13a and Andrew mentioning that RHEL 7 uses texinfo 5.1 I
would feel very comfortable making either 4.13 or even 5.1 the new minimum.

(Not because we need to cater to those two Enterprise Linux distros, 
but rather because they tend to fall on the conservative side.)

Gerald


[PATCH] c++: suppress -Wdangling-reference for std::span [PR107532]

2023-03-10 Thread Marek Polacek via Gcc-patches
std::span is a view and therefore should be treated as a reference
wrapper class for the purposes of -Wdangling-reference.  I'm not sure
there's a pattern that we could check for.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/107532

gcc/cp/ChangeLog:

* call.cc (reference_like_class_p): Check for std::span.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference10.C: New test.
---
 gcc/cp/call.cc|  1 +
 gcc/testsuite/g++.dg/warn/Wdangling-reference10.C | 12 
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference10.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 3dfa12a0733..c01e7b82457 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13800,6 +13800,7 @@ reference_like_class_p (tree ctype)
   tree name = DECL_NAME (tdecl);
   return (name
  && (id_equal (name, "reference_wrapper")
+ || id_equal (name, "span")
  || id_equal (name, "ref_view")));
 }
   for (tree fields = TYPE_FIELDS (ctype);
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference10.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference10.C
new file mode 100644
index 000..733fb8cce63
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference10.C
@@ -0,0 +1,12 @@
+// PR c++/107532
+// { dg-do compile { target c++20 } }
+// { dg-options "-Wdangling-reference" }
+
+#include 
+#include 
+
+void f(const std::vector& v)
+{
+  const int& r = std::span(v)[0]; // { dg-bogus "dangling 
reference" }
+  (void) r;
+}

base-commit: 20d790aa3ea5b0d240032cab997b8e0938cac62c
-- 
2.39.2



Re: [patch, Fortran] Enable -fwrapv for -std=legacy

2023-03-10 Thread Steve Kargl via Gcc-patches
On Fri, Mar 10, 2023 at 07:01:29PM +0100, Richard Biener via Fortran wrote:
> 
> 
> > Am 10.03.2023 um 18:54 schrieb Thomas Koenig via Fortran 
> > :
> > 
> > Hello world, here's the patch that was discussed.
> > 
> > Regression-tested. OK for trunk?
> > 
> > Since this appeared only in gcc13, I see no need for a backport.
> > I will also document this in the changes file.
> 
> The „problem“ is latent forever, I’m not sure it’s good to
> amend the kitchen-sink std=legacy option with -fwrapv since
> that has quite some negative effects on optimization.

In that case, it would then seem logical to remove whatever
patch was added to -O3 that causes the massive regression with
rnflow.f90 and add it instead to -Ofast.  -Ofast at least
hints that it is unsafe to use.

-- 
Steve


[PATCH] vect: Verify that GET_MODE_NUNITS is power-of-2

2023-03-10 Thread Michael Collison
While working on autovectorization for the RISC-V port, I encountered an
issue where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
a power of two. The RISC-V target has vector modes (e.g. VNx1DImode) whose
number of units is not a power of two.
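
For reference, here is a small sketch of the two integer properties that
come up in this discussion.  It works on plain unsigned integers only and
makes no attempt to model poly_int; the helper names is_even and
is_power_of_two are made up for this example.  On plain integers, being
divisible by 2 and being a power of two are different conditions:

```cpp
#include <cassert>

// Illustration on plain unsigned integers only; GCC's poly_int values
// carry several coefficients and are not modelled here.

// True if n is evenly divisible by 2.
constexpr bool is_even(unsigned n)
{
  return n % 2 == 0;
}

// True if n is a power of two, i.e. exactly one bit is set.
constexpr bool is_power_of_two(unsigned n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

// 6 is even but not a power of two; 1 is a power of two but not even.
static_assert(is_even(6) && !is_power_of_two(6), "6: even, not 2^k");
static_assert(!is_even(1) && is_power_of_two(1), "1: 2^0, not even");
static_assert(is_even(8) && is_power_of_two(8), "8: both");
```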

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* poly-int.h (exact_div_p): New function to
verify that argument is a power of 2 poly_int.
* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a power of 2.
---
 gcc/poly-int.h   | 17 +
 gcc/tree-vect-slp.cc |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index 12571455081..d09632f341f 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -2219,6 +2219,23 @@ multiple_p (const poly_int_pod &a, const 
poly_int_pod &b,
   return constant_multiple_p (a, b, multiple);
 }
 
+/* Return true, if A is known to be a multiple of B.  */
+
+template
+inline bool
+exact_div_p (const poly_int_pod &a, Cb b)
+{
+  typedef POLY_CONST_COEFF (Ca, Cb) C;
+  poly_int r;
+  for (unsigned int i = 0; i < N; i++)
+{
+  if ((a.coeffs[i] % b) != 0)
+   return false;
+
+}
+  return true;
+}
+
 /* Return A / B, given that A is known to be a multiple of B.  */
 
 template
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9a4e000925e..6be2036a13a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -426,7 +426,8 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && exact_div_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)), 2))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
-- 
2.34.1



[COMMITTED/12] tree-optimization: [PR108684] ICE in verify_ssa due to simple_dce_from_worklist

2023-03-10 Thread Andrew Pinski via Gcc-patches
In simple_dce_from_worklist, we were removing an inline-asm which had a vdef.
We should not remove an inline-asm that has a vdef, as this code
does not check for the store.
This fixes that oversight. This was a latent bug exposed recently
by both VRP and the removal of stores to statics starting to use
simple_dce_from_worklist.

Backported after bootstrapping and testing on x86_64-linux-gnu with no 
regressions.

PR tree-optimization/108684

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist):
Check all ssa names and not just non-vdef ones
before accepting the inline-asm.
Call unlink_stmt_vdef on the statement before
removing it.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/dce-inline-asm-1.c: New test.
* gcc.c-torture/compile/dce-inline-asm-2.c: New test.
* gcc.dg/tree-ssa/pr108684-1.c: New test.

co-authored-by: Andrew Macleod  
(cherry picked from commit 6a5cb782d1486b378d70857c8efae558da0eb2cc)
---
 .../gcc.c-torture/compile/dce-inline-asm-1.c   | 15 +++
 .../gcc.c-torture/compile/dce-inline-asm-2.c   | 16 
 gcc/testsuite/gcc.dg/tree-ssa/pr108684-1.c | 18 ++
 gcc/tree-ssa-dce.cc|  5 +++--
 4 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108684-1.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-1.c 
b/gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-1.c
new file mode 100644
index 000..a9f02e44bd7
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-1.c
@@ -0,0 +1,15 @@
+/* PR tree-optimization/108684 */
+/* This used to ICE as when we remove the store to
+   `t`, we also would remove the inline-asm which
+   had a VDEF on it but we didn't update the
+   VUSE that was later on.  */
+static int t;
+
+int f (int *a)
+{
+  int t1;
+  asm (" " : "=X" (t1) : : "memory");
+  t = t1;
+  return *a;
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-2.c 
b/gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-2.c
new file mode 100644
index 000..a41b16e4bd0
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/dce-inline-asm-2.c
@@ -0,0 +1,16 @@
+/* PR tree-optimization/108684 */
+/* This used to ICE as when we removed the
+   __builtin_unreachable in VRP, as we
+   would also remove the branch and the
+   inline-asm. The inline-asm had a VDEF on it,
+   which we didn't update further along and
+   not have the VDEF on the return statement
+   updated.  */
+
+int f (int a)
+{
+  asm (" " : "=X" (a) : : "memory");
+  if (a)
+return 0;
+  __builtin_unreachable();
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr108684-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr108684-1.c
new file mode 100644
index 000..3ba206f765e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr108684-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+static int t;
+
+int f (int *a)
+{
+  int t1, t2 = 0;
+  asm ("shouldshowupstill %1" : "=r" (t1), "=m"(t2) : : );
+  t = t1;
+  return t2;
+}
+
+/* Check to make sure DCE does not remove the inline-asm as it writes to t2. */
+/* We used to DCE this inline-asm when removing the store to t. */
+/* { dg-final { scan-assembler "shouldshowupstill" } } */
+/* { dg-final { scan-tree-dump-times "shouldshowupstill" 1 "optimized" } } */
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 4b7b664867d..4fa830dffd3 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2062,9 +2062,9 @@ simple_dce_from_worklist (bitmap worklist)
 
   /* The defining statement needs to be defining only this name.
 ASM is the only statement that can define more than one
-(non-virtual) name. */
+name. */
   if (is_a(t)
- && !single_ssa_def_operand (t, SSA_OP_DEF))
+ && !single_ssa_def_operand (t, SSA_OP_ALL_DEFS))
continue;
 
   /* Don't remove statements that are needed for non-call
@@ -2094,6 +2094,7 @@ simple_dce_from_worklist (bitmap worklist)
remove_phi_node (&gsi, true);
   else
{
+ unlink_stmt_vdef (t);
  gsi_remove (&gsi, true);
  release_defs (t);
}
-- 
2.17.1



[committed] testsuite: gcc.dg/pr106397.c: Add -w to options

2023-03-10 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious.
-- >8 --
Targets that don't support prefetching get a warning:
cc1: warning: '-fprefetch-loop-arrays' not supported for this target

Align with other tests passing -fprefetch-loop-arrays for
all targets: add "-w" to options.

* gcc.dg/pr106397.c: Add -w to options.
---
 gcc/testsuite/gcc.dg/pr106397.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr106397.c b/gcc/testsuite/gcc.dg/pr106397.c
index 2bc17f8cf806..2dd04b870775 100644
--- a/gcc/testsuite/gcc.dg/pr106397.c
+++ b/gcc/testsuite/gcc.dg/pr106397.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fprefetch-loop-arrays --param l2-cache-size=0 --param 
prefetch-latency=3 -fprefetch-loop-arrays" } */
+/* { dg-options "-O3 -fprefetch-loop-arrays -w --param l2-cache-size=0 --param 
prefetch-latency=3 -fprefetch-loop-arrays" } */
 /* { dg-additional-options "-march=i686 -msse" { target { { i?86-*-* 
x86_64-*-* } && ia32 } } } */
 
 int
-- 
2.30.2



[committed] testsuite: gcc.dg/pr108117.c: Require effective-target scheduling

2023-03-10 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious.
-- >8 --
Targets that don't support scheduling get a warning:
cc1: warning: instruction scheduling not supported on this target machine

Do like other target-generic tests unconditionally passing
-fschedule-insns: require effective target scheduling.

* gcc.dg/pr108117.c: Require effective-target scheduling.
---
 gcc/testsuite/gcc.dg/pr108117.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr108117.c b/gcc/testsuite/gcc.dg/pr108117.c
index ae151693e2f9..4b3bebe229e5 100644
--- a/gcc/testsuite/gcc.dg/pr108117.c
+++ b/gcc/testsuite/gcc.dg/pr108117.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target nonlocal_goto } */
+/* { dg-require-effective-target scheduling } */
 /* { dg-options "-O2 -fschedule-insns" } */
 
 #include 
-- 
2.30.2



[committed] testsuite: Tweak check_fork_available for CRIS

2023-03-10 Thread Hans-Peter Nilsson via Gcc-patches
This takes care of the failing gcc.dg/torture/ftrapv-1.c and
-ftrapv-2.c for cris-elf.

For simplicity, assume simulators are the GNU simulator (in the gdb
repo).  But cris-elf is newlib, so a newlib target forking?  Yes: the
I/O, etc. interface to the simulator uses the Linux/CRIS ABI.

* lib/target-supports.exp (check_fork_available): Don't signal
true for CRIS running on a simulator.
---
 gcc/testsuite/lib/target-supports.exp | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index a4fbc1998c70..335e24b23b12 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2880,6 +2880,12 @@ proc check_fork_available {} {
# tell as we're doing partial links for kernel modules.
return 0
  }
+if { [istarget cris-*-*] } {
+   # Compiling and linking works, and an executable running e.g.
+   # gcc.dg/torture/ftrapv-1.c works on now-historical hardware,
+   # but the GNU simulator emits an error for the fork syscall.
+   return [check_effective_target_hw]
+}
 return [check_function_available "fork"]
 }
 
-- 
2.30.2



[PATCH] update copyright year in libstdc++ manual

2023-03-10 Thread Jonny Grant
docs: update copyright year in libstdc++ manual

gcc/ChangeLog
* libstdc++-v3/doc/xml/faq.xml: update copyright year in libstdc++ 
manual

---
 libstdc++-v3/doc/xml/faq.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/xml/faq.xml b/libstdc++-v3/doc/xml/faq.xml
index 9ae4966ecea..42354f87af7 100644
--- a/libstdc++-v3/doc/xml/faq.xml
+++ b/libstdc++-v3/doc/xml/faq.xml
@@ -7,7 +7,7 @@
 
   
 
-  2008-2018
+  2008-2023
 
 
   http://www.w3.org/1999/xlink"; 
xlink:href="https://www.fsf.org";>FSF
-- 2.37.2



[PATCH 0/3] OpenMP 5.0: Strided updates and array shape-operator support (C++)

2023-03-10 Thread Julian Brown
This series implements support for the "array shape-operator" (OpenMP
5.0, 2.1.4) and strided accesses for update operations. It follows on
from the in-review series:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

and makes use of some of the infrastructure introduced in those patches.

Further comments on individual patches. Tested with offloading to NVPTX.

OK (for stage 1)?

Thanks,

Julian

Julian Brown (3):
  OpenMP: Fix "exit data" for array sections for ref-to-ptr components
  OpenMP: Allow complete replacement of clause during map/to/from
expansion
  OpenMP: Support strided and shaped-array updates for C++

 gcc/c-family/c-common.h   |  12 +-
 gcc/c-family/c-omp.cc | 277 ---
 gcc/c-family/c-pretty-print.cc|   5 +
 gcc/c/c-parser.cc |  32 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc |  58 ++-
 gcc/cp/cp-objcp-common.cc |   1 +
 gcc/cp/cp-tree.def|   1 +
 gcc/cp/cp-tree.h  |  13 +-
 gcc/cp/decl.cc|  75 +++
 gcc/cp/decl2.cc   |  19 +-
 gcc/cp/error.cc   |   5 +
 gcc/cp/mangle.cc  |   1 +
 gcc/cp/operators.def  |   1 +
 gcc/cp/parser.cc  | 303 ++-
 gcc/cp/parser.h   |   7 +
 gcc/cp/pt.cc  |  39 +-
 gcc/cp/semantics.cc   | 288 +--
 gcc/cp/typeck.cc  |  12 +-
 gcc/gimplify.cc   |  74 ++-
 gcc/omp-general.cc|  47 ++
 gcc/omp-general.h |   4 +-
 gcc/omp-low.cc| 403 ++-
 gcc/testsuite/g++.dg/gomp/array-shaping-1.C   |  22 +
 gcc/testsuite/g++.dg/gomp/array-shaping-2.C   | 134 +
 .../g++.dg/gomp/bad-array-shaping-1.C |  47 ++
 .../g++.dg/gomp/bad-array-shaping-2.C |  52 ++
 .../g++.dg/gomp/bad-array-shaping-3.C |  53 ++
 .../g++.dg/gomp/bad-array-shaping-4.C |  60 +++
 .../g++.dg/gomp/bad-array-shaping-5.C |  55 ++
 .../g++.dg/gomp/bad-array-shaping-6.C |  59 +++
 .../g++.dg/gomp/bad-array-shaping-7.C |  48 ++
 .../g++.dg/gomp/bad-array-shaping-8.C |  50 ++
 gcc/tree-pretty-print.cc  |  17 +
 gcc/tree.def  |   2 +-
 include/gomp-constants.h  |   7 +-
 libgomp/libgomp.h |  14 +
 libgomp/target.c  | 216 +---
 .../testsuite/libgomp.c++/array-shaping-1.C   | 469 ++
 .../testsuite/libgomp.c++/array-shaping-10.C  |  61 +++
 .../testsuite/libgomp.c++/array-shaping-11.C  |  63 +++
 .../testsuite/libgomp.c++/array-shaping-12.C  |  65 +++
 .../testsuite/libgomp.c++/array-shaping-13.C  |  89 
 .../testsuite/libgomp.c++/array-shaping-2.C   |  38 ++
 .../testsuite/libgomp.c++/array-shaping-3.C   |  38 ++
 .../testsuite/libgomp.c++/array-shaping-4.C   |  38 ++
 .../testsuite/libgomp.c++/array-shaping-5.C   |  38 ++
 .../testsuite/libgomp.c++/array-shaping-6.C   |  54 ++
 .../testsuite/libgomp.c++/array-shaping-7.C   |  54 ++
 .../testsuite/libgomp.c++/array-shaping-8.C   |  65 +++
 .../testsuite/libgomp.c++/array-shaping-9.C   |  95 
 51 files changed, 3421 insertions(+), 261 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/array-shaping-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/array-shaping-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-5.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-6.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-7.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/bad-array-shaping-8.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-10.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-11.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-12.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-13.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-3.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-4.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-5.C
 create mode 100644 libgomp/testsuite/libgomp.c++/array-shaping-6.C
 create mod

[PATCH 1/3] OpenMP: Fix "exit data" for array sections for ref-to-ptr components

2023-03-10 Thread Julian Brown
This patch fixes "exit data" for (C++) reference-to-pointer struct
components with array sections, such as:

  struct S { int *&ptr; [...] };
  ...
  #pragma omp target exit data map(from: str->ptr, str->ptr[0:n])

Such exits need two "detach" operations. We need to unmap
both the pointer and the slice. That idiom is recognized by
omp_resolve_clause_dependencies, but before omp_build_struct_sibling_lists
finishes, the resulting mapping nodes are represented like this:

  GOMP_MAP_FROM GOMP_MAP_DETACH GOMP_MAP_ATTACH_DETACH

And at the moment, that won't be recognized as a single mapping group
as it should be. This patch fixes that.

(This is covered by a test case added in later patches in this series,
e.g. libgomp/testsuite/libgomp.c++/array-shaping-8.C.)

2023-03-10  Julian Brown  

gcc/
* gimplify.cc (omp_get_attachment): Handle GOMP_MAP_DETACH here.
(omp_group_last): Handle *, GOMP_MAP_DETACH, GOMP_MAP_ATTACH_DETACH
groups for "exit data" of reference-to-pointer component array
sections.
(omp_group_base): Handle GOMP_MAP_DETACH.
---
 gcc/gimplify.cc | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f3c97932608a..ae2fbc65c690 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -9067,6 +9067,7 @@ omp_get_attachment (omp_mapping_group *grp)
 
  case GOMP_MAP_ATTACH_DETACH:
  case GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION:
+ case GOMP_MAP_DETACH:
return OMP_CLAUSE_DECL (node);
 
  default:
@@ -9143,23 +9144,43 @@ omp_group_last (tree *start_p)
 == GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION)
 || (OMP_CLAUSE_MAP_KIND (nc)
 == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
+|| OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_DETACH
 || OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_ALWAYS_POINTER
 || OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_TO_PSET))
{
- grp_last_p = &OMP_CLAUSE_CHAIN (c);
- c = nc;
  tree nc2 = OMP_CLAUSE_CHAIN (nc);
+ if (OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_DETACH)
+   {
+ /* In the specific case we're doing "exit data" on an array
+slice of a reference-to-pointer struct component, we will see
+DETACH followed by ATTACH_DETACH here.  We want to treat that
+as a single group. In other cases DETACH might represent a
+stand-alone "detach" clause, so we don't want to consider
+that part of the group.  */
+ if (nc2
+ && OMP_CLAUSE_CODE (nc2) == OMP_CLAUSE_MAP
+ && OMP_CLAUSE_MAP_KIND (nc2) == GOMP_MAP_ATTACH_DETACH)
+   goto consume_two_nodes;
+ else
+   break;
+   }
  if (nc2
  && OMP_CLAUSE_CODE (nc2) == OMP_CLAUSE_MAP
  && (OMP_CLAUSE_MAP_KIND (nc)
  == GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION)
  && OMP_CLAUSE_MAP_KIND (nc2) == GOMP_MAP_ATTACH)
{
+   consume_two_nodes:
  grp_last_p = &OMP_CLAUSE_CHAIN (nc);
  c = nc2;
- nc2 = OMP_CLAUSE_CHAIN (nc2);
+ nc = OMP_CLAUSE_CHAIN (nc2);
+   }
+ else
+   {
+ grp_last_p = &OMP_CLAUSE_CHAIN (c);
+ c = nc;
+ nc = nc2;
}
-  nc = nc2;
}
   break;
 
@@ -9305,6 +9326,7 @@ omp_group_base (omp_mapping_group *grp, unsigned int 
*chained,
  case GOMP_MAP_ALWAYS_POINTER:
  case GOMP_MAP_ATTACH_DETACH:
  case GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION:
+ case GOMP_MAP_DETACH:
return *grp->grp_start;
 
  default:
-- 
2.29.2



[PATCH 2/3] OpenMP: Allow complete replacement of clause during map/to/from expansion

2023-03-10 Thread Julian Brown
At present, map/to/from clauses on OpenMP "target" directives may be
expanded into several mapping nodes if they describe array sections with
pointer or reference bases, or similar.  This patch allows the original
clause to be replaced during that expansion, mostly by passing the list
pointer to the node to various functions rather than the node itself.

This is needed by the following patch. There shouldn't be any functional
changes introduced by this patch itself.
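
The general idea of passing a pointer to the list link rather than the
node can be sketched on a toy singly linked list.  This is only an
illustration: Node and replace_node are made-up names, and nothing here
uses GCC's tree or OMP_CLAUSE_CHAIN.  Given the address of the link, the
callee can swap in a different node and hand back the address of the next
link to process:

```cpp
#include <cassert>

// Toy singly linked list standing in for a clause chain; this is not
// GCC's tree type.
struct Node {
  int value;
  Node* next;
};

// Replace the node that *link points to with `replacement`, keeping the
// rest of the chain.  Because we receive the address of the link, the
// change is visible to whoever owns that link (the list head or the
// previous node's `next`).  Returns the address of the link that follows
// the new node, so the caller can continue from there.
Node** replace_node(Node** link, Node* replacement)
{
  Node* old = *link;
  replacement->next = old->next;
  *link = replacement;
  old->next = nullptr;
  return &replacement->next;
}
```

A function that only received the Node* itself could modify that node in
place, but could not make the list point at a different node.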

2023-03-10  Julian Brown  

gcc/c-family/
* c-common.h (expand_array_base, expand_component_selector,
expand_map_clause): Adjust member declarations.
* c-omp.cc (omp_expand_access_chain): Pass and return pointer to
clause.
(c_omp_address_inspector::expand_array_base): Likewise.
(c_omp_address_inspector::expand_component_selector): Likewise.
(c_omp_address_inspector::expand_map_clause): Likewise.

gcc/c/
* c-typeck.cc (handle_omp_array_sections): Pass pointer to clause to
process instead of clause.
(c_finish_omp_clauses): Update calls to handle_omp_array_sections.
Handle cases where initial clause might be replaced.

gcc/cp/
* semantics.cc (handle_omp_array_sections): Pass pointer to clause
instead of clause.  Add PNEXT return parameter for next clause in list
to process.
(finish_omp_clauses): Update calls to handle_omp_array_sections.
Handle cases where initial clause might be replaced.
---
 gcc/c-family/c-common.h | 12 +++
 gcc/c-family/c-omp.cc   | 75 +
 gcc/c/c-typeck.cc   | 32 +++---
 gcc/cp/semantics.cc | 37 +---
 4 files changed, 88 insertions(+), 68 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 3472f180543e..01ec9a739458 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1348,12 +1348,12 @@ public:
 
   bool maybe_zero_length_array_section (tree);
 
-  tree expand_array_base (tree, vec &, tree, unsigned *,
- c_omp_region_type, bool);
-  tree expand_component_selector (tree, vec &, tree,
- unsigned *, c_omp_region_type);
-  tree expand_map_clause (tree, tree, vec &,
- c_omp_region_type);
+  tree * expand_array_base (tree *, vec &, tree, unsigned *,
+   c_omp_region_type, bool);
+  tree * expand_component_selector (tree *, vec &, tree,
+   unsigned *, c_omp_region_type);
+  tree * expand_map_clause (tree *, tree, vec &,
+   c_omp_region_type);
 };
 
 enum c_omp_directive_kind {
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 32699adc664c..0de3d350d023 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -3332,11 +3332,12 @@ 
c_omp_address_inspector::maybe_zero_length_array_section (tree clause)
expression types here, because e.g. you can't have an array of
references.  See also gimplify.cc:omp_expand_access_chain.  */
 
-static tree
-omp_expand_access_chain (tree c, tree expr, vec &addr_tokens,
-unsigned *idx)
+static tree *
+omp_expand_access_chain (tree *pc, tree expr,
+vec &addr_tokens, unsigned *idx)
 {
   using namespace omp_addr_tokenizer;
+  tree c = *pc;
   location_t loc = OMP_CLAUSE_LOCATION (c);
   unsigned i = *idx;
   tree c2 = NULL_TREE;
@@ -3364,35 +3365,36 @@ omp_expand_access_chain (tree c, tree expr, 
vec &addr_tokens,
   break;
 
 default:
-  return error_mark_node;
+  return NULL;
 }
 
   if (c2)
 {
   OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
   OMP_CLAUSE_CHAIN (c) = c2;
-  c = c2;
+  pc = &OMP_CLAUSE_CHAIN (c);
 }
 
   *idx = ++i;
 
   if (i < addr_tokens.length ()
   && addr_tokens[i]->type == ACCESS_METHOD)
-return omp_expand_access_chain (c, expr, addr_tokens, idx);
+return omp_expand_access_chain (pc, expr, addr_tokens, idx);
 
-  return c;
+  return pc;
 }
 
 /* Translate "array_base_decl access_method" to OMP mapping clauses.  */
 
-tree
-c_omp_address_inspector::expand_array_base (tree c,
+tree *
+c_omp_address_inspector::expand_array_base (tree *pc,
vec &addr_tokens,
tree expr, unsigned *idx,
c_omp_region_type ort,
bool decl_p)
 {
   using namespace omp_addr_tokenizer;
+  tree c = *pc;
   location_t loc = OMP_CLAUSE_LOCATION (c);
   int i = *idx;
   tree decl = addr_tokens[i + 1]->expr;
@@ -3417,7 +3419,7 @@ c_omp_address_inspector::expand_array_base (tree c,
  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH))
 {
   *idx = ++i;
-  return c;
+  return pc;
 }
 
   switch (addr_tokens[i + 1]->u.access_kind)
@@ -3663,7 +3665,7 @@ c_omp_address_inspector:

Re: [PATCH v2 1/5] docs: Create Indices appendix

2023-03-10 Thread Sandra Loosemore via Gcc-patches

On 3/9/23 13:38, Arsen Arsenović wrote:


Found the change.  HTML got support for CONTENTS_OUTPUT_LOCATION,
which defaults to after_top, which ignores the inline location of
these elements.  Here's a patch:

maintainer-scripts/ChangeLog:

* update_web_docs_git: Set CONTENTS_OUTPUT_LOCATION=inline in
order to put @shortcontents above contents. See
9dd976a4-4e09-d901-b949-6d5037567...@codesourcery.com on
gcc-patches.


I don't think this is an adequate fix.  We mere mortals build the 
manuals with "make html" etc instead of the maintainer scripts for the 
web site, so we need a solution that we can put either in the Makefile 
or directly in the .texi files, that won't blow up for older versions of 
Texinfo that don't support this thing.


-Sandra


[COMMITTED/12] Fix PR 105532: match.pd patterns calling tree_nonzero_bits with vector types

2023-03-10 Thread Andrew Pinski via Gcc-patches
Even though this PR was reported with a ubsan issue, the problem is that
tree_nonzero_bits is being called with an expression of vector type.
This fixes three patterns I noticed that do that,
and adds a testcase for one of the patterns.

Committed after bootstrapping and testing on x86_64-linux-gnu with no 
regressions.

gcc/ChangeLog:

PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/vector-shift-1.c: New test.

(cherry picked from commit 193fccaa5c3525e979a989835c47c76d2c49d10c)
---
 gcc/match.pd  | 25 +++
 .../gcc.c-torture/compile/vector-shift-1.c|  8 ++
 2 files changed, 22 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ef352af1572..fc2833bbdca 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1268,7 +1268,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
/* For logical right shifts, this is possible only if @0 doesn't
   have MSB set and the logical right shift is changed into
   arithmetic shift.  */
-   (if (!wi::neg_p (tree_nonzero_bits (@0)))
+   (if (INTEGRAL_TYPE_P (type)
+&& !wi::neg_p (tree_nonzero_bits (@0)))
 (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
  (convert (rshift (bit_not! (convert:stype @0)) @1))
 #endif
@@ -7169,7 +7170,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
 (simplify
   (plus (POPCOUNT:s @0) (POPCOUNT:s @1))
-  (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
+  (if (INTEGRAL_TYPE_P (type)
+   && wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
 (POPCOUNT (bit_ior @0 @1
 
 /* popcount(X) == 0 is X == 0, and related (in)equalities.  */
@@ -7201,15 +7203,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for pfun (POPCOUNT PARITY)
   (simplify
 (pfun @0)
-(with { wide_int nz = tree_nonzero_bits (@0); }
-  (switch
-   (if (nz == 1)
- (convert @0))
-   (if (wi::popcount (nz) == 1)
- (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
-   (convert (rshift:utype (convert:utype @0)
-  { build_int_cst (integer_type_node,
-   wi::ctz (nz)); }
+(if (INTEGRAL_TYPE_P (type))
+ (with { wide_int nz = tree_nonzero_bits (@0); }
+   (switch
+(if (nz == 1)
+  (convert @0))
+(if (wi::popcount (nz) == 1)
+  (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+(convert (rshift:utype (convert:utype @0)
+   { build_int_cst (integer_type_node,
+wi::ctz (nz)); })
 
 #if GIMPLE
 /* 64- and 32-bits branchless implementations of popcount are detected:
diff --git a/gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c 
b/gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c
new file mode 100644
index 000..142ea56d5bb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c
@@ -0,0 +1,8 @@
+typedef unsigned char __attribute__((__vector_size__ (1))) U;
+
+U
+foo (U u)
+{
+  u = u == u;
+  return (~(u >> 255));
+}
-- 
2.17.1



[Committed] Docs: Update documentation of Texinfo versions for building manuals.

2023-03-10 Thread Sandra Loosemore
I've checked in the attached patch per discussion in another thread 
about possibly updating the minimum required Texinfo version.  This 
patch doesn't do that; it just recommends using a more recent version, 
removes redundant references to version 4.7, and fixes some related 
obsolete bits.


BTW the hardcopy manual being offered for sale in the FSF shop is for 
GCC 3.3 (2003?), so I felt no compunction about deleting the pointer to 
it as unhelpful.


-Sandra

commit c62df15d283f035d5b1644f74493db2933f2a8cb
Author: Sandra Loosemore 
Date:   Sat Mar 11 00:40:42 2023 +

Docs: Update documentation of Texinfo versions for building manuals.

There has been recent discussion on updating the minimum required
version of Texinfo from the current version 4.7.  This patch does not
do that, but it suggests that people use a more recent version to get
better output.  It also removes some other references to Texinfo 4.7
and fixes some related bit-rot in the installation manual.  (Nobody
really wants to print the GCC manual any more, and the GCC web site
is a better place to get prebuilt manuals than the FSF store.)

gcc/ChangeLog:
* doc/install.texi (Prerequisites): Suggest using newer versions
of Texinfo.
(Final install): Clean up and modernize discussion of how to
build or obtain the GCC manuals.
* doc/install.texi2html: Update comment to point to the PR instead
of "makeinfo 4.7 brokenness" (it's not specific to that version).

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index f549ba597cb..63fc949b447 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -543,13 +543,21 @@ They are included in releases.
 Necessary for running @command{makeinfo} when modifying @file{*.texi}
 files to test your changes.
 
-Necessary for running @command{make dvi} or @command{make pdf} to
-create printable documentation in DVI or PDF format.  Texinfo version
+Necessary for running @command{make dvi}, @command{make pdf},
+or @command{make html} to create formatted documentation.  Texinfo version
 4.8 or later is required for @command{make pdf}.
 
-Necessary to build GCC documentation during development because the
-generated output files are not included in the repository.  They are
-included in releases.
+Necessary to build GCC documentation in info format during development
+because the generated output files are not included in the repository.
+(They are included in release tarballs.)
+
+Note that the minimum requirement is for a very old version of
+Texinfo, but recent versions of Texinfo produce better-quality output,
+especially for HTML format.  The version of Texinfo packaged with any
+current operating system distribution is likely to be adequate for
+building the documentation without error, but you may still want to
+install a newer release to get the best appearance and usability of
+the generated manuals.
 
 @item @TeX{} (any working version)
 
@@ -3429,6 +3437,31 @@ You can install stripped programs and libraries with
 make install-strip
 @end smallexample
 
+By default, only the man pages and info-format GCC documentation
+are built and installed.  If you want to generate the GCC manuals in
+other formats, use commands like
+
+@smallexample
+make dvi
+make pdf
+make html
+@end smallexample
+
+@noindent
+to build the manuals in the corresponding formats, and
+
+@smallexample
+make install-dvi
+make install-pdf
+make install-html
+@end smallexample
+
+@noindent
+to install them.
+Alternatively, there are prebuilt online versions of the manuals for
+released versions of GCC on
+@uref{https://gcc.gnu.org/onlinedocs/,,the GCC web site}.
+
 If you are bootstrapping a released version of GCC then please
 quickly review the build status page for your release, available from
 @uref{https://gcc.gnu.org/buildstat.html}.
@@ -3494,22 +3527,6 @@ incomplete or out of date.  Send a note to
 If you find a bug, please report it following the
 @uref{../bugs/,,bug reporting guidelines}.
 
-If you want to print the GCC manuals, do @samp{cd @var{objdir}; make
-dvi}.  You will need to have @command{texi2dvi} (version at least 4.7)
-and @TeX{} installed.  This creates a number of @file{.dvi} files in
-subdirectories of @file{@var{objdir}}; these may be converted for
-printing with programs such as @command{dvips}.  Alternately, by using
-@samp{make pdf} in place of @samp{make dvi}, you can create documentation
-in the form of @file{.pdf} files; this requires @command{texi2pdf}, which
-is included with Texinfo version 4.8 and later.  You can also
-@uref{https://shop.fsf.org/,,buy printed manuals from the
-Free Software Foundation}, though such manuals may not be for the most
-recent version of GCC@.
-
-If you would like to generate online HTML documentation, do @samp{cd
-@var{objdir}; make html} and HTML will be generated for the gcc manuals in
-@file{@var{objdir}/gcc/HTML}.
-
 @html
 
 
diff --git a/gcc/do

Re: [PATCH v2 1/5] docs: Create Indices appendix

2023-03-10 Thread Arsen Arsenović via Gcc-patches

Sandra Loosemore  writes:

> On 3/9/23 13:38, Arsen Arsenović wrote:
>> Found the change.  HTML got support for CONTENTS_OUTPUT_LOCATION,
>> which defaults to after_top, which ignores the inline location of
>> these elements.  Here's a patch:
>> maintainer-scripts/ChangeLog:
>>  * update_web_docs_git: Set CONTENTS_OUTPUT_LOCATION=inline in
>>  order to put @shortcontents above contents. See
>>  9dd976a4-4e09-d901-b949-6d5037567...@codesourcery.com on
>>  gcc-patches.
>
> I don't think this is an adequate fix.  We mere mortals build the manuals with
> "make html" etc instead of the maintainer scripts for the web site, so we need
> a solution that we can put either in the Makefile or directly in the .texi
> files, that won't blow up for older versions of Texinfo that don't support
> this thing.

Hm, I've forgotten about that.  AFAICT, the only way to specify this
customization variable is through makeinfo flags.  It'd seem that
unrecognized variables produce a warning, though, so at least building
with older versions won't fail.

We could probably test for whether -c CONTENTS_OUTPUT_LOCATION produces
no warning, and if so, pass an extra flag in the makefile, or just
accept the warning on older versions (before 6.8).

Those, IIUC, should behave as if CONTENTS_OUTPUT_LOCATION is set to
inline, but I haven't tested that (it's getting quite late).

Also worth noting is that the contents come before the top node when set
up like this.  It might be nice to gate that behind @ifhtml or such.

Maybe we should also consider suggesting that texi2any places
@shortcontents first in after_top mode.  I can handle that if you think
that's reasonable.

I'll send the updated patch in the morning.

> -Sandra

Thanks, have a lovely night.
-- 
Arsen Arsenović




New German PO file for 'gcc' (version 13.1-b20230212)

2023-03-10 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-13.1-b20230212.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.