Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-05-29 Thread Jakub Jelinek via Gcc-patches
On Mon, May 29, 2023 at 07:17:42PM +0100, Roger Sayle wrote:
> The only change that was unanticipated was the tweak to ix86_match_ccmode.
> Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that

So another option would be to use CCZmode for the ptestz cases and keep
CCmode for ptestc, I think we don't have any modes which cover C and Z
flags but nothing else, and for the optimization we only need to find out
if it is CCZmode.
Though, I'm not that familiar with the CC mode details in the backend,
so I certainly need to defer this to Uros.

> CCZmode means just the Z flag, CCCmode means just the C flag, and
> CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> CCmode
> in the *_ptest pattern's predicate, and not attempt to
> re-use ix86_match_ccmode?
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Jakub



Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-30 Thread Jakub Jelinek via Gcc-patches
On Tue, May 30, 2023 at 10:03:05AM +0200, Eric Botcazou wrote:
> > We want to be able to treat such things as invariant somehow even if we
> > can't do that for references to user data that might be changed by
> > intervening code.
> > 
> > That is, indicate that we know that the _REF actually refers to a const
> > variable or is otherwise known to be unchanging.
> > 
> > Perhaps that should be a new flag that tree_invariant_p can check
> > instead of TREE_READONLY.
> 
> Richard earlier suggested a langhook; given that Ada will be the main (sole?) 
> user of it, this would probably be better.

Are the DECL_INVARIANT_P FIELD_DECLs in Ada really invariant no matter how
exactly they are accessed?  Or can Ada suffer from the same problem as
C/C++, where the FIELD_DECL is TREE_READONLY, but the object could go out of
scope or a pointer to it could change?
I mean the p->fld cases in C/C++, where there could be free (p); or p++
etc. in between the place where save_expr is first evaluated and later
uses?

Jakub



Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-30 Thread Jakub Jelinek via Gcc-patches
On Tue, May 30, 2023 at 04:36:34PM -0400, Jason Merrill wrote:
> Note that it is fine to treat p->fld as invariant in C++ if fld is
> TREE_READONLY and p is itself invariant.  The implementation is allowed to
> assume that other code didn't destroy *p and create a new object with a
> different value of p in the same location under
> https://eel.is/c++draft/basic.memobj#basic.life-8.3

You mean
#include 
struct S { const int fld; S (int x) : fld (x) {} };
int foo (S *p)
{
  int a[p->fld];
  delete p;
  return sizeof (a);
}
is invalid (sure, VLA is an extension)?

Jakub



Re: [committed] libstdc++: Add std::numeric_limits<__float128> specialization [PR104772]

2023-05-31 Thread Jakub Jelinek via Gcc-patches
On Wed, May 31, 2023 at 03:04:03PM +0100, Jonathan Wakely via Gcc-patches wrote:
> On Wed, 31 May 2023 at 13:23, Jonathan Wakely via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
> 
> > Tested powerpc64le-linux. Pushed to trunk.
> >
> > -- >8 --
> >
> > As suggested by Jakub in the PR, this just hardcodes the constants with
> > a Q suffix, since the properties of __float128 are not going to change.
> >
> > We can only define it for non-strict modes because the suffix gives an
> > error otherwise, even in system headers:
> >
> > limits:2085: error: unable to find numeric literal operator 'operator""Q'
> >
> > libstdc++-v3/ChangeLog:
> >
> > PR libstdc++/104772
> > * include/std/limits (numeric_limits<__float128>): Define.
> > * testsuite/18_support/numeric_limits/128bit.cc: New test.
> >
> 
> I should have tested this with clang before pushing:
> 
> /home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2125:
> 16: error: use of undeclared identifier '__builtin_huge_valq'
>  { return __builtin_huge_valq(); }
>   ^
> /home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2129:
> 16: error: use of undeclared identifier '__builtin_nanq'
>  { return __builtin_nanq(""); }
>   ^
> /home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2133:
> 16: error: use of undeclared identifier '__builtin_nansq'
>  { return __builtin_nansq(""); }
>   ^

See my comments in bugzilla on how to support this stuff even without
being able to use Q suffixes.
As for the builtins, there is no reason not to use __float128(__builtin_huge_val())
for the first case or __float128(__builtin_nan("")) for the second case.
I'm afraid there is nothing that can be done about signalling NaN
if neither __builtin_nansq nor __builtin_nansf128 is supported.

Jakub



Re: [PATCH] doc: Fix description of x86 -m32 option [PR109954]

2023-06-01 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 01, 2023 at 11:37:37AM +0100, Jonathan Wakely via Gcc-patches wrote:
> In https://gcc.gnu.org/PR109954 I suggested also adding:
> 
> "N.B., using @option{-march} might be required to produce code suitable
> for a specific CPU family, e.g., @option{-march=i486}."
> 
> I realise that that is true for all of -m32, -m64 and -mx32, and similar
> rules apply for other targets too. But I still feel that saying it
> explicitly for -m32 doesn't hurt, and would avoid a common
> misunderstanding by putting that info somewhere it's more likely to be
> read.
> 
> But I'd prefer to just fix the part that is *wrong*, and then we can
> discuss whether or not that other part is an improvement. This patch
> fixes the wrongness.
> 
> OK for trunk and release branches?

Ok, thanks.

> This option does not imply -march=i386 so it's incorrect to say it
> generates code that will run on "any i386 system".
> 
> gcc/ChangeLog:
> 
>   PR target/109954
>   * doc/invoke.texi (x86 Options): Fix description of -m32 option.
> ---
>  gcc/doc/invoke.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 898a88ce33e..ec71c2e9e0f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -34091,7 +34091,7 @@ on x86-64 processors in 64-bit environments.
>  Generate code for a 16-bit, 32-bit or 64-bit environment.
>  The @option{-m32} option sets @code{int}, @code{long}, and pointer types
>  to 32 bits, and
> -generates code that runs on any i386 system.
> +generates code that runs in 32-bit mode.
>  
>  The @option{-m64} option sets @code{int} to 32 bits and @code{long} and 
> pointer
>  types to 64 bits, and generates code for the x86-64 architecture.
> -- 
> 2.40.1

Jakub



Re: [PATCH][committed] btf: Fix -Wformat errors

2023-06-02 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 02, 2023 at 06:18:38PM +0200, Rainer Orth wrote:
> Hi Alex,
> 
> > g:7aae58b04b92303ccda3ead600be98f0d4b7f462 introduced -Wformat errors
> > breaking bootstrap on some targets. This patch fixes that.
> >
> > Committed as obvious.
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > * btfout.cc (btf_asm_type): Use PRIu64 instead of %lu for uint64_t.
> > (btf_asm_datasec_type): Likewise.
> 
> This is PR libstdc++/110077.  Btw., your fix is incomplete: it needs
> another change (%lu -> %zu) in btf_asm_func_type.

Can we rely on %zu working?  Sure, it is in C99 and so in C++11 as well,
but I don't see it being used inside of gcc/ at all and I'm not sure all
host C libraries support it.

Jakub



Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-06-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 01, 2023 at 05:27:56PM +0100, Andre Vieira (lists) via Gcc-patches 
wrote:
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_INTERNAL_FN_H
>  #define GCC_INTERNAL_FN_H
>  
> +#include "insn-codes.h"
> +#include "insn-opinit.h"

My i686-linux build configured with
../configure --enable-languages=default,obj-c++,lto,go,d,rust,m2 
--enable-checking=yes,rtl,extra --enable-libstdcxx-backtrace=yes
just died with
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
 from ../../gcc/m2/gm2-gcc/m2except.cc:22:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or 
directory
   24 | #include "insn-opinit.h"
  |  ^~~
compilation terminated.
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
 from ../../gcc/m2/m2pp.cc:23:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or 
directory
   24 | #include "insn-opinit.h"
  |  ^~~
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
 from ../../gcc/m2/gm2-gcc/rtegraph.cc:22:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or 
directory
   24 | #include "insn-opinit.h"
  |  ^~~
compilation terminated.
compilation terminated.
supposedly because of this change.

Do you really need those includes there?
If yes, what is supposed to ensure that the generated includes
are generated before compiling files which include those?

From what I can see, gcc/Makefile.in has
generated_files var which includes among other things insn-opinit.h,
and
# Dependency information.
  
# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)

rule, plus I see $(generated_files) mentioned in a couple of dependencies
in gcc/m2/Make-lang.in .  But supposedly because of this change it now
needs to be added to tons of other spots.

Jakub



[PATCH] modula2: Fix bootstrap

2023-06-06 Thread Jakub Jelinek via Gcc-patches
Hi!

internal-fn.h since yesterday includes insn-opinit.h, which is a generated
header.
One of my bootstraps today failed because some m2 sources started compiling
before insn-opinit.h has been generated.

Normally, gcc/Makefile.in has
# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)

rule which ensures that all the generated files are generated before
any $(ALL_HOST_OBJS) objects start, but use order-only dependency for
this because we don't want to rebuild most of the objects whenever one
generated header is regenerated.  After the initial build in an empty
directory we'll have .deps/ files contain the detailed dependencies.

$(ALL_HOST_OBJS) includes even some FE files; I think in the m2 case that
would be m2_OBJS, but m2/Make-lang.in doesn't define those.

The following patch just adds a similar rule to m2/Make-lang.in.
Another option would be to set m2_OBJS variable in m2/Make-lang.in to
something, but not really sure to which exactly and why it isn't
done.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-06  Jakub Jelinek  

* Make-lang.in: Build $(generated_files) before building
all $(GM2_C_OBJS).

--- gcc/m2/Make-lang.in.jj  2023-05-04 09:31:27.289948109 +0200
+++ gcc/m2/Make-lang.in 2023-06-06 21:38:26.655336041 +0200
@@ -511,6 +511,8 @@ GM2_LIBS_BOOT = m2/gm2-compiler-boot
 m2/gm2-libs-boot/libgm2.a \
 $(GM2-BOOT-O)
 
+$(GM2_C_OBJS) : | $(generated_files)
+
 cc1gm2$(exeext): m2/stage1/cc1gm2$(exeext) $(m2.prev)
cp -p $< $@
 


Jakub



[PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-06 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow, see
pr79173-{1,2,3,4,5,6}.c tests on what is matched.
Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
calls per limb (with just one for the least significant one), for
add with carry even when it is hand written in C (for subtraction
reassoc seems to change it too much so that the pattern recognition
doesn't work).  __builtin_{add,sub}_overflow are standardized in C23
under the ckd_{add,sub} names, so they are no longer a GNU-only extension.

Note, for these clang has (IMHO badly designed)
__builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
a single bit of carry, but basically add 3 unsigned values or
subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
because of that.  If we wanted to introduce those for clang compatibility,
we could and lower them early to just two __builtin_{add,sub}_overflow
calls and let the pattern matching in this patch recognize it later.

I've added expanders for this on ix86 and in addition to that
added various peephole2s to make sure we get nice (and small) code
for the common cases.  I think there are other PRs which request that
e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
also improves.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Would be nice if support for these optabs was added to many other targets,
arm/aarch64 and powerpc* certainly have such instructions, I'd expect
in fact that most targets do.

The _BitInt support I'm working on will also need this to emit reasonable
code.

2023-06-06  Jakub Jelinek  

PR middle-end/79173
* internal-fn.def (ADDC, SUBC): New internal functions.
* internal-fn.cc (expand_ADDC, expand_SUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_ADDC.
* optabs.def (addc5_optab, subc5_optab): New optabs.
* tree-ssa-math-opts.cc (match_addc_subc): New function.
(math_opts_dom_walker::after_dom_children): Call match_addc_subc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_ADDC and IFN_SUBC.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (addc5, subc5): Document new named
patterns.
* config/i386/i386.md (subborrow): Add alternative with
memory destination.
(addc5, subc5): New define_expand patterns.
(*sub_3, @add3_carry, addcarry, @sub3_carry,
subborrow, *add3_cc_overflow_1): Add define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.

* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.

--- gcc/internal-fn.def.jj  2023-06-05 10:38:06.670333685 +0200
+++ gcc/internal-fn.def 2023-06-05 11:40:50.672212265 +0200
@@ -381,6 +381,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (ADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (SUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj   2023-05-15 19:12:24.080780016 +0200
+++ gcc/internal-fn.cc  2023-06-06 09:38:46.333871169 +0200
@@ -2722,6 +2722,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
   expand_arith_overflow (MULT_EXPR, stmt);
 }
 
+/* Expand ADDC STMT.  */
+
+static void
+expand_ADDC (internal_fn ifn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg1 = gimple_call_arg (stmt, 0);
+  tree arg2 = gimple_call_arg (stmt, 1);
+  tree arg3 = gimple_call_arg (stmt, 2);
+  tree type = TREE_TYPE (arg1);
+  machine_mode mode = TYPE_MODE (type);
+  insn_code icode = optab_handler (ifn == IFN_ADDC
+  ? addc5_optab : subc5_optab, mode);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx target = expand_expr 

Re: [x86_64 PATCH] PR target/110104: Missing peephole2 for addcarry.

2023-06-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 06, 2023 at 11:31:42PM +0100, Roger Sayle wrote:
> 
> This patch resolves PR target/110104, a missed optimization on x86 around
> adc with memory operands.  In i386.md, there's a peephole2 after the
> pattern for *add3_cc_overflow_1 that converts the sequence
> reg = add(reg,mem); mem = reg [where the reg is dead afterwards] into
> the equivalent mem = add(mem,reg).  The equivalent peephole2 for adc
> is missing (after addcarry), and is added by this patch.
> 
> For the example code provided in the bugzilla PR:

Seems to be pretty much the same as one of the 12 define_peephole2
patterns I've posted in
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620821.html

> 2023-06-07  Roger Sayle  
> 
> gcc/ChangeLog
> PR target/110104
> * config/i386/i386.md (define_peephole2): Transform reg=adc(reg,mem)
> followed by mem=reg into mem=adc(mem,reg) when applicable.
> 
> gcc/testsuite/ChangeLog
> PR target/110104
> * gcc.target/i386/pr110104.c: New test case.

The testcase will be useful though (but I'd go with including
the intrin header and using the intrinsic rather than builtin).

Jakub



Re: [PATCH 1/2] Match: zero_one_valued_p should match 0 constants too

2023-06-07 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 06, 2023 at 05:17:05PM -0700, Andrew Pinski via Gcc-patches wrote:
> While working on `bool0 ? bool1 : bool2` I noticed that
> zero_one_valued_p does not match on the constant zero
> as in that case tree_nonzero_bits will return 0 and
> that is different from 1.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> gcc/ChangeLog:
> 
>   * match.pd (zero_one_valued_p): Match 0 integer constant
>   too.
> ---
>  gcc/match.pd | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f9cbd757752..f97ff7ef760 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1983,11 +1983,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(bit_not (bit_not @0))
>@0)
>  
> +/* zero_one_valued_p will match when a value is known to be either
> +   0 or 1 including the constant 0. */
>  (match zero_one_valued_p
>   @0
>   (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1)))

So perhaps instead change this to
&& wi::leu_p (tree_nonzero_bits (@0), 1)
?

>  (match zero_one_valued_p
>   truth_valued_p@0)
> +(match zero_one_valued_p
> + integer_zerop@0
> + (if (INTEGRAL_TYPE_P (type
>  
>  /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }.  */
>  (simplify

Jakub



[PATCH] i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]

2023-06-07 Thread Jakub Jelinek via Gcc-patches
Hi!

I'm getting
+FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
+FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
regressions on i686-linux since r14-1166.  The problem is when
ix86_expand_vector_init_general is called with mmx_ok = true and
mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
arguments (ok, fresh new tmp and vals) infinitely.
The following patch fixes that by passing mmx_ok to that recursive call.
For n_words == 4 it isn't needed, because we only care about mmx_ok for
V2SImode or V2SFmode and no other modes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-07  Jakub Jelinek  

PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
n_words == 2 recurse with mmx_ok as first argument rather than false.

--- gcc/config/i386/i386-expand.cc.jj   2023-06-03 15:32:04.489410367 +0200
+++ gcc/config/i386/i386-expand.cc  2023-06-07 10:31:34.715981752 +0200
@@ -16371,7 +16371,7 @@ quarter:
  machine_mode concat_mode = tmp_mode == DImode ? V2DImode : V2SImode;
  rtx tmp = gen_reg_rtx (concat_mode);
  vals = gen_rtx_PARALLEL (concat_mode, gen_rtvec_v (2, words));
- ix86_expand_vector_init_general (false, concat_mode, tmp, vals);
+ ix86_expand_vector_init_general (mmx_ok, concat_mode, tmp, vals);
  emit_move_insn (target, gen_lowpart (mode, tmp));
}
   else if (n_words == 4)

Jakub



[PATCH] optabs: Implement double-word ctz and ffs expansion

2023-06-07 Thread Jakub Jelinek via Gcc-patches
Hi!

We have expand_doubleword_clz for a couple of years, where we emit
double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
else return CLZ (high_word);
We can do something similar for CTZ and FFS IMHO, just with the 2
words swapped.  So if (low_word == 0) return CTZ (high_word) + word_size;
else return CTZ (low_word); for CTZ and
if (low_word == 0) return high_word ? FFS (high_word) + word_size : 0;
else return FFS (low_word); for FFS.

The following patch implements that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, on some targets which implement both word_mode ctz and ffs patterns,
it might be better to incrementally implement double-word ffs expansion
patterns in md files, because the generic expansion can't be optimized fully;
nothing can detect that we have just made sure the argument is not 0 and so
we don't need to bother handling that case.  So, on ia32 just using
CTZ patterns would be better there, but I think we can do even better and,
instead of doing the comparisons of the operands against 0, do the CTZ
expansion followed by testing of flags.

2023-06-07  Jakub Jelinek  

* optabs.cc (expand_ffs): Add forward declaration.
(expand_doubleword_clz): Rename to ...
(expand_doubleword_clz_ctz_ffs): ... this.  Add UNOPTAB argument,
handle also doubleword CTZ and FFS in addition to CLZ.
(expand_unop): Adjust caller.  Also call it for doubleword
ctz_optab and ffs_optab.

* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.

--- gcc/optabs.cc.jj2023-06-07 09:42:14.701130305 +0200
+++ gcc/optabs.cc   2023-06-07 14:35:04.909879272 +0200
@@ -2697,10 +2697,14 @@ expand_clrsb_using_clz (scalar_int_mode
   return temp;
 }
 
-/* Try calculating clz of a double-word quantity as two clz's of word-sized
-   quantities, choosing which based on whether the high word is nonzero.  */
+static rtx expand_ffs (scalar_int_mode, rtx, rtx);
+
+/* Try calculating clz, ctz or ffs of a double-word quantity as two clz, ctz or
+   ffs operations on word-sized quantities, choosing which based on whether the
+   high (for clz) or low (for ctz and ffs) word is nonzero.  */
 static rtx
-expand_doubleword_clz (scalar_int_mode mode, rtx op0, rtx target)
+expand_doubleword_clz_ctz_ffs (scalar_int_mode mode, rtx op0, rtx target,
+  optab unoptab)
 {
   rtx xop0 = force_reg (mode, op0);
   rtx subhi = gen_highpart (word_mode, xop0);
@@ -2709,6 +2713,7 @@ expand_doubleword_clz (scalar_int_mode m
   rtx_code_label *after_label = gen_label_rtx ();
   rtx_insn *seq;
   rtx temp, result;
+  int addend = 0;
 
   /* If we were not given a target, use a word_mode register, not a
  'mode' register.  The result will fit, and nobody is expecting
@@ -2721,6 +2726,9 @@ expand_doubleword_clz (scalar_int_mode m
  'target' to tag a REG_EQUAL note on.  */
   result = gen_reg_rtx (word_mode);
 
+  if (unoptab != clz_optab)
+std::swap (subhi, sublo);
+
   start_sequence ();
 
   /* If the high word is not equal to zero,
@@ -2728,7 +2736,13 @@ expand_doubleword_clz (scalar_int_mode m
   emit_cmp_and_jump_insns (subhi, CONST0_RTX (word_mode), EQ, 0,
   word_mode, true, hi0_label);
 
-  temp = expand_unop_direct (word_mode, clz_optab, subhi, result, true);
+  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
+temp = expand_unop_direct (word_mode, unoptab, subhi, result, true);
+  else
+{
+  gcc_assert (unoptab == ffs_optab);
+  temp = expand_ffs (word_mode, subhi, result);
+}
   if (!temp)
 goto fail;
 
@@ -2739,14 +2753,32 @@ expand_doubleword_clz (scalar_int_mode m
   emit_barrier ();
 
   /* Else clz of the full value is clz of the low word plus the number
- of bits in the high word.  */
+ of bits in the high word.  Similarly for ctz/ffs of the high word,
+ except that ffs should be 0 when both words are zero.  */
   emit_label (hi0_label);
 
-  temp = expand_unop_direct (word_mode, clz_optab, sublo, 0, true);
+  if (unoptab == ffs_optab)
+{
+  convert_move (result, const0_rtx, true);
+  emit_cmp_and_jump_insns (sublo, CONST0_RTX (word_mode), EQ, 0,
+  word_mode, true, after_label);
+}
+
+  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
+temp = expand_unop_direct (word_mode, unoptab, sublo, NULL_RTX, true);
+  else
+{
+  gcc_assert (unoptab == ffs_optab);
+  temp = expand_unop_direct (word_mode, ctz_optab, sublo, NULL_RTX, true);
+  addend = 1;
+}
+
   if (!temp)
 goto fail;
+
   temp = expand_binop (word_mode, add_optab, temp,
-  gen_int_mode (GET_MODE_BITSIZE (word_mode), word_mode),
+  gen_int_mode (GET_MODE_BITSIZE (word_mode) + addend,
+word_mode),
   result, true, OPTAB_DIRECT);
   if (!temp)
 goto fail;
@@ -2759,7 +2791,7

[PATCH] libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

2023-06-07 Thread Jakub Jelinek via Gcc-patches
Hi!

This test apparently contains 3 problematic floating point constants,
1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
when -fexcess-precision=standard evaluates double constants in the precision
of Intel extended 80-bit long double.
As written in the PR, e.g. the first one is
0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
in the precision of GCC's internal format, 80-bit long double has
63-bit precision, so the above constant rounded to long double is
0x1.7a2ecc414a03f800p+418L
(the least significant bit in the 0 before p isn't there already).
0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
0x1.7a2ecc414a040p+418.
Now, if excess precision doesn't happen and we round the GCC's internal
format number directly to double, it is
0x1.7a2ecc414a03fp+418 and that is the number the test expects.
One can see it on x86-64 (where excess precision to long double doesn't
happen) where double(1e126L) != 1e126.
The other two constants suffer from the same problem.

The following patch tweaks the testcase, such that those problematic
constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
guarantee the constants will be evaluated in double precision),
plus adds corresponding tests with hexadecimal constants which don't
suffer from this excess precision problem, they are exact in double
and long double can hold all double values.

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
tested on the latter with
make check RUNTESTFLAGS='--target_board=unix/-fexcess-precision=standard 
conformance.exp=to_chars/double.cc'
Ok for trunk?

2023-06-07  Jakub Jelinek  

PR libstdc++/110145
* testsuite/20_util/to_chars/double.cc: Include .
(double_to_chars_test_cases,
double_scientific_precision_to_chars_test_cases_2,
double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6
and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
Add unconditional tests with corresponding double constants
0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
0x1.7440bbff418b9p-18.

--- libstdc++-v3/testsuite/20_util/to_chars/double.cc.jj2022-11-03 
22:16:08.542329555 +0100
+++ libstdc++-v3/testsuite/20_util/to_chars/double.cc   2023-06-07 
15:41:44.275604870 +0200
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1968,9 +1969,19 @@ inline constexpr double_to_chars_testcas
 {1e125, chars_format::fixed,
 
"248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
 "5841365553228283904"},
+#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
+// When long double is Intel extended and double constants are evaluated 
in precision of
+// long double, this value is initialized to double(1e126L), which is 
0x1.7a2ecc414a040p+418 due to
+// double rounding of 0x1.7a2ecc414a03f7ff6p+418L first to 
0x1.7a2ecc414a03f800p+418L and
+// then to 0x1.7a2ecc414a040p+418, while when double constants are 
evaluated in precision of
+// IEEE double, this is 0x1.7a2ecc414a03fp+418 which the test expects.  
See PR110145.
 {1e126, chars_format::fixed,
 
"248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
 "58413655532282839040"},
+#endif
+{0x1.7a2ecc414a03fp+418, chars_format::fixed,
+   
"248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
+   "58413655532282839040"},
 {1e127, chars_format::fixed,
 
"549291066784979473595300225087383524118479625982517885450291174622154390152298057300868772"
 "377386949310916067328"},
@@ -2816,8 +2827,12 @@ inline constexpr double_to_chars_testcas
 {0x1.a6c767640cd71p+879, chars_format::scientific, 
"6.6564021122018745e+264"},
 
 // Incorrectly handled by dtoa_milo() (Grisu2), which doesn't achieve 
shortest round-trip.
+#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
 {4.91e-6, chars_format::scientific, "4.91e-06"},
 {5.547e-6, chars_format::scientific, "5.547e-06"},
+#endif
+{0x1.4981285e98e79p-18, chars_format::scientific, "4.91e-06"},
+{0x1.7440bbff418b9p-18, chars_format::scientific, "5.547e-06"},
 
 // Test hexfloat corner cases.
 {0x1.728p+0, chars_format::hex, "1.728p+0"}, // instead of "2.e5p-1"
@@ -5537,10 +5552,16 @@ inline constexpr double_to_chars_testcas
 "9."
 
"9992486776161899288204254467086983483846143922597222529419997579302660316349376281765375153005"
 "841365553228283904e+124"},
+#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
 {1e+126, chars_format::scientific, 124,
 "9."
 
"9992486776161899288204254467086983483846143922597222529419997579302660316349376281765375153005"
 "841365553228283904e+125"},
+#endif
+{0x1.7a2ecc414a03fp+418, chars_format:

Re: [committed] libstdc++: Fix code size regressions in std::vector [PR110060]

2023-06-08 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 08, 2023 at 10:05:43AM +0100, Jonathan Wakely via Gcc-patches wrote:
> > Looking at assembly, one of the differences I see is that the "after"
> > version has calls to realloc_insert(), while "before" version seems to have
> > them inlined [2].
> >
> > [1]
> > https://git.linaro.org/toolchain/ci/interesting-commits.git/tree/gcc/sha1/b7b255e77a271974479c34d1db3daafc04b920bc/tcwg_bmk-code_size-cpu2017fast/status.txt
> >
> >
> I find it annoying that adding `if (n < sz) __builtin_unreachable()` seems
> to affect the size estimates for the function, and so perturbs inlining
> decisions. That code shouldn't add any actual instructions, so shouldn't
> affect size estimates.
> 
> I mentioned this in a meeting last week and Jason suggested checking
> whether using __builtin_assume has the same undesirable consequences, so I

We don't support __builtin_assume (intentionally).  If you mean
[[assume(n >= sz)]], then because n >= sz doesn't have side-effects, it will
be lowered to exactly that if (n < sz) __builtin_unreachable(); - you can
look at -fdump-tree-all to confirm that.

I agree that the inliner should ignore if (comparison) __builtin_unreachable();
in its cost estimation.  And the inliner should ignore what we emit for
[[assume()]] if there are side-effects.  CCing Honza.

Jakub



[committed] fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]

2023-06-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The pr96024.f90 testcase ICEs on big-endian hosts.  The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead and on big-endian
we end up reading constant 0x1 rather than some small number
as on little-endian, and if the target doesn't have enough memory for 4 times
that (i.e. a 16GB allocation), it ICEs.

Fixed thusly, bootstrapped/regtested on
{x86_64,i686,powerpc64le,aarch64,s390x}-linux, preapproved in bugzilla
by Harald, committed to trunk and 13, 12, 11 and 10 release branches.

2023-06-09  Jakub Jelinek  

PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.

--- gcc/fortran/primary.cc.jj   2023-05-20 15:31:09.183661713 +0200
+++ gcc/fortran/primary.cc  2023-06-08 11:49:39.354875373 +0200
@@ -3188,10 +3188,11 @@ gfc_convert_to_structure_constructor (gf
goto cleanup;
 
   /* For a constant string constructor, make sure the length is
-correct; truncate of fill with blanks if needed.  */
+correct; truncate or fill with blanks if needed.  */
   if (this_comp->ts.type == BT_CHARACTER && !this_comp->attr.allocatable
  && this_comp->ts.u.cl && this_comp->ts.u.cl->length
  && this_comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
+ && this_comp->ts.u.cl->length->ts.type == BT_INTEGER
  && actual->expr->ts.type == BT_CHARACTER
  && actual->expr->expr_type == EXPR_CONSTANT)
{

Jakub



Re: [PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 09, 2023 at 11:02:48AM +0200, Richard Biener via Gcc-patches wrote:
> > Currently both gcc-13 and trunk are at the same library version,
> > libstdc++.so.6.0.32
> >
> > But with this addition to trunk we need to bump that .32 to .33, meaning
> > that gcc-13 and trunk diverge. If we want to backport any new symbols from
> > trunk to gcc-13 that gets trickier once they've diverged.
> 
> But if you backport any new used symbol you have to bump the version
> anyway.  So why not bump now (on trunk)?

We've already done that in 13.1.1.  So, before 13.2 is released, we can add
further symbols to the GLIBCXX_3.4.32 symbol version.
Though, I don't see a problem bumping libstdc++ to libstdc++.so.6.0.33
on the trunk now and put __cxa_call_terminate to GLIBCXX_3.4.33.
The ABI on the trunk is certainly not stable at this point.
If we come up with a need to introduce another symbol to 13.2, we can just
add it to GLIBCXX_3.4.32 on the trunk and then backport that change to the
branch.  If nothing in 13 will use the new symbol, seems like a waste to add
it to libstdc++.so.6.0.32.

> > If we added __cxa_call_terminate to gcc-13, making it another new addition
> > to libstdc++.so.6.0.32, then it would simplify a few things.
> >
> > In theory it could be a problem for distros already shipping gcc-13.1.1
> > with that new libstdc++.so.6.0.32 version, but since the
> > __cxa_call_terminate symbol won't actually be used by the gcc-13.1.1
> > compilers, I don't think it will be a problem.

Jakub



Re: [PATCH] MATCH: Fix zero_one_valued_p not to match signed 1 bit integers

2023-06-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 09, 2023 at 11:06:04AM +0200, Richard Biener via Gcc-patches wrote:
> On Fri, Jun 9, 2023 at 3:48 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > So for the attached testcase, we assumed that zero_one_valued_p would
> > be the value [0,1] but currently zero_one_valued_p matches also
> > signed 1 bit integers.
> > This changes that not to match that and fixes the 2 new testcases at
> > all optimization levels.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> OK.

Note, this means it won't return true for zero INTEGER_CSTs with a
signed 1-bit precision type.  Such a value is in the [0, 1] range.
Though, I guess signed 1-bit precision types are so rare and problematic
that it doesn't hurt not to optimize that case.
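To illustrate why matching signed 1-bit integers was wrong, here is a small C sketch (the struct and function names are made up):

```c
/* A signed 1-bit bit-field holds the values 0 and -1, not [0, 1], so
   treating it as a zero-or-one value is incorrect.  */
struct one_bit { signed int b : 1; };

int
signed_one_bit_value (void)
{
  struct one_bit x = { 0 };
  x.b = -1;   /* the only nonzero value representable in the field */
  return x.b; /* yields -1, not 1 */
}
```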

Jakub



Re: Ping: [PATCH] libatomic: x86_64: Always try ifunc

2023-06-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 09, 2023 at 08:37:20PM +0800, Xi Ruoyao wrote:
> Ping (in hopes that someone can review before the weekend).
> 
> On Sat, 2023-06-03 at 19:25 +0800, Xi Ruoyao wrote:
> > We used to skip ifunc check when CX16 is available.  But now we use
> > CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16
> > alone is not a sufficient reason not to use ifunc (see PR104688).
> > 
> > This causes a subtle and annoying issue: when GCC is built with a
> > higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and
> > the worst (locked) implementation of __atomic_load_16 is always used.
> > 
> > There seems no good way to check if the CPU is Intel or AMD from
> > the built-in macros (maybe we can check every known model like
> > __skylake,
> > __bdver2, ..., but it will be very error-prone and require an update
> > whenever we add support for a new x86 model).  The best thing we can
> > do seems to be "always try ifunc" here.
> > 
> > Bootstrapped and tested on x86_64-linux-gnu.  Ok for trunk?
> > 
> > libatomic/ChangeLog:
> > 
> > * configure.tgt: For x86_64, always set try_ifunc=yes.

Ok, thanks.

Jakub
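For context, the ifunc mechanism that try_ifunc enables can be sketched in C as follows. All names here are made up for illustration; a real resolver such as libatomic's would check CPU features at load time (e.g. via __builtin_cpu_supports) rather than return a fixed implementation:

```c
/* GNU indirect functions: the resolver runs once when the symbol is
   first bound and selects which implementation the call resolves to.
   Requires an ELF target such as x86_64-linux-gnu.  */
static int impl_generic (void) { return 42; }
static int impl_tuned (void) { return 42; }

static int (*resolve_answer (void)) (void)
{
  /* Pretend a runtime CPU check picked the tuned version.  */
  return impl_tuned;
}

int answer (void) __attribute__ ((ifunc ("resolve_answer")));
```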



[RFC] Add stdckdint.h header for C23

2023-06-10 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch is an attempt to implement the C23 stdckdint.h
header on top of our GNU extension - __builtin_{add,sub,mul}_overflow
builtins.

I have looked at gnulib's stdckdint.h and it is full of workarounds
for various compilers (EDG doesn't do this, clang <= 14 can't multiply
__int128, ...), so I think the header belongs in the compiler rather
than the C library, because it would be a nightmare to maintain it there.

What I'm struggling with is enforcing the weird restrictions
C23 imposes on these.

The builtins error on the result pointer not being writable, or
having boolean or enumeral type (the reason for disallowing bool
was that it would be questionable whether it should act as if
storing to an unsigned 1-bit precision type which would overflow
if result is not in [0,1] or whether it would never overflow
for bool * result and simply store false if the infinite precision
result is 0 and true otherwise, and for enums because of the
uncertainties on just the enumerators vs. range from smallest to
largest enumerator vs. strict enum precision with underlying type).
They do allow storing result in plain char.  And the source operands
can have any integral types, including plain char, including booleans
and including enumeral types.  The plan is to allow even _BitInt(N)
as both source and result later on.

Now, C23 says that suitable types for both type2/type3 and type1
are integral types other than plain char, bool, a bit-precise integer type,
or an enumerated type.

And it also says:
It is recommended to produce a diagnostic message if type2 or type3 are
not suitable integer types, or if *result is not a modifiable lvalue of
a suitable integer type.

I've tried to first check it with:
  static_assert (_Generic ((a), char: 0, const char: 0, volatile char: 0,
                           const volatile char: 0,
                           default: __builtin_classify_type (a) - 1 <= 1U),
                 "...")
but that only catches plain char and doesn't catch _Bool/bool and
doesn't catch enumerated types (note, for the *result we diagnose
it for the builtins, but not for the other args), because
__builtin_classify_type sadly promotes its argument.

The _Generic in the patch below is slightly better, it catches
also _Bool/bool, but doesn't catch enumerated types, comptypes
used by _Generic says enumeral type is compatible with the underlying
integer type.  But catching just plain char and bool would be
also doable with just _Generic listing the non-allowed types.

I think changing __builtin_classify_type behavior after 35 years
would be dangerous, shall we introduce a new similar builtin which
would just never promote the argument/perform array/function/enum
conversions on it, so that
__builtin_type_classify (true) == boolean_type_class
enum E { E1, E2 } e;
__builtin_type_classify (e) == enumeral_type_class
int a[2];
__builtin_type_classify (a) == array_type_class
etc.?
Seems clang changed __builtin_classify_type at some point
so that it e.g. returns enumeral_type_class for enum arguments
and array_type_class for arrays, but doesn't return boolean_type_class
for _Bool argument.
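The promotion problem being described can be demonstrated with current GCC. The function and enum names below are made up; the class values come from GCC's typeclass.h (integer_type_class == 1, pointer_type_class == 5):

```c
/* Because the argument is promoted as if passed to "...", enumerated
   types and _Bool are classified as int, and arrays decay to pointers,
   so enumeral_type_class, boolean_type_class and array_type_class are
   never returned for expression arguments.  */
enum colour { RED, GREEN };

int
promoted_classes (void)
{
  enum colour e = RED;
  _Bool b = 0;
  int a[2];
  return __builtin_classify_type (e) == 1   /* promoted to int */
         && __builtin_classify_type (b) == 1    /* promoted to int */
         && __builtin_classify_type (a) == 5;   /* decayed to pointer */
}
```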

Also, shall we introduce a __typeof_unqual__ keyword which could be used in
pre-C23 modes and perhaps C++?

2023-06-10  Jakub Jelinek  

* Makefile.in (USER_H): Add stdckdint.h.
* ginclude/stdckdint.h: New file.

* gcc.dg/stdckdint-1.c: New test.
* gcc.dg/stdckdint-2.c: New test.

--- gcc/Makefile.in.jj  2023-06-06 20:02:35.581211930 +0200
+++ gcc/Makefile.in 2023-06-10 10:17:05.062270115 +0200
@@ -466,6 +466,7 @@ USER_H = $(srcdir)/ginclude/float.h \
 $(srcdir)/ginclude/stdnoreturn.h \
 $(srcdir)/ginclude/stdalign.h \
 $(srcdir)/ginclude/stdatomic.h \
+$(srcdir)/ginclude/stdckdint.h \
 $(EXTRA_HEADERS)
 
 USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
--- gcc/ginclude/stdckdint.h.jj 2023-06-10 09:20:39.154053338 +0200
+++ gcc/ginclude/stdckdint.h2023-06-10 12:02:33.454947780 +0200
@@ -0,0 +1,78 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* ISO C23: 7.

Re: [RFC] Add stdckdint.h header for C23

2023-06-10 Thread Jakub Jelinek via Gcc-patches
On Sat, Jun 10, 2023 at 12:37:40PM +0200, Jakub Jelinek via Gcc-patches wrote:
> I think changing __builtin_classify_type behavior after 35 years
> would be dangerous, shall we introduce a new similar builtin which
> would just never promote the argument/perform array/function/enum
> conversions on it, so that
> __builtin_type_classify (true) == boolean_type_class
> enum E { E1, E2 } e;
> __builtin_type_classify (e) == enumeral_type_class
> int a[2];
> __builtin_type_classify (a) == array_type_class
> etc.?
> Seems clang changed __builtin_classify_type at some point
> so that it e.g. returns enumeral_type_class for enum arguments
> and array_type_class for arrays, but doesn't return boolean_type_class
> for _Bool argument.

Another option would be to just extend the current __builtin_classify_type
to be more sizeof-like, so that the argument could be either an expression
(with the current behavior) or a type, and so one could use
__builtin_classify_type (int)
or
__builtin_classify_type (0)
or
__builtin_classify_type (typeof (expr))
and the last way would ensure no argument promotion happens.

Jakub



Re: [PATCH][committed] Regenerate config.in

2023-06-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 12, 2023 at 11:20:00AM +0100, Tamar Christina via Gcc-patches wrote:
> Hi All,
> 
> Looks like I forgot to regenerate config.in which
> causes updates when you enable maintainer mode.
> 
> Bootstrapped aarch64-none-linux-gnu.
> 
> Committed under obvious rule.

Do you use the DEFAULT_MATCHPD_PARTITIONS macro anywhere?
If not, why the AC_DEFINE_UNQUOTED at all and not just the AC_SUBST?

Jakub



Re: [PATCH] Remove DEFAULT_MATCHPD_PARTITIONS macro

2023-06-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 12, 2023 at 01:53:26PM +0100, Tamar Christina wrote:
> gcc/ChangeLog:
> 
>   * config.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Remove DEFAULT_MATCHPD_PARTITIONS.

Ok, thanks.

Jakub



[PATCH] c: Add __typeof_unqual__ and __typeof_unqual support

2023-06-12 Thread Jakub Jelinek via Gcc-patches
Hi!

As I mentioned in my stdckdint.h mail, I think having __ prefixed
spellings of the typeof_unqual keyword which can be used in earlier
language modes can be useful, as not all code can be switched to C23
right away.

The following patch implements that.  It keeps the non-C23 behavior
for it for the _Noreturn functions to stay compatible with how
__typeof__ behaves.

I think we don't need it for C++, in C++ we have standard
traits to remove qualifiers etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-12  Jakub Jelinek  

gcc/
* doc/extend.texi (Typeof): Document typeof_unqual
and __typeof_unqual__.
gcc/c-family/
* c-common.cc (c_common_reswords): Add __typeof_unqual
and __typeof_unqual__ spellings of typeof_unqual.
gcc/c/
* c-parser.cc (c_parser_typeof_specifier): Handle
__typeof_unqual and __typeof_unqual__ as !is_std.
gcc/testsuite/
* gcc.dg/c11-typeof-2.c: New test.
* gcc.dg/c11-typeof-3.c: New test.
* gcc.dg/gnu11-typeof-3.c: New test.
* gcc.dg/gnu11-typeof-4.c: New test.

--- gcc/c-family/c-common.cc.jj 2023-05-20 15:31:09.070663296 +0200
+++ gcc/c-family/c-common.cc2023-06-10 19:20:11.106910976 +0200
@@ -420,6 +420,8 @@ const struct c_common_resword c_common_r
   { "__transaction_cancel", RID_TRANSACTION_CANCEL, 0 },
   { "__typeof",RID_TYPEOF, 0 },
   { "__typeof__",  RID_TYPEOF, 0 },
+  { "__typeof_unqual", RID_TYPEOF_UNQUAL, D_CONLY },
+  { "__typeof_unqual__", RID_TYPEOF_UNQUAL, D_CONLY },
   { "__volatile",  RID_VOLATILE,   0 },
   { "__volatile__",RID_VOLATILE,   0 },
   { "__GIMPLE",RID_GIMPLE, D_CONLY },
--- gcc/c/c-parser.cc.jj2023-06-06 20:02:35.587211846 +0200
+++ gcc/c/c-parser.cc   2023-06-10 19:22:15.577205685 +0200
@@ -4126,7 +4126,8 @@ c_parser_typeof_specifier (c_parser *par
 {
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_TYPEOF_UNQUAL));
   is_unqual = true;
-  is_std = true;
+  tree spelling = c_parser_peek_token (parser)->value;
+  is_std = strcmp (IDENTIFIER_POINTER (spelling), "typeof_unqual") == 0;
 }
   c_parser_consume_token (parser);
   c_inhibit_evaluation_warnings++;
--- gcc/doc/extend.texi.jj  2023-06-06 20:02:35.643211062 +0200
+++ gcc/doc/extend.texi 2023-06-10 19:58:26.197478291 +0200
@@ -843,6 +843,13 @@ Thus, @code{array (pointer (char), 4)} i
 pointers to @code{char}.
 @end itemize
 
+The ISO C2X operator @code{typeof_unqual} is available in ISO C2X mode
+and its result is the non-atomic unqualified version of what the
+@code{typeof} operator returns.  The alternate spelling
+@code{__typeof_unqual__} is available in all C modes and provides the
+non-atomic unqualified version of what the @code{__typeof__} operator
+returns.
+@xref{Alternate Keywords}.
+
 In GNU C, but not GNU C++, you may also declare the type of a variable
 as @code{__auto_type}.  In that case, the declaration must declare
 only one variable, whose declarator must just be an identifier, the
--- gcc/testsuite/gcc.dg/c11-typeof-2.c.jj  2023-06-10 19:27:38.675779747 +0200
+++ gcc/testsuite/gcc.dg/c11-typeof-2.c 2023-06-10 19:42:27.450606301 +0200
@@ -0,0 +1,177 @@
+/* Test GNU extensions __typeof__ and __typeof_unqual__.  Valid code.  */
+/* { dg-do run } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+int i;
+extern __typeof__ (i) i;
+extern __typeof (int) i;
+extern __typeof_unqual__ (i) i;
+extern __typeof_unqual (int) i;
+
+volatile int vi;
+extern __typeof__ (volatile int) vi;
+extern __typeof (vi) vi;
+
+extern __typeof_unqual__ (volatile int) i;
+extern __typeof_unqual__ (vi) i;
+extern __typeof__ ((const int) vi) i;
+extern __typeof ((volatile int) vi) i;
+
+const int ci;
+extern __typeof (const int) ci;
+extern __typeof (ci) ci;
+
+extern __typeof_unqual (const int) i;
+extern __typeof_unqual (ci) i;
+extern __typeof__ ((const int) ci) i;
+extern __typeof__ (+ci) i;
+extern __typeof (0, ci) i;
+extern __typeof__ (1 ? ci : ci) i;
+extern __typeof (0) i;
+
+const int fci (void);
+extern __typeof__ (fci ()) i;
+
+_Atomic int ai;
+extern __typeof (_Atomic int) ai;
+extern __typeof__ (_Atomic (int)) ai;
+extern __typeof (ai) ai;
+
+extern __typeof_unqual__ (_Atomic int) i;
+extern __typeof_unqual (_Atomic (int)) i;
+extern __typeof_unqual__ (ai) i;
+extern __typeof (+ai) i;
+extern __typeof__ ((_Atomic int) ai) i;
+extern __typeof__ (0, ai) i;
+extern __typeof (1 ? ai : ai) i;
+
+_Atomic int fai (void);
+extern __typeof__ (fai ()) i;
+
+_Atomic const volatile int acvi;
+extern __typeof (int volatile const _Atomic) acvi;
+extern __typeof (acvi) acvi;
+extern const _Atomic volatile __typeof (acvi) acvi;
+extern _Atomic volatile __typeof__ (ci) acvi;
+extern _Atomic const __typeof (vi) acvi;
+extern const __typeof__ (ai) volatile acvi;
+
+extern __typeof_unqual (acvi) i;
+extern __typeof_unqual__ (__typeof (acvi)) i;
+extern __typeof_unqual (_Atomic __typeof_unqual__ (acvi)) i;
+

[PATCH] c, c++: Accept __builtin_classify_type (typename)

2023-06-12 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in my stdckdint.h mail, __builtin_classify_type has
a problem that argument promotion (the argument is passed to a
...-prototyped builtin function) means that certain type classes will
simply never appear.
I think it is too late to change how it behaves, lots of code in the
wild might rely on the current behavior.

So, the following patch adds the option to use a typename rather than
an expression as the operand to the builtin, making it behave similarly
to sizeof, typeof or, say, the clang _Generic extension, where the
first argument can be not just an expression, but also a typename.

I think we have other prior art here, e.g. __builtin_va_arg also
expects typename.

I've added this to both C and C++, because it would be weird if it
supported it only in C and not in C++.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-12  Jakub Jelinek  

gcc/
* builtins.h (type_to_class): Declare.
* builtins.cc (type_to_class): No longer static.  Return
int rather than enum.
* doc/extend.texi (__builtin_classify_type): Document.
gcc/c/
* c-parser.cc (c_parser_postfix_expression_after_primary): Parse
__builtin_classify_type call with typename as argument.
gcc/cp/
* parser.cc (cp_parser_postfix_expression): Parse
__builtin_classify_type call with typename as argument.
* pt.cc (tsubst_copy_and_build): Handle __builtin_classify_type
with dependent typename as argument.
gcc/testsuite/
* c-c++-common/builtin-classify-type-1.c: New test.
* g++.dg/ext/builtin-classify-type-1.C: New test.
* g++.dg/ext/builtin-classify-type-2.C: New test.
* gcc.dg/builtin-classify-type-1.c: New test.

--- gcc/builtins.h.jj   2023-01-03 00:20:34.856089856 +0100
+++ gcc/builtins.h  2023-06-12 09:35:20.841902572 +0200
@@ -156,5 +156,6 @@ extern internal_fn associated_internal_f
 extern internal_fn replacement_internal_fn (gcall *);
 
 extern bool builtin_with_linkage_p (tree);
+extern int type_to_class (tree);
 
 #endif /* GCC_BUILTINS_H */
--- gcc/builtins.cc.jj  2023-05-20 15:31:09.03352 +0200
+++ gcc/builtins.cc 2023-06-12 09:35:31.709751296 +0200
@@ -113,7 +113,6 @@ static rtx expand_builtin_apply_args (vo
 static rtx expand_builtin_apply_args_1 (void);
 static rtx expand_builtin_apply (rtx, rtx, rtx);
 static void expand_builtin_return (rtx);
-static enum type_class type_to_class (tree);
 static rtx expand_builtin_classify_type (tree);
 static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
 static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
@@ -1852,7 +1851,7 @@ expand_builtin_return (rtx result)
 
 /* Used by expand_builtin_classify_type and fold_builtin_classify_type.  */
 
-static enum type_class
+int
 type_to_class (tree type)
 {
   switch (TREE_CODE (type))
--- gcc/doc/extend.texi.jj  2023-06-10 19:58:26.197478291 +0200
+++ gcc/doc/extend.texi 2023-06-12 18:06:24.629413024 +0200
@@ -14354,6 +14354,30 @@ need not be a constant.  @xref{Object Si
 description of the function.
 @enddefbuiltin
 
+@defbuiltin{int __builtin_classify_type (@var{arg})}
+@defbuiltinx{int __builtin_classify_type (@var{type})}
+The @code{__builtin_classify_type} built-in function returns a small integer
+with the category of the @var{arg} argument's type, like void type, integer
+type, enumeral type, boolean type, pointer type, reference type, offset
+type, real type, complex type, function type, method type, record type,
+union type, array type, string type, etc.  When the argument is an
+expression, for backwards compatibility reasons the argument is promoted
+like arguments passed to @code{...} in varargs functions, so some classes
+are never returned in certain languages.  Alternatively, the argument of
+the built-in function can be a typename, such as the @code{typeof} specifier.
+
+@smallexample
+int a[2];
+__builtin_classify_type (a) == __builtin_classify_type (int[5]);
+__builtin_classify_type (a) == __builtin_classify_type (void*);
+__builtin_classify_type (typeof (a)) == __builtin_classify_type (int[5]);
+@end smallexample
+
+The first comparison will never be true, as @var{a} is implicitly converted
+to a pointer.  The last two comparisons will be true, as they classify
+pointers in the second case and arrays in the last case.
+@enddefbuiltin
+
 @defbuiltin{double __builtin_huge_val (void)}
 Returns a positive infinity, if supported by the floating-point format,
 else @code{DBL_MAX}.  This function is suitable for implementing the
--- gcc/c/c-parser.cc.jj2023-06-10 19:22:15.577205685 +0200
+++ gcc/c/c-parser.cc   2023-06-12 17:32:31.007413019 +0200
@@ -11213,6 +11213,32 @@ c_parser_postfix_expression_after_primar
literal_zero_mask = 0;
if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
  exprlist = NULL;
+   else if (TREE_CODE (expr.value) == FUNCTION_DECL
+&& fndecl_built_in_p (expr.value, BUILT_IN_CLASSIFY_TYPE)
+&& c_

[PATCH] c: Add stdckdint.h header for C23

2023-06-12 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch is an updated version of the patch I've posted
on Saturday, on top of the 2 patches I've posted a few minutes ago.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2023-06-12  Jakub Jelinek  

* Makefile.in (USER_H): Add stdckdint.h.
* ginclude/stdckdint.h: New file.

* gcc.dg/stdckdint-1.c: New test.
* gcc.dg/stdckdint-2.c: New test.

--- gcc/Makefile.in.jj  2023-06-06 20:02:35.581211930 +0200
+++ gcc/Makefile.in 2023-06-10 10:17:05.062270115 +0200
@@ -466,6 +466,7 @@ USER_H = $(srcdir)/ginclude/float.h \
 $(srcdir)/ginclude/stdnoreturn.h \
 $(srcdir)/ginclude/stdalign.h \
 $(srcdir)/ginclude/stdatomic.h \
+$(srcdir)/ginclude/stdckdint.h \
 $(EXTRA_HEADERS)
 
 USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
--- gcc/ginclude/stdckdint.h.jj 2023-06-10 09:20:39.154053338 +0200
+++ gcc/ginclude/stdckdint.h2023-06-12 18:09:06.429184992 +0200
@@ -0,0 +1,58 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* ISO C23: 7.20 Checked Integer Arithmetic <stdckdint.h>.  */
+
+#ifndef _STDCKDINT_H
+#define _STDCKDINT_H
+
+#define __STDC_VERSION_STDCKDINT_H__ 202311L
+
+#if __STDC_VERSION__ >= 201112L
+# define __ckd_type_check_1(a, n) \
+  static_assert (_Generic (((__typeof_unqual__ (a)) 0),   \
+  char: 0,   \
+  default: __builtin_classify_type (__typeof__ (a))  \
+   == 1),\
+"types used in " n " should be integral other than plain "   \
+"char, bool, bit-precise integer or enumerated type")
+# define __ckd_type_check(r, a, b, n) \
+  (__extension__ ({ __ckd_type_check_1 ((r)[0], n);  \
+   __ckd_type_check_1 (a, n);\
+   __ckd_type_check_1 (b, n);\
+   (r); }))
+#else
+# define __ckd_type_check(r, a, b, n) r
+#endif
+
+#define ckd_add(r, a, b) \
+  ((_Bool) __builtin_add_overflow (a, b, \
+  __ckd_type_check (r, a, b, "ckd_add")))
+#define ckd_sub(r, a, b) \
+  ((_Bool) __builtin_sub_overflow (a, b, \
+  __ckd_type_check (r, a, b, "ckd_sub")))
+#define ckd_mul(r, a, b) \
+  ((_Bool) __builtin_mul_overflow (a, b, \
+  __ckd_type_check (r, a, b, "ckd_mul")))
+
+#endif /* stdckdint.h */
--- gcc/testsuite/gcc.dg/stdckdint-1.c.jj   2023-06-10 10:04:23.547792318 +0200
+++ gcc/testsuite/gcc.dg/stdckdint-1.c  2023-06-10 10:50:29.748579415 +0200
@@ -0,0 +1,61 @@
+/* Test C23 Checked Integer Arithmetic macros in <stdckdint.h>.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x" } */
+
+#include <stdckdint.h>
+
+#if __STDC_VERSION_STDCKDINT_H__ != 202311L
+# error __STDC_VERSION_STDCKDINT_H__ not defined to 202311L
+#endif
+
+extern void abort (void);
+
+int
+main ()
+{
+  unsigned int a;
+  if (ckd_add (&a, 1, 2) || a != 3)
+abort ();
+  if (ckd_add (&a, ~2U, 2) || a != ~0U)
+abort ();
+  if (!ckd_add (&a, ~2U, 4) || a != 1)
+abort ();
+  if (ckd_sub (&a, 42, 2) || a != 40)
+abort ();
+  if (!ckd_sub (&a, 11, ~0ULL) || a != 12)
+abort ();
+  if (ckd_mul (&a, 42, 16U) || a != 672)
+abort ();
+  if (ckd_mul (&a, ~0UL, 0) || a != 0)
+abort ();
+  if (ckd_mul (&a, 1, ~0U) || a != ~0U)
+abort ();
+  if (ckd_mul (&a, ~0UL, 1) != (~0UL > ~0U) || a != ~0U)
+abort ();
+  static_assert (_Generic (ckd_add (&a, 1, 1), bool: 1, default: 0));
+  static_assert (_Generic (ckd_sub (&a, 1, 1), bool: 1, default: 0));
+  static_assert (_Generic (ckd_mul (&a, 1, 1), bool: 1, default: 0));
+  signed char b;
+  if (ckd_add (&b, 8, 12) || b != 20)
+abort ();
+  if (ckd_sub (&b, 8UL, 12ULL) || b != -4)
+abort ();
+  if (ckd_mul (&b, 2, 3) || b != 6)
+abort ();
+  u

Re: [RFC] Add stdckdint.h header for C23

2023-06-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 12, 2023 at 09:51:02PM +, Joseph Myers wrote:
> On Sat, 10 Jun 2023, Jakub Jelinek via Gcc-patches wrote:
> 
> > I have looked at gnulib's stdckdint.h and it is full of workarounds
> > for various compilers (EDG doesn't do this, clang <= 14 can't multiply
> > __int128, ...), so I think the header belongs in the compiler rather
> > than the C library, because it would be a nightmare to maintain it there.
> 
> While C2x only has type-generic macros in this header, there's a proposal 
> N2868 (which didn't get consensus for C2x but may come back for a future 
> standard version) for additional interfaces for structure types with a 
> sticky overflow flag, including some functions that are expected to be 
> defined with external linkage as usual for library functions.  So if that 
> gets adopted in future, we'd need to arrange to provide those library 
> functions with external linkage - which is mostly not something we do in 
> GCC, although there are a few atomic_* functions in libatomic in addition 
> to the __atomic_* functions underlying type-generic macros.

There is always the possibility to have the header co-owned by both
the compiler and C library, limits.h style.
Just
#if __has_include_next (<stdckdint.h>)
# include_next <stdckdint.h>
#endif
perhaps guarded with some macro at the end of the GCC version and
do the same at the start of the glibc version again perhaps with some macro.
And leave the compiler-specific part to the compiler (perhaps with some
fallback in the libc version if the compiler-specific part is missing) and
have the library-related part be provided by the C library?

But if you want it solely in glibc, I can withdraw my patch (though, perhaps
the 2 preparation patches, __typeof_unqual__ and __builtin_classify_type
(typeof (...)) could be still of help for it).

> > What I'm struggling with is enforcing the weird restrictions
> > C23 imposes on these.
> 
> It's not clear all those restrictions need to be enforced - this 
> definitely seems like a case of undefined behavior to provide useful 
> extension space, where for various of those restrictions there are unique 
> sensible semantics to provide if the types in question are supported.

So why does C2X say
Recommended practice
It is recommended to produce a diagnostic message if type2 or type3 are
not suitable integer types, or if *result is not a modifiable lvalue of
a suitable integer type.
?
Or is it meant that a suitable integer type doesn't necessarily need to
be one that is listed in the previous paragraph?
Perhaps the checking could be guarded on #ifdef __STRICT_ANSI__, sure...

Jakub



Patch ping (Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173])

2023-06-13 Thread Jakub Jelinek via Gcc-patches
Hi!

On Tue, Jun 06, 2023 at 11:42:07PM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following patch introduces {add,sub}c5_optab and pattern recognizes
> various forms of add with carry and subtract with carry/borrow, see
> pr79173-{1,2,3,4,5,6}.c tests on what is matched.
> Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
> calls per limb (with just one for the least significant one), for
> add with carry even when it is hand written in C (for subtraction
> reassoc seems to change it too much so that the pattern recognition
> doesn't work).  __builtin_{add,sub}_overflow are standardized in C23
> under ckd_{add,sub} names, so it isn't any longer a GNU only extension.
> 
> Note, clang has for these has (IMHO badly designed)
> __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
> a single bit of carry, but basically add 3 unsigned values or
> subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
> because of that.  If we wanted to introduce those for clang compatibility,
> we could and lower them early to just two __builtin_{add,sub}_overflow
> calls and let the pattern matching in this patch recognize it later.
> 
> I've added expanders for this on ix86 and in addition to that
> added various peephole2s to make sure we get nice (and small) code
> for the common cases.  I think there are other PRs which request that
> e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
> also improves.

I'd like to ping this patch.

Thanks.

Jakub
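The suggested early lowering of a clang-style carry builtin to two __builtin_add_overflow calls can be sketched in C as follows. The function name addc_u32 is made up for illustration, and the carry-out is computed as the sum of the two partial carries, matching the 0/1/2 behavior described in the mail:

```c
/* Lower a three-input unsigned add with full-width carry-in to two
   __builtin_add_overflow calls, as suggested above.  */
static unsigned
addc_u32 (unsigned a, unsigned b, unsigned carry_in, unsigned *carry_out)
{
  unsigned t, r;
  unsigned c1 = __builtin_add_overflow (a, b, &t);
  unsigned c2 = __builtin_add_overflow (t, carry_in, &r);
  *carry_out = c1 + c2; /* can be 0, 1 or 2 because carry_in is full width */
  return r;
}
```

With carry_in restricted to 0 or 1 this reduces to the single-bit add-with-carry that the pattern recognition in the patch is meant to match.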



Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 13, 2023 at 08:40:36AM +, Richard Biener wrote:
> I suspect re-association can wreck things even more here.  I have
> to say the matching code is very hard to follow, not sure if
> splitting out a function matching
> 
>_22 = .{ADD,SUB}_OVERFLOW (_6, _5);
>_23 = REALPART_EXPR <_22>;
>_24 = IMAGPART_EXPR <_22>;
> 
> from _23 and _24 would help?

I've outlined 3 most often used sequences of statements or checks
into 3 helper functions, hope that helps.

> > +  while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> > +   {
> > + gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> > + if (has_single_use (rhs[0])
> > + && is_gimple_assign (g)
> > + && (gimple_assign_rhs_code (g) == code
> > + || (code == MINUS_EXPR
> > + && gimple_assign_rhs_code (g) == PLUS_EXPR
> > + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> > +   {
> > + rhs[0] = gimple_assign_rhs1 (g);
> > + tree &r = rhs[2] ? rhs[3] : rhs[2];
> > + r = gimple_assign_rhs2 (g);
> > + if (gimple_assign_rhs_code (g) != code)
> > +   r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
> 
> Can you use const_unop here?  In fact both will not reliably
> negate all constants (ick), so maybe we want a force_const_negate ()?

It is unsigned type NEGATE_EXPR of INTEGER_CST, so I think it should
work.  That said, changed it to const_unop and am just giving up on it
as if it wasn't a PLUS_EXPR with INTEGER_CST addend if const_unop doesn't
simplify.

> > + else if (addc_subc)
> > +   {
> > + if (!integer_zerop (arg2))
> > +   ;
> > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > + else if (integer_zerop (arg1))
> > +   result = arg0;
> > + /* x = 0 + y + 0; */
> > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > +   result = arg1;
> > + /* x = y - y - 0; */
> > + else if (subcode == MINUS_EXPR
> > +  && operand_equal_p (arg0, arg1, 0))
> > +   result = integer_zero_node;
> > +   }
> 
> So this all performs simplifications but also constant folding.  In
> particular the match.pd re-simplification will invoke fold_const_call
> on all-constant argument function calls but does not do extra folding
> on partially constant arg cases but instead relies on patterns here.
> 
> Can you add all-constant arg handling to fold_const_call and
> consider moving cases like y + 0 + 0 to match.pd?

The reason I've done this here is that this is the spot where all other
similar internal functions are handled, be it the ubsan ones
- IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
- IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
there 2 constant arguments as well as various patterns that can be
simplified and has code to clean it up later, build a COMPLEX_CST,
or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
elsewhere, we should do it for all of those functions, but then
probably incrementally.

> > +@cindex @code{addc@var{m}5} instruction pattern
> > +@item @samp{addc@var{m}5}
> > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> > +only values 0 or 1) together, sets operand 0 to the result of the
> > +addition of the 3 operands and sets operand 1 to 1 iff there was no
> > +overflow on the unsigned additions, and to 0 otherwise.  So, it is
> > +an addition with carry in (operand 4) and carry out (operand 1).
> > +All operands have the same mode.
> 
> operand 1 set to 1 for no overflow sounds weird when specifying it
> as carry out - can you double check?

Fixed.

> > +@cindex @code{subc@var{m}5} instruction pattern
> > +@item @samp{subc@var{m}5}
> > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> > +from operand 2 instead of adding them.  So, it is
> > +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> > +(operand 1).  All operands have the same mode.
> > +
> 
> I wonder if we want to name them uaddc and usubc?  Or is this supposed
> to be simply the twos-complement "carry"?  I think the docs should
> say so then (note we do have uaddv and addv).

Makes sense, I've actually renamed even the internal functions etc.

Here is only lightly tested patch with everything but gimple-fold.cc
changed.
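For reference, the shape of C source the uaddc/usubc recognizer is meant to catch (a minimal sketch of the PR79173 idiom, not a testcase from the patch) is a multi-word addition with a hand-written carry:

```c
#include <assert.h>

/* Double-word addition with a hand-computed carry.  The comparison
   lo < a[0] is 1 exactly when the low-limb addition wrapped, and the
   two additions feeding r[1] are what should combine into a single
   UADDC (and ultimately an adc instruction on x86).  */
static void
add2 (unsigned long long r[2],
      const unsigned long long a[2], const unsigned long long b[2])
{
  unsigned long long lo = a[0] + b[0];
  unsigned long long carry = lo < a[0];
  r[0] = lo;
  r[1] = a[1] + b[1] + carry;
}
```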

2023-06-13  Jakub Jelinek  

PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.

Re: [PATCH] modula2: Fix bootstrap

2023-06-13 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 07, 2023 at 09:42:22AM +0100, Andre Vieira (lists) wrote:
> I do need those includes and sorry I broke your bootstrap it didn't show up
> on my aarch64-unknown-linux-gnu bootstrap, I'm guessing the rules there were
> just run in a different order. Glad you were able to fix it :)

Unfortunately, it doesn't really work.
My x86_64-linux bootstrap today died again with:
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
 from ../../gcc/m2/gm2-lang.cc:24:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or 
directory
   24 | #include "insn-opinit.h"
  |  ^~~
compilation terminated.
/home/jakub/src/gcc/obj36/./prev-gcc/xg++ 
-B/home/jakub/src/gcc/obj36/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ 
-nostdinc++ 
-B/home/jakub/src/gcc/obj36/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-B/home/jakub/src/gcc/obj36/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
  
-I/home/jakub/src/gcc/obj36/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
  -I/home/jakub/src/gcc/obj36/prev-x86_64-pc-linux-gnu/libstdc++-v3/include  
-I/home/jakub/src/gcc/libstdc++-v3/libsupc++ 
-L/home/jakub/src/gcc/obj36/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-L/home/jakub/src/gcc/obj36/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
  -fno-PIE -c -g   -g -O2 -fchecking=1 -DIN_GCC -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual  -fno-common  -DHAVE_CONFIG_H \
 -I. -Im2/gm2-gcc -I../../gcc -I../../gcc/m2/gm2-gcc 
-I../../gcc/../include  -I../../gcc/../libcpp/include -I../../gcc/../libcody  
-I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber 
-I../../gcc/../libbacktrace   -I. -Im2/gm2-gcc -I../../gcc 
-I../../gcc/m2/gm2-gcc -I../../gcc/../include  -I../../gcc/../libcpp/include 
-I../../gcc/../libcody  -I../../gcc/../libdecnumber 
-I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace  
../../gcc/m2/gm2-gcc/m2type.cc -o m2/gm2-gcc/m2type.o
make[3]: *** [../../gcc/m2/Make-lang.in:570: m2/gm2-lang.o] Error 1
make[3]: *** Waiting for unfinished jobs
errors.  Dunno what is going on.
I've tried
--- gcc/m2/Make-lang.in.jj  2023-06-07 15:56:07.112684198 +0200
+++ gcc/m2/Make-lang.in 2023-06-13 16:08:55.409364765 +0200
@@ -511,7 +511,7 @@ GM2_LIBS_BOOT = m2/gm2-compiler-boot
 m2/gm2-libs-boot/libgm2.a \
 $(GM2-BOOT-O)
 
-$(GM2_C_OBJS) : | $(generated_files)
+m2_OBJS = $(GM2_C_OBJS)
 
 cc1gm2$(exeext): m2/stage1/cc1gm2$(exeext) $(m2.prev)
cp -p $< $@

but that doesn't really work either, this time not just random bootstrap
breakages from time to time, but all the time.
Including GM2_C_OBJS in m2_OBJS is I think the right thing, but that
results in predefining IN_GCC_FRONTEND macro and we have e.g.

/* Front ends should never have to include middle-end headers.  Enforce
   this by poisoning the header double-include protection defines.  */
#ifdef IN_GCC_FRONTEND
#pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
#endif

in system.h to make sure that FE sources don't include rtl.h, except.h,
expr.h.  But m2/gm2-gcc/gcc-consolidation.h includes tons of the RTL
headers, rtl.h, df.h (twice), except.h; why?
Also, seems one of GM2_C_OBJS is some special copy of stor-layout.cc
which really isn't a FE file and so needs the RTL-ish headers.

Jakub



Re: [RFC] Add stdckdint.h header for C23

2023-06-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 13, 2023 at 03:10:48PM +, Joseph Myers wrote:
> > So why does C2X say
> > Recommended practice
> > It is recommended to produce a diagnostic message if type2 or type3 are
> > not suitable integer types, or if *result is not a modifiable lvalue of
> > a suitable integer type.
> > ?
> > Or is it meant that a suitable integer type doesn't need to be necessarily
> > one that is listed in the previous paragraph?
> > Perhaps the checking could be guarded on #ifdef __STRICT_ANSI__, sure...
> 
> Diagnostics are better than doing something completely random - but making 
> it conditional when there are sensible semantics also makes sense.
> 
> It seems likely a future standard version will support these operations 
> for bit-precise types, at least.  (Bit-precise types are generally tricky 
> for type-generic operations; there's no standard way to select on them 
> with _Generic beyond listing individual types with specific widths, and no 
> standard way to determine the width of the bit-precise type of an 
> argument.  So implementing some type-generic operations for such types may 
> need new language extensions, prompting WG14 caution about requiring such 
> support - but this also makes support for such types in standard 
> type-generic macros in headers particularly valuable, precisely because 
> they can't be implemented purely in user code using standard language 
> features.)

Yeah, having say __builtin_{clz,ctz,ffs,popcount,parity} variants which would
be type-generic and would allow say any normal integral or _BitInt type
(or just unsigned versions thereof?) would be useful.  Even without _BitInt
we have the problem that we don't have builtins for __uint128_t.

One question is if we should keep them UB on zero input or hardcode some 
particular
behavior for clz/ctz.  The backend defaults might not be appropriate, I
think if we'd make it non-UB, using the precision of the type would be
reasonable, whatever it is (__builtin_clzb ((unsigned _BitInt(126)) 0)
might be 126 etc.).
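A user-level approximation of such an interface can be written today with _Generic (the macro name my_clz is invented here, and it only covers the three standard unsigned types, which illustrates exactly the _Generic limitation discussed above); the zero case is defined as the precision of the type:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical type-generic clz that is well defined at zero by
   returning the precision of the argument's type.  _Generic forces us
   to enumerate the supported types explicitly; x is evaluated more
   than once, which is fine for a sketch.  */
#define my_clz(x) \
  _Generic ((x), \
	    unsigned int: (x) ? __builtin_clz (x) \
			      : (int) (sizeof (unsigned int) * CHAR_BIT), \
	    unsigned long: (x) ? __builtin_clzl (x) \
			       : (int) (sizeof (unsigned long) * CHAR_BIT), \
	    unsigned long long: (x) ? __builtin_clzll (x) \
				    : (int) (sizeof (unsigned long long) * CHAR_BIT))
```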

Jakub



[PATCH] libcpp: Diagnose #include after failed __has_include [PR80753]

2023-06-13 Thread Jakub Jelinek via Gcc-patches
Hi!

As can be seen in the testcase, we don't diagnose #include/#include_next
of a non-existent header if __has_include/__has_include_next is done for
that header first.
The problem is that we normally error the first time some header is not
found, but in the _cpp_FFK_HAS_INCLUDE case obviously don't want to diagnose
it, just expand it to 0.  And libcpp caches both successful includes and
unsuccessful ones.

The following patch fixes that by remembering that we haven't diagnosed
error when using __has_include* on it, and diagnosing it when using the
cache entry in normal mode the first time.

I think _cpp_FFK_NORMAL is the only mode in which we normally diagnose
errors, for _cpp_FFK_PRE_INCLUDE that open_file_failed isn't reached
and for _cpp_FFK_FAKE neither.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
after a while for backports?

2023-06-13  Jakub Jelinek  

PR preprocessor/80753
libcpp/
* files.cc (struct _cpp_file): Add deferred_error bitfield.
(_cpp_find_file): When finding a file in cache with deferred_error
set in _cpp_FFK_NORMAL mode, call open_file_failed and clear the flag.
Set deferred_error in _cpp_FFK_HAS_INCLUDE mode if open_file_failed
hasn't been called.
gcc/testsuite/
* c-c++-common/missing-header-5.c: New test.

--- libcpp/files.cc.jj  2023-01-16 11:52:16.326730483 +0100
+++ libcpp/files.cc 2023-06-13 11:27:59.867465878 +0200
@@ -109,6 +109,10 @@ struct _cpp_file
   /* If this file is implicitly preincluded.  */
   bool implicit_preinclude : 1;
 
+  /* Set if a header wasn't found with __has_include or __has_include_next
+ and error should be emitted if it is included normally.  */
+  bool deferred_error : 1;
+
   /* > 0: Known C++ Module header unit, <0: known not.  ==0, unknown  */
   int header_unit : 2;
 };
@@ -523,7 +527,14 @@ _cpp_find_file (cpp_reader *pfile, const
   cpp_file_hash_entry *entry
 = search_cache ((struct cpp_file_hash_entry *) *hash_slot, start_dir);
   if (entry)
-return entry->u.file;
+{
+  if (entry->u.file->deferred_error && kind == _cpp_FFK_NORMAL)
+   {
+ open_file_failed (pfile, entry->u.file, angle_brackets, loc);
+ entry->u.file->deferred_error = false;
+   }
+  return entry->u.file;
+}
 
   _cpp_file *file = make_cpp_file (start_dir, fname);
   file->implicit_preinclude
@@ -589,6 +600,8 @@ _cpp_find_file (cpp_reader *pfile, const
 
if (kind != _cpp_FFK_HAS_INCLUDE)
  open_file_failed (pfile, file, angle_brackets, loc);
+   else
+ file->deferred_error = true;
break;
  }
 
--- gcc/testsuite/c-c++-common/missing-header-5.c.jj2023-06-13 
11:29:49.345931030 +0200
+++ gcc/testsuite/c-c++-common/missing-header-5.c   2023-06-13 
11:25:34.952497526 +0200
@@ -0,0 +1,15 @@
+/* PR preprocessor/80753 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+#if __has_include("nonexistent.h")
+# error
+#endif
+
+#include "nonexistent.h"
+
+/* { dg-message "nonexistent.h" "nonexistent.h" { target *-*-* } 0 } */
+/* { dg-message "terminated" "terminated" { target *-*-* } 0 } */
+
+/* This declaration should not receive any diagnostic.  */
+foo bar;

Jakub



[committed] i386: Fix up whitespace in assembly

2023-06-13 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed that standard_sse_constant_opcode emits some spurious
whitespace around tab, that isn't something which is done for
any other instruction and looks wrong.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-06-13  Jakub Jelinek  

* config/i386/i386.cc (standard_sse_constant_opcode): Remove
superfluous spaces around \t for vpcmpeqd.

--- gcc/config/i386/i386.cc.jj  2023-06-12 15:47:21.855511203 +0200
+++ gcc/config/i386/i386.cc 2023-06-13 14:33:49.940464312 +0200
@@ -5358,19 +5358,19 @@ standard_sse_constant_opcode (rtx_insn *
   if (GET_MODE_SIZE (mode) == 64)
{
  gcc_assert (TARGET_AVX512F);
- return "vpcmpeqd \t %t0, %t0, %t0";
+ return "vpcmpeqd\t%t0, %t0, %t0";
}
   else if (GET_MODE_SIZE (mode) == 32)
{
  gcc_assert (TARGET_AVX);
- return "vpcmpeqd \t %x0, %x0, %x0";
+ return "vpcmpeqd\t%x0, %x0, %x0";
}
   gcc_unreachable ();
 }
   else if (vector_all_ones_zero_extend_quarter_operand (x, mode))
 {
   gcc_assert (TARGET_AVX512F);
-  return "vpcmpeqd \t %x0, %x0, %x0";
+  return "vpcmpeqd\t%x0, %x0, %x0";
 }
 
   gcc_unreachable ();

Jakub



Re: [RFC] Add stdckdint.h header for C23

2023-06-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 13, 2023 at 07:54:25PM -0700, Paul Eggert wrote:
> I don't see how you could implement __has_include_next() for
> arbitrary non-GCC compilers, which is what we'd need for glibc users. For
> glibc internals we can use "#include_next" more readily, since we assume a
> new-enough GCC. I.e. we could do something like this:

Indeed, limits.h uses
#if defined __GNUC__ && !defined _GCC_LIMITS_H_
/* `_GCC_LIMITS_H_' is what GCC's file defines.  */
# include_next <limits.h>
#endif

I'd just do:
#if defined (__has_include_next) && !defined (ckd_add)
# if __has_include_next (<stdckdint.h>)
#  include_next <stdckdint.h>
# endif
#endif

and then deal with the fallback (if the C library really needs one,
does it need to support a C23 feature in arbitrary compiler which
doesn't support C23?).
A fallback should start with using __builtin_{add,sub,mul}_overflow
without extra checking if compiler supports it, with or without
use of __has_builtin.  GCC 6 and later has __builtin_{add,sub,mul}_overflow,
only GCC 10 has __has_builtin, I think clang has added those shortly
after they were added in GCC 6 (though, __builtin_{add,sub,mul}_overflow_p
which has been added in GCC 7 wasn't added to clang).
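A minimal sketch of that fallback (assuming only a compiler with the __builtin_{add,sub,mul}_overflow builtins; a real <stdckdint.h> would wrap this in the version checks just described):

```c
#include <assert.h>

/* Fallback ckd_* macros in terms of the GCC/clang builtins, as
   suggested above.  The _Bool cast matches the C23 requirement that
   the macros expand to an expression of type bool.  */
#ifndef ckd_add
# define ckd_add(r, a, b) ((_Bool) __builtin_add_overflow (a, b, r))
# define ckd_sub(r, a, b) ((_Bool) __builtin_sub_overflow (a, b, r))
# define ckd_mul(r, a, b) ((_Bool) __builtin_mul_overflow (a, b, r))
#endif
```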

Jakub



Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 13, 2023 at 01:29:04PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > +   else if (addc_subc)
> > > + {
> > > +   if (!integer_zerop (arg2))
> > > + ;
> > > +   /* x = y + 0 + 0; x = y - 0 - 0; */
> > > +   else if (integer_zerop (arg1))
> > > + result = arg0;
> > > +   /* x = 0 + y + 0; */
> > > +   else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > + result = arg1;
> > > +   /* x = y - y - 0; */
> > > +   else if (subcode == MINUS_EXPR
> > > +&& operand_equal_p (arg0, arg1, 0))
> > > + result = integer_zero_node;
> > > + }
> > 
> > So this all performs simplifications but also constant folding.  In
> > particular the match.pd re-simplification will invoke fold_const_call
> > on all-constant argument function calls but does not do extra folding
> > on partially constant arg cases but instead relies on patterns here.
> > 
> > Can you add all-constant arg handling to fold_const_call and
> > consider moving cases like y + 0 + 0 to match.pd?
> 
> The reason I've done this here is that this is the spot where all other
> similar internal functions are handled, be it the ubsan ones
> - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
> there 2 constant arguments as well as various patterns that can be
> simplified and has code to clean it up later, build a COMPLEX_CST,
> or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
> elsewhere, we should do it for all of those functions, but then
> probably incrementally.

The patch I've posted yesterday now fully tested on x86_64-linux and
i686-linux.

Here is an untested incremental patch to handle constant folding of these
in fold-const-call.cc rather than gimple-fold.cc.
Not really sure if that is the way to go because it is replacing 28
lines of former code with 65 of new code, for the overall benefit that say
int
foo (long long *p)
{
  int one = 1;
  long long max = __LONG_LONG_MAX__;
  return __builtin_add_overflow (one, max, p);
}
can be now fully folded already in ccp1 pass while before it was only
cleaned up in forwprop1 pass right after it.
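The same function can also be checked at runtime; this is just the example above made self-contained (using LLONG_MAX from <limits.h> rather than the GCC-specific __LONG_LONG_MAX__), not anything from the patch:

```c
#include <assert.h>
#include <limits.h>

/* 1 + LLONG_MAX overflows, so __builtin_add_overflow returns 1 and
   stores the wrapped two's-complement result, LLONG_MIN, in *p.  */
static int
foo (long long *p)
{
  int one = 1;
  long long max = LLONG_MAX;
  return __builtin_add_overflow (one, max, p);
}
```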

As for doing some stuff in match.pd, I'm afraid it would result in even more
significant growth, the advantage of gimple-fold.cc doing all of these in
one place is that the needed infrastructure can be shared.

--- gcc/gimple-fold.cc.jj   2023-06-14 12:21:38.657657759 +0200
+++ gcc/gimple-fold.cc  2023-06-14 12:52:04.335054958 +0200
@@ -5731,34 +5731,6 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
  else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- if (type
- && result == NULL_TREE
- && TREE_CODE (arg0) == INTEGER_CST
- && TREE_CODE (arg1) == INTEGER_CST
- && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
-   {
- if (cplx_result)
-   result = int_const_binop (subcode, fold_convert (type, arg0),
- fold_convert (type, arg1));
- else
-   result = int_const_binop (subcode, arg0, arg1);
- if (result && arith_overflowed_p (subcode, type, arg0, arg1))
-   {
- if (cplx_result)
-   overflow = build_one_cst (type);
- else
-   result = NULL_TREE;
-   }
- if (uaddc_usubc && result)
-   {
- tree r = int_const_binop (subcode, result,
-   fold_convert (type, arg2));
- if (r == NULL_TREE)
-   result = NULL_TREE;
- else if (arith_overflowed_p (subcode, type, result, arg2))
-   overflow = build_one_cst (type);
-   }
-   }
  if (result)
{
  if (result == integer_zero_node)
--- gcc/fold-const-call.cc.jj   2023-06-02 10:36:43.096967505 +0200
+++ gcc/fold-const-call.cc  2023-06-14 12:56:08.195631214 +0200
@@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
 {
   const char *p0, *p1;
   char c;
+  tree_code subcode;
   switch (fn)
 {
 case CFN_BUILT_IN_STRSPN:
@@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
 case CFN_FOLD_LEFT_PLUS:
   return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
 
+case CFN_UBSAN_CHECK_ADD:
+case CFN_ADD_OVERFLOW:
+  subcode = PLUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_SUB:
+case CFN_SUB_OVERFLOW:
+  subcode

[PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 12:25:46PM +, Richard Biener wrote:
> I think that's still very much desirable so this followup looks OK.
> Maybe you can re-base it as prerequesite though?

Rebased then (of course with the UADDC/USUBC handling removed from this
first patch, will be added in the second one).

Ok for trunk if it passes bootstrap/regtest?

2023-06-14  Jakub Jelinek  

* gimple-fold.cc (gimple_fold_call): Move handling of arg0
as well as arg1 INTEGER_CSTs for .UBSAN_CHECK_{ADD,SUB,MUL}
and .{ADD,SUB,MUL}_OVERFLOW calls from here...
* fold-const-call.cc (fold_const_call): ... here.

--- gcc/gimple-fold.cc.jj   2023-06-13 18:23:37.199793275 +0200
+++ gcc/gimple-fold.cc  2023-06-14 15:41:51.090987708 +0200
@@ -5702,22 +5702,6 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
  else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- else if (TREE_CODE (arg0) == INTEGER_CST
-  && TREE_CODE (arg1) == INTEGER_CST)
-   {
- if (cplx_result)
-   result = int_const_binop (subcode, fold_convert (type, arg0),
- fold_convert (type, arg1));
- else
-   result = int_const_binop (subcode, arg0, arg1);
- if (result && arith_overflowed_p (subcode, type, arg0, arg1))
-   {
- if (cplx_result)
-   overflow = build_one_cst (type);
- else
-   result = NULL_TREE;
-   }
-   }
  if (result)
{
  if (result == integer_zero_node)
--- gcc/fold-const-call.cc.jj   2023-06-02 10:36:43.096967505 +0200
+++ gcc/fold-const-call.cc  2023-06-14 15:40:34.388064498 +0200
@@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
 {
   const char *p0, *p1;
   char c;
+  tree_code subcode;
   switch (fn)
 {
 case CFN_BUILT_IN_STRSPN:
@@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
 case CFN_FOLD_LEFT_PLUS:
   return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
 
+case CFN_UBSAN_CHECK_ADD:
+case CFN_ADD_OVERFLOW:
+  subcode = PLUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_SUB:
+case CFN_SUB_OVERFLOW:
+  subcode = MINUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_MUL:
+case CFN_MUL_OVERFLOW:
+  subcode = MULT_EXPR;
+  goto arith_overflow;
+
+arith_overflow:
+  if (integer_cst_p (arg0) && integer_cst_p (arg1))
+   {
+ tree itype
+   = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
+ bool ovf = false;
+ tree r = int_const_binop (subcode, fold_convert (itype, arg0),
+   fold_convert (itype, arg1));
+ if (!r || TREE_CODE (r) != INTEGER_CST)
+   return NULL_TREE;
+ if (arith_overflowed_p (subcode, itype, arg0, arg1))
+   ovf = true;
+ if (TREE_OVERFLOW (r))
+   r = drop_tree_overflow (r);
+ if (itype == type)
+   {
+ if (ovf)
+   return NULL_TREE;
+ return r;
+   }
+ else
+   return build_complex (type, r, build_int_cst (itype, ovf));
+   }
+  return NULL_TREE;
+
 default:
   return fold_const_call_1 (fn, type, arg0, arg1);
 }


Jakub



[PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> At this point two pages of code without a comment - can you introduce
> some vertical spacing and comments as to what is matched now?  The
> split out functions help somewhat but the code is far from obvious :/
> 
> Maybe I'm confused by the loops and instead of those sth like
> 
>  if (match_x_y_z (op0)
>  || match_x_y_z (op1))
>...
> 
> would be easier to follow with the loop bodies split out?
> Maybe put just put them in lambdas even?
> 
> I guess you'll be around as long as myself so we can go with
> this code under the premise you're going to maintain it - it's
> not that I'm writing trivially to understand code myself ...

As I said on IRC, I don't really know how to split that into further
functions, the problem is that we need to pattern match a lot of
statements and it is hard to come up with names for each of them.
And we need quite a lot of variables for checking their interactions.

The code isn't that much different from say match_arith_overflow or
optimize_spaceship or other larger pattern recognizers.  And the
intent is that all the code paths in the recognizer are actually covered
by the testcases in the testsuite.

That said, I've added 18 new comments to the function, and rebased it
on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
patch with all constant arguments handling moved to fold-const-call.cc
even for the new ifns.

Ok for trunk like this if it passes bootstrap/regtest?

2023-06-13  Jakub Jelinek  

PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
patterns.
* config/i386/i386.md (subborrow<mode>): Add alternative with
memory destination.
(uaddc<mode>5, usubc<mode>5): New define_expand patterns.
(*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.

* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.

--- gcc/internal-fn.def.jj  2023-06-13 18:23:37.208793152 +0200
+++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
@@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj   2023-06-13 18:23:37.206793179 +0200
+++ gcc/internal-fn.cc  2023-06-14 12:21:38.652657829 +0200
@@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
   expand_arith_overflow (MULT_EXPR, stmt);
 }
 
+/* Expand UADDC STMT.  */
+
+static void
+expand_UADDC (internal_fn ifn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg1 = gimple_call_arg (stmt, 0);
+  tree arg2 = gimple_call_arg (stmt, 1);
+  tree arg3 = gimple_call_arg (stmt, 2);
+  tree type = TREE_TYPE (arg1);
+  machine_mode mode = TYPE_MODE (type);
+  insn_code icode = optab_handler (ifn == IFN_UADDC
+  ? uaddc5_optab : usubc5_optab, mode);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote:
> LGTM for the x86 part. I did my best, but those peephole2 patterns are
> real PITA to be reviewed thoroughly.
> 
> Maybe split out peephole2 pack to a separate patch, followed by a
> testcase patch. This way, bisection would be able to point out if a
> generic part or target-dependent part caused eventual regression.

Ok.  Guess if it helps for bisection, I could even split the peephole2s
to one peephole2 addition per commit and then the final patch would add the
expanders and the generic code.

Jakub



Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 04:45:48PM +0200, Uros Bizjak wrote:
+;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
> +;; peephole2s, to optimize away nop which resulted from uaddc/usubc
> +;; expansion optimization.
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +   (match_operand:SWI48 1 "memory_operand"))
> +   (const_int 0)]
> +  ""
> +  [(set (match_dup 0) (match_dup 1))])
> 
> Is this (const_int 0) from a recent patch from Roger that introduced:

The first one I see is the one immediately above that:
;; Pre-reload splitter to optimize
;; *setcc_qi followed by *addqi3_cconly_overflow_1 with the same QI
;; operand and no intervening flags modifications into nothing.
(define_insn_and_split "*setcc_qi_addqi3_cconly_overflow_1_<mode>"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC (neg:QI (geu:QI (reg:CC_CCC FLAGS_REG) (const_int 0)))
 (ltu:QI (reg:CC_CCC FLAGS_REG) (const_int 0))))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(const_int 0)])

And you're right, the following incremental patch (I'd integrate it
into the full patch with
(*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc,
*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>): Split
into NOTE_INSN_DELETED note rather than nop instruction.
added to ChangeLog) passes all the new tests as well:

--- gcc/config/i386/i386.md 2023-06-14 12:21:38.668657604 +0200
+++ gcc/config/i386/i386.md 2023-06-14 17:12:31.742625193 +0200
@@ -7990,16 +7990,6 @@
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
 
-;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
-;; peephole2s, to optimize away nop which resulted from uaddc/usubc
-;; expansion optimization.
-(define_peephole2
-  [(set (match_operand:SWI48 0 "general_reg_operand")
-   (match_operand:SWI48 1 "memory_operand"))
-   (const_int 0)]
-  ""
-  [(set (match_dup 0) (match_dup 1))])
-
 (define_peephole2
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
@@ -8641,7 +8631,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setccc"
@@ -8650,7 +8641,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
(define_insn_and_split "*setcc_qi_negqi_ccc_1_<mode>"
@@ -8659,7 +8651,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
(define_insn_and_split "*setcc_qi_negqi_ccc_2_<mode>"
@@ -8669,7 +8662,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Overflow setting add instructions
 

Jakub



Re: [WIP RFC] Add support for keyword-based attributes

2023-07-14 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via Gcc-patches 
wrote:
> Summary: We'd like to be able to specify some attributes using
> keywords, rather than the traditional __attribute__ or [[...]]
> syntax.  Would that be OK?

Will defer to C/C++ maintainers, but as you mentioned, there are many
attributes which really can't be ignored and change behavior significantly.
vector_size is one of those, mode attribute another,
no_unique_address another one (changes ABI in various cases),
the OpenMP attributes (omp::directive, omp::sequence) can change
behavior if -fopenmp, etc.
One can easily error with
#ifdef __has_cpp_attribute
#if !__has_cpp_attribute (arm::whatever)
#error arm::whatever attribute unsupported
#endif
#else
#error __has_cpp_attribute unsupported
#endif
Adding keywords instead of attributes seems to be too ugly to me.

Jakub



Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-19 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 19, 2023 at 01:58:21PM +0800, Hongtao Liu wrote:
> > LGTM, if you need someone to rubber-stamp the patch. I'm not really
> > versed in this part of the compiler, so please wait a day if someone
> > has anything to say about the patch.
> Thanks, pushed to trunk.

I see some regressions most likely with this change on i686-linux,
in particular:
+FAIL: gcc.dg/pr107547.c (test for excess errors)
+FAIL: gcc.dg/torture/floatn-convert.c   -O0  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O0  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O1  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O1  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O2  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O2  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O2 -flto  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O2 -flto  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O2 -flto -flto-partition=none  (test 
for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O2 -flto -flto-partition=none  
compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O3 -g  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O3 -g  compilation failed to 
produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -Os  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -Os  compilation failed to 
produce executable
+FAIL: gcc.target/i386/float16-7.c  (test for errors, line 7)

Perhaps we need to tweak
gcc/testsuite/lib/target-supports.exp (add_options_for_float16)
so that it adds -msse2 for i?86-*-* x86_64-*-* (that would likely
fix up floatn-convert) and for the others perhaps
/* { dg-add-options float16 } */
?

Jakub



[PATCH] wide-int: Fix up wi::divmod_internal [PR110731]

2023-07-19 Thread Jakub Jelinek via Gcc-patches
Hi!

As the following testcase shows, wi::divmod_internal doesn't handle
correctly signed division with precision > 64 when the dividend (and likely
divisor as well) is the type's minimum and the precision isn't divisible
by 64.

A few lines above what the patch hunk changes is:
  /* Make the divisor and dividend positive and remember what we
 did.  */
  if (sgn == SIGNED)
{
  if (wi::neg_p (dividend))
{
  neg_dividend = -dividend;
  dividend = neg_dividend;
  dividend_neg = true;
}
  if (wi::neg_p (divisor))
{
  neg_divisor = -divisor;
  divisor = neg_divisor;
  divisor_neg = true;
}
}
i.e. we negate negative dividend or divisor and remember those.
But, after we do that, when unpacking those values into b_dividend and
b_divisor we need to always treat the wide_ints as UNSIGNED,
because divmod_internal_2 performs an unsigned division only.
Now, if precision <= 64, we don't reach here at all, earlier code
handles it.  If dividend or divisor aren't the most negative values,
the negation clears their most significant bit, so it doesn't really
matter if we unpack SIGNED or UNSIGNED.  And if precision is multiple
of HOST_BITS_PER_WIDE_INT, there is no difference in behavior, while
-0x8000 negates to -0x8000, so the unpacking of it as SIGNED
or UNSIGNED works the same.
In the testcase, we have signed precision 119 and the dividend is
val = { 0, 0xffc0 }, len = 2, precision = 119
both before and after negation.
Divisor is
val = { 2 }, len = 1, precision = 119
But we really want to divide 0x40 by 2
unsigned and then negate at the end.
If it is unsigned precision 119 division
0x40 by 2
dividend is
val = { 0, 0xffc0 }, len = 2, precision = 119
but as we unpack it UNSIGNED, it is unpacked into
0, 0, 0, 0x0040

The following patch fixes it by always using UNSIGNED unpacking
because we've already negated negative values at that point if
sgn == SIGNED and so most negative constants should be treated as
positive.
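
Outside GCC, the effect can be modeled with a rough Python sketch (an
illustration of the unpacking described above, not the wide-int code itself):
after negation the most negative prec-bit value keeps the same bit pattern,
and only a zero-extending (UNSIGNED) interpretation of that pattern feeds the
unsigned division correctly.

```python
def unpack(val, prec, nbits, signed):
    # Model of wi_unpack: extend a prec-bit pattern into an
    # nbits-wide buffer, sign- or zero-extending the top bits.
    v = val & ((1 << prec) - 1)
    if signed and (v >> (prec - 1)) & 1:
        v -= 1 << prec                  # sign-extend
    return v & ((1 << nbits) - 1)

prec = 119
pattern = 1 << (prec - 1)               # bit pattern of the negated minimum
as_unsigned = unpack(pattern, prec, 256, signed=False)
as_signed = unpack(pattern, prec, 256, signed=True)
assert as_unsigned == 1 << 118          # what the unsigned division needs
assert as_signed != as_unsigned         # SIGNED unpack smears the sign bit
assert as_unsigned // 2 == 1 << 117     # negated afterwards: correct quotient
```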

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 13.2
and later for older branches?

2023-07-19  Jakub Jelinek  

PR tree-optimization/110731
* wide-int.cc (wi::divmod_internal): Always unpack dividend and
divisor as UNSIGNED regardless of sgn.

* gcc.dg/pr110731.c: New test.

--- gcc/wide-int.cc.jj  2023-06-12 15:47:22.461502821 +0200
+++ gcc/wide-int.cc 2023-07-19 09:52:40.241661869 +0200
@@ -1911,9 +1911,9 @@ wi::divmod_internal (HOST_WIDE_INT *quot
 }
 
   wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
-dividend_blocks_needed, dividend_prec, sgn);
+dividend_blocks_needed, dividend_prec, UNSIGNED);
   wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (),
-divisor_blocks_needed, divisor_prec, sgn);
+divisor_blocks_needed, divisor_prec, UNSIGNED);
 
   m = dividend_blocks_needed;
   b_dividend[m] = 0;
--- gcc/testsuite/gcc.dg/pr110731.c.jj  2023-07-19 10:03:03.707986705 +0200
+++ gcc/testsuite/gcc.dg/pr110731.c 2023-07-19 10:04:34.857716862 +0200
@@ -0,0 +1,17 @@
+/* PR tree-optimization/110731 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128
+foo (void)
+{
+  struct S { __int128 f : 119; } s = { ((__int128) -18014398509481984) << 64 };
+  return s.f / 2;
+}
+
+int
+main ()
+{
+  if (foo () != (((__int128) -9007199254740992) << 64))
+__builtin_abort ();
+}

Jakub



[committed] tree-switch-conversion: Fix a comment typo

2023-07-19 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed a comment typo, this patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-07-19  Jakub Jelinek  

* tree-switch-conversion.h (class bit_test_cluster): Fix comment typo.

--- gcc/tree-switch-conversion.h.jj 2023-02-17 12:45:08.218636117 +0100
+++ gcc/tree-switch-conversion.h2023-07-18 10:10:21.398933379 +0200
@@ -303,7 +303,7 @@ public:
 /* A GIMPLE switch statement can be expanded to a short sequence of bit-wise
 comparisons.  "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)"
 where CST and MINVAL are integer constants.  This is better than a series
-of compare-and-banch insns in some cases,  e.g. we can implement:
+of compare-and-branch insns in some cases,  e.g. we can implement:
 
if ((x==4) || (x==6) || (x==9) || (x==11))
 

Jakub



Re: [PATCH] middle-end/61747 - conditional move expansion and constants

2023-07-19 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 18, 2023 at 01:25:45PM +0200, Richard Biener wrote:
> 
>   PR middle-end/61747
>   * internal-fn.cc (expand_vec_cond_optab_fn): When the
>   value operands are equal to the original comparison operands
>   preserve that equality by re-using the comparison expansion.
>   * optabs.cc (emit_conditional_move): When the value operands
>   are equal to the comparison operands and would be forced to
>   a register by prepare_cmp_insn do so earlier, preserving the
>   equality.
> 
>   * g++.target/i386/pr61747.C: New testcase.
> ---
>  gcc/internal-fn.cc  | 17 --
>  gcc/optabs.cc   | 32 ++-
>  gcc/testsuite/g++.target/i386/pr61747.C | 42 +
>  3 files changed, 88 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr61747.C
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e698f0bffc7..c83c3921792 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3019,8 +3019,21 @@ expand_vec_cond_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>icode = convert_optab_handler (optab, mode, cmp_op_mode);
>rtx comparison
>  = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 4);
> -  rtx rtx_op1 = expand_normal (op1);
> -  rtx rtx_op2 = expand_normal (op2);
> +  /* vector_compare_rtx legitimizes operands, preserve equality when
> + expanding op1/op2.  */
> +  rtx rtx_op1, rtx_op2;
> +  if (operand_equal_p (op1, op0a))
> +rtx_op1 = XEXP (comparison, 0);
> +  else if (operand_equal_p (op1, op0b))
> +rtx_op1 = XEXP (comparison, 1);
> +  else
> +rtx_op1 = expand_normal (op1);
> +  if (operand_equal_p (op2, op0a))
> +rtx_op2 = XEXP (comparison, 0);
> +  else if (operand_equal_p (op2, op0b))
> +rtx_op2 = XEXP (comparison, 1);
> +  else
> +rtx_op2 = expand_normal (op2);
>  
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>create_output_operand (&ops[0], target, mode);

The above LGTM, it relies on vector_compare_rtx not swapping the arguments
or performing some other comparison canonicalization, but at least right now
that function doesn't seem to do that.

> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5119,13 +5119,43 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
> last = get_last_insn ();
> do_pending_stack_adjust ();
> machine_mode cmpmode = comp.mode;
> +   rtx orig_op0 = XEXP (comparison, 0);
> +   rtx orig_op1 = XEXP (comparison, 1);
> +   rtx op2p = op2;
> +   rtx op3p = op3;
> +   /* If we are optimizing, force expensive constants into a register
> +  but preserve an eventual equality with op2/op3.  */
> +   if (CONSTANT_P (orig_op0) && optimize
> +   && (rtx_cost (orig_op0, mode, COMPARE, 0,
> + optimize_insn_for_speed_p ())
> +   > COSTS_N_INSNS (1))
> +   && can_create_pseudo_p ())
> + {
> +   XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> +   if (rtx_equal_p (orig_op0, op2))
> + op2p = XEXP (comparison, 0);
> +   if (rtx_equal_p (orig_op0, op3))
> + op3p = XEXP (comparison, 0);
> + }
> +   if (CONSTANT_P (orig_op1) && optimize
> +   && (rtx_cost (orig_op1, mode, COMPARE, 0,
> + optimize_insn_for_speed_p ())
> +   > COSTS_N_INSNS (1))
> +   && can_create_pseudo_p ())
> + {
> +   XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
> +   if (rtx_equal_p (orig_op1, op2))
> + op2p = XEXP (comparison, 1);
> +   if (rtx_equal_p (orig_op1, op3))
> + op3p = XEXP (comparison, 1);
> + }

I'm worried here, because prepare_cmp_insn before doing almost identical
forcing to reg does
  if (CONST_SCALAR_INT_P (y))
canonicalize_comparison (mode, &comparison, &y);
which the above change will make not happen anymore (for the more expensive
constants).
If we have a match between at least one of the comparison operands and
op2/op3, I think having equivalency there is perhaps more important than
the canonicalization, but it would be nice not to break it even if there
is no match.  So, perhaps force_reg only if there is a match?
force_reg (cmpmode, force_reg (cmpmode, x)) is equivalent to
force_reg (cmpmode, x), so perhaps:
{
  if (rtx_equal_p (orig_op0, op2))
op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
  if (rtx_equal_p (orig_op0, op3))
op3p = XEXP (comparison, 0)
  = force_reg (cmpmode, XEXP (comparison, 0));
}
and similarly for the other body?

Jakub



Re: [PATCH] middle-end/61747 - conditional move expansion and constants

2023-07-19 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 19, 2023 at 01:36:23PM +, Richard Biener wrote:
> > If we have a match between at least one of the comparison operands and
> > op2/op3, I think having equivalency there is perhaps more important than
> > the canonicalization, but it would be nice not to break it even if there
> > is no match.  So, perhaps force_reg only if there is a match?
> > force_reg (cmpmode, force_reg (cmpmode, x)) is equivalent to
> > force_reg (cmpmode, x), so perhaps:
> > {
> >   if (rtx_equal_p (orig_op0, op2))
> > op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> >   if (rtx_equal_p (orig_op0, op3))
> > op3p = XEXP (comparison, 0)
> >   = force_reg (cmpmode, XEXP (comparison, 0));
> > }
> > and similarly for the other body?
> 
> I don't think we'll have op3 == op2 == orig_op0 because if
> op2 == op3 the 
> 
>   /* If the two source operands are identical, that's just a move.  */
> 
>   if (rtx_equal_p (op2, op3))
> {
>   if (!target)
> target = gen_reg_rtx (mode);
> 
>   emit_move_insn (target, op3);
>   return target;
> 
> code should have triggered.  So we should know we invoke force_reg
> only once for each comparison operand check?
> 
> So I'm going to test the following ontop of the patch.

Please use else if instead of the second if then.
Ok with that change.

> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5131,11 +5131,10 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
>   > COSTS_N_INSNS (1))
>   && can_create_pseudo_p ())
> {
> - XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
>   if (rtx_equal_p (orig_op0, op2))
> -   op2p = XEXP (comparison, 0);
> +   op2p = XEXP (comparison, 0) = force_reg (cmpmode, 
> orig_op0);
>   if (rtx_equal_p (orig_op0, op3))
> -   op3p = XEXP (comparison, 0);
> +   op3p = XEXP (comparison, 0) = force_reg (cmpmode, 
> orig_op0);
> }
>   if (CONSTANT_P (orig_op1) && optimize
>   && (rtx_cost (orig_op1, mode, COMPARE, 0,
> @@ -5143,11 +5142,10 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
>   > COSTS_N_INSNS (1))
>   && can_create_pseudo_p ())
> {
> - XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
>   if (rtx_equal_p (orig_op1, op2))
> -   op2p = XEXP (comparison, 1);
> +   op2p = XEXP (comparison, 1) = force_reg (cmpmode, 
> orig_op1);
>   if (rtx_equal_p (orig_op1, op3))
> -   op3p = XEXP (comparison, 1);
> +   op3p = XEXP (comparison, 1) = force_reg (cmpmode, 
> orig_op1);
> }
>   prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> GET_CODE (comparison), NULL_RTX, unsignedp,

Jakub



[PATCH] range-op-float: Fix up -frounding-math frange_arithmetic +- handling [PR110755]

2023-07-24 Thread Jakub Jelinek via Gcc-patches
Hi!

IEEE754 says that x + (-x) and x - x result in +0 in all rounding modes
but rounding towards negative infinity, in which case the result is -0
for all finite x.  x + x and x - (-x), if the result is zero, retain the sign of x.
Now, frange_arithmetic implements the normal round-to-nearest-even rounding,
and as the addition or subtraction in those cases is exact, we don't do any
further rounding etc. and e.g. on the testcase below distilled from glibc
compute a range [+0, +INF], which is fine for -fno-rounding-math or
if we'd have a guarantee that those statements aren't executed with rounding
towards negative infinity.

I believe it is only +- which has this problematic behavior and I think
it is best to deal with it in frange_arithmetic; if we know -frounding-math
is on, it is x + (-x) or x - x and we are asked to round to negative
infinity (i.e. want low bound rather than high bound), change +0 result to
-0.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
after a while for 13.3?  I'm afraid rushing this so late into 13.2...

2023-07-24  Jakub Jelinek  

PR tree-optimization/110755
* range-op-float.cc (frange_arithmetic): Change +0 result to -0
for PLUS_EXPR or MINUS_EXPR if -frounding-math, inf is negative and
it is exact op1 + (-op1) or op1 - op1.

* gcc.dg/pr110755.c: New test.

--- gcc/range-op-float.cc.jj2023-07-23 19:32:20.832434105 +0200
+++ gcc/range-op-float.cc   2023-07-24 09:41:26.231030258 +0200
@@ -324,6 +324,24 @@ frange_arithmetic (enum tree_code code,
   bool inexact = real_arithmetic (&value, code, &op1, &op2);
   real_convert (&result, mode, &value);
 
+  /* When rounding towards negative infinity, x + (-x) and
+ x - x is -0 rather than +0 real_arithmetic computes.
+ So, when we are looking for lower bound (inf is negative),
+ use -0 rather than +0.  */
+  if (flag_rounding_math
+  && (code == PLUS_EXPR || code == MINUS_EXPR)
+  && !inexact
+  && real_iszero (&result)
+  && !real_isneg (&result)
+  && real_isneg (&inf))
+{
+  REAL_VALUE_TYPE op2a = op2;
+  if (code == PLUS_EXPR)
+   op2a.sign ^= 1;
+  if (real_isneg (&op1) == real_isneg (&op2a) && real_equal (&op1, &op2a))
+   result.sign = 1;
+}
+
   // Be extra careful if there may be discrepancies between the
   // compile and runtime results.
   bool round = false;
--- gcc/testsuite/gcc.dg/pr110755.c.jj  2023-07-21 10:34:05.037251433 +0200
+++ gcc/testsuite/gcc.dg/pr110755.c 2023-07-21 10:35:10.986326816 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/110755 */
+/* { dg-do run } */
+/* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -frounding-math" } */
+
+#include <fenv.h>
+
+__attribute__((noipa)) float
+foo (float x)
+{ 
+  if (x > 0.0)
+{ 
+  x += 0x1p+23;
+  x -= 0x1p+23;
+  x = __builtin_fabsf (x);
+}
+  return x;
+}
+
+int
+main ()
+{
+#ifdef FE_DOWNWARD
+  fesetround (FE_DOWNWARD);
+  if (__builtin_signbit (foo (0.5)))
+__builtin_abort ();
+#endif
+}

Jakub



Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-24 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> So would something like
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>  (if (INTEGRAL_TYPE_P (type)
>   && !TYPE_UNSIGNED (type)
>   && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>   (convert (convert:stype @0)
> 
> work?

Certainly swap the if and with, and the (with should then be indented by 1
column to the right of (if and the (convert one further (the reason for the
swapping is to avoid calling build_nonstandard_integer_type when it will not
be needed, which will probably be far more often than an actual match).
As discussed privately, the above isn't what we want for vectors and the 2
shifts are probably best on most arches because even when using -(x & 1) the
{ 1, 1, 1, ... } vector would often needed to be loaded from memory.

Jakub



Re: [PATCH] range-op-float: Fix up -frounding-math frange_arithmetic +- handling [PR110755]

2023-07-24 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 24, 2023 at 07:39:05PM +, Joseph Myers wrote:
> On Mon, 24 Jul 2023, Jakub Jelinek via Gcc-patches wrote:
> 
> > I believe it is only +- which has this problematic behavior and I think
> 
> fma has the same property (of rounding-mode-dependent exact results), but 
> I think that's not relevant here?

Indeed, real_arithmetic doesn't handle FMA* and I think the ranger doesn't
either as of now, if it would, it would likely use mpfr for that and indeed
would need to take this into account.

Jakub



Re: [Patch] OpenMP/Fortran: Reject not strictly nested target -> teams [PR110725, PR71065]

2023-07-24 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 24, 2023 at 09:43:10PM +0200, Tobias Burnus wrote:
> This patch adds diagnostic for additional code alongside a nested teams
> in a target region.
> 
> The diagnostic is happening soon after parsing such that expressions
> in clauses are not yet expanded - those would end up before TEAMS
> and can be very complicated (e.g. assume an allocatable-returning function).
> 
> (The patch diagnoses it in openmp.cc; after trans-openmp.cc it would
> already be to late.)
> 
> Comments, remarks, suggestions?

Thanks for working on this.  The fuzzy thing on the Fortran side is
whether e.g. multiple nested BLOCK statements can appear sandwiched in between
target and teams (of course without declarations in them), or if e.g.
extra empty BLOCK; END BLOCK could appear next to it etc.
And on C/C++ side similarly with {}s, ; is an empty statement, so
#pragma omp target
{
  ;
  #pragma omp teams
  ;
  ;
}
etc. would be invalid.

Jakub



Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-25 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 25, 2023 at 03:25:57PM -0400, Drew Ross wrote:
> > With that fixed I think for non-vector integrals the above is the most
> suitable
> > canonical form of a sign-extension.  Note it should also work for any
> other
> > constant shift amount - just use the appropriate intermediate precision
> for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend: (subreg: reg)).
> 
> Here is a pattern that that only matches to truncations that result in mode
> precision (or precision of 1):
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>   && !TYPE_UNSIGNED (type)
>   && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN
> (TREE_TYPE (@1

I'd use
 && wi::ltu_p (wi::to_wide (@1), element_precision (type))
If the shift count would be negative, you'd otherwise ICE in tree_to_uhwi on
it (sure, that is UB at runtime, but compiler shouldn't ICE on it).

>   (with {
> int width = element_precision (type) - tree_to_uhwi (@1);
> tree stype = build_nonstandard_integer_type (width, 0);
>}
>(if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> (convert (convert:stype @0))
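
The identity being targeted can be checked numerically with a Python sketch
modeling prec-bit two's-complement arithmetic (an illustration, not the
match.pd semantics themselves): for a shift count of prec - 1, the arithmetic
(x << c) >> c broadcasts bit 0 of x, i.e. it equals -(x & 1).

```python
def ashr(v, c, prec):
    # arithmetic right shift of a prec-bit two's-complement pattern
    mask = (1 << prec) - 1
    v &= mask
    if v >> (prec - 1):
        v -= 1 << prec          # reinterpret as negative
    return (v >> c) & mask      # Python's >> on ints is arithmetic

prec = 32
c = prec - 1
mask = (1 << prec) - 1
for x in (0, 1, 2, 3, 0x7fffffff, 0x80000000, 0xffffffff):
    sext = ashr((x << c) & mask, c, prec)   # (x << c) >> c
    assert sext == (-(x & 1)) & mask        # same as -(x & 1)
```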

Jakub



Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-25 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches wrote:
> Hi, Drew
> 
> Thanks for addressing this missed optimization.
> 
> The testcase includes an incorrect assumption: signed char, which
> causes the testcase to fail on PowerPC.
> 
> Should the testcase be updated to specify signed char in the function
> signatures or should -fsigned-char be added to the command line
> options?

I think we should use signed char instead of char in the testcase.

Jakub



Re: [gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-07-25 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 25, 2023 at 11:01:54AM -0700, Patrick O'Neill wrote:
> Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
> Jeff Law.
> If there aren't any objections I'll commit this cherry-picked series
> on Thursday (July 27th).

Please don't commit before 13.2 is released; the branch is frozen and none of
this seems to be a release blocker.

Jakub



Re: [PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 12:00:56PM +, Richard Biener wrote:
> The following fixes the lack of simplification of a vector shift
> by an out-of-bounds shift value.  For scalars this is done both
> by CCP and VRP but vectors are not handled there.  This results
> in PR91838 differences in outcome dependent on whether a vector
> shift ISA is available and thus vector lowering does or does not
> expose scalar shifts here.
> 
> The following adds a match.pd pattern to catch uniform out-of-bound
> shifts, simplifying them to zero when not sanitizing shift amounts.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/91838
>   * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.

The !(flag_sanitize & SANITIZE_SHIFT_EXPONENT)
should be !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
or maybe even
GIMPLE || !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
because the shift ubsan instrumentation is done on GENERIC, so it can be
optimized on GIMPLE even with ubsan.

Otherwise LGTM.

Jakub



Re: [PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 01:07:58PM +, Richard Biener wrote:
> On Thu, 27 Jul 2023, Jakub Jelinek wrote:
> 
> > On Thu, Jul 27, 2023 at 12:00:56PM +, Richard Biener wrote:
> > > The following fixes the lack of simplification of a vector shift
> > > by an out-of-bounds shift value.  For scalars this is done both
> > > by CCP and VRP but vectors are not handled there.  This results
> > > in PR91838 differences in outcome dependent on whether a vector
> > > shift ISA is available and thus vector lowering does or does not
> > > expose scalar shifts here.
> > > 
> > > The following adds a match.pd pattern to catch uniform out-of-bound
> > > shifts, simplifying them to zero when not sanitizing shift amounts.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > 
> > > OK?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > >   PR tree-optimization/91838
> > >   * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
> > 
> > The !(flag_sanitize & SANITIZE_SHIFT_EXPONENT)
> > should be !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
> > or maybe even
> > GIMPLE || !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
> > because the shift ubsan instrumentation is done on GENERIC, so it can be
> > optimized on GIMPLE even with ubsan.
> > 
> > Otherwise LGTM.
> 
> So like the following, will push after re-testing succeeded.

Yes, thanks.

Jakub



[PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches

The following patch series introduces support for C23 bit-precise integer
types.  In short, they are similar to other integral types in many ways,
just aren't subject for integral promotions if smaller than int and they can
have even much wider precisions than ordinary integer types.

It is enabled only on targets which have agreed on processor specific
ABI how to lay those out or pass as function arguments/return values,
which currently is just x86-64 I believe, would be nice if target maintainers
helped to get agreement on psABI changes and GCC 14 could enable it on far
more architectures than just one.

C23 says that <limits.h> defines the BITINT_MAXWIDTH macro and that is the
largest supported precision of the _BitInt types, smallest is precision
of unsigned long long (but due to lack of psABI agreement we'll violate
that on architectures which don't have the support done yet).
The following series uses for the time just WIDE_INT_MAX_PRECISION as
that BITINT_MAXWIDTH, with the intent to increase it incrementally later
on.  WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller
on lots of architectures.  This is the largest precision we can support
without changes of wide_int/widest_int representation (to make those non-POD
and allow use of some allocated buffer rather than the included fixed size
one).  Once that would be overcome, there is another internal enforced limit,
INTEGER_CST in current layout allows at most 255 64-bit limbs, which is
16320 bits as another cap.  And if that is overcome, then we have limitation
of TYPE_PRECISION being 16-bit, so 65535 as maximum precision.  Perhaps
we could make TYPE_PRECISION dependent on BITINT_TYPE vs. others and use
32-bit precision in that case later.  Latest Clang/LLVM I think supports
on paper up to 8388608 bits, but is hardly usable even with much shorter
precisions.

Besides this hopefully temporary cap on supported precision and support
only on targets which buy into it, the support has the following limitations:

- _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
  to enable those incrementally, but don't really see details on how such
  bit-fields should be laid-out in memory nor passed inside of function
  arguments; LLVM implements something, but it is a question if that is what
  the various ABIs want

- conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
  aren't support and emit a sorry; I'm not familiar enough with DFP stuff
  to implement that

- _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
  mention how those should be passed/returned; in a limited way they are
  supported internally because the internal functions into which
  __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
  hack to return 2 values without using references/pointers

- vectors of _BitInt(N) aren't supported, both because psABIs don't specify
  how that works and because I'm not really sure it would be useful given
  lack of hw support for anything but bit-precise integers with the same
  bit precision as standard integer types

Because the bit-precise types have different behavior both in the C FE
(e.g. the lack of promotion) and do or can have different behavior in type
layout and function argument passing/returning values, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE and not say INTEGRAL_TYPE_P macro need to be adjusted.
Also the assumption that all integral types have scalar integer type mode
is no longer true, larger BITINT_TYPEs have BLKmode type.

The patch makes 4 different categories of _BitInt depending on the target hook
decisions and their precision.  The x86-64 psABI says that _BitInt which fit
into signed/unsigned char, short, int, long and long long are laid out and
passed as those types (with padding bits undefined if they don't have mode
precision).  Such smallest precision bit-precise integer types are categorized
as small, the target hook gives for specific precision a scalar integral mode
where a single such mode contains all the bits.  Such small _BitInt types are
generally kept in the IL until expansion into RTL, with minor tweaks during
expansion to avoid relying on the padding bit values.  All larger precision
_BitInt types are supposed to be handled as structure containing an array
of limbs or so, where a limb has some integral mode (for libgcc purposes
best if it has word-size) and the limbs have either little or big endian
ordering in the array.  The padding bits in the most significant limb if any
are either undefined or should be always sign/zero extended (but support for 
this
isn't in yet, we don't know if any psABI will require it).  As mentioned in
some psABI proposals, while currently there is just one limb mode, if the limb
ordering would follow normal target endianity, there is always a possibility
to have two limb modes,

[PATCH 2/5] libgcc _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point.
As described in the intro, the first step is try to reduce further the
passed in precision by skipping over most significant limbs with just zeros
or sign bit copies.  For multiplication and division I've implemented
a simple algorithm, using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that.  In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway.  Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.
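
The precision-reduction step described above can be sketched like this
(a hypothetical Python model with little-endian 64-bit limbs; the names are
illustrative, not the actual libgcc ones): most-significant limbs that are
all zeros, or all copies of the sign bit of the limb below, carry no
information and can be skipped before the expensive work starts.

```python
LIMB = 64
ONES = (1 << LIMB) - 1

def reduce_prec(limbs, signed=True):
    # Drop redundant most-significant limbs; return the trimmed limb
    # array and the number of significant bits that remain.
    limbs = list(limbs)
    while len(limbs) > 1:
        ext = ONES if signed and (limbs[-2] >> (LIMB - 1)) else 0
        if limbs[-1] != ext:
            break
        limbs.pop()             # top limb is pure zero/sign extension
    return limbs, len(limbs) * LIMB

assert reduce_prec([5, 0, 0]) == ([5], LIMB)                  # small positive
assert reduce_prec([ONES - 4, ONES, ONES]) == ([ONES - 4], LIMB)  # -5
assert reduce_prec([0, 1 << 63]) == ([0, 1 << 63], 2 * LIMB)  # nothing to drop
```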

2023-07-27  Jakub Jelinek  

PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.

--- libgcc/config/aarch64/t-softfp.jj   2023-03-13 00:11:52.330213322 +0100
+++ libgcc/config/aarch64/t-softfp  2023-07-14 12:38:30.764869473 +0200
@@ -3,7 +3,7 @@ softfp_int_modes := si di ti
 softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floatdibf floatundibf floattibf floatuntibf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
--- libgcc/config/i386/64/t-softfp.jj   2023-03-10 20:39:43.849687830 +0100
+++ libgcc/config/i386/64/t-softfp  2023-07-14 12:37:55.422344930 +0200
@@ -1,4 +1,4 @@
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floattibf floatuntibf
 
 CFLAGS-fixhfti.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj  2023-07-11 13:39:49.760107863 +0200
+++ libgcc/config/i386/libgcc-glibc.ver 2023-07-17 09:45:43.128281615 +0200
@@ -226,3 +226,13 @@ GCC_13.0.0 {
   __truncxfbf2
   __trunchfbf2
 }
+
+%inherit GCC_14.0.0 GCC_13.0.0
+GCC_14.0.0 {
+  __PFX__fixxfbitint
+  __PFX__fixtfbitint
+  __PFX__floatbitintbf
+  __PFX__floatbitinthf
+  __PFX__floatbitintxf
+  __PFX__floatbitinttf
+}
--- libgcc/config/i386/t-softfp.jj  2022-10-14 09:35:56.268989311 +0200
+++ libgcc/config/i386/t-softfp 2023-07-17 09:38:43.575980078 +0200
@@ -10,7 +10,7 @@ sof

[PATCH 3/5] C _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the C FE support, c-family support, a small libcpp change
so that 123wb and 42uwb suffixes are handled, plus a glimits.h change
to define the BITINT_MAXWIDTH macro.

The previous two patches really do nothing without this, which enables
all the support.

2023-07-27  Jakub Jelinek  

PR c/102989
gcc/
* glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is
predefined.
gcc/c-family/
* c-common.cc (c_common_reswords): Add _BitInt as keyword.
(c_common_signed_or_unsigned_type): Handle BITINT_TYPE.
(check_builtin_function_arguments): Handle BITINT_TYPE like
INTEGER_TYPE.
(keyword_begins_type_specifier): Handle RID_BITINT.
* c-common.h (enum rid): Add RID_BITINT enumerator.
* c-cppbuiltin.cc (c_cpp_builtins): For C call
targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__
and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and
__LIBGCC_BITINT_ORDER__ macros if _BitInt is supported.
* c-lex.cc (interpret_integer): Handle CPP_N_BITINT.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier,
c_pretty_printer::direct_abstract_declarator): Handle BITINT_TYPE.
(pp_c_integer_constant): Handle printing of large precision wide_ints
which would buffer overflow digit_buffer.
* c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
for type0 type descriptor.
gcc/c/
* c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE.
* c-decl.cc (declspecs_add_type): Formatting fixes.  Handle
cts_bitint.  Adjust for added union in *specs.  Handle RID_BITINT.
(finish_declspecs): Handle cts_bitint.  Adjust for added union in
*specs.
* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
RID_BITINT.
* c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment.
Add cts_bitint enumerator.
(struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union
and add bitint_prec there as well.
* c-typeck.cc (composite_type, c_common_type, comptypes_internal):
Handle BITINT_TYPE.
(build_array_ref, build_unary_op, build_conditional_expr,
convert_for_assignment, digest_init, build_binary_op): Likewise.
libcpp/
* expr.cc (interpret_int_suffix): Handle wb and WB suffixes.
* include/cpplib.h (CPP_N_BITINT): Define.

--- gcc/glimits.h.jj2023-01-03 00:20:35.071086812 +0100
+++ gcc/glimits.h   2023-07-27 15:03:24.238234396 +0200
@@ -157,6 +157,11 @@ see the files COPYING3 and COPYING.RUNTI
 # undef BOOL_WIDTH
 # define BOOL_WIDTH 1
 
+# ifdef __BITINT_MAXWIDTH__
+#  undef BITINT_MAXWIDTH
+#  define BITINT_MAXWIDTH __BITINT_MAXWIDTH__
+# endif
+
 # define __STDC_VERSION_LIMITS_H__ 202311L
 #endif
 
--- gcc/c-family/c-common.cc.jj 2023-07-24 17:48:26.436041278 +0200
+++ gcc/c-family/c-common.cc2023-07-27 15:03:24.276233865 +0200
@@ -349,6 +349,7 @@ const struct c_common_resword c_common_r
   { "_Alignas",RID_ALIGNAS,   D_CONLY },
   { "_Alignof",RID_ALIGNOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
+  { "_BitInt", RID_BITINT,D_CONLY },
   { "_Bool",   RID_BOOL,  D_CONLY },
   { "_Complex",RID_COMPLEX,0 },
   { "_Imaginary",  RID_IMAGINARY, D_CONLY },
@@ -2728,6 +2729,9 @@ c_common_signed_or_unsigned_type (int un
   || TYPE_UNSIGNED (type) == unsignedp)
 return type;
 
+  if (TREE_CODE (type) == BITINT_TYPE)
+return build_bitint_type (TYPE_PRECISION (type), unsignedp);
+
 #define TYPE_OK(node)  \
   (TYPE_MODE (type) == TYPE_MODE (node)				\
&& TYPE_PRECISION (type) == TYPE_PRECISION (node))
@@ -6341,8 +6345,10 @@ check_builtin_function_arguments (locati
  code0 = TREE_CODE (TREE_TYPE (args[0]));
  code1 = TREE_CODE (TREE_TYPE (args[1]));
  if (!((code0 == REAL_TYPE && code1 == REAL_TYPE)
-   || (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
-   || (code0 == INTEGER_TYPE && code1 == REAL_TYPE)))
+   || (code0 == REAL_TYPE
+   && (code1 == INTEGER_TYPE || code1 == BITINT_TYPE))
+   || ((code0 == INTEGER_TYPE || code0 == BITINT_TYPE)
+   && code1 == REAL_TYPE)))
{
  error_at (loc, "non-floating-point arguments in call to "
"function %qE", fndecl);
@@ -8402,6 +8408,7 @@ keyword_begins_type_specifier (enum rid
 case RID_FRACT:
 case RID_ACCUM:
 case RID_BOOL:
+case RID_BITINT:
 case RID_WCHAR:
 case RID_CHAR8:
 case RID_CHAR16:
--- gcc/c-family/c-common.h.jj  2023-06-26 09:27:04.276367532 +0200
+++ gcc/c-family/c-common.h 2023-

Re: [PATCH 4/5] testsuite part 1 for _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 07:15:28PM +0200, Jakub Jelinek via Gcc-patches wrote:
> testcases, I've been using
> https://defuse.ca/big-number-calculator.htm
> tool, a randombitint tool I wrote (will post as a reply to this) plus
> LLVM trunk on godbolt and the WIP GCC support checking if both compilers
> agree on stuff (and in case of differences tried constant evaluation etc.).

So, the randombitint.c tool is attached; when invoked like
./randombitint 174
it prints a pseudo-random 174-bit integer in decimal, when invoked as
./randombitint 575 mask
it prints the all-ones number in decimal for 575-bit precision, and
./randombitint 275 0x432445aebe435646567547567647
prints the given hexadecimal number in decimal (all using gmp).

In the tests I've often used
__attribute__((noipa)) void
printme (const void *p, int n)
{
  __builtin_printf ("0x");
  if ((n & 7) != 0)
    __builtin_printf ("%02x", ((const unsigned char *) p)[n / 8] & ((1 << (n & 7)) - 1));
  for (int i = (n / 8) - 1; i >= 0; --i)
__builtin_printf ("%02x", ((const unsigned char *) p)[i]);
  __builtin_printf ("\n");
}
function to print hexadecimal values (temporaries or finals) and then used
the third invocation of the tool to convert those to decimal.
For unsigned _BitInt just called the above like
  printme (&whatever, 575);
where 575 was the N from unsigned _BitInt(N) whatever, or
  _BitInt(575) x = ...
  if (x < 0)
{
  __builtin_printf ("-");
  x = -x;
}
  printme (&x, 575);
to print it signed.

Jakub
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <gmp.h>

int
main (int argc, const char *argv[])
{
  int n = atoi (argv[1]);
  int m = (n + 7) / 8;
  char *p = __builtin_alloca (m * 2 + 1);
  const char *q;
  srandom (getpid ());
  for (int i = 0; i < m; ++i)
{
  unsigned char v = random ();
  if (argc >= 3 && strcmp (argv[2], "mask") == 0)
v = 0xff;
  if (i == 0 && (n & 7) != 0)
v &= (1 << (n & 7)) - 1;
  sprintf (&p[2 * i], "%02x", v);
}
  p[m * 2] = '\0';
  mpz_t a;
  mpz_init (a);
  if (argc >= 3 && strcmp (argv[2], "mask") != 0)
{
  q = argv[2];
  if (q[0] == '0' && q[1] == 'x')
q += 2;
}
  else
q = p;
  gmp_sscanf (q, "%Zx", a);
  gmp_printf ("0x%s\n%Zd\n", q, a);
  return 0;
}


Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 06:41:44PM +, Joseph Myers wrote:
> On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:
> 
> > - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd 
> > like
> >   to enable those incrementally, but don't really see details on how such
> >   bit-fields should be laid-out in memory nor passed inside of function
> >   arguments; LLVM implements something, but it is a question if that is what
> >   the various ABIs want
> 
> So if the x86-64 ABI (or any other _BitInt ABI that already exists) 
> doesn't specify this adequately then an issue should be filed (at 
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).
> 
> (Note that the language specifies that e.g. _BitInt(123):45 gets promoted 
> to _BitInt(123) by the integer promotions, rather than left as a type with 
> the bit-field width.)

Ok, I'll try to investigate in detail what LLVM does and what GCC would do
if I just enabled the bitfield support and report.  Still, I'd like to
handle this only in incremental step after the rest of _BitInt support goes
in.

> > - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
> >   aren't support and emit a sorry; I'm not familiar enough with DFP stuff
> >   to implement that
> 
> Doing things incrementally might indicate first doing this only for BID 
> (so sufficing for x86-64), with DPD support to be added when _BitInt 
> support is added for an architecture using DPD, i.e. powerpc / s390.
> 
> This conversion is a mix of base conversion and things specific to DFP 
> types.

I had a brief look at libbid and am totally unimpressed.
Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
conversions at all (we emit calls to __bid_* functions which don't exist),
the library (or the way we configure it) doesn't care about exceptions nor
rounding mode (see following testcase) and for integral <-> _Decimal32
conversions implement them as integral <-> _Decimal64 <-> _Decimal32
conversions.  While in the _Decimal32 -> _Decimal64 -> integral
direction that is probably ok, even if exceptions and rounding (other than
to nearest) were supported, the other direction I'm sure can suffer from
double rounding.

So, wonder if it wouldn't be better to implement these in the soft-fp
infrastructure which at least has the exception and rounding mode support.
Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
below the sign bit and doing some shifts, so not something one needs a 10MB
of a library for.  Now, sure, 5MB out of that are generated tables in
bid_binarydecimal.c, but unfortunately those are static and not in a form
which could be directly fed into multiplication (unless we'd want to go
through conversions to/from strings).
So, it seems easier to guess the needed power of 10 from the number of binary
digits or vice versa, have a small table of powers of 10 (say those which
fit into a limb) and construct larger powers of 10 by multiplying those
several times.  _Decimal128 has exponent up to 6144, which is ~ 2552 bytes
or 319 64-bit limbs, but having a table with all 6144 powers of ten
would be just huge.  Powers of ten up to 10^19 fit into a 64-bit limb, so we
might need say < 32 multiplications to cover it all (but with the current
575-bit limitation far less).  Perhaps later on write a few selected powers
of 10 as _BitInt to decrease that number.

> For conversion *from _BitInt to DFP*, the _BitInt value needs to be 
> expressed in decimal.  In the absence of optimized multiplication / 
> division for _BitInt, it seems reasonable enough to do this naively 
> (repeatedly dividing by a power of 10 that fits in one limb to determine 
> base 10^N digits from the least significant end, for example), modulo 
> detecting obvious overflow cases up front (if the absolute value is at 

Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
and instead repeatedly multiply like in the other direction and then just
divide once with remainder?
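The quoted repeated-division scheme would look roughly like this at limb level
(my own illustration using 64-bit limbs and unsigned __int128, not code from
the patch):

```c
#include <stdint.h>

/* Divide an N-limb little-endian value in place by 10^19 (the largest
   power of ten fitting in a 64-bit limb) and return the remainder,
   i.e. the least significant base-10^19 "super-digit".  */
static uint64_t
divmod_pow10_19 (uint64_t *limbs, int n)
{
  const uint64_t p = UINT64_C (10000000000000000000);
  unsigned __int128 rem = 0;
  for (int i = n - 1; i >= 0; i--)
    {
      unsigned __int128 cur = (rem << 64) | limbs[i];
      limbs[i] = (uint64_t) (cur / p);
      rem = cur % p;
    }
  return (uint64_t) rem;
}
```

Calling this repeatedly until the value becomes zero yields the decimal
digits 19 at a time, from the least significant end.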

Jakub
#include <fenv.h>

int
main ()
{
  volatile _Decimal64 d;
  volatile long long l;
  int e;

  feclearexcept (FE_ALL_EXCEPT);
  d = __builtin_infd64 ();
  l = d;
  e = fetestexcept (FE_INVALID);
  feclearexcept (FE_ALL_EXCEPT);
  __builtin_printf ("%016lx %d\n", l, e != 0);
  l = 50LL;
  fesetround (FE_TONEAREST);
  d = l;
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_UPWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_DOWNWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
  l = 01LL;
  fesetround (FE_TONEAREST);
  d = l;
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_UPWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_DOWNWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
}


[PATCH] gimple-fold: Handle _BitInt in __builtin_clear_padding [PR102989]

2023-07-28 Thread Jakub Jelinek via Gcc-patches
Hi!

The comments about _Atomic _BitInt made me figure out I forgot (although
earlier was planning to do that) to implement __builtin_clear_padding
support for _BitInt.

The following patch (incremental to the _BitInt series) does that.

2023-07-28  Jakub Jelinek  

PR c/102989
* gimple-fold.cc (clear_padding_unit): Mention in comment that
_BitInt types don't need to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.

* gcc.dg/bitint-16.c: New test.

--- gcc/gimple-fold.cc.jj   2023-07-11 15:28:54.704679510 +0200
+++ gcc/gimple-fold.cc  2023-07-28 12:37:18.971789595 +0200
@@ -4103,8 +4103,8 @@ gimple_fold_builtin_realloc (gimple_stmt
   return false;
 }
 
-/* Number of bytes into which any type but aggregate or vector types
-   should fit.  */
+/* Number of bytes into which any type but aggregate, vector or
+   _BitInt types should fit.  */
 static constexpr size_t clear_padding_unit
   = MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT;
 /* Buffer size on which __builtin_clear_padding folding code works.  */
@@ -4595,6 +4595,26 @@ clear_padding_real_needs_padding_p (tree
  && (fmt->signbit_ro == 79 || fmt->signbit_ro == 95));
 }
 
+/* _BitInt has padding bits if it isn't extended in the ABI and has smaller
+   precision than bits in limb or corresponding number of limbs.  */
+
+static bool
+clear_padding_bitint_needs_padding_p (tree type)
+{
+  struct bitint_info info;
+  gcc_assert (targetm.c.bitint_type_info (TYPE_PRECISION (type), &info));
+  if (info.extended)
+return false;
+  scalar_int_mode limb_mode = as_a <scalar_int_mode> (info.limb_mode);
+  if (TYPE_PRECISION (type) < GET_MODE_PRECISION (limb_mode))
+return true;
+  else if (TYPE_PRECISION (type) == GET_MODE_PRECISION (limb_mode))
+return false;
+  else
+return (((unsigned) TYPE_PRECISION (type))
+   % GET_MODE_PRECISION (limb_mode)) != 0;
+}
+
 /* Return true if TYPE might contain any padding bits.  */
 
 bool
@@ -4611,6 +4631,8 @@ clear_padding_type_may_have_padding_p (t
   return clear_padding_type_may_have_padding_p (TREE_TYPE (type));
 case REAL_TYPE:
   return clear_padding_real_needs_padding_p (type);
+case BITINT_TYPE:
+  return clear_padding_bitint_needs_padding_p (type);
 default:
   return false;
 }
@@ -4855,6 +4877,57 @@ clear_padding_type (clear_padding_struct
   memset (buf->buf + buf->size, ~0, sz);
   buf->size += sz;
   break;
+case BITINT_TYPE:
+  {
+   struct bitint_info info;
+   gcc_assert (targetm.c.bitint_type_info (TYPE_PRECISION (type), &info));
+	scalar_int_mode limb_mode = as_a <scalar_int_mode> (info.limb_mode);
+   if (TYPE_PRECISION (type) <= GET_MODE_PRECISION (limb_mode))
+ {
+   gcc_assert ((size_t) sz <= clear_padding_unit);
+   if ((unsigned HOST_WIDE_INT) sz + buf->size
+   > clear_padding_buf_size)
+ clear_padding_flush (buf, false);
+   if (!info.extended
+   && TYPE_PRECISION (type) < GET_MODE_PRECISION (limb_mode))
+ {
+   int tprec = GET_MODE_PRECISION (limb_mode);
+   int prec = TYPE_PRECISION (type);
+   tree t = build_nonstandard_integer_type (tprec, 1);
+   tree cst = wide_int_to_tree (t, wi::mask (prec, true, tprec));
+   int len = native_encode_expr (cst, buf->buf + buf->size, sz);
+   gcc_assert (len > 0 && (size_t) len == (size_t) sz);
+ }
+   else
+ memset (buf->buf + buf->size, 0, sz);
+   buf->size += sz;
+   break;
+ }
+   tree limbtype
+ = build_nonstandard_integer_type (GET_MODE_PRECISION (limb_mode), 1);
+   fldsz = int_size_in_bytes (limbtype);
+   nelts = int_size_in_bytes (type) / fldsz;
+   for (HOST_WIDE_INT i = 0; i < nelts; i++)
+ {
+   if (!info.extended
+   && i == (info.big_endian ? 0 : nelts - 1)
+   && (((unsigned) TYPE_PRECISION (type))
+   % TYPE_PRECISION (limbtype)) != 0)
+ {
+   int tprec = GET_MODE_PRECISION (limb_mode);
+   int prec = (((unsigned) TYPE_PRECISION (type)) % tprec);
+   tree cst = wide_int_to_tree (limbtype,
+wi::mask (prec, true, tprec));
+   int len = native_encode_expr (cst, buf->buf + buf->size,
+ fldsz);
+   gcc_assert (len > 0 && (size_t) len == (size_t) fldsz);
+   buf->size += fldsz;
+ }
+   else
+ clear_padding_type (buf, limbtype, fldsz, for_auto_init);
+ }
+   break;
+  }
 default:
   gcc_assert ((size_t) sz <= clear_padding_unit);
   if ((unsigned HOST_WIDE_INT) sz + buf->size > clear_padding_buf_size)
--- gcc

Re: _BitInt vs. _Atomic

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> I think there should be tests for _Atomic _BitInt types.  Hopefully atomic 
> compound assignment just works via the logic for compare-and-exchange 
> loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?

So, there are 2 issues.

One is something I haven't seen being handled for C at all so far, but
handled for C++ - padding bits.

Already e.g. x86 long double has some padding bits - 16 bits on ia32,
48 bits on x86_64, when one does
  _Atomic long double l;
...
  l += 2.0;
it will sometimes work and sometimes hang forever.
Similarly atomic_compare_exchange with structs which contain padding
(unions with padding bits are lost case, there is nothing that can be
reliably done for that, because we don't know at runtime what is the active
union member if any).  And _BitInt if it doesn't use all bits in
all containing limbs has padding as well (and psABI doesn't say it is sign
or zero extended).

The C++ way of dealing with this is using __builtin_clear_padding,
done on atomic stores/updates of the atomic memory (padding is cleared
if any on the value to be stored, or on the expected and desired values).

I don't know enough about the C atomic requirements to say whether that is
feasible for C as well, or whether it is possible to make the padding bits
partially or fully set somehow non-atomically without invoking UB and then
make it never match.

If one ignores this or deals with it, then

_Atomic _BitInt(15) a;
_Atomic(_BitInt(15)) b;
_Atomic _BitInt(115) c;
_Atomic _BitInt(192) d;
_Atomic _BitInt(575) e;
_BitInt(575) f;

int
main ()
{
  a += 1wb;
  b -= 2wb;
  c += 3wb;
  d += 4wb;
  e -= 5wb;
//  f = __atomic_fetch_add (&e, 54342985743985743985743895743834298574985734895743895734895wb, __ATOMIC_SEQ_CST);
}

compiles fine with the patch set.

And another issue is that while __atomic_load, __atomic_store,
__atomic_exchange and __atomic_compare_exchange work on arbitrary _BitInt
sizes, others like __atomic_fetch_add only support _BitInt or other integral
types which have size of 1, 2, 4, 8 or 16 bytes, others emit an error
in c-family/c-common.cc (sync_resolve_size).  So, either
resolve_overloaded_builtin should, for the case when the pointer is a pointer
to a _BitInt whose size isn't 1, 2, 4, 8 or 16 bytes, lower those into
a loop using __atomic_compare_exchange (or perhaps also if there is
padding), or  should do that.
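The proposed lowering into a compare-exchange loop would have roughly this
shape (shown on uint64_t so it is self-contained; for an oddly-sized _BitInt
the same loop would be emitted with the multi-limb type, and the padding
caveat discussed earlier applies to the desired value):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of __atomic_fetch_add expanded as a compare-exchange loop,
   as proposed for _BitInt sizes that aren't 1, 2, 4, 8 or 16 bytes.  */
static uint64_t
fetch_add_via_cas (uint64_t *p, uint64_t v)
{
  uint64_t old, desired;
  __atomic_load (p, &old, __ATOMIC_RELAXED);
  do
    desired = old + v;  /* for _BitInt, padding bits would be cleared here */
  while (!__atomic_compare_exchange (p, &old, &desired, /*weak=*/true,
				     __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return old;
}
```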

Thoughts on that?

Jakub



Re: _BitInt vs. _Atomic

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 04:03:39PM +0200, Martin Uecker wrote:
> > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > I think there should be tests for _Atomic _BitInt types.  Hopefully 
> > > atomic 
> > > compound assignment just works via the logic for compare-and-exchange 
> > > loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?
> > 
> > So, there are 2 issues.
> > 
> > One is something I haven't seen being handled for C at all so far, but
> > handled for C++ - padding bits.
> > 
> > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > 48 bits on x86_64, when one does
> >   _Atomic long double l;
> > ...
> >   l += 2.0;
> > it will sometimes work and sometimes hang forever.
> > Similarly atomic_compare_exchange with structs which contain padding
> > (unions with padding bits are lost case, there is nothing that can be
> > reliably done for that, because we don't know at runtime what is the active
> > union member if any).  And _BitInt if it doesn't use all bits in
> > all containing limbs has padding as well (and psABI doesn't say it is sign
> > or zero extended).
> 
> What is the problem here?  In C, atomic_compare_exchange is defined in terms
> of the memory content which includes padding.  So it may fail spuriously
> due to padding differences (but it may fail anyway for arbitrary reasons
> even without padding differences), but then should work in the second 
> iterations.

See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html
for background.  The thing is that user doesn't have much control over those
padding bits, so whether _Atomic operations on long double (when it is 80
bit and stores from hw actually store 10 bytes rather than 12 or 16), or
_BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
purely on luck.  If the expected value is based on atomic_load on the
atomic_compare_exchange location or whatever atomic_compare_exchange gave
back, if in the loop one e.g. adds something to it, then again it might get
different padding bits from what is originally in memory, so it isn't true
that it will always succeed at least in the second loop iteration.

Jakub



Re: _BitInt vs. _Atomic

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 04:53:30PM +0200, Martin Uecker wrote:
> > The thing is that user doesn't have much control over those
> > padding bits, so whether _Atomic operations on long double (when it is 80
> > bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> > _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> > purely on luck.  If the expected value is based on atomic_load on the
> > atomic_compare_exchange location or whatever atomic_compare_exchange gave
> > back, if in the loop one e.g. adds something to it, then again it might get
> > different padding bits from what is originally in memory, so it isn't true
> > that it will always succeed at least in the second loop iteration.
> 
> Sorry, somehow I must be missing something here.
> 
> If you add something you would create a new value and this may (in
> an object) have random new padding.  But the "expected" value should
> be updated by a failed atomic_compare_exchange cycle and then have
> same padding as the value stored in the atomic. So the next cycle
> should succeed.  The user would not change the representation of
> the "expected" value but create a new value for another object
> by adding something.

You're right that usually it would pass the expected value, not something
after an operation on it.  But still, expected type will be something like
_BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
atomic_compare_exchange copies back on failure is guaranteed to have the
padding bits preserved.
It is true that if it is larger than 16 bytes the libatomic
atomic_compare_exchange will memcpy the value back which copies the padding
bits, but is there a guarantee that the user code doesn't actually copy that
value further into some other variable?  Anyway, for smaller or equal
to 16 (or 8) bytes if atomic_compare_exchange is emitted inline I don't see
what would preserve the bits.

Jakub



[RFC WIP PATCH] _BitInt bit-field support [PR102989]

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 11:05:42AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jul 27, 2023 at 06:41:44PM +, Joseph Myers wrote:
> > On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); 
> > > I'd like
> > >   to enable those incrementally, but don't really see details on how such
> > >   bit-fields should be laid-out in memory nor passed inside of function
> > >   arguments; LLVM implements something, but it is a question if that is 
> > > what
> > >   the various ABIs want
> > 
> > So if the x86-64 ABI (or any other _BitInt ABI that already exists) 
> > doesn't specify this adequately then an issue should be filed (at 
> > https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).
> > 
> > (Note that the language specifies that e.g. _BitInt(123):45 gets promoted 
> > to _BitInt(123) by the integer promotions, rather than left as a type with 
> > the bit-field width.)
> 
> Ok, I'll try to investigate in detail what LLVM does and what GCC would do
> if I just enabled the bitfield support and report.  Still, I'd like to
> handle this only in incremental step after the rest of _BitInt support goes
> in.

So, I've spent some time on this, but after simply enabling _BitInt
bit-fields everything I've tried yields the same layout with both GCC and
LLVM trunk, and as I didn't have to muck with stor-layout.cc for that
(I haven't yet tried the function argument passing/returning), I assume it is
just the generic PCC_BITFIELD_TYPE_MATTERS behavior which works like that, so
while it wouldn't hurt if the psABI said something about those, perhaps it is
ok as is.

But I ran into a compiler divergence on _Generic with bit-field expressions.
My understanding is that _Generic controlling expression undergoes array
to pointer and function to pointer conversions, but not integral promotions
(otherwise it would never match for char, short etc. types).
C23 draft I have says:
"A bit-field is interpreted as having a signed or unsigned integer type
consisting of the specified number of bits"
but doesn't say which exact signed or unsigned integer type that is.
Additionally, I think at least for the larger widths, it would be really
strange if it was interpreted as an INTEGER_TYPE with say precision of 350
because that is more than INTEGER_TYPEs can really use.
So, in the patch, if the bit-field has a bit-precise underlying type, I try
to use a bit-precise integer type for the bit-field type where possible (the
exception being the special case of a signed 1-bit field, for which
_BitInt(1) is not allowed).

Now, in the testcase with GCC the
static_assert (expr_has_type (s4.a, _BitInt(195)));
static_assert (expr_has_type (s4.b, _BitInt(282)));
static_assert (expr_has_type (s4.c, _BitInt(389)));
static_assert (expr_has_type (s4.d, _BitInt(2)));
static_assert (expr_has_type (s5.a, _BitInt(192)));
static_assert (expr_has_type (s5.b, unsigned _BitInt(192)));
static_assert (expr_has_type (s5.c, _BitInt(192)));
static_assert (expr_has_type (s6.a, _BitInt(2)));
assertions all fail (and all the ones where integer promotions are performed
for binary operation succeed).  They would succeed with
static_assert (expr_has_type (s4.a, _BitInt(63)));
static_assert (expr_has_type (s4.b, _BitInt(280)));
static_assert (expr_has_type (s4.c, _BitInt(23)));
static_assert (!expr_has_type (s4.d, _BitInt(2)));
static_assert (expr_has_type (s5.a, _BitInt(191)));
static_assert (expr_has_type (s5.b, unsigned _BitInt(190)));
static_assert (expr_has_type (s5.c, _BitInt(189)));
static_assert (!expr_has_type (s6.a, _BitInt(2)));
The s4.d and s6.a cases for GCC with this patch actually have int:1 type,
something that can never be matched in _Generic except for default:.
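For reference, the expr_has_type helper used throughout those assertions can
be written as a _Generic-based macro like the following (my assumption of its
definition, not necessarily the exact one in the testsuite); the key point is
that the controlling expression of _Generic undergoes array/function decay but
no integer promotions, so narrow types remain matchable:

```c
/* 1 if expression E has exactly type T, 0 otherwise.  */
#define expr_has_type(e, t) _Generic (e, t: 1, default: 0)
```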

On the other side, all the above pass with LLVM, i.e. as if they have
undergone the integral promotion for the _BitInt bitfield case even for
_Generic.  And the
static_assert (expr_has_type (s4.c + 1uwb, _BitInt(389)));
static_assert (expr_has_type (s4.d * 0wb, _BitInt(2)));
static_assert (expr_has_type (s6.a + 0wb, _BitInt(2)));
That looks to me like LLVM bug, because
"The value from a bit-field of a bit-precise integer type is converted to
the corresponding bit-precise integer type."
specifies that s4.c has _BitInt(389) type after integer promotions
and s4.d and s6.a have _BitInt(2) type.  Now, 1uwb has unsigned _BitInt(1)
type and 0wb has _BitInt(2) and the common type for those in all cases is
I believe the type of the left operand.

Thoughts on this?

The patch is obviously incomplete, I haven't added code for lowering
loads/stores from/to bit-fields for large/huge _BitInt nor added testcase
coverage for passing of small structs with _BitInt bit-fields as function
arguments/return value

Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-08-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 06:03:33PM +, Joseph Myers wrote:
> You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N 
> etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied 
> by a 34-digit integer significand), so that only one multiplication is 
> needed to get the power of 10 and then a second multiplication by the 
> significand.  (Or split into three parts at the cost of an extra 
> multiplication, or multiply the significand by 1, 10, 100, 1000 or 1 
> as a multiplication within 128 bits and so only need to compute 10^k for k 
> a multiple of 5, or any number of variations on those themes.)

So, I've done some quick counting: if we want at most one multiplication
to get 10^X for X in 0..6111 (plus another to multiply the mantissa by that),
having one table with 10^1..10^(N-1) and another with 10^YN for Y in
1..6111/N, I get for 64-bit limbs
S1 - size of 10^1..10^(N-1) table in bytes
S2 - size of 10^YN table
N   S1  S2  S
20  152 388792  388944
32  344 241848  242192
64  1104121560  122664
128 389660144   64040
255 14472   29320   43792
256 14584   29440   44024
266 15704   28032   43736
384 32072   19192   51264
512 56384   14080   70464
where 266 seems to be the minimum, though the difference from 256 is minimal
and having N a power of 2 seems cheaper.  Though, the above is just counting
the bytes of the 64-bit limb arrays concatenated together, I think it will
be helpful to have also an unsigned short table with the indexes into the
limb array (so another 256*2 + 24*2 bytes).
For something not in libgcc_s.so but in libgcc.a I guess 43.5KiB of .rodata
might be acceptable to make it fast.
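As a scaled-down sketch of the two-table decomposition (everything here is illustrative, not the proposed libgcc tables): with N = 5, a small table of 10^0..10^4 plus a sparse table of 10^(5*Y) lets any 10^x be formed with a single multiplication, the same way the 6111-exponent case would work with N = 256 and multi-limb entries.

```c
#include <assert.h>

/* Scaled-down model of the two-table scheme: pow10_small holds
   10^0..10^(N-1) with N = 5, pow10_big holds 10^(5*Y), so any
   10^x for x in 0..18 costs exactly one multiplication.  The real
   tables would hold multi-limb numbers instead of 64-bit values.  */
static const unsigned long long pow10_small[5] = {
  1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL
};
static const unsigned long long pow10_big[4] = {
  1ULL, 100000ULL, 10000000000ULL, 1000000000000000ULL
};

static unsigned long long
pow10_lookup (unsigned int x)
{
  /* x must be in 0..18 so the product fits in 64 bits.  */
  return pow10_small[x % 5] * pow10_big[x / 5];
}
```

Multiplying the significand by the looked-up power is then the second multiplication mentioned above.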

Jakub



Re: [PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]

2023-08-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 01, 2023 at 03:20:33PM -0400, Drew Ross via Gcc-patches wrote:
> Canonicalizes (signed x << c) >> c into the lowest
> precision(type) - c bits of x IF those bits have a mode precision or a
> precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
> ((unsigned)-1 >> c) to prevent duplicate pattern. Tested successfully on
> x86_64 and x86 targets.
> 
>   PR middle-end/101955
> 
> gcc/ChangeLog:
> 
>   * match.pd ((signed x << c) >> c): New canonicalization.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd| 20 +++
>  gcc/testsuite/gcc.dg/pr101955.c | 63 +
>  2 files changed, 77 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..62f7c84f565 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   - TYPE_PRECISION (TREE_TYPE (@2)
>(bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1
>  
> -/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
> -   types.  */
> +/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
> +   unsigned x OR truncate into the precision(type) - c lowest bits
> +   of signed x (if they have mode precision or a precision of 1)  */

There should be . between ) and "  */" above.

>  (simplify
> - (rshift (lshift @0 INTEGER_CST@1) @1)
> - (if (TYPE_UNSIGNED (type)
> -  && (wi::ltu_p (wi::to_wide (@1), element_precision (type
> -  (bit_and @0 (rshift { build_minus_one_cst (type); } @1
> + (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> + (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
> +  (if (TYPE_UNSIGNED (type))
> +   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))

This needs to be (convert @0) instead of @0, because now that there is
the nop_convert? in between, @0 could have different type than type.
I certainly see regressions on
gcc.c-torture/compile/950612-1.c
on i686-linux because of this:
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: error: 
type mismatch in binary expression
long long unsigned int

long long int

long long unsigned int

_346 = _3 & 4294967295;
during GIMPLE pass: forwprop
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: 
internal compiler error: verify_gimple failed
0x9018a4e verify_gimple_in_cfg(function*, bool, bool)
../../gcc/tree-cfg.cc:5646
0x8e81eb5 execute_function_todo
../../gcc/passes.cc:2088
0x8e8234c do_per_function
../../gcc/passes.cc:1687
0x8e82431 execute_todo
../../gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

> +   (if (INTEGRAL_TYPE_P (type))
> +(with {
> +  int width = element_precision (type) - tree_to_uhwi (@1);
> +  tree stype = build_nonstandard_integer_type (width, 0);
> + }
> + (if (width  == 1 || type_has_mode_precision_p (stype))
> +  (convert (convert:stype @0

just one space before == instead of two

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */

The above line should be
/* { dg-do compile { target int32 } } */
because the test relies on 32-bit int, some targets have just
16-bit int.
Of course, unless you want to make the testcase more portable, by
using say
#define CHAR_BITS __CHAR_BIT__
#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)
#define LLONG_BITS (__SIZEOF_LONGLONG__ * __CHAR_BIT__)
and replacing all the 31, 24, 56 etc. constants with (INT_BITS - 1),
(INT_BITS - CHAR_BITS), (LLONG_BITS - CHAR_BITS) etc.
Though, it would still fail on some AVR configurations which have
(invalid for C) just 8-bit int, and the question is what to do with
that 16, because (INT_BITS - 2 * CHAR_BITS) is 0 on 16-bit ints, so
it would need to be (INT_BITS / 2) instead.  C requires that
long long is at least 64-bit, so that is less problematic (no known
target to have > 64-bit long long, though theoretically possible).

> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
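For reference, the equivalence the canonicalization relies on can be checked at runtime. This sketch uses the portability macros suggested above (assuming GCC's predefined __CHAR_BIT__ and __SIZEOF_INT__) and performs the left shift in unsigned arithmetic so it stays well defined in C; the arithmetic right shift of a negative value is implementation-defined in ISO C but is what GCC targets provide.

```c
#include <assert.h>

#define CHAR_BITS __CHAR_BIT__
#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)

/* (int)((unsigned) x << c) >> c sign-extends the low (INT_BITS - c)
   bits of x; with c == INT_BITS - CHAR_BITS that is the same as
   converting through a CHAR_BITS-wide signed type, which is the
   narrower-type form the pattern produces.  */
static int
sext_low_byte (int x)
{
  const int c = INT_BITS - CHAR_BITS;
  return (int) ((unsigned int) x << c) >> c;
}
```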

Jakub



Re: [PATCH 1/2] Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p

2023-08-02 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 02, 2023 at 10:04:26AM +0200, Richard Biener via Gcc-patches wrote:
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1157,8 +1157,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >
> >  /* Simplify ~X & X as zero.  */
> >  (simplify
> > - (bit_and:c (convert? @0) (convert? (bit_not @0)))
> > -  { build_zero_cst (type); })
> > + (bit_and (convert? @0) (convert? @1))
> > + (if (bitwise_inverted_equal_p (@0, @1))
> > +  { build_zero_cst (type); }))

I wonder if the above isn't incorrect.
Without the possibility of widening converts it would be ok,
but for widening conversions it is significant not just that
the bits of @0 and @1 are inverted, but also that they are either
both signed or both unsigned and so the MS bit (which is guaranteed
to be different) extends to 0s in one case and to all 1s in the other
one, so that even the upper bits are inverted.
But that isn't the case here.  Something like (untested):
long long
foo (unsigned int x)
{
  int y = x;
  y = ~y;
  return ((long long) x) & y;
}
Actually maybe for this pattern it happens to be ok, because while
the upper bits in this case might not be inverted between the extended
operands (if x has msb set), it will be 0 & 0 in the upper bits.

> >
> >  /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
> >  (simplify
> > @@ -1395,8 +1396,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  /* ~x ^ x -> -1 */
> >  (for op (bit_ior bit_xor)
> >   (simplify
> > -  (op:c (convert? @0) (convert? (bit_not @0)))
> > -  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
> > +  (op (convert? @0) (convert? @1))
> > +  (if (bitwise_inverted_equal_p (@0, @1))
> > +   (convert { build_all_ones_cst (TREE_TYPE (@0)); }

But not here.
long long
bar (unsigned int x)
{
  int y = x;
  y = ~y;
  return ((long long) x) ^ y;
}

long long
baz (unsigned int x)
{
  int y = x;
  y = ~y;
  return y ^ ((long long) x);
}
You pick TREE_TYPE (@0), but that is a random signedness if the two
operands have different signedness.
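The difference between the two patterns can be demonstrated outside the compiler. In this sketch (ordinary C, no GCC internals, assuming the usual wrap-around unsigned-to-int conversion on GCC targets): when the MSB of x is set, x still zero-extends to 64 bits but y = ~x is then positive and also zero-extends, so the upper 32 bits of the two operands are both 0 rather than inverted. AND of those upper bits is still 0, so the bit_and fold happens to survive, while XOR of them is 0, so the bit_xor result is not all-ones.

```c
#include <assert.h>

/* The low 32 bits of the operands are always inverses of each other;
   only the extended upper 32 bits can fail to be inverted.  */
static long long
and_case (unsigned int x)
{
  int y = ~(int) x;
  return ((long long) x) & y;
}

static long long
xor_case (unsigned int x)
{
  int y = ~(int) x;
  return ((long long) x) ^ y;
}
```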

Jakub



Re: [PATCH] OpenMP, libgomp: Environment variable syntax extension.

2022-06-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 30, 2022 at 01:40:24PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > +/* The initial ICV values for the host, which are configured with 
> > environment
> > +   variables without a suffix, e.g. OMP_NUM_TEAMS.  */
> > +struct gomp_initial_icvs gomp_initial_icvs_none;
> > +
> > +/* Initial ICV values that were configured for the host and for all 
> > devices by
> > +   using environment variables like OMP_NUM_TEAMS_ALL.  */
> > +struct gomp_initial_icvs gomp_initial_icvs_all;
> > +
> > +/* Initial ICV values that were configured only for devices (not for the 
> > host)
> > +   by using environment variables like OMP_NUM_TEAMS_DEV.  */
> > +struct gomp_initial_icvs gomp_initial_icvs_dev;
> 
> As I said last time, I don't like allocating these
> all the time in the data section of libgomp when at least for a few upcoming
> years, most users will never use those suffixes.
> Can't *_DEV and *_ALL go into the gomp_initial_icv_dev_list
> chain too, perhaps 

Sorry, forgot to finish sentence, I meant perhaps with dev_num of some magic
negative constants, and ensure that the all entry goes e.g. first in the
list, then dev and then the rest, so when filling up say what values to copy
to some device, it would start with the defaults, then if all is present
overwrite from selected all vars, then if non-host and dev is present,
overwrite from selected dev vars and finally overwrite from selected
specific device vars.

Jakub



Re: [GCC 13][PATCH] PR101836: Add a new option -fstrict-flex-array[=n] and use it in __builtin_object_size

2022-06-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 30, 2022 at 03:31:00PM +, Qing Zhao wrote:
> > No, that’s not true.  A FIELD_DELC is only shared for cv variants of a 
> > structure.
> 
> Sorry for my dump questions: 
> 
> 1. What do you mean by “cv variants” of a structure?

const/volatile qualified variants.  So

> 2. For the following example:
> 
> struct AX { int n; short ax[];};

struct AX, const struct AX, volatile const struct AX etc. types will share
the FIELD_DECLs.

> struct UX {struct AX b; int m;};
> 
> Are there two different FIELD_DECLs in the IR, one for AX.ax, the other one 
> is for UX.b.ax?

No, there are just n and ax FIELD_DECLs with DECL_CONTEXT of struct AX and
b and m FIELD_DECLs with DECL_CONTEXT of struct UX.

But, what is important is that when some FIELD_DECL is last in some
structure and has array type, it doesn't mean it should have an
unconstrained length.
In the above case, when struct AX is followed by some other member, it
acts as a strict short ax[0]; field (even when that is an exception): one
can take the address &UX.b.ax[0], but can't dereference that, or &UX.b.ax[1].

I believe pedantically flexible array members in such cases don't
necessarily mean zero length array, could be longer, e.g. for the usual
x86_64 alignments
struct BX { long long n; short o; short ax[]; };
struct VX { struct BX b; int m; };
I think it acts as short ax[3]; because the padding at the end of struct BX
is so long that 3 short elements fit in there.
While if one uses
struct BX bx = { 1LL, 2, { 3, 4, 5, 6, 7, 8, 9, 10 } };
(a GNU extension), then it acts as short ax[11]; - the initializer is 8
elements and after short ax[8]; is padding for another 3 full elements.
And of course:
struct BX *p = malloc (offsetof (struct BX, ax) + n * sizeof (short));
means short ax[n].
Whether struct WX { struct BX b; };
struct WX *p = malloc (offsetof (struct WX, b.ax) + n * sizeof (short));
is pedantically acting as short ax[n]; is unclear to me, but we are
generally allowing that and people expect it.

Though, on the GCC side, I think we are only treating like flexible arrays
what is really at the end of structs, not followed by other members.
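The ax[3] claim above is easy to check numerically; this little sketch assumes the usual x86_64 alignments (8-byte long long, 2-byte short), so it is illustrative rather than portable:

```c
#include <assert.h>
#include <stddef.h>

struct BX { long long n; short o; short ax[]; };

/* With 8-byte alignment of long long, sizeof (struct BX) is 16 while
   ax starts at offset 10, leaving room for 3 shorts in the tail
   padding.  */
static size_t
padding_elems (void)
{
  return (sizeof (struct BX) - offsetof (struct BX, ax)) / sizeof (short);
}
```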

Jakub



Re: [PATCH] OpenMP, libgomp: Environment variable syntax extension.

2022-06-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 30, 2022 at 03:21:15PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jun 30, 2022 at 01:40:24PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > +/* The initial ICV values for the host, which are configured with 
> > > environment
> > > +   variables without a suffix, e.g. OMP_NUM_TEAMS.  */
> > > +struct gomp_initial_icvs gomp_initial_icvs_none;
> > > +
> > > +/* Initial ICV values that were configured for the host and for all 
> > > devices by
> > > +   using environment variables like OMP_NUM_TEAMS_ALL.  */
> > > +struct gomp_initial_icvs gomp_initial_icvs_all;
> > > +
> > > +/* Initial ICV values that were configured only for devices (not for the 
> > > host)
> > > +   by using environment variables like OMP_NUM_TEAMS_DEV.  */
> > > +struct gomp_initial_icvs gomp_initial_icvs_dev;
> > 
> > As I said last time, I don't like allocating these
> > all the time in the data section of libgomp when at least for a few upcoming
> > years, most users will never use those suffixes.
> > Can't *_DEV and *_ALL go into the gomp_initial_icv_dev_list
> > chain too, perhaps 
> 
> Sorry, forgot to finish sentence, I meant perhaps with dev_num of some magic
> negative constants, and ensure that the all entry goes e.g. first in the
> list, then dev and then the rest, so when filling up say what values to copy
> to some device, it would start with the defaults, then if all is present
> overwrite from selected all vars, then if non-host and dev is present,
> overwrite from selected dev vars and finally overwrite from selected
> specific device vars.

One more thing.  If we go just with gomp_initial_icvs_none and the all/dev
are in the list together with the rest, it would be enough not to use
multi-bit flags, but just single bit, has this env var been specified here
or not.  So, as long as there are <= 32 vars (excluding the ones that don't
have suffixed variants), we can use unsigned int bitmask indexed by the
enum, or if we need <= 64 vars we could use unsigned long long int bitmask.
And have one such bitmask next to gomp_initial_icvs_none and another one in
each list node.
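A minimal sketch of that single-bit scheme (the enumerators here are invented for illustration; libgomp would use its real ICV enum):

```c
#include <assert.h>

/* One bit per suffixable environment variable: bit set means the
   variable was explicitly specified for this scope.  An unsigned int
   covers up to 32 variables; unsigned long long would cover 64.  */
enum icv_var
{
  ICV_NUM_TEAMS,
  ICV_TEAMS_THREAD_LIMIT,
  ICV_DEFAULT_DEVICE,
  ICV_COUNT
};

static void
icv_mark_explicit (unsigned int *mask, enum icv_var v)
{
  *mask |= 1U << v;
}

static int
icv_explicit_p (unsigned int mask, enum icv_var v)
{
  return (mask >> v) & 1U;
}
```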

Jakub



[PATCH] wide-int: Fix up wi::shifted_mask [PR106144]

2022-07-01 Thread Jakub Jelinek via Gcc-patches
Hi!

As the following self-test testcase shows, wi::shifted_mask sometimes
doesn't create canonicalized wide_ints, which then fail to compare equal
to canonicalized wide_ints with the same value.
In particular, wi::mask (128, false, 128) gives { -1 } with len 1 and prec 128,
while wi::shifted_mask (0, 128, false, 128) gives { -1, -1 } with len 2
and prec 128.
The problem is that the code is written with the assumption that there are
3 bit blocks (or 2 if start is 0), but doesn't consider the possibility
where there are 2 bit blocks (or 1 if start is 0) where the highest block
isn't present.  In that case, there is the optional block of negate ? 0 : -1
elts, followed by just one elt (either one from the if (shift) or just
negate ? -1 : 0) and the rest is implicit sign-extension.
Only if end < prec there is 1 or more bits above it that have different bit
value and so we need to emit all the elts till end and then one more elt.

if (end == prec) would work too, because we have:
  if (width > prec - start)
width = prec - start;
  unsigned int end = start + width;
so end is guaranteed to be end <= prec, dunno what is preferred.
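The canonical form being restored can be modelled with a toy check (this is not the wide_int API, just the rule): the stored length drops any high limb that is merely the sign-extension of the limbs below it.

```c
#include <assert.h>

/* Toy model of wide_int canonicalization for 64-bit limbs: a value is
   canonical when no trailing limb equals the sign-extension of the
   limb below it.  A 128-bit all-ones mask is therefore { -1 } with
   len 1, not { -1, -1 } with len 2.  */
static unsigned int
canonical_len (const long long *val, unsigned int len)
{
  while (len > 1 && val[len - 1] == (val[len - 2] < 0 ? -1LL : 0LL))
    len--;
  return len;
}
```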

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-07-01  Jakub Jelinek  

PR middle-end/106144
* wide-int.cc (wi::shifted_mask): If end >= prec, return right after
emitting element for shift or if shift is 0 first element after start.
(wide_int_cc_tests): Add tests for equivalency of wi::mask and
wi::shifted_mask with 0 start.

--- gcc/wide-int.cc.jj  2022-01-11 23:11:23.592273263 +0100
+++ gcc/wide-int.cc 2022-06-30 20:41:25.506292687 +0200
@@ -842,6 +842,13 @@ wi::shifted_mask (HOST_WIDE_INT *val, un
val[i++] = negate ? block : ~block;
 }
 
+  if (end >= prec)
+{
+  if (!shift)
+   val[i++] = negate ? 0 : -1;
+  return i;
+}
+
   while (i < end / HOST_BITS_PER_WIDE_INT)
 /* 111 */
 val[i++] = negate ? 0 : -1;
@@ -2583,6 +2590,10 @@ wide_int_cc_tests ()
   run_all_wide_int_tests  ();
   test_overflow ();
   test_round_for_mask ();
+  ASSERT_EQ (wi::mask (128, false, 128),
+wi::shifted_mask (0, 128, false, 128));
+  ASSERT_EQ (wi::mask (128, true, 128),
+wi::shifted_mask (0, 128, true, 128));
 }
 
 } // namespace selftest

Jakub



Re: [GCC 13][PATCH] PR101836: Add a new option -fstrict-flex-array[=n] and use it in __builtin_object_size

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 12:55:08PM +, Qing Zhao wrote:
> If so, comparing to the current implemenation to have all the checking in 
> middle-end, what’s the 
> major benefit of moving part of the checking into FE, and leaving the other 
> part in middle-end?

The point is recording early what FIELD_DECLs could be vs. can't possibly be
treated like flexible array members and just use that flag in the decisions
in the current routines in addition to what it is doing.

Jakub



Re: [Patch][v5] OpenMP: Move omp requires checks to libgomp

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 03:06:05PM +0200, Tobias Burnus wrote:
> --- a/gcc/fortran/parse.cc
> +++ b/gcc/fortran/parse.cc
> @@ -1168,7 +1168,8 @@ decode_omp_directive (void)
>  }
>switch (ret)
>  {
> -case ST_OMP_DECLARE_TARGET:
> +/* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET.
> +   FIXME: Get clarification, cf. OpenMP Spec Issue #3240.  */
>  case ST_OMP_TARGET:
>  case ST_OMP_TARGET_DATA:
>  case ST_OMP_TARGET_ENTER_DATA:
> @@ -6879,11 +6880,14 @@ done:
>  
>/* Fixup for external procedures and resolve 'omp requires'.  */
>int omp_requires;
> +  bool omp_target_seen;
>omp_requires = 0;
> +  omp_target_seen = false;
>for (gfc_current_ns = gfc_global_ns_list; gfc_current_ns;
> gfc_current_ns = gfc_current_ns->sibling)
>  {
>omp_requires |= gfc_current_ns->omp_requires;
> +  omp_target_seen |= gfc_current_ns->omp_target_seen;
>gfc_check_externals (gfc_current_ns);
>  }
>for (gfc_current_ns = gfc_global_ns_list; gfc_current_ns;
> @@ -6908,6 +6912,22 @@ done:
>break;
>  }
>  
> +  if (omp_target_seen)
> +omp_requires_mask = (enum omp_requires) (omp_requires_mask
> +  | OMP_REQUIRES_TARGET_USED);
> +  if (omp_requires & OMP_REQ_REVERSE_OFFLOAD)
> +omp_requires_mask = (enum omp_requires) (omp_requires_mask
> +  | OMP_REQUIRES_REVERSE_OFFLOAD);
> +  if (omp_requires & OMP_REQ_UNIFIED_ADDRESS)
> +omp_requires_mask = (enum omp_requires) (omp_requires_mask
> +  | OMP_REQUIRES_UNIFIED_ADDRESS);
> +  if (omp_requires & OMP_REQ_UNIFIED_SHARED_MEMORY)
> +omp_requires_mask
> +   = (enum omp_requires) (omp_requires_mask
> +  | OMP_REQUIRES_UNIFIED_SHARED_MEMORY);
> +  if (omp_requires & OMP_REQ_DYNAMIC_ALLOCATORS)
> +omp_requires_mask = (enum omp_requires) (omp_requires_mask
> +  | OMP_REQUIRES_DYNAMIC_ALLOCATORS);
>/* Do the parse tree dump.  */
>gfc_current_ns = flag_dump_fortran_original ? gfc_global_ns_list : NULL;

Will Fortran diagnose:
subroutine foo
!$omp requires unified_shared_memory
!$omp target
!$omp end target
end subroutine foo
subroutine bar
!$omp requires reverse_offload
!$omp target
!$omp end target
end subroutine bar

or just merge it from the different namespaces?
This is something that can be handled separately if it isn't resolved
and might need clarification from omp-lang.

> @@ -1764,6 +1781,20 @@ input_symtab (void)
>  }
>  }
>  
> +static void
> +omp_requires_to_name (char *buf, size_t size, unsigned int requires_mask)
> +{
> +  char *end = buf + size, *p = buf;
> +  if (requires_mask & GOMP_REQUIRES_UNIFIED_ADDRESS)
> +p += snprintf (p, end - p, "unified_address");
> +  if (requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
> +p += snprintf (p, end - p, "%sunified_shared_memory",
> +(p == buf ? "" : ", "));
> +  if (requires_mask & GOMP_REQUIRES_REVERSE_OFFLOAD)
> +p += snprintf (p, end - p, "%sreverse_offload",
> +(p == buf ? "" : ", "));

So, what does this print if requires_mask is 0 (or just the target used bit
set but not unified_address, unified_shared_memory nor reverse_offload)?
Say in case of:
a.c
#pragma omp requires unified_address
void foo (void) {
#pragma omp target
;
}
b.c:
void bar (void) {
#pragma omp target
;
}
gcc -fopenmp -shared -o a.so a.c b.c
?

> @@ -1810,6 +1847,54 @@ input_offload_tables (bool do_force_output)
>may be no refs to var_decl in offload LTO mode.  */
> if (do_force_output)
>   varpool_node::get (var_decl)->force_output = 1;
> +   tmp_decl = var_decl;
> + }
> +   else if (tag == LTO_symtab_edge)
> + {
> +   static bool error_emitted = false;
> +   HOST_WIDE_INT val = streamer_read_hwi (ib);
> +
> +   if (omp_requires_mask == 0)
> + {
> +   omp_requires_mask = (omp_requires) val;
> +   requires_decl = tmp_decl;
> +   requires_fn = file_data->file_name;

And similarly here, if some device construct is seen but requires
directive isn't, not sure if in this version val would be 0 or something
with the TARGET_USED bit set.  In the latter case, only what is printed
for no requires or just atomic related requires is a problem, in the former
case due to the == 0 check mixing of 0 with non-zero would be ignored
but mixing of non-zero with 0 wouldn't be.

> + }
> +   else if (omp_requires_mask != val && !error_emitted)
> + {
> +   char buf[64], buf2[64];

Perhaps cleaner would be to size the buffers as
sizeof ("unified_address,unified_shared_memory,reverse_offload")
64 is more, but just a wild guess and if further clauses are added later,
it might be too small.

> +(p == buf ? "" : ", "));
> +  if (

Re: [RFC] trailing_wide_ints with runtime variable lengths

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 04:12:55PM +0200, Aldy Hernandez wrote:
> > --- a/gcc/wide-int.h
> > +++ b/gcc/wide-int.h
> > @@ -1373,10 +1373,13 @@ namespace wi
> >  : public int_traits  {};
> >  }
> >
> > -/* An array of N wide_int-like objects that can be put at the end of
> > -   a variable-sized structure.  Use extra_size to calculate how many
> > -   bytes beyond the sizeof need to be allocated.  Use set_precision
> > -   to initialize the structure.  */
> > +/* A variable-lengthed array of wide_int-like objects that can be put
> > +   at the end of a variable-sized structure.  The number of objects is
> > +   at most N and can be set at runtime by using set_precision().
> > +
> > +   Use extra_size to calculate how many bytes beyond the
> > +   sizeof need to be allocated.  Use set_precision to initialize the
> > +   structure.  */
> >  template 
> >  struct GTY((user)) trailing_wide_ints
> >  {
> > @@ -1387,6 +1390,9 @@ private:
> >/* The shared maximum length of each number.  */
> >unsigned char m_max_len;
> >
> > +  /* The number of elements.  */
> > +  unsigned char m_num_elements;

IMNSHO you certainly don't want to change like this existing
trailing_wide_ints, you don't want to grow unnecessarily existing
trailing_wide_ints users (e.g. const_poly_int_def).

My brief understanding of wide-int.h is that in some cases stuff like this
is implied from template parameters or exact class instantiation and in
other cases it is present explicitly and class inheritence is used to hide
that stuff nicely.

So, you are looking for something like trailing_wide_ints but where that
N is actually a runtime value?  Then e.g. the
  struct {unsigned char len;} m_len[N];
member can't work properly either, because it isn't constant size.

Jakub



Re: [Patch] OpenMP: Handle tofrom with target enter/exit data

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 05:41:27PM +0200, Tobias Burnus wrote:
> --- a/gcc/fortran/dump-parse-tree.cc
> +++ b/gcc/fortran/dump-parse-tree.cc
> @@ -1414,6 +1414,11 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
> case OMP_MAP_TO: fputs ("to:", dumpfile); break;
> case OMP_MAP_FROM: fputs ("from:", dumpfile); break;
> case OMP_MAP_TOFROM: fputs ("tofrom:", dumpfile); break;
> +   case OMP_MAP_ALWAYS_TO: fputs ("always,to:", dumpfile); break;
> +   case OMP_MAP_ALWAYS_FROM: fputs ("always,from:", dumpfile); break;
> +   case OMP_MAP_ALWAYS_TOFROM: fputs ("always,tofrom:", dumpfile); break;
> +   case OMP_MAP_DELETE: fputs ("always,tofrom:", dumpfile); break;
> +   case OMP_MAP_RELEASE: fputs ("always,tofrom:", dumpfile); break;

Pasto in the last 2 lines?

Other than that LGTM.

Jakub



Re: [Patch][v5] OpenMP: Move omp requires checks to libgomp

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 06:31:48PM +0200, Tobias Burnus wrote:
> This is done in openmp.cc during parsing. The merging you quoted (in 
> parse.cc) happens
> after the whole input file has been parsed and resolved. For your test case, 
> the
> following error is shown:
> 
> test.f90:1:15:
> 
> 1 |  subroutine foo
>   |   1
> Error: Program unit at (1) has OpenMP device constructs/routines but does not 
> set !$OMP REQUIRES REVERSE_OFFLOAD but other program units do
> test.f90:6:14:
> 
> 6 | subroutine bar
>   |  1
> Error: Program unit at (1) has OpenMP device constructs/routines but does not 
> set !$OMP REQUIRES UNIFIED_SHARED_MEMORY but other program units do

Great.

> > @@ -1764,6 +1781,20 @@ input_symtab (void)
> > >   }
> > >   }
> > > 
> > > +static void
> > > +omp_requires_to_name (char *buf, size_t size, unsigned int requires_mask)
> > > +{
> > > +  char *end = buf + size, *p = buf;
> > > +  if (requires_mask & GOMP_REQUIRES_UNIFIED_ADDRESS)
> > > +p += snprintf (p, end - p, "unified_address");
> > > +  if (requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
> > > +p += snprintf (p, end - p, "%sunified_shared_memory",
> > > +   (p == buf ? "" : ", "));
> > > +  if (requires_mask & GOMP_REQUIRES_REVERSE_OFFLOAD)
> > > +p += snprintf (p, end - p, "%sreverse_offload",
> > > +   (p == buf ? "" : ", "));
> > So, what does this print if requires_mask is 0 (or just the target used bit
> > set but not unified_address, unified_shared_memory nor reverse_offload)?
> 
> Well, that's what libgomp/testsuite/libgomp.c-c++-common/requires-2.c (+ 
> *-2-aux.c)
> tests:
> 
> /* { dg-error "OpenMP 'requires' directive with non-identical clauses in 
> multiple compilation units: 'unified_shared_memory' vs. ''" "" { target *-*-* 
> } 0 }  */
> 
> I hope the '' vs. 'unified_shared_memory' is clear - but if you have a better 
> wording.

I must be missing how that works.  Because the buf in the callers is
uninitialized and this function doesn't store there anything if
requires_mask == 0.
Perhaps you're just lucky and the stack contains '\0' there?
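One way to make the function well defined for a zero mask is to terminate the buffer up front; a hedged sketch (the constants are invented here for illustration, mirroring the patch's structure rather than quoting it):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define REQ_UNIFIED_ADDRESS        (1U << 0)
#define REQ_UNIFIED_SHARED_MEMORY  (1U << 1)
#define REQ_REVERSE_OFFLOAD        (1U << 2)

/* Like the patch's omp_requires_to_name, but with *p = '\0' first so
   a zero mask yields "" instead of whatever was on the stack.  */
static void
requires_to_name (char *buf, size_t size, unsigned int mask)
{
  char *end = buf + size, *p = buf;
  *p = '\0';
  if (mask & REQ_UNIFIED_ADDRESS)
    p += snprintf (p, end - p, "unified_address");
  if (mask & REQ_UNIFIED_SHARED_MEMORY)
    p += snprintf (p, end - p, "%sunified_shared_memory",
                   p == buf ? "" : ", ");
  if (mask & REQ_REVERSE_OFFLOAD)
    p += snprintf (p, end - p, "%sreverse_offload",
                   p == buf ? "" : ", ");
}
```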

> Note that both:
>   no 'omp requires'
> and
>   'omp requires' with other clauses (such as the atomic ones or 
> dynamic_allocators)
> will lead to 0. Thus, if the wording is changed, it should fit for both cases.

Maybe it would be better to simply use different error message for the
0 vs. non-0 case, canonicalized to non-0 vs. 0 order so that it is just
2 messages vs. 3 and wording like
"OpenMP 'requires' directive with '' clauses specified only in some 
compilation units"
note: specified here ...
note: but not here ...

> > > +  if (omp_requires_mask == 0)
> > > +{
> > > +  omp_requires_mask = (omp_requires) val;
> > > +  requires_decl = tmp_decl;
> > > +  requires_fn = file_data->file_name;
> > And similarly here, if some device construct is seen but requires
> > directive isn't, not sure if in this version val would be 0 or something
> > with the TARGET_USED bit set.  In the latter case, only what is printed
> > for no requires or just atomic related requires is a problem, in the former
> > case due to the == 0 check mixing of 0 with non-zero would be ignored
> > but mixing of non-zero with 0 wouldn't be.
> 
> Here: 0 = "unset" in the sense that either TARGET_USE nor USM/UA/RO was
> specified. If any of those is set, we get != 0.

Ok.
> 
> For mkoffload, the single results are merged - and TARGET_USE is stripped,
> such that it is either 0 or a combination of USM/UA/RO

I'd find it clearer if we never stripped that, so that even the library knows.
The details will depend on the resolution of #3240.
Whether say declare target and no device constructs and device related API
calls etc. force it too or not.  If not, you could get 0 even if you are
actually registering something, just not target regions.
If anything that will lead to GOMP_offload_register_ver actually means
TARGET_USED, then it isn't necessary.  But even if it isn't necessary,
e.g. for backwards compatibility with GOMP_VERSION == 1 it will be easier
to have that bit in.  0 will then mean older gcc built library or binary.

> > > +}
> > > +  else if (omp_requires_mask != val && !error_emitted)
> > > +{
> > > +  char buf[64], buf2[64];
> > Perhaps cleaner would be to size the buffers as
> > sizeof ("unified_address,unified_shared_memory,reverse_offload")
> > 64 is more, but just a wild guess and if further clauses are added later,
> > it might be too small.
> 
> I concur – except that ',' should be ', '.
> (Likewise in libgomp/target.c)

Good catch.

> > Is
> > c.c:
> > #pragma omp requires unified_shared_memory
> > d.c:
> > void baz (void) {
> >#pragma omp target
> >;
> > }
> > ok?
> 
> This one is *already* streamed out as it creates a symbol and entry in
> in offload_functions (baz.omp_fn.0).
> 
> The code is rather 

Re: [RFC] trailing_wide_ints with runtime variable lengths

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 06:47:48PM +0200, Aldy Hernandez wrote:
> > So, you are looking for something like trailing_wide_ints but where that
> > N is actually a runtime value?  Then e.g. the
> >   struct {unsigned char len;} m_len[N];
> > member can't work properly either, because it isn't constant size.
> 
> What my patch does is store the run-time length in the aforementioned
> byte, while defaulting to N/MAX.  There is no size difference (or code
> changes) for existing users.  With my change, set_precision() and
> extra_size() now take a runtime parameter, but it defaults to N and is
> inlined, so there is no penalty for existing users.  I benchmarked to
> make sure :).

So, you still use N = 3 but can sometimes store say 255 wide_ints in there?
In that case, m_len[N] provides lengths just for the first 3, no?

Anyway, you really want feedback from Richard Sandiford IMHO...

Jakub



Re: [RFC] trailing_wide_ints with runtime variable lengths

2022-07-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 07:43:28PM +0200, Aldy Hernandez wrote:
> You can still say N=255 and things continue to work as they do now, since
> m_len[] is still statically determined. The only difference is that before,
> the size of the structure would be 2+1+255+sizeof(int) whereas now it would
> be 1 more because of the byte I'm using for num_elements.

So, what N do you want to use for SSA_NAME_RANGE_INFO?
N=255 wouldn't be very space efficient especially if the common case is a
single range or two.
For such cases making e.g. m_len not an embedded array, but pointer to
somewhere after the HOST_WIDE_INT array in the same allocation would be
better.
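A sketch of that layout: one allocation holding the limbs first and the per-number length bytes after them, so small cases (one or two ranges) stay compact. All names are invented for illustration; the real structure would use HOST_WIDE_INT and GC allocation.

```c
#include <assert.h>
#include <stdlib.h>

struct trailing_ints
{
  unsigned char num_elements;   /* Runtime count of numbers.  */
  unsigned char max_len;        /* Limbs per number.  */
  long long vals[];             /* num_elements * max_len limbs,
                                   followed by num_elements length
                                   bytes.  */
};

static struct trailing_ints *
trailing_ints_alloc (unsigned char n, unsigned char max_len)
{
  size_t sz = sizeof (struct trailing_ints)
              + n * max_len * sizeof (long long) + n;
  struct trailing_ints *t = calloc (1, sz);
  if (!t)
    return NULL;
  t->num_elements = n;
  t->max_len = max_len;
  return t;
}

/* The lengths live just past the last limb, in the same allocation.  */
static unsigned char *
trailing_ints_lens (struct trailing_ints *t)
{
  return (unsigned char *) &t->vals[t->num_elements * t->max_len];
}
```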

Jakub



Re: [Patch][v5] OpenMP: Move omp requires checks to libgomp

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 01, 2022 at 11:08:16PM +0200, Tobias Burnus wrote:
>gomp_mutex_lock (&register_lock);
>  
> +  if (omp_requires_mask && omp_requires_mask != omp_req)

I'd use if (omp_req && omp_requires_mask && omp_requires_mask != omp_req)
e.g. for the case of mixing GCC <= 12 compiled code with GCC 13,
treat omp_req 0 as "don't know" while GOMP_REQUIRES_TARGET_USED
as "known and no requires uni*/rev* specified".

> +{
> +  char buf1[sizeof ("unified_address, unified_shared_memory, "
> + "reverse_offload")];
> +  char buf2[sizeof ("unified_address, unified_shared_memory, "
> + "reverse_offload")];
> +  gomp_requires_to_name (buf2, sizeof (buf2),
> +  omp_req != GOMP_REQUIRES_TARGET_USED
> +  ? omp_req : omp_requires_mask);
> +  if (omp_req != GOMP_REQUIRES_TARGET_USED
> +   && omp_requires_mask != GOMP_REQUIRES_TARGET_USED)
> + {
> +   gomp_requires_to_name (buf1, sizeof (buf1), omp_requires_mask);
> +   gomp_fatal ("OpenMP 'requires' directive with non-identical clauses "
> +   "in multiple compilation units: '%s' vs. '%s'",
> +   buf1, buf2);
> + }
> +  else
> + gomp_fatal ("OpenMP 'requires' directive with '%s' specified only in "
> + "some compilation units", buf2);
> +}
> +  omp_requires_mask = omp_req;
> +
>/* Load image to all initialized devices.  */
>for (i = 0; i < num_devices; i++)
>  {
> @@ -4125,8 +4173,30 @@ gomp_target_init (void)
>  
>   if (gomp_load_plugin_for_device (&current_device, plugin_name))
> {
> - new_num_devs = current_device.get_num_devices_func ();
> - if (new_num_devs >= 1)
> + int omp_req = omp_requires_mask & ~GOMP_REQUIRES_TARGET_USED;
> + new_num_devs = current_device.get_num_devices_func (omp_req);
> + if (new_num_devs < 0)

Can this be if (gomp_debug && new_num_devs < 0) - i.e. be verbose only
when the user asks for it?

> +   {
> + bool found = false;
> + int type = current_device.get_type_func ();
> + for (int img = 0; img < num_offload_images; img++)
> +   if (type == offload_images[img].type)
> + found = true;
> + if (found)
> +   {
> + char buf[sizeof ("unified_address, unified_shared_memory, "
> +  "reverse_offload")];
> + gomp_requires_to_name (buf, sizeof (buf), omp_req);
> + char *name = (char *) malloc (cur_len + 1);
> + memcpy (name, cur, cur_len);
> + name[cur_len] = '\0';
> + GOMP_PLUGIN_error ("note: %s devices present but 'omp "
> +"requires %s' cannot be fulfilled",
> +name, buf);
> + free (name);
> +   }
> +   }
> + else if (new_num_devs >= 1)
> {
>   /* Augment DEVICES and NUM_DEVICES.  */
>  

Otherwise LGTM.

Jakub



Re: [PATCH] gcc-descr: by default show revision for HEAD

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 04, 2022 at 12:15:00PM +0200, Martin Liška wrote:
> I've made the mistake multiple times while doing a bisection.
> One would expect that gcc-descr w/o an argument would print
> revision for the current HEAD and not master.
> Thus change it.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> contrib/ChangeLog:
> 
>   * git-descr.sh: By default print revision for HEAD.

Ok.

> ---
>  contrib/git-descr.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/contrib/git-descr.sh b/contrib/git-descr.sh
> index 96a993095bd..28db83abc08 100755
> --- a/contrib/git-descr.sh
> +++ b/contrib/git-descr.sh
> @@ -4,7 +4,7 @@
>  
>  short=no
>  long=no
> -c=master
> +c=HEAD
>  
>  for arg in "$@"
>  do
> -- 
> 2.36.1

Jakub



Re: [Patch] OpenMP/Fortran: Add support for OpenMP 5.2 linear clause syntax

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 04, 2022 at 04:10:03PM +0200, Tobias Burnus wrote:
> This patch adds support for the OpenMP 5.2 syntax for the linear clause,
> following the C/C++ patch. The testcases are modified versions from the
> C/C++ ones, plus one added one for duplicated modifiers.
> 
> At least to me it is not quite clear when
>   linear ( var : ref)
> refers to a variable 'ref' and when to the linear-modifier 'ref'; the
> spec does not seem to be very clear about it. I made an attempt, based

See OpenMP 5.2 [59:31-34]:
A modifier that is an expression must neither lexically match the name of a 
simple modifier
defined for the clause that is an OpenMP keyword nor modifier-name 
parenthesized-tokens,
where modifier-name is the modifier-name of a complex modifier defined for the 
clause and
parenthesized-tokens is a token sequence that starts with ( and ends with ).

So, ref can't be a step expression because it lexically matches the name of a
simple modifier, so linear (var : ref) is equivalent to the old-style
linear (ref (var) : 1), while e.g. linear (var : ref + 0) is equivalent to
linear (var : step (ref + 0)).

> +   else if (end_colon)
> + {
> +   gfc_symtree *st;
> +   bool has_modifiers = false;
> +   bool duplicate_step = false;
> +   bool duplicate_mod = false;
> +   while (true)
> + {
> +   old_loc = gfc_current_locus;
> +   if (gfc_match ("val )") == MATCH_YES)
> + {

So, if you see val ) even right after the colon (when !old_linear_modifiers),
it is always a linear modifier, so the if (!has_modifiers) looks wrong.

> +   if (!has_modifiers)
> + {
> +   gfc_find_sym_tree ("val", NULL, true, &st);
> +   bool has_val = (st
> +   && !st->n.sym->attr.function
> +   && !st->n.sym->attr.dimension);
> +   locus loc = gfc_current_locus;
> +   gfc_current_locus = old_loc;
> +   if (has_val
> +   && gfc_match (" %e ) ", &step) == MATCH_YES)
> + break;
> +   gfc_current_locus = loc;
> + }
> +   if (linear_op != OMP_LINEAR_DEFAULT)
> + {
> +   duplicate_mod = true;
> +   break;
> + }
> +   linear_op = OMP_LINEAR_VAL;
> +   has_modifiers = true;
> +   break;
> + }
> +   else if (gfc_match ("uval )") == MATCH_YES)
> + {

Likewise.

> +   if (!has_modifiers)

> +   else if (gfc_match ("ref )") == MATCH_YES)
> + {

And again.

> +   if (!has_modifiers)

> +   else if (gfc_match ("step ( ") == MATCH_YES)
> + {

step ( could start both a valid step expression and a valid complex modifier.

But that decision shouldn't be based on whether there is a step symtree or
not, but on whether it is step ( whatever ) ) or step ( whatever ) ,
(in those cases it should be parsed as the complex modifier with an
expression in it); otherwise it is parsed as a step expression.

The whatever above means some token sequence with balanced parentheses.

I doubt the Fortran FE has something like that right now.

You can certainly try to match "step ( %e ) )" or "step ( %e ) , " first;
those would handle the case of a valid complex modifier.
But, I think if there is
  interface
integer function step (x, y, z)
  integer :: x, y, z
end function step
  end interface
then
  linear (v : step (x, y, z))
should be rejected, not accepted as valid
  linear (v : step (step (x, y, z)))

I think I should add:
int step (int x, int y, int z) { return x + y + z; }

int
foo (int x)
{
  int i;
  #pragma omp parallel for linear (x : step (step (1, 2, 3)))
  for (i = 0; i < 64; i++)
x += 6;
  return x;
}

int
bar (int x)
{
  int i;
  #pragma omp parallel for linear (x : step (1, 2, 3))  /* { dg-error "expected" } */
  for (i = 0; i < 64; i++)
x += 6;
  return x;
}
as another testcase (where foo used to be invalid before and bar used to be
valid).

Jakub



Re: [PATCH] OpenMP, libgomp: Environment variable syntax extension.

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 30, 2022 at 01:40:24PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Jun 10, 2022 at 03:59:37PM +0200, Marcel Vollweiler wrote:
> > > I'm not sure we can rely on execv on all targets that do support libgomp.
> > > Any reason why you actually need this, rather than using
> > > dg-set-target-env-var directive(s) and perhaps return 0; if getenv doesn't
> > > return the expected values?
> > 
> > Interesting topic. After some (internal) discussions I think the best way 
> > is to
> > set the environment variables explicitely instead using 
> > dg-set-target-env-var.
> > The reason is that dg-set-target-env-var does not work for remote testing 
> > (which
> > seems to be a common test environment). For remote testing dejagnu 
> > immediately
> > aborts the test case with UNSUPPORTED which is specified in the 
> > corresponding
> > extension and makes sense from my point of view as the test assumption 
> > cannot be
> > fulfilled (since the environment variables are not set on remote targets).
> > It also means that whenever dg-set-target-env-var is set in the test file, 
> > the
> > execution of the test case is not tested on remote targets.
> 
> The only reason why dg-set-target-env-var is supported on native only right
> now is that I'm never doing remote testing myself and so couldn't test that.
> There is no inherent reason why the env vars couldn't be propagated over to
> the remote and set in the environment there.
> So trying to work around that rather than at least trying to change
> dg-set-target-env-var so that it works with the remote testing you do looks
> wrong.
> If dg-set-target-env-var can be made to work remotely, it will magically
> improve those 130+ tests that use it already together with the newly added
> tests.
> 
> So, I'd suggest to just use dg-set-target-env-var and incrementally work on
> making it work for remote testing if that is important to whomever does
> that kind of testing.  Could be e.g. a matter of invoking remotely
> env VAR1=val1 VAR2=val2 program args
> instead of program args.  If env is missing on the remote side, it could
> be UNSUPPORTED then.

Essentially, where we now do:
if { [info exists set_target_env_var] \
 && [llength $set_target_env_var] != 0 } {
if { [is_remote target] } {
return [list "unsupported" ""]
}
set-target-env-var
}
in the is_remote case, check (ideally cached) whether the env program works
on the target, using remote exec with the env vars temporarily disabled and
trying something like
env MYVAR=value
If that succeeds and prints MYVAR=value in the output, don't
return [list "unsupported" ""]; instead continue without doing that
set-target-env-var/restore-target-env-var stuff.
After this big block, also wrap call_remote, and if it is exec and the
env vars are around (the same way the intercepted ${tool}_load does
this), prepend env and the vars in VAR=value format before the rest of the args.

Jakub



[PATCH] c++, v3: Add support for __real__/__imag__ modifications in constant expressions [PR88174]

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 27, 2022 at 06:31:18PM +0200, Jakub Jelinek via Gcc-patches wrote:
> Because the late evaluation of the initializer could have touched
> the destination, so we need to reevaluate it.
> Same reason why we call get_or_insert_ctor_field again in the second
> loop as we call it in the first loop.
> If it would help, I could move that repeated part into:

> > This seems like it needs to come before the ctors loop, so that these flags
> > can be propagated out to enclosing constructors.
> 
> I could indeed move this in between
>   bool c = TREE_CONSTANT (init);
>   bool s = TREE_SIDE_EFFECTS (init);
> and
>   if (!c || s || activated_union_member_p)
> and update c and s from *cexpr flags.

Here it is in patch form, so far lightly tested; ok for trunk if it passes
full bootstrap/regtest?

2022-07-04  Jakub Jelinek  

PR c++/88174
* constexpr.cc (canonicalize_complex_to_complex_expr): New function.
(cxx_eval_store_expression): Handle REALPART_EXPR and IMAGPART_EXPR.
Change ctors from releasing_vec to auto_vec<tree *>, adjust all uses.

* g++.dg/cpp1y/constexpr-complex1.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-07-04 12:26:18.147053851 +0200
+++ gcc/cp/constexpr.cc 2022-07-04 17:35:53.100556949 +0200
@@ -5640,6 +5640,26 @@ modifying_const_object_p (tree_code code
   return false;
 }
 
+/* Helper of cxx_eval_store_expression, turn a COMPLEX_CST or
+   empty no clearing CONSTRUCTOR into a COMPLEX_EXPR.  */
+
+static tree
+canonicalize_complex_to_complex_expr (tree t)
+{
+  if (TREE_CODE (t) == COMPLEX_CST)
+t = build2 (COMPLEX_EXPR, TREE_TYPE (t),
+   TREE_REALPART (t), TREE_IMAGPART (t));
+  else if (TREE_CODE (t) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (t) == 0
+  && CONSTRUCTOR_NO_CLEARING (t))
+{
+  tree r = build_constructor (TREE_TYPE (TREE_TYPE (t)), NULL);
+  CONSTRUCTOR_NO_CLEARING (r) = 1;
+  t = build2 (COMPLEX_EXPR, TREE_TYPE (t), r, r);
+}
+  return t;
+}
+
 /* Evaluate an INIT_EXPR or MODIFY_EXPR.  */
 
 static tree
@@ -5726,6 +5746,20 @@ cxx_eval_store_expression (const constex
  }
  break;
 
+   case REALPART_EXPR:
+ gcc_assert (probe == target);
+ vec_safe_push (refs, probe);
+ vec_safe_push (refs, TREE_TYPE (probe));
+ probe = TREE_OPERAND (probe, 0);
+ break;
+
+   case IMAGPART_EXPR:
+ gcc_assert (probe == target);
+ vec_safe_push (refs, probe);
+ vec_safe_push (refs, TREE_TYPE (probe));
+ probe = TREE_OPERAND (probe, 0);
+ break;
+
default:
  if (evaluated)
object = probe;
@@ -5764,7 +5798,8 @@ cxx_eval_store_expression (const constex
   type = TREE_TYPE (object);
   bool no_zero_init = true;
 
-  releasing_vec ctors, indexes;
+  auto_vec<tree *> ctors;
+  releasing_vec indexes;
   auto_vec index_pos_hints;
   bool activated_union_member_p = false;
   bool empty_base = false;
@@ -5804,14 +5839,26 @@ cxx_eval_store_expression (const constex
  *valp = ary_ctor;
}
 
-  /* If the value of object is already zero-initialized, any new ctors for
-subobjects will also be zero-initialized.  */
-  no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
-
   enum tree_code code = TREE_CODE (type);
   tree reftype = refs->pop();
   tree index = refs->pop();
 
+  if (code == COMPLEX_TYPE)
+   {
+ *valp = canonicalize_complex_to_complex_expr (*valp);
+ gcc_assert (TREE_CODE (*valp) == COMPLEX_EXPR);
+ ctors.safe_push (valp);
+ vec_safe_push (indexes, index);
+ valp = &TREE_OPERAND (*valp, TREE_CODE (index) == IMAGPART_EXPR);
+ gcc_checking_assert (refs->is_empty ());
+ type = reftype;
+ break;
+   }
+
+  /* If the value of object is already zero-initialized, any new ctors for
+subobjects will also be zero-initialized.  */
+  no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
+
   if (code == RECORD_TYPE && is_empty_field (index))
/* Don't build a sub-CONSTRUCTOR for an empty base or field, as they
   have no data and might have an offset lower than previously declared
@@ -5854,7 +5901,7 @@ cxx_eval_store_expression (const constex
  no_zero_init = true;
}
 
-  vec_safe_push (ctors, *valp);
+  ctors.safe_push (valp);
   vec_safe_push (indexes, index);
 
   constructor_elt *cep
@@ -5916,11 +5963,11 @@ cxx_eval_store_expression (const constex
 semantics are not applied on an object under construction.
 They come into effect when the constructor for the most
 derived object ends."  */
- for (tree elt : *ctors)
+ for (tree *elt : ctors)
if (same_type_ignoring_top_level_qualifiers_p
-   (TREE_TYPE (const_object_being_modifie

Re: [Patch] OpenMP/Fortran: Add support for OpenMP 5.2 linear clause syntax

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 04, 2022 at 06:09:31PM +0200, Tobias Burnus wrote:
> thanks for the comment & spec referral. I have now updated the patch –
> and included the new C/Fortran testcase.

Thanks.

> +   while (true)
> + {
> +   old_loc = gfc_current_locus;
> +   if (gfc_match ("val )") == MATCH_YES)
> + {
> +   if (linear_op != OMP_LINEAR_DEFAULT)
> + {
> +   duplicate_mod = true;
> +   break;
> + }
> +   linear_op = OMP_LINEAR_VAL;
> +   has_modifiers = true;
> +   break;
> + }
> +   else if (gfc_match ("val , ") == MATCH_YES)
> + {
> +   if (linear_op != OMP_LINEAR_DEFAULT)
> + {
> +   duplicate_mod = true;
> +   break;
> + }
> +   linear_op = OMP_LINEAR_VAL;
> +   has_modifiers = true;
> +   continue;
> + }

Perhaps you could avoid some code duplication by doing it like:
  bool close_paren = gfc_match ("val )") == MATCH_YES;
  if (close_paren || gfc_match ("val , ") == MATCH_YES)
{
  if (linear_op != OMP_LINEAR_DEFAULT)
{
  duplicate_mod = true;
  break;
}
  linear_op = OMP_LINEAR_VAL;
  has_modifiers = true;
  if (close_paren)
break;
  else
continue;
}
and similarly for uval and ref.

> +   else if (!has_modifiers
> +&& gfc_match ("%e )", &step) == MATCH_YES)
> + {
> +   if ((step->expr_type == EXPR_FUNCTION
> + || step->expr_type == EXPR_VARIABLE)
> +   && strcmp (step->symtree->name, "step") == 0)
> + {
> +   gfc_current_locus = old_loc;
> +   gfc_match ("step (");
> +   has_error = true;
> + }

I think the above should accept even
linear (v : step (1) + 0) or linear (v : step (1, 2, 3) * 1)
which is desirable, because step will then be part of some other operation
(I hope folding isn't performed yet).

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/linear-4.f90
> @@ -0,0 +1,102 @@
> +! { dg-do compile }
> +! { dg-options "-fopenmp" }
> +
> +module m
> +implicit none
> +
> +integer :: i
> +
> +interface
> +  integer function bar (x,  y, z)
> +integer :: x, y
> +integer, value :: z
> +!!$omp declare simd linear (x : ref, step (1)) linear (y : step (2), uval)

Are all these !! intentional?
The test then doesn't test much.

Or is that a FIXME?

Jakub



Re: [Patch] OpenMP/Fortran: Add support for OpenMP 5.2 linear clause syntax

2022-07-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 04, 2022 at 08:29:37PM +0200, Tobias Burnus wrote:
> Fortran part to C/C++
> commit r13-1002-g03b71406323ddc065b1d7837d8b43b17e4b048b5
> 
> gcc/fortran/ChangeLog:
> 
>   * gfortran.h (gfc_omp_namelist): Update by creating 'linear' struct,
>   move 'linear_op' as 'op' to id and add 'old_modifier' to it.
>   * dump-parse-tree.cc (show_omp_namelist): Update accordingly.
>   * module.cc (mio_omp_declare_simd): Likewise.
>   * trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
>   * openmp.cc (resolve_omp_clauses): Likewise; accept new-style
>   'val' modifier with do/simd.
>   (gfc_match_omp_clauses): Handle OpenMP 5.2 linear clause syntax.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.texi (OpenMP 5.2): Mark linear-clause change as 'Y'.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/linear-4.c: New test.
>   * gfortran.dg/gomp/linear-2.f90: New test.
>   * gfortran.dg/gomp/linear-3.f90: New test.
>   * gfortran.dg/gomp/linear-4.f90: New test.
>   * gfortran.dg/gomp/linear-5.f90: New test.
>   * gfortran.dg/gomp/linear-6.f90: New test.
>   * gfortran.dg/gomp/linear-7.f90: New test.
>   * gfortran.dg/gomp/linear-8.f90: New test.

Ok, thanks.

Jakub



Re: [PATCH] c++, v3: Add support for __real__/__imag__ modifications in constant expressions [PR88174]

2022-07-05 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 05, 2022 at 04:44:41PM -0400, Jason Merrill wrote:
> On 7/4/22 11:50, Jakub Jelinek wrote:
> > On Mon, Jun 27, 2022 at 06:31:18PM +0200, Jakub Jelinek via Gcc-patches 
> > wrote:
> 
> > > > Hmm, why do we need to handle complex in the !preeval case?  I'd think 
> > > > we
> > > > want to preevaluate all complex values or components thereof.
> 
> > > Because the late evaluation of the initializer could have touched
> > > the destination, so we need to reevaluate it.
> > > Same reason why we call get_or_insert_ctor_field again in the second
> > > loop as we call it in the first loop.
> 
> But preeval should always be true, so we'd never reach the new handling in
> the if (!preeval) block.  Certainly the new testcase doesn't exercise this
> code.

Ah, you're right, in the complex case SCALAR_TYPE_P (type) has to be true
because it is COMPLEX_TYPE and so preeval must be true.
I'll rework the patch then.

Jakub



[PATCH] c++, v4: Add support for __real__/__imag__ modifications in constant expressions [PR88174]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 05, 2022 at 04:44:41PM -0400, Jason Merrill wrote:
> But preeval should always be true, so we'd never reach the new handling in
> the if (!preeval) block.  Certainly the new testcase doesn't exercise this
> code.

Ok, changed now.

I had to keep the ctors[i] = valp; statement in the !preeval block, because
otherwise e.g. 20_util/optional/monadic/transform.cc test fails randomly.
It was wrong already before the patch; it would adjust
TREE_CONSTANT/TREE_SIDE_EFFECTS sometimes on no longer used trees,
but with the vector holding pointers to trees rather than trees it is even
more severe.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-07-27  Jakub Jelinek  

PR c++/88174
* constexpr.cc (cxx_eval_store_expression): Handle REALPART_EXPR
and IMAGPART_EXPR.  Change ctors from releasing_vec to
auto_vec, adjust all uses.  For !preeval, update ctors
vector.

* g++.dg/cpp1y/constexpr-complex1.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-07-04 12:26:18.147053851 +0200
+++ gcc/cp/constexpr.cc 2022-07-26 17:35:53.100556949 +0200
@@ -5726,6 +5726,20 @@ cxx_eval_store_expression (const constex
  }
  break;
 
+   case REALPART_EXPR:
+ gcc_assert (probe == target);
+ vec_safe_push (refs, probe);
+ vec_safe_push (refs, TREE_TYPE (probe));
+ probe = TREE_OPERAND (probe, 0);
+ break;
+
+   case IMAGPART_EXPR:
+ gcc_assert (probe == target);
+ vec_safe_push (refs, probe);
+ vec_safe_push (refs, TREE_TYPE (probe));
+ probe = TREE_OPERAND (probe, 0);
+ break;
+
default:
  if (evaluated)
object = probe;
@@ -5764,7 +5778,8 @@ cxx_eval_store_expression (const constex
   type = TREE_TYPE (object);
   bool no_zero_init = true;
 
-  releasing_vec ctors, indexes;
+  auto_vec<tree *> ctors;
+  releasing_vec indexes;
   auto_vec index_pos_hints;
   bool activated_union_member_p = false;
   bool empty_base = false;
@@ -5804,14 +5819,36 @@ cxx_eval_store_expression (const constex
  *valp = ary_ctor;
}
 
-  /* If the value of object is already zero-initialized, any new ctors for
-subobjects will also be zero-initialized.  */
-  no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
-
   enum tree_code code = TREE_CODE (type);
   tree reftype = refs->pop();
   tree index = refs->pop();
 
+  if (code == COMPLEX_TYPE)
+   {
+ if (TREE_CODE (*valp) == COMPLEX_CST)
+   *valp = build2 (COMPLEX_EXPR, type, TREE_REALPART (*valp),
+   TREE_IMAGPART (*valp));
+ else if (TREE_CODE (*valp) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (*valp) == 0
+  && CONSTRUCTOR_NO_CLEARING (*valp))
+   {
+ tree r = build_constructor (reftype, NULL);
+ CONSTRUCTOR_NO_CLEARING (r) = 1;
+ *valp = build2 (COMPLEX_EXPR, type, r, r);
+   }
+ gcc_assert (TREE_CODE (*valp) == COMPLEX_EXPR);
+ ctors.safe_push (valp);
+ vec_safe_push (indexes, index);
+ valp = &TREE_OPERAND (*valp, TREE_CODE (index) == IMAGPART_EXPR);
+ gcc_checking_assert (refs->is_empty ());
+ type = reftype;
+ break;
+   }
+
+  /* If the value of object is already zero-initialized, any new ctors for
+subobjects will also be zero-initialized.  */
+  no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
+
   if (code == RECORD_TYPE && is_empty_field (index))
/* Don't build a sub-CONSTRUCTOR for an empty base or field, as they
   have no data and might have an offset lower than previously declared
@@ -5854,7 +5891,7 @@ cxx_eval_store_expression (const constex
  no_zero_init = true;
}
 
-  vec_safe_push (ctors, *valp);
+  ctors.safe_push (valp);
   vec_safe_push (indexes, index);
 
   constructor_elt *cep
@@ -5916,11 +5953,11 @@ cxx_eval_store_expression (const constex
 semantics are not applied on an object under construction.
 They come into effect when the constructor for the most
 derived object ends."  */
- for (tree elt : *ctors)
+ for (tree *elt : ctors)
if (same_type_ignoring_top_level_qualifiers_p
-   (TREE_TYPE (const_object_being_modified), TREE_TYPE (elt)))
+   (TREE_TYPE (const_object_being_modified), TREE_TYPE (*elt)))
  {
-   fail = TREE_READONLY (elt);
+   fail = TREE_READONLY (*elt);
break;
  }
}
@@ -5961,6 +5998,7 @@ cxx_eval_store_expression (const constex
   valp = ctx->global->values.get (object);
   for (unsigned i = 0; i < vec_safe_length (indexes); i++)
{
+ ctors[i] = valp;
  constructor_elt *cep
= get_or_insert_ctor_field (*valp, indexes[i], index_pos_hints[i]);
  valp = &cep->

[PATCH] opts: Add an assertion to help static analyzers [PR106332]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This function would have UB if called with an empty candidates vector
(accessing p[-1] where p is the result of malloc (0)).
As analyzed in the PR, we never call it with an empty vector, so this just
adds an assertion to make it clear.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-07-27  Jakub Jelinek  

PR middle-end/106332
* opts-common.cc (candidates_list_and_hint): Add gcc_checking_assert
that candidates is not an empty vector.

--- gcc/opts-common.cc.jj   2022-02-04 14:36:55.439599237 +0100
+++ gcc/opts-common.cc  2022-07-26 11:21:49.785919993 +0200
@@ -1347,6 +1347,8 @@ candidates_list_and_hint (const char *ar
   const char *candidate;
   char *p;
 
+  gcc_checking_assert (!candidates.is_empty ());
+
   FOR_EACH_VEC_ELT (candidates, i, candidate)
 len += strlen (candidate) + 1;
 

Jakub



[PATCH] gimple, internal-fn: Add IFN_TRAP and use it for __builtin_unreachable [PR106099]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

__builtin_unreachable and __ubsan_handle_builtin_unreachable don't
use vops; they are marked const/leaf/noreturn/nothrow/cold.
But __builtin_trap uses vops and isn't const, just leaf/noreturn/nothrow/cold.
This is, I believe, so that when users explicitly use __builtin_trap in their
sources they get stores visible at the trap site.
-fsanitize=unreachable -fsanitize-undefined-trap-on-error used to transform
__builtin_unreachable to __builtin_trap even in the past, but the sanopt pass
has TODO_update_ssa, so it worked fine.

Now that gimple_build_builtin_unreachable can build a __builtin_trap call
right away, we can run into problems: whenever we need it, we would need to
ensure, either manually or through TODO_update*, that the vops are updated.

Though, as it is originally a __builtin_unreachable that is just implemented
as a trap, I think it is fine to avoid vops in this case.  For this the
patch introduces IFN_TRAP, which has ECF_* flags like __builtin_unreachable
and is expanded as __builtin_trap.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-07-27  Jakub Jelinek  

PR tree-optimization/106099
* internal-fn.def (TRAP): New internal fn.
* internal-fn.h (expand_TRAP): Declare.
* internal-fn.cc (expand_TRAP): Define.
* gimple.cc (gimple_build_builtin_unreachable): For BUILT_IN_TRAP,
use internal fn rather than builtin.

* gcc.dg/ubsan/pr106099.c: New test.

--- gcc/internal-fn.def.jj  2022-07-26 10:32:23.886269144 +0200
+++ gcc/internal-fn.def 2022-07-26 11:40:41.799927048 +0200
@@ -456,6 +456,10 @@ DEF_INTERNAL_FN (SHUFFLEVECTOR, ECF_CONS
 /* <=> optimization.  */
 DEF_INTERNAL_FN (SPACESHIP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* __builtin_trap created from/for __builtin_unreachable.  */
+DEF_INTERNAL_FN (TRAP, ECF_CONST | ECF_LEAF | ECF_NORETURN
+  | ECF_NOTHROW | ECF_COLD, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
--- gcc/internal-fn.h.jj2022-06-16 10:56:28.945385251 +0200
+++ gcc/internal-fn.h   2022-07-26 11:45:50.483837472 +0200
@@ -242,6 +242,7 @@ extern void expand_internal_call (intern
 extern void expand_PHI (internal_fn, gcall *);
 extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
 extern void expand_SPACESHIP (internal_fn, gcall *);
+extern void expand_TRAP (internal_fn, gcall *);
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
--- gcc/internal-fn.cc.jj   2022-07-26 10:32:23.885269157 +0200
+++ gcc/internal-fn.cc  2022-07-26 11:42:02.611856420 +0200
@@ -4494,3 +4494,9 @@ expand_SPACESHIP (internal_fn, gcall *st
   if (!rtx_equal_p (target, ops[0].value))
 emit_move_insn (target, ops[0].value);
 }
+
+void
+expand_TRAP (internal_fn, gcall *)
+{
+  expand_builtin_trap ();
+}
--- gcc/gimple.cc.jj2022-06-27 11:18:02.680058429 +0200
+++ gcc/gimple.cc   2022-07-26 11:57:17.049760135 +0200
@@ -430,7 +430,16 @@ gimple_build_builtin_unreachable (locati
 {
   tree data = NULL_TREE;
   tree fn = sanitize_unreachable_fn (&data, loc);
-  gcall *g = gimple_build_call (fn, data != NULL_TREE, data);
+  gcall *g;
+  if (DECL_FUNCTION_CODE (fn) != BUILT_IN_TRAP)
+g = gimple_build_call (fn, data != NULL_TREE, data);
+  else
+{
+  /* Instead of __builtin_trap use .TRAP, so that it doesn't
+need vops.  */
+  gcc_checking_assert (data == NULL_TREE);
+  g = gimple_build_call_internal (IFN_TRAP, 0);
+}
   gimple_set_location (g, loc);
   return g;
 }
--- gcc/testsuite/gcc.dg/ubsan/pr106099.c.jj 2022-07-26 12:22:26.248156163 +0200
+++ gcc/testsuite/gcc.dg/ubsan/pr106099.c 2022-07-26 11:34:25.660909186 +0200
@@ -0,0 +1,10 @@
+/* PR tree-optimization/106099 */
+/* { dg-do compile } */
/* { dg-options "-O -fsanitize=unreachable -fsanitize-undefined-trap-on-error -fno-tree-ccp -fno-tree-dominator-opts" } */
+
+void
+foo (void)
+{
+  for (unsigned i = 0; i == 0; i++)
+;
+}

Jakub



[PATCH] libstdc++: Outline the overlapping case of string _M_replace into a separate function [PR105329]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch is partly a workaround for bogus warnings when the
compiler isn't able to fold the _M_disjunct call into constant false,
but also an optimization attempt: assuming _M_disjunct (__s) is rare,
the patch should shrink code size for the common case and use the
library (or, for non-standard instantiations, an out-of-line function)
to handle the rare case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-07-27  Jakub Jelinek  

PR tree-optimization/105329
* acinclude.m4 (libtool_VERSION): Change to 6:31:0.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Don't export
std::basic_string methods with name length of 15.
(GLIBCXX_3.4.31): Export std::basic_string::_M_replace_cold.
* testsuite/util/testsuite_abi.cc (check_version): Handle
GLIBCXX_3.4.31.
* include/bits/basic_string.h (std::basic_string::_M_replace_cold):
Declare.
* include/bits/basic_string.tcc (std::basic_string::_M_replace_cold):
Define and export even for C++20.
(std::basic_string::_M_replace): Use __builtin_expect, outline
the overlapping case to _M_replace_cold.
* configure: Regenerated.

--- libstdc++-v3/acinclude.m4.jj 2022-06-27 11:18:03.102052967 +0200
+++ libstdc++-v3/acinclude.m4   2022-07-26 13:15:51.187572177 +0200
@@ -3821,7 +3821,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:30:0
+libtool_VERSION=6:31:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
--- libstdc++-v3/config/abi/pre/gnu.ver.jj 2022-03-17 09:22:50.135144471 +0100
+++ libstdc++-v3/config/abi/pre/gnu.ver 2022-07-26 13:12:13.098402944 +0200
@@ -1736,7 +1736,7 @@ GLIBCXX_3.4.21 {
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_M*;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE13*;
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE14_M_replace_aux*;
-_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE1[568-9]*;
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE1[68-9]*;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE2at*;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE3end*;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE4back*;
@@ -2444,6 +2444,10 @@ GLIBCXX_3.4.30 {
 
 } GLIBCXX_3.4.29;
 
+GLIBCXX_3.4.31 {
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE15_M_replace_cold*;
+} GLIBCXX_3.4.30;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
--- libstdc++-v3/testsuite/util/testsuite_abi.cc.jj 2022-03-17 09:22:50.135144471 +0100
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc 2022-07-26 13:16:44.042886118 +0200
@@ -211,6 +211,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.28");
   known_versions.push_back("GLIBCXX_3.4.29");
   known_versions.push_back("GLIBCXX_3.4.30");
+  known_versions.push_back("GLIBCXX_3.4.31");
   known_versions.push_back("GLIBCXX_LDBL_3.4.29");
   known_versions.push_back("GLIBCXX_IEEE128_3.4.29");
   known_versions.push_back("GLIBCXX_IEEE128_3.4.30");
@@ -247,7 +248,7 @@ check_version(symbol& test, bool added)
test.version_status = symbol::incompatible;
 
   // Check that added symbols are added in the latest pre-release version.
-  bool latestp = (test.version_name == "GLIBCXX_3.4.30"
+  bool latestp = (test.version_name == "GLIBCXX_3.4.31"
  // XXX remove next line when baselines have been regenerated.
 || test.version_name == "GLIBCXX_IEEE128_3.4.30"
 || test.version_name == "CXXABI_1.3.13"
--- libstdc++-v3/include/bits/basic_string.h.jj 2022-06-17 11:08:11.971491626 +0200
+++ libstdc++-v3/include/bits/basic_string.h 2022-07-26 12:42:38.008440082 +0200
@@ -2504,6 +2504,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _M_replace_aux(size_type __pos1, size_type __n1, size_type __n2,
 _CharT __c);
 
+  __attribute__((__noinline__, __noclone__, __cold__)) void
+  _M_replace_cold(pointer __p, size_type __len1, const _CharT* __s,
+ const size_type __len2, const size_type __how_much);
+
   _GLIBCXX20_CONSTEXPR
   basic_string&
   _M_replace(size_type __pos, size_type __len1, const _CharT* __s,
--- libstdc++-v3/include/bits/basic_string.tcc.jj 2022-06-15 10:43:46.796946222 +0200
+++ libstdc++-v3/include/bits/basic_string.tcc 2022-07-26 13:43:35.293012051 +0200
@@ -471,6 +471,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
  template<typename _CharT, typename _Traits, typename _Alloc>
+__attribute__((__noinline__, __noclone__, __cold__)) void
+basic_string<_CharT, _Traits, _Alloc>::
+_M_replace_cold(pointer __p, size_type __len1, const _CharT* __s,
+   const size_type __len2, const size_type __ho

[PATCH] cgraphunit: Don't emit asm thunks for -dx [PR106261]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

When the -dx option is used (I didn't know we had it and have no idea what
it is useful for), we just expand functions to RTL and then omit all further
RTL passes, so normal functions aren't actually emitted into assembly,
just variables.
The following testcase ICEs because we don't emit the methods, but do emit
thunks pointing to them.  Those thunks have unwind info and rely on at least
some real functions being emitted (which is normally the case, as thunks are
only emitted for locally defined functions), because otherwise there are no
CIEs, only FDEs, and dwarf2out is upset about that.

The following patch fixes that by not emitting assembly thunks for -dx
either.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-07-27  Jakub Jelinek  

PR debug/106261
* cgraphunit.cc (cgraph_node::assemble_thunks_and_aliases): Don't
output asm thunks for -dx.

* g++.dg/debug/pr106261.C: New test.

--- gcc/cgraphunit.cc.jj	2022-06-27 11:18:02.048066608 +0200
+++ gcc/cgraphunit.cc   2022-07-26 16:01:38.696956950 +0200
@@ -1753,7 +1753,7 @@ cgraph_node::assemble_thunks_and_aliases
cgraph_node *thunk = e->caller;
 
e = e->next_caller;
-   expand_thunk (thunk, true, false);
+   expand_thunk (thunk, !rtl_dump_and_exit, false);
thunk->assemble_thunks_and_aliases ();
   }
 else
--- gcc/testsuite/g++.dg/debug/pr106261.C.jj	2022-07-26 15:59:04.082979550 +0200
+++ gcc/testsuite/g++.dg/debug/pr106261.C	2022-07-26 15:58:37.301329916 +0200
@@ -0,0 +1,36 @@
+// PR debug/106261
+// { dg-do compile }
+// { dg-options "-dx -fno-dwarf2-cfi-asm" }
+
+struct A
+{
+  virtual void foo ();
+  int a;
+};
+class C : virtual public A
+{
+};
+struct B
+{
+  A *b;
+
+  B (A *x) : b (x) { b->foo (); }
+};
+struct E
+{
+  virtual ~E ();
+};
+class D : public C, E
+{
+};
+struct F : D
+{
+  F (int);
+
+  static void bar ()
+  {
+F a (0);
+B b (&a);
+  }
+};
+void baz () { F::bar (); }

Jakub



Re: [PATCH] gimple, internal-fn: Add IFN_TRAP and use it for __builtin_unreachable [PR106099]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 27, 2022 at 09:33:47AM +, Richard Biener wrote:
> > __builtin_unreachable and __ubsan_handle_builtin_unreachable don't
> > use vops, they are marked const/leaf/noreturn/nothrow/cold.
> > But __builtin_trap uses vops, isn't const, just leaf/noreturn/nothrow/cold.
> > This is I believe so that when users explicitly use __builtin_trap in their
> > sources they get stores visible at the trap side.
> > -fsanitize=unreachable -fsanitize-undefined-trap-on-error used to transform
> > __builtin_unreachable to __builtin_trap even in the past, but the sanopt 
> > pass
> > has TODO_update_ssa, so it worked fine.
> > 
> > Now that gimple_build_builtin_unreachable can build a __builtin_trap call
> > right away, we can run into problems that whenever we need it we would need
> > to either manually or through TODO_update* ensure the vops being updated.
> > 
> > Though, as it is originally __builtin_unreachable which is just implemented
> > as trap, I think for this case it is fine to avoid vops.  For this the
> > patch introduces IFN_TRAP, which has ECF_* flags like __builtin_unreachable
> > and is expanded as __builtin_trap.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> I think for the sake of sanitizing unreachable as trap this is OK
> but it seems this isn't actually what is done.

We chose to sanitize not just explicit user __builtin_unreachable calls,
but also the internally generated __builtin_unreachable calls (with the
one exception of falling through to the end of a C++ function returning
non-void, which before had a separate sanitizer), and we've been doing it
since 2013 when ubsan was added.
Even for the internally generated unreachable calls like those from
devirtualization or other reasons like ivcanon/unrolling, having the
possibility to get some runtime diagnostics or trap can be useful over
just falling through to random following code.
Previously we'd always emit __builtin_unreachable, then perhaps in some
cases could e.g. optimize it away (say if there is a guarding condition
around the implicitly added unreachable turning the condition into VRP
info and optimizing the conditional away), otherwise the sanopt pass
would turn those __builtin_unreachable calls into __builtin_trap.
With the recent changes, we don't run the sanopt pass when only
doing -fsanitize=unreachable (or -funreachable-traps) though, so we need
to emit the trap/__ubsan_handle_unreachable/__builtin_unreachable right
away.

Jakub



[committed] testsuite: Add extra ia32 options so that -fprefetch-loop-arrays works [PR106397]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

-fprefetch-loop-arrays isn't supported on ia32 with just -march=i386 and
similar; the following patch adds the extra options that similar testcases use.

2022-07-27  Jakub Jelinek  

PR tree-optimization/106397
* gcc.dg/pr106397.c: For ia32, add dg-additional-options
-march=i686 -msse.

--- gcc/testsuite/gcc.dg/pr106397.c.jj  2022-07-26 10:32:23.962268163 +0200
+++ gcc/testsuite/gcc.dg/pr106397.c 2022-07-27 10:55:49.492975896 +0200
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fprefetch-loop-arrays --param l2-cache-size=0 --param prefetch-latency=3 -fprefetch-loop-arrays" } */
+/* { dg-additional-options "-march=i686 -msse" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
 
 int
 bar (void)

Jakub



[committed] testsuite: Add -Wno-psabi to pr94920 tests [PR94920]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

These tests fail on ia32 because we get -Wpsabi warnings.
Fixed by adding -Wno-psabi.  The pr94920.C test still fails the
ABS_EXPR scan-tree-dump though; I think we'll need to add vect
options and use the vect_int effective target or something similar.

2022-07-27  Jakub Jelinek  

PR tree-optimization/94920
* g++.dg/pr94920.C: Add -Wno-psabi to dg-options.
* g++.dg/pr94920-1.C: Add dg-additional-options -Wno-psabi.

--- gcc/testsuite/g++.dg/pr94920.C.jj   2022-07-26 10:32:23.957268227 +0200
+++ gcc/testsuite/g++.dg/pr94920.C  2022-07-27 10:59:21.390147571 +0200
@@ -1,6 +1,6 @@
 /* PR tree-optimization/94920 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
 
 typedef int __attribute__((vector_size(4*sizeof(int)))) vint;
 
--- gcc/testsuite/g++.dg/pr94920-1.C.jj 2022-07-26 10:32:23.957268227 +0200
+++ gcc/testsuite/g++.dg/pr94920-1.C	2022-07-27 10:58:40.451693998 +0200
@@ -1,4 +1,5 @@
 /* PR tree-optimization/94920 */
+/* { dg-additional-options "-Wno-psabi" } */
 /* { dg-do run } */
 
 #include "pr94920.C"

Jakub



Re: [PATCH] gimple, internal-fn: Add IFN_TRAP and use it for __builtin_unreachable [PR106099]

2022-07-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 27, 2022 at 10:09:34AM +, Richard Biener wrote:
> > We chose to sanitize not just explicit user __builtin_unreachable calls,
> > but also the internally generated __builtin_unreachable calls (with the
> > one exception of fall through to end of C++ function returning non-void,
> > which had before a separate sanitizer) and we've been doing it since 2013
> > when ubsan was added.
> > Even for the internally generated unreachable calls like those from
> > devirtualization or other reasons like ivcanon/unrolling, having the
> > possibility to get some runtime diagnostics or trap can be useful over
> > just falling through to random following code.
> 
> So at least for the unrolling use the intent is to have the
> unreachable () fully elided by later passes.  Honza can correct me
> if I'm wrong.  Using __builtin_trap from the start until sanopt
> may prevent some of that from happening, keeping dead conditions
> live, no?

That is true.
I guess changing the sanopt gate back and building __builtin_unreachable
only in the ivcanon/unrolling case is possible too.

Either way, the advantage of this patch would be that we wouldn't need
to recompute vops if we replace unreachables with traps during the
sanopt pass.
> 
> > Previously we'd always emit __builtin_unreachable, then perhaps in some
> > cases could e.g. optimize it away (say if there is a guarding condition
> > around the implicitly added unreachable turning the condition into VRP
> > info and optimizing the conditional away), otherwise the sanopt pass
> > would turn those __builtin_unreachable calls into __builtin_trap.
> > With the recent changes, we don't run the sanopt pass when only
> > doing -fsanitize=unreachable (or -funreachable-traps) though, so we need
> > to emit the trap/__ubsan_handle_unreachable/__builtin_unreachable right
> > away.
> 
> Why did the recent changes not just replace __builtin_unreachable
> at RTL expansion time?  Was the intent really to force the paths
> to be kept live?  I can see that for user or frontend generated
> unreachables but not so much for some of the middle-end ones.

It is easier on GIMPLE and perhaps allows e.g. sharing the data for
__ubsan_handle_unreachable calls.  The sanopt pass is quite late anyway.

Jakub



Re: [PATCH Rust front-end v1 2/4] Add Rust lang TargetHooks for i386 and x86_64

2022-07-28 Thread Jakub Jelinek via Gcc-patches
Note: the ChangeLog entry formatting is wrong, and in various places
the code formatting is as well.  In gcc/rust/ you can choose different
formatting if there are strong reasons for that, but at least it should be
consistent and ideally documented.

None of the gcc/doc, gcc/config, gcc/config/i386 directories have
ChangeLog files, so all this should go into gcc/ChangeLog
and filenames should be relative to the gcc/ directory, so
doc/tm.texi.in, config/default-rust.cc, config/i386/crtdll.h etc.

After the ':' there should be a capital letter, and the description of
what changed should end with a dot.
Where possible, before the ':' there should be a space, then '(' followed
by what has changed, followed by ')'.
When some file is regenerated, just write "Regenerated." or so.
When a new file is added, just write "New." or "New file.".
On Wed, Jul 27, 2022 at 02:40:38PM +0100, herron.philip--- via Gcc-patches wrote:
> gcc/doc/ChangeLog:
> 
> * tm.texi.in: specifiy hooks for rust target info
> * tm.texi: commit the generated documentation
> 
> gcc/ChangeLog:
> 
> * Makefile.in: add target to generate rust hooks
> * config.gcc: add rust target interface to be compiled for i386
>   * genhooks.cc: generate rust target hooks
>   * configure: autoconf update
>   * configure.ac: add tm_rust_file_list and tm_rust_include_list
> 
> gcc/config/ChangeLog:
> 
> * default-rust.cc: new target hooks initializer for rust
> * gnu.h: add new macro GNU_USER_TARGET_RUST_OS_INFO
>   * dragonfly.h: define TARGET_RUST_OS_INFO
>   * freebsd-spec.h: define FBSD_TARGET_RUST_OS_INFO
>   * freebsd.h: define guard for TARGET_RUST_OS_INFO
>   * fuchsia.h: define TARGET_RUST_OS_INFO
>   * kfreebsd-gnu.h: define GNU_USER_TARGET_RUST_OS_INFO
>   * kopensolaris-gnu.h: define GNU_USER_TARGET_RUST_OS_INFO
>   * linux-android.h: define ANDROID_TARGET_RUST_OS_INFO
>   * linux.h: define GNU_USER_TARGET_RUST_OS_INFO
>   * netbsd.h: define NETBSD_TARGET_RUST_OS_INFO
>   * openbsd.h: define OPENBSD_TARGET_RUST_OS_INFO
>   * phoenix.h: define TARGET_RUST_OS_INFO
>   * sol2.h: define TARGET_RUST_OS_INFO
>   * vxworks.h: define VXWORKS_TARGET_RUST_OS_INFO
>   * vxworksae.h: define VXWORKS_TARGET_RUST_OS_INFO
> 
> gcc/config/i386/ChangeLog:
> 
> * crtdll.h: define EXTRA_TARGET_RUST_OS_INFO
> * cygming.h: define TARGET_RUST_OS_INFO
> * cygwin.h: define EXTRA_TARGET_RUST_OS_INFO
> * darwin.h: define TARGET_RUST_OS_INFO
> * djgpp.h: likewise
> * gnu-user-common.h: define TARGET_RUST_OS_INFO
> * i386-protos.h: prototype for ix86_rust_target_cpu_info
> * i386-rust.cc: new file to generate the rust target host info
> * i386.h: define TARGET_RUST_CPU_INFO hook
> * linux-common.h: define hooks for target info
> * lynx.h: likewise
> * mingw32.h: likewise
> * netbsd-elf.h: likewise
> * netbsd64.h: likewise
> * nto.h: likewise
> * openbsdelf.h: likewise
> * rdos.h: likewise
> * rtemself.h: likewise
> * t-i386: add makefilke rule for i386-rust.cc
> * vxworks.h: define TARGET_RUST_OS_INFO
> 
> gcc/rust/ChangeLog:
> 
> * rust-target-def.h: define the headers to access the hooks
> * rust-target.def: define the hooks nessecary based on C90
> * rust-target.h: define extern gcc_targetrustm
> +/* Implement TARGET_RUST_CPU_INFO for x86 targets.  */
> +
> +void
> +ix86_rust_target_cpu_info (void)
> +{
> +if (TARGET_64BIT) {
> +rust_add_target_info("target_arch", "x86_64");

The indentation should be just 2 columns rather than 4, and
{ doesn't go at the end of the line.  Single-statement if/else/etc.
bodies aren't wrapped in {}s.  There is a space before (.
So
  if (TARGET_64BIT)
{
  rust_add_target_info ("target_arch", "x86_64");
  ...
}
  else
rust_add_target_info ("target_arch", "x86");

> +  // nopl: hard-coded (as gcc doesn't technically have feature) to return 
> true for cpu arches with it
> +  // maybe refactor into switch if multiple options
> +  bool hasNOPL = ix86_arch == PROCESSOR_PENTIUMPRO || ix86_arch == 
> PROCESSOR_PENTIUM4 
> +|| ix86_arch == PROCESSOR_NOCONA || ix86_arch == PROCESSOR_CORE2 || 
> ix86_arch == PROCESSOR_NEHALEM 

Some lines are too long.

Jakub



[committed] openmp: Simplify fold_build_pointer_plus callers in omp-expand

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

Tobias mentioned in PR106449 that fold_build_pointer_plus already
fold_converts the second argument to sizetype if it doesn't already
have an integral type that is gimple-compatible with sizetype.

So, this patch simplifies the callers of fold_build_pointer_plus in
omp-expand so that they don't do those conversions manually.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

* omp-expand.cc (expand_omp_for_init_counts, expand_omp_for_init_vars,
extract_omp_for_update_vars, expand_omp_for_ordered_loops,
expand_omp_simd): Don't fold_convert second argument to
fold_build_pointer_plus to sizetype.

--- gcc/omp-expand.cc.jj	2022-06-28 13:03:30.930689700 +0200
+++ gcc/omp-expand.cc   2022-07-28 10:47:03.138939297 +0200
@@ -2267,8 +2267,7 @@ expand_omp_for_init_counts (struct omp_f
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[i].m1));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[i].n1));
+ t = unshare_expr (fd->loops[i].n1);
  n1 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
}
  else
@@ -2291,8 +2290,7 @@ expand_omp_for_init_counts (struct omp_f
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[i].m2));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[i].n2));
+ t = unshare_expr (fd->loops[i].n2);
  n2 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
}
  else
@@ -2353,8 +2351,7 @@ expand_omp_for_init_counts (struct omp_f
  tree step = fold_convert (itype,
unshare_expr (fd->loops[i].step));
  if (POINTER_TYPE_P (TREE_TYPE (vs[i])))
-   t = fold_build_pointer_plus (vs[i],
-fold_convert (sizetype, step));
+   t = fold_build_pointer_plus (vs[i], step);
  else
t = fold_build2 (PLUS_EXPR, itype, vs[i], step);
  t = force_gimple_operand_gsi (&gsi2, t, true, NULL_TREE,
@@ -2794,8 +2791,7 @@ expand_omp_for_init_vars (struct omp_for
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[j].m1));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[j].n1));
+ t = unshare_expr (fd->loops[j].n1);
  n1 = fold_build_pointer_plus (vs[j - fd->loops[j].outer], t);
}
  else
@@ -2818,8 +2814,7 @@ expand_omp_for_init_vars (struct omp_for
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[j].m2));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[j].n2));
+ t = unshare_expr (fd->loops[j].n2);
  n2 = fold_build_pointer_plus (vs[j - fd->loops[j].outer], t);
}
  else
@@ -2895,8 +2890,7 @@ expand_omp_for_init_vars (struct omp_for
  tree step
= fold_convert (itype, unshare_expr (fd->loops[j].step));
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (vs[j], fold_convert (sizetype,
- step));
+   t = fold_build_pointer_plus (vs[j], step);
  else
t = fold_build2 (PLUS_EXPR, itype, vs[j], step);
}
@@ -2959,8 +2953,7 @@ expand_omp_for_init_vars (struct omp_for
= fold_convert (itype, unshare_expr (fd->loops[j].step));
  t = fold_build2 (MULT_EXPR, itype, t, t2);
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (n1,
-fold_convert (sizetype, t));
+   t = fold_build_pointer_plus (n1, t);
  else
t = fold_build2 (PLUS_EXPR, itype, n1, t);
}
@@ -2970,8 +2963,7 @@ expand_omp_for_init_vars (struct omp_for
  t = fold_build2 (MULT_EXPR, itype, t,
   fold_convert (itype, fd->loops[j].step));
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (fd->loops[j].n1,
-fold_convert (sizetype, t));
+   t = fold_build_pointer_plus (fd->loops[j].n1, t);
  else
t = fold_build2 (PLUS_EXPR, itype, fd->loops[j].n1, t);
}
@@ -3035,9 +3027,8 @@ expand_omp_for

[committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449]

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

There were two issues visible on this new testcase.  One was that we didn't
have special POINTER_TYPE_P handling in a few spots of expand_omp_simd: for
pointers we need to use POINTER_PLUS_EXPR and have the non-pointer part in
sizetype, while for non-rectangular loops we can on the other hand rely on a
multiplication factor of 1, as pointers can't be multiplied; without those
changes we'd ICE.  The other issue was that we put the n2 expression
directly into a comparison in a condition and regimplified that; for the
&a[512] case, with gimplification being destructive, that unfortunately
meant modification of the original fd->loops[?].n2.  Fixed by unsharing the
expression.  This was causing a runtime failure on the testcase.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

PR middle-end/106449
* omp-expand.cc (expand_omp_simd): Fix up handling of pointer
iterators in non-rectangular simd loops.  Unshare fd->loops[i].n2
or n2 before regimplifying it inside of a condition.

* testsuite/libgomp.c-c++-common/pr106449.c: New test.

--- gcc/omp-expand.cc.jj	2022-06-28 13:03:30.930689700 +0200
+++ gcc/omp-expand.cc   2022-07-28 10:47:03.138939297 +0200
@@ -6714,7 +6696,7 @@ expand_omp_simd (struct omp_region *regi
   if (fd->loops[i].m2)
t = n2v = create_tmp_var (itype);
   else
-   t = fold_convert (itype, fd->loops[i].n2);
+   t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
   t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
   tree v = fd->loops[i].v;
@@ -6728,7 +6710,7 @@ expand_omp_simd (struct omp_region *regi
   if (fd->collapse > 1 && !broken_loop)
t = n2var;
   else
-   t = fold_convert (type, n2);
+   t = fold_convert (type, unshare_expr (n2));
   t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
   tree v = fd->loop.v;
@@ -6840,7 +6819,7 @@ expand_omp_simd (struct omp_region *regi
  if (fd->loops[i].m2)
t = nextn2v = create_tmp_var (itype);
  else
-   t = fold_convert (itype, fd->loops[i].n2);
+   t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
  tree v = fd->loops[i].v;
@@ -6870,17 +6849,25 @@ expand_omp_simd (struct omp_region *regi
  ne->probability = e->probability.invert ();
 
  gsi = gsi_after_labels (init_bb);
- t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
-   fd->loops[i + 1].n1);
  if (fd->loops[i + 1].m1)
{
- tree t2 = fold_convert (TREE_TYPE (t),
+ tree t2 = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
  fd->loops[i + 1
- fd->loops[i + 1].outer].v);
- tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m1);
- t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
- t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+ if (POINTER_TYPE_P (TREE_TYPE (t2)))
+   t = fold_build_pointer_plus (t2, fd->loops[i + 1].n1);
+ else
+   {
+ t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
+   fd->loops[i + 1].n1);
+ tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m1);
+ t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
+ t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+   }
}
+ else
+   t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
+ fd->loops[i + 1].n1);
  expand_omp_build_assign (&gsi, fd->loops[i + 1].v, t);
  if (fd->loops[i + 1].m2)
{
@@ -6889,14 +6876,19 @@ expand_omp_simd (struct omp_region *regi
  gcc_assert (n2v == NULL_TREE);
  n2v = create_tmp_var (TREE_TYPE (fd->loops[i + 1].v));
}
- t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
-   fd->loops[i + 1].n2);
- tree t2 = fold_convert (TREE_TYPE (t),
+ tree t2 = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
  fd->loops[i + 1
- fd->loops[i + 1].outer].v);
- tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m2);
- t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
- t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+ if (POINTER_TYPE_P (TREE_TYPE (t2)))
+   t = fold_build_pointer_plus (t2, fd->loops[i + 1].n2);
+  

[committed] openmp: Reject invalid forms of C++ #pragma omp atomic compare [PR106448]

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

The allowed syntaxes of atomic compare don't allow ()s around the condition
of ?:, but we were accepting it in one case for C++.

Fixed thusly.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

PR c++/106448
* parser.cc (cp_parser_omp_atomic): For simple cast followed by
CPP_QUERY token, don't try cp_parser_binary_operation if compare
is true.

* c-c++-common/gomp/atomic-32.c: New test.

--- gcc/cp/parser.cc.jj 2022-07-26 10:32:23.557273390 +0200
+++ gcc/cp/parser.cc	2022-07-28 15:12:44.288195066 +0200
@@ -41535,7 +41535,9 @@ restart:
  goto saw_error;
}
  token = cp_lexer_peek_token (parser->lexer);
- if (token->type != CPP_SEMICOLON && !cp_tree_equal (lhs, rhs1))
+ if (token->type != CPP_SEMICOLON
+ && (!compare || token->type != CPP_QUERY)
+ && !cp_tree_equal (lhs, rhs1))
{
  cp_parser_abort_tentative_parse (parser);
  cp_parser_parse_tentatively (parser);
--- gcc/testsuite/c-c++-common/gomp/atomic-32.c.jj	2022-07-28 15:23:54.567237524 +0200
+++ gcc/testsuite/c-c++-common/gomp/atomic-32.c	2022-07-28 15:25:25.127027325 +0200
@@ -0,0 +1,14 @@
+/* PR c++/106448 */
+
+int x, expr;
+  
+void
+foo (void)
+{
+  #pragma omp atomic compare
+  x = (expr > x) ? expr : x;   /* { dg-error "invalid (form|operator)" } */
+  #pragma omp atomic compare
+  x = (x < expr) ? expr : x;   /* { dg-error "invalid (form|operator)" } */
+  #pragma omp atomic compare
+  x = (x == expr) ? expr : x;  /* { dg-error "invalid (form|operator)" } */
+}

Jakub



Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches wrote:
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?

if conversion (in some cases) duplicates a loop and guards one copy with
an ifn which resolves to true if that particular loop is vectorized and
false otherwise.  So changes that shouldn't be done in case of
vectorization failure can be done on the vectorizer-only copy of the
loop.
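Schematically (not literal dump output; details from memory and may differ), the versioning looks roughly like:

```
if (.LOOP_VECTORIZED (1, 2))
  {
    /* if-converted copy of the loop: extra lowering that is only
       valid under vectorization can be done here, since this copy
       is discarded again if vectorizing the loop fails */
  }
else
  {
    /* unmodified scalar copy of the loop */
  }
```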

Jakub


