[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Tamar Christina  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #29 from Tamar Christina  ---
(In reply to rguent...@suse.de from comment #28)
> On Mon, 26 Feb 2024, tnfchris at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441
> > 
> > --- Comment #27 from Tamar Christina  ---
> > Created attachment 57538 [details]
> >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57538&action=edit
> > proposed1.patch
> > 
> > proposed patch, this gets the gathers and scatters back. doing regression 
> > run.
> 
> I don't think this will fly.

Well, I don't really know what to do here, I guess.

Per the discussion on IRC, we only used to try gather/scatters when SCEV fails.

Now that it succeeds, we no longer try the pattern and instead try to handle it
during vectorizable_load/vectorizable_store by recognizing the gather/scatters
inline through VMAT_GATHER_SCATTER.

This works fine for normal gathers and scatters but doesn't work for widening
gathers and narrowing scatters, which only the pattern seems to handle.

I don't know how to get this detected through get_load_store_type since,
well, that's very late.  Among other things, we've already determined the VF
and the unpacks have already been marked relevant.  So
vectorizable_load/vectorizable_store would have to actively change the IL.

So I don't know how widening and narrowing operations are supposed to work
here.  Given that, I will leave it up to the maintainers, I guess.
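
For readers unfamiliar with the terminology, here is a minimal sketch of source
loops that have the shape of a widening gather and a narrowing scatter.  The
function and parameter names are hypothetical and this is not the testcase from
the PR; whether GCC actually emits gather/scatter instructions for these loops
depends on the target and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening gather: each 32-bit element is loaded through an index
   and zero-extended to 64 bits before being stored contiguously.  */
void
widening_gather (uint64_t *dst, const uint32_t *src,
                 const uint64_t *idx, size_t n)
{
  assert (dst && src && idx);
  for (size_t i = 0; i < n; i++)
    dst[i] = src[idx[i]];
}

/* Narrowing scatter: each 64-bit element is truncated to 32 bits
   and stored through an index.  */
void
narrowing_scatter (uint32_t *dst, const uint64_t *src,
                   const uint64_t *idx, size_t n)
{
  assert (dst && src && idx);
  for (size_t i = 0; i < n; i++)
    dst[idx[i]] = (uint32_t) src[i];
}
```

In both loops the memory element size differs from the size of the value the
rest of the loop works on, which is the case the comment says only the pattern
seems to handle.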

[Bug target/94789] Failure to take advantage of shift operand semantics to turn subtraction into negate

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94789

Andrew Pinski  changed:

   What|Removed |Added

 Target|x86_64-*-* i?86-*-* aarch64 |x86_64-*-* i?86-*-*

--- Comment #5 from Andrew Pinski  ---
(In reply to Wilco from comment #4)
> AArch64 already generates:
> 
>   neg w1, w1
>   lsl w0, w0, w1
>   ret

That is because aarch64 has a pattern to optimize this explicitly:
(insn 14 9 15 2 (set (reg/i:SI 0 x0)
(ashift:SI (reg:SI 108)
(minus:QI (const_int 32 [0x20])
(subreg:QI (reg:SI 109) 0 "/app/example.cpp":5:1 744
{*aarch64_ashl_reg_minussi3}
 (expr_list:REG_DEAD (reg:SI 108)
(expr_list:REG_DEAD (reg:SI 109)
(nil

Which was added in r8-3672-g59abe903987d61.  Maybe the x86_64 backend could do a
similar thing?
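
The transformation relies on the shift instruction looking only at the low bits
of its count.  A minimal sketch of that arithmetic in portable C, with the
masking written out explicitly (the helper names are hypothetical, and the
5-bit mask models a 32-bit shift): `32 - n` and `-n` agree modulo 32, so the
subtraction can become a negate.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit shift instruction that uses only the low
   five bits of its count operand.  */
static uint32_t
masked_shl (uint32_t x, uint32_t count)
{
  return x << (count & 31);
}

/* Shift count written as a subtraction from the bit width.  */
uint32_t
shl_by_32_minus_n (uint32_t x, uint32_t n)
{
  return masked_shl (x, 32 - n);
}

/* Shift count written as a plain negate; unsigned negation wraps
   modulo 2^32, so the low five bits match those of 32 - n.  */
uint32_t
shl_by_neg_n (uint32_t x, uint32_t n)
{
  assert (((32 - n) & 31) == ((-n) & 31));
  return masked_shl (x, -n);
}
```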

[Bug target/95341] Poor vector_size decomposition when SVE is enabled

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95341

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |14.0
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=112787

--- Comment #5 from Andrew Pinski  ---
Fixed by r14-6752-ga3ff76278efe00 for GCC 14.

[Bug middle-end/88670] [meta-bug] generic vector extension issues

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88670
Bug 88670 depends on bug 95341, which changed state.

Bug 95341 Summary: Poor vector_size decomposition when SVE is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95341

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #30 from Richard Biener  ---
The x86 and "emulation" paths handle narrowing/widening during code generation
(but yes, the IFN path doesn't).  A fix would be to do similar as for the
gs_info.decl case in vectorizable_load/store and handle select cases of
widening/narrowing (2x) and adjust vect_check_gather_scatter accordingly.
That might be against the spirit of how the IFN support was laid out
(possibly to be "cleaner"), but I don't see a good way to avoid the very
premature (during pattern selection) load/store vectorization choosing for
the cases there are multiple possibilities as seen here.

[Bug middle-end/114081] [14 regression] ICE in verify_dominators when building php-8.3.3 (error: dominator of 16 should be 111, not 3) since r14-6822

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114081

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:8a5d9409584aeb777b06f9c19c7d1a3552d496ad

commit r14-9191-g8a5d9409584aeb777b06f9c19c7d1a3552d496ad
Author: Richard Biener 
Date:   Mon Feb 26 15:17:43 2024 +0100

tree-optimization/114081 - dominator update for prologue peeling

The following implements manual update for multi-exit loop prologue
peeling during vectorization.

PR tree-optimization/114081
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Perform manual dominator update for prologue peeling.
(vect_do_peeling): Properly update dominators after adding the
prologue-around guard.

* gcc.dg/vect/vect-early-break_121-pr114081.c: New testcase.

[Bug middle-end/114081] [14 regression] ICE in verify_dominators when building php-8.3.3 (error: dominator of 16 should be 111, not 3) since r14-6822

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114081

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Richard Biener  ---
The testcase is now fixed for me.

[Bug target/96463] [SVE] Optimise svld1rq from vectors

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96463

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Fixed.

[Bug c++/114114] [11/12/13/14 Regression] Internal compiler error on function-local conditional noexcept

2024-02-27 Thread yves.bailly at hexagon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114114

--- Comment #3 from Yves Bailly  ---
Due credits to Stefano Bellotti  for writing the
code that triggers the ICE - I only did the paperwork.

[Bug tree-optimization/114120] add reduction with promotion and then truncation poorly vectorized

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114120

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-27
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
I think I've seen a duplicate for this.  We lack a pass replacing an IV
(a PHI) based on how that is used outside of the loop.  Basically we fail
to treat PHIs transparently when folding conversions.  This _might_ be something
for IVCANON since I think it doesn't really fit any other pass.

It also came up in the context of 

int f (unsigned *src)
{
  int sum = 0;
  for (int y = 0; y < 8; y++)
    {
      sum += src[y];
    }
  return sum;
}

which we handle fine in vectorization but still the reduction could be
done in 'unsigned' all the way through (and that conversion handling in
the vectorizer reduction code is somewhat ugly).
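
As a sketch of what "done in 'unsigned' all the way through" could look like at
the source level, the two functions below compute the same reduction.
`f_unsigned` is a hypothetical name introduced here for illustration.  Note
that converting an out-of-range unsigned value to int is
implementation-defined in ISO C; GCC documents it as reduction modulo 2^N.

```c
#include <assert.h>

/* The loop from the comment: 'sum' is int, the elements are unsigned,
   so every iteration converts int -> unsigned -> int.  */
int
f (unsigned *src)
{
  assert (src);
  int sum = 0;
  for (int y = 0; y < 8; y++)
    sum += src[y];
  return sum;
}

/* Same reduction carried out in 'unsigned', converting once at the end.  */
int
f_unsigned (unsigned *src)
{
  assert (src);
  unsigned sum = 0;
  for (int y = 0; y < 8; y++)
    sum += src[y];
  return (int) sum;
}
```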

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #7 from Richard Biener  ---
I will have a look.

[Bug target/98532] Use load/store pairs for 2-element vector in memory permutes

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98532

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|NEW |RESOLVED

--- Comment #4 from Andrew Pinski  ---
Fixed by enabling SLP at -O2. Though this could be improved without the SLP.

  _1 = BIT_FIELD_REF <*a_4(D), 64, 64>;
  _2 = BIT_FIELD_REF <*a_4(D), 64, 0>;
  tmp_5 = {_1, _2};

Could be turned into VEC_PERM<*a_4(D), {1, 0}> earlier on.  But I doubt that it
will matter so much really.
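
For illustration, a minimal sketch of source that swaps the two 64-bit halves
of a vector loaded from memory, written with the GNU vector extension.  The
name `swap_halves` is hypothetical and this is not claimed to be the testcase
from the PR; it only shows the kind of element-wise extraction plus constructor
that the GIMPLE above represents.

```c
#include <assert.h>

typedef long long v2di __attribute__ ((vector_size (16)));

/* Builds a new vector from the two elements of *a in swapped order:
   element 1 first, then element 0.  */
v2di
swap_halves (const v2di *a)
{
  assert (a);
  v2di tmp = { (*a)[1], (*a)[0] };
  return tmp;
}
```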

[Bug target/114122] RISC-V: poor code generation in calling convention with vlen > 4096

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114122

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
Note it's always difficult if you rely on argument passing / return that is
outside of the ABI specification for the platform so I'd advise against such
interfaces.  Instead I'd suggest to go with by-reference argument/return.
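
A minimal sketch of such a by-reference interface follows.  The type
`wide_vec`, the size `WIDE_ELEMS` and the function `wide_add` are hypothetical
stand-ins chosen for illustration, not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wide type standing in for a very long vector.  */
#define WIDE_ELEMS 1024
typedef struct { int lane[WIDE_ELEMS]; } wide_vec;

/* By-reference interface: the caller owns the storage for both the
   inputs and the result, so nothing depends on how (or whether) the
   platform ABI passes such a large object by value.  */
void
wide_add (wide_vec *out, const wide_vec *a, const wide_vec *b)
{
  assert (out && a && b);
  for (size_t i = 0; i < WIDE_ELEMS; i++)
    out->lane[i] = a->lane[i] + b->lane[i];
}
```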

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877

--- Comment #7 from Andrew Pinski  ---
>Maybe the issue is only with arguments now.


Actually I think this is still a subreg vs ra issue.


(insn 8 5 9 2 (set (subreg:V16QI (reg/v:V2x16QI 100 [ __tab ]) 0)
(reg/v:V16QI 102 [ lo ])) -1
 (nil))
(insn 9 8 10 2 (set (subreg:V16QI (reg/v:V2x16QI 100 [ __tab ]) 16)
(reg/v:V16QI 103 [ hi ])) -1
 (nil))
(insn 10 9 11 2 (set (reg:V16QI 101 [  ])
(unspec:V16QI [
(reg/v:V2x16QI 100 [ __tab ])
(reg/v:V16QI 104 [ idx ])
] UNSPEC_TBL))
"/opt/compiler-explorer/arm64/gcc-trunk-20240227/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/14.0.1/include/arm_neon.h":19566:43
-1
 (nil))

[Bug target/99161] Suboptimal SVE code for ld4/st4 MLA code

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99161

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |13.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Fixed in GCC 13.

[Bug target/106694] Redundant move instructions in ARM SVE intrinsics use cases

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106694
Bug 106694 depends on bug 99161, which changed state.

Bug 99161 Summary: Suboptimal SVE code for ld4/st4 MLA code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99161

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/99195] Optimise away vec_concat of 64-bit AdvancedSIMD operations with zeroes in aarch64

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99195

--- Comment #20 from Andrew Pinski  ---
Are there any remaining patterns that need vczle/vczbe added to them?

Otherwise please close this as fixed for GCC 14.

[Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

--- Comment #5 from Andrew Pinski  ---
For the ones which produce ins, it should be easy to modify the pattern to emit
fmov for those cases, that is `elt == 0`:

(define_insn "aarch64_simd_vec_set_zero"
  [(set (match_operand:VALLS_F16 0 "register_operand" "=w")
(vec_merge:VALLS_F16
(match_operand:VALLS_F16 1 "aarch64_simd_imm_zero" "")
(match_operand:VALLS_F16 3 "register_operand" "0")
(match_operand:SI 2 "immediate_operand" "i")))]
  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
  {
int elt = ENDIAN_LANE_N (, exact_log2 (INTVAL (operands[2])));
operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
return "ins\\t%0.[%p2], zr";
  }
)

[Bug target/110411] ICE on simple memcpy test case when allowing generation of vector pair load/store insns

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110411

--- Comment #7 from GCC Commits  ---
The releases/gcc-11 branch has been updated by jeevitha :

https://gcc.gnu.org/g:41af48a1750635a72c48a5809e713d9dd14d9655

commit r11-11257-g41af48a1750635a72c48a5809e713d9dd14d9655
Author: Jeevitha 
Date:   Thu Aug 31 05:40:18 2023 -0500

rs6000: Don't allow AltiVec address in movoo & movxo pattern [PR110411]

There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects. The solution is to modify the constraints used in the movoo and
movxo pattern to disallow these types of addresses, which assists LRA in
resolving this issue. Furthermore, the mode size 16 check has been
removed in vsx_quad_dform_memory_operand to allow OOmode and XOmode, and
quad_address_p already handles less than size 16.

2023-08-31  Jeevitha Palanisamy  

gcc/
PR target/110411
* config/rs6000/mma.md (define_insn_and_split movoo): Disallow
AltiVec address operands.
(define_insn_and_split movxo): Likewise.
* config/rs6000/predicates.md (vsx_quad_dform_memory_operand):
Remove redundant mode size check.

gcc/testsuite/
PR target/110411
* gcc.target/powerpc/pr110411-1.c: New testcase.
* gcc.target/powerpc/pr110411-2.c: New testcase.

(cherry picked from commit 9ea1248604d7b65009af32103814332f35bd33e2)

[Bug tree-optimization/100745] GCC generates suboptimal assembly from vector extensions on AArch64

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100745

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-02-27
  Component|target  |tree-optimization
 Status|UNCONFIRMED |NEW

--- Comment #3 from Andrew Pinski  ---
```
  # vsum$0_107 = PHI <_47(11), _29(10)>
  _232 = BIT_FIELD_REF ;
  _231 = .FMA (_100, _101, _232);
  _230 = BIT_FIELD_REF ;
  _229 = .FMA (_234, _235, _230);
...
  _47 = {_231, _229};

...
```

Confirmed.  I thought I had seen this before; basically, inside the loop we still
keep the generic vector together, and this causes stores, IIRC.

[Bug rtl-optimization/114044] ICE: in expand_fn_using_insn, at internal-fn.cc:208 with _BitInt() and -O -fno-tree-dce

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114044

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:c3c44c01d20b00ab5228f32596153b7f4cbc6036

commit r14-9192-gc3c44c01d20b00ab5228f32596153b7f4cbc6036
Author: Jakub Jelinek 
Date:   Tue Feb 27 09:52:07 2024 +0100

expand: Add trivial folding for bit query builtins at expansion time
[PR114044]

While it seems a lot of places in various optimization passes fold
bit query internal functions with INTEGER_CST arguments to INTEGER_CST
when there is a lhs, when lhs is missing, all the removals of such dead
stmts are guarded with -ftree-dce, so with -fno-tree-dce those unfolded
ifn calls remain in the IL until expansion.  If they have large/huge
BITINT_TYPE arguments, there is no BLKmode optab and so expansion ICEs,
and bitint lowering doesn't touch such calls because it doesn't know they
need touching, functions only containing those will not even be further
processed by the pass because there are no non-small BITINT_TYPE SSA_NAMEs
+ the 2 exceptions (stores of BITINT_TYPE INTEGER_CSTs and conversions
from BITINT_TYPE INTEGER_CSTs to floating point SSA_NAMEs) and when walking
there is no special case for calls with BITINT_TYPE INTEGER_CSTs either,
those are for normal calls normally handled at expansion time.

So, the following patch adjusts the expansion of these 6 ifns by doing
nothing if there is no lhs, and also, just in case the user disabled all
possible passes that would fold this, handles the case of setting lhs
to an ifn call with an INTEGER_CST argument.

2024-02-27  Jakub Jelinek  

PR rtl-optimization/114044
* internal-fn.def (CLRSB, CLZ, CTZ, FFS, PARITY): Use
DEF_INTERNAL_INT_EXT_FN macro rather than DEF_INTERNAL_INT_FN.
* internal-fn.h (expand_CLRSB, expand_CLZ, expand_CTZ, expand_FFS,
expand_PARITY): Declare.
* internal-fn.cc (expand_bitquery, expand_CLRSB, expand_CLZ,
expand_CTZ, expand_FFS, expand_PARITY): New functions.
(expand_POPCOUNT): Use expand_bitquery.

* gcc.dg/bitint-95.c: New test.

[Bug rtl-optimization/114044] ICE: in expand_fn_using_insn, at internal-fn.cc:208 with _BitInt() and -O -fno-tree-dce

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114044

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Jakub Jelinek  ---
Fixed.

[Bug target/102171] vget_low_*/vget_high_* intrinsics should become BIT_FIELD_REF during gimple

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102171

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
I think I am going to implement this (or assign it internally to someone else to
implement).

[Bug target/102171] vget_low_*/vget_high_* intrinsics should become BIT_FIELD_REF during gimple

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102171

--- Comment #3 from Tamar Christina  ---
(In reply to Andrew Pinski from comment #2)
> I think I am going to implement this (or assign it internally to someone else
> to implement).

If you do, please also remove them from arm_neon.h and use the new intrinsics
framework.

We're gradually trying to get this file empty.

[Bug target/102652] Unnecessary zeroing out of local ARM NEON arrays

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102652

--- Comment #3 from Andrew Pinski  ---
The zeroing part was fixed in GCC 12.

[Bug fortran/114012] overloaded unary operator called twice

2024-02-27 Thread alexandre.poux at coria dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114012

--- Comment #5 from Alexandre Poux  ---
Thanks for the quick fix!

[Bug tree-optimization/114074] [11/12/13/14 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r8-343

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114074

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:a0b1798042d033fd2cc2c806afbb77875dd2909b

commit r14-9193-ga0b1798042d033fd2cc2c806afbb77875dd2909b
Author: Richard Biener 
Date:   Mon Feb 26 13:33:21 2024 +0100

tree-optimization/114074 - CHREC multiplication and undefined overflow

When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior.  The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.

I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.

PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.

* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
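
To make the overflow concern concrete, here is a small worked example, checked
in 64-bit arithmetic so the check itself cannot overflow.  The helper names and
the chosen values are hypothetical and are not the testcase from the PR.  With
a = -2^30, b = 2^30 and c = 2, the values (a + b*i) * c for i = 0 and i = 1 are
-2^31 and 0, both representable in 32 bits, but the step b*c of the distributed
form {a*c, +, b*c} is 2^31, which is not.  Here a is non-zero and a and b have
different signs, which is the situation where the commit message says unsigned
arithmetic is used.

```c
#include <assert.h>
#include <stdint.h>

/* True if V is representable in a 32-bit signed integer.  */
int
fits_int32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Value of the CHREC {a, +, b} at iteration I, times C, computed in
   64 bits.  */
int64_t
chrec_times_c (int32_t a, int32_t b, int32_t c, int32_t i)
{
  assert (i >= 0);
  return ((int64_t) a + (int64_t) b * i) * c;
}

/* Step of the distributed form {a*c, +, b*c}, also in 64 bits.  */
int64_t
distributed_step (int32_t b, int32_t c)
{
  return (int64_t) b * c;
}
```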

[Bug target/106106] SRA scalarizes structure copies

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106106

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=102652,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=98877
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-27
  Component|tree-optimization   |target

[Bug tree-optimization/114074] [11/12/13 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r8-343

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114074

Richard Biener  changed:

   What|Removed |Added

Summary|[11/12/13/14 Regression]|[11/12/13 Regression] wrong
   |wrong code at -O1 and above |code at -O1 and above on
   |on x86_64-linux-gnu since   |x86_64-linux-gnu since
   |r8-343  |r8-343
  Known to work||14.0

--- Comment #9 from Richard Biener  ---
Fixed on trunk so far.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #28 from Jakub Jelinek  ---
(In reply to Lukas Grätz from comment #9)
> Well it is not my testcase. But I added backtracing and observed that the
> printed backtrace is unchanged with your patch. The new
> no_return_to_caller():

You haven't tried hard enough.
Consider the testcase I've posted to the mailing list, built with -Og -g.
It is artificial in that register pressure is increased artificially rather
than coming from meaningful code, noipa attribute is used heavily instead of
functions being too large or in different TUs, and optimize attribute used
instead of the noreturn function sitting in a different library, built there
with -O2, while user program say with -Og.

extern void abort (void);
volatile unsigned v = 0xdeadbeefU;
int w;

__attribute__((noipa)) void
corge (char *p)
{
  (void) p;
}

__attribute__((noipa)) int
foo (int x)
{
  return x;
}

__attribute__((noipa, noreturn, optimize (2))) void
bar (void)
{
  unsigned a = v;
  unsigned b = v;
  unsigned c = v;
  unsigned d = v;
  unsigned e = v;
  unsigned f = v;
  unsigned g = v;
  unsigned h = v;
  int i = foo (50);
  v = a + b + c + d + e + f + g + h;
  abort ();
}

__attribute__((noipa)) void
baz (int a, int b, int c, int d, int e, int f, int g, int h)
{
  int i = foo (51);
  if (w)
    bar ();
}

__attribute__((noipa)) void
qux (void)
{
  int a = foo (42);
  int b = foo (43);
  int c = foo (44);
  int d = foo (45);
  int e = foo (46);
  int f = foo (47);
  int g = foo (48);
  int h = foo (49);
  corge (__builtin_alloca (foo (52)));
  baz (a, b, c, d, e, f, g, h);
  w++;
  baz (a, b, c, d, e, f, g, h);
  baz (a, b, c, d, e, f, g, h);
}

int
main ()
{
  qux ();
}

Before the r14-8470 changes the backtrace on abort was

#0  0x77dbd765 in abort () from /lib64/libc.so.6
#1  0x004011ca in bar () at /tmp/1.c:30 
#2  0x004011f1 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
d=d@entry=45, e=e@entry=46, f=f@entry=47, g=48, h=49) at /tmp/1.c:38
#3  0x004012d8 in qux () at /tmp/1.c:55 
#4  0x00401319 in main () at /tmp/1.c:62

GCC trunk hits the "backtrace not possible" problem because rbp is
clobbered and needed in the upper frame's CFA computation:

#0  0x77dbd765 in abort () from /lib64/libc.so.6
#1  0x004011b0 in bar () at /tmp/1.c:30 
#2  0x004011d1 in baz (a=, b=,
c=,
d=d@entry=-559038737, e=e@entry=-559038737, f=f@entry=-559038737, g=48, h=49)
at /tmp/1.c:38  
#3  0x004012a9 in qux () at /tmp/1.c:55 
Backtrace stopped: previous frame inner to this frame (corrupt stack?)  

And in the patched gcc (with the PR114116 patch to save the bp register) the
backtrace works but several of the values are bogus:
#0  0x77dbd765 in abort () from /lib64/libc.so.6
#1  0x004011b1 in bar () at /tmp/1.c:30 
#2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
d=d@entry=-559038737, e=e@entry=-559038737, f=f@entry=-559038737, g=48, h=49)
at /tmp/1.c:38 
#3  0x004012aa in qux () at /tmp/1.c:55 
#4  0x004012e4 in main () at /tmp/1.c:62

So, I think we should limit this to -fno-unwind-tables or maybe
-mcmodel=kernel.

[Bug libquadmath/114126] New: A not infinite result of tanq of M_PI_2

2024-02-27 Thread www3.spl at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114126

Bug ID: 114126
   Summary: A not infinite result of tanq of M_PI_2
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libquadmath
  Assignee: unassigned at gcc dot gnu.org
  Reporter: www3.spl at gmail dot com
  Target Milestone: ---

Hi there. I think that this 'bug' was not sent before.

I'm obtaining an incorrect result for tanq( M_PI_2q ):
tanq( M_PI_2q ) = +2.306323558737156172766198381637374e+34
when it should be infinite.

[Bug libquadmath/114126] A not infinite result of tanq of M_PI_2

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114126

--- Comment #1 from Andrew Pinski  ---
Can you provide a full testcase? And also specify which target you are on?

[Bug target/114098] _tile_loadconfig doesn't work

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

--- Comment #6 from GCC Commits  ---
The releases/gcc-11 branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:26b1012c26c4b4de0b4561e74b856a7f7d259a48

commit r11-11258-g26b1012c26c4b4de0b4561e74b856a7f7d259a48
Author: H.J. Lu 
Date:   Sun Feb 25 10:21:04 2024 -0800

x86: Properly implement AMX-TILE load/store intrinsics

ldtilecfg and sttilecfg take a 512-byte memory block.  With
_tile_loadconfig implemented as

extern __inline void
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_tile_loadconfig (const void *__config)
{
  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
}

GCC sees:

(parallel [
  (asm_operands/v ("ldtilecfg   %X0") ("") 0
   [(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
 (const_int -64 [0xffc0])) [1
MEM[(const void * *)&tile_data]+0 S8 A128])]
   [(asm_input:DI ("m"))]
   (clobber (reg:CC 17 flags))])

and the memory operand size is 1 byte.  As the result, the rest of 511
bytes is ignored by GCC.  Implement ldtilecfg and sttilecfg intrinsics
with a pointer to XImode to honor the 512-byte memory block.

gcc/ChangeLog:

PR target/114098
* config/i386/amxtileintrin.h (_tile_loadconfig): Use
__builtin_ia32_ldtilecfg.
(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
* config/i386/i386-builtin.def (BDESC): Add
__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
* config/i386/i386-expand.c (ix86_expand_builtin): Handle
IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
* config/i386/i386.md (ldtilecfg): New pattern.
(sttilecfg): Likewise.

gcc/testsuite/ChangeLog:

PR target/114098
* gcc.target/i386/amxtile-4.c: New test.

(cherry picked from commit 4972f97a265c574d51e20373ddefd66576051e5c)

[Bug target/114098] _tile_loadconfig doesn't work

2024-02-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |11.5
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from H.J. Lu  ---
Fixed for 11.5, 12.4, 13.3 and 14.

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #8 from Richard Biener  ---
Created attachment 57549
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57549&action=edit
prototype fix

This is very similar to PR113831.  We again have two refs looking seemingly
the same:

  _80 = _109 + 1;
  _79 = VIEW_CONVERT_EXPR(y)[_80];
  _77 = .USUBC (0, _79, _103);

and

  _43 = _109 + 1;
  _42 = VIEW_CONVERT_EXPR(y)[_43];
  _39 = .USUBC (0, _42, _103);

so they are structurally entered in the same way into the expression hash
table.  But since _80 and _43 have different ranges what
get_ref_base_and_extent will compute differs - in the case of _109 <= 3
it will make the stmt walking hit the __builtin_memset and record a value
number of zero for the expression.

As we only after that (by bad luck) visit the other reference we successfully
look up the existing value from the hashtable during the walk.

In the PR113831 the accesses degenerated to a single array element which
allowed the fix to work (adjust the expression we put into the hash).  But
this shows (and I feared that ...) this doesn't work.  We either have to
make all ranges part of the expression (even if they make a difference in
the end) or avoid using ranges altogether when computing a value for
an expression during the walk, most definitely when we walk to different
context (but that's hard to specify).

Maybe a middle-ground would be to make the get_ref_base_and_extent computed
info part of the expression.  Like the attached.  Lots of ??? to address
though ...

[Bug libquadmath/114126] A not infinite result of tanq of M_PI_2

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114126

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Jakub Jelinek  ---
Why do you think this is a bug?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int
main ()
{
  _Float128 f = tanf128 (M_PI_2f128);
  volatile _Float128 g = M_PI_2f128;
  g = tanf128 (g);
  char buf[128];
  strfromf128 (buf, 128, "%.34a", f);
  printf ("%s\n", buf);
  strfromf128 (buf, 128, "%.34a", g);
  printf ("%s\n", buf);
}
also prints
0x1.1c46bd57277993a2ee60193c957b00p+114
0x1.1c46bd57277993a2ee60193c957b00p+114

M_PI_2q or M_PI_2f128 is
1.5707963267948966192313216916397513987...
while pi/2 with larger precision is I think
1.5707963267948966192313216916397514420...
so M_PI_2{q,f128} is rounded down, not up,
so no wonder tanq/tanf128 is not inf.
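
As a rough sanity check of the magnitude, the arithmetic can be sketched as
follows.  It uses only the two truncated decimal expansions quoted above, which
differ by 433 units in the 37th decimal place, and the approximation
tan (pi/2 - e) ~ 1/e for small positive e.  The names are hypothetical and the
result is only an order-of-magnitude estimate, not a reimplementation of tanq.

```c
#include <assert.h>

/* Approximate gap between pi/2 and the rounded-down constant, read off
   the quoted digits: ...7514420 minus ...7513987 is 433e-37.  */
const double gap_to_half_pi = 433e-37;

/* Near pi/2, tan (pi/2 - e) is approximately 1/e for small e > 0.  */
double
approx_tan_near_half_pi (double e)
{
  assert (e > 0.0);
  return 1.0 / e;
}
```

Evaluating `approx_tan_near_half_pi (gap_to_half_pi)` gives about 2.3e+34,
the same order of magnitude as the reported result, and positive because the
constant lies just below pi/2.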

[Bug libquadmath/114126] A not infinite result of tanq of M_PI_2

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114126

--- Comment #3 from Jakub Jelinek  ---
Not to mention that if it would be rounded up (like it happens e.g. in the
M_PI_f32 case), you wouldn't get inf either, nor -inf, but some large negative
number.

[Bug ada/114127] New: [14 regression] Assert_Failure in nlists.adb

2024-02-27 Thread simon at pushface dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114127

Bug ID: 114127
   Summary: [14 regression] Assert_Failure in nlists.adb
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: simon at pushface dot org
CC: dkm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57550
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57550&action=edit
Reproducer

This was originally found in the equivalent arm-eabi cross compiler.
Sources in attached zip file.

$ gcc -c framebuffer_ili9341.ads 
+===GNAT BUG DETECTED==+
| 14.0.1 20240218 (experimental) (x86_64-apple-darwin21) Assert_Failure
nlists.adb:952|
| Error detected at ili9341-device.adb:44:53 [framebuffer_ili9341.ads:59:4]|
| Compiling framebuffer_ili9341.ads|
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.
Consider also -gnatd.n switch (see debug.adb).

framebuffer_ili9341.ads
hal.ads
hal-framebuffer.ads
hal-bitmap.ads
framebuffer_ltdc.ads
stm32.ads
stm32-dma2d_bitmap.ads
stm32-dma2d.ads
memory_mapped_bitmap.ads
soft_drawing_bitmap.ads
stm32-ltdc.ads
stm32-device.ads
stm32_svd.ads
stm32_svd-sdio.ads
stm32-dma.ads
stm32_svd-dma.ads
stm32-gpio.ads
stm32_svd-gpio.ads
stm32-exti.ads
hal-gpio.ads
stm32-adc.ads
stm32_svd-adc.ads
stm32-usarts.ads
hal-uart.ads
stm32_svd-usart.ads
stm32-spi.ads
stm32_svd-spi.ads
hal-spi.ads
stm32-spi-dma.ads
stm32-dma-interrupts.ads
stm32-i2s.ads
hal-audio.ads
stm32-i2c.ads
stm32_svd-i2c.ads
hal-i2c.ads
stm32-i2c-dma.ads
stm32-timers.ads
stm32-dac.ads
stm32_svd-dac.ads
stm32-rtc.ads
hal-real_time_clock.ads
stm32-crc.ads
stm32_svd-crc.ads
stm32-sdmmc.ads
sdmmc_svd_periph.ads
hal-sdmmc.ads
hal-block_drivers.ads
stm32-sdmmc_interrupt.ads
ili9341.ads
ili9341-device.ads
hal-time.ads
ili9341-spi_connector.ads
ili9341-device.adb
ili9341-regs.ads

[Bug c++/114128] New: ice with -fstrub=internal

2024-02-27 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114128

Bug ID: 114128
   Summary: ice with -fstrub=internal
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

[Bug middle-end/112938] ice with -fstrub=internal

2024-02-27 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112938

--- Comment #8 from David Binderman  ---
(In reply to Alexandre Oliva from comment #7)
> Fixed.

Seems to have reappeared:

$ ~/gcc/results/bin/gcc -c -fstrub=internal bug988.cc
bt2_locks.cpp: In function ‘void mcs_lock::spin_while_eq(const volatile
std::atomic_bool&, bool)’:
bt2_locks.cpp:36:1: error: invalid address operand in ‘mem_ref’
*expected;

# VUSE <.MEM_8>
expected.8_3 ={v} *expected;
during IPA pass: strub
bt2_locks.cpp:36:1: internal compiler error: verify_gimple failed
0x11f4a92 verify_gimple_in_cfg(function*, bool, bool)
/home/dcb38/gcc/working/gcc/../../trunk.20210101/gcc/tree-cfg.cc:5663
0x1065788 execute_function_todo(function*, void*)
/home/dcb38/gcc/working/gcc/../../trunk.20210101/gcc/passes.cc:2088

I would be grateful if someone could confirm what I am seeing here.

[Bug middle-end/113988] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5470

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113988

--- Comment #27 from Jakub Jelinek  ---
Created attachment 57551
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57551&action=edit
gcc14-pr113988.patch

Untested fix.

[Bug ada/114127] Assert_Failure in nlists.adb on [] aggregate in generic with pragma Ada_2022

2024-02-27 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114127

Eric Botcazou  changed:

           What    |Removed                     |Added

            Summary|[14 regression]             |Assert_Failure in
                   |Assert_Failure in           |nlists.adb on [] aggregate
                   |nlists.adb                  |in generic with pragma
                   |                            |Ada_2022
                 CC|                            |ebotcazou at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-02-27
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Eric Botcazou  ---
Compile with -gnat2022 or use pragma Ada_2022 consistently, but that's not a
regression.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #29 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #28)
> (In reply to Lukas Grätz from comment #9)
> > Well it is not my testcase. But I added backtracing and observed that the
> > printed backtrace is unchanged with your patch. The new
> > no_return_to_caller():
> 
> You haven't tried hard enough.


That might be true.


> Consider the testcase I've posted to the mailing list, built with -Og -g.

> The gcc trunk hits the backtrace not possible problem because rbp is
> clobbered and needed in upper frame CFA computation:


Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I
trusted comment #10 here, as it made sense.


> And in the patched gcc (with PR114116 patch to save bp register) backtrace
> works but several of the values are bogus:  

> #2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> d=d@entry=-559038737, e=e@entry=-559038737, f=f@entry=-559038737, g=48,
> h=49) at /tmp/1.c:38


glibc's backtrace() function and friends only reports function names and
addresses. This looks like the gdb bt command. I admit, I did not take a proper
look into that before.

I believe this could and should somehow be fixed by adding DWARF info that
certain callee-saved registers (= the function parameter values) were
overwritten. The corrected backtrace could look something like this:


#2  0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>,
e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38


Some parameters would be <optimized out>, and this would be fine because the
code was partially compiled with -O2. It is not unusual to have <optimized out>
parameter values in gdb's bt.


> So, I think we should limit this to -fno-unwind-tables or maybe
> -mcmodel=kernel.


Now I am confused. The optimization is limited to -fexceptions. And the
documentation of -funwind-tables says "Similar to -fexceptions, except". So
shouldn't -funwind-tables behave similar to -fexceptions? I don't see anything
kernel-specific here.

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #9 from Richard Biener  ---
Which of course would regress something like

int a[16];
int foo (int i)
{
  if (i > 7)
    return a[i];
  else
    return a[i];
}

where we'd no longer hoist as we no longer would value-number the refs the
same (that extends to any ref using SSA names with eventually differing
ranges).  Reverting r9-398 likely isn't the best answer either, it would
of course also regress the two valid replacements with zero (the prototype
patch preserves those).

There's the PR100923 fix (r12-1295-g7a56d3d3e99cc7) which targeted a similar
(but even more odd) case with "contextual" PTA info (though there's really no
such thing).
But it didn't really fix the contextual thing but added re-valueization
which in case of vn_reference_lookup_pieces works on value-numbered refs
where failure mode is keeping the value-number.

The prototype (after fixing it a bit) passes bootstrap but regresses quite
some number of testcases (maybe due to ???s present).

FAIL: g++.dg/ipa/devirt-20.C  -std=gnu++98  scan-tree-dump-not release_ssa
"abort"
FAIL: g++.dg/ipa/devirt-20.C  -std=gnu++14  scan-tree-dump-not release_ssa
"abort"
FAIL: g++.dg/ipa/devirt-20.C  -std=gnu++17  scan-tree-dump-not release_ssa
"abort"
FAIL: g++.dg/ipa/devirt-20.C  -std=gnu++20  scan-tree-dump-not release_ssa
"abort"
FAIL: g++.dg/opt/pr110879.C  -std=gnu++14  scan-tree-dump-not optimized
"=s*S*res_(?!S*_M_end_of_storage;)"
FAIL: g++.dg/opt/pr110879.C  -std=gnu++17  scan-tree-dump-not optimized
"=s*S*res_(?!S*_M_end_of_storage;)"
FAIL: g++.dg/opt/pr110879.C  -std=gnu++20  scan-tree-dump-not optimized
"=s*S*res_(?!S*_M_end_of_storage;)"
FAIL: g++.dg/pr99966.C  -std=gnu++17  scan-tree-dump-not vrp1 "throw"
FAIL: g++.dg/pr99966.C  -std=gnu++20  scan-tree-dump-not vrp1 "throw"
FAIL: g++.dg/vect/pr112961.cc  -std=c++98  scan-tree-dump vect "LOOP
VECTORIZED"
FAIL: g++.dg/vect/pr112961.cc  -std=c++14  scan-tree-dump vect "LOOP
VECTORIZED"
FAIL: g++.dg/vect/pr112961.cc  -std=c++17  scan-tree-dump vect "LOOP
VECTORIZED"
FAIL: g++.dg/vect/pr112961.cc  -std=c++20  scan-tree-dump vect "LOOP
VECTORIZED"
FAIL: g++.dg/vect/pr89653.cc  -std=c++98  scan-tree-dump vect "vectorized 1
loops"
FAIL: g++.dg/vect/pr89653.cc  -std=c++14  scan-tree-dump vect "vectorized 1
loops"
FAIL: g++.dg/vect/pr89653.cc  -std=c++17  scan-tree-dump vect "vectorized 1
loops"
FAIL: g++.dg/vect/pr89653.cc  -std=c++20  scan-tree-dump vect "vectorized 1
loops"
FAIL: g++.dg/vect/simd-10.cc  -std=c++98  scan-tree-dump-times vect "vectorized
[1-3] loops" 2
FAIL: g++.dg/vect/simd-10.cc  -std=c++14  scan-tree-dump-times vect "vectorized
[1-3] loops" 2
FAIL: g++.dg/vect/simd-10.cc  -std=c++17  scan-tree-dump-times vect "vectorized
[1-3] loops" 2
FAIL: g++.dg/vect/simd-10.cc  -std=c++20  scan-tree-dump-times vect "vectorized
[1-3] loops" 2

FAIL: gcc.dg/ira-loop-pressure.c scan-rtl-dump loop2_invariant "Decided to move
invariant"
FAIL: gcc.dg/pr41783.c scan-tree-dump pre "pretmp[^n]* = a_global_var;"
FAIL: gcc.dg/pr78138.c  (test for warnings, line 23)
FAIL: gcc.dg/tree-ssa/ifc-pr69489-1.c scan-tree-dump-times ifcvt "Applying
if-conversion" 1
FAIL: gcc.dg/tree-ssa/ifc-pr69489-1.c scan-tree-dump-times ifcvt "Invalid sum
of outgoing probabilities 200.0" 1
FAIL: gcc.dg/tree-ssa/ifc-pr69489-1.c scan-tree-dump-times ifcvt "Invalid sum
of incoming counts" 1
FAIL: gcc.dg/tree-ssa/ifc-pr69489-2.c scan-tree-dump-times ifcvt "Applying
if-conversion" 1
FAIL: gcc.dg/tree-ssa/ifc-pr69489-2.c scan-tree-dump-times ifcvt "Invalid sum
of outgoing probabilities 200.0" 1
FAIL: gcc.dg/tree-ssa/ifc-pr69489-2.c scan-tree-dump-times ifcvt "Invalid sum
of incoming counts" 1
FAIL: gcc.dg/tree-ssa/loadpre1.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre10.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre11.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre12.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre13.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre14.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/tree-ssa/loadpre16.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre2.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre21.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre23.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre24.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre25.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/tree-ssa/loadpre4.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre6.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre6.c scan-tree-dump-times pre "Insertions: 1" 1
FAIL: gcc.dg/tree-ssa/pr21417.c sc

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #30 from Jakub Jelinek  ---
(In reply to Lukas Grätz from comment #29)
> Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I
> trusted comment #10 here, as it made sense.

See PR114116.

> glibc's backtrace() function and friends only reports function names and
> addresses. This looks like the gdb bt command. I admit, I did not take a
> proper look into that before.

Yes, it is gdb bt.  And it is what people heavily rely on for debugging, if
something fails an assertion or aborts etc., they want to figure out why.

> I believe this could and should somehow be fixed by adding DWARF info that
> certain callee-saved registers (= the function parameter values) were
> overwritten. The corrected backtrace could look something like this:

That can be arranged by emitting those .cfi_undefined directives...

> #2  0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>,
> e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38

... but really will not help users to debug/fix their code.

> > So, I think we should limit this to -fno-unwind-tables or maybe
> > -mcmodel=kernel.
> Now I am confused. The optimization is limited to -fexceptions. And the
> documentation of -funwind-tables says "Similar to -fexceptions, except". So
> shouldn't -funwind-tables behave similar to -fexceptions? I don't see
> anything kernel-specific here.

Given that even with -fno-asynchronous-unwind-tables (or -fno-unwind-tables)
gcc emits
the unwind info, just not into .eh_frame but .debug_frame, we shouldn't disable
it
just when not emitting .eh_frame, but should just disable it always.
There is a reason why it has been rejected years ago.
If anything, guard it with some non-default -m* option and explain the
consequences to users if they use it.  Still, the guarding IMHO should be done
on top of the PR114116
change, because having random crashes from backtrace or gdb bt even when user
asked for it is a bad idea.

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #10 from Jakub Jelinek  ---
Could we for lookups if range isn't a subset of the found range pretend there
was not a match, try to see through definitions again and only if it yields an
equivalent result value range it the same?  Perhaps even remember the range
used in it and in case we find non-subset lookup having the same result union
the remembered range?

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #11 from Jakub Jelinek  ---
Shall I try to construct a non-bitint testcase for this?

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #3 from Stefan Schulze Frielinghaus  
---
This seems to be a bug in the three way comparison introduced with C++20.  The
bug happens while deciding whether key v2 already exists in the map or not.

  template<typename _InputIter1, typename _InputIter2, typename _Comp>
    constexpr auto
    lexicographical_compare_three_way(_InputIter1 __first1,
                                      _InputIter1 __last1,
                                      _InputIter2 __first2,
                                      _InputIter2 __last2,
                                      _Comp __comp)
    -> decltype(__comp(*__first1, *__first2))
    {
      // concept requirements
      __glibcxx_function_requires(_InputIteratorConcept<_InputIter1>)
      __glibcxx_function_requires(_InputIteratorConcept<_InputIter2>)
      __glibcxx_requires_valid_range(__first1, __last1);
      __glibcxx_requires_valid_range(__first2, __last2);

      using _Cat = decltype(__comp(*__first1, *__first2));
      static_assert(same_as<common_comparison_category_t<_Cat>, _Cat>);

      if (!std::__is_constant_evaluated())
        if constexpr (same_as<_Comp, __detail::_Synth3way>
                      || same_as<_Comp, compare_three_way>)
          if constexpr (__is_byte_iter<_InputIter1>)
            if constexpr (__is_byte_iter<_InputIter2>)
              {
                const auto [__len, __lencmp] = _GLIBCXX_STD_A::
                  __min_cmp(__last1 - __first1, __last2 - __first2);
                if (__len)
                  {
                    const auto __c
                      = __builtin_memcmp(&*__first1, &*__first2, __len) <=> 0;
                    if (__c != 0)
                      return __c;
                  }
                return __lencmp;
              }

__len equals 1 since both vectors have length 1.  However, memcmp should be
called with the number of bytes and not the number of elements of the vector. 
That means memcmp is called with two pointers to MEMs of unsigned shorts 1 and
2 where the high-bytes equal 0 and therefore memcmp returns with 0 on
big-endian targets.  Ultimately __lencmp is returned which itself equals
std::strong_ordering::equal rendering v2 replacing v1.
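
The effect described above can be reproduced on any host by writing the
big-endian byte images out by hand. This is only an illustrative sketch, not
libstdc++ code; the helper name compare_be_images is made up for the example.

```cpp
#include <cstddef>
#include <cstring>

// Compare the big-endian byte images of the 16-bit values 1 and 2,
// looking only at the first nbytes bytes.
inline int compare_be_images(std::size_t nbytes) {
    const unsigned char v1[2] = {0x00, 0x01};  // 1 stored big-endian
    const unsigned char v2[2] = {0x00, 0x02};  // 2 stored big-endian
    return std::memcmp(v1, v2, nbytes);
}
```

Passing the element count (1) compares only the two zero high bytes and
reports equality; passing the byte count (2) sees the difference.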

Fixed by

diff --git a/libstdc++-v3/include/bits/stl_algobase.h
b/libstdc++-v3/include/bits/stl_algobase.h
index d534e02871f..6ebece315f7 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1867,8 +1867,10 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
  __min_cmp(__last1 - __first1, __last2 - __first2);
if (__len)
  {
+   const auto __len_bytes = __len * sizeof (*__first1);
const auto __c
- = __builtin_memcmp(&*__first1, &*__first2, __len) <=> 0;
+ = __builtin_memcmp(&*__first1, &*__first2, __len_bytes)
+   <=> 0;
if (__c != 0)
  return __c;
  }

Can you give the patch a try?

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #31 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Lukas Grätz from comment #29)
> > Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I
> > trusted comment #10 here, as it made sense.
> 
> See PR114116.
> 
> > glibc's backtrace() function and friends only reports function names and
> > addresses. This looks like the gdb bt command. I admit, I did not take a
> > proper look into that before.
> 
> Yes, it is gdb bt.  And it is what people heavily rely on for debugging, if
> something fails an assertion or aborts etc., they want to figure out why.
> 

True.

> > I believe this could and should somehow be fixed by adding DWARF info that
> > certain callee-saved registers (= the function parameter values) were
> > overwritten. The corrected backtrace could look something like this:
> 
> That can be arranged by emitting those .cfi_undefined directives...
>  
> > #2  0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>,
> > e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38
> 
> ... but really will not help users to debug/fix their code.


Even when I compile a simple program with gcc -O2 -g:


#include <stdlib.h>
int main(int argc, char** argv) {
    abort();
}


I still get an "argc=<optimized out>":

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77dcd859 in __GI_abort () at abort.c:79
#2  0x00401046 in main (argc=<optimized out>, argv=<optimized out>) at
simple.c:4


Yes, for a better debugging, it would be nice if optimised code would just not
be optimised... But this goes against optimization.


> > > So, I think we should limit this to -fno-unwind-tables or maybe
> > > -mcmodel=kernel.
> > Now I am confused. The optimization is limited to -fexceptions. And the
> > documentation of -funwind-tables says "Similar to -fexceptions, except". So
> > shouldn't -funwind-tables behave similar to -fexceptions? I don't see
> > anything kernel-specific here.
> 
> Given that even with -fno-asynchronous-unwind-tables (or -fno-unwind-tables)
> gcc emits
> the unwind info, just not into .eh_frame but .debug_frame, we shouldn't
> disable it
> just when not emitting .eh_frame, but should just disable it always.
> There is a reason why it has been rejected years ago.
> If anything, guard it with some non-default -m* option and explain the
> consequences to users if they use it.  Still, the guarding IMHO should be
> done on top of the PR114116
> change, because having random crashes from backtrace or gdb bt even when
> user asked for it is a bad idea.


Yes, it is a bad idea to have crashes from backtrace or gdb. But when this is
only about <optimized out>, I don't see the point about disabling it always.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #32 from Jakub Jelinek  ---
(In reply to Lukas Grätz from comment #31)
> Even when I compile a simple program with gcc -O2 -g:
> 
> #include <stdlib.h>
> int main(int argc, char** argv) {
>     abort();
> }
> 
> 
> I still get an "argc=<optimized out>":

Sure, debugging info in optimized code is best effort.

> Yes, for a better debugging, it would be nice if optimised code would just
> not be optimised... But this goes against optimization.

The significant difference between other optimizations and this one is
that normal optimizations affect the debuggability of the optimized function.
This one affects the debuggability of all callers as well, even if they are
compiled in a way that should make them more debuggable.
Normally, if debugging optimized code doesn't work out, one can simply
rebuild that code with -O0 or -Og to make it more debuggable.
Here one would also need to rebuild all the shared libraries it uses.

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

Stefan Schulze Frielinghaus  changed:

           What    |Removed                     |Added

                 CC|                            |jwakely at redhat dot com

--- Comment #4 from Stefan Schulze Frielinghaus  
---
While giving it a second thought maybe something like

const auto __len_bytes
  = __len * std::min (sizeof (*__first1),
  sizeof (*__first2));

would be more appropriate since AFAICT the types _InputIter1 and _InputIter2
are not related to each other w.r.t. their pointed-to size.  Maybe Jonathan can
shed some light on this?

[Bug libquadmath/114126] A not infinite result of tanq of M_PI_2

2024-02-27 Thread www3.spl at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114126

--- Comment #4 from Sergio Peña  ---
(In reply to Jakub Jelinek from comment #2)
> Why do you think this is a bug?
> #include <math.h>
> #include <stdlib.h>
> #include <stdio.h>
> 
> int
> main ()
> {
>   _Float128 f = tanf128 (M_PI_2f128);
>   volatile _Float128 g = M_PI_2f128;
>   g = tanf128 (g);
>   char buf[128];
>   strfromf128 (buf, 128, "%.34a", f);
>   printf ("%s\n", buf);
>   strfromf128 (buf, 128, "%.34a", g);
>   printf ("%s\n", buf);
> }
> also prints
> 0x1.1c46bd57277993a2ee60193c957b00p+114
> 0x1.1c46bd57277993a2ee60193c957b00p+114
> 
> M_PI_2q or M_PI_2f128 is
> 1.5707963267948966192313216916397513987...
> while pi/2 with larger precision is I think
> 1.5707963267948966192313216916397514420...
> so M_PI_2{q,f128} is rounded down, not up,
> so no wonder tanq/tanf128 is not inf.

Ok. It is possible I was wrong.

[Bug c++/114129] New: Inaccurate error message

2024-02-27 Thread Theodore.Papadopoulo at inria dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114129

Bug ID: 114129
   Summary: Inaccurate error message
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Theodore.Papadopoulo at inria dot fr
  Target Milestone: ---

Given the code below

struct A {
    virtual void f() { }
};

struct B: public A {
    void f() override() { }
};

The g++ compiler gives the following error:
-> g++ -O3 test.cpp 
test.cpp:6:5: error: ‘f’ declared as function returning a function
    6 |     void f() override() { }
      |     ^~~~

Technically, it should be 'override' declared as function returning a function.
or even maybe that override is a reserved name and cannot be used as a function
name...

Yet this is much better than clang:
-> clang++ -O3 test.cpp
test.cpp:6:22: error: expected ';' at end of declaration list
    6 |     void f() override() { }
      |                      ^
      |                      ;
1 error generated.

[Bug libquadmath/114126] A not infinite result of tanq of M_PI_2

2024-02-27 Thread www3.spl at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114126

--- Comment #5 from Sergio Peña  ---
(In reply to Jakub Jelinek from comment #2)
> Why do you think this is a bug?
> #include <math.h>
> #include <stdlib.h>
> #include <stdio.h>
> 
> int
> main ()
> {
>   _Float128 f = tanf128 (M_PI_2f128);
>   volatile _Float128 g = M_PI_2f128;
>   g = tanf128 (g);
>   char buf[128];
>   strfromf128 (buf, 128, "%.34a", f);
>   printf ("%s\n", buf);
>   strfromf128 (buf, 128, "%.34a", g);
>   printf ("%s\n", buf);
> }
> also prints
> 0x1.1c46bd57277993a2ee60193c957b00p+114
> 0x1.1c46bd57277993a2ee60193c957b00p+114
> 
> M_PI_2q or M_PI_2f128 is
> 1.5707963267948966192313216916397513987...
> while pi/2 with larger precision is I think
> 1.5707963267948966192313216916397514420...
> so M_PI_2{q,f128} is rounded down, not up,
> so no wonder tanq/tanf128 is not inf.

Ok. It is possible I was wrong.
I have found this question:
https://stackoverflow.com/questions/54287492/why-didnt-i-get-tanpi-2-infinty-in-c

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #5 from Jonathan Wakely  ---
But it's guarded by:

if constexpr (__is_byte_iter<_InputIter1>)
  if constexpr (__is_byte_iter<_InputIter2>)

This condition is only supposed to be true when sizeof(*__first1) == 1 and
sizeof(*__first2) == 1

We can only use memcmp if we're comparing single bytes as unsigned values (and
if the iterators are pointers to contiguous memory, not e.g. segmented
iterators like std::deque's, or not even random access iterators, like
std::list's).

For std::vector we should not use this code at all.
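
A small sketch of why the byte restriction matters; it is not libstdc++ code
and the function names are invented for illustration. memcmp orders memory as
a sequence of unsigned bytes, which disagrees with the element order both for
multi-byte elements on little-endian targets and for signed single bytes
(assuming the usual two's-complement representation).

```cpp
#include <cstring>

// memcmp order of the little-endian byte images of 256 (0x0100) and 1 (0x0001).
inline int memcmp_256_vs_1_le() {
    const unsigned char v256[2] = {0x00, 0x01};  // 256 stored little-endian
    const unsigned char v1[2]   = {0x01, 0x00};  // 1 stored little-endian
    return std::memcmp(v256, v1, sizeof v256);
}

// memcmp order of the single bytes holding signed char -1 and +1.
inline int memcmp_minus1_vs_plus1() {
    const signed char a = -1;  // byte 0xFF on two's-complement targets
    const signed char b = 1;   // byte 0x01
    return std::memcmp(&a, &b, 1);
}
```

The first call reports 256 as less than 1 and the second reports -1 as greater
than 1, so neither kind of element can be ordered with memcmp.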

[Bug analyzer/111881] [14 Regression] analyzer: ICE in ensure_closed, at analyzer/constraint-manager.cc:130 with -Ofast

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111881

--- Comment #2 from GCC Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:43ad6ce60108acc822efcd394b75e270c1996cb5

commit r14-9195-g43ad6ce60108acc822efcd394b75e270c1996cb5
Author: David Malcolm 
Date:   Tue Feb 27 08:36:58 2024 -0500

analyzer: fix ICE on floating-point bounds [PR111881]

gcc/analyzer/ChangeLog:
PR analyzer/111881
* constraint-manager.cc (bound::ensure_closed): Assert that
m_constant has integral type.
(range::add_bound): Bail out on floating point constants.

gcc/testsuite/ChangeLog:
PR analyzer/111881
* c-c++-common/analyzer/conditionals-pr111881.c: New test.

Signed-off-by: David Malcolm 

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #33 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #32)
> (In reply to Lukas Grätz from comment #31)
> > Even when I compile a simple program with gcc -O2 -g:
> > 
> > #include <stdlib.h>
> > int main(int argc, char** argv) {
> >     abort();
> > }
> > 
> > 
> > I still get an "argc=<optimized out>":
> 
> Sure, debugging info in optimized code is best effort.
> 
> > Yes, for a better debugging, it would be nice if optimised code would just
> > not be optimised... But this goes against optimization.
> 
> The significant difference between other optimizations and this one is
> that normal optimizations affect the debuggability of the optimized function.
> This one affects the debuggability of all callers as well, even if they are
> compiled in a way that should make them more debuggable.
> Normally, if debugging optimized code doesn't work out, one can simply
> rebuild that code with -O0 or -Og to make it more debuggable.
> Here one would also need to rebuild all the shared libraries it uses.

When the debugger is inside the debuggable -O0 or -Og compiled function, we
would see all parameters and current variable values. However, in the bt
example, we are in another function. So the parameters are only available at
best effort.

I just noticed that for my simple.c example above, I get "argc=<optimized out>"
even with -Og. However, when breakpoint is somewhere else,

(gdb) break main
(gdb) run
(gdb) bt

I get the correct "argc=1". The same applies to your example with "break baz".
It is just not guaranteed that gdb is able to reconstruct function parameters
when we are in some other function.

[Bug target/114130] New: RISC-V: `__atomic_compare_exchange` does not use sign-extended value

2024-02-27 Thread x at maxxsoft dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114130

Bug ID: 114130
   Summary: RISC-V: `__atomic_compare_exchange` does not use
sign-extended value
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: x at maxxsoft dot net
  Target Milestone: ---

GCC 13.2 does not generate sign-extension for value to be compared with the
result of `lr.w` instruction in `__atomic_compare_exchange`:
https://godbolt.org/z/nafKhPa1Y

Code:

```
void foo(uint32_t *p) {
    uintptr_t x = *(uintptr_t *)p;
    uint32_t e = !p ? 0 : (uintptr_t)p >> 1;
    uint32_t d = (uintptr_t)x;
    __atomic_compare_exchange(p, &e, &d, 0, __ATOMIC_RELAXED,
                              __ATOMIC_RELAXED);
}
```

Assembly generated by `gcc -O3`:

```
foo:
        ld      a4,0(a0)
        srli    a5,a0,1
        1: lr.w a3,0(a0); bne a3,a5,1f; sc.w a2,a4,0(a0); bnez a2,1b; 1:
        ret
```

Here `a5` should be sign-extended, since the RISC-V ISA manual says `lr.w`
returns a sign-extended value in RV64.
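
To illustrate why the missing `sext.w` matters, here is a small host-side model
of the register comparison. It is only a sketch: the helper names are invented,
and it relies on the usual two's-complement conversion from `uint32_t` to
`int32_t` (guaranteed since C++20, implementation-defined but universal before).

```cpp
#include <cstdint>

// Models what lr.w leaves in a 64-bit register on RV64:
// the loaded 32-bit word, sign-extended to 64 bits.
inline std::uint64_t lr_w_result(std::uint32_t word) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(word)));
}

// Models an expected value that was only zero-extended (no sext.w).
inline std::uint64_t zero_extended(std::uint32_t word) {
    return word;
}

// Models the full-width `bne` comparison between the two registers.
inline bool compare_succeeds(std::uint32_t memory_word,
                             std::uint64_t expected_reg) {
    return lr_w_result(memory_word) == expected_reg;
}
```

For a word with bit 31 set, such as 0x80000000, the comparison against the
zero-extended value fails even though the 32-bit values are equal, so the
compare-exchange would never succeed.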

But `gcc -O3 -fno-delete-null-pointer-checks` generates correct code:

```
foo:
        ld      a4,0(a0)
        li      a5,0
        beq     a0,zero,.L2
        srli    a5,a0,1
        sext.w  a5,a5
.L2:
        1: lr.w a3,0(a0); bne a3,a5,1f; sc.w a2,a4,0(a0); bnez a2,1b; 1:
        ret
```

`gcc -O3 -fno-tree-ter`'s output is slightly different, but also sign-extended.

`clang -O3` always generates correct code:

```
foo:                                    # @foo
        lw      a1, 0(a0)
        srli    a2, a0, 1
        sext.w  a2, a2
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        lr.w    a3, (a0)
        bne     a3, a2, .LBB0_3
        sc.w    a4, a1, (a0)
        bnez    a4, .LBB0_1
.LBB0_3:
        ret
```

[Bug analyzer/111881] [14 Regression] analyzer: ICE in ensure_closed, at analyzer/constraint-manager.cc:130 with -Ofast

2024-02-27 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111881

David Malcolm  changed:

           What    |Removed                     |Added

             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from David Malcolm  ---
Should be fixed by above patch.

[Bug c++/114129] Inaccurate error message

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114129

Jonathan Wakely  changed:

           What    |Removed                     |Added

           Keywords|                            |diagnostic

--- Comment #1 from Jonathan Wakely  ---
(In reply to Theodore.Papadopoulo from comment #0)
> Technically, it should be 'override' declared as function returning a
> function.

No, GCC is correct here, according to the grammar ... or as much as you can
reason about how the grammar applies to code that doesn't conform to the
grammar.

The function declarator is `f() override` and the return type is `void()`.

You get exactly the same error without the override:

o.cc:6:5: error: ‘f’ declared as function returning a function
    6 |     void f()() { }
      |     ^~~~


> or even maybe that override is a reserved name and cannot be used as a
> function name...

That would definitely be wrong though. It's not reserved, it's "an identifier
with special meaning", and `void override() { }` is perfectly valid as a
function declaration.
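
As a quick illustration of that last point (the type and function names here
are made up for the example), `override` can appear both as a virt-specifier
and as an ordinary function name in the same translation unit:

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
};

struct Derived : Base {
    int f() override { return 2; }  // `override` as a virt-specifier
    int override() { return 3; }    // member function named `override`
};

// Free function named `override`; valid because the identifier only has
// special meaning after a member function declarator.
inline int override() { return 4; }
```

This compiles as C++11 or later; the identifier is only treated specially in
the position directly after a member function declarator.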

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #12 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #11)
> Shall I try to construct a non-bitint testcase for this?

That would be nice, more coverage is always good.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #34 from Jakub Jelinek  ---
Best effort are the whatever@entry values, which are used if an argument is no
longer used across the function call and isn't stored in any call saved
register or stack slot.
There can be also automatic variables which are live across the call (note,
even if you have noreturn function lower in the call stack, its caller e.g.
could just call the noreturn function conditionally or could be from a
different translation unit in which the callee is not declared noreturn, and
any caller up in the call stack then won't have noreturn calls).
If you have
  int x = fn1 (...);
  fn2 (...); // This function conditionally calls a noreturn function
  fn3 (x);
then typically x will be in callee saved register (unless we run out of them),
it isn't best effort in there, the debug info just says that say x lives in
%ebx register,
it doesn't say it might be in that register.
Now, when you go up from the noreturn function to this frame, gdb won't be able
to restore the register (if .cfi_undefined is emitted), or right now just can
have completely bogus values.

[Bug libgcc/114131] New: std::isinf(std::float128_t) generates superfluous nan-checks

2024-02-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114131

Bug ID: 114131
   Summary: std::isinf(std::float128_t) generates superfluous
nan-checks
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

please see https://godbolt.org/z/djc9q1vcv
test1(default): includes nan-checks (__unordtf2)
test2: no nan-checks, but calls __eqtf2
test3: only checks for inf (via bit_cast); no additional function calls +
branchfree. Of course, this only works if (unsigned) __int128 is available.
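
A rough sketch of the bit-pattern approach, not the code from the godbolt link. It assumes a GCC or Clang target that provides `__float128` as IEEE binary128 together with `unsigned __int128` (for example x86-64), and it uses `std::memcpy` in place of `std::bit_cast` so that it does not need C++20. The name `isinf_bits` is made up.

```cpp
#include <cstring>

// Returns true if x is +inf or -inf, using integer operations only.
inline bool isinf_bits(__float128 x) {
  static_assert(sizeof(__float128) == sizeof(unsigned __int128),
                "expected a 128-bit floating-point type");
  unsigned __int128 bits;
  std::memcpy(&bits, &x, sizeof bits);  // reinterpret the object bytes
  bits <<= 1;                           // discard the sign bit
  // binary128: 15 exponent bits, 112 mantissa bits. Infinity has an
  // all-ones exponent and a zero mantissa; shifted left by one bit the
  // exponent field starts at bit 113.
  const unsigned __int128 inf = static_cast<unsigned __int128>(0x7fff) << 113;
  return bits == inf;
}
```

A NaN has the same exponent but a non-zero mantissa, so the single comparison rejects it without any separate unordered check.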

thx
Gero

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #13 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #10)
> Could we for lookups if range isn't a subset of the found range pretend
> there was not a match, try to see through definitions again and only if it
> yields an equivalent result value range it the same?  Perhaps even remember
> the range used in it and in case we find non-subset lookup having the same
> result union the remembered range?

So pretend we record the first match with using a range that improves
the result in the hashtable.  Then, when looking up the second ref we
hit the hashtable entry, see it has an incompatible range so we can't
use the recorded value.  We can then easily only ignore the entry (the
prototype patch does this).  As we can't easily tell whether we used any
(or even which) range without doing multiple lookups for each ref and
comparing the result "re-doing" things wouldn't work.

But for determining two refs are equivalent it might be enough to avoid
recording any kind of range for when the value was "varying".  The value
of such hashtable entry would be usable even by lookups with narrower
range (but also not yielding any better "constant" value).

I'm trying to improve things this way.

[Bug libgcc/114131] std::isinf(std::float128_t) generates superfluous nan-checks

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114131

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-27
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Keywords||missed-optimization

[Bug target/114004] GCC emits a superfluous instruction for simple test case on ppc

2024-02-27 Thread jskumari at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114004

Surya Kumari Jangala  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

[Bug target/114132] New: [avr] Code sets up a frame pointer without need

2024-02-27 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114132

Bug ID: 114132
   Summary: [avr] Code sets up a frame pointer without need
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

$ avr-gcc -S -Os -mmcu=attiny40 

of 

void funcab_c (long x, char c) {
}

sets up a frame-pointer without need.

Arguments x and c occupy all of the argument registers R25..R20, so that no arg
registers are left.  Then there is this implementation of
TARGET_FRAME_POINTER_REQUIRED in avr.cc:

static bool
avr_frame_pointer_required_p (void)
{
  return (cfun->calls_alloca
  || cfun->calls_setjmp
  || cfun->has_nonlocal_label
  || crtl->args.info.nregs == 0
  || get_frame_size () > 0);
}

Problem is that crtl->args.info.nregs == 0 does not discriminate between need
for arg pointer and no need for arg pointer (but all arg regs are used up, like
in the example).

[Bug target/114132] [avr] Code sets up a frame pointer without need

2024-02-27 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114132

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Priority|P3  |P4
 Target||avr

[Bug libstdc++/114103] FAIL: 29_atomics/atomic/lock_free_aliases.cc -std=gnu++20 (test for excess errors)

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114103

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||patch
   Last reconfirmed||2024-02-27
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #7 from Jonathan Wakely  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646619.html

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #35 from Jakub Jelinek  ---
If I hand edit the gcc trunk + PR114116 patch assembly, add to bar
+   .cfi_undefined 3
+   .cfi_undefined 12
+   .cfi_undefined 13
+   .cfi_undefined 14
+   .cfi_undefined 15
then bt in gdb shows
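#2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
d=<error reading variable: value has been optimized out>, 
e=<error reading variable: value has been optimized out>, f=<error reading variable: value has been optimized out>, g=48, h=49) at /tmp/1.c:38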
and everything in qux live across the call is  as well,
(gdb) p $r12
$10 = 
etc. while without that
(gdb) p a
$1 = 
(gdb) p b
$2 = 
(gdb) p c
$3 = 
(gdb) p d
$4 = -559038737
(gdb) p e
$5 = -559038737
(gdb) p f
$6 = -559038737
(gdb) p g
$7 = -559038737
(gdb) p h
$8 = -559038737
(gdb) p $r12
$9 = 3735928559

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #6 from Stefan Schulze Frielinghaus  ---
Guard __is_byte_iter checks for contiguous bytes which I guess is fine for
std::vector and then checks for __is_memcmp_ordered which is fine for
big-endian targets in conjunction with unsigned integers.  From
cpp_type_traits.h we have:

  // Whether memcmp can be used to determine ordering for a type
  // e.g. in std::lexicographical_compare or three-way comparisons.
  // True for unsigned integer-like types where comparing each byte in turn
  // as an unsigned char yields the right result. This is true for all
  // unsigned integers on big endian targets, but only unsigned narrow
  // character types (and std::byte) on little endian targets.
  template<typename _Tp, bool _TreatAsBytes =
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
__is_integer<_Tp>::__value
#else
__is_byte<_Tp>::__value
#endif

Thus using memcmp here is fine, however, I'm still a bit unsure whether we
really have to take the minimum of *__first1 and *__first2 since I haven't
found any size-relation between those types.

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #7 from Jonathan Wakely  ---
Ohhh, I forgot I did that, sorry!

Yeah, the memcmp code wasn't updated to match the different behaviour of
__is_byte_iter for BE.

We can't use memcmp if the sizes are different. We don't want to use the min,
we want to guard that code with the sizes being the same, then we can just use
len*sizeof(*first1) because we know it's the same as sizeof(*first2).
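
The length part of this can be illustrated in isolation. The following is a sketch with made-up helper names, not the libstdc++ code: `memcmp` takes a byte count, so an element count has to be scaled by the element size.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compares the first `len` elements bytewise. `len` counts elements, so
// the byte count handed to memcmp is len * sizeof(T).
template <typename T>
int memcmp_elements(const T* a, const T* b, std::size_t len) {
  return std::memcmp(a, b, len * sizeof(T));
}

// Buggy variant: passes the element count where memcmp expects bytes, so
// for 4-byte elements only the first quarter of the data is compared.
template <typename T>
int memcmp_elements_buggy(const T* a, const T* b, std::size_t len) {
  return std::memcmp(a, b, len);
}
```

For two `std::uint32_t` arrays of four elements that differ only in the last element, the buggy variant reports equality while the scaled variant does not. Whether the sign of the non-zero result matches the integer ordering still depends on byte order, which is what the `__is_memcmp_ordered` guard is for.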

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #8 from Jonathan Wakely  ---
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1824,8 +1824,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 }

 #if __cpp_lib_three_way_comparison
-  // Iter points to a contiguous range of unsigned narrow character type
-  // or std::byte, suitable for comparison by memcmp.
+  // Iter points to a contiguous range of unsigned narrow character type,
+  // or std::byte, or big-endian unsigned integers, suitable for comparison
+  // by memcmp.
   template<typename _Iter>
 concept __is_byte_iter = contiguous_iterator<_Iter>
   && __is_memcmp_ordered<iter_value_t<_Iter>>::__value;
@@ -1879,14 +1880,16 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
if constexpr (same_as<_Comp, __detail::_Synth3way>
  || same_as<_Comp, compare_three_way>)
  if constexpr (__is_byte_iter<_InputIter1>)
-   if constexpr (__is_byte_iter<_InputIter2>)
+   if constexpr (__is_byte_iter<_InputIter2>
+   && sizeof(*__first1) == sizeof(*__first2))
  {
const auto [__len, __lencmp] = _GLIBCXX_STD_A::
  __min_cmp(__last1 - __first1, __last2 - __first2);
if (__len)
  {
+   const auto __blen = __len * sizeof(*__first1);
const auto __c
- = __builtin_memcmp(&*__first1, &*__first2, __len) <=> 0;
+ = __builtin_memcmp(&*__first1, &*__first2, __blen) <=> 0;
if (__c != 0)
  return __c;
  }

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #9 from Stefan Schulze Frielinghaus  ---
(In reply to Jonathan Wakely from comment #7)
> We can't use memcmp if the sizes are different. We don't want to use the
> min, we want to guard that code with the sizes being the same, then we can
> just use len*sizeof(*first1) because we know it's the same as
> sizeof(*first2).

Hehe I was about to add another comment.  I just confused myself with taking
the minimum but we rather need to ensure that we are walking over same sized
integers.

LGTM

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #10 from Jonathan Wakely  ---
Oh I already defined a __is_memcmp_ordered_with trait, which does the same-size
check. I think that's what should be used here.
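
To make the idea of a same-size check concrete, here is a simplified, hypothetical trait. It is not the libstdc++ `__is_memcmp_ordered_with`; it ignores `std::byte`, and it relies on the `__BYTE_ORDER__` macros that GCC and Clang predefine.

```cpp
#include <type_traits>

// True when ranges of T and U could be ordered with memcmp: both types
// are unsigned integers of the same size, and either the target is
// big-endian or the elements are single bytes.
template <typename T, typename U>
struct memcmp_ordered_with {
  static constexpr bool both_unsigned =
      std::is_integral<T>::value && std::is_integral<U>::value &&
      std::is_unsigned<T>::value && std::is_unsigned<U>::value;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  static constexpr bool byte_order_ok = true;
#else
  static constexpr bool byte_order_ok = sizeof(T) == 1;
#endif
  static constexpr bool value =
      both_unsigned && sizeof(T) == sizeof(U) && byte_order_ok;
};
```

With such a trait, a pair like `unsigned char` and `unsigned short` is rejected on every target, which is the case the first patch had to guard against by hand.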

[Bug target/113960] std::map with std::vector as input overwrites itself with c++20, on s390x platform

2024-02-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113960

--- Comment #11 from Jonathan Wakely  ---
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1824,11 +1824,14 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 }

 #if __cpp_lib_three_way_comparison
-  // Iter points to a contiguous range of unsigned narrow character type
-  // or std::byte, suitable for comparison by memcmp.
-  template<typename _Iter>
-concept __is_byte_iter = contiguous_iterator<_Iter>
-  && __is_memcmp_ordered<iter_value_t<_Iter>>::__value;
+  // Both iterators refer to contiguous ranges of unsigned narrow characters,
+  // or std::byte, or big-endian unsigned integers, suitable for comparison
+  // using memcmp.
+  template<typename _Iter1, typename _Iter2>
+concept __memcmp_ordered_with
+  = (__is_memcmp_ordered_with<iter_value_t<_Iter1>,
+ iter_value_t<_Iter2>>::__value)
+ && contiguous_iterator<_Iter1> && contiguous_iterator<_Iter2>;

   // Return a struct with two members, initialized to the smaller of x and y
   // (or x if they compare equal) and the result of the comparison x <=> y.
@@ -1878,20 +1881,20 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   if (!std::__is_constant_evaluated())
if constexpr (same_as<_Comp, __detail::_Synth3way>
  || same_as<_Comp, compare_three_way>)
- if constexpr (__is_byte_iter<_InputIter1>)
-   if constexpr (__is_byte_iter<_InputIter2>)
- {
-   const auto [__len, __lencmp] = _GLIBCXX_STD_A::
- __min_cmp(__last1 - __first1, __last2 - __first2);
-   if (__len)
- {
-   const auto __c
- = __builtin_memcmp(&*__first1, &*__first2, __len) <=> 0;
-   if (__c != 0)
- return __c;
- }
-   return __lencmp;
- }
+ if constexpr (__memcmp_ordered_with<_InputIter1, _InputIter2>)
+   {
+ const auto [__len, __lencmp] = _GLIBCXX_STD_A::
+   __min_cmp(__last1 - __first1, __last2 - __first2);
+ if (__len)
+   {
+ const auto __blen = __len * sizeof(*__first1);
+ const auto __c
+   = __builtin_memcmp(&*__first1, &*__first2, __blen) <=> 0;
+ if (__c != 0)
+   return __c;
+   }
+ return __lencmp;
+   }

   while (__first1 != __last1)
{

[Bug modula2/114133] New: problem passing a string pointer to a C function on solaris 32 bit and 64 bit

2024-02-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114133

Bug ID: 114133
   Summary: problem passing a string pointer to a C function on
solaris 32 bit and 64 bit
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: gaius at gcc dot gnu.org
  Target Milestone: ---

This is a follow on from:

Bug 114026 - incorrect location during for loop type check

which occurred when I added a new testcase and was reported as failing:


Two of the new tests FAIL on 32 and 64-bit Solaris/SPARC:

+FAIL: gm2/extensions/run/pass/callingc10.mod execution,  -O 
+FAIL: gm2/extensions/run/pass/callingc10.mod execution,  -O -g 
+FAIL: gm2/extensions/run/pass/callingc10.mod execution,  -O3 -fomit-frame-pointer 
+FAIL: gm2/extensions/run/pass/callingc10.mod execution,  -O3 -fomit-frame-pointer -finline-functions 
+FAIL: gm2/extensions/run/pass/callingc10.mod execution,  -Os 
+FAIL: gm2/extensions/run/pass/callingc10.mod execution,  -g 
+FAIL: gm2/extensions/run/pass/callingc11.mod execution,  -O 
+FAIL: gm2/extensions/run/pass/callingc11.mod execution,  -O -g 
+FAIL: gm2/extensions/run/pass/callingc11.mod execution,  -O3 -fomit-frame-pointer 
+FAIL: gm2/extensions/run/pass/callingc11.mod execution,  -O3 -fomit-frame-pointer -finline-functions 
+FAIL: gm2/extensions/run/pass/callingc11.mod execution,  -Os 
+FAIL: gm2/extensions/run/pass/callingc11.mod execution,  -g 

The failure mode is the same for both:

parameter is hello and length 0
executed
/var/gcc/regression/master/11.4-gcc/build/gcc/testsuite/gm2/callingc10.x0 with
result fail

[Bug modula2/114026] incorrect location during for loop type check

2024-02-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114026

Gaius Mulley  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #6 from Gaius Mulley  ---
ah I'll open up a new PR as this is now a bug relating to passing a pointer
string to a C function.

For clarity and future searching the new PR follow on is:

Bug 114133 - problem passing a string pointer to a C function on solaris 32 bit
and 64 bit

marking the original FOR loop issue as resolved.

[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses

2024-02-27 Thread jeevitha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863
Bug 89863 depends on bug 106907, which changed state.

Bug 106907 Summary: gcc/config/rs6000/rs6000.cc:23155: strange expression ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106907

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/106907] gcc/config/rs6000/rs6000.cc:23155: strange expression ?

2024-02-27 Thread jeevitha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106907

Jeevitha  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #11 from Jeevitha  ---
Fixed

[Bug target/110320] ELFv2 pc-rel ABI extension allows using r2 as a volatile register

2024-02-27 Thread jeevitha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110320

Jeevitha  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jeevitha  ---
Fixed

[Bug target/110411] ICE on simple memcpy test case when allowing generation of vector pair load/store insns

2024-02-27 Thread jeevitha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110411

Jeevitha  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Jeevitha  ---
Fixed

[Bug target/100799] Stackoverflow in optimized code on PPC

2024-02-27 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #31 from Peter Bergner  ---
(In reply to Jakub Jelinek from comment #30)
> Either tree parmdef = ssa_default_def (cfun, parm) is NULL, or has_zero_uses
> (parmdef).
> Not sure if has_zero_uses will work properly after some bbs are converted
> from GIMPLE to RTL, but maybe it will, I think the expansion generally
> doesn't gsi_remove statements it expands nor calls update_stmt on them.  One
> could always also just compute in generic code at the start of expansion the
> number of unused DECL_HIDDEN_STRING_LENGTH PARM_DECLs at the end of the
> argument list, save that as a flag in struct function or where and let the
> backends use it from there.

Ok, I think that gives us some idea what needs to be done.  I'll look for
someone in the team to have a look at implementing this workaround.  Thanks.

[Bug modula2/114133] problem passing a string pointer to a C function on solaris 32 bit and 64 bit

2024-02-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114133

Gaius Mulley  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-02-27

--- Comment #1 from Gaius Mulley  ---
The gimple IR looks correct, given the input code:

MODULE callingc10 ;

FROM cvararg IMPORT funcptr ;
FROM SYSTEM IMPORT ADR ;

BEGIN
   IF funcptr (1, "hello", 5) = 1
   THEN
   END ;
   IF funcptr (1, "hello" + " ", 6) = 1
   THEN
   END ;
   IF funcptr (1, "hello" + " " + "world", 11) = 1
   THEN
   END
END callingc10.

$ gm2 -g callingc10.mod -c -fdump-ipa-all
$ cat callingc10.mod.095i.comdats
...
PROC _M2_callingc10_init (INTEGER argc, PROC * argv, PROC * envp)
{
  INTEGER D.670;
  INTEGER D.669;
  INTEGER D.668;
  PROC * _T34.0_1;
  INTEGER _2;
  INTEGER _T35.1_3;
  PROC * _T36.2_4;
  INTEGER _5;
  INTEGER _T37.3_6;
  PROC * _T38.4_7;
  INTEGER _8;
  INTEGER _12;
  INTEGER _16;
  INTEGER _20;

   :
  _T34 = "hello";
  _T34.0_1 = _T34;
  _12 = funcptr (1, _T34.0_1, 5);
  _2 = _12;
  _T35 = _2;
  _T35.1_3 = _T35;

...

[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

--- Comment #14 from Jakub Jelinek  ---
Tried 
__attribute__((noipa)) unsigned long
foo (unsigned long x)
{
  unsigned long y[128], z = 0, w = 0;
  y[127] = x;
  __builtin_memset (&y, 0, 127 * sizeof (long));
  for (unsigned long i = 0; i < 128; i += 2)
{
  unsigned long a = y[i], b, c, d;
  b = __builtin_subcl (0, a, z, &c);
  z = c;
  if (i >= 64)
{
  if (i == 64)
w = c != 0;
  else
w = (c != 0) | w;
}
  d = i + 1;
  a = y[d];
  b = __builtin_subcl (0, a, z, &c);
  z = c;
  if (d > 64)
w = (c != 0) | w;
}
  return w;
}
but that doesn't reproduce it unfortunately.

[Bug modula2/114133] problem passing a string pointer to a C function on solaris 32 bit and 64 bit

2024-02-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114133

--- Comment #2 from Gaius Mulley  ---
Created attachment 57552
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57552&action=edit
Query proposed fix

Does this patch fix the problem?

[Bug c++/114134] New: Extra mov instructions for simple function compared with GCC13

2024-02-27 Thread pilarlatiesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114134

Bug ID: 114134
   Summary: Extra mov instructions for simple function compared
with GCC13
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

In the example below, the function `Key` has some extra (useless?) mov
instructions that are not generated with GCC 13.

$ cat borrar.cpp 

#include <cmath>

struct TVec3D { double x, y, z; };

struct TKey { int i, j, k; };

extern double const BinSize;

inline int Index(double const x)
  { return static_cast<int>(std::floor(static_cast<float>(x / BinSize + 1.0) -
1.0f)); };

TKey Key(TVec3D const &r)
  { return {Index(r.x), Index(r.y), Index(r.z)}; }


$ ./gcc-13/bin/g++ -O3 -march=skylake -fno-trapping-math -S borrar.cpp -o-
.file   "borrar.cpp"
.text
.p2align 4
.globl  _Z3KeyRK6TVec3D
.type   _Z3KeyRK6TVec3D, @function
_Z3KeyRK6TVec3D:
.LFB993:
.cfi_startproc
vmovsd  BinSize(%rip), %xmm1
vmovupd (%rdi), %xmm3
vmovddup  .LC1(%rip), %xmm2
vmovddup  %xmm1, %xmm0
vdivpd  %xmm0, %xmm3, %xmm0
vaddpd  %xmm2, %xmm0, %xmm0
vmovq   .LC2(%rip), %xmm2
vcvtpd2psx  %xmm0, %xmm0
vaddps  %xmm2, %xmm0, %xmm0
vroundps  $9, %xmm0, %xmm0
vcvttps2dq  %xmm0, %xmm4
vmovsd  16(%rdi), %xmm0
vmovq   %xmm4, %rax
vdivsd  %xmm1, %xmm0, %xmm0
vaddsd  .LC1(%rip), %xmm0, %xmm0
vcvtsd2ss   %xmm0, %xmm0, %xmm0
vsubss  .LC3(%rip), %xmm0, %xmm0
vroundss  $9, %xmm0, %xmm0, %xmm0
vcvttss2sil %xmm0, %edx
movl  %edx, %edx
ret
.cfi_endproc
.LFE993:
.size   _Z3KeyRK6TVec3D, .-_Z3KeyRK6TVec3D
.section  .rodata.cst8,"aM",@progbits,8
.align 8
.LC1:
.long   0
.long   1072693248
.align 8
.LC2:
.long   -1082130432
.long   -1082130432
.section  .rodata.cst4,"aM",@progbits,4
.align 4
.LC3:
.long   1065353216
.ident  "GCC: (GNU) 13.1.0"
.section  .note.GNU-stack,"",@progbits


$ ./gcc-14/bin/g++ -O3 -march=skylake -fno-trapping-math -S borrar.cpp -o-
.file   "borrar.cpp"
.text
.p2align 4
.globl  _Z3KeyRK6TVec3D
.type   _Z3KeyRK6TVec3D, @function
_Z3KeyRK6TVec3D:
.LFB1032:
.cfi_startproc
vmovsd  BinSize(%rip), %xmm2
vmovupd (%rdi), %xmm0
vmovddup  %xmm2, %xmm1
vdivpd  %xmm1, %xmm0, %xmm0
vmovddup  .LC1(%rip), %xmm1
vaddpd  %xmm1, %xmm0, %xmm0
vmovq   .LC2(%rip), %xmm1
vcvtpd2psx  %xmm0, %xmm0
vaddps  %xmm1, %xmm0, %xmm0
vroundps  $9, %xmm0, %xmm0
vcvttps2dq  %xmm0, %xmm0
vmovq   %xmm0, %rdx
vmovsd  16(%rdi), %xmm0
vdivsd  %xmm2, %xmm0, %xmm0
vaddsd  .LC1(%rip), %xmm0, %xmm0
vcvtsd2ss   %xmm0, %xmm0, %xmm0
vsubss  .LC3(%rip), %xmm0, %xmm0
vroundss  $9, %xmm0, %xmm0, %xmm0
vcvttss2sil %xmm0, %eax
movl  %eax, %eax
movq  %rax, %rdi
movq  %rdx, %rax
movq  %rdi, %rdx
ret
.cfi_endproc
.LFE1032:
.size   _Z3KeyRK6TVec3D, .-_Z3KeyRK6TVec3D
.section  .rodata.cst8,"aM",@progbits,8
.align 8
.LC1:
.long   0
.long   1072693248
.align 8
.LC2:
.long   -1082130432
.long   -1082130432
.section  .rodata.cst4,"aM",@progbits,4
.align 4
.LC3:
.long   1065353216
.ident  "GCC: (GNU) 14.0.0 20240112 (experimental)"
.section  .note.GNU-stack,"",@progbits

[Bug modula2/114133] problem passing a string pointer to a C function on solaris 32 bit and 64 bit

2024-02-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114133

--- Comment #3 from Gaius Mulley  ---
At a guess the problem was the ZTyped constant (1 and 5).  Now the gimple IR
shows these constants as integers:

$ cat callingc10.mod.095i.comdats
PROC _M2_callingc10_init (INTEGER argc, PROC * argv, PROC * envp)
{
  INTEGER D.676;
  INTEGER D.675;
  INTEGER D.674;
  INTEGER _T35.0_1;
  PROC * _T36.1_2;
  INTEGER _T34.2_3;
  INTEGER _4;
  INTEGER _T37.3_5;
  INTEGER _T39.4_6;
  PROC * _T40.5_7;
  INTEGER _T38.6_8;
  INTEGER _9;
  INTEGER _T41.7_10;
  INTEGER _T43.8_11;
  PROC * _T44.9_12;
  INTEGER _T42.10_13;
  INTEGER _14;
  INTEGER _20;
  INTEGER _26;
  INTEGER _32;

   :
  _T34 = 1;
  _T35 = 5;
  _T36 = "hello";
  _T35.0_1 = _T35;
  _T36.1_2 = _T36;
  _T34.2_3 = _T34;
  _20 = funcptr (_T34.2_3, _T36.1_2, _T35.0_1);
  _4 = _20;
  _T37 = _4;
  _T37.3_5 = _T37;

[Bug c++/101443] [9/10 Regression] internal compiler error: in wide_int_to_tree_1, at tree.c:1519

2024-02-27 Thread rawiener at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101443

Rafi Wiener  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #14 from Rafi Wiener  ---
thanks

[Bug c++/114013] [14 Regression] Specializations of var templates no longer emitted since r14-8987

2024-02-27 Thread enrico.seiler+gccbugs at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114013

--- Comment #3 from Enrico Seiler  ---
For -O0 and -O1, this also does not link:

template <int> int value;
template <> inline int value<1>;
void bar(int) { bar(value<1>); }

https://godbolt.org/z/Wxv7PE8ob

[Bug tree-optimization/114041] wrong code with _BitInt() and -O -fgraphite-identity

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114041

--- Comment #5 from Jakub Jelinek  ---
Reduced testcase:

unsigned a[24], b[24];

__attribute__((noipa)) unsigned
foo (unsigned _BitInt(4) x)
{
  for (int i = 0; i < 24; ++i)
a[i] = i;
  unsigned e = __builtin_stdc_bit_ceil (x);
  for (int i = 0; i < 24; ++i)
b[i] = i;
  return e;
}

int
main ()
{
  if (foo (0) != 1)
__builtin_abort ();
}

I have to confirm Andrew's comment, before the graphite dump there was
  if (x_14(D) > 1)
goto ; [59.00%]
  else
goto ; [41.00%]

   [local count: 17609365]:
  goto ; [100.00%]

   [local count: 25340307]:
  _2 = x_14(D) + 15;
  _3 = (unsigned int) _2;
  _4 = __builtin_clz (_3);
  _5 = 31 - _4;
  _6 = 2 << _5;
  iftmp.1_15 = (unsigned int) _6;

   [local count: 42949672]:
  # iftmp.1_10 = PHI 
This isn't part of any kind of loop, it is in between 2 different loops.
Graphite hoists some of the statements to bb 2 where it is unconditional:
  _32 = x_14(D) + 15;
  _33 = (unsigned int) _32;
the rest of it remains after the first loop, but is now unconditional:
   [count: 0]:
  _47 = 1;
  _31 = __builtin_clz (_33);
  _34 = 31 - _31;
  _35 = 2 << _34;
  iftmp.1_36 = (unsigned int) _35;
  _48 = iftmp.1_36;
  iftmp.1_37 = _48;
In the testcase x is 0, so __builtin_stdc_bit_ceil returns 1, but when we take
the > 1 path, it is 2 << (31 - 24) instead.
The above feels like what ifcvt would do, if that _47 in there stands for one
of the phi arguments and _48 for the other.  Except __builtin_clz invokes UB
when run on 0 (which is one of the reasons why it was guarded) and there is no
conditional merging at the end.
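
For reference, the guarded computation in the dump corresponds roughly to the following hand-written version. This is a sketch only: it assumes GCC or Clang (`__builtin_clz`) and a 32-bit `unsigned int`, and the name `bit_ceil_u8` is made up.

```cpp
// Smallest power of two that is >= x, with the result 1 for x == 0.
inline unsigned bit_ceil_u8(unsigned char x) {
  // The guard matters: for x <= 1 the answer is 1 and the clz path must
  // not run at all. In particular x == 1 would make x - 1 zero, and
  // __builtin_clz(0) is undefined.
  if (x <= 1)
    return 1;
  return 2u << (31 - __builtin_clz(static_cast<unsigned>(x - 1)));
}
```

Executing the shift path unconditionally, as in the transformed code, gives a wrong value for x == 0, because there is no longer a merge that selects the constant 1.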

[Bug c++/114135] New: Diagnostic missing useful information for ranges code

2024-02-27 Thread barry.revzin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114135

Bug ID: 114135
   Summary: Diagnostic missing useful information for ranges code
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: barry.revzin at gmail dot com
  Target Milestone: ---

This is an example using Ranges:

#include <algorithm>
#include <ranges>
using namespace std;

int main() {
auto rng = views::iota(0, 3);
const auto [a, b] = * ranges::min_element(views::cartesian_product(rng,
rng));
return 0;
}

This is an ill-formed program, the error given by gcc trunk is:

<source>:7:25: error: no match for 'operator*' (operand type is
'std::ranges::borrowed_iterator_t<std::ranges::cartesian_product_view<std::ranges::iota_view<int, int>, std::ranges::iota_view<int, int> > >')
7 | const auto [a, b] = *
ranges::min_element(views::cartesian_product(rng, rng));
  |
^

This is all correct. However, it would be more helpful in this case for the
reader to also note that the type
std::ranges::borrowed_iterator_t is actually the type
std::ranges::dangling. Seeing "dangling" in the error message makes it a lot
easier to understand what the issue here actually is.

[Bug tree-optimization/114041] wrong code with _BitInt() and -O -fgraphite-identity

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114041

--- Comment #6 from Jakub Jelinek  ---
unsigned a[24], b[24];

__attribute__((noipa)) unsigned
foo (unsigned char x)
{
  for (int i = 0; i < 24; ++i)
a[i] = i;
  unsigned e = __builtin_stdc_bit_ceil (x);
  for (int i = 0; i < 24; ++i)
b[i] = i;
  return e;
}

int
main ()
{
  if (foo (0) != 1)
__builtin_abort ();
}

works right, but s/unsigned char/unsigned _BitInt(8)/ does not, so it must be
something in graphite that handles INTEGER_TYPE and not all integral types.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #36 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #35)
> If I hand edit the gcc trunk + PR114116 patch assembly, add to bar
> + .cfi_undefined 3
> + .cfi_undefined 12
> + .cfi_undefined 13
> + .cfi_undefined 14
> + .cfi_undefined 15
> then bt in gdb shows
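> #2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> d=<error reading variable: value has been optimized out>, 
> e=<error reading variable: value has been optimized out>, f=<error reading variable: value has been optimized out>, g=48, h=49) at /tmp/1.c:38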


I can confirm that. What bothers me, is the wording "d=" and not just "d=".


(gdb) run
Starting program: bar-artificial-mod 

Program received signal SIGABRT, Aborted.

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77dcd859 in __GI_abort () at abort.c:79
#2  0x004011b1 in bar () at bar-artificial.c:30
#3  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
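d=<error reading variable: value has been optimized out>,
e=<error reading variable: value has been optimized out>,
f=<error reading variable: value has been optimized out>,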
g=48, h=49) at bar-artificial.c:38
#4  0x004012aa in qux () at bar-artificial.c:55
#5  0x004012e4 in main () at bar-artificial.c:62

(gdb) p a
No symbol "a" in current context.
(gdb) p b
No symbol "b" in current context.


> and everything in qux live across the call is  as well,
> (gdb) p $r12
> $10 = 
> etc. while without that
> (gdb) p a
> $1 = 
> (gdb) p b
> $2 = 
> (gdb) p c
> $3 = 
> (gdb) p d
> $4 = -559038737
> (gdb) p e
> $5 = -559038737
> (gdb) p f
> $6 = -559038737
> (gdb) p g
> $7 = -559038737
> (gdb) p h
> $8 = -559038737
> (gdb) p $r12
> $9 = 3735928559


Where did you set the breakpoint? When I set it somewhere in qux (after
a,b,c,... were initialized), I get conclusive results:


(gdb) break bar-artificial.c:52
Breakpoint 1 at 0x40124a: file bar-artificial.c, line 52.
(gdb) run
Breakpoint 1, qux () at bar-artificial.c:52
52corge (__builtin_alloca (foo (52)));
(gdb) p a
$1 = 42
(gdb) p b
$2 = 43
(gdb) p c
$3 = 44
(gdb) p d
$4 = 45
(gdb) p e 
$5 = 46
(gdb) p f
$6 = 47
(gdb) p g
$7 = 48
(gdb) p h
$8 = 49
(gdb) p $r12
$9 = 46

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #37 from Jakub Jelinek  ---
Nowhere, just run and when it stops due to abort, just up several times until
reaching the appropriate frame.

[Bug middle-end/114136] New: wrong code for c23 fully anonymous arg lists on arm

2024-02-27 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Bug ID: 114136
   Summary: wrong code for c23 fully anonymous arg lists on arm
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---
Target: arm

On arm, a fully anonymous c23-style function is called incorrectly.  All
arguments are passed on the stack while the receiving function expects r0-r3 to
be used for the initial arguments.

For example,

void f (...);

void g()
{
f (1, 2, 3, 4);
}

With gcc this compiles to:

g:
push    {lr}
movs    r0, #1
movs    r1, #2
sub sp, sp, #20
movs    r2, #3
movs    r3, #4
stm sp, {r0, r1, r2, r3}  // Arguments pushed to stack (wrong)
bl  f
add sp, sp, #20
ldr pc, [sp], #4

When the correct code (eg, as produced by clang) is something like

g:
mov r0, #1
mov r1, #2
mov r2, #3
mov r3, #4
b   f

compile with, eg 

arm-none-eabi-gcc -O2 -std=c23

[Bug middle-end/114136] wrong code for c23 fully anonymous arg lists on arm

2024-02-27 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-02-27

[Bug middle-end/114136] wrong code for c23 fully anonymous arg lists on arm

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |testsuite-fail

--- Comment #1 from Andrew Pinski  ---
The following testcases fail because of this:

FAIL: gcc.dg/c23-stdarg-4.c execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O0  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O1  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O3 -g  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -Os  execution test

[Bug modula2/113768] gm2/extensions/run/pass/vararg2.mod FAILs

2024-02-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113768

Gaius Mulley  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-02-27
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Gaius Mulley  ---
Thanks, this is a duplicate of Bug 114133 (or vice versa).

[Bug target/113871] psrlq is not used for PERM

2024-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113871

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:15d1dae0d4d1be88d28ad7578a60fd3e36de36d8

commit r14-9198-g15d1dae0d4d1be88d28ad7578a60fd3e36de36d8
Author: Uros Bizjak 
Date:   Tue Feb 27 18:41:24 2024 +0100

i386: psrlq is not used for PERM [PR113871]

Also handle V2BF mode.

PR target/113871

gcc/ChangeLog:

* config/i386/mmx.md (V248FI): Add V2BF mode.
(V24FI_32): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr113871-5a.c: New test.
* gcc.target/i386/pr113871-5b.c: New test.
