[Bug objc/108743] -fconstant-cfstrings not supported

2023-02-10 Thread ossman at cendio dot se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

--- Comment #4 from Pierre Ossman  ---
I am indeed trying to compile for macOS. Specifically Qt5, which is designed
with just Xcode in mind.

[Bug objc/108743] -fconstant-cfstrings not supported

2023-02-10 Thread ossman at cendio dot se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

--- Comment #5 from Pierre Ossman  ---
Could you consider adding -fconstant-cfstrings as an alias? It would make life
easier for making build systems compiler agnostic.

[Bug target/100758] __builtin_cpu_supports does not (always) detect "sse2"

2023-02-10 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100758

--- Comment #22 from Martin Liška  ---
Thank you Jakub, please revert my documentation patch if you are convinced
enough the change works only on old VIA CPUs.

[Bug c++/101099] [10/11/12/13 Regression] ICE in type_unification_real, at cp/pt.c:22173

2023-02-10 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101099

Martin Liška  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #6 from Martin Liška  ---
Well, it's fixed since r13-3639-ga4cd2389276a30c3 which is a revision that
handles default options. Is it really fixed?

[Bug libstdc++/77760] get_time needs to set tm_wday and tm_yday

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #7 from Jakub Jelinek  ---
(In reply to Alexandre Oliva from comment #5)
> As for tm bits, my suggestion was to overwrite tm fields internally, not to
> expose that externally.  They'd be used as scratch bits.  As in, member
> functions in the public interface would not use incoming tm bits as
> __time_get_state, but rather a zeroed-out __time_get_state structure, as
> today, but when calling the internal implementation primitive do_get, they'd
> *blindly* *overwrite* some of the tm bits with those from __time_get_state,
> and when do_get returns, they'd pull them back into __time_get_state and out
> of tm.  They'd be used as scratch bits, which AFAICT is permissible. 
> do_get, being protected and thus more of an internal implementation bit,
> could make use of those scratch bits.  do_get overriders could tweak them,
> for better or worse, but since this use would be nonstandard, we could
> probably get away with assuming any such uses to be libstdc++-specific.  It
> would probably not be wise for users to rely on this internal extension,
> though, since one can hope the standard will eventually make room for an
> implementation of time_get that is both standard-compliant and compatible
> with reasonable strptime expectations.

If all users initialized struct tm to zeros before calling the APIs, then
I can understand that it would work (though libstdc++ would still need to know
which of the calls are nested and which is the outermost, so that it could
convert the state back to canonical form before leaving the outermost method).
But if some fields can be uninitialized, I really don't understand how you
could use them as scratch bits; they could have random values upon entering the
outermost method.

I don't know if C++ says anything about the prior content of struct tm (but
given the recursive processing of some format specifiers it is hard to imagine
they'd e.g. clear the whole struct). For strptime, the POSIX manpage says:
"It is unspecified whether multiple calls to strptime() using the same tm
structure will update the current contents of the structure or overwrite all
contents of the structure. Conforming applications should make a single call to
strptime() with a format and all data needed to completely specify the date and
time being converted."
and the Linux manpage:
"In principle, this function does not initialize tm but stores only the values
specified. This means that tm should be initialized before the call. Details
differ a bit between different UNIX systems. The glibc implementation does not
touch those fields which are not explicitly specified, except that it
recomputes the tm_wday and tm_yday field if any of the year, month, or day
elements changed."
Note that even the POSIX manpage shows an example where struct tm isn't
zeroed out before the call (though its format specifiers initialize
everything).
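
The following is a minimal caller-side sketch of the practice the Linux manpage recommends. It zero-initializes struct tm, lets the parser store only the fields the format names, and then derives tm_wday and tm_yday. It assumes a POSIX system where strptime() is exposed by defining _XOPEN_SOURCE. parse_date is a hypothetical helper, not a libstdc++ or glibc API, and this is not how get_time is implemented.

```c
#define _XOPEN_SOURCE 700 /* assumption: exposes strptime() on glibc */
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse "YYYY-MM-DD" into *out.
   Returns 0 on success, -1 on failure.  */
int parse_date(const char *s, struct tm *out)
{
  struct tm tm;

  /* strptime() may store only the fields named by the format, so start
     from a fully zeroed struct instead of whatever was on the stack.  */
  memset(&tm, 0, sizeof tm);
  if (strptime(s, "%Y-%m-%d", &tm) == NULL)
    return -1;

  /* Let mktime() decide about DST; it normalizes the struct and fills
     in tm_wday and tm_yday from the year, month and day.  */
  tm.tm_isdst = -1;
  if (mktime(&tm) == (time_t) -1)
    return -1;

  *out = tm;
  return 0;
}
```

With this helper, "2023-02-10" yields tm_wday == 5 (Friday) and tm_yday == 40, whatever the caller's struct held beforehand.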

[Bug tree-optimization/108696] querying relations is slow

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108696

--- Comment #4 from Richard Biener  ---
(In reply to Andrew Macleod from comment #2)
> Created attachment 54437 [details]
> possible patch
> 
> This patch should successfully short circuit unnecessary checks. untested in
> compiler.i
> 
> Where did you get a 17% time in DOM?  when I run compiler.i I get
> dominator optimization :  38.28 (  2%) 
> where most of the time is in
> machine dep reorg  :1447.64 ( 60%) 
> 
> I'll check this patch for correctness and to see if it generally makes any
> time improvements that are measurable elsewhere.

It helps the callgrind profile but doesn't show any effect on the -ftime-report
or a perf profile.  For the testcase the bitmap operations in ranger are
definitely visible in a perf profile (with call traces), and with -O1 it's
mostly DOM and jump-threading that perform any ranger ops when diagnostics are
disabled.

That said, not allocating the self-relation bitmaps at query time is
definitely good (not 100% sure if the patch achieves that).

[Bug middle-end/108685] [13 Regression] ICE in verify_loop_structure, at cfgloop.cc:1748 since r13-2388-ga651e6d59188da

2023-02-10 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108685

Martin Liška  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
Summary|[13 Regression] ICE in  |[13 Regression] ICE in
   |verify_loop_structure, at   |verify_loop_structure, at
   |cfgloop.cc:1748 |cfgloop.cc:1748 since
   ||r13-2388-ga651e6d59188da

--- Comment #2 from Martin Liška  ---
(In reply to Andrew Pinski from comment #1)
> Most likely r13-2388-ga651e6d59188da

Yes.

[Bug objc/108743] -fconstant-cfstrings not supported

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
(In reply to Pierre Ossman from comment #5)
> Could you consider adding -fconstant-cfstrings as an alias? It would make
> life easier for making build systems compiler agnostic.

As it is a machine-specific option, -mconstant-cfstrings is the right name.
If it were a generic option, -f* would make sense, but it is not.
If the option is macOS-specific in clang too, then they simply misnamed it.

[Bug tree-optimization/108737] [13 Regression] Apparent miscompile of infinite loop on gcc trunk in cddce2 pass since r13-3875

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108737

--- Comment #4 from Richard Biener  ---
I think this is another case where control dependences do not work as intended.

Marking useful stmt: foo ();

and we have

   [local count: 3508266]:
  x_4 = foo ();
  if (x_4 != 0)
goto ; [33.00%]
  else
goto ; [67.00%]

   [local count: 1157728]:

   [local count: 116930483]:

   [local count: 116930483]:
  foo ();
  goto ; [100.00%]


and BB5 is control dependent on BB3 (on the edge 3->5), so we mark the
block necessary but since there's nothing in it we do not make its
control dependences necessary.

I think what triggers this latent bug is the "double" forwarder with
the call in the latch block rather than in the header when one tries
this on

extern int foo();

void blah()
{
  int x = foo();
  if (x)
    while (1) foo ();
}

it's probably latent with a GIMPLE testcase.

[Bug c/108734] powerpc: False Detection of __atomic_*_8 Builtins

2023-02-10 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108734

--- Comment #7 from Jonathan Wakely  ---
(In reply to David Edelsohn from comment #5)
> __has_builtin() does not mean that the builtin is inlined.  It only means
> that GCC recognizes the builtin.  That is how __has_builtin() is documented.
> In 32 bit mode, GCC emits an external reference for the builtin: 8 byte
> atomic requires libatomic library, which is not linked by default (and
> shouldn't be).

And this is consistent with e.g. __has_builtin(__builtin_strlen). The fact that
GCC recognizes the token "__builtin_strlen" doesn't mean that you don't need an
extern definition of strlen for cases where the built-in isn't inlined. It's
just that for the atomic built-ins the name of the built-in is the same as the
name of the extern function that might be used.
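
To illustrate the difference between "recognized" and "inlined", here is a minimal sketch using GCC's __atomic builtins. __has_builtin(__atomic_load_n) would only say that the name is known. __atomic_always_lock_free() asks the stronger compile-time question: will 8-byte atomics always be emitted as lock-free instructions? The function names atomic8_is_inline and load_u64 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when GCC guarantees that 8-byte atomics on a normally aligned
   object are emitted as inline lock-free instructions rather than as a
   call into libatomic.  Evaluated at compile time.  */
bool atomic8_is_inline(void)
{
  return __atomic_always_lock_free(sizeof(uint64_t), 0);
}

/* Where atomic8_is_inline() is false (for example the 32-bit mode David
   Edelsohn describes above), this may compile to an external call to
   __atomic_load_8, which then needs -latomic at link time.  */
uint64_t load_u64(const uint64_t *p)
{
  return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}
```

A configure-style check could test atomic8_is_inline() (or the same builtin in a static assertion) instead of __has_builtin.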

[Bug tree-optimization/108687] [13 Regression] Non-termination since r13-5630-g881bf8de9b0

2023-02-10 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108687

--- Comment #10 from Stefan Schulze Frielinghaus  ---
Can confirm the attached patch solves this issue.

[Bug target/100758] __builtin_cpu_supports does not (always) detect "sse2"

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100758

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|WONTFIX |FIXED

--- Comment #23 from Jakub Jelinek  ---
I believe in the #c20 case you'd get
FEATURE_CMOV
FEATURE_MMX
FEATURE_SSE
FEATURE_SSE2
FEATURE_CMPXCHG8B
FEATURE_FXSAVE
FEATURE_POPCNT
FEATURE_SSE3
FEATURE_SSSE3
FEATURE_SSE4_1
FEATURE_CMPXCHG16B
FEATURE_LAHF_LM
FEATURE_LM
FEATURE_X86_64_BASELINE
set and in the #c21 case
FEATURE_CMOV
FEATURE_MMX
FEATURE_SSE
FEATURE_SSE2
FEATURE_CMPXCHG8B
FEATURE_FXSAVE
FEATURE_SSE3
If that matches what those CPUs provide (compared, say, to /proc/cpuinfo), then I
think we are good.  The change has already been committed to trunk, so you can
try it yourself (or apply the committed patch to, say, GCC 12).
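
For that comparison, a small program along these lines could print what __builtin_cpu_supports() reports for a few of the features listed above. This is a sketch that assumes an x86 target; report_cpu_features and REPORT_FEATURE are hypothetical names. /proc/cpuinfo spells some flags differently, for example "pni" for SSE3 and "sse4_1" for SSE4.1.

```c
#include <stdio.h>

/* __builtin_cpu_supports() only accepts a string literal, so a macro
   expands one call per feature name.  */
#define REPORT_FEATURE(name, count)                       \
  do {                                                    \
    int has_ = __builtin_cpu_supports(name) != 0;         \
    printf("%-7s %s\n", name, has_ ? "yes" : "no");       \
    (count) += has_;                                      \
  } while (0)

/* Prints a subset of the features and returns how many were reported
   as supported (always 0 on non-x86 targets).  */
int report_cpu_features(void)
{
  int count = 0;
#if defined(__i386__) || defined(__x86_64__)
  __builtin_cpu_init(); /* only strictly needed before constructors run */
  REPORT_FEATURE("cmov", count);
  REPORT_FEATURE("mmx", count);
  REPORT_FEATURE("sse", count);
  REPORT_FEATURE("sse2", count);
  REPORT_FEATURE("sse3", count);
  REPORT_FEATURE("ssse3", count);
  REPORT_FEATURE("sse4.1", count);
  REPORT_FEATURE("popcnt", count);
#endif
  return count;
}
```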

[Bug tree-optimization/108748] New: Enhancement: track ranges of poly_int indeterminates

2023-02-10 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108748

Bug ID: 108748
   Summary: Enhancement: track ranges of poly_int indeterminates
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

I've no idea how this would work in practice, just recording it as an
idea/TODO, but: it would be nice if code guarded by a range check on a poly_int
could be optimised for the implied range of the indeterminates.  Maybe ranger
could be taught to do this.

One simple example is:

#include <arm_sve.h>

uint64_t foo(uint64_t x)
{
  if (svcntb() == 2)
    x += svcntb() * 100;
  return x;
}

which generates:

        cntb    x1
        cmp     x1, 2
        cntd    x1, all, mul #3
        lsl     x1, x1, 8
        add     x1, x0, x1
        incb    x1, all, mul #4
        csel    x0, x1, x0, eq
        ret

whereas the equivalent:

#include <arm_sve.h>

uint64_t foo(uint64_t x)
{
  if (svcntb() == 2)
    x += 200;
  return x;
}

generates:

        cntb    x1
        cmp     x1, 2
        add     x1, x0, 200
        csel    x0, x1, x0, eq
        ret

A more realistic use case would be to guard a block of code with a particular
VL in the hope that the code would be optimised for that VL.

[Bug tree-optimization/108724] Poor codegen when summing two arrays without AVX or SSE

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=101801

--- Comment #4 from Richard Biener  ---
So the vectorizer thinks that

foo:
.LFB0:
        .cfi_startproc
        movabsq $9223372034707292159, %rax
        movq    %rdi, %rcx
        movq    (%rdx), %r10
        movq    (%rsi), %rdi
        movq    %rax, %r8
        movq    %rax, %r9
        movq    %rax, %r11
        andq    %r10, %r8
        andq    %rdi, %r9
        addq    %r8, %r9
        movq    %rdi, %r8
        movabsq $-9223372034707292160, %rdi
        xorq    %r10, %r8
        movq    8(%rdx), %r10
        andq    %rdi, %r8
        xorq    %r9, %r8
        movq    %rax, %r9
        movq    %r8, (%rcx)
        movq    8(%rsi), %r8
        andq    %r10, %r9
        andq    %r8, %r11
        xorq    %r10, %r8
        movq    16(%rdx), %r10
        addq    %r11, %r9
        andq    %rdi, %r8
        movq    %rax, %r11
        xorq    %r9, %r8
        movq    %rax, %r9
        andq    %r10, %r11
        movq    %r8, 8(%rcx)
        movq    16(%rsi), %r8
        andq    %r8, %r9
        xorq    %r10, %r8
        movq    24(%rdx), %r10
        addq    %r11, %r9
        andq    %rdi, %r8
        movq    %rax, %r11
        xorq    %r9, %r8
        movq    %rax, %r9
        andq    %r10, %r11
        movq    %r8, 16(%rcx)
        movq    24(%rsi), %r8
        andq    %r8, %r9
        xorq    %r10, %r8
        movq    32(%rdx), %r10
        addq    %r11, %r9
        andq    %rdi, %r8
        movq    %rax, %r11
        xorq    %r9, %r8
        movq    %rax, %r9
        andq    %r10, %r11
        movq    %r8, 24(%rcx)
        movq    32(%rsi), %r8
        andq    %r8, %r9
        xorq    %r10, %r8
        movq    40(%rdx), %r10
        addq    %r11, %r9
        andq    %rdi, %r8
        movq    %rax, %r11
        xorq    %r9, %r8
        movq    %rax, %r9
        andq    %r10, %r11
        movq    %r8, 32(%rcx)
        movq    40(%rsi), %r8
        andq    %r8, %r9
        addq    %r11, %r9
        xorq    %r10, %r8
        movq    48(%rsi), %r10
        movq    %rax, %r11
        andq    %rdi, %r8
        xorq    %r9, %r8
        movq    %rax, %r9
        andq    %r10, %r11
        movq    %r8, 40(%rcx)
        movq    48(%rdx), %r8
        movq    56(%rdx), %rdx
        andq    %r8, %r9
        xorq    %r10, %r8
        addq    %r11, %r9
        andq    %rdi, %r8
        xorq    %r9, %r8
        movq    %r8, 48(%rcx)
        movq    56(%rsi), %r8
        movq    %rax, %rsi
        andq    %rdx, %rsi
        andq    %r8, %rax
        xorq    %r8, %rdx
        addq    %rsi, %rax
        andq    %rdi, %rdx
        xorq    %rdx, %rax
        movq    %rax, 56(%rcx)
        ret

will be faster than when not vectorizing.  Not vectorizing produces

foo:
.LFB0:
        .cfi_startproc
        movq    %rsi, %rcx
        movl    (%rsi), %esi
        addl    (%rdx), %esi
        movl    %esi, (%rdi)
        movl    4(%rdx), %esi
        addl    4(%rcx), %esi
        movl    %esi, 4(%rdi)
        movl    8(%rdx), %esi
        addl    8(%rcx), %esi
        movl    %esi, 8(%rdi)
        movl    12(%rdx), %esi
        addl    12(%rcx), %esi
        movl    %esi, 12(%rdi)
        movl    16(%rdx), %esi
        addl    16(%rcx), %esi
        movl    %esi, 16(%rdi)
        movl    20(%rdx), %esi
        addl    20(%rcx), %esi
        movl    %esi, 20(%rdi)
        movl    24(%rdx), %esi
        addl    24(%rcx), %esi
        movl    %esi, 24(%rdi)
        movl    28(%rdx), %esi
        addl    28(%rcx), %esi
        movl    %esi, 28(%rdi)
        movl    32(%rdx), %esi
        addl    32(%rcx), %esi
        movl    %esi, 32(%rdi)
        movl    36(%rdx), %esi
        addl    36(%rcx), %esi
        movl    %esi, 36(%rdi)
        movl    40(%rdx), %esi
        addl    40(%rcx), %esi
        movl    %esi, 40(%rdi)
        movl    44(%rdx), %esi
        addl    44(%rcx), %esi
        movl    %esi, 44(%rdi)
        movl    48(%rdx), %esi
        addl    48(%rcx), %esi
        movl    %esi, 48(%rdi)
        movl    52(%rdx), %esi
        addl    52(%rcx), %esi
        movl    %esi, 52(%rdi)
        movl    56(%rdx), %esi
        movl    60(%rdx), %edx
        addl    56(%rcx), %esi
        addl    60(%rcx), %edx
        movl    %esi, 56(%rdi)
        movl    %edx, 60(%rdi)
        ret

The vectorizer produces un-lowered vector adds, which is good in case follow-up
optimizations are possible (the ops are not obfuscated), but also bad
because unrolling then estimates the size wrongly.  The costs are:

*_3 1 times scalar_load costs 12 in prologue
*_5 1 times scalar_load costs 12 in prologue 
_4 + _6 1 times scalar_stmt costs 4 in prologue
_8 1 times scalar_store costs 12 in prologue
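
The two movabsq constants in the vectorized listing are 0x7FFFFFFF7FFFFFFF and 0x8000000080000000. The andq/addq/xorq pattern around them is the usual SWAR ("SIMD within a register") trick. It adds two 32-bit lanes held in one 64-bit register without letting a carry cross from the low lane into the high one. The following is a minimal C sketch of one such step, assuming two int lanes packed per uint64_t. swar_add_2x32 is a hypothetical name, not a GCC internal.

```c
#include <stdint.h>

#define LOW_BITS  0x7FFFFFFF7FFFFFFFull /* $9223372034707292159  */
#define HIGH_BITS 0x8000000080000000ull /* $-9223372034707292160 */

/* Lane-wise 32-bit addition (modulo 2^32 per lane) of two packed pairs.  */
uint64_t swar_add_2x32(uint64_t a, uint64_t b)
{
  /* andq, andq, addq: add the low 31 bits of each lane; the sum of two
     31-bit values fits in 32 bits, so no carry leaves its lane.  */
  uint64_t low_sum = (a & LOW_BITS) + (b & LOW_BITS);
  /* xorq, andq: the top bit of each lane is a31 ^ b31 ...  */
  uint64_t top_bits = (a ^ b) & HIGH_BITS;
  /* xorq: ... combined with the carry that landed in bit 31.  */
  return low_sum ^ top_bits;
}
```

This costs five ALU operations per pair of ints, against two loads, one add and one store per int in the scalar listing. That is why the per-element expansion only pays off with more lanes per word.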

[Bug tree-optimization/108724] [11/12/13 Regression] Poor codegen when summing two arrays without AVX or SSE

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
Summary|Poor codegen when summing   |[11/12/13 Regression] Poor
   |two arrays without AVX or   |codegen when summing two
   |SSE |arrays without AVX or SSE
   Target Milestone|--- |11.4

[Bug other/108749] New: [OpenMP][C/C++/Fortran] inscan reduction modifier rejected for combined/composite constructs of simd/for/do

2023-02-10 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108749

Bug ID: 108749
   Summary: [OpenMP][C/C++/Fortran] inscan reduction modifier
rejected for combined/composite constructs of
simd/for/do
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: openmp, rejects-valid
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

This applies to C, C++ and Fortran likewise.

test.c:6:37: error: ‘inscan’ ‘reduction’ clause on construct other than ‘for’,
‘simd’, ‘for simd’, ‘parallel for’, ‘parallel for simd’

6 |   #pragma omp target simd reduction (inscan, *:r)
  | ^


OpenMP 5.0 had the following:

"A reduction clause with the *inscan* reduction-modifier may only appear on a
worksharing-loop construct, a worksharing-loop SIMD construct, a simd
construct, a parallel worksharing-loop construct or a parallel worksharing-loop
SIMD construct."

But, a bit confusingly, it also had:

"If a construct to which the *inscan* reduction-modifier is
applied is combined with the *target* construct, the effect is as if the same
list item also appears in a map clause with a map-type of tofrom."

The latter implying that a combined construct is also permitted - while the
former rules it out.

 * * *

OpenMP 5.1 seemingly fixed this while 5.2 removed constructs with 'distribute'.
In any case, OpenMP 5.2 reads as follows:

"A reduction clause with the *inscan* reduction-modifier may only appear on a
worksharing-loop construct, a simd construct or a combined or composite
construct for which any of the aforementioned constructs is a constituent
construct and distribute is not a constituent construct." — ["5.5.8 reduction
Clause" under "Restrictions to the reduction clause are as follows:" (7th
bullet), [136:1-4]]

 * * *

I think this currently implies that the following ones should be supported
besides the existing '(parallel) {do|for} (simd)' and 'simd':

OMP_MASKED_TASKLOOP_SIMD
OMP_MASTER_TASKLOOP_SIMD
OMP_PARALLEL_MASKED_TASKLOOP_SIMD
OMP_PARALLEL_MASTER_TASKLOOP_SIMD
OMP_TARGET_PARALLEL_DO
OMP_TARGET_PARALLEL_DO_SIMD
OMP_TARGET_SIMD
OMP_TASKLOOP_SIMD
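
For comparison, here is a minimal sketch of an inscan reduction on a plain simd construct, one of the forms GCC already accepts. The error above comes from placing the same clause on 'target simd' instead. prefix_sum is a hypothetical function. Without -fopenmp or -fopenmp-simd the pragmas are ignored, and the loop still computes the same inclusive prefix sum serially.

```c
/* Inclusive prefix sum: b[i] = a[0] + ... + a[i].  */
void prefix_sum(const int *a, int *b, int n)
{
  int r = 0;
  #pragma omp simd reduction(inscan, +:r)
  for (int i = 0; i < n; i++)
    {
      r += a[i];                    /* input phase */
      #pragma omp scan inclusive(r)
      b[i] = r;                     /* scan phase: sees the prefix value */
    }
}
```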

[Bug tree-optimization/106249] [13 Regression] ICE in check_loop_closed_ssa_def, at tree-ssa-loop-manip.cc:645 since r13-1450-gd2a898666609452e

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106249

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #12 from Richard Biener  ---
This new testcase no longer ICEs for me on trunk.

[Bug other/108749] [OpenMP][C/C++/Fortran] inscan reduction modifier rejected for combined/composite constructs of simd/for/do

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108749

--- Comment #1 from Jakub Jelinek  ---
We implement the 5.0 wording which was quite clear that only the selected
combined/composite constructs are allowed for it.  The clause handling wording
was added without considering the former (it wasn't initially there I believe).

The next step would be to implement the 5.2 wording, i.e. to support it also
on
   #pragma omp masked taskloop simd
   #pragma omp master taskloop simd
   #pragma omp parallel masked taskloop simd
   #pragma omp parallel master taskloop simd
   #pragma omp target parallel for
   #pragma omp target parallel for simd
   #pragma omp target simd
   #pragma omp taskloop simd
I guess we could even do it for GCC 13, it is a simple change.

[Bug c/108718] [10/11/12/13 Regression] csmith: possible bad code with -O2

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108718

--- Comment #6 from Richard Biener  ---
Huh, the change for sure triggered some latent issue, either in the testcase or
in GCC.  More analysis is needed (the testcase is large and obfuscated...).

[Bug tree-optimization/108750] New: Loop unswitching fails for poly_int conditions

2023-02-10 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108750

Bug ID: 108750
   Summary: Loop unswitching fails for poly_int conditions
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Loop unswitching fails to handle:

#include <arm_sve.h>

void foo(int *x, int *y, int z) {
  for (int i = 0; i < 100; ++i)
    if (svcntb() >= 4)
      x[i] = y[i] + 1;
    else
      y[i] += 1;
}

(although it does of course handle the result of replacing svcntb() with a
variable).  From a quick check, simply removing:

  /* At least the LHS needs to be symbolic.  */
  if (TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
return;

seems to fix it, but I've no idea what the fallout of that would be.
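
For reference, the requested transformation hoists the loop-invariant test out of the loop and keeps one copy of the loop per outcome. Below is a hand-unswitched sketch in which the hypothetical parameter vl stands in for svcntb(), so that it compiles without SVE. It mirrors the "replace svcntb() with a variable" case that unswitching already handles. It is not what GCC emits.

```c
#include <stdint.h>

/* Hand-unswitched equivalent of foo(): the invariant condition is
   evaluated once, and each branch gets its own loop.  */
void foo_unswitched(int *x, int *y, uint64_t vl)
{
  if (vl >= 4)
    {
      for (int i = 0; i < 100; ++i)
        x[i] = y[i] + 1;
    }
  else
    {
      for (int i = 0; i < 100; ++i)
        y[i] += 1;
    }
}
```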

[Bug tree-optimization/108751] New: Removing dead code results in worse optimization at -Os

2023-02-10 Thread theodort at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108751

Bug ID: 108751
   Summary: Removing dead code results in worse optimization at
-Os
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

I found this case where slight changes in the program that, in theory, should
not affect the output (or affect it trivially) cause the compiler to generate
worse code: 

static int a = 0;
static int b = 1;
int main() {
  char c = 0;
  for (;;) {
    if (c)
      break;
    for (; a; a++) { // a is 0, this loop is dead
      if (b) // this is always true
        continue;
      else
        return 2; // this program will never return 2
    }
    c = 10;
  }
  return 3;
}

compiled with gcc-trunk -Os: 

main:
.L2:
        movl    a(%rip), %eax
        testl   %eax, %eax
        je      .L6
        incl    %eax
        movl    %eax, a(%rip)
        jmp     .L2
.L6:
        movl    $3, %eax
        ret

Clearly, the compiler has figured out that "return 2;" will never be executed.
But if I remove it from the source:

static int a = 0;
static int b = 1;
int main() {
  char c = 0;
  for (;;) {
    if (c)
      break;
    for (; a; a++) {
      if (b)
        continue;
      //else
      // return 2;
    }
    c = 10;
  }
  return 3;
}

and compile with gcc-trunk -Os again:

main:
        movl    a(%rip), %eax
        xorl    %edx, %edx
.L2:
        testl   %eax, %eax
        jne     .L4
        testb   %dl, %dl
        je      .L7
        xorl    %eax, %eax
        movl    %eax, a(%rip)
        jmp     .L7
.L4:
        incl    %eax
        movb    $1, %dl
        jmp     .L2
.L7:
        movl    $3, %eax
        ret

the generated code is worse. 

The same thing happens if the return value is changed:

static int a = 0;
static int b = 1;
int main() {
  char c = 0;
  for (;;) {
    if (c)
      break;
    for (; a; a++) {
      if (b)
        continue;
      else
        return 2;
    }
    c = 10;
  }
  return 1; // changed from return 3
}

gcc-trunk -Os: 

main:
        movl    a(%rip), %eax
        xorl    %edx, %edx
.L2:
        testl   %eax, %eax
        jne     .L4
        testb   %dl, %dl
        je      .L7
        xorl    %eax, %eax
        movl    %eax, a(%rip)
        jmp     .L7
.L4:
        incl    %eax
        movb    $1, %dl
        jmp     .L2
.L7:
        movl    $1, %eax
        ret

and if we constant propagate b:

static int a = 0;
int main() {
  char c = 0;
  for (;;) {
    if (c)
      break;
    for (; a; a++) {
      if (1) // this was if (b) before
        continue;
      else
        return 2;
    }
    c = 10;
  }
  return 1;
}

gcc-trunk -Os:

main:
        movl    a(%rip), %eax
        xorl    %edx, %edx
.L2:
        testl   %eax, %eax
        jne     .L12
        testb   %dl, %dl
        je      .L7
        xorl    %eax, %eax
        movl    %eax, a(%rip)
        jmp     .L7
.L12:
        incl    %eax
        movb    $1, %dl
        jmp     .L2
.L7:
        movl    $1, %eax
        ret

[Bug other/108749] [OpenMP][C/C++/Fortran] inscan reduction modifier rejected for combined/composite constructs of simd/for/do

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108749

--- Comment #2 from Jakub Jelinek  ---
Actually, I don't see how inscan could be implemented on taskloop, so I'd say both
5.1 and 5.2 are wrong: the wording should require that neither distribute nor
taskloop is a constituent construct.
   #pragma omp target parallel for
   #pragma omp target parallel for simd
   #pragma omp target simd
should work fine, those just map(tofrom:) the reduction variable.

[Bug tree-optimization/108751] Removing dead code results in worse generated target code at -Os

2023-02-10 Thread theodort at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108751

--- Comment #1 from Theodoros Theodoridis  ---
I am not sure if this qualifies as a "bug"/missed optimization but I'd be
interested in understanding why these changes cause such a difference. Thanks!

[Bug other/108749] [OpenMP][C/C++/Fortran] inscan reduction modifier rejected for combined/composite constructs of simd/for/do

2023-02-10 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108749

--- Comment #3 from Tobias Burnus  ---
(In reply to Jakub Jelinek from comment #2)
> Actually, I don't see how inscan be implemented on taskloop

The proposed extension of the restriction is now tracked in the OpenMP
specification Issue 3489.

[Bug other/108749] [OpenMP][C/C++/Fortran] inscan reduction modifier rejected for combined/composite constructs of simd/for/do

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108749

--- Comment #4 from Jakub Jelinek  ---
Perhaps it is also implementable for taskloop, but with a lot of work.
The way e.g. for/do works with inscan is that the two parts of the loop are
split up, and one essentially gets two worksharing loops with the same number
of iterations, one doing one part, then some single (or in parallel)
middle-part and then another doing the other part.
With tasks, perhaps we could create separate tasks for the two halves, spawn a
taskloop that does one part, then a task that depends on all those tasks and
does the merging in the middle and finally another taskloop that does the other
part.
But I think this is something that hasn't been even considered when just
tweaking the wording.  After all, if we want to allow inscan on taskloop simd
and constructs combined with that, we'd first want to allow it on taskloop
itself.
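
The two-phase split described above for for/do can be sketched in plain C with OpenMP worksharing loops. The sketch assumes a fixed, hypothetical number of chunks (NCHUNKS) and shows only the general shape, not GCC's actual lowering. The first loop runs the input half per chunk. A short serial middle part turns the chunk totals into offsets. The second loop runs the scan half. two_phase_scan is a hypothetical name. Without -fopenmp the pragmas are ignored and the result is the same.

```c
#include <stddef.h>

#define NCHUNKS 4 /* hypothetical number of work items */

/* Inclusive prefix sum of a[0..n) into out[0..n).  */
void two_phase_scan(const int *a, int *out, size_t n)
{
  int chunk_sum[NCHUNKS];
  int offset[NCHUNKS];
  size_t len = (n + NCHUNKS - 1) / NCHUNKS;

  /* First worksharing loop: the "input" half, one chunk per iteration.  */
  #pragma omp parallel for
  for (int c = 0; c < NCHUNKS; c++)
    {
      size_t lo = (size_t) c * len;
      size_t hi = lo + len < n ? lo + len : n;
      int s = 0;
      for (size_t i = lo; i < hi; i++)
        s += a[i];
      chunk_sum[c] = s;
    }

  /* Middle part: exclusive scan of the chunk totals (serial here).  */
  int running = 0;
  for (int c = 0; c < NCHUNKS; c++)
    {
      offset[c] = running;
      running += chunk_sum[c];
    }

  /* Second worksharing loop: the "scan" half, same iteration space.  */
  #pragma omp parallel for
  for (int c = 0; c < NCHUNKS; c++)
    {
      size_t lo = (size_t) c * len;
      size_t hi = lo + len < n ? lo + len : n;
      int r = offset[c];
      for (size_t i = lo; i < hi; i++)
        {
          r += a[i];
          out[i] = r;
        }
    }
}
```

A taskloop version along the lines suggested above would replace each worksharing loop with a taskloop. The middle part would then become a task that depends on all tasks of the first taskloop.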

[Bug tree-optimization/108724] [11/12/13 Regression] Poor codegen when summing two arrays without AVX or SSE

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:dc87e1391c55c666c7ff39d4f0dea87666f25468

commit r13-5771-gdc87e1391c55c666c7ff39d4f0dea87666f25468
Author: Richard Biener 
Date:   Fri Feb 10 11:07:30 2023 +0100

tree-optimization/108724 - vectorized code getting piecewise expanded

This fixes an oversight made when removing the hard limits on using
generic vectors for the vectorizer to enable both SLP and BB
vectorization to use those.  The vectorizer relies on vector lowering
to expand plus, minus and negate to bit operations but vector
lowering has a hard limit on the minimum number of elements per
work item.  Vectorizer costs for the testcase at hand work out
to vectorize a loop with just two work items per vector and that
causes element wise expansion and spilling.

The fix for now is to re-instantiate the hard limit, matching what
vector lowering does.  For the future the way to go is to emit the
lowered sequence directly from the vectorizer instead.

PR tree-optimization/108724
* tree-vect-stmts.cc (vectorizable_operation): Avoid
using word_mode vectors when vector lowering will
decompose them to elementwise operations.

* gcc.target/i386/pr108724.c: New testcase.

[Bug tree-optimization/108724] [11/12 Regression] Poor codegen when summing two arrays without AVX or SSE

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Richard Biener  changed:

   What|Removed |Added

  Known to work||13.0
Summary|[11/12/13 Regression] Poor  |[11/12 Regression] Poor
   |codegen when summing two|codegen when summing two
   |arrays without AVX or SSE   |arrays without AVX or SSE

--- Comment #6 from Richard Biener  ---
Fixed on trunk so far.

[Bug tree-optimization/108752] New: word_mode vectorization is pessimized by hard limit on nunits

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108752

Bug ID: 108752
   Summary: word_mode vectorization is pessimized by hard limit on
nunits
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

r13-5771-gdc87e1391c55c6 re-introduced a hard nunits limit to the vectorizer
when using emulated vectors (aka word_mode vectorization).  That's because
this feature relies on vector lowering to implement plus, minus and negate
with bit operations, and that has such a limit in place for when dealing with
user written code that didn't have any cost modeling applied.

The fix is to emit supported operations from the vectorizer.

[Bug tree-optimization/108752] word_mode vectorization is pessimized by hard limit on nunits

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108752

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Keywords||missed-optimization
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-02-10
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Mine, hopefully for GCC 14.

[Bug tree-optimization/108752] word_mode vectorization is pessimized by hard limit on nunits

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108752

--- Comment #2 from Richard Biener  ---
Created attachment 54447
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54447&action=edit
prototype

Prototype patch.  Would benefit from a vect_finish_stmt_generation with
a gimple_seq overload and using gimple_build and some interleaved comments.

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

Jakub Jelinek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #22 from Jakub Jelinek  ---
In order to backport these changes, we'd need to backport PR102633 changes
first.
Marek, do you think it is ok?

[Bug c++/87656] Useful flags to enable with -Wall or -Wextra

2023-02-10 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87656

Thomas Schwinge  changed:

   What|Removed |Added

 CC||tschwinge at gcc dot gnu.org

--- Comment #19 from Thomas Schwinge  ---
(In reply to David Binderman from comment #6)
> I'd like to vote for -Wduplicated-cond being in either -Wextra or -Wall.
> 
> [...] it is proving useful in finding bugs [...]

Generally ACK, but note that '-Wduplicated-cond' was once in '-Wall' but
was then removed again; see PR67819.

[Bug libgcc/108279] Improved speed for float128 routines

2023-02-10 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #24 from Michael_S  ---
(In reply to Michael_S from comment #22)
> (In reply to Michael_S from comment #8)
> > (In reply to Thomas Koenig from comment #6)
> > > And there will have to be a decision about 32-bit targets.
> > >
> > 
> > IMHO, 32-bit targets should be left in their current state.
> > People that use them probably do not care deeply about performance.
> > Technically, I can implement 32-bit targets in the same sources, by means of
> > few ifdefs and macros, but resulting source code will look much uglier than
> > how it looks today. Still, not to the same level of horror that you have in
> > matmul_r16.c, but certainly uglier than how I like it to look.
> > And I am not sure at all that my implementation of 32-bit targets would be
> > significantly faster than current soft float.
> 
> I explored this path (implementing 32-bit and 64-bit targets from the same
> source with few ifdefs) a little more:
> Now I am even more sure that it is not a way to go. gcc compiler does not
> generate good 32-bit code for this style of sources. This especially applies
> to i386, other supported 32-bit targets (RV32, SPARC32) are affected less.
> 

I can't explain to myself why I am doing it, but I did continue exploring
32-bit targets. Well, not quite "targets": I don't have SPARC32 or RV32 to
play with. So I continued exploring i386.
As said above, using the same code for 32-bit and 64-bit does not produce
acceptable results. But pure 32-bit source did better than I expected.
So when on 2023-01-13 I wrote "And I am not sure at all that my implementation of
32-bit targets would be significantly faster than current soft float", I was
wrong. My implementation for 32-bit targets (i.e. i386) is significantly faster
than current soft float: up to 3 times faster on Zen3, and approximately 2 times
faster on various oldish Intel CPUs.
Today I put the 32-bit sources into my github repository.

I am still convinced that improving the performance of IEEE binary128 on 32-bit
targets is a waste of time, but since the time has already been spent, maybe the
results can be used.

And maybe they can be used to bring IEEE binary128 to the Arm Cortex-M, where
it can be moderately useful in some situations.

[Bug c/108753] New: '-Wduplicated-cond' doesn't diagnose duplicated subexpressions

2023-02-10 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108753

Bug ID: 108753
   Summary: '-Wduplicated-cond' doesn't diagnose duplicated
subexpressions
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
CC: mpolacek at gcc dot gnu.org
  Target Milestone: ---

Created attachment 54448
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54448&action=edit
pr.c

Shouldn't '-Wduplicated-cond' be able to diagnose the XFAILed duplicated
subexpressions?  (In the attached 'pr.c', 'f2' is reduced from real-world
code.)

This works:

if (a == 5) // { dg-note {previously used here} }
  return 30;
else if (a == 5) // { dg-warning {duplicated 'if' condition} }
  return 40;

..., but this and similar ones don't:

if (a == 5) // { dg-note {previously used here} TODO { xfail *-*-* } }
  return 30;
else if (a == 5 // { dg-warning {duplicated 'if' condition} TODO { xfail *-*-* } }
 || a == 6)
  return 40;
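A minimal C sketch of why the second `a == 5` is worth flagging: the `else if` branch is only reached when `a != 5`, so `a == 5 || a == 6` there behaves exactly like `a == 6`. The function names and the trailing `return 0` are assumptions for illustration, not part of the attached pr.c.

```c
#include <stdbool.h>

/* Shape of the XFAILed case: the 'a == 5' operand of the second
   condition can never be true when that condition is evaluated. */
int f_as_written(int a)
{
  if (a == 5)
    return 30;
  else if (a == 5 || a == 6)
    return 40;
  return 0; /* assumed fall-through value */
}

/* Same function with the dead subexpression removed. */
int f_simplified(int a)
{
  if (a == 5)
    return 30;
  else if (a == 6)
    return 40;
  return 0;
}

/* True if both versions agree for every a in [lo, hi]. */
bool versions_agree(int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    if (f_as_written(a) != f_simplified(a))
      return false;
  return true;
}
```

A diagnostic for this would have to notice that one operand of a later `||` chain repeats an earlier condition that has already failed, rather than only comparing whole conditions as in the case that works.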

[Bug tree-optimization/108750] Loop unswitching fails for poly_int conditions

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108750

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-02-10

--- Comment #1 from Richard Biener  ---
Probably needs some other guards in strategic places; other than that there's
no reason it shouldn't work ...

[Bug tree-optimization/108748] Enhancement: track ranges of poly_int indeterminates

2023-02-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108748

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||amacleod at redhat dot com
   Last reconfirmed||2023-02-10

--- Comment #1 from Richard Biener  ---
The GIMPLE we see seems to be

uint64_t foo (uint64_t x)
{
   :
  if (POLY_INT_CST [16, 16] == 2)
goto ; [INV]
  else
goto ; [INV]

   :
  x_6 = x_5(D) + POLY_INT_CST [1600, 1600];

   :
  # x_4 = PHI 
  return x_4;

so any such optimization would derive the TU wide constant(?) N that is
applied to all POLY_INT_CSTs?  Is there a set_svcntb (..) intrinsic that
would clobber such knowledge?

That said, we'd track a "virtual" variables range here.  For the above
I wonder why we cannot constant fold it - [16, 16] can never be 2, no?
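In GCC's poly_int representation, `POLY_INT_CST [c0, c1]` stands for `c0 + c1 * x`, where `x` is a runtime indeterminate that poly_int requires to be nonnegative (for SVE, `[16, 16]` is the vector length in bytes). A small C sketch of the folding question under that assumption: `16 + 16 * x` is at least 16 and always a multiple of 16, so it can never equal 2. The helper `poly_may_equal` is hypothetical; GCC's own predicates for this kind of question are `maybe_eq`/`known_eq` in poly-int.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can c0 + c1 * x equal VALUE for some integer x >= 0?
   Assumes the operands are small enough that VALUE - C0 cannot overflow. */
bool poly_may_equal(int64_t c0, int64_t c1, int64_t value)
{
  int64_t diff = value - c0;
  if (c1 == 0)
    return diff == 0;
  /* Need diff == c1 * x: the division must be exact and x nonnegative. */
  return diff % c1 == 0 && diff / c1 >= 0;
}
```

Since `poly_may_equal (16, 16, 2)` is false, the comparison in the dump is known to be false and the conditional block could be folded away.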

[Bug tree-optimization/108500] [11/12 Regression] -O -finline-small-functions results in "internal compiler error: Segmentation fault" on a very large program (700k function calls)

2023-02-10 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

--- Comment #20 from Vladimir Makarov  ---
(In reply to Richard Biener from comment #14)
> Thanks for the new testcase.  With -O0 (and a --enable-checking=release
> built compiler) this builds in ~11 minutes (on a Ryzen 9 7900X) with
> 
>  integrated RA              :  38.96 (  6%)   1.94 ( 20%)  42.00 (  6%)  3392M ( 23%)
>  LRA non-specific           :  18.93 (  3%)   1.24 ( 13%)  23.78 (  4%)   450M (  3%)
>  LRA virtuals elimination   :   5.67 (  1%)   0.05 (  1%)   5.75 (  1%)   457M (  3%)
>  LRA reload inheritance     : 318.25 ( 49%)   0.24 (  2%) 318.51 ( 48%)     0  (  0%)
>  LRA create live ranges     : 199.24 ( 31%)   0.12 (  1%) 199.38 ( 30%)   228M (  2%)
> 645.67user 10.29system 11:04.42elapsed 98%CPU (0avgtext+0avgdata 30577844maxresident)k
> 3936200inputs+1091808outputs (122053major+10664929minor)pagefaults 0swaps
>

I've tried test-1M.i with -O0 for clang-14.  It took about 12 hours on an E5-2697
v3, vs about 30 min for GCC.  Most of clang's time (99%) is spent in the "fast
register allocator":

  Total Execution Time: 42103.9395 seconds (42243.9819 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  41533.7657 ( 99.5%)  269.5347 ( 78.6%)  41803.3005 ( 99.3%)  41942.4177 ( 99.3%)  Fast Register Allocator
  139.1669 (  0.3%)  16.4785 (  4.8%)  155.6454 (  0.4%)  156.3196 (  0.4%)  X86 DAG->DAG Instruction Selection

I've tried the same for -O1.  Again GCC took about 30 min, and I stopped clang
(which uses a different RA algorithm there) after 120 hours.

So the situation with RA is not so bad for GCC.  But in any case I'll try to
improve the speed for this case.

> so register allocation taking all of the time.  There's maybe the possibility
> to gate some of its features on the # of BBs or insns (or whatever the actual
> "bad" thing is - I didn't look closer yet).
> 
> It also seems to use 30GB of peak memory at -O0 ...
> 

I see only 3GB.  Improving this is a hard task.  IRA at -O0 uses a very simple
algorithm that needs very few resources.  We could use an even simpler method
(assigning memory to all pseudos), but I don't think it is worth doing, as the
generated code would be much bigger and probably 1.5-2 times slower.

[Bug tree-optimization/108748] Enhancement: track ranges of poly_int indeterminates

2023-02-10 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108748

--- Comment #2 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #1)
> That said, we'd track a "virtual" variables range here.  For the above
> I wonder why we cannot constant fold it - [16, 16] can never be 2, no?
Hah!  Yes.  Looks like I inadvertently filed a second bug.

The test was supposed to compare with 16 or use svcntd().

[Bug tree-optimization/108520] [13 Regression] ICE in nonnull_arg_p, at tree.cc:14372 with -O1 and above (gnu::assume and gnu::nonnull)

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108520

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:99f3ad2e5b117ee79a6dcf97288261e2fa32ab4c

commit r13-5806-g99f3ad2e5b117ee79a6dcf97288261e2fa32ab4c
Author: Andrew MacLeod 
Date:   Mon Feb 6 13:07:01 2023 -0500

Add function context for querying global ranges.

When processing arguments for assume functions, call get_global_range
directly and utilize a function context pointer to avoid any assumptions
about using cfun.

PR tree-optimization/108520
gcc/
* gimple-range-infer.cc (check_assume_func): Invoke
gimple_range_global directly instead using global_range_query.
* value-query.cc (get_range_global): Add function context and
avoid calling nonnull_arg_p if not cfun.
(gimple_range_global): Add function context pointer.
* value-query.h (imple_range_global): Add function context.

gcc/testsuite/
* g++.dg/pr108520.C: New.

[Bug tree-optimization/108687] [13 Regression] Non-termination since r13-5630-g881bf8de9b0

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108687

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:6493b7af37e473a89c67afab474330f931dd8447

commit r13-5807-g6493b7af37e473a89c67afab474330f931dd8447
Author: Andrew MacLeod 
Date:   Thu Feb 9 17:50:07 2023 -0500

Query rangers cache in readonly mode only from within

The change for 108356 allowed the cache to scan the dominator trees when
it was attempting a lookup rather than using the local value.  I
inadvertently changed the external interface to also do this, so all
the GORI queries via range_on_edge of the cache could also do lookups.

This triggered a quadratic, possibly exponential, time increase when
the right conditions were presented: a cascading series of
recomputations on outgoing edge calculations that then searched the dom
tree instead of being a simple calculation using what is easily available.

The fix is to use the internal API within the cache rather than the
external one that GORI uses.  This leaves GORI computations to be
resolved in linear time.

PR tree-optimization/108687
gcc/
* gimple-range-cache.cc (ranger_cache::range_on_edge): Revert
back to RFD_NONE mode for calculations.
(ranger_cache::propagate_cache): Call the internal edge range API
with RFD_READ_ONLY instead of changing the external routine.
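The failure mode described in this commit, where each edge calculation re-derives its inputs instead of using what is already available, can be illustrated with a toy C model (all names hypothetical, not ranger's API): a value whose inputs are the two previous values is recomputed exponentially often without a cache, but computed once per node with one.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 64

/* Counts how many node values are actually computed (cache misses). */
long computations;

/* Naive: every query recomputes both inputs from scratch. */
long value_naive(int n)
{
  computations++;
  if (n < 2)
    return 1;
  return value_naive(n - 1) + value_naive(n - 2);
}

static long cache_val[MAX_NODES];
static bool cache_ok[MAX_NODES];

void reset_cache(void)
{
  memset(cache_ok, 0, sizeof cache_ok);
}

/* Cached: each node is computed at most once; later queries are lookups.
   N must be below MAX_NODES. */
long value_cached(int n)
{
  if (cache_ok[n])
    return cache_val[n];
  computations++;
  long v = n < 2 ? 1 : value_cached(n - 1) + value_cached(n - 2);
  cache_val[n] = v;
  cache_ok[n] = true;
  return v;
}
```

For n = 10 both versions return 89, but the naive one performs 177 computations while the cached one performs 11.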

[Bug tree-optimization/108696] querying relations is slow

2023-02-10 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108696

--- Comment #5 from Andrew Macleod  ---
(In reply to Richard Biener from comment #4)

> 
> That said, not allocating the self-relation bitmaps at query time is
> definitely good (not 100% sure if the patch achieves that).

If it can determine ahead of time that it isn't needed, then the self maps do
not get allocated. If it actually has to do a query, then they are allocated.

I'm still looking around at this... It was simpler to always do bitmap vs bitmap
comparison rather than handle the full set of combinations.  I figured I would
revisit it if it was a performance concern.  I will experiment with what happens
if we expand the API, at least internally, to do ssa vs ssa, bitmap vs bitmap, and
ssa vs bitmap checks.
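A toy C model of the trade-off being discussed (hypothetical names; SSA versions limited to 0..63 and sets stored as 64-bit masks, which is not how GCC's relation oracle stores them): the uniform path first builds a one-element "self" set for an SSA operand and then intersects, while a specialised ssa-vs-bitmap check is a single bit test with no temporary set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy set of SSA versions 0..63. */
typedef uint64_t ssa_set;

/* How many temporary "self" sets the queries had to build. */
int temp_sets_built;

/* Build the one-element set {ssa}; stands in for allocating a bitmap. */
static ssa_set self_set(unsigned ssa)
{
  temp_sets_built++;
  return (ssa_set)1 << ssa;
}

/* Uniform bitmap vs bitmap query: convert the SSA operand, then intersect. */
bool related_uniform(unsigned a, ssa_set b_set)
{
  ssa_set a_set = self_set(a);
  return (a_set & b_set) != 0;
}

/* Specialised ssa vs bitmap query: one bit test, no temporary set. */
bool related_ssa_vs_set(unsigned a, ssa_set b_set)
{
  return ((b_set >> a) & 1) != 0;
}
```

Both queries give the same answers; only the uniform one pays for building a temporary set on every call.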

[Bug tree-optimization/108520] [13 Regression] ICE in nonnull_arg_p, at tree.cc:14372 with -O1 and above (gnu::assume and gnu::nonnull)

2023-02-10 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108520

Andrew Macleod  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Andrew Macleod  ---
fixed.

[Bug tree-optimization/108687] [13 Regression] Non-termination since r13-5630-g881bf8de9b0

2023-02-10 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108687

Andrew Macleod  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #12 from Andrew Macleod  ---
should be fixed.

[Bug tree-optimization/108705] [13 Regression] Unexpected CPU time usage with LTO in ranger propagation

2023-02-10 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108705

--- Comment #8 from Andrew Macleod  ---
The fix I just checked in for 108687 exhibited similar performance
characteristics, also in the same pass.  Perhaps it will fix your problem.

[Bug c/108753] '-Wduplicated-cond' doesn't diagnose duplicated subexpressions

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108753

Marek Polacek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
   Last reconfirmed||2023-02-10

--- Comment #1 from Marek Polacek  ---
It probably could.

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

--- Comment #23 from Marek Polacek  ---
I'm somewhat uneasy about backporting PR102633, to be honest.  But I could try
and test gcc 12 to see if it causes any problems, if you want me to.

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

--- Comment #24 from Jakub Jelinek  ---
(In reply to Marek Polacek from comment #23)
> I'm somewhat uneasy about backporting PR102633, to be honest.  But I could
> try and test gcc 12 to see if it causes any problems, if you want me to.

I think emitting more warnings than before on the release branches is
something we should avoid, but this is the opposite: we used to emit warnings
which shouldn't be emitted, and no longer do.
Anyway, I'm afraid that without the PR102633 changes I don't know how else to
solve this PR, which seems quite important (a -Wall warning on anything using
_mm*_undefined*, directly or indirectly).

[Bug c++/101099] [10/11/12/13 Regression] ICE in type_unification_real, at cp/pt.c:22173

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101099

--- Comment #7 from Marek Polacek  ---
(In reply to Martin Liška from comment #6)
> Well, it's fixed since r13-3639-ga4cd2389276a30c3 which is a revision that
> handles default options. Is it really fixed?

Ah, that commit explains that this is not fixed then; the ICE still happens
with -fconcepts-ts.  Low prio, but still a bug.  Thanks for the bisect.

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

--- Comment #25 from Marek Polacek  ---
Okay, let me test the backport then.

[Bug middle-end/102633] [11/12 Regression] warning for self-initialization despite -Wno-init-self

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102633

Marek Polacek  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #10 from Marek Polacek  ---
Will try to backport to 12 in order to unblock bug 105593.

[Bug middle-end/24639] [meta-bug] bug to track all Wuninitialized issues

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24639
Bug 24639 depends on bug 102633, which changed state.

Bug 102633 Summary: [11/12 Regression] warning for self-initialization despite 
-Wno-init-self
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102633

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

--- Comment #26 from Jakub Jelinek  ---
(In reply to Marek Polacek from comment #25)
> Okay, let me test the backport then.

Well, I already have 40 backports in my 12 tree, so I could add your commit and
the 3 from this PR above it.

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

--- Comment #27 from Marek Polacek  ---
Ah, I'm not even sure if it applies cleanly but if it does, go ahead.

[Bug c++/105593] avx512 math function raises uninitialized variable warning

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

--- Comment #28 from Jakub Jelinek  ---
(In reply to Marek Polacek from comment #27)
> Ah, I'm not even sure if it applies cleanly but if it does, go ahead.

It does apply cleanly, and the new c-c++-common/Winit-self1.c FAILs without it
and PASSes with it.
When backporting I do such a smoke test on just the testcases from the PR on
each patch separately and then test everything together.  So, queued now.

[Bug tree-optimization/108705] [13 Regression] Unexpected CPU time usage with LTO in ranger propagation

2023-02-10 Thread rimvydas.jas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108705

--- Comment #9 from Rimvydas (RJ)  ---
(In reply to Andrew Macleod from comment #8)
> This fix I just checked in for 108687 exhibited similar performance
> characteristics, also in the same pass.. Perhaps it will fix your problem.

Thank you!  I will still have to check the original cases, but for the testcase
variants, even bumping the call count from 16 to 222 now takes only:
 assumed size:   ~6s dominator optimization 4.97 ( 85%)
 assumed shape: ~20s callgraph functions expansion 16.47 ( 79%)
  mainly callgraph ipa passes (22%)+alias stmt walk (11%)+tree ssa inc (14%)

[Bug objc/108743] -fconstant-cfstrings not supported

2023-02-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

--- Comment #7 from Andrew Pinski  ---
Hmm,
https://inbox.sourceware.org/gcc-patches/b4f496f4-f31d-41d2-8942-1f0aefbd7...@sandoe-acoustics.co.uk/

It seems it didn't get installed even though it was approved ...

[Bug c++/105841] [12/13 Regression] Change in behavior of CTAD for alias templates

2023-02-10 Thread mike at spertus dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105841

--- Comment #9 from Mike Spertus  ---
Hi Jason,
Very exciting. Some additional tests: Both versions of
https://godbolt.org/z/aM93PEWcz should be included in the tests. There are
two versions of the deduction guides in the godbolt. Either guide should
work. (They may already work, but it remains a good test). See
https://mspertus.github.io/MSVC_CTAD_1428672/ for more details.

In addition, the examples at https://godbolt.org/z/PjqMa8T35 should work.
Do you want versions of these that do not include standard library headers?

Thanks,
Mike

On Thu, Feb 9, 2023 at 3:26 PM  wrote:

> Michael Spertus wrote:
> > Thanks, Jason! My course starts in 6 minutes, so I can't look at it now but
> > will give you feedback by 8:30AM tomorrow.
> >
> > Mike
> >
> > On Thu, Feb 9, 2023 at 3:07 PM jason at gcc dot gnu.org <
> > gcc-bugzi...@gcc.gnu.org> wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105841
> > >
> > > --- Comment #7 from Jason Merrill  ---
> > > Created attachment 5
> > >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=5&action=edit
> > > fix
> > >
> > > Here's a patchset to implement the standard behavior plus the CWG2664
> > > clarification.  Mike, does this look good to you?  Any additional
> > > testcases?
> > >
> > > Also pushed to refs/users/jason/heads/alias-ctad in the git repository.
> > > (https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/users/jason/heads/alias-ctad)
> > >
>

[Bug middle-end/108754] New: [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

Bug ID: 108754
   Summary: [13 Regression] multiple testsuite errors with
r13-5761-g10827a92f1a8c3
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hp at gcc dot gnu.org
CC: vmakarov at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: cris-elf

I've pin-pointed r13-5761-g10827a92f1a8c3 as the cause of these test-suite
regressions for cris-elf:

gcc.sum gcc.c-torture/execute/960215-1.c
gcc.sum gcc.c-torture/execute/ieee/mzero3.c
gcc.sum gcc.c-torture/execute/pr69447.c
gcc.sum gcc.c-torture/execute/regstack-1.c
gfortran.sum gfortran.dg/bind-c-contiguous-3.f90
gfortran.sum gfortran.dg/c-interop/cf-out-descriptor-4.f90
gfortran.sum gfortran.dg/class_allocate_22.f90
gfortran.sum gfortran.dg/func_derived_1.f90
gfortran.sum gfortran.dg/inline_matmul_14.f90
gfortran.sum gfortran.dg/ptr_func_assign_1.f08

IOW:
gcc:
Running /x/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
FAIL: gcc.c-torture/execute/960215-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/960215-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/960215-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/960215-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr69447.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr69447.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr69447.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr69447.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/regstack-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/regstack-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/regstack-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/regstack-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
Running /x/gcc/testsuite/gcc.c-torture/execute/ieee/ieee.exp ...
FAIL: gcc.c-torture/execute/ieee/mzero3.c execution,  -O2
FAIL: gcc.c-torture/execute/ieee/mzero3.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/ieee/mzero3.c execution,  -Os
FAIL: gcc.c-torture/execute/ieee/mzero3.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none

gfortran:
Running /x/gcc/testsuite/gfortran.dg/c-interop/c-interop.exp ...
[...]
FAIL: gfortran.dg/c-interop/cf-out-descriptor-4.f90   -O3 -g  execution test
[...]
Running /x/gcc/testsuite/gfortran.dg/dg.exp ...
[...]
FAIL: gfortran.dg/bind-c-contiguous-3.f90   -O1  execution test
FAIL: gfortran.dg/bind-c-contiguous-3.f90   -O2  execution test
FAIL: gfortran.dg/bind-c-contiguous-3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/bind-c-contiguous-3.f90   -O3 -g  execution test
FAIL: gfortran.dg/bind-c-contiguous-3.f90   -Os  execution test
[...]
FAIL: gfortran.dg/class_allocate_22.f90   -O2  execution test
FAIL: gfortran.dg/class_allocate_22.f90   -O3 -g  execution test
[...]
FAIL: gfortran.dg/func_derived_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/func_derived_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/inline_matmul_14.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
[...]
FAIL: gfortran.dg/ptr_func_assign_1.f08   -O1  execution test
FAIL: gfortran.dg/ptr_func_assign_1.f08   -Os  execution test

Since these are execution failures, gcc.log doesn't tell anything beyond abort
being called.

Judging from the patch, the failing tests and the tested target list (in
particular, how those targets differ from cris-elf), maybe libcalls aren't
handled properly.  The previously pushed (and reverted) version caused the same
(or a very similar set of) regressions.  Will look closer.

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.0
   Keywords||ra, wrong-code
   Severity|normal  |critical

[Bug tree-optimization/108500] [11/12 Regression] -O -finline-small-functions results in "internal compiler error: Segmentation fault" on a very large program (700k function calls)

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

--- Comment #21 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov :

https://gcc.gnu.org/g:3c5154d0f0d2185b518465b264ca17fb7c60c1e8

commit r13-5808-g3c5154d0f0d2185b518465b264ca17fb7c60c1e8
Author: Vladimir N. Makarov 
Date:   Fri Feb 10 11:12:37 2023 -0500

RA: Use simple LRA for huge functions

The PR108500 test contains a huge function and RA spends a lot of time
to compile the test with -O0.  The patch decreases compilation time
considerably for huge functions.  Compilation time for the PR test
decreases from 1235s to 709s on Intel i7-13600K.

PR tree-optimization/108500

gcc/ChangeLog:

* params.opt (ira-simple-lra-insn-threshold): Add new param.
* ira.cc (ira): Use the param to switch on simple LRA.
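A hedged sketch of size-based gating like the one this commit adds via the new `ira-simple-lra-insn-threshold` param. The enum, the function name, the units of the threshold and the treatment of a non-positive threshold are all assumptions for illustration, not GCC's ira.cc code.

```c
/* Hypothetical register-allocation modes. */
enum ra_mode { RA_FULL, RA_SIMPLE };

/* Switch to the cheaper mode once the function reaches THRESHOLD insns.
   Treating THRESHOLD <= 0 as "never switch" is an assumption of this sketch. */
enum ra_mode choose_ra_mode(long insn_count, long threshold)
{
  if (threshold > 0 && insn_count >= threshold)
    return RA_SIMPLE;
  return RA_FULL;
}
```

Gating on a single size metric keeps the decision cheap per function; the trade-off is a sharp change in allocation quality right at the threshold.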

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

--- Comment #1 from Vladimir Makarov  ---
I think the problem is that cris uses the old reload pass.  Could you check the
following patch:

diff --git a/gcc/ira.cc b/gcc/ira.cc
index d0b6ea062e8..9f9af808f63 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3773,7 +3773,7 @@ update_equiv_regs (void)
{
  note = set_unique_reg_note (insn, REG_EQUIV,
replacement);
}
- else
+ else if (ira_use_lra_p)
{
  /* We still can use this equivalence for caller save
 optimization in LRA.  Mark this.  */

[Bug tree-optimization/108751] Removing dead code results in worse generated target code at -Os

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108751

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
The code isn't smaller, which is indeed important for -Os, though many GIMPLE
decisions have to be made purely from heuristics about whether a particular
transformation typically results in smaller or larger code, because the sizes
can't be compared until much later, only estimated.
What happens in this testcase is that b is determined to be constant only
during IPA optimizations; ccp2 after IPA then propagates the value 1 into b's
users, and before lim2 we have pretty much the same IL (if I rename SSA name
versions and temporary suffixes).  The only difference between the version where b
is discovered to be constant 1 after IPA and the one where it is determined to be 1
earlier is in the counts and branch probabilities:
-   [local count: 1018865821]:
+   [local count: 536870913]:
   goto ; [100.00%]

-   [local count: 54876003]:
+   [local count: 536870911]:
   return 3;

-   [local count: 460874625]:
+   [local count: 264428955]:
   _2 = a.2_3 + 1;
   a = _2;

-   [local count: 997745539]:
+   [local count: 801299868]:
   a.2_3 = a;
   if (a.2_3 != 0)
-goto ; [94.50%]
+goto ; [33.00%]
   else
-goto ; [5.50%]
+goto ; [67.00%]
Later on, lim2 decides, based on the probabilities, to perform invariant motion
in the latter case but not in the former.
In the first assembly,
        movl    %eax, a(%rip)
is done in an inner loop, while in the latter case it is done only after the
loop finishes.
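The difference described here, a store to `a` inside the loop versus once after it, is loop store motion. A minimal C sketch of the two shapes (the loop is a simplified stand-in, not the testcase from the PR): the second version keeps `a` in a local, which the compiler can hold in a register, and writes it back once. The guard on the final store keeps the sketch from storing when the loop body never ran.

```c
int a; /* global, as in the testcase */

/* Store inside the loop: 'a' is loaded and stored on every iteration. */
void bump_store_in_loop(int n)
{
  for (int i = 0; i < n; i++)
    a = a + 1;
}

/* After store motion: work on a local copy, store once after the loop. */
void bump_store_after_loop(int n)
{
  int tmp = a;
  for (int i = 0; i < n; i++)
    tmp = tmp + 1;
  if (n > 0)
    a = tmp;
}
```

Both functions leave `a` with the same final value; whether the transformation pays off depends on how often the loop is expected to run, which is why different estimated counts and probabilities can change lim2's decision.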

[Bug c/108718] [10/11/12/13 Regression] csmith: possible bad code with -O2

2023-02-10 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108718

David Binderman  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #7 from David Binderman  ---
Created attachment 54449
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54449&action=edit
C source code

After about 20 minutes of reduction, cvise started going the wrong way.

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

--- Comment #2 from Hans-Peter Nilsson  ---
Diff of .s for mzero3 at -O2:
--- /x/0/gccobj/gcc/testsuite/gcc/mzero3.x2-mzero3.s	2023-02-10 17:57:56.786279467 +0100
+++ /x/1/gccobj/gcc/testsuite/gcc/mzero3.x2-mzero3.s	2023-02-10 17:57:06.083925076 +0100
@@ -94,10 +94,8 @@ _main:
move.d $r8,$r12
move.d $r7,$r13
jsr $r4
-   move.d [_zerof],$r9
-   move.d $r9,[$sp+36]
move.d _negf,$r2
-   move.d $r9,$r10
+   move.d [$sp+36],$r10
jsr $r2
move.d [_nzerof],$r3
move.d _expectf,$r0

That's the setup to call negf (zerof), and the diff is that zerof is saved to
the stack before the patch, but with the patch, the uninitialized contents of
that slot are used.

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

--- Comment #3 from Hans-Peter Nilsson  ---
(In reply to Vladimir Makarov from comment #1)
> I think the problem is that cris uses the old reload pass.  Could you check
> the following patch:

Will do, thanks!

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

--- Comment #4 from Vladimir Makarov  ---
(In reply to Hans-Peter Nilsson from comment #3)
> (In reply to Vladimir Makarov from comment #1)
> > I think the problem is that cris uses the old reload pass.  Could you check
> > the following patch:
> 
> Will do, thanks!

OK.  I'll submit the patch then.

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

--- Comment #5 from Hans-Peter Nilsson  ---
Created attachment 54450
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54450&action=edit
suggested change

Here's the exact patch I'm testing, on top of the mentioned commit.
(In submittable format!)

[Bug ipa/108605] [13 Regression] ICE in ipa_push_agg_values_from_jfunc with offsets >= INT_MAX since r13-3359-g656b2338c8f248

2023-02-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108605

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
The use of unsigned for offsets is all around IPA:
ipa-param-manipulation.h:  unsigned unit_offset;
ipa-param-manipulation.h:  unsigned unit_offset;
ipa-param-manipulation.h:  void register_replacement (tree base, unsigned unit_offset, tree replacement);
ipa-param-manipulation.h:  tree lookup_replacement (tree base, unsigned unit_offset);
ipa-param-manipulation.h:  unsigned unit_offset);
ipa-prop.h:  unsigned unit_offset;
ipa-prop.h:  tree get_value (int index, unsigned unit_offset, bool by_ref) const;
ipa-prop.h:  tree get_value (int index, unsigned unit_offset) const;
ipa-prop.h:  const ipa_argagg_value *get_elt (int index, unsigned unit_offset) const;
ipa-cp.cc:ipa_argagg_value_list::get_elt (int index, unsigned unit_offset) const
ipa-cp.cc:  unsigned prev_unit_offset = 0;
ipa-cp.cc:ipa_argagg_value_list::get_value (int index, unsigned unit_offset) const
ipa-cp.cc:ipa_argagg_value_list::get_value (int index, unsigned unit_offset,
ipa-cp.cc:  unsigned other_offset = other.m_elts[i].unit_offset;
ipa-cp.cc:  unsigned prev_unit_offset = 0;
ipa-cp.cc:  unsigned prev_unit_offset = 0;
ipa-cp.cc:  unsigned this_offset = elts[i].unit_offset;
ipa-cp.cc:  unsigned prev_unit_offset = 0;
ipa-cp.cc:unsigned unit_offset = aglat->offset / BITS_PER_UNIT;
ipa-cp.cc:  unsigned prev_unit_offset = 0;
ipa-param-manipulation.cc:  unsigned unit_offset;
ipa-param-manipulation.cc:isra_get_ref_base_and_offset (tree expr, tree *base_p, unsigned *unit_offset_p)
ipa-param-manipulation.cc:   unsigned unit_offset,
ipa-param-manipulation.cc:   unsigned unit_offset)
ipa-param-manipulation.cc:ipa_param_body_adjustments::lookup_replacement (tree base, unsigned unit_offset)
ipa-param-manipulation.cc:  unsigned unit_offset;
ipa-prop.cc:  unsigned unit_offset = bit_offset / BITS_PER_UNIT;
ipa-sra.cc:  unsigned unit_offset;
ipa-sra.cc:  unsigned unit_offset;
ipa-sra.cc:  unsigned unit_offset, unsigned unit_size)
ipa-sra.cc:  unsigned offset = argacc->unit_offset + delta_offset;

From the above, only aglat->offset is actually HOST_WIDE_INT.
Now, I think it is just fine to use unsigned rather than say unsigned
HOST_WIDE_INT here, as long as we punt
on trying to optimize stuff which is above those offsets.  E.g.
isra_get_ref_base_and_offset has
  if (offset < 0 || (offset / BITS_PER_UNIT) > UINT_MAX)
return false;

  *base_p = base;
  *unit_offset_p = offset / BITS_PER_UNIT;
  return true;
and so looks just fine to me.  So, one possibility is just to fix wherever we
haven't done
similar check.
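A minimal standalone sketch of the kind of range check suggested above. It assumes 8-bit units and uses int64_t as a stand-in for HOST_WIDE_INT; the helper bit_offset_to_unit_offset and the SKETCH_BITS_PER_UNIT macro are hypothetical names, not GCC API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BITS_PER_UNIT (8 on common targets).  */
#define SKETCH_BITS_PER_UNIT 8

/* Convert a signed bit offset (int64_t standing in for HOST_WIDE_INT)
   into an unsigned unit (byte) offset.  Returns false, i.e. "punt",
   if the offset is negative or the unit offset does not fit in unsigned.  */
static bool
bit_offset_to_unit_offset (int64_t offset, unsigned *unit_offset_p)
{
  assert (unit_offset_p != NULL);
  if (offset < 0 || (offset / SKETCH_BITS_PER_UNIT) > (int64_t) UINT_MAX)
    return false;
  *unit_offset_p = (unsigned) (offset / SKETCH_BITS_PER_UNIT);
  return true;
}
```

A caller that gets false would simply skip the optimization for that access, which is the punting behaviour described above.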

[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:7757567358a84c3774cb972350bd7ea299daaa8d

commit r13-5809-g7757567358a84c3774cb972350bd7ea299daaa8d
Author: Vladimir N. Makarov 
Date:   Fri Feb 10 12:17:07 2023 -0500

RA: Use caller save equivalent memory only for LRA

Recently I submitted a patch to reuse memory with a constant address for
the caller-saves optimization around calls to constant or pure functions.
It seems to work only for targets using LRA instead of the old reload
pass.  So the patch switches off this optimization when the old reload
pass is used.

PR middle-end/108754

gcc/ChangeLog:

* ira.cc (update_equiv_regs): Set up ira_reg_equiv for
valid_combine only when ira_use_lra_p is true.

[Bug c/107127] [11/12 Regression] Long compile times on code with C complex since r11-3299-gcba079f354a55363

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107127

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b585bd941ea2e5c1cca52e40210483b556ce2ed7

commit r12-9120-gb585bd941ea2e5c1cca52e40210483b556ce2ed7
Author: Jakub Jelinek 
Date:   Wed Nov 23 19:09:31 2022 +0100

c: Fix compile time hog in c_genericize [PR107127]

The complex multiplications result in a deeply nested set of many
SAVE_EXPRs, which takes over 5 minutes to walk even on fast machines.
This patch fixes that by using walk_tree_without_duplicates, with which
the walk is instant.

2022-11-23  Andrew Pinski  
Jakub Jelinek  

PR c/107127
* c-gimplify.cc (c_genericize): Use walk_tree_without_duplicates
instead of walk_tree for c_genericize_control_r.

* gcc.dg/pr107127.c: New test.

(cherry picked from commit 8a0fce6a51915c29584427fd376b40073c328090)
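A toy illustration in standalone C (not GCC's tree walker) of why sharing makes a plain recursive walk blow up while a duplicate-avoiding walk stays linear. struct node, walk_all, walk_unique and build_shared_chain are hypothetical; walk_unique marks nodes with a flag, whereas walk_tree_without_duplicates is assumed here to remember visited trees in a set, which has the same effect on the number of visits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression node; both operands may point at the same
   shared subexpression, as with the nested SAVE_EXPRs described above.  */
struct node
{
  struct node *op0, *op1;
  bool visited;
};

/* Naive walk: a node reachable along k paths is visited k times, so a
   chain of LEVELS shared nodes costs 2^LEVELS - 1 visits.  */
static unsigned long
walk_all (const struct node *n)
{
  if (n == NULL)
    return 0;
  return 1 + walk_all (n->op0) + walk_all (n->op1);
}

/* Duplicate-avoiding walk: each node is visited at most once.  */
static unsigned long
walk_unique (struct node *n)
{
  if (n == NULL || n->visited)
    return 0;
  n->visited = true;
  return 1 + walk_unique (n->op0) + walk_unique (n->op1);
}

/* Fill POOL with LEVELS nodes where node i uses node i + 1 as both
   operands, and return the root.  */
static struct node *
build_shared_chain (struct node *pool, int levels)
{
  assert (levels > 0);
  for (int i = 0; i < levels; i++)
    {
      struct node *next = (i + 1 < levels) ? &pool[i + 1] : NULL;
      pool[i].op0 = next;
      pool[i].op1 = next;
      pool[i].visited = false;
    }
  return &pool[0];
}
```

With 20 shared levels the naive walk already makes about a million visits while the duplicate-avoiding one makes 20.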

[Bug target/106875] [11/12 Regression] ICE in ix86_emit_outlined_ms2sysv_save with -mabi=ms -mcall-ms2sysv-xlogues and "#pragma GCC target" since r11-3183-gba948b37768c99cd

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106875

--- Comment #5 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:85a84ce2e502b820277dbb399c43d4ced291efca

commit r12-9123-g85a84ce2e502b820277dbb399c43d4ced291efca
Author: Jakub Jelinek 
Date:   Mon Nov 28 10:13:43 2022 +0100

i386: Fix up ix86_abi handling [PR106875]

The following testcase fails since my changes to also save/restore
opts_set upon function target/optimization changes (before that, it
acted as "has this option ever been explicit anywhere?").

The problem is that for ix86_abi we depend on the opts_set
value for it in ix86_option_override_internal:
  SET_OPTION_IF_UNSET (opts, opts_set, ix86_abi, DEFAULT_ABI);
but as it is a TargetSave, the backend code is required to
save/restore it manually (it does that) and, since GCC 11, also
to save/restore the opts_set bit for it (which isn't done).
We don't do that for various other TargetSave options which
ix86_function_specific_{save,restore} saves/restores, but as long
as we never test opts_set for them, it doesn't really matter.
One possible fix would be to introduce a new TargetSave into
which ix86_function_specific_{save,restore} would save/restore a bitmask
of the opts_set bits.  The following patch uses an easier fix: by
making it a TargetVariable instead, the saving/restoring is handled
by the generated code.
The differences in options.h are just slight movements in where the
*ix86_abi stuff appears, ditto for options.cc; the real
differences are only in options-save.cc, where cl_target_option_save
gets:
+  ptr->x_ix86_abi = opts->x_ix86_abi;
...
+  if (opts_set->x_ix86_abi) mask |= HOST_WIDE_INT_1U << 3;
(plus adjustments of following TargetVariables mask related stuff),
cl_target_option_restore gets:
+  opts->x_ix86_abi = ptr->x_ix86_abi;
...
+  opts_set->x_ix86_abi = static_cast((mask & 1) != 0);
+  mask >>= 1;
plus the movements in other functions too.  So, by it being a
TargetVariable, the only thing that changed is that we don't need to
handle it manually in ix86_function_specific_{save,restore} because it
is handled automatically including the opts_set stuff.

2022-11-28  Jakub Jelinek  

PR target/106875
* config/i386/i386.opt (x_ix86_abi): Remove TargetSave.
(ix86_abi): Replace it with TargetVariable.
* config/i386/i386-options.cc (ix86_function_specific_save,
ix86_function_specific_restore): Don't save and restore x_ix86_abi.

* g++.target/i386/pr106875.C: New test.

(cherry picked from commit ee629d242d9f93a38e49bed904bb334bbe15dde1)
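A simplified model of what the generated TargetVariable save/restore code does with the opts_set bits, assuming each option gets one mask bit in a fixed order (the real options-save.cc assigns its own bit positions, e.g. bit 3 for ix86_abi in the diff above). All struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified option state: three target options plus the
   "explicitly set" flags that opts_set tracks in GCC.  */
struct sketch_opts { int abi; int arch; int tune; };
struct sketch_opts_set { bool abi; bool arch; bool tune; };

/* Saved snapshot: the values plus one mask bit per opts_set flag.  */
struct sketch_saved { struct sketch_opts vals; uint64_t explicit_mask; };

static void
sketch_save (struct sketch_saved *ptr, const struct sketch_opts *opts,
             const struct sketch_opts_set *opts_set)
{
  uint64_t mask = 0;
  ptr->vals = *opts;
  /* Pack each "explicitly set" flag into its own bit.  */
  if (opts_set->abi)  mask |= UINT64_C (1) << 0;
  if (opts_set->arch) mask |= UINT64_C (1) << 1;
  if (opts_set->tune) mask |= UINT64_C (1) << 2;
  ptr->explicit_mask = mask;
}

static void
sketch_restore (struct sketch_opts *opts, struct sketch_opts_set *opts_set,
                const struct sketch_saved *ptr)
{
  uint64_t mask = ptr->explicit_mask;
  *opts = ptr->vals;
  /* Unpack in the same order the bits were assigned, shifting as we go.  */
  opts_set->abi = (mask & 1) != 0;  mask >>= 1;
  opts_set->arch = (mask & 1) != 0; mask >>= 1;
  opts_set->tune = (mask & 1) != 0;
}
```

Without the mask, a restore would bring back the value of an option but not whether the user set it explicitly, which is exactly the state SET_OPTION_IF_UNSET consults.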

[Bug c/107127] [11/12 Regression] Long compile times on code with C complex since r11-3299-gcba079f354a55363

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107127

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:80010acd052ca7fe544740144756cf9fc2fad629

commit r12-9121-g80010acd052ca7fe544740144756cf9fc2fad629
Author: Jakub Jelinek 
Date:   Thu Nov 24 10:33:00 2022 +0100

testsuite: Fix up broken testcase [PR107127]

I had added the { dg-options "" } line manually to the patch but
forgot to adjust the number of added lines.

2022-11-24  Jakub Jelinek  

PR c/107127
* gcc.dg/pr107127.c (foo): Add missing closing }.

(cherry picked from commit add0f941be18cdf962a0f300019acacbf2325d41)

[Bug rtl-optimization/106751] [10/11/12 Regression] internal compiler error: in purge_dead_edges with inline-asm goto

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106751

--- Comment #14 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:4db6e1bf2f1647521dcd709bc3673f565fc327a5

commit r12-9128-g4db6e1bf2f1647521dcd709bc3673f565fc327a5
Author: Jakub Jelinek 
Date:   Fri Dec 16 10:19:22 2022 +0100

loop-invariant: Split preheader edge if the preheader bb ends with jump [PR106751]

The RTL loop passes only request simple preheaders, but don't require
fallthru preheaders, while move_invariant_reg apparently assumes the
latter, i.e. that it can just append instruction(s) to the end of the
preheader basic block.

The following patch fixes that by splitting the preheader edge if
the preheader bb ends with a JUMP_INSN (asm goto in this case).
Without that we get control flow in the middle of a bb.

2022-12-16  Jakub Jelinek  

PR rtl-optimization/106751
* loop-invariant.cc (move_invariant_reg): If preheader bb ends
with a JUMP_INSN, split the preheader edge and emit invariants
into the new preheader basic block.

* gcc.c-torture/compile/pr106751.c: New test.

(cherry picked from commit ddcaa60983b50378bde1b7e327086fe0ce101795)

[Bug middle-end/107317] [10/11/12 Regression] ICE in emit_redzone_byte, at asan.cc:1508

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107317

--- Comment #8 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:ff185dd96ac4576e722b39fc0f7026281de06eb2

commit r12-9122-gff185dd96ac4576e722b39fc0f7026281de06eb2
Author: Jakub Jelinek 
Date:   Thu Nov 24 11:29:54 2022 +0100

asan: Fix up error recovery for too large frames [PR107317]

asan_emit_stack_protection and the functions it calls have various asserts
that verify the sanity of the stack protection instrumentation.  But that
verification can easily fail if we've diagnosed a frame offset overflow.
asan_emit_stack_protection just emits some extra code in the prologue;
if we've reported errors, we aren't producing assembly, so it doesn't
really matter if we don't include the protection code, as compilation
is going to fail anyway.

2022-11-24  Jakub Jelinek  

PR middle-end/107317
* asan.cc: Include diagnostic-core.h.
(asan_emit_stack_protection): Return NULL early if seen_error ().

* gcc.dg/asan/pr107317.c: New test.

(cherry picked from commit b6330a7685476fc30b8ae9bbf3fca1a9b0d4be95)

[Bug debug/106719] [10/11/12 Regression] '-fcompare-debug' failure w/ -O2 since r10-6038-ge5e07b68187b9a

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106719

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:ed8e7ece850bab599c15db3d43041b70d9e99237

commit r12-9124-ged8e7ece850bab599c15db3d43041b70d9e99237
Author: Jakub Jelinek 
Date:   Thu Dec 8 14:57:22 2022 +0100

cfgbuild: Fix DEBUG_INSN handling in find_bb_boundaries [PR106719]

The following testcase FAILs on aarch64-linux.  We have some atomic
instruction followed by 2 DEBUG_INSNs (only with -g, of course) followed
by NOTE_INSN_EPILOGUE_BEG followed by some USE insn.
Now, the split3 pass replaces the atomic instruction with a code sequence
which ends with a conditional jump, and the split3 pass calls
find_many_sub_basic_blocks.
For -g0, find_bb_boundaries sees the flow_transfer_insn (the new
conditional jump), then NOTE_INSN_EPILOGUE_BEG, which can live in between
basic blocks, and then the USE insn, so it splits the block after the
NOTE_INSN_EPILOGUE_BEG and puts the NOTE in between the blocks.
For -g, it sees a DEBUG_INSN after the flow_transfer_insn, so it sets
debug_insn to it, then walks over another DEBUG_INSN and
NOTE_INSN_EPILOGUE_BEG until it finally sees the USE insn, and triggers the
  rtx_insn *prev = PREV_INSN (insn);

  /* If the first non-debug inside_basic_block_p insn after a control
     flow transfer is not a label, split the block before the debug
     insn instead of before the non-debug insn, so that the debug
     insns are not lost.  */
  if (debug_insn && code != CODE_LABEL && code != BARRIER)
    prev = PREV_INSN (debug_insn);
code I've added for PR81325.  If there are only DEBUG_INSNs, that is
the right thing to do, but if in between debug_insn and insn there are
notes which can stay in between basic blocks, or similarly JUMP_TABLE_DATA
or their associated CODE_LABELs, it causes -fcompare-debug differences.

The following patch fixes it by clearing debug_insn if JUMP_TABLE_DATA
or an associated CODE_LABEL is seen.  I'm afraid there is no good answer
as to what to do with DEBUG_INSNs before those; the code then removes them:
  /* Clean up the bb field for the insns between the blocks.  */
  for (x = NEXT_INSN (flow_transfer_insn);
       x != BB_HEAD (fallthru->dest);
       x = next)
    {
      next = NEXT_INSN (x);
      /* Debug insns should not be in between basic blocks,
         drop them on the floor.  */
      if (DEBUG_INSN_P (x))
        delete_insn (x);
      else if (!BARRIER_P (x))
        set_block_for_insn (x, NULL);
    }
But if there are NOTEs, the patch just reorders the NOTEs and DEBUG_INSNs
such that the NOTEs come first (so that they stay in between basic blocks,
as with -g0) and the DEBUG_INSNs after those (so that the bb is split before
them and they end up in the basic block after NOTE_INSN_BASIC_BLOCK).

2022-12-08  Jakub Jelinek  

PR debug/106719
* cfgbuild.cc (find_bb_boundaries): If there are NOTEs in between
debug_insn (seen after flow_transfer_insn) and insn, move NOTEs
before all the DEBUG_INSNs and split after NOTEs.  If there are
other insns like jump table data, clear debug_insn.

* gcc.dg/pr106719.c: New test.

(cherry picked from commit d9f9d5d30feb33c359955d7030cc6be50ef6dc0a)
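A sketch of the reordering step only, modelled on an array rather than an RTL insn chain: a stable partition that moves notes ahead of debug insns and reports where the block would be split. The sketch_insn type, the SK_* kinds and notes_before_debug are hypothetical, and a fixed 64-entry scratch buffer is assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical insn kinds standing in for NOTE and DEBUG_INSN.  */
enum sketch_kind { SK_DEBUG, SK_NOTE };

struct sketch_insn { enum sketch_kind kind; int uid; };

/* Stable partition of a run of insns: all notes first, then all debug
   insns, each group keeping its original relative order.  Returns the
   index of the first debug insn, i.e. where the block would be split.  */
static size_t
notes_before_debug (struct sketch_insn *insns, size_t n)
{
  struct sketch_insn tmp[64];
  size_t out = 0, split;
  assert (n <= 64);
  for (size_t i = 0; i < n; i++)
    if (insns[i].kind == SK_NOTE)
      tmp[out++] = insns[i];
  split = out;
  for (size_t i = 0; i < n; i++)
    if (insns[i].kind == SK_DEBUG)
      tmp[out++] = insns[i];
  for (size_t i = 0; i < out; i++)
    insns[i] = tmp[i];
  return split;
}
```

For the aarch64 case above (DEBUG, DEBUG, NOTE_INSN_EPILOGUE_BEG), the note moves to the front and the split point falls right before the two debug insns, matching the -g0 layout of the note.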

[Bug tree-optimization/108068] [10/11/12 Regression] decimal floating point signed zero is not honored

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108068

--- Comment #13 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:29ac1dcd36901a094f7d698bbe244489a58e2715

commit r12-9134-g29ac1dcd36901a094f7d698bbe244489a58e2715
Author: Jakub Jelinek 
Date:   Fri Dec 23 16:12:21 2022 +0100

tree-ssa-dom: can_infer_simple_equiv fixes [PR108068]

As reported in the PR, tree-ssa-dom.cc uses a real_zerop call to find out
whether a floating point constant is zero, in which case it shouldn't try
to infer equivalences from a comparison against it if signed zeros are
honored.  This doesn't work at all for decimal types, because real_zerop
always returns false for them (decimal zero can have different
representations beyond -0/+0), and it doesn't work for vector compares
either, as real_zerop checks whether all elements are zero, while we need
to avoid inferring equivalences from comparisons against vector constants
which have at least one zero element (if signed zeros are honored).
Furthermore, as mentioned by Joseph, for decimal types many other values
aren't singletons either.

So, this patch stops inferring anything if the element mode is decimal, and
otherwise uses, instead of real_zerop, a new function, real_maybe_zerop,
which works even for decimal types and for complex or vector constants
returns true if any element is or might be zero (so it returns true
for anything but constants for now).

2022-12-23  Jakub Jelinek  

PR tree-optimization/108068
* tree.h (real_maybe_zerop): Declare.
* tree.cc (real_maybe_zerop): Define.
* tree-ssa-dom.cc (record_edge_info): Use it instead of
real_zerop or TREE_CODE (op1) == SSA_NAME || real_zerop.  Always set
can_infer_simple_equiv to false for decimal floating point types.

* gcc.dg/dfp/pr108068.c: New test.

(cherry picked from commit fd1b0aefda5b65f3f841ca6e61ccea6a72daa060)
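A small binary floating point illustration (decimal types are not modelled, since the patch simply refuses to infer anything for them). It shows why a zero constant can't be used to infer an equivalence when signed zeros are honored, and why the vector check must look at any element rather than all of them. The helper names are hypothetical and are not real_zerop or real_maybe_zerop themselves.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: may "x == c" be used to infer that x is exactly c?
   Not when c is a zero: -0.0 == +0.0 holds, yet the two values differ
   (e.g. in their sign bit), so a zero constant does not pin x down.  */
static bool
sketch_can_infer_equiv (double c)
{
  /* c != 0.0 is false for both +0.0 and -0.0.  */
  return c != 0.0;
}

/* Vector analogue: inference must be avoided if ANY element is zero,
   not only when all elements are zero.  */
static bool
sketch_any_elt_zero (const double *elts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (elts[i] == 0.0)
      return true;
  return false;
}
```

The sketch assumes default IEEE semantics; with options that stop honoring signed zeros, the compiler is free to make the inference.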

[Bug c++/107065] GCC treats rvalue as an lvalue

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107065

--- Comment #16 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:bc1ee711eeab4b0d55463cd153747d30c69225c7

commit r12-9127-gbc1ee711eeab4b0d55463cd153747d30c69225c7
Author: Jakub Jelinek 
Date:   Thu Dec 15 19:17:45 2022 +0100

c++: Ensure !!var is not an lvalue [PR107065]

The TRUTH_NOT_EXPR case in cp_build_unary_op is one of the spots where
we somewhat fold immediately using invert_truthvalue_loc.
I've tried using
  return build1_loc (location, TRUTH_NOT_EXPR, boolean_type_node, arg);
in there instead, but unfortunately that regressed
Wlogical-not-parentheses-*.c pr49706.c pr62199.c pr65120.c sequence-pt-1.C
tests, so at least for backporting that doesn't seem to be a way to go.

So, this patch instead wraps it into NON_LVALUE_EXPR if needed (which also
needs a tweak for some tests in the pr47906.c test, but nothing major),
with the intent to make it backportable, and later I'll try to take further
steps to avoid folding here prematurely.  Most of the problems with
build1 TRUTH_NOT_EXPR are that it doesn't even invert comparisons, the most
common case, and lots of warning code isn't able to deal with ! around
comparisons; so perhaps one way to do this would be to fold by hand only
invertible comparisons and create TRUTH_NOT_EXPR for the rest.

2022-12-15  Jakub Jelinek  

PR c++/107065
gcc/cp/
* typeck.cc (cp_build_unary_op) : If
invert_truthvalue_loc returns obvalue_p, wrap it into
NON_LVALUE_EXPR.
* parser.cc (cp_parser_binary_expression): Don't call
warn_logical_not_parentheses if current.lhs is a NON_LVALUE_EXPR
of a decl with boolean type.
gcc/testsuite/
* g++.dg/cpp0x/pr107065.C: New test.

(cherry picked from commit 8b775b4c48a3cc4ef5c50e56144aea02da2e9cc6)

[Bug c++/108206] [12 Regression] ICE: tree check: expected tree that contains 'decl minimal' structure, have 'error_mark' in merge_default_template_args, at cp/decl.cc:1563 since r12-7562-gfe548eb8436

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108206

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:7048e8c1073fcf2bf6be1a3d079393a78864ca61

commit r12-9137-g7048e8c1073fcf2bf6be1a3d079393a78864ca61
Author: Jakub Jelinek 
Date:   Wed Jan 4 18:42:31 2023 +0100

c++: Error recovery in merge_default_template_args [PR108206]

We ICE on the following testcase during error recovery: both new_parm
and old_parm are error_mark_node, and the ICE is on
  error ("redefinition of default argument for %q+#D", new_parm);
  inform (DECL_SOURCE_LOCATION (old_parm),
          "original definition appeared here");
where we don't print anything useful for new_parm and ICE trying to
access DECL_SOURCE_LOCATION of old_parm.  I think we shouldn't diagnose
anything when either of the parms is erroneous; GCC 11, before
merge_default_template_args was added, was doing
  if (TREE_VEC_ELT (tmpl_parms, i) == error_mark_node
      || TREE_VEC_ELT (parms, i) == error_mark_node)
    continue;

  tmpl_parm = TREE_VALUE (TREE_VEC_ELT (tmpl_parms, i));
  if (error_operand_p (tmpl_parm))
    return false;
in redeclare_class_template.

2023-01-04  Jakub Jelinek  

PR c++/108206
* decl.cc (merge_default_template_args): Return false if either
new_parm or old_parm are erroneous.

* g++.dg/template/pr108206.C: New test.

(cherry picked from commit fc349931adcf1024ee95e0a0cd98cf4a41996093)

[Bug rtl-optimization/108596] [10/11/12 Regression] error: EDGE_CROSSING missing across section boundary

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108596

--- Comment #8 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:e365bfacf2617403f6bc6aa79a45a27bdba8da36

commit r12-9146-ge365bfacf2617403f6bc6aa79a45a27bdba8da36
Author: Jakub Jelinek 
Date:   Tue Jan 31 09:46:35 2023 +0100

bbpart: Fix up ICE on asm goto [PR108596]

On the following testcase we have an asm goto in a hot block with 2
successors: a cold one, to which it both falls through and has one of its
labels pointing, and another hot successor with another label.

Now, during bbpart we want to ensure that no blocks from one partition fall
through into a block in a different partition.  fix_up_fall_thru_edges
does that by temporarily clearing EDGE_CROSSING on the fallthrough edge,
calling force_nonfallthru and then, depending on whether it created a new
bb, either setting EDGE_CROSSING on the single successor edge from the new
bb (the new bb is kept in the same partition as the predecessor block), or,
if no new bb has been created, setting EDGE_CROSSING back on the fallthru
edge, which has been forced non-EDGE_FALLTHRU.
For asm goto this doesn't always work: force_nonfallthru can create a new
bb and change the fallthrough edge to point to it, but if the original
fallthru destination block has its label referenced among the asm goto
labels, it will create a new non-fallthru edge for the label(s).
But because we've temporarily cheated and cleared EDGE_CROSSING on the
edge, it is cleared on the new edge as well; the caller then sees we've
created a new bb and just sets EDGE_CROSSING on the single fallthru edge
from the new bb.  But the direct edge from cur_bb to the fallthru edge's
destination isn't handled and fails consistency checks afterwards, because
it crosses partitions.

The following patch notes this case and sets EDGE_CROSSING on that edge too.

2023-01-31  Jakub Jelinek  

PR rtl-optimization/108596
* bb-reorder.cc (fix_up_fall_thru_edges): Handle the case where cur_bb
ends with asm goto and has a crossing fallthrough edge to the same bb
that contains at least one of its labels by restoring EDGE_CROSSING
flag even on possible edge from cur_bb to new_bb successor.

* gcc.c-torture/compile/pr108596.c: New test.

(cherry picked from commit 603a6fbcaac1e80aa90d1d26318c881a53473066)

[Bug c++/108286] [12 Regression] OpenMP Target directive causes internal compiler error

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108286

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:5de999df9fa0134a1621b552eb2abd65a6384ffd

commit r12-9138-g5de999df9fa0134a1621b552eb2abd65a6384ffd
Author: Jakub Jelinek 
Date:   Thu Jan 5 11:57:30 2023 +0100

openmp: Fix up finish_omp_target_clauses [PR108286]

The comment in the loop says that we shouldn't add a map clause if such
a clause exists already, but the loop was actually using OMP_CLAUSE_DECL
on any clause.  The target construct can have various clauses which don't
have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clauses
where it means something different (e.g. privatization clauses, allocate,
depend).

So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.

2023-01-05  Jakub Jelinek  

PR c++/108286
* semantics.cc (finish_omp_target_clauses): Ignore clauses other than
OMP_CLAUSE_MAP.

* testsuite/libgomp.c++/pr108286.C: New test.

(cherry picked from commit 29c3218618ef6177dc33871b26c8fbd9b21eabe1)
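A minimal sketch of the corrected loop shape, using a hypothetical linked list of clauses rather than GCC trees: only map clauses are compared by decl, so, for example, a private clause on the same variable no longer suppresses adding a map clause. sketch_clause, the SC_* codes and sketch_has_map_for are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified clause codes; only MAP clauses carry a decl
   with the meaning the duplicate check cares about.  */
enum sketch_clause_code { SC_MAP, SC_PRIVATE, SC_NOWAIT, SC_DEVICE };

struct sketch_clause
{
  enum sketch_clause_code code;
  const void *decl;              /* NULL when the clause has no decl */
  struct sketch_clause *next;
};

/* Return true if a map clause for DECL already exists; clauses of any
   other kind are skipped, mirroring the OMP_CLAUSE_MAP-only check.  */
static bool
sketch_has_map_for (const struct sketch_clause *c, const void *decl)
{
  for (; c != NULL; c = c->next)
    {
      if (c->code != SC_MAP)
        continue;
      if (c->decl == decl)
        return true;
    }
  return false;
}
```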

[Bug fortran/108349] LTO mismatch for __builtin_realloc between glibc and gfortran frontend

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108349

--- Comment #7 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:463bf7cfb0b03d9e75754ea8ba89c61186d0982f

commit r12-9139-g463bf7cfb0b03d9e75754ea8ba89c61186d0982f
Author: Jakub Jelinek 
Date:   Wed Jan 11 10:40:54 2023 +0100

fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]

As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the
Fortran FE has been wrong since r0-100026-gb64fca63690ad, which changed
-  tmp = tree_cons (NULL_TREE, pvoid_type_node, void_list_node);
-  tmp = tree_cons (NULL_TREE, size_type_node, tmp);
-  ftype = build_function_type (pvoid_type_node, tmp);
+  ftype = build_function_type_list (pvoid_type_node,
+                                    size_type_node, pvoid_type_node,
+                                    NULL_TREE);
   gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,
                       "realloc", false);
The return type is correct, void *, but the first argument should be
void * too and only the second one size_t, while the above change changed
realloc to be void *__builtin_realloc (size_t, void *);
I went through all other changes from that commit and found that
__builtin_sincos{,f,l} got broken as well: instead of the former
void __builtin_sincos{,f,l} (ftype, ftype *, ftype *);
where ftype is {double,float,long double}, it is now incorrectly
void __builtin_sincos{,f,l} (ftype *, ftype *);

The following patch fixes that, plus some formatting issues around
the spots I've changed.

2023-01-11  Jakub Jelinek  

PR fortran/108349
* f95-lang.cc (gfc_init_builtin_function): Fix up function types
for BUILT_IN_REALLOC and BUILT_IN_SINCOS{F,,L}.  Formatting fixes.

(cherry picked from commit 0986c351aa8a9f08b3cb614baec13564dd62c114)
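For reference, the C library prototype the builtin has to agree with is void *realloc (void *ptr, size_t size), and the GNU sincos extension is void sincos (double x, double *sin, double *cos). The sketch below (hypothetical realloc_fn and grow_buffer names) relies on the C compiler diagnosing an initialization from an incompatible function pointer type to confirm the argument order; it does not exercise the Fortran FE itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Function pointer type matching the C library prototype
   void *realloc (void *ptr, size_t size): pointer first, size second.
   Initializing it from realloc only type-checks cleanly if the order
   is right; a (size_t, void *) declaration would be incompatible.  */
typedef void *(*realloc_fn) (void *, size_t);

/* Hypothetical helper: resize BUF to NEW_SIZE bytes through FN.  */
static char *
grow_buffer (realloc_fn fn, char *buf, size_t new_size)
{
  return fn (buf, new_size);
}
```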

[Bug target/108599] [12 Regression] Incorrect code generation newer intel architectures

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108599

--- Comment #12 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:7d7f275ebe7295264a0406876c0670e25a50169a

commit r12-9147-g7d7f275ebe7295264a0406876c0670e25a50169a
Author: Jakub Jelinek 
Date:   Tue Jan 31 10:12:19 2023 +0100

i386: Fix up ix86_convert_const_wide_int_to_broadcast [PR108599]

The following testcase is miscompiled.  The problem is that during
RTL DSE we see a V4DI register being loaded with the value
{ 16, 16, 0, 0 }, and DSE mostly works in terms of scalar modes, so it
calls movoi to set an OImode REG to (const_wide_int 0x100010)
and ix86_convert_const_wide_int_to_broadcast thinks it can compute
that value by broadcasting DImode 0x10.  While it is true that
for a TImode result the broadcast could be used, for OImode/XImode
it can't be, because all but the lowest 2 HOST_WIDE_INTs aren't
present (so they are 0 or -1 depending on sign, not 0x10 in this case).
The function checks if the least significant HOST_WIDE_INT elt
of the CONST_WIDE_INT is broadcastable from QI/HI/SI/DImode and then
  /* Check if OP can be broadcasted from VAL.  */
  for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++)
    if (val != CONST_WIDE_INT_ELT (op, i))
      return nullptr;
That is needed of course, but nothing checks that
CONST_WIDE_INT_NUNITS (op) isn't too small for the mode in question.
I think if op were 0 or -1, it would never be a CONST_WIDE_INT
but a CONST_INT, and so we can just punt whenever the number of
CONST_WIDE_INT elts is not the expected one.

2023-01-31  Jakub Jelinek  

PR target/108599
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Return nullptr if
CONST_WIDE_INT_NUNITS (op) times HOST_BITS_PER_WIDE_INT isn't
equal to bitsize of mode.

* gcc.target/i386/avx2-pr108599.c: New test.

(cherry picked from commit 963315a922e228c4f685382151fc540f111a)
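A sketch of the corrected condition, assuming 64-bit HOST_WIDE_INTs and covering only the DImode-lane case (the real function also tries QI/HI/SImode broadcasts of the low element). sketch_broadcastable is a hypothetical name, and the element array is a stand-in for CONST_WIDE_INT_ELT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed width of one CONST_WIDE_INT element (HOST_WIDE_INT).  */
#define SKETCH_HWI_BITS 64

/* ELTS holds NUNITS 64-bit elements, least significant first; any lanes
   above NUNITS are implicitly the sign extension, not copies of elts[0].
   Return true if a MODE_BITS-wide constant equals elts[0] broadcast into
   every 64-bit lane.  */
static bool
sketch_broadcastable (const uint64_t *elts, int nunits, int mode_bits)
{
  /* The fix: punt unless every lane of the mode is spelled out.  */
  if (nunits * SKETCH_HWI_BITS != mode_bits)
    return false;
  /* The pre-existing check: all explicit lanes equal the lowest one.  */
  for (int i = 1; i < nunits; i++)
    if (elts[i] != elts[0])
      return false;
  return true;
}
```

The { 0x10, 0x10 } constant from the PR is broadcastable as a 128-bit (TImode) value but must be rejected as a 256-bit (OImode) one, because its upper two lanes are implicit zeros.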

[Bug tree-optimization/108095] powerpc-linux / powerpc64-linux: ICEs when building Linux's arch/powerpc/kernel/align.c (asm goto)

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108095

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:7e54e5a2bba69dc7fcbc88fe8cb20c91aaafabd2

commit r12-9126-g7e54e5a2bba69dc7fcbc88fe8cb20c91aaafabd2
Author: Jakub Jelinek 
Date:   Thu Dec 15 09:26:44 2022 +0100

into-ssa: Fix emitting debug stmts after asm goto [PR108095]

The following testcase ICEs, because ccp1 replaced
  s.0_1 = &s;
  __asm__ goto("" : "=r" MEM[(T *)s.0_1] :  :  : "lab" lab);
with
  __asm__ goto("" : "=r" s :  :  : "lab" lab);
and because s is no longer addressable, we are rewriting it into
ssa and want
  __asm__ goto("" : "=r" s_7 :  :  : "lab" lab);
plus debug stmt
  # DEBUG s => s_7
The code assumes that there is at most one non-EH edge in that
case, but with the addition of outputs to asm goto that is no longer the
case: we can have many outgoing edges.

The patch keeps the checking assertion that there is at most one such
edge for everything but asm goto, but moves the addition of the debug
stmt into the loop, so that it can be added on all edges where it is
possible, not just one of them.

Furthermore, looking at gsi_insert_on_edge_immediate
-> gimple_find_edge_insert_loc, the conditions to insert the stmt there
into the destination block are
  if (single_pred_p (dest)
      && gimple_seq_empty_p (phi_nodes (dest))
      && dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
(plus there is code to insert it in the previous block, but that is
never true when the pred is known to be stmt_ends_bb_p), while
maybe_register_def was just checking
  if (ef && single_pred_p (ef->dest)
      && ef->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
so if for whatever reason ef->dest had any PHIs, we'd split the
edge for -g and not for -g0, something we must avoid for -fcompare-debug
stability.  So, I've added the no phi_nodes check too.

2022-12-15  Jakub Jelinek  

PR tree-optimization/108095
* tree-into-ssa.cc (maybe_register_def): Insert debug stmt
on all non-EH edges from asm goto if they have a single
predecessor rather than asserting there is at most one such edge.
Test whether there are no PHI nodes next to the single predecessor
test.

* gcc.dg/pr108095.c: New test.

(cherry picked from commit bf3ce6f84a7a994a0fc87419b383b9ce4efed442)

[Bug tree-optimization/107997] [10/11/12/13 Regression] r13-4389-gfd8dd6c0384969 probably uncovered an issue building the Linux kernel

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107997

--- Comment #13 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:89daf0dd6f1748077c03fbeb27ca5980a0b9abd5

commit r12-9125-g89daf0dd6f1748077c03fbeb27ca5980a0b9abd5
Author: Jakub Jelinek 
Date:   Sat Dec 10 16:50:39 2022 +0100

ivopts: Fix IP_END handling for asm goto [PR107997]

The following testcase ICEs, because the latch bb ends with
asm goto which has both fallthrough to the header and one or more labels
in the header too.  In that case there is just a single edge out of the
latch block, but still the asm goto is stmt_ends_bb_p statement, yet
ivopts decides to emit an IV bump at the IP_END position and inserts
it into the same bb as the asm goto after it, which then fails verification
(control flow in the middle of bb).

The following patch fixes it by splitting the latch -> header edge in that
case and inserting into the newly created bb, where split_edge ->
redirect_edge_and_branch is able to deal with this case correctly.

2022-12-10  Jakub Jelinek  

PR tree-optimization/107997
* tree-ssa-loop-ivopts.cc: Include cfganal.h.
(create_new_iv) : If ip_end_pos bb is non-empty and ends
with a stmt which ends bb, instead of adding iv update after it split
the latch edge and insert iterator into the new latch bb.

* gcc.c-torture/compile/pr107997.c: New test.

(cherry picked from commit 7676235f690e624b7ed41a22b22ce8ccfac1492f)

[Bug c/105972] [12 Regression] ICE in lower_stmt, at gimple-low.cc:312 since r12-4608-gb4702276615ff8d4

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105972

--- Comment #8 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:c2b33b330c16a97627e987c60a6ca35ed0fdea56

commit r12-9140-gc2b33b330c16a97627e987c60a6ca35ed0fdea56
Author: Jakub Jelinek 
Date:   Wed Jan 11 22:18:42 2023 +0100

c: Don't emit DEBUG_BEGIN_STMTs for K&R function argument declarations [PR105972]

K&R function parameter declarations are handled by recursively calling
c_parser_declaration_or_fndef in a loop, where each such
call will add_debug_begin_stmt at the start.
Now, if the K&R function definition is not a nested function,
building_stmt_list_p () is false and so we don't emit the DEBUG_BEGIN_STMTs
anywhere, but if it is a nested function, we emit them in the containing
function at the point of the nested function definition.
As the following testcase shows, this can cause ICEs if the containing
function has var-tracking disabled but the nested function has it enabled,
as the DEBUG_BEGIN_STMTs are added to the containing function, which
shouldn't have them, but MAY_HAVE_DEBUG_MARKER_STMTS is checked already
for the nested function; or it can just cause a wrong experience in the
debugger.

The following patch ensures we don't emit any such DEBUG_BEGIN_STMTs for
the K&R function parameter declarations, even in nested functions.

2023-01-11  Jakub Jelinek  

PR c/105972
* c-parser.cc (c_parser_declaration_or_fndef): Disable debug non-bind
markers for K&R function parameter declarations of nested functions.

* gcc.dg/pr105972.c: New test.

(cherry picked from commit 23b4ce18379cd336d99d7c71701be28118905b57)

[Bug tree-optimization/108692] [11/12 Regression] Miscompilation of orc_test.c since r11-5160

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108692

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:00136f439e2849af2bfd9934d79a8297ab09a1d9

commit r12-9152-g00136f439e2849af2bfd9934d79a8297ab09a1d9
Author: Jakub Jelinek 
Date:   Wed Feb 8 18:41:21 2023 +0100

vect-patterns: Fix up vect_widened_op_tree [PR108692]

The following testcase is miscompiled on aarch64-linux since r11-5160.
Given
   [local count: 955630225]:
  # i_22 = PHI 
  # r_23 = PHI 
...
  a.0_5 = (unsigned char) a_15;
  _6 = (int) a.0_5;
  b.1_7 = (unsigned char) b_17;
  _8 = (int) b.1_7;
  c_18 = _6 - _8;
  _9 = ABS_EXPR ;
  r_19 = _9 + r_23;
...
where SSA_NAMEs 15/17 have type signed char, 5/7 unsigned char and the
rest int, we first pattern-recognize c_18 as
patt_34 = (a.0_5) w- (b.1_7);
which is still correct, 5/7 are unsigned char subtracted in wider type,
but then vect_recog_sad_pattern turns it into
SAD_EXPR 
which is incorrect, because 15/17 are signed char and so it is
sum of absolute signed differences rather than unsigned sum of
absolute unsigned differences.
The reason why this happens is that vect_recog_sad_pattern calls
vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the
patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree
calls vect_look_through_possible_promotion on the operands of the
WIDEN_MINUS_EXPR, which looks through the further casts.
vect_look_through_possible_promotion has careful code to stop when there
would be nested casts that need to be preserved, but the problem here
is that the WIDEN_*_EXPR operation itself has an implicit cast on the
operands already - in this case of WIDEN_MINUS_EXPR the unsigned char
5/7 SSA_NAMEs are widened to unsigned short before the subtraction,
and vect_look_through_possible_promotion obviously isn't told about that.

Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had
to look through possible promotions already when creating those and so
vect_look_through_possible_promotion again isn't really needed, all we need
to do is mimic what that function does if the operand isn't the result
of any cast.  Another option would be to let
vect_look_through_possible_promotion know about the implicit promotion
from the WIDEN_*_EXPR, but I'm afraid that would be much harder.

2023-02-08  Jakub Jelinek  

PR tree-optimization/108692
* tree-vect-patterns.cc (vect_widened_op_tree): If rhs_code is
widened_code which is different from code, don't call
vect_look_through_possible_promotion but instead just check op is
SSA_NAME with integral type for which vect_is_simple_use is true
and call set_op on this_unprom.

* gcc.dg/pr108692.c: New test.

(cherry picked from commit 6ad1c1027628f094260037536f6b6fcdb63b5add)

[Bug tree-optimization/108498] [11/12 Regression] ppc64 big endian generates uninitialized reads with -fstore-merging

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108498

--- Comment #27 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:671b7c29dd666cb74dfe5ab01b501d6a0ca7b41c

commit r12-9144-g671b7c29dd666cb74dfe5ab01b501d6a0ca7b41c
Author: Jakub Jelinek 
Date:   Wed Jan 25 10:50:27 2023 +0100

store-merging: Disable string_concatenate mode if start or end aren't byte aligned [PR108498]

The first of the following testcases is miscompiled on powerpc64-linux -O2
-m64 at least, the latter at least on x86_64-linux -m32/-m64.
Since GCC 11 store-merging has a separate string_concatenation mode which
turns stores into setting a MEM_REF from a STRING_CST.
This mode is triggered if at least one of the to be merged stores
is a STRING_CST store and either the first store (to earliest address)
is that STRING_CST store or the first store is 8-bit INTEGER_CST store
and then there are some rules when to turn that mode off or not merge
further stores into it.

The problem with these 2 testcases is that the actual implementation
relies on the start/width of the store being at byte boundaries, as it
simply creates a char array; MEM_REF can be only on byte boundaries
and so can the char array, plus obviously the STRING_CST as well.
But as can easily be seen in the second testcase, nothing verifies this:
while the first store has to be a STRING_CST (which will be aligned)
or an 8-bit INTEGER_CST, that 8-bit INTEGER_CST store could be a bitfield
store, and nothing verifies whether any stores in between actually are
8-bit and aligned; the only major requirement is that all the stores
are consecutive.

For GCC 14 I think we should reconsider this, simply treat STRING_CST
stores during the merging like INTEGER_CST stores and deal with it only
during split_group where we can create multiple parts, this part
would be a normal store, this part would be STRING_CST store, this part
another normal store etc.  But that is quite a lot of work, the following
patch just disables the string_concatenate mode if boundaries aren't byte
aligned in the spot where we disable it if it is too short too.
If that happens, we'll just try to do the merging using normal 1/2/4/8 etc.
byte stores as usual, with RMW masking for any bits that shouldn't be
touched or punt if we end up with too many stores compared to the original.

Note, an original STRING_CST store will count as one store in that case,
something we might want to reconsider later too (but, after all, CONSTRUCTOR
stores (aka zeroing) already have the same problem, they can be large and
expensive and we still count them as one store).

2023-01-25  Jakub Jelinek  

PR tree-optimization/108498
* gimple-ssa-store-merging.cc (class store_operand_info):
End comment with full stop rather than comma.
(split_group): Likewise.
(merged_store_group::apply_stores): Clear string_concatenation if
start or end aren't on a byte boundary.

* gcc.c-torture/execute/pr108498-1.c: New test.
* gcc.c-torture/execute/pr108498-2.c: New test.

(cherry picked from commit 617be7ba436bcbf9d7b883968c6b3c011206b56c)

[Bug testsuite/108151] gcc.dg/pr64536.c stores pointers in a long, broken for llp64

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108151

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:e4f6149fe272101af6de3a19be4e41d0e77e7f6c

commit r12-9130-ge4f6149fe272101af6de3a19be4e41d0e77e7f6c
Author: Jakub Jelinek 
Date:   Mon Dec 19 15:05:16 2022 +0100

testsuite: Fix up pr64536.c for LLP64 targets [PR108151]

Apparently llp64 had 2 further warnings, fixed thusly.

2022-12-19  Jakub Jelinek  

PR testsuite/108151
* gcc.dg/pr64536.c (bar): Cast long to __INTPTR_TYPE__
before casting to long *.

(cherry picked from commit 6e85f89a7d59a99a3395b6e153b99262a58b2f6c)

[Bug testsuite/108151] gcc.dg/pr64536.c stores pointers in a long, broken for llp64

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108151

--- Comment #5 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:4430147d3779d8f089d8eb765b4c7e0333279424

commit r12-9129-g4430147d3779d8f089d8eb765b4c7e0333279424
Author: Jakub Jelinek 
Date:   Mon Dec 19 13:49:52 2022 +0100

testsuite: Fix up pr64536.c for LLP64 targets [PR108151]

The test casts a pointer to long, which is ok for ilp32 and lp64
targets but not for llp64 targets.  Nothing reads the values later,
it is a link test, so all we care about is that it is the same
cast on s390x-linux where it used to fail before the PR64536 fix,
and that we don't warn about it.

2022-12-19  Jakub Jelinek  

PR testsuite/108151
* gcc.dg/pr64536.c (bar): Use casts to __INTPTR_TYPE__ rather than
long when casting pointer to integral type.

(cherry picked from commit ea37e96a37b50dad17b91d46edc518bbb9132d8e)

[Bug c++/108607] [12 Regression] ICE in potential_constant_expression_1, at cp/constexpr.cc:10003

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108607

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:a62d952064c896eaf94e70d7999e6e27343babcf

commit r12-9148-ga62d952064c896eaf94e70d7999e6e27343babcf
Author: Jakub Jelinek 
Date:   Wed Feb 1 10:38:46 2023 +0100

c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression evaluation [PR108607]

While potential_constant_expression_1 handled most of the OMP_* codes (by
saying that they aren't potential constant expressions), OMP_SCOPE was
missing from that list.
I've also added OMP_SCAN, though that is less important (similarly to
OMP_SECTION, it ought to appear solely inside of OMP_{FOR,SIMD} resp.
OMP_SECTIONS).
As the testcase shows, that isn't enough: potential_constant_expression_1
can catch only some cases.  As soon as one uses switch or ifs where at least
one of the possible paths could be a constant expression, we can run into the
same codes during cxx_eval_constant_expression, so this patch handles those
there as well.

2023-02-01  Jakub Jelinek  

PR c++/108607
* constexpr.cc (cxx_eval_constant_expression): Handle OMP_*
and OACC_* constructs as non-constant.
(potential_constant_expression_1): Handle OMP_SCAN and OMP_SCOPE.

* g++.dg/gomp/pr108607.C: New test.

(cherry picked from commit bfc070595bfb00abef88a002eee5d9117f5b86a7)

[Bug rtl-optimization/108193] [13 Regression] ICE in do_SUBST, at combine.cc:700

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108193

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:cb8022eab6d076325495360da632321078326135

commit r12-9132-gcb8022eab6d076325495360da632321078326135
Author: Jakub Jelinek 
Date:   Thu Dec 22 12:44:13 2022 +0100

cse: Fix up CSE const_anchor handling [PR108193]

The following testcase ICEs on aarch64, because insert_const_anchor
inserts an invalid CONST_INT into the CSE tables - 0x80000000 for SImode.
The second hunk of the patch fixes that, the first one is to avoid
triggering undefined behavior at compile time during compute_const_anchors
computations - performing those additions and subtractions in
HOST_WIDE_INT means it can overflow for certain constants.

2022-12-22  Jakub Jelinek  

PR rtl-optimization/108193
* cse.cc (compute_const_anchors): Change n type to
unsigned HOST_WIDE_INT, adjust comparison against it to avoid
warnings.  Formatting fix.
(insert_const_anchor): Use gen_int_mode instead of GEN_INT.

* gfortran.dg/pr108193.f90: New test.

(cherry picked from commit 0cb5d7cdbab8e5f8359764ef5f62d93c2bc88552)

[Bug tree-optimization/106523] [10/11/12 Regression] forwprop miscompile

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:a558a4d3d1b488783b96dff7141d12e02ded3ad3

commit r12-9157-ga558a4d3d1b488783b96dff7141d12e02ded3ad3
Author: Jakub Jelinek 
Date:   Tue Jan 17 12:14:25 2023 +0100

forwprop: Fix up rotate pattern matching [PR106523]

The comment above simplify_rotate roughly describes what patterns
are matched into what:
   We are looking for X with unsigned type T with bitsize B, OP being
   +, | or ^, some type T2 wider than T.  For:
   (X << CNT1) OP (X >> CNT2)                        iff CNT1 + CNT2 == B
   ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))  iff CNT1 + CNT2 == B

   transform these into:
   X r<< CNT1

   Or for:
   (X << Y) OP (X >> (B - Y))
   (X << (int) Y) OP (X >> (int) (B - Y))
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
   (X << Y) | (X >> ((-Y) & (B - 1)))
   (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into (last 2 only if ranger can prove Y < B):
   X r<< Y

   Or for:
   (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
   (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) (Y & (B - 1)))) \
     | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into:
   X r<< (Y & (B - 1))

The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which use (-Y) & (B - 1) on one
of the shift counts but Y on the other can do something different from
a rotate.  E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (t << y) | (t >> ((-y) & 7));
}
if y is in [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there.  Say for y 5 and x 0xaa the expression gives
0x55, which is the same as the rotate, while for y 19 and x 0xaa it gives
0x5, whereas the rotate gives 0x55.
Now, I believe the
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.

The following patch disables the
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
unless ranger says Y is not in [B, B2 - 1] range.

And, looking at it again this morning, actually the Y equal to B
case is still fine: if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X.  So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1].  And, if we wanted to go further, even multiples
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.

2023-01-17  Jakub Jelinek  

PR tree-optimization/106523
* tree-ssa-forwprop.cc (simplify_rotate): For the
patterns with (-Y) & (B - 1) in one operand's shift
count and Y in another, if T2 has wider precision than T,
punt if Y could have a value in [B, B2 - 1] range.

* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-4b.c: New test.
* gcc.c-torture/execute/pr106523.c: New test.

(cherry picked from commit 001121e8921d5d1a439ce0e64ab04c5959b0bfd8)

[Bug c++/108180] [OpenMP] Passing a class member variable to firstprivate() erroneously calls its dtor

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108180

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:c4b8949a3ad0a2259388841f3c833876a19bd2a2

commit r12-9131-gc4b8949a3ad0a2259388841f3c833876a19bd2a2
Author: Jakub Jelinek 
Date:   Wed Dec 21 09:05:27 2022 +0100

openmp: Don't try to destruct DECL_OMP_PRIVATIZED_MEMBER vars [PR108180]

DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable;
we should only destruct the privatized copies of them created in omp
lowering.

Fixed thusly.

2022-12-21  Jakub Jelinek  

PR c++/108180
* pt.cc (tsubst_expr): Don't call cp_finish_decl on
DECL_OMP_PRIVATIZED_MEMBER vars.

* testsuite/libgomp.c++/pr108180.C: New test.

(cherry picked from commit 1119902b6c7c1c50123ed85ec1def8be4772d68c)

[Bug tree-optimization/108440] rotate optimization may introduce new UB

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:a015ebe382cd6d0beab9db4ad33fbd252b7e2339

commit r12-9158-ga015ebe382cd6d0beab9db4ad33fbd252b7e2339
Author: Jakub Jelinek 
Date:   Thu Jan 19 10:00:51 2023 +0100

forwprop: Further fixes for simplify_rotate [PR108440]

As mentioned in the simplify_rotate comment, for e.g.
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
we already emit
   X r<< (Y & (B - 1))
as replacement.  This PR is about the
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms if T2 is wider than T.  Unlike e.g.
   (X << Y) OP (X >> (B - Y))
which is valid just for Y in [1, B - 1], the above 2 forms are actually
valid and do the rotates for Y in [0, B] - for Y 0 the X value is preserved
by the left shift and right logical shift by B adds just zeros (but because
the shift is in wider precision B is still valid shift count), while for
Y equal to B X is preserved through the latter shift and the former adds
just zeros.
Now, it is unclear whether we in the middle-end treat rotates with a rotate
count equal to or larger than the precision as UB or not.  Unlike for shifts
there are fewer reasons to do so, but e.g. expansion of X r<< Y if there is no
rotate optab for the mode is emitted as
(X << Y) | (((unsigned) X) >> ((-Y) & (B - 1))) and so has UB for Y == B.

The following patch does multiple things:
1) for the above 2, asks the ranger if Y could be equal to B and if so,
   instead of using X r<< Y uses X r<< (Y & (B - 1))
2) for the
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
   forms that were fixed 2 days ago it only punts if Y might be in the
   [B,B2-1] range but isn't known to be in the
   [0,B][2*B,2*B][3*B,3*B]... range.  Because for Y which is a multiple
   of B but smaller than B2 it acts as a rotate too, left shift provides
   0 and (-Y) & (B - 1) is 0 and so preserves X.  Though, for the cases
   where Y is not known to be in [0,B-1] the patch also uses
   X r<< (Y & (B - 1)) rather than X r<< Y
3) as discussed with Aldy, instead of using global ranger it uses a pass
   specific copy but lazily created on first simplify_rotate that needs it;
   this e.g. handles rotate inside of if body where the guarding condition
   limits the shift count to some range which will not work with the
   global ranger (unless there is some SSA_NAME to attach the range to).

Note, e.g. on x86 X r<< (Y & (B - 1)) and X r<< Y actually emit the
same assembly because rotates work the same even for larger rotate counts,
but that is handled only during combine.

2023-01-19  Jakub Jelinek  

PR tree-optimization/108440
* tree-ssa-forwprop.cc: Include gimple-range.h.
(simplify_rotate): For the forms with T2 wider than T and shift counts of
Y and B - Y add & (B - 1) masking for the rotate count if Y could be equal
to B.  For the forms with T2 wider than T and shift counts of
Y and (-Y) & (B - 1), don't punt if range could be [B, B2], but only if
range doesn't guarantee Y < B or Y = N * B.  If range doesn't guarantee
Y < B, also add & (B - 1) masking for the rotate count.  Use lazily created
pass specific ranger instead of get_global_range_query.
(pass_forwprop::execute): Disable that ranger at the end of pass if it has
been created.

* c-c++-common/rotate-10.c: New test.
* c-c++-common/rotate-11.c: New test.

(cherry picked from commit 05b9868b182bb9ed2013b39a0bc6297354a0db49)

[Bug debug/108573] [13 Regression] '-fcompare-debug' failure (length) at -O2

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108573

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:7bd8b65bd5d51a33f31ec39dfb435b84e36260e9

commit r12-9149-g7bd8b65bd5d51a33f31ec39dfb435b84e36260e9
Author: Jakub Jelinek 
Date:   Wed Feb 1 12:52:52 2023 +0100

ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]

The PR78437 r7-4871 changes made combine_reaching_defs punt on
WORD_REGISTER_OPERATIONS targets if a setter of smaller than word
register has wider uses.  This unfortunately breaks -fcompare-debug,
because if such a use appears only in DEBUG_INSN(s), while all other
uses aren't wider than the setter, we can REE optimize it without -g
and not with -g.

Such decisions shouldn't be based on debug instructions.  We could try
to reset them or adjust in some other way after we decide to perform the
change, but at least on the testcase which used to fail on riscv64-linux the
(debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160])
                (const_int 1 [0x1])) 0)
        (subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151])
                (debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1
     (nil))
clearly doesn't care about the upper bits and I have a hard time imagining how
one could end up with a DEBUG_INSN which actually cares about those upper
bits.

So, the following patch just ignores uses on DEBUG_INSNs in this case,
if we run into something where we'd need to do something further later on,
let's deal with it when we have a testcase for it.

2023-02-01  Jakub Jelinek  

PR debug/108573
* ree.cc (combine_reaching_defs): Don't return false for paradoxical
subregs in DEBUG_INSNs.

* gcc.dg/pr108573.c: New test.

(cherry picked from commit e4473d7cf871c8ddf8f22d105c5af6375ebe37bf)

[Bug middle-end/108264] [11/12 Regression] ICE compiling guacamole-server on s390x-linux

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108264

--- Comment #5 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:ee25e54233c6a1548eda06aa9a11f09cd7eb32ac

commit r12-9135-gee25e54233c6a1548eda06aa9a11f09cd7eb32ac
Author: Jakub Jelinek 
Date:   Tue Jan 3 12:13:24 2023 +0100

expr: Fix up store_expr into SUBREG_PROMOTED_* target [PR108264]

The following testcase ICEs on s390x-linux (e.g. with -march=z13).
The problem is that target is (subreg/s/u:SI (reg/v:DI 66 [ x+-4 ]) 4)
and we call convert_move from temp to the SUBREG_REG of that, expecting
to extend the value properly.  That works nicely if temp has some
scalar integer mode (or partial one), but ICEs when temp has V4QImode
on the assertion that from and to modes have the same bitsize.
store_expr generally allows, say, a store from V4QI to an SI target because
they have the same size and if temp is a CONST_INT, we already have code
to convert the constant properly, so the following patch just adds handling
of non-scalar integer modes by converting them to the mode of target
first before convert_move extends them.

2023-01-03  Jakub Jelinek  

PR middle-end/108264
* expr.cc (store_expr): For stores into SUBREG_PROMOTED_* targets
from source which doesn't have scalar integral mode first convert
it to outer_mode.

* gcc.dg/pr108264.c: New test.

(cherry picked from commit 226a498733e7919de72eb6f1bf3e16883ad159f6)

[Bug tree-optimization/108166] [12 Regression] Wrong code with -O2 since r12-8078-ga42aa68bf1ad745a

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108166

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:86d252ab555d487aefb616562e770ffa46e05b01

commit r12-9133-g86d252ab555d487aefb616562e770ffa46e05b01
Author: Jakub Jelinek 
Date:   Thu Dec 22 12:52:48 2022 +0100

phiopt: Drop SSA_NAME_RANGE_INFO in maybe equal case [PR108166]

The following place in value_replacement is after proving that
x == cst1 ? cst2 : x
phi result is only used in a comparison with constant which doesn't
care if it compares cst1 or cst2 and replaces it with x.
The testcase is miscompiled because we have after the replacement
incorrect range info for the phi result, we would need to
effectively union the phi result range with cst1 (oarg in the code)
because previously that constant might be missing in the range, but
newly it can appear (we've just verified that the single use stmt
of the phi result doesn't care about that value in particular).

The following patch just resets the info, bootstrapped/regtested
on x86_64-linux and i686-linux, ok for trunk?

Aldy/Andrew, how would one instead union the SSA_NAME_RANGE_INFO
with some INTEGER_CST and store it back into SSA_NAME_RANGE_INFO
(including adjusting non-zero bits and the like)?

2022-12-22  Jakub Jelinek  

PR tree-optimization/108166
* tree-ssa-phiopt.cc (value_replacement): For the maybe_equal_p
case turned into equal_p reset SSA_NAME_RANGE_INFO of phi result.

* g++.dg/torture/pr108166.C: New test.

(cherry picked from commit 5c17adfb5d08e34da7a7f234dfc2ed1f0aaadaa9)

[Bug middle-end/102633] [11/12 Regression] warning for self-initialization despite -Wno-init-self

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102633

--- Comment #11 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:aabebf76e9d9a805ea5b443d4ee4f49f13155d87

commit r12-9160-gaabebf76e9d9a805ea5b443d4ee4f49f13155d87
Author: Marek Polacek 
Date:   Tue Jul 26 13:55:58 2022 -0400

c-family: Honor -Wno-init-self for cv-qual vars [PR102633]

Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r
conversion by creating a NOP_EXPR.  For e.g.

  const int i = i;

that means that the DECL_INITIAL is '(int) i' and not 'i' anymore.
Consequently, we don't suppress_warning here:

711 case DECL_EXPR:
715   if (VAR_P (DECL_EXPR_DECL (*expr_p))
716   && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p))
717   && !TREE_STATIC (DECL_EXPR_DECL (*expr_p))
718   && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL (*expr_p))
719   && !warn_init_self)
720 suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self);

because of the check on line 718 -- (int) i is not i.  So -Wno-init-self
doesn't disable the warning as it's supposed to.

The following patch fixes it by moving the suppress_warning call from
c_gimplify_expr to the front ends, at points where we haven't created
the NOP_EXPR yet.

PR middle-end/102633

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_gimplify_expr) : Don't call
suppress_warning here.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_initializer): Add new tree parameter.  Use it.
Call suppress_warning.
(c_parser_declaration_or_fndef): Pass d down to
c_parser_initializer.
(c_parser_omp_declare_reduction): Pass omp_priv down to
c_parser_initializer.

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Call suppress_warning.

gcc/testsuite/ChangeLog:

* c-c++-common/Winit-self1.c: New test.
* c-c++-common/Winit-self2.c: New test.

(cherry picked from commit 04ce2400b35225302e0d6883bb0817378180f5d7)

[Bug fortran/108451] [13 Regression] ICE in check_complete_insertion, at hash-table.h:578

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108451

--- Comment #7 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:f2731d1b9a52a7c97af9bbb6ea76603630cc11c2

commit r12-9151-gf2731d1b9a52a7c97af9bbb6ea76603630cc11c2
Author: Jakub Jelinek 
Date:   Fri Feb 3 21:37:27 2023 +0100

fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]

The first testcase in the PR (which I haven't included in the patch because
it is unclear to me if it is supposed to be valid or not) ICEs since extra
hash table checking has been added recently.  The problem is that
gfc_trans_use_stmts does
  tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash,
                                                  INSERT);
  if (*slot == NULL)
and later on doesn't store anything into *slot and continues.  Another spot
a few lines later correctly clears the slot if it decides not to use the
slot, so the following patch does the same.

2023-02-03  Jakub Jelinek  

PR fortran/108451
* trans-decl.cc (gfc_trans_use_stmts): Call clear_slot before
doing continue.

(cherry picked from commit 76f7f0eddcb7c418d1ec3dea3e2341ca99097301)

[Bug middle-end/108237] [13 Regression] ICE: in gimple_expand_vec_cond_expr, at gimple-isel.cc:281 at -O since r13-1085-g90467f0ad649d081

2023-02-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108237

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:4c8e17a6a578b9eb15cd210651b6ea273022db39

commit r12-9136-g4c8e17a6a578b9eb15cd210651b6ea273022db39
Author: Jakub Jelinek 
Date:   Wed Jan 4 10:54:38 2023 +0100

generic-match-head: Don't assume GENERIC folding is done only early [PR108237]

We ICE on the following testcase, because a valid V2DImode
!= comparison is folded into an unsupported V2DImode > comparison.
The match.pd pattern which does this looks like:
/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
   where ~Y + 1 == pow2 and Z = ~Y.  */
(for cst (VECTOR_CST INTEGER_CST)
 (for cmp (eq ne)
  icmp (le gt)
  (simplify
   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
(with { tree csts = bitmask_inv_cst_vector_p (@1); }
 (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
  (with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
 ? optab_vector : optab_default;
  tree utype = unsigned_type_for (TREE_TYPE (@1)); }
   (if (target_supports_op_p (utype, icmp, optab)
|| (optimize_vectors_before_lowering_p ()
&& (!target_supports_op_p (type, cmp, optab)
|| !target_supports_op_p (type, BIT_AND_EXPR, optab))))
(if (TYPE_UNSIGNED (TREE_TYPE (@1)))
 (icmp @0 { csts; })
 (icmp (view_convert:utype @0) { csts; })))))))))
and that optimize_vectors_before_lowering_p () guarded stuff there
already deals with this problem, not trying to fold a supported comparison
into a non-supported one.  The reason it doesn't work in this case is that
it isn't GIMPLE folding which does this, but GENERIC folding done during
forwprop4 - forward_propagate_into_comparison ->
forward_propagate_into_comparison_1
-> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify
and we simply assumed that GENERIC folding happens only before
gimplification.

The following patch fixes that by checking cfun properties instead of
always returning true in those cases.

2023-01-04  Jakub Jelinek  

PR middle-end/108237
* generic-match-head.cc: Include tree-pass.h.
(canonicalize_math_p, optimize_vectors_before_lowering_p): Define
to false if cfun and cfun->curr_properties has PROP_gimple_opt_math
resp. PROP_gimple_lvec property set.

* gcc.c-torture/compile/pr108237.c: New test.

(cherry picked from commit 345dffd0d4ebff7e705dfff1a8a72017a167120a)
