[Bug d/113667] [14 Regression] libgphobos symbols missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113667 Richard Biener changed: What|Removed |Added Keywords||ABI Priority|P3 |P1 Target Milestone|--- |14.0
[Bug go/113668] [14 Regression] libgo soname bump needed for the GCC 14 release?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113668 Richard Biener changed: What|Removed |Added Keywords||ABI CC||rguenth at gcc dot gnu.org Target Milestone|--- |14.0
[Bug middle-end/113669] -fsanitize=undefined failed to check a signed integer overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113669 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-31 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Richard Biener --- So confirmed.
[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2024-01-31 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #3 from Richard Biener --- I'll hunt it down.
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #14 from JuzheZhong --- Thanks Richard. It seems that we can't fix this issue for now. Is that right ? If I understand correctly, do you mean we should wait after SLP representations are finished and then revisit this PR?
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #15 from rguenther at suse dot de --- On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 > > --- Comment #14 from JuzheZhong --- > Thanks Richard. > > It seems that we can't fix this issue for now. Is that right ? > > If I understand correctly, do you mean we should wait after SLP > representations > are finished and then revisit this PR? Yes.
[Bug regression/113672] [14 Regression] FAIL: g++.dg/pch/line-map-3.C -g -I. -Dwith_PCH (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113672 Richard Biener changed: What|Removed |Added Keywords||testsuite-fail Target Milestone|--- |14.0
[Bug tree-optimization/113673] [12/13/14 Regression] ICE: verify_flow_info failed: BB 5 cannot throw but has an EH edge with -Os -finstrument-functions -fnon-call-exceptions -ftrapv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113673 Richard Biener changed: What|Removed |Added Priority|P3 |P2 --- Comment #2 from Richard Biener --- Looks like an issue in bswap with regard to EH.
[Bug c++/113674] [11/12/13/14 Regression] [[____attr____]] causes internal compiler error: in decl_attributes, at attribs.cc:776
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113674 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-01-31
[Bug tree-optimization/113676] [12 Regression] Miscompilation tree-vrp __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113676 Richard Biener changed: What|Removed |Added Target||x86_64-*-* Summary|[11/12 Regression] Miscompilation tree-vrp __builtin_unreachable |[12 Regression] Miscompilation tree-vrp __builtin_unreachable --- Comment #1 from Richard Biener --- Needs -std=c++20. I can't reproduce locally.
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #16 from JuzheZhong --- (In reply to rguent...@suse.de from comment #15) > Yes. It seems to be a big refactor work. I wonder I can do anything to help with SLP representations ?
[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-31 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #3 from Richard Biener --- Yeah, most of the code in forwprop/match doesn't deal with the "new" permutes where the result isn't the same length as the inputs.
[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #23 from Robin Dapp ---
> this is:
>
> _429 = mask_patt_205.47_276[i] ? vect_cst__262[i] : (vect_cst__262 << {0,..})[i];
> vect_iftmp.55_287 = mask_patt_209.54_286[i] ? _429 [i] : vect_cst__262[i]
But isn't it rather
_429 = mask_patt_205.47_276[i] ? (vect_cst__262[i] << vect_cst__262[i]) : {0,..})[i]?
The else should be the last operand, shouldn't it?
On aarch64 we don't seem to emit a COND_SHL therefore this particular situation does not occur. However the simplification was introduced for aarch64:
(for cond_op (COND_BINARY)
 (simplify
  (vec_cond @0
   (cond_op:s @1 @2 @3 @4) @3)
  (cond_op (bit_and @1 @0) @2 @3 @4)))
It is supposed to simplify (in gcc.target/aarch64/sve/pre_cond_share_1.c)
_256 = .COND_MUL (mask__108.48_193, vect_iftmp.45_187, vect_cst__190, { 0.0, ... });
vect_prephitmp_151.50_197 = VEC_COND_EXPR ;
into
COND_MUL (mask108 & mask101, vect_iftmp.45_187, vect_cst__190, { 0.0, ... });
But that doesn't look valid to me either. No matter what _256 is, the result for !mask101 should be vect_cst__190 and not 0.0.
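The objection can be checked by modelling both sides of the pattern one lane at a time. The sketch below is not GCC code: `cond_mul` and `vec_cond` are hypothetical scalar stand-ins for `.COND_MUL (mask, a, b, else)` and `VEC_COND_EXPR`, and `before`/`after` simply follow the operand order of the match.pd pattern quoted in the comment.

```cpp
// Per-lane model of .COND_MUL (mask, a, b, else_val): a * b where the mask
// is set, else_val otherwise. Hypothetical helper, not a GCC API.
inline double cond_mul(bool mask, double a, double b, double else_val) {
    return mask ? a * b : else_val;
}

// Per-lane model of a VEC_COND_EXPR selecting if_true or if_false by mask.
inline double vec_cond(bool mask, double if_true, double if_false) {
    return mask ? if_true : if_false;
}

// Left-hand side of the pattern: (vec_cond @0 (cond_op @1 @2 @3 @4) @3)
inline double before(bool m0, bool m1, double a, double b, double else_val) {
    return vec_cond(m0, cond_mul(m1, a, b, else_val), b);
}

// Right-hand side of the pattern: (cond_op (bit_and @1 @0) @2 @3 @4)
inline double after(bool m0, bool m1, double a, double b, double else_val) {
    return cond_mul(m1 && m0, a, b, else_val);
}
```

With `m0` false, `before` yields `b` (the role played by `vect_cst__190`) while `after` yields `else_val` (the `0.0` role), so in this model the two sides agree for every mask combination only when `else_val == b`.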
[Bug tree-optimization/113678] SLP misses up vec_concat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-31 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- I think the SLP tree we discover is sound:
t2.c:11:14: note: node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note: op template: *a_7(D) = _1;
t2.c:11:14: note: stmt 0 *a_7(D) = _1;
t2.c:11:14: note: stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note: stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note: stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note: stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note: stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note: stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note: stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note: children 0x5db7778
t2.c:11:14: note: node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note: op template: _1 = *b_6(D);
t2.c:11:14: note: stmt 0 _1 = *b_6(D);
t2.c:11:14: note: stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note: stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note: stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note: stmt 4 _1 = *b_6(D);
t2.c:11:14: note: stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note: stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note: stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note: load permutation { 0 1 2 3 0 1 2 3 }
the issue is as so often
t2.c:11:14: note: ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed: BB vectorization with gaps at the end of a load is not supported
t2.c:3:19: missed: not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note: Building vector operands of 0x5db7778 from scalars instead
where we are not applying much non-ad-hoc work to deal with those "out-of-bound" accesses. The choice here would be obvious in doing a single vector(4) load instead.
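The dump describes eight byte stores fed by only four distinct byte loads. A sketch of what that might look like at source level, and of the single 4-byte load the comment suggests, follows. The testcase `t2.c` is not shown in the report, so both function names and the exact source shape are assumptions reconstructed from the dump.

```cpp
#include <cstring>

// Scalar form implied by the SLP dump: four loads from b, each stored twice.
void store_scalar(char *a, const char *b) {
    char v1 = b[0], v2 = b[1], v3 = b[2], v4 = b[3];
    a[0] = v1; a[1] = v2; a[2] = v3; a[3] = v4;
    a[4] = v1; a[5] = v2; a[6] = v3; a[7] = v4;
}

// Same effect with one 4-byte load and the loaded lanes stored twice,
// i.e. the load permutation { 0 1 2 3 0 1 2 3 } applied to a vector(4) load.
void store_concat(char *a, const char *b) {
    char lanes[4];
    std::memcpy(lanes, b, sizeof lanes);      // single 4-byte load
    std::memcpy(a, lanes, sizeof lanes);      // low half of the result
    std::memcpy(a + 4, lanes, sizeof lanes);  // high half: the same lanes again
}
```

Both functions read only `b[0]` through `b[3]`, so neither touches memory past the end of the load group.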
[Bug c/113679] New: long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 Bug ID: 113679 Summary: long long minus double with gcc -m32 produces different results than other compilers or gcc -m64 Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: dilyan.palauzov at aegee dot org Target Milestone: ---
diff.c is:
#include <stdio.h>
int main(void) {
  long long l = 9223372036854775806;
  double d = 9223372036854775808.0;
  printf("%f\n", (double)l - d);
  return 0;
}
With gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, clang 16.0.4 and clang 17.0.5:
$ gcc -m64 -o diff diff.c && ./diff
0.00
$ gcc -m32 -o diff diff.c && ./diff
-2.00
$ clang -m64 -o diff diff.c && ./diff
0.00
$ clang -m32 -o diff diff.c && ./diff
0.00
With cl.exe 19.29.3015319.29.30153 (first is x86 - 32 bit, second is 64 bit)
C:\> CALL "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x86 10.0.17763.0
C:\> cl diff.c >nul 2>nul & .\diff.exe
0.00
C:\> CALL "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64 10.0.17763.0
C:\> cl diff.c >nul 2>nul & .\diff.exe
0.00
gcc -m32 produces a different result, compared to gcc -m64, clang 17 (32 and 64bit), and MSVC Visual Studio 2019 (32 and 64bit).
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 Andrew Pinski changed: What|Removed |Added Component|c |target --- Comment #1 from Andrew Pinski --- I suspect the issue is excessive precision with x87 fp.
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #2 from Дилян Палаузов --- This happens only without optimizations: $ gcc -O0 -m32 -o diff diff.c && ./diff -2.00 $ gcc -O1 -m32 -o diff diff.c && ./diff 0.00 $ gcc -O2 -m32 -o diff diff.c && ./diff 0.00 $ gcc -O3 -m32 -o diff diff.c && ./diff 0.00
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Andrew Pinski ---
[apinski@xeond2 gcc]$ ~/upstream-gcc/bin/gcc -m32 tr56.c
[apinski@xeond2 gcc]$ ./a.out
-2.00
[apinski@xeond2 gcc]$ ~/upstream-gcc/bin/gcc -m32 tr56.c -fexcess-precision=standard
[apinski@xeond2 gcc]$ ./a.out
0.00
[apinski@xeond2 gcc]$ ~/upstream-gcc/bin/gcc -m32 tr56.c -msse2 -mfpmath=sse
[apinski@xeond2 gcc]$ ./a.out
0.00
Yes it is due to excessive precision of x87. Use either `-fexcess-precision=standard` or `-msse2 -mfpmath=sse` if you don't want to use the excessive precision of the x87 FP.
*** This bug has been marked as a duplicate of bug 323 ***
[Bug middle-end/323] optimized code gives strange floating point results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 Andrew Pinski changed: What|Removed |Added CC||dilyan.palauzov at aegee dot org --- Comment #231 from Andrew Pinski --- *** Bug 113679 has been marked as a duplicate of this bug. ***
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #4 from Jakub Jelinek --- Yeah, it is, that is how excess precision behaves. Due to the cast applying just to l rather than l - d it returns 0.0 with -fexcess-precision=standard, but if you change it to (double)(l - d) then it will return -2.0 at all optimization levels with -fexcess-precision=standard. -fexcess-precision=fast behaves depending on what instructions are actually used and where the conversions to float or double happen due to storing of expressions or subexpressions into memory as documented. If you don't like excess precision and have SSE2, you can use -msse2 -mfpmath=sse.
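The difference between rounding the operand first and subtracting in a wider format can be reproduced without x87 code generation by spelling the wider type out. This is only an emulation, not what the compiler emits: it assumes `long double` has at least a 64-bit significand (true for the x87 format, not for every target), and the function names are made up for illustration.

```cpp
#include <cfloat>

// (double)l is rounded to double before the subtraction, as IEEE double
// arithmetic (SSE2) does. The volatile store forces that rounding even
// if the compiler evaluates double expressions in a wider format.
double diff_rounded_first(long long l, double d) {
    volatile double as_double = static_cast<double>(l);
    return as_double - d;
}

// Emulates evaluation with excess precision: l is converted to long double
// and the subtraction happens there, so the low bits of l survive when
// long double has a wide enough significand.
double diff_extended(long long l, double d) {
    long double wide = static_cast<long double>(l) - static_cast<long double>(d);
    return static_cast<double>(wide);
}

// True when long double can hold every 64-bit long long value exactly.
constexpr bool long_double_is_wide = LDBL_MANT_DIG >= 64;
```

With `l = 9223372036854775806` and `d = 9223372036854775808.0`, `diff_rounded_first` returns 0 because `l` rounds up to 2^63 as a double, while `diff_extended` returns -2 when `long_double_is_wide` holds.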
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #17 from rguenther at suse dot de --- On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> It seems to be a big refactor work.
It's not too bad if people wouldn't continue to add features not implementing SLP ...
> I wonder I can do anything to help with SLP representations ?
I hope to get back to this before stage1 re-opens and will post another request for testing. It's really mostly going to be making sure all paths have coverage which means testing all the various architectures - I can only easily test x86. There's a branch I worked on last year, refs/users/rguenth/heads/vect-force-slp, which I use to hunt down cases not supporting SLP (it's a bit overeager to trigger, and it has known holes so it's not really a good starting point yet for folks to try other archs).
[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670 --- Comment #4 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:924137b9012cee5603482242de08fbf0b2030f6a commit r14-8645-g924137b9012cee5603482242de08fbf0b2030f6a Author: Richard Biener Date: Wed Jan 31 09:09:50 2024 +0100 tree-optimization/113670 - gather/scatter to/from hard registers The following makes sure we're not taking the address of hard registers when vectorizing apparent gathers or scatters to/from them. PR tree-optimization/113670 * tree-vect-data-refs.cc (vect_check_gather_scatter): Make sure we can take the address of the reference base. * gcc.target/i386/pr113670.c: New testcase.
[Bug tree-optimization/113676] [12 Regression] Miscompilation tree-vrp __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113676 --- Comment #2 from Magnus Hokland Hegdahl --- Hi, here's a version that doesn't need -std=c++20 or argv: https://godbolt.org/z/Y9ooY998e
#include <vector>
constexpr auto bit_ceil(unsigned x) -> unsigned {
  if (x <= 1) return 1U;
  int w = 32 - __builtin_clz(x - 1);
  return 1U << w;
}
int main(int argc, char **) {
  auto rounded_n = bit_ceil(static_cast<unsigned>(argc + 1));
  auto a = std::vector(2UL * rounded_n);
  for (std::size_t i = rounded_n; i-- > 1;) {
    if (!(0 < i && i < rounded_n)) __builtin_unreachable();
    a[i] = 0;
  }
}
Exact compile command used with g++-12 (GCC) 12.3.0 on arch linux, x86_64: g++-12 -O1 -ftree-vrp main.cpp
[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670 Richard Biener changed: What|Removed |Added Known to fail|14.0| Target Milestone|--- |14.0 Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Known to work||14.0 --- Comment #5 from Richard Biener --- Fixed for trunk.
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #18 from JuzheZhong --- (In reply to rguent...@suse.de from comment #17) Ok. It seems that you almost done with that but needs more testing in various targets. So, if I want to work on optimizing vectorization (start with TSVC), I should avoid touching the failed vectorized due to data reference/dependence analysis (e.g. this PR case, s116). and avoid adding new features into loop vectorizer, e.g. min/max reduction with index (s315). To not to make your SLP refactoring work heavier. Am I right ?
[Bug target/113633] FAIL: gcc.dg/bf-ms-attrib.c execution test, wrong size for ms_struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113633 LIU Hao changed: What|Removed |Added CC||lh_mouse at 126 dot com --- Comment #1 from LIU Hao --- My suggestion is that following what MSVC produces is the only way to go.
[Bug tree-optimization/113676] [12 Regression] Miscompilation tree-vrp __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113676 Jakub Jelinek changed: What|Removed |Added CC||aldyh at gcc dot gnu.org, amacleod at redhat dot com, jakub at gcc dot gnu.org Keywords|needs-bisection | --- Comment #3 from Jakub Jelinek --- Bisection with -O2 -ftree-vrp
#include <vector>
unsigned bit_ceil (unsigned x) {
  if (x <= 1) return 1U;
  int w = 32 - __builtin_clz (x - 1);
  return 1U << w;
}
int main (int argc, char **) {
  unsigned rounded_n = bit_ceil ((unsigned) (argc + 1));
  auto a = std::vector (2UL * rounded_n);
  for (long unsigned int i = rounded_n; i-- > 1;) {
    if (!(0 < i && i < rounded_n)) __builtin_unreachable();
    a[i] = 0;
  }
}
shows this started with r12-155-gd8e1f1d24179690fd9c0f63c27b12e030010d9ea and went away with r13-3596-ge7310e24b1c0ca67b1bb507c1330b2bf39e59e32 so nothing really backportable.
[Bug tree-optimization/113676] [12 Regression] Miscompilation tree-vrp __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113676 --- Comment #4 from Jakub Jelinek --- And with --param=vrp1-mode=vrp it segfaulted even with r13-4276-gce917b0422c145779b83e005afd8433c0c86fb06 but the next revision removed that parameter, so can't go further.
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #8 from Richard Biener --- OK, so the issue is that we're recording the IPA result with the wrong VUSE since we're calling vn_reference_lookup_2 with !data->last_vuse_ptr but data->finish (vr->set, vr->base_set, v) inserts a hashtable entry with data->last_vuse. Note it's somewhat unexpected that vn_reference_lookup_2 performs hashtable insertion which is what causes the issue. It's also not as easy as using the updated vuse since if we're coming from translation through a memcpy that would be wrong. In fact we probably want to avoid doing any insertion if there's something fishy going on (!data->last_vuse_ptr). The best fix would likely be to pre-insert all the IPA-CP known constants instead of trying to discover them "late". I'm testing the easy fix for now.
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #19 from rguenther at suse dot de --- On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> Ok. It seems that you almost done with that but needs more testing in various targets.
>
> So, if I want to work on optimizing vectorization (start with TSVC), I should avoid touching the failed vectorized due to data reference/dependence analysis (e.g. this PR case, s116).
It depends on the actual case - the one in this bug at least looks like half of it might be dealt with with the refactoring.
> and avoid adding new features into loop vectorizer, e.g. min/max reduction with index (s315).
It's fine to add features if they work with SLP as well ;) Note that in the future SLP will also do the "single lane" case but it doesn't do that on trunk. Some features are difficult with multi-lane SLP and probably not important in practice for that case, still handling single-lane SLP will be important as otherwise the feature is lost.
> To not to make your SLP refactoring work heavier.
>
> Am I right ?
Yes. I've got early break vectorization to chase now, I was "finished" with the parts I could exercise on x86_64 in autumn ...
[Bug debug/113637] ICE: in as_a, at machmode.h:381 with extern function declaration and _BitInt() used as VLA size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113637 --- Comment #4 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:457d2b59b58e5998e1e6967316d4e3e8f24edeed commit r14-8651-g457d2b59b58e5998e1e6967316d4e3e8f24edeed Author: Jakub Jelinek Date: Wed Jan 31 10:56:15 2024 +0100 dwarf2out: Fix ICE on large _BitInt in loc_list_from_tree_1 [PR113637] This spot uses SCALAR_INT_TYPE_MODE which obviously ICEs for large/huge BITINT_TYPE types which have BLKmode. But such large BITINT_TYPEs certainly don't fit into DWARF2_ADDR_SIZE either, so we can just assume it would be false if type has BLKmode. 2024-01-31 Jakub Jelinek PR debug/113637 * dwarf2out.cc (loc_list_from_tree_1): Assume integral types with BLKmode are larger than DWARF2_ADDR_SIZE. * gcc.dg/bitint-80.c: New test.
[Bug tree-optimization/113639] ICE: in handle_operand_addr, at gimple-lower-bitint.cc:2265 at -O with _BitInt() in a struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113639 --- Comment #3 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:90ac839a470d61ffcd9eee0d7d37ca9c385dfefb commit r14-8650-g90ac839a470d61ffcd9eee0d7d37ca9c385dfefb Author: Jakub Jelinek Date: Wed Jan 31 10:50:33 2024 +0100 lower-bitint: Fix up VIEW_CONVERT_EXPR handling in handle_operand_addr [PR113639] Yet another spot where we need to treat VIEW_CONVERT_EXPR differently from NOP_EXPR/CONVERT_EXPR. 2024-01-31 Jakub Jelinek PR tree-optimization/113639 * gimple-lower-bitint.cc (bitint_large_huge::handle_operand_addr): For VIEW_CONVERT_EXPR set rhs1 to its operand. * gcc.dg/bitint-79.c: New test.
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #5 from Дилян Палаузов ---
gcc -m64 -fexcess-precision=fast -o diff diff.c && ./diff
0.00
gcc -m32 -fexcess-precision=fast -o diff diff.c && ./diff
-2.00
clang -m32 -fexcess-precision=fast -o diff diff.c && ./diff
0.00
clang -m64 -fexcess-precision=fast -o diff diff.c && ./diff
0.00
gcc -m64 -fexcess-precision=standard -o diff diff.c && ./diff
0.00
gcc -m32 -fexcess-precision=standard -o diff diff.c && ./diff
0.00
clang -m32 -fexcess-precision=standard -o diff diff.c && ./diff
0.00
clang -m64 -fexcess-precision=standard -o diff diff.c && ./diff
0.00
If this excess precision has justification, why are the results different for 32 and 64bit code? With
printf("%f\n", (double)l - d);
printf("%f\n", (double)(l - d));
there is indeed a difference:
$ gcc -m32 -fexcess-precision=standard -o diff diff.c && ./diff
0.00
-2.00
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #6 from Andrew Pinski --- Because 64bit uses the SSE2 fp instructions rather than x87 fp instructions.
[Bug rtl-optimization/113656] [x86] ICE in simplify_const_unary_operation, at simplify-rtx.cc:1954 with new -mavx10.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113656 --- Comment #7 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:b59775b642bb2b1ecd2e6d52c988b9c432117bc8 commit r14-8652-gb59775b642bb2b1ecd2e6d52c988b9c432117bc8 Author: Jakub Jelinek Date: Wed Jan 31 10:56:56 2024 +0100 simplify-rtx: Fix up last argument to simplify_gen_unary [PR113656] When simplifying e.g. (float_truncate:SF (float_truncate:DF (reg:XF)) or (float_truncate:SF (float_extend:XF (reg:DF)) etc. into (float_truncate:SF (reg:XF)) or (float_truncate:SF (reg:DF)) we call simplify_gen_unary with incorrect op_mode argument, it should be the argument's mode, but we call it with the outer mode instead. As these are all floating point operations, the argument always has non-VOIDmode and so we can just use that mode (as done in similar simplifications a few lines later), but neither FLOAT_TRUNCATE nor FLOAT_EXTEND are operations that should have the same modes of operand and result. This bug hasn't been a problem for years because normally op_mode is used only if the mode of op is VOIDmode, otherwise it is redundant, but r10-2139 added an assertion in some spots that op_mode is right even in such cases. 2024-01-31 Jakub Jelinek PR rtl-optimization/113656 * simplify-rtx.cc (simplify_context::simplify_unary_operation_1) : Fix up last argument to simplify_gen_unary. * gcc.target/i386/pr113656.c: New test.
[Bug libstdc++/90276] PSTL tests fail in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90276 Jonathan Wakely changed: What|Removed |Added Last reconfirmed|2024-01-24 00:00:00 |2019-04-29 0:00 --- Comment #4 from Jonathan Wakely --- In testsuite/util/pstl/test_utils.h we have:
template struct reverse_invoker {
  template <typename... Rest>
  void operator()(Rest&&... rest) {
    // Random-access iterator
    iterator_invoker()(std::forward<Rest>(rest)...);
    // Forward iterator
    iterator_invoker()(std::forward<Rest>(rest)...);
    // Bidirectional iterator
    iterator_invoker()(std::forward<Rest>(rest)...);
  }
};
This is called with rvalue iterators e.g.
TestUtils::invoke_on_all_policies(check_minelement(), wseq.seq.cbegin(), wseq.seq.cend());
In the body of reverse_invoker::operator() we forward them as rvalues, which causes them to be moved into the by-value parameters of iterator_invoker::operator(). Then we forward them again, which causes them to be moved again. The debug iterators abort at this point, because they're singular after the first move. So the problem is that a moved-from __debug::vector::iterator is singular, and therefore can't be moved or copied. I wonder if that's really what we want, or if a moved-from iterator should have the value-initialized state instead of a singular state. The standard is clear that a singular iterator cannot be copied or moved, unless it was value-initialized, see [iterator.requirements.general] p7. In any case, the PSTL test harness should probably not be using moved-from iterators more than once.
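The double-forwarding problem is independent of PSTL and can be shown with a small probe type. In the sketch below, `Probe`, `sink`, `forward_twice` and `copy_each_time` are hypothetical names; `Probe` merely records whether it was constructed from a moved-from object, which is the situation in which the debug iterators abort.

```cpp
#include <utility>

// Hypothetical stand-in for a debug iterator.
struct Probe {
    bool moved_from = false;        // set on the source once it has been moved
    bool built_from_stale = false;  // set when constructed from a moved-from source

    Probe() = default;
    Probe(const Probe& other) : built_from_stale(other.moved_from) {}
    Probe(Probe&& other) noexcept : built_from_stale(other.moved_from) {
        other.moved_from = true;
    }
};

// By-value parameter, like iterator_invoker::operator() taking iterators by value.
inline bool sink(Probe p) { return p.built_from_stale; }

// Mirrors the harness pattern: the same argument is forwarded more than once.
template <typename T>
int forward_twice(T&& arg) {
    int stale_uses = 0;
    stale_uses += sink(std::forward<T>(arg)) ? 1 : 0;  // rvalue: moves out of arg
    stale_uses += sink(std::forward<T>(arg)) ? 1 : 0;  // rvalue: arg is already moved-from
    return stale_uses;
}

// One possible alternative: treat the argument as an lvalue so each call copies.
template <typename T>
int copy_each_time(T&& arg) {
    int stale_uses = 0;
    stale_uses += sink(arg) ? 1 : 0;
    stale_uses += sink(arg) ? 1 : 0;
    return stale_uses;
}
```

Calling `forward_twice(Probe{})` reports one construction from a moved-from object, whereas passing an lvalue, or using `copy_each_time`, reports none.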
[Bug libstdc++/90276] PSTL tests fail in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90276 Jonathan Wakely changed: What|Removed |Added See Also||https://github.com/llvm/llvm-project/issues/80126 --- Comment #5 from Jonathan Wakely --- Reported upstream: https://github.com/llvm/llvm-project/issues/80126
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- And while SSE/SSE2 has instructions for performing arithmetics in IEEE754 single and double formats, x87 does not, everything is done in extended precision (unless the FPU is configured to use smaller precision but then it doesn't support the extended precision long double on the other side) and conversions to IEEE754 single/double have to be done when storing the extended precision registers into memory. So, it is impossible to achieve the expected IEEE754 single and double arithmetics behavior, one can get only something close to it (but with double rounding problems) if all the temporaries are immediately stored into memory and loaded from it again. The -ffloat-store option does it to a limited extent (doesn't convert everything though), but still, the performance is terrible. C allows extended precision and specifies how it should behave, that is the -fexcess-precision=standard model (e.g. enabled by default for -std=c{99,11,...} options as opposed to -std=gnu...): it consistently uses the excess precision, with some casts/assignments mandating rounding to lower precisions. -fexcess-precision=fast is what gcc implemented before that model was introduced: excess precision is used there as long as something is kept in the FPU registers, and conversions are done when it needs to be spilled to memory.
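The double rounding mentioned in this comment can be demonstrated with one addition whose exact result lies just above the halfway point between two doubles. The sketch assumes an x87-style `long double` with a 64-bit significand for the second function; on targets where `long double` is wider than that or identical to `double`, the effect does not appear. All names are illustrative.

```cpp
#include <cfloat>
#include <cmath>

// 2^-53 + 2^-78: slightly more than half an ulp of 1.0 in double.
double just_over_half_ulp() {
    return std::ldexp(1.0, -53) + std::ldexp(1.0, -78);
}

// Plain double addition. It is rounded exactly once only when double
// arithmetic is carried out in double (FLT_EVAL_METHOD == 0, e.g. SSE2).
double sum_in_double(double x, double y) {
    volatile double result = x + y;
    return result;
}

// Two roundings: first to long double, then again when narrowing to double.
double sum_via_long_double(double x, double y) {
    long double wide = static_cast<long double>(x) + static_cast<long double>(y);
    return static_cast<double>(wide);
}
```

On x86-64 with SSE2 arithmetic, `sum_in_double(1.0, just_over_half_ulp())` gives `1.0 + DBL_EPSILON`. With a 64-bit-significand `long double`, `sum_via_long_double` gives `1.0` instead, because the intermediate result `1 + 2^-53` is an exact tie that then rounds to even.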
[Bug tree-optimization/113639] ICE: in handle_operand_addr, at gimple-lower-bitint.cc:2265 at -O with _BitInt() in a struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113639 Jakub Jelinek changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Jakub Jelinek --- Fixed.
[Bug debug/113637] ICE: in as_a, at machmode.h:381 with extern function declaration and _BitInt() used as VLA size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113637 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Jakub Jelinek --- Fixed.
[Bug rtl-optimization/113656] [x86] ICE in simplify_const_unary_operation, at simplify-rtx.cc:1954 with new -mavx10.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113656 --- Comment #8 from Jakub Jelinek --- Fixed on the trunk so far.
[Bug target/111403] LoongArch: Wrong code with -O -mlasx -fopenmp-simd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403 --- Comment #3 from Xi Ruoyao --- It seems no longer happening with current trunk. Let me do a bisection...
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #9 from Richard Biener --- (In reply to Richard Biener from comment #8) > The best fix would likely be to pre-insert all the IPA-CP known constants > instead of trying to discover them "late". > > I'm testing the easy fix for now. Hmm. gcc.dg/ipa/pr92497-1.c FAILs because of that. We get __attribute__((noinline)) int bar.constprop (struct a a) { intD.6 a$aD.2808; intD.6 D.2807; struct a aD.2806; intD.6 _4; [local count: 1073741824]: # .MEM_5 = VDEF <.MEM_2(D)> aD.2806 = aD.2800; # VUSE <.MEM_5> a$a_3 = aD.2806.aD.2769; here and thus translate through the aggregate copy - the result should then be put on aD.2806 but of course only with .MEM_5. Maybe we can and should always use the default def here but I'm slightly uneasy with the ref adjustment, esp. since we're going to record for the saved operands (if those exist - the path where it goes wrong isn't translated).
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #10 from Richard Biener --- Hmm, I have another fix.
[Bug tree-optimization/113630] [11/12/13/14 Regression] -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630 --- Comment #6 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:724b64304ff5c8ac08a913509afd6fde38d7b767 commit r14-8653-g724b64304ff5c8ac08a913509afd6fde38d7b767 Author: Richard Biener Date: Wed Jan 31 11:28:50 2024 +0100 tree-optimization/113630 - invalid code hoisting The following avoids code hoisting (but also PRE insertion) of expressions that got value-numbered to another one that are not a valid replacement (but still compute the same value). This time because the access path ends in a structure with different size, meaning we consider a related access as not trapping because of the size of the base of the access. PR tree-optimization/113630 * tree-ssa-pre.cc (compute_avail): Avoid registering a reference with a representation with not matching base access size. * gcc.dg/torture/pr113630.c: New testcase.
[Bug tree-optimization/113630] [11/12/13 Regression] -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Summary|[11/12/13/14 Regression]|[11/12/13 Regression] |-fno-strict-aliasing|-fno-strict-aliasing |introduces out-of-bounds|introduces out-of-bounds |memory access |memory access Known to work||14.0 --- Comment #7 from Richard Biener --- Fixed on trunk so far.
[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #19 from JuzheZhong ---
The loop is:

bb 3 -> bb 4 -> bb 5
| |__⬆ |__⬆

The condition in bb 3 is if (i_21 == 1001).
The condition in bb 4 is if (N_13(D) > i_18).

Looking into lsplit, this loop doesn't satisfy the check:

if (split_loop (loop) || split_loop_on_cond (loop))

split_loop_on_cond tries to split a loop whose condition is loop invariant. However, neither the condition in bb 3 nor the one in bb 4 is loop invariant.

I wonder whether we should add a new kind of loop splitter like:

diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index 04215fe7937..a4081b9b6f5 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -1769,7 +1769,8 @@ tree_ssa_split_loops (void)
       if (optimize_loop_for_size_p (loop))
        continue;
 
-      if (split_loop (loop) || split_loop_on_cond (loop))
+      if (split_loop (loop) || split_loop_on_cond (loop)
+         || split_loop_for_early_break (loop))
        {
          /* Mark our containing loop as having had some split inner loops.  */
          loop_outer (loop)->aux = loop;
[Bug libstdc++/99832] std::chrono::system_clock::to_time_t needs ABI tag for 32-bit time_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99832 --- Comment #1 from Jonathan Wakely ---
Maybe something like this:

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index 0af29325388..f7c73560831 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -84,7 +84,13 @@
 // Since glibc 2.34 all pthreads functions are usable without linking to
 // libpthread.
 #  define _GLIBCXX_GTHREAD_USE_WEAK 0
-# endif
+// Since glibc 2.34 using -D_TIME_BITS=64 will enable 64-bit time_t
+// for "legacy ABIs", i.e. ones that historically used 32-bit time_t.
+// This internal glibc macro will be defined iff new 64-bit time_t is in use.
+#  ifdef __USE_TIME_BITS64
+#   define _GLIBCXX_TIME_BITS64 1
+#  endif
+# endif // glibc 2.34
 #endif // __linux__
 
 #endif
diff --git a/libstdc++-v3/include/bits/chrono.h b/libstdc++-v3/include/bits/chrono.h
index 579c5a266be..a63782b92ff 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -1242,6 +1242,9 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
       now() noexcept;
 
       // Map to C API
+#ifdef _GLIBCXX_TIME_BITS64
+      [[__gnu__::__abi_tag__("__time64")]]
+#endif
       static std::time_t
       to_time_t(const time_point& __t) noexcept
       {
@@ -1249,6 +1252,9 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
           (__t.time_since_epoch()).count());
       }
 
+#ifdef _GLIBCXX_TIME_BITS64
+      [[__gnu__::__abi_tag__("__time64")]]
+#endif
       static time_point
       from_time_t(std::time_t __t) noexcept
       {

Alternatively, in do:

#define _GLIBCXX_TIME_BITS64_ABI_TAG

and then in config/os/gnu-linux/os_defines.h:

# ifdef __USE_TIME_BITS64
#  undef _GLIBCXX_TIME_BITS64_ABI_TAG
#  define _GLIBCXX_TIME_BITS64_ABI_TAG [[__gnu__::__abi_tag__("__time64")]]
# endif

Then the chrono code can just use that unconditionally instead of using #ifdef.

I think for musl, newer versions use 64-bit time_t unconditionally. I'm not sure if we can (or need to) use the abi_tag there.
[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #20 from Richard Biener --- I think we want split_loop () to handle this case. That means extending it to handle loops with multiple exits. OTOH after loop rotation to if (i_21 == 1001) goto ; [1.00%] else goto ; [99.00%] [local count: 1004539166]: i_18 = i_21 + 1; if (N_13(D) > i_18) goto ; [94.50%] else goto ; [5.50%] it could also be IVCANON's job to rewrite the exit test so the bound is loop invariant and it becomes a single exit. There's another recent PR where an exit condition like i < N && i < M should become i < MIN(N,M).
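The last rewrite mentioned in the comment above, i < N && i < M becoming i < MIN(N,M), can be sketched at the source level as follows. This is only an illustration of the equivalence for a counter that starts at 0 and increments by 1; the function names are invented here and nothing in this sketch reflects how GCC's passes implement it.

```c
#include <stddef.h>

/* Loop with two exit tests, both depending on the induction variable. */
size_t count_two_tests(size_t n, size_t m)
{
    size_t i = 0;
    while (i < n && i < m)
        i++;
    return i;
}

/* Same loop with a single exit test against a loop-invariant bound. */
size_t count_min_bound(size_t n, size_t m)
{
    size_t bound = n < m ? n : m;   /* MIN(N, M), computed once before the loop */
    size_t i = 0;
    while (i < bound)
        i++;
    return i;
}
```

Both functions return the same count for any n and m; the second form has one exit, which is the shape the vectorizer and loop splitter handle more easily.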
[Bug libstdc++/99832] std::chrono::system_clock::to_time_t needs ABI tag for 32-bit time_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99832 --- Comment #2 from Jonathan Wakely --- (In reply to Jonathan Wakely from comment #1) > +// Since glibc 2.34 using -D_TIME_BITS=64 will enable 64-bit time_t > +// for "legacy ABIs", i.e. ones that historically used 32-bit time_t. > +// This internal glibc macro will be defined iff new 64-bit time_t is in > use. This is correct for current glibc releases, but in glibc master __USE_TIME_BITS64 is defined unconditionally to 0 or 1 and tells you the size of time_t, not whether it switched to 64-bit counter to the legacy ABI: https://inbox.sourceware.org/libc-alpha/20240118131801.600373-1-adhemerval.zane...@linaro.org/ Yay.
[Bug libstdc++/90276] PSTL tests fail in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90276 --- Comment #6 from Jonathan Wakely --- Some of the tests FAIL for different reasons: /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algo.h:2051: In function: _FIter std::upper_bound(_FIter, _FIter, const _Tp&, _Compare) [with _FIter = gnu_debug::_Safe_iterator*, vector, allocator > > >, debug::vector, allocator > >, random_access_iterator_tag>; _Tp = Num; _Compare = main()::, Num)>] Error: elements in iterator range [first, last) are not partitioned by the predicate __comp and value __val. Objects involved in the operation: iterator "first" @ 0x7ffda0426810 { type = gnu_cxx::normal_iterator*, std::vector, std::allocator > > > (mutable iterator); state = dereferenceable (start-of-sequence); references sequence with type 'std::debug::vector, std::allocator > >' @ 0x7ffda0427730 } iterator "last" @ 0x7ffda0426840 { type = gnu_cxx::normal_iterator*, std::vector, std::allocator > > > (mutable iterator); state = dereferenceable; references sequence with type 'std::debug::vector, std::allocator > >' @ 0x7ffda0427730 } FAIL: 25_algorithms/pstl/alg_sorting/partial_sort.cc -std=gnu++17 execution test
[Bug rtl-optimization/113680] New: Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113680

Bug ID: 113680
Summary: Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---

Note: This issue is not limited to x86-64. I also tested it with ARM64 gcc in Compiler Explorer (https://godbolt.org/) and it has the same "redundant cmp instruction" problem.

This may be related to bug #3507 but I can't make sure it's the same bug. I apologize if I reported a duplicate.

== Test code ==

```c
#include <stdio.h>

void func1(int x, int y) {
    int diff = x - y;
    if (diff > 0)
        putchar('>');
    if (diff < 0)
        putchar('<');
}

void func2(int x, int y) {
    if ((x - y) > 0)
        putchar('>');
    if ((x - y) < 0)
        putchar('<');
}

void func3(int x, int y) {
    if (x > y)
        putchar('>');
    if (x < y)
        putchar('<');
}

void func4(int x, int y) {
    int diff = x - y;
    if (diff > 0)
        putchar('>');
    if (x < y)
        putchar('<');
}
```

== Actual result ==

With x86-64 "gcc -Os" it generates the following. In short, gcc can recognize func1() and func2() as completely identical, but does not recognize that func1() and func2() can both be optimized to func3(). func4() currently generates the worst assembly, but it might be another issue to address (something messes up the register allocation algorithm).

```x86asm
func1:
        subl    %esi, %edi
        testl   %edi, %edi
        jle     .L2
        movl    $62, %edi
        jmp     .L4
.L2:
        je      .L1
        movl    $60, %edi
.L4:
        jmp     putchar
.L1:
        ret
func2:
        jmp     func1
func3:
        cmpl    %esi, %edi
        jle     .L8
        movl    $62, %edi
        jmp     .L10
.L8:
        jge     .L7
        movl    $60, %edi
.L10:
        jmp     putchar
.L7:
        ret
func4:
        pushq   %rbp
        movl    %edi, %ebp
        pushq   %rbx
        movl    %esi, %ebx
        pushq   %rcx
        cmpl    %esi, %edi
        jle     .L12
        movl    $62, %edi
        call    putchar
.L12:
        cmpl    %ebx, %ebp
        jge     .L11
        popq    %rdx
        movl    $60, %edi
        popq    %rbx
        popq    %rbp
        jmp     putchar
.L11:
        popq    %rax
        popq    %rbx
        popq    %rbp
        ret
```

== Expected result ==

func1(), func2(), func3() and func4() should all compile to identical code, with func3() as the example of the best assembly. (No redundant "test" instruction; the "sub" instruction can be simplified into a "cmp".)
[Bug middle-end/113680] Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113680 Richard Biener changed: What|Removed |Added Component|rtl-optimization|middle-end Status|UNCONFIRMED |NEW Last reconfirmed||2024-01-31 Keywords||easyhack, ||missed-optimization Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- I don't think we have or had a (a - b) CMP 0 simplification pattern which this seems to be about. We have a +- CST CMP CST'. Note the reverse, a < b -> (a - b) < 0 isn't valid.
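A small sketch of why the two forms in the comment above are not interchangeable in both directions. For signed int, (a - b) < 0 may be rewritten as a < b because overflow in a - b is undefined, but going from a < b to (a - b) < 0 would introduce an overflow that the original did not have. The helper below models what a wrapping hardware subtract would produce, using unsigned arithmetic so the example itself has no undefined behavior; it assumes the usual two's complement int without padding bits, and the function names are invented for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* The comparison as written in func3 of the report. */
bool less_direct(int a, int b)
{
    return a < b;
}

/* Sign of the wrapped difference, i.e. what "sub; test" would observe.
   The sign bit of the result is set exactly when the value exceeds INT_MAX. */
bool less_via_wrapped_diff(int a, int b)
{
    unsigned int d = (unsigned int)a - (unsigned int)b;
    return d > (unsigned int)INT_MAX;
}
```

For ordinary inputs such as (3, 5) the two agree, but for (INT_MAX, -1) the direct comparison is false while the wrapped difference is negative.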
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #11 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:cfb3f666562fb4ab896a05c234a697afb63627a4 commit r14-8655-gcfb3f666562fb4ab896a05c234a697afb63627a4 Author: Richard Biener Date: Wed Jan 31 10:42:48 2024 +0100 tree-optimization/111444 - avoid insertions when skipping defs The following avoids inserting expressions for IPA CP discovered equivalences into the VN hashtables when we are optimistically skipping may-defs in the attempt to prove it's redundant. PR tree-optimization/111444 * tree-ssa-sccvn.cc (vn_reference_lookup_3): Do not use vn_reference_lookup_2 when optimistically skipping may-defs. * gcc.dg/torture/pr111444.c: New testcase.
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #12 from Richard Biener --- Fixed.
[Bug middle-end/113680] Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113680 --- Comment #2 from Kang-Che Sung --- I forgot to mention that such an optimization is unsafe for floating point (actually I knew that when I wrote my code). The `(a - b) < 0` optimization shouldn't be performed with unsigned integers either. I request only optimizations on signed integers.
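To illustrate the unsigned case from the comment above: unsigned subtraction wraps by definition, so (a - b) > 0 is true whenever a != b, which is not the same thing as a > b. The sketch below uses the "> 0" counterpart from the test functions rather than "< 0" (which is trivially false for unsigned operands); the function names are hypothetical.

```c
#include <stdbool.h>

/* Ordering test on the operands themselves. */
bool greater_direct(unsigned int a, unsigned int b)
{
    return a > b;
}

/* Test on the difference: wraps modulo 2^N, so it only detects a != b. */
bool greater_via_diff(unsigned int a, unsigned int b)
{
    return (a - b) > 0;
}
```

With a = 3 and b = 5 the difference wraps to a large positive value, so the second function reports true while the first reports false.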
[Bug libstdc++/90276] PSTL tests fail in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90276 --- Comment #7 from Jonathan Wakely --- __pstl::__tbb_backend::__merge_func::split_merging (which should be a reserved name) does: if (__nx < __ny) { __ym = _M_ys + __ny / 2; if (_x_orig) __xm = std::upper_bound(_M_x_beg + _M_xs, _M_x_beg + _M_xe, *(_M_x_beg + __ym), _M_comp) - _M_x_beg; else __xm = std::upper_bound(_M_z_beg + _M_xs, _M_z_beg + _M_xe, *(_M_z_beg + __ym), _M_comp) - _M_z_beg; } which aborts because the range is not correctly sorted w.r.t _M_comp, as required by upper_bound. The range looks like this: $1 = std::__cxx1998::vector of length 1284, capacity 1284 = {{val = 0}, {val = 5290}, {val = 9862}, {val = 8699}, {val = 5471}, {val = 4810}, { val = 6176}, {val = 1400}, {val = 5025}, {val = 3246}, {val = 2547}, {val = 8814}, {val = 2463}, {val = 8800}, {val = 3074}, {val = 5741}, { val = 5234}, {val = 736}, {val = 4895}, {val = 6803}, {val = 2363}, {val = 5351}, {val = 6719}, {val = 7967}, {val = 732}, {val = 1399}, {val = 7586}, {val = 4659}, {val = 3800}, {val = 6956}, {val = 4087}, {val = 9090}, {val = 2293}, {val = 8702}, {val = 2263}, {val = 7765}, {val = 3233}, { val = 8440}, {val = 3918}, {val = 8259}, {val = 6439}, {val = 6465}, {val = 6794}, {val = 3656}, {val = 10018}, {val = 4621}, {val = 9397}, { val = 4973}, {val = 584}, {val = 9046}, {val = 6530}, {val = 2474}, {val = 4118}, {val = 2970}, {val = 162}, {val = 4850}, {val = 9401}, {val = 7748}, {val = 9509}, {val = 2923}, {val = 4425}, {val = 8349}, {val = 6766}, {val = 6719}, {val = 6773}, {val = 3783}, {val = 4205}, {val = 4759}, { val = 6976}, {val = 8123}, {val = 2739}, {val = 3136}, {val = 4309}, {val = 4286}, {val = 6792}, {val = 4048}, {val = 8908}, {val = 664}, { val = 3774}, {val = 9019}, {val = 9710}, {val = 111}, {val = 1214}, {val = 8581}, {val = 2996}, {val = 6409}, {val = 3152}, {val = 7150}, { val = 3878}, {val = 7415}, {val = 10073}, {val = 3057}, {val = 238}, {val = 1314}, {val = 9776}, {val = 7011}, {val = 5097}, {val = 
8734}, { val = 6524}, {val = 1794}, {val = 6578}, {val = 9263}, {val = 9962}, {val = 5640}, {val = 3271}, {val = 1229}, {val = 4441}, {val = 6932}, { val = 1893}, {val = 2968}, {val = 425}, {val = 6356}, {val = 2994}, {val = 6671}, {val = 4658}, {val = 743}, {val = 2801}, {val = 2563}, {val = 7893}, {val = 1433}, {val = 4731}, {val = 2441}, {val = 4490}, {val = 4970}, {val = 8787}, {val = 3987}, {val = 6734}, {val = 3605}, {val = 7474}, { val = 2979}, {val = 152}, {val = 8805}, {val = 1964}, {val = 10114}, {val = 4166}, {val = 10267}, {val = 6096}, {val = 3360}, {val = 1673}, { val = 2742}, {val = 6328}, {val = 7130}, {val = 9098}, {val = 4075}, {val = 8554}, {val = 8509}, {val = 9850}, {val = 1077}, {val = 794}, { val = 7465}, {val = 2510}, {val = 5525}, {val = 4659}, {val = 1753}, {val = 216}, {val = 3167}, {val = 493}, {val = 1704}, {val = 1525}, {val = 7967}, {val = 4683}, {val = 6709}, {val = 6493}, {val = 1400}, {val = 1297}, {val = 5412}, {val = 6420}, {val = 7394}, {val = 8772}, {val = 2846}, { val = 10136}, {val = 9853}, {val = 9976}, {val = 3709}, {val = 8682}, {val = 8252}, {val = 1939}, {val = 8253}, {val = 4082}, {val = 7765}, { val = 5439}, {val = 1345}, {val = 3012}, {val = 4851}, {val = 3098}, {val = 8260}, {val = 2771}, {val = 3591}, {val = 4717}, {val = 9328}, { val = 1279}, {val = 9401}, {val = 5758}, {val = 2525}, {val = 5554}, {val = 1809}, {val = 7937}, {val = 1696}, {val = 9203}, {val = 1183}...} This is indeed not partitioned: (gdb) p __val $2 = (const Num &) @0x77994e28: {val = 687} (gdb) p __first[46] $4 = (Num &) @0x779930c8: {val = 9397} (gdb) p __first[47] $5 = (Num &) @0x779930cc: {val = 4973} (gdb) p __first[48] $6 = (Num &) @0x779930d0: {val = 584} < (gdb) p __first[49] $7 = (Num &) @0x779930d4: {val = 9046} I think this needs to be reported upstream too.
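The precondition that the debug-mode check enforces can be sketched in C as below. std::upper_bound(first, last, val, comp) requires every element e with !comp(val, e) to come before every element with comp(val, e). The sketch assumes plain < on int as the comparison, which is a simplification: the real test uses its own Num type and comparator, and the function name here is invented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a[0..n) is partitioned with respect to "val < element",
   i.e. no element that is <= val appears after an element that is > val. */
bool partitioned_for_upper_bound(const int *a, size_t n, int val)
{
    size_t i = 0;

    /* Skip the leading run of elements that are not greater than val. */
    while (i < n && !(val < a[i]))
        i++;

    /* Every remaining element must be greater than val. */
    for (; i < n; i++)
        if (!(val < a[i]))
            return false;

    return true;
}
```

Applied to the four values printed in the gdb session (9397, 4973, 584, 9046) with val = 687, the check fails at 584, matching the debug-mode diagnostic.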
[Bug libstdc++/90276] PSTL tests fail in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90276 --- Comment #8 from Jonathan Wakely --- https://github.com/llvm/llvm-project/issues/80136
[Bug modula2/111627] modula2: Excess test fails with a case-preserving-case-insensitive source tree.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111627 --- Comment #2 from Gaius Mulley --- Created attachment 57267 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57267&action=edit Proposed fix Here is a proposed patch, the problem was fixed by renaming conflicting testnames. There were some testsuite named modules which matched library names (but used a different case).
[Bug tree-optimization/113681] New: [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681 Bug ID: 113681 Summary: [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: mjires at suse dot cz CC: aoliva at gcc dot gnu.org Target Milestone: --- Compiling reduced testcase c-c++-common/torture/strub-inlinable2.c results in ICE since r14-6201-gf0a90c7d7333fc which introduced this test. $ cat strub-inlinable2.c inline void __attribute__((strub, always_inline)) inl_int_ali() {} void bat() { inl_int_ali(); } $ gcc strub-inlinable2.c -fbranch-probabilities strub-inlinable2.c:2:14: error: calling ‘always_inline’ ‘strub’ ‘inl_int_ali’ in non-‘strub’ context ‘bat’ 2 | void bat() { inl_int_ali(); } | ^ during IPA pass: profile strub-inlinable2.c:2:1: internal compiler error: in tree_profiling, at tree-profile.cc:803 2 | void bat() { inl_int_ali(); } | ^~~~ 0x181b321 tree_profiling /home/mjires/git/GCC/master/gcc/tree-profile.cc:803 0x181bc4c execute /home/mjires/git/GCC/master/gcc/tree-profile.cc:990 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. $ gcc -v Using built-in specs. 
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/mjires/built/master/libexec/gcc/x86_64-pc-linux-gnu/14.0.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /home/mjires/git/GCC/master/configure --prefix=/home/mjires/built/master --disable-bootstrap --enable-languages=c,c++,fortran,lto --disable-multilib --disable-libsanitizer --enable-checking : (reconfigured) /home/mjires/git/GCC/master/configure --prefix=/home/mjires/built/master --disable-bootstrap --enable-languages=c,c++,fortran,lto --disable-multilib --disable-libsanitizer --enable-checking Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.0.1 20240131 (experimental) (GCC)
[Bug tree-optimization/110176] [11/12/13/14 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176 --- Comment #9 from Richard Biener --- With all VARYING we simplify i_19 = (int) _2; _6 = (int) _5; Value numbering stmt = _7 = _6 <= i_19; Applying pattern match.pd:6775, gimple-match-4.cc:1795 Match-and-simplified _6 <= i_19 to 1 where _5 is _Bool and _2 is unsigned int. We match zext <= (int) 4294967295u note that I see Value numbering stmt = _2 = f$0_25; Setting value number of _2 to 4294967295 (changed) Value numbering stmt = i_19 = (int) _2; Match-and-simplified (int) _2 to -1 RHS (int) _2 simplified to -1 Not changing value number of i_19 from VARYING to -1 Making available beyond BB6 i_19 for value i_19 so it's odd we see the constant here, but ... we go (if (TREE_CODE (@10) == INTEGER_CST && INTEGRAL_TYPE_P (TREE_TYPE (@00)) && !int_fits_type_p (@10, TREE_TYPE (@00))) (with { tree min = lower_bound_in_type (TREE_TYPE (@10), TREE_TYPE (@00)); tree max = upper_bound_in_type (TREE_TYPE (@10), TREE_TYPE (@00)); bool above = integer_nonzerop (const_binop (LT_EXPR, type, max, @10)); bool below = integer_nonzerop (const_binop (LT_EXPR, type, @10, min)); } (if (above || below) failing to see that we deal with a relational compare and a sign-change. The original code from fold-const.cc had only INTEGER_TYPE support, r6-4300-gf6c1575958f7bf made it cover all integral types (it half-way supported BOOLEAN_TYPE already). But the issue was latent I think. One notable difference was that I think get_unwidened made sure to convert a constant to the wider type while here we have @10 != @1 and the conversion not applied. We're doing it correct in earlier code: /* ??? The special-casing of INTEGER_CST conversion was in the original code and here to avoid a spurious overflow flag on the resulting constant which fold_convert produces. */ (if (TREE_CODE (@1) == INTEGER_CST) using @1 instead of @10. Correcting that avoids the pattern from triggering in this wrong way.
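A source-level sketch of the comparison described in the comment above: a zero-extended _Bool compared against an unsigned value converted to int. With the unsigned value 4294967295 the converted operand is -1 (the conversion is implementation-defined in C; GCC documents modulo reduction, and this sketch assumes a 32-bit int), so the comparison must yield 0, whereas the pattern in question folded it to 1. The function name is made up for illustration and this is not the reduced testcase from the PR.

```c
#include <stdbool.h>

/* Mirrors "_6 = (int) _5; i_19 = (int) _2; _7 = _6 <= i_19;" from the dump. */
int widened_le(bool flag, unsigned int u)
{
    int widened = (int)flag;       /* zero-extension: 0 or 1 */
    int sign_changed = (int)u;     /* sign change: 4294967295u becomes -1 with GCC */
    return widened <= sign_changed;
}
```

Since neither 0 nor 1 is less than or equal to -1, the result for u = 4294967295u is 0 regardless of the flag.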
[Bug other/113682] New: Branches in branchless binary search rather than cmov/csel/csinc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682

Bug ID: 113682
Summary: Branches in branchless binary search rather than cmov/csel/csinc
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: redbeard0531 at gmail dot com
Target Milestone: ---

I've been trying to eliminate unpredictable branches in a hot function where perf counters show a high mispredict percentage. Unfortunately, I haven't been able to find an incantation to get gcc to generate branchless code other than inline asm, which I'd rather avoid. In this case I've even laid out the critical lines so that they exactly match the behavior of the csinc and csel instructions on arm64, but they are still not used.

Somewhat minimized repro:

typedef unsigned long size_t;
struct ITEM {char* addr; size_t len;};
int cmp(ITEM* user, ITEM* tree);

size_t bsearch2(ITEM* user, ITEM** tree, size_t treeSize) {
    auto pos = tree;
    size_t low = 0;
    size_t high = treeSize;
    while (low < high) {
        size_t i = (low + high) / 2;
        int res = cmp(user, tree[i]);

        // These should be cmp + csinc + csel on arm
        // and lea + test + cmov + cmov on x86.
        low = res > 0 ? i + 1 : low; // csinc
        high = res < 0 ? i: high; // csel

        if (res == 0) return i;
    }
    return -1;
}

On arm64 that generates a conditional branch on res > 0:

        bl      cmp(ITEM*, ITEM*)
        cmp     w0, 0
        bgt     .L15 // does low = i + 1 then loops
        mov     x20, x19
        bne     .L4 // loop

On x86_64 it does similar:

        call    cmp(ITEM*, ITEM*)
        test    eax, eax
        jg      .L16
        jne     .L17

Note that clang produces the desired codegen for both:

arm:
        bl      cmp(ITEM*, ITEM*)
        cmp     w0, #0
        csinc   x23, x23, x22, le
        csel    x19, x22, x19, lt
        cbnz    w0, .LBB0_1

x86:
        call    cmp(ITEM*, ITEM*)@PLT
        lea     rcx, [r12 + 1]
        test    eax, eax
        cmovg   r13, rcx
        cmovs   rbx, r12
        jne     .LBB0_1

(full output for all 4 available at https://www.godbolt.org/z/aWrKbYPTG. Code snippets from trunk, but also repros on 13.2)

While ideally gcc would generate the branchless output for the supplied code, if there is some (reasonable) incantation that would cause it to produce branchless output, I'd be happy to have that too.
[Bug analyzer/113509] ICE: SIGSEGV in c_tree_printer (c-objc-common.cc:341) with -fanalyzer -fanalyzer-verbose-state-changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113509 David Malcolm changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from David Malcolm --- Should be resolved by the above patch.
[Bug target/111677] [12/13/14 Regression] darktable build on aarch64 fails with unrecognizable insn due to -fstack-protector changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111677 --- Comment #26 from GCC Commits --- The master branch has been updated by Alex Coplan : https://gcc.gnu.org/g:0529ba8168c89f24314e8750237d77bb132bea9c commit r14-8657-g0529ba8168c89f24314e8750237d77bb132bea9c Author: Alex Coplan Date: Tue Jan 30 10:22:48 2024 + aarch64: Avoid out-of-range shrink-wrapped saves [PR111677] The PR shows us ICEing due to an unrecognizable TFmode save emitted by aarch64_process_components. The problem is that for T{I,F,D}mode we conservatively require mems to be in range for x-register ldp/stp. That is because (at least for TImode) it can be allocated to both GPRs and FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is a q-register load/store. As Richard pointed out in the PR, aarch64_get_separate_components already checks that the offsets are suitable for a single load, so we just need to choose a mode in aarch64_reg_save_mode that gives the full q-register range. In this patch, we choose V16QImode as an alternative 16-byte "bag-of-bits" mode that doesn't have the artificial range restrictions imposed on T{I,F,D}mode. For T{F,D}mode in GCC 15 I think we could consider relaxing the restriction imposed in aarch64_classify_address, as typically T{F,D}mode should be allocated to FPRs. But such a change seems too invasive to consider for GCC 14 at this stage (let alone backports). Fortunately the new flexible load/store pair patterns in GCC 14 allow this mode change to work without further changes. The backports are more involved as we need to adjust the load/store pair handling to cater for V16QImode in a few places. Note that for the testcase we are relying on the torture options to add -funroll-loops at -O3 which is necessary to trigger the ICE on trunk (but not on the 13 branch). gcc/ChangeLog: PR target/111677 * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use V16QImode for the full 16-byte FPR saves in the vector PCS case. 
gcc/testsuite/ChangeLog: PR target/111677 * gcc.target/aarch64/torture/pr111677.c: New test.
[Bug target/111677] [12/13 Regression] darktable build on aarch64 fails with unrecognizable insn due to -fstack-protector changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111677 Alex Coplan changed: What|Removed |Added Summary|[12/13/14 Regression] |[12/13 Regression] |darktable build on aarch64 |darktable build on aarch64 |fails with unrecognizable |fails with unrecognizable |insn due to |insn due to |-fstack-protector changes |-fstack-protector changes --- Comment #27 from Alex Coplan --- Fixed on trunk for GCC 14, keeping open for backports.
[Bug target/113357] [14 regression] m68k-linux bootstrap failure in stage2 due to segfault compiling unwind-dw2.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113357 --- Comment #4 from Mikael Pettersson --- Confirmed: 04c9cf5c786b94fbe3f6f21f06cae73a7575ff7a is the first new commit commit 04c9cf5c786b94fbe3f6f21f06cae73a7575ff7a Author: Manolis Tsamis Date: Mon Oct 16 13:08:12 2023 -0600 Implement new RTL optimizations pass: fold-mem-offsets
[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Component|other |rtl-optimization Version|unknown |14.0 Target||aarch64, x86_64-*-* --- Comment #1 from Richard Biener --- Since there's a loop exit involved (and the loop has multiple exits) if-conversion is made difficult here. You could try rotating manually producing a do { } while loop with a "nicer" exit condition and see whether that helps.
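One possible reading of the suggestion above, sketched in C: rotate the search into a do/while loop whose only exit is the low < high test, and fold the "found" case into the selects by collapsing the range. This is an untested idea as far as code generation goes; whether GCC actually emits cmov/csel for it would need to be checked on Compiler Explorer. The sketch searches a sorted int array instead of the reporter's ITEM pointers and cmp() callback, and the function name is invented.

```c
#include <stddef.h>

/* Binary search with a single loop exit.
   Returns the index of key in sorted[0..n), or (size_t)-1 if absent. */
size_t bsearch_single_exit(const int *sorted, size_t n, int key)
{
    size_t low = 0;
    size_t high = n;
    size_t found = (size_t)-1;

    if (n == 0)
        return found;

    do {
        size_t i = low + (high - low) / 2;
        /* Three-way comparison result: -1, 0 or 1. */
        int res = (key > sorted[i]) - (key < sorted[i]);

        /* All updates are written as selects. On a match both bounds
           collapse to i, so the loop test ends the search. */
        found = res == 0 ? i : found;
        low = res > 0 ? i + 1 : (res == 0 ? i : low);
        high = res <= 0 ? i : high;
    } while (low < high);

    return found;
}
```

The range shrinks on every iteration, so the loop terminates, and the early return inside the loop body is gone.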
[Bug tree-optimization/113681] [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681 Richard Biener changed: What|Removed |Added Keywords||error-recovery Target Milestone|--- |14.0
[Bug tree-optimization/113681] [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681 Richard Biener changed: What|Removed |Added Priority|P3 |P4
[Bug debug/92444] [11/12/13/14 regression] gcc generates wrong debug information at -O2 and -O3 since r10-4122-gf658ad3002a0af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92444 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.5
[Bug target/105275] [12/13/14 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 Richard Biener changed: What|Removed |Added Target Milestone|--- |12.4
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #8 from Дилян Палаузов ---
-fexcess-precision=standard does not ensure consistent behaviour between gcc 13.2.1 20231205 (Red Hat 13.2.1-6) and clang 17.0.5. -msse2 -mfpmath=sse does for diff.c:

#include <stdio.h>
#include <math.h>

int main(void) {
    long long l = 9223372036854775806;
    double d = 9223372036854775808.0;
    printf("%f\n", (double)l - d);
    printf("%i\n", pow(3.3, 4.4) == 191.18831051580915);
    return 0;
}

$ gcc -lm -fexcess-precision=standard -m32 -o diff diff.c && ./diff
0.00
0
$ clang -lm -fexcess-precision=standard -m32 -o diff diff.c && ./diff
0.00
1
$ gcc -lm -fexcess-precision=standard -m64 -o diff diff.c && ./diff
0.00
1
$ clang -lm -fexcess-precision=standard -m64 -o diff diff.c && ./diff
0.00
1
$ gcc -lm -fexcess-precision=fast -m32 -o diff diff.c && ./diff
-2.00
1
$ clang -lm -fexcess-precision=fast -m32 -o diff diff.c && ./diff
0.00
1
$ gcc -lm -fexcess-precision=fast -m64 -o diff diff.c && ./diff
0.00
1
$ clang -lm -fexcess-precision=fast -m64 -o diff diff.c && ./diff
0.00
1
$ gcc -lm -msse2 -mfpmath=sse -m32 -o diff diff.c && ./diff
0.00
1
$ clang -lm -msse2 -mfpmath=sse -m32 -o diff diff.c && ./diff
0.00
1
$ gcc -lm -msse2 -mfpmath=sse -m64 -o diff diff.c && ./diff
0.00
1
$ clang -lm -msse2 -mfpmath=sse -m64 -o diff diff.c && ./diff
0.00
1

cl.exe also prints 0.00 and 1
[Bug rtl-optimization/110390] [13/14 regression] ICE on valid code on x86_64-linux-gnu with sel-scheduling: in av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609 since r13-3596-ge7310e24b1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110390 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug target/111170] [13/14 regression] Malformed manifest does not allow to run gcc on Windows XP (Accessing a corrupted shared library) since r13-6552-gd11e088210a551
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111170 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0
[Bug testsuite/113611] [14 Regression] gcc.dg/pr110279-1.c fails on cross build since gcc-14-5779-g746344dd538
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113611 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0
[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug target/113641] [13/14 regression] 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #9 from Jakub Jelinek --- That is not what I read from what you've posted, -fexcess-precision=standard is consistent between the compilers, -fexcess-precision=fast is not (and doesn't have to be), neither between different compilers, nor between different optimization levels etc.
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #10 from Jakub Jelinek --- Oh, you mean the pow equality comparison. I think you should study something about floating point, errors, why equality comparisons of floating point values are usually a bad idea etc. There is no gcc bug, just bad user expectations.
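For readers following the pow() equality question in this PR, here is a minimal sketch of the usual alternative to `==` on floating-point values: compare within a tolerance. The helper name `nearly_equal` and the tolerance values a caller picks are illustrative assumptions, not something proposed in the bug; suitable tolerances depend on the computation.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the PR): returns true when a and b differ
   by at most abs_tol, or by at most rel_tol times the larger magnitude.
   Written without <math.h> so it does not need -lm. */
static bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = a > b ? a - b : b - a;   /* |a - b| */
    double abs_a = a < 0 ? -a : a;
    double abs_b = b < 0 ? -b : b;
    double scale = abs_a > abs_b ? abs_a : abs_b;

    if (diff <= abs_tol)                    /* handles values near zero */
        return true;
    return diff <= rel_tol * scale;         /* relative comparison */
}
```

With something like this, the second printf in diff.c could test `nearly_equal(pow(3.3, 4.4), 191.18831051580915, 1e-12, 0.0)` instead of exact equality, which should not depend on whether the intermediate result carries excess precision.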
[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #3 from Martin Jambor --- (In reply to Richard Biener from comment #1) > Did you try with -fprofile-partial-training (is that default on? it > probably should ...). Can you please try training with the rate data > instead of train > to rule out a mismatch? With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer master) goes down from 66% to 54%. So far I did not find a way to easily train with the reference run (when I add "train_with = refrate" to the config, I always get "ERROR: The workload specified by train_with MUST be a training workload!")
[Bug tree-optimization/110176] [11/12/13/14 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176 --- Comment #10 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:22dbfbe8767ff4c1d93e39f68ec7c2d5b1358beb commit r14-8658-g22dbfbe8767ff4c1d93e39f68ec7c2d5b1358beb Author: Richard Biener Date: Wed Jan 31 14:40:24 2024 +0100 middle-end/110176 - wrong zext (bool) <= (int) 4294967295u folding The following fixes a wrong pattern that didn't match the behavior of the original fold_widened_comparison in that get_unwidened returned a constant always in the wider type. But here we're using (int) 4294967295u without the conversion applied. Fixed by doing as earlier in the pattern - matching constants only if the conversion was actually applied. PR middle-end/110176 * match.pd (zext (bool) <= (int) 4294967295u): Make sure to match INTEGER_CST only without outstanding conversion. * gcc.dg/torture/pr110176.c: New testcase.
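To illustrate the comparison named in the commit title, here is a small sketch. It is not the reduced testcase from the PR (that lives in gcc.dg/torture/pr110176.c); the function name is hypothetical, and it assumes a 32-bit int plus GCC's documented modulo behaviour for out-of-range unsigned-to-signed conversion, under which `(int) 4294967295u` is -1.

```c
#include <stdbool.h>

/* Hypothetical illustration: a bool widened to int is 0 or 1, while
   (int) 4294967295u is -1 under the assumptions stated above.  The
   comparison is therefore false for both values of b.  Treating the
   constant as still being the unsigned 4294967295 would instead make
   it look always true. */
static bool bool_le_converted_constant(bool b)
{
    return (int) b <= (int) 4294967295u;
}
```

The point of the sketch is only that the answer depends on whether the conversion to int has been applied to the constant, which is the distinction the commit message describes.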
[Bug regression/113672] [14 Regression] FAIL: g++.dg/pch/line-map-3.C -g -I. -Dwith_PCH (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113672 Lewis Hyatt changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED CC||lhyatt at gcc dot gnu.org --- Comment #1 from Lewis Hyatt --- Thanks, yes the test is problematic because the warnings it looks for are platform dependent. There is a patch to address it here: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644487.html. It's being tracked at the original PR, so marking as a dupe of that one. *** This bug has been marked as a duplicate of bug 105608 ***
[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644 --- Comment #9 from GCC Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:a73421bcf301911f2cbdb1c58316ddf3473ea6d5 commit r14-8659-ga73421bcf301911f2cbdb1c58316ddf3473ea6d5 Author: Tamar Christina Date: Wed Jan 31 14:44:35 2024 + libsanitizer: Sync fixes for asan interceptors from upstream This cherry-picks and squashes the differences between commits d3e5c20ab846303874a2a25e5877c72271fc798b..76e1e45922e6709392fb82aac44bebe3dbc2ea63 from LLVM upstream from compiler-rt/lib/hwasan/ to GCC on the changes relevant for GCC. This is required to fix the linked PR. As mentioned in the PR the last sync brought in a bug from upstream[1] where operations became non-recoverable and as such the tests in AArch64 started failing. This cherry picks the fix and there are minor updates needed to GCC after this to fix the cases. [1] https://github.com/llvm/llvm-project/pull/74000 PR sanitizer/112644 Cherry-pick llvm-project revision 672b71cc1003533460a82f06b7d24fbdc02ffd58, 5fcf3bbb1acfe226572474636714ede86fffcce8, 3bded112d02632209bd55fb28c6c5c234c23dec3 and 76e1e45922e6709392fb82aac44bebe3dbc2ea63.
[Bug tree-optimization/110176] [11/12/13 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176 Richard Biener changed: What|Removed |Added Known to work||14.0 Summary|[11/12/13/14 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446 |[11/12/13 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446 --- Comment #11 from Richard Biener --- Fixed on trunk so far.
[Bug preprocessor/105608] [11/12/13/14 Regression] ICE: in linemap_add with a really long defined macro on the command line r11-338-g2a0225e47868fbfc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105608 Lewis Hyatt changed: What|Removed |Added CC||danglin at gcc dot gnu.org --- Comment #14 from Lewis Hyatt --- *** Bug 113672 has been marked as a duplicate of this bug. ***
[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644 --- Comment #10 from GCC Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:0debaceb11dee9781f9a8b320cb5893836324878 commit r14-8660-g0debaceb11dee9781f9a8b320cb5893836324878 Author: Tamar Christina Date: Wed Jan 31 14:50:33 2024 + hwasan: instrument new memory and string functions [PR112644] Recent libhwasan updates[1] intercept various string and memory functions. These functions have checking in them, which means there's no need to inline the checking. This patch marks said functions as intercepted, and adjusts a testcase to handle the difference. It also looks for HWASAN in a check in expand_builtin. This check originally is there to avoid using expand to inline the behaviour of builtins like memset which are intercepted by ASAN and hence which we rely on the function call staying as a function call. With the new reliance on function calls in HWASAN we need to do the same thing for HWASAN too. HWASAN and ASAN don't seem to however instrument the same functions. Looking into libsanitizer/sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc it looks like the common ones are memset, memmove and memcpy. The rest of the routines for asan seem to be defined in compiler-rt/lib/asan/asan_interceptors.h however compiler-rt/lib/hwasan/ does not have such a file but it does have compiler-rt/lib/hwasan/hwasan_platform_interceptors.h which it looks like is forcing off everything but memset, memmove, memcpy, memcmp and bcmp. As such I've taken those as the final list that hwasan currently supports. This also means that on future updates this list should be cross checked. [1] https://discourse.llvm.org/t/hwasan-question-about-the-recent-interceptors-being-added/75351 gcc/ChangeLog: PR sanitizer/112644 * asan.h (asan_intercepted_p): Intercept memset, memmove, memcpy and memcmp. * builtins.cc (expand_builtin): Include HWASAN when checking for builtin inlining. gcc/testsuite/ChangeLog: PR sanitizer/112644 * c-c++-common/hwasan/builtin-special-handling.c: Update testcase. Co-Authored-By: Matthew Malcomson
[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644 --- Comment #11 from GCC Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:0a640455928a050315f6addd88ace5d945eba130 commit r14-8661-g0a640455928a050315f6addd88ace5d945eba130 Author: Tamar Christina Date: Wed Jan 31 14:51:36 2024 + hwasan: Remove testsuite check for a complaint message [PR112644] With recent updates to hwasan runtime libraries, the error reporting for this particular check has been reworked. I would question why it has lost this message. To me it looks strange that num_descriptions_printed is incremented whenever we call PrintHeapOrGlobalCandidate whether that function prints anything or not. (See PrintAddressDescription in libsanitizer/hwasan/hwasan_report.cpp). The message is no longer printed because we increment this num_descriptions_printed variable indicating that we have found some description. I would like to question this upstream, but it doesn't look that much of a problem and if pressed for time we should just change our testsuite. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. gcc/testsuite/ChangeLog: PR sanitizer/112644 * c-c++-common/hwasan/hwasan-thread-clears-stack.c: Update testcase.
[Bug testsuite/113502] gcc.target/aarch64/vect-early-break-cbranch.c and gcc.target/aarch64/sve/vect-early-break-cbranch.c testcase are too sensitive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113502 --- Comment #2 from GCC Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:f7935beef7b02fbba0adf33fb2ba5c0a27d7e9ff commit r14-8662-gf7935beef7b02fbba0adf33fb2ba5c0a27d7e9ff Author: Tamar Christina Date: Wed Jan 31 14:52:59 2024 + AArch64: relax cbranch tests to accepted inverted branches [PR113502] Recently something in the midend had started inverting the branches by inverting the condition and the branches. While this is fine, it makes it hard to actually test. In RTL I disable scheduling and BB reordering to prevent this. But in GIMPLE there seems to be nothing I can do. __builtin_expect seems to have no impact on the change since I suspect this is happening during expand where conditions can be flipped regardless of probability during compare_and_branch. Since the mid-end has plenty of correctness tests, this weakens the backend tests to just check that a correct looking sequence is emitted. gcc/testsuite/ChangeLog: PR testsuite/113502 * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Ignore exact branch. * gcc.target/aarch64/vect-early-break-cbranch.c: Likewise.
[Bug c++/112580] [14 Regression]: g++.dg/modules/xtreme-header-4_b.C et al; ICE tree check: expected class 'type', have 'declaration'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112580 Patrick Palka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112737 CC||ppalka at gcc dot gnu.org --- Comment #7 from Patrick Palka --- The xtreme-header-{4,5,6} fails need -mx32 (rather than -m32) on x86_64-linux. The error is: .../x86_64-pc-linux-gnu/libstdc++-v3/include/format:3662:28: error: invalid use of non-static data member ‘std::basic_format_args, wchar_t> >::__as_base ::’ 3662 | __arg._M_val = _M_values[__i]; |^ The xtreme-header{,2} fails are also tracked by PR112737, and the error on x86_64-linux is: /src/gcc/testsuite/g++.dg/modules/xtreme-header-2_a.H: error: conflicting global module declaration 'template class _Cont, class _Rg, class ... _Args> using std::ranges::__detail::_DeduceExpr1 = decltype (_Cont<...auto...>(declval<_Rg>(), (declval<_Args>)()...))' ...
[Bug sanitizer/112644] [14 Regression] Some of the hwasan testcase fail after the recent merge
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112644 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #12 from Tamar Christina --- Fixed. Thanks
[Bug testsuite/113502] gcc.target/aarch64/vect-early-break-cbranch.c and gcc.target/aarch64/sve/vect-early-break-cbranch.c testcase are too sensitive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113502 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED CC||tnfchris at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED --- Comment #3 from Tamar Christina --- Fixed, thanks
[Bug c++/112737] [14 Regression] g++.dg/modules/xtreme-header-2_b.C -std=c++2b (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112737 --- Comment #5 from Patrick Palka --- Reduced: $ cat 112737.h template class _Cont> using _DeduceExpr1 = decltype(_Cont{}); $ cat 112737_a.H #include "112737.h" $ cat 112737_b.C import "112737_a.H"; #include "112737.h" $ g++ -fmodules-ts 112737_a.H 112737_b.C In file included from 112737_b.C:2: 112737.h:2:7: error: conflicting declaration of template ‘template class _Cont> using _DeduceExpr1 = decltype (_Cont<...auto...>{})’ 2 | using _DeduceExpr1 = decltype(_Cont{}); | ^~~~ In file included from 112737_a.H:1, of module ./112737_a.H, imported at 112737_b.C:1: 112737.h:2:7: note: previous declaration ‘template class _Cont> using _DeduceExpr1 = decltype (_Cont<...auto...>{})’ 2 | using _DeduceExpr1 = decltype(_Cont{}); | ^~~~
[Bug c++/106052] ICE with -Wmismatched-tags with partially specialized friend struct of self type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106052 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org --- Comment #3 from Marek Polacek --- Started with r10-7424-g04dd734b52de12: commit 04dd734b52de121853e1ea6b3c197a598b294e23 Author: Martin Sebor Date: Fri Mar 27 12:07:45 2020 -0400 c++: avoid -Wredundant-tags on a first declaration in use [PR 93824]
[Bug middle-end/113680] Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113680 --- Comment #3 from Kang-Che Sung --- Oops. I made a typo in the test code. func4() shouldn't have that redundant brace. The corrected example:
```
void func4(int x, int y) {
    int diff = x - y;
    if (diff > 0)
        putchar('>');
    if (x < y)
        putchar('<');
}
```
[Bug c++/113683] New: explicit template instantiation wrongly checks private base class accessibility
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113683 Bug ID: 113683 Summary: explicit template instantiation wrongly checks private base class accessibility Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: schaumb at gmail dot com Target Milestone: --- The compiler wrongly checks that the (private) base class is accessible when explicit template instantiation happens. (standard: C++20) It shouldn't, see https://eel.is/c++draft/temp.spec#general-6 template struct I{}; struct A {}; class B : A { static const B b; }; I need to instantiate the B::b object address with const A*. But this line is failing: template struct I(&B::b)>; // fails on static_cast Simplified "real" example code: https://godbolt.org/z/zj9co5bMh
[Bug c++/112737] [14 Regression] g++.dg/modules/xtreme-header-2_b.C -std=c++2b (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112737 --- Comment #6 from Patrick Palka --- Ah, this seems to be a general declaration matching issue not specific to modules. Here's a non-modules testcase: template class TT, class T> decltype(TT{T()}) f(); // #1 template class TT, class T> decltype(TT{T()}) f(); // #2, should be considered a redeclaration of #1 template struct A { A(T); }; int main() { f(); // ambiguity error } We (wrongly?) consider the return types of the two f's to be different, because the CTAD placeholders refer to different TEMPLATE_DECLs (of a logically equivalent ttp) and structural_comptypes uses pointer identity here. Perhaps we need to relax structural_comptypes in this case.
[Bug modula2/111627] modula2: Excess test fails with a case-preserving-case-insensitive source tree.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111627 --- Comment #3 from GCC Commits --- The master branch has been updated by Gaius Mulley : https://gcc.gnu.org/g:4fd094835a8997cdcc3d18d7d297debe1527202d commit r14-8663-g4fd094835a8997cdcc3d18d7d297debe1527202d Author: Gaius Mulley Date: Wed Jan 31 15:44:32 2024 + PR modula2/111627 Excess test fails with a case-preserving-case-insensitive source tree This patch renames gm2 testsuite modules whose names conflict with library modules. The conflict is not seen on case preserving case sensitive file systems. gcc/testsuite/ChangeLog: PR modula2/111627 * gm2/pim/pass/stdio.mod: Moved to... * gm2/pim/pass/teststdio.mod: ...here. * gm2/pim/run/pass/builtins.mod: Moved to... * gm2/pim/run/pass/testbuiltins.mod: ...here. * gm2/pim/run/pass/math.mod: Moved to... * gm2/pim/run/pass/testmath.mod: ...here. * gm2/pim/run/pass/math2.mod: Moved to... * gm2/pim/run/pass/testmath2.mod: ...here. Signed-off-by: Gaius Mulley
[Bug c++/112737] [14 Regression] g++.dg/modules/xtreme-header-2_b.C -std=c++2b (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112737 Patrick Palka changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ppalka at gcc dot gnu.org --- Comment #7 from Patrick Palka --- (In reply to Patrick Palka from comment #6) > Perhaps we need to relax structural_comptypes in this case. I guess I can submit a patch to that effect.
[Bug target/113679] long long minus double with gcc -m32 produces different results than other compilers or gcc -m64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679 --- Comment #11 from Jakub Jelinek --- Anyway, seems clang is buggy: clang -O2 -m32 -mno-sse -mfpmath=387 -fexcess-precision=standard

#include <float.h>
int main ()
{
#if FLT_EVAL_METHOD == 2 && LDBL_MANT_DIG == 64 && DBL_MANT_DIG == 53
  if ((double) 191.18831051580915 == 191.18831051580915)
    __builtin_abort ();
#endif
}

should always succeed, because if FLT_EVAL_METHOD is 2, it ought to be evaluated as (long double) (double) 191.18831051580915L == 191.18831051580915L and (double) 191.18831051580915L is 0x1.7e606a3c65c95p+7 while 191.18831051580915L is 0x1.7e606a3c65c9503ap+7L, so they aren't equal.
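The rounding step described above can be checked directly without depending on FLT_EVAL_METHOD, by spelling out the long double arithmetic. This is only a sketch: the function name is hypothetical, and the result depends on the target's long double format (it is expected to be true wherever long double has more mantissa bits than double, such as x87 extended precision, and false where the two types share a format).

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether rounding the long double literal
   to double and widening it back changes the value. */
static bool rounding_to_double_changes_value(void)
{
    long double wide = 191.18831051580915L;
    /* volatile forces a real store, so the value is rounded to double
       even if the compiler keeps excess precision in registers. */
    volatile double narrow = (double) wide;
    return (long double) narrow != wide;
}
```

On a target with a 64-bit long double mantissa this corresponds to the two hex values quoted above differing in their trailing bits.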