[Bug c++/109884] New: __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

Bug ID: 109884
   Summary: __builtin_Xq returns _Float128 instead of __float128
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

#include 
#include 
#include 
#include 
#include 

template <typename Type>
inline std::string nameof()
{
 return boost::core::demangle(typeid(Type).name());
}

int main()
{
 std::cout << nameof() << std::endl;
 std::cout << nameof() << std::endl;
 std::cout << nameof() << std::endl;
}

Compiled with GCC 13, this prints the incorrect type:
_Float128
_Float128
_Float128
Compiled with GCC 12 or older, it prints the correct type:
__float128
__float128
__float128

regards
Gero

[Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's

2023-05-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
             Target|                            |sh*

--- Comment #2 from Richard Biener  ---
It looks like the target cannot do arbitrary constant shifts so it benefits
from shifting incrementally.  Even if that is exposed early enough for CSE the
optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE here.

I'm not sure if there are other targets affected but this is a "global"
optimization problem which for example also affects optimal power expansion.

Generally strength-reduction techniques apply to improve these kinds of
things, possibly in a machine-dependent pass.

The regression was likely introduced when merging the shifts at the GIMPLE
level without considering the uses of the intermediate values (after the
transform the values can be computed in parallel since the dependency chains
are shortened).
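
To make the trade-off described in this comment concrete, here is a minimal
C++ sketch. It is not GCC code and not SH-specific; the names Shifts,
shifts_independent, shifts_incremental and same are made up for illustration.
Both functions compute the same four values: the incremental form reuses each
result with one extra single-bit shift, while the independent form has shorter
dependency chains but needs a different constant shift for every value.

```cpp
#include <cstdint>

// Hypothetical illustration only: four shifted copies of one input.
struct Shifts {
  std::uint32_t by10, by11, by12, by13;
};

// Each value is computed directly from x: short dependency chains, but every
// shift uses a different constant amount.
constexpr Shifts shifts_independent(std::uint32_t x) {
  return Shifts{x << 10, x << 11, x << 12, x << 13};
}

// Each value is derived from the previous one by a single-bit shift: the
// intermediate results are shared, at the cost of a longer dependency chain.
constexpr Shifts shifts_incremental(std::uint32_t x) {
  const std::uint32_t by10 = x << 10;
  const std::uint32_t by11 = by10 << 1;
  const std::uint32_t by12 = by11 << 1;
  const std::uint32_t by13 = by12 << 1;
  return Shifts{by10, by11, by12, by13};
}

// Field-by-field comparison of two results.
constexpr bool same(const Shifts& a, const Shifts& b) {
  return a.by10 == b.by10 && a.by11 == b.by11 &&
         a.by12 == b.by12 && a.by13 == b.by13;
}
```

Which form is cheaper depends on the target; the sketch only shows that the
two are interchangeable as far as the computed values go.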

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #1 from Andrew Pinski  ---
I think this is expected behavior now.

[Bug c++/109877] Support for clang-style attributes is needed to parse Darwin SDK headers properly

2023-05-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109877

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |*-darwin
            Version|unknown                     |14.0

--- Comment #7 from Richard Biener  ---
can we fixinclude the headers?

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #2 from Andrew Pinski  ---
_Float128 is the standard specified way of defining these types in c++23 IIRC.

[Bug libgcc/109712] Segmentation fault in linear_search_fdes

2023-05-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109712

--- Comment #9 from Richard Biener  ---
Yes, using a newer libgcc_s.so.1 or libstdc++.so.6 should work fine - again,
unless we end up with mixing static/dynamic parts of the unwinder of different
versions.

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #3 from g.peterh...@t-online.de ---
But these are different types (even if they are mathematically/behaviorally
equivalent)
std::is_same_v<__float128, _Float128> --> false
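
A hedged sketch of how the reported difference could be probed at compile
time. It assumes GCC on x86-64, where __builtin_fabsq is a backend builtin;
on other compilers or targets the sketch deliberately reports "unknown"
instead of guessing. The enum FabsqResult and the function fabsq_result are
hypothetical names, not part of GCC or of the original report.

```cpp
#include <type_traits>

// Possible outcomes of the probe (hypothetical names).
enum class FabsqResult { unknown, is_float128, is_other_type };

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
// decltype is an unevaluated context, so the builtin is never called;
// only its declared return type is inspected.
constexpr FabsqResult fabsq_result() {
  return std::is_same<decltype(__builtin_fabsq(static_cast<__float128>(0))),
                      __float128>::value
             ? FabsqResult::is_float128
             : FabsqResult::is_other_type;
}
#else
// Assumption: elsewhere the builtin may be missing or may be a macro,
// so nothing is probed.
constexpr FabsqResult fabsq_result() { return FabsqResult::unknown; }
#endif
```

Code that depends on the exact type could static_assert on
fabsq_result() == FabsqResult::is_float128 to catch the change early.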

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #4 from Andrew Pinski  ---
OK. And?
Q specifies the _Float128 type now.

I don't think we had any abi guarantees on the builtins nor on the q literals.

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #5 from Jonathan Wakely  ---
This changed with r13-2887 when adding _Float128 to C++

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #41 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #40)
> Created attachment 55094 [details]
> gcc14-bitint-wip.patch
> 
> So, on IRC we've agreed with Richi that given the limits we have in the
> compiler (what wide_int/widest_int can represent at most without making the
> types have optional arbitrary length indirect payload, what INTEGER_CST can
> handle (right now 255 64-bit limbs) and TYPE_PRECISION limitation (max 65535
> precision)) it would be best to first try to implement _BitInt support with
> small BITINT_MAXWIDTH (in particular, what fits into wide_int, which is e.g.
> on x86_64 575 bits) and only when the implementation of that is complete,
> attempt to lift up some of the limits (start with the wide_int/widest_int
> one, INTEGER_CST could be handled by bumping the 2 counters from 8-bit to
> 16-bit and killing the cache, with that we'd be at 65535 as BITINT_MAXWIDTH
> and whether we'd want to grow it further is a question).
> 
> This patch implements some WIP, as the testcases show, it can already do
> something, but doesn't have any of the argument/return value passing code
> implemented, nor middle-end needed changes (promoting as much as possible to
> small INTEGER_TYPEs early for small BITINT_TYPEs and adding a lowering pass
> which will turn the larger ones into loops etc.).  Also, wb/uwb constants
> aren't really done yet.

Another idea is to have a large BITINT_MAXWIDTH (up to what TYPE_PRECISION
supports) but restrict constant folding to the cases we can represent in
INTEGER_CST.  For the cases where the language requires constant evaluation
we'd then sorry ().  I think we should be able to handle all-ones
encoded and since constant initializers are restricted it should handle
most practical cases already.
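
As a worked illustration of the limits quoted above (255 64-bit limbs for
INTEGER_CST, 65535 for TYPE_PRECISION, 575 bits for wide_int on x86_64), the
following sketch computes how many limbs a given precision needs and whether
a constant of that precision would fit the current INTEGER_CST limit. The
constant and function names are hypothetical and are not GCC identifiers;
the sketch also ignores any extra limb an implementation might reserve.

```cpp
// Figures quoted in the thread; the names are made up for this sketch.
constexpr unsigned kLimbBits = 64;             // width of one limb
constexpr unsigned kMaxIntegerCstLimbs = 255;  // current INTEGER_CST limit
constexpr unsigned kMaxTypePrecision = 65535;  // TYPE_PRECISION limit

// Number of 64-bit limbs needed to hold a value of the given precision
// (rounded up).
constexpr unsigned limbs_for_precision(unsigned precision) {
  return (precision + kLimbBits - 1) / kLimbBits;
}

// True if a constant of this precision fits within the INTEGER_CST limit.
constexpr bool fits_integer_cst(unsigned precision) {
  return limbs_for_precision(precision) <= kMaxIntegerCstLimbs;
}
```

Under these assumptions constants of up to 255 * 64 = 16320 bits would be
representable, while a constant at the TYPE_PRECISION limit would need 1024
limbs and would fall into the sorry () case.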

[Bug debug/109805] LTO affecting -fdebug-prefix-map

2023-05-17 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109805

--- Comment #13 from rguenther at suse dot de  ---
On Tue, 16 May 2023, sergiodj at sergiodj dot net wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109805
>
> --- Comment #12 from Sergio Durigan Junior  ---
> Sorry, I have been busy with other things, but I'm paying attention to the
> developments here.
>
> I still have to test the workaround I suggested (passing -fdebug-prefix-map to
> LDFLAGS) more broadly, because I think I may have found at least one scenario
> where it doesn't work.  Something else that's puzzling me is the fact that I
> don't see this behaviour everywhere; some packages do have the expected
> DW_AT_comp_dir even after being compiled with LTO enabled.

Yeah, it's clearly odd and we lack testsuite coverage completely.
Having small testcases that show cases that work and cases that do not
would be very useful in understanding the bits and how they do
(not) work together properly.

[Bug c++/109877] Support for clang-style attributes is needed to parse Darwin SDK headers properly

2023-05-17 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109877

--- Comment #8 from Iain Sandoe  ---
(In reply to Richard Biener from comment #7)
> can we fixinclude the headers?

1. not yet (PR105719) - although let us hope we can find a way to do that for
more limited cases (I've implemented the consumer code, but the generation and
install side is more work).

2. In any event, (especially for 'availability') that would be a huge job
(essentially re-writing a significant bunch of framework and /usr/include
cases), and keeping up with frequent Xcode / SDK updates would be quite a
maintenance burden***.

3. It does not help our downstream to use other projects which make use of
these features (in non-SDK sources).



*** other options considered:

for "closed" SDKs for system versions out of vendor support, I suppose we could
just have a script that sed'ed the headers into a replacement, but that is
still some machinery to implement.

It would be nice to have an open SDK - but that is a huge project in its own
right, and likewise would need someone with time to maintain the bleeding edge
version

[Bug tree-optimization/109885] New: gcc does not generate movmskps and testps instructions (clang does)

2023-05-17 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Bug ID: 109885
   Summary: gcc does not generate movmskps and testps instructions
 (clang does)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

in this simple code (on avx2)

int sum(float const * x) {
   int ret = 0;
   for (int i=0; i<8; ++i) ret +=(0==x[i]);
   return ret;
}

int one(float const * x) {
   int ret = 0;
   for (int i=0; i<8; ++i) ret |=(0==x[i]);
   return ret;
}

int all(float const * x) {
   int ret = 1;
   for (int i=0; i<8; ++i) ret &=(0==x[i]);
   return ret;
}

clang uses movmskps and testps instructions, gcc does not

see for instance

https://godbolt.org/z/r11r8xoYz
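
For readers unfamiliar with the instructions: movmskps collects the sign bit
of each float lane into an integer bitmask, so after a vector comparison all
three reductions above can be answered from one mask. The following portable
scalar sketch (no intrinsics) shows that formulation; it is not what either
compiler emits, and the names zero_mask, count_bits, sum_via_mask,
one_via_mask and all_via_mask are made up for illustration.

```cpp
// Bit i of the result is set when x[i] == 0. This models the per-lane mask
// that a compare followed by a movmskps-style extraction would yield.
constexpr unsigned zero_mask(const float (&x)[8]) {
  unsigned mask = 0;
  for (int i = 0; i < 8; ++i)
    mask |= static_cast<unsigned>(x[i] == 0.0f) << i;
  return mask;
}

// Count set bits by clearing the lowest set bit each iteration, so that no
// C++20 <bit> header is required.
constexpr int count_bits(unsigned m) {
  int n = 0;
  for (; m != 0; m &= m - 1) ++n;
  return n;
}

// sum: number of zero elements is the population count of the mask.
constexpr int sum_via_mask(const float (&x)[8]) { return count_bits(zero_mask(x)); }

// one: at least one element is zero if any mask bit is set.
constexpr int one_via_mask(const float (&x)[8]) { return zero_mask(x) != 0 ? 1 : 0; }

// all: every element is zero if all eight mask bits are set.
constexpr int all_via_mask(const float (&x)[8]) { return zero_mask(x) == 0xFFu ? 1 : 0; }
```

The "any" and "all" tests reduce to comparing the mask against 0 and 0xFF,
which is the kind of check a test-style instruction can perform directly.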

[Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|tree-optimization           |target
           Severity|normal                      |enhancement

[Bug c++/100052] [11/12/13/14 regression] ICE in compiling g++.dg/modules/xtreme-header-3_b.C after r11-8118

2023-05-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100052

--- Comment #13 from Jiu Fu Guo  ---
Pass on trunk, gcc-12, gcc-11 for xtreme-header-* cases:

make check-gcc-c++ RUNTESTFLAGS="--target_board=unix'{-m64}'
modules.exp=xtreme-header-*" 
=== g++ Summary ===

# of expected passes            72

[Bug c++/101853] [12/13/14 Regression] g++.dg/modules/xtreme-header-5_b.C ICE

2023-05-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101853

--- Comment #14 from Jiu Fu Guo  ---
Pass on trunk, gcc-12, gcc-11 for xtreme-header-* cases:

make check-gcc-c++ RUNTESTFLAGS="--target_board=unix'{-m64}'
modules.exp=xtreme-header-*" 
=== g++ Summary ===

# of expected passes            72

[Bug tree-optimization/109868] [13/14 regression] ICE: segmentation fault or ICE in min_value with zero sized bitfield

2023-05-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109868

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:78327cf06e6b65fc9c614622c98f6a3f3bfb7784

commit r14-927-g78327cf06e6b65fc9c614622c98f6a3f3bfb7784
Author: Jakub Jelinek 
Date:   Wed May 17 10:15:50 2023 +0200

c++: Don't try to initialize zero width bitfields in zero initialization
[PR109868]

My GCC 12 change to avoid removing zero-sized bitfields as they are
important for ABI and are needed for layout compatibility traits
apparently causes zero sized bitfields to be initialized in the IL,
which at least in 13+ results in ICEs in the ranger which is upset
about zero precision types.

I think we could even avoid initializing other unnamed bitfields, but
unfortunately !CONSTRUCTOR_NO_CLEARING doesn't mean in the middle-end
clearing of padding bits and until we have some new flag that represents
the request to clear padding bits, I think it is better to keep zeroing
non-zero sized unnamed bitfields.

In addition to skipping those fields, I have changed the logic how
UNION_TYPEs are handled, the current code was a little bit weird in that
e.g. if first non-static data member had error_mark_node type, we'd happily
zero initialize the second non-static data member, etc.

2023-05-17  Jakub Jelinek  

PR c++/109868
* init.cc (build_zero_init_1): Don't initialize zero-width
bitfields.
For unions only initialize the first FIELD_DECL.

* g++.dg/init/pr109868.C: New test.

[Bug sanitizer/109882] sanitizer/common_interface_defs.h bogusly defines __has_feature

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109882

--- Comment #5 from Jonathan Wakely  ---
Libstdc++ itself does this:

#if __SANITIZE_THREAD__
#  define _GLIBCXX_TSAN 1
#elif defined __has_feature
# if __has_feature(thread_sanitizer)
#  define _GLIBCXX_TSAN 1
# endif
#endif

The sanitizers could do something similar, although it looks like they don't
actually need to. The only use of __has_feature in the public API is in
asan_interface.h and that could easily be replaced. Then __has_feature can be
redefined in the internal headers, which (I assume) aren't meant to be included
by user code.

Something like this (untested):

diff --git a/libsanitizer/include/sanitizer/asan_interface.h
b/libsanitizer/include/sanitizer/asan_interface.h
index 9bff21c117b..186269ad694 100644
--- a/libsanitizer/include/sanitizer/asan_interface.h
+++ b/libsanitizer/include/sanitizer/asan_interface.h
@@ -48,7 +48,15 @@ void __asan_poison_memory_region(void const volatile *addr,
size_t size);
 void __asan_unpoison_memory_region(void const volatile *addr, size_t size);

 // Macros provided for convenience.
-#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
+#ifdef __has_feature
+#if __has_feature(address_sanitizer)
+#define ASAN_DEFINE_REGION_MACROS
+#endif
+#elif defined(__SANITIZE_ADDRESS__)
+#define ASAN_DEFINE_REGION_MACROS
+#endif
+
+#ifdef ASAN_DEFINE_REGION_MACROS
 /// Marks a memory region as unaddressable.
 ///
 /// \note Macro provided for convenience; defined as a no-op if ASan is not
@@ -74,6 +82,7 @@ void __asan_unpoison_memory_region(void const volatile *addr,
size_t size);
 #define ASAN_UNPOISON_MEMORY_REGION(addr, size) \
   ((void)(addr), (void)(size))
 #endif
+#undef ASAN_DEFINE_REGION_MACROS

 /// Checks if an address is poisoned.
 ///
diff --git a/libsanitizer/include/sanitizer/common_interface_defs.h
b/libsanitizer/include/sanitizer/common_interface_defs.h
index 2f415bd9e85..2f9c83ef74e 100644
--- a/libsanitizer/include/sanitizer/common_interface_defs.h
+++ b/libsanitizer/include/sanitizer/common_interface_defs.h
@@ -15,11 +15,6 @@
 #include <stddef.h>
 #include <stdint.h>

-// GCC does not understand __has_feature.
-#if !defined(__has_feature)
-#define __has_feature(x) 0
-#endif
-
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
index 98186c429e9..7574dce7f4a 100644
--- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
+++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
@@ -14,6 +14,11 @@

 #include "sanitizer_platform.h"

+// GCC does not understand __has_feature.
+#if !defined(__has_feature)
+#define __has_feature(x) 0
+#endif
+
 #ifndef SANITIZER_DEBUG
 # define SANITIZER_DEBUG 0
 #endif

[Bug sanitizer/109882] sanitizer/common_interface_defs.h bogusly defines __has_feature

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109882

--- Comment #6 from Jakub Jelinek  ---
Looks ok to me.  Now how to convince upstream to apply this? (Or we could keep
it as LOCAL_PATCHES.)

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Note, these builtins aren't standard builtins, but backend registered ones:
grep '"__builtin_[a-z]*q["=]' gcc/config/*/* 2>/dev/null
gcc/config/i386/i386-builtins.cc:  def_builtin_const (0, 0, "__builtin_infq",
gcc/config/i386/i386-builtins.cc:  decl = add_builtin_function ("__builtin_nanq", ftype, IX86_BUILTIN_NANQ,
gcc/config/i386/i386-builtins.cc:  decl = add_builtin_function ("__builtin_nansq", ftype, IX86_BUILTIN_NANSQ,
gcc/config/i386/i386-builtins.cc:  decl = add_builtin_function ("__builtin_fabsq", ftype, IX86_BUILTIN_FABSQ,
gcc/config/i386/i386-builtins.cc:  decl = add_builtin_function ("__builtin_copysignq", ftype,
gcc/config/ia64/ia64.cc:  decl = add_builtin_function ("__builtin_infq", ftype,
gcc/config/ia64/ia64.cc:  decl = add_builtin_function ("__builtin_nanq", ftype,
gcc/config/ia64/ia64.cc:  decl = add_builtin_function ("__builtin_nansq", ftype,
gcc/config/ia64/ia64.cc:  decl = add_builtin_function ("__builtin_fabsq", ftype,
gcc/config/ia64/ia64.cc:  decl = add_builtin_function ("__builtin_copysignq", ftype,
gcc/config/pa/pa.cc:  decl = add_builtin_function ("__builtin_fabsq", ftype,
gcc/config/pa/pa.cc:  decl = add_builtin_function ("__builtin_copysignq", ftype,
gcc/config/pa/pa.cc:  decl = add_builtin_function ("__builtin_infq", ftype,
gcc/config/rs6000/rs6000-c.cc:  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
gcc/config/rs6000/rs6000-c.cc:  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
gcc/config/rs6000/rs6000-c.cc:  builtin_define ("__builtin_nanq=__builtin_nanf128");
gcc/config/rs6000/rs6000-c.cc:  builtin_define ("__builtin_nansq=__builtin_nansf128");
gcc/config/rs6000/rs6000-c.cc:  builtin_define ("__builtin_infq=__builtin_inff128");
and have been that way before as well.  Given how they are defined on rs6000,
at least there because it is just a macro for the f128 suffixed ones it really
has to return _Float128.

[Bug target/109811] libxjl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-05-17

--- Comment #4 from Jan Hubicka  ---
Confirmed. LTO is not necessary to reproduce the difference.

I got libjxl and the test JPEG file from the Phoronix test suite and configured
the clang build with:

CC=clang CXX=clang++ CFLAGS="-O3 -g -march=native -fno-exceptions"
CXXFLAGS="-O3 -g -march=native -fno-exceptions" cmake
-DCMAKE_C_FLAGS_RELEASE="$CFLAGS -DNDEBUG" -DCMAKE_CXX_FLAGS_RELEASE="$CXXFLAGS
-DNDEBUG" -DBUILD_TESTING=OFF ..

and the GCC build with:

CFLAGS="-O3 -g -march=native -fno-exceptions" CXXFLAGS="-O3 -g
-march=native -fno-exceptions" cmake -DCMAKE_C_FLAGS_RELEASE="$CFLAGS -DNDEBUG"
-DCMAKE_CXX_FLAGS_RELEASE="$CXXFLAGS -DNDEBUG" -DBUILD_TESTING=OFF ..

jh@ryzen3:~/.phoronix-test-suite/installed-tests/pts/jpegxl-1.5.0>
./libjxl-0.7.0/build-gcc/tools/cjxl sample-photo-6000x4000.JPG --quality=90
--lossless_jpeg=0
JPEG XL encoder v0.7.0 [AVX2]
No output file specified.
Encoding will be performed, but the result will be discarded.
Read 6000x4000 image, 7837694 bytes, 926.0 MP/s
Encoding [Container | VarDCT, d1.000, effort: 7 | 29424-byte Exif], 
Compressed to 2288431 bytes including container (0.763 bpp).
6000 x 4000, 11.12 MP/s [11.12, 11.12], 1 reps, 16 threads.
jh@ryzen3:~/.phoronix-test-suite/installed-tests/pts/jpegxl-1.5.0>
./libjxl-0.7.0/build-gcc/tools/cjxl sample-photo-6000x4000.JPG --quality=90
--lossless_jpeg=0 test
JPEG XL encoder v0.7.0 [AVX2]
Read 6000x4000 image, 7837694 bytes, 926.5 MP/s
Encoding [Container | VarDCT, d1.000, effort: 7 | 29424-byte Exif], 
Compressed to 2288431 bytes including container (0.763 bpp).
6000 x 4000, 11.09 MP/s [11.09, 11.09], 1 reps, 16 threads.
jh@ryzen3:~/.phoronix-test-suite/installed-tests/pts/jpegxl-1.5.0>
./libjxl-0.7.0/build-gcc/tools/cjxl sample-photo-6000x4000.JPG --quality=90
--lossless_jpeg=0 test
JPEG XL encoder v0.7.0 [AVX2]
Read 6000x4000 image, 7837694 bytes, 925.6 MP/s
Encoding [Container | VarDCT, d1.000, effort: 7 | 29424-byte Exif], 
Compressed to 2288431 bytes including container (0.763 bpp).
6000 x 4000, 11.12 MP/s [11.12, 11.12], 1 reps, 16 threads.
jh@ryzen3:~/.phoronix-test-suite/installed-tests/pts/jpegxl-1.5.0>
./libjxl-0.7.0/build-clang/tools/cjxl sample-photo-6000x4000.JPG --quality=90
--lossless_jpeg=0 test
JPEG XL encoder v0.7.0 [AVX2]
Read 6000x4000 image, 7837694 bytes, 924.6 MP/s
Encoding [Container | VarDCT, d1.000, effort: 7 | 29424-byte Exif], 
Compressed to 2288430 bytes including container (0.763 bpp).
6000 x 4000, 15.17 MP/s [15.17, 15.17], 1 reps, 16 threads.
jh@ryzen3:~/.phoronix-test-suite/installed-tests/pts/jpegxl-1.5.0>
./libjxl-0.7.0/build-clang/tools/cjxl sample-photo-6000x4000.JPG --quality=90
--lossless_jpeg=0 test
JPEG XL encoder v0.7.0 [AVX2]
Read 6000x4000 image, 7837694 bytes, 922.4 MP/s
Encoding [Container | VarDCT, d1.000, effort: 7 | 29424-byte Exif], 
Compressed to 2288430 bytes including container (0.763 bpp).
6000 x 4000, 15.18 MP/s [15.18, 15.18], 1 reps, 16 threads.


So GCC does about 11 MP/s while clang does about 15 MP/s.

[Bug tree-optimization/109759] UBSAN error: shift exponent 64 is too large for 64-bit type 'long unsigned int'

2023-05-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109759

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Martin Jambor  ---
This test now passes with a UBSAN-instrumented compiler, so it is probably
indeed a dup.

*** This bug has been marked as a duplicate of bug 109788 ***

[Bug fortran/109788] [14 Regression] gcc/hwint.h:293:61: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int since r14-377-gc92b8be9b52b7e

2023-05-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109788

--- Comment #13 from Martin Jambor  ---
*** Bug 109759 has been marked as a duplicate of this bug. ***

[Bug other/63426] [meta-bug] Issues found with -fsanitize=undefined

2023-05-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63426
Bug 63426 depends on bug 109759, which changed state.

Bug 109759 Summary: UBSAN error: shift exponent 64 is too large for 64-bit type 
'long unsigned int'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109759

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

[Bug ipa/109886] New: UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c

2023-05-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886

Bug ID: 109886
   Summary: UBSAN error: shift exponent 64 is too large for 64-bit
type when compiling gcc.c-torture/compile/pr96796.c
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: aldyh at gcc dot gnu.org, marxin at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

Bootstrapping with the undefined behavior sanitizer and subsequently running
the testsuite (on revision ac3a5bbc629, so the check for PR 109788 is
included) reports a new error when compiling the C torture testcase
gcc/testsuite/gcc.c-torture/compile/pr96796.c:

$ UBSAN_OPTIONS="halt_on_error=1 print_stacktrace=1"
/home/mjambor/gcc/mine/b-obj/gcc/xgcc -B/home/mjambor/gcc/mine/b-obj/gcc/
-fdiagnostics-plain-output -O1 -w -fcommon -c -o pr96796.o
/home/mjambor/gcc/mine/src/gcc/testsuite/gcc.c-torture/compile/pr96796.c

/home/mjambor/gcc/mine/src/gcc/hwint.h:293:61: runtime error: shift exponent 64
is too large for 64-bit type 'long unsigned int'

#0 0xbf8117 in sext_hwi(long, unsigned int)
/home/mjambor/gcc/mine/src/gcc/hwint.h:293
#1 0xbf8117 in wi::hwi_with_prec::hwi_with_prec(long, unsigned int, signop)
/home/mjambor/gcc/mine/src/gcc/wide-int.h:1622
#2 0xbf8117 in wi::shwi(long, unsigned int)
/home/mjambor/gcc/mine/src/gcc/wide-int.h:1631
#3 0xbf8117 in wi::minus_one(unsigned int)
/home/mjambor/gcc/mine/src/gcc/wide-int.h:1645
#4 0xbf8117 in irange::set_varying(tree_node*)
/home/mjambor/gcc/mine/src/gcc/value-range.h:871
#5 0x2257e45 in range_cast(vrange&, tree_node*)
/home/mjambor/gcc/mine/src/gcc/range-op.cc:4860
#6 0x1b119a6 in ipa_compute_jump_functions_for_edge
/home/mjambor/gcc/mine/src/gcc/ipa-prop.cc:2325
#7 0x1b14f66 in ipa_compute_jump_functions_for_bb
/home/mjambor/gcc/mine/src/gcc/ipa-prop.cc:2449
#8 0x1b14f66 in analysis_dom_walker::before_dom_children(basic_block_def*)
/home/mjambor/gcc/mine/src/gcc/ipa-prop.cc:3035
#9 0x65a5ff3 in dom_walker::walk(basic_block_def*)
/home/mjambor/gcc/mine/src/gcc/domwalk.cc:311
#10 0x1b0e601 in ipa_analyze_node(cgraph_node*)
/home/mjambor/gcc/mine/src/gcc/ipa-prop.cc:3103
#11 0x1991487 in inline_indirect_intraprocedural_analysis
/home/mjambor/gcc/mine/src/gcc/ipa-fnsummary.cc:4315
#12 0x1991487 in inline_analyze_function(cgraph_node*)
/home/mjambor/gcc/mine/src/gcc/ipa-fnsummary.cc:4334
#13 0x1991afc in ipa_fn_summary_generate
/home/mjambor/gcc/mine/src/gcc/ipa-fnsummary.cc:4378
#14 0x21351c1 in execute_ipa_summary_passes(ipa_opt_pass_d*)
/home/mjambor/gcc/mine/src/gcc/passes.cc:2304
#15 0x10a2163 in ipa_passes
/home/mjambor/gcc/mine/src/gcc/cgraphunit.cc:2235
[...]
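
The error message reports a shift by 64 on a 64-bit type inside a
sign-extension helper. One way such a shift can arise is a precision of 0,
because a shift amount of (64 - precision) then equals the full word width;
that this is what happens in the trace above is an assumption, not something
the report states. The sketch below uses a hypothetical function sign_extend
(it is not GCC's sext_hwi) to show the hazard and one possible guard.

```cpp
#include <cstdint>

constexpr unsigned kWordBits = 64;

// Sign-extend the low `prec` bits of src to a full 64-bit value.
// A naive version shifts by (kWordBits - prec), which is a shift by 64 when
// prec == 0: undefined behavior for a 64-bit operand.
constexpr std::int64_t sign_extend(std::int64_t src, unsigned prec) {
  // Assumption for this sketch: a zero-precision value is treated as 0.
  if (prec == 0) return 0;
  // Nothing to extend when the value already uses the full word.
  if (prec >= kWordBits) return src;
  const unsigned shift = kWordBits - prec;  // guaranteed to be in 1..63 here
  // Shift left as unsigned to avoid signed overflow, then shift right as
  // signed (arithmetic on GCC and Clang; guaranteed since C++20).
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(src) << shift) >> shift;
}
```

Whether returning 0 for a zero precision is the right policy is a design
decision for the caller; the point is only that the shift amount has to be
checked before it is used.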

[Bug c++/109887] New: Different mangled name for template specialization for clang and gcc

2023-05-17 Thread yedeng.yd at linux dot alibaba.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109887

Bug ID: 109887
   Summary: Different mangled name for template specialization for
clang and gcc
   Product: gcc
   Version: 12.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yedeng.yd at linux dot alibaba.com
  Target Milestone: ---

(This is a duplicate of https://github.com/llvm/llvm-project/issues/62765,
filed in both places since I don't know which compiler is wrong)

Reproducer:

```
#include <type_traits>

namespace llvm {
class StringRef {
public:
StringRef(const char*);

};
template <typename T> class Optional {};
}

namespace n {
struct S {
template <typename T>
std::enable_if_t<std::is_integral<T>::value, llvm::Optional<T>>
get(llvm::StringRef) const {
return {};
}
};

template <>
llvm::Optional<bool>
S::get<bool>(llvm::StringRef) const;
}

void use() {
n::S().get<bool>("hello");
}
```

For the specialization `S::get<bool>(llvm::StringRef)`, gcc will mangle it as:

```
_ZNK1n1S3getIbEENSt9enable_ifIXsrSt11is_integralIT_E5valueEN4llvm8OptionalIS4_EEE4typeENS6_9StringRefE
```

and clang will mangle it as:

```
_ZNK1n1S3getIbEENSt9enable_ifIXsr3std11is_integralIT_EE5valueEN4llvm8OptionalIS3_EEE4typeENS4_9StringRefE
```

Also, c++filt can only recognize the name mangled by gcc, and llvm-cxxfilt can
only recognize the name mangled by clang.

I am not sure whether this is really a bug or is by design, but I think clang
and gcc are trying to be ABI compatible.

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Last reconfirmed|                            |2023-05-17

--- Comment #7 from Jakub Jelinek  ---
Created attachment 55098
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55098&action=edit
gcc14-pr109884.patch

Untested fix.  For ia64, I think it already uses float128t_type_node.
For rs6000, as I wrote, it is more difficult because it doesn't have the
builtins but macros, and in the pa case __float128 is the same as long double.

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #8 from Jakub Jelinek  ---
(In reply to Andrew Pinski from comment #4)
> Q specifies the _Float128 type now.

No, Q suffix specifies __float128 actually.  F128 or f128 specify _Float128.

[Bug target/109811] libxjl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #5 from Jan Hubicka  ---
Also forgot to mention, I used a zen3 machine, so Raptor Lake is not necessary.
Note that the build system appends -O2 after any CFLAGS specified, so it really
is an -O2 build:

# Force build with optimizations in release mode.
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O2")

For Clang other options are appended:

  -fnew-alignment=8
  -fno-cxx-exceptions
  -fno-slp-vectorize
  -fno-vectorize

  -disable-free
  -disable-llvm-verifier


Perf profile mixing both GCC and clang build is:

   8.36%  cjxl  libjxl.so.0.7.0  [.] jxl::(anonymous namespace)::FindTextLikePatches
   5.74%  cjxl  libjxl.so.0.7.0  [.] jxl::FindBestPatchDictionary
   4.51%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::EstimateEntropy
   4.50%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::(anonymous namespace)::TransformFromPixels
   4.25%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::QuantizeBlockAC
   4.10%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::EstimateEntropy
   3.77%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::(anonymous namespace)::TransformFromPixels
   3.46%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::QuantizeBlockAC
   3.08%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::FindBestMultiplier
   3.04%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::FindBestMultiplier
   2.98%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::(anonymous namespace)::DCT1DImpl<8ul, 8ul>::operator()
   2.80%  cjxl  libjxl.so.0.7.0  [.] jxl::ThreadPool::RunCallState const&, jxl::RectT const&, jxl::WeightsSymmetric5 const&, jxl::ThreadPool*, jxl::Plane*)::{l
   2.75%  cjxl  libjxl.so.0.7.0  [.] jxl::ThreadPool::RunCallState const&, jxl::RectT const&, jxl::WeightsSymmetric5 const&, jxl::ThreadPool*, jxl::Plane*)::$_
   2.26%  cjxl  libjxl.so.0.7.0  [.] jxl::ThreadPool::RunCallState const&, float const*, jxl::ThreadPool*, jxl::Image3*)::$_0>::CallDataFunc
   2.00%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::(anonymous namespace)::DCT1DWrapper<4ul, 4ul, jxl::N_AVX2::(anonymous namespace)::DCTFrom, jxl::N_AVX2::(anonymous namespace)::DCTTo>
   1.95%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::(anonymous namespace)::DCT1DImpl<16ul, 8ul>::operator()
   1.68%  cjxl  libjxl.so.0.7.0  [.] jxl::ThreadPool::RunCallState, unsigned long, unsigned long, jxl::ColorEncoding const&, unsigned long, bool, unsigned long, JxlEnd
   1.68%  cjxl  libjxl.so.0.7.0  [.] jxl::N_AVX2::(anonymous namespace)::DCT1DImpl<32ul, 8ul>::operator()

[Bug libgomp/109875] [OpenMP] nteams-var / OMP_NUM_TEAMS → ICV not passed to the device / default value

2023-05-17 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109875

--- Comment #1 from Tobias Burnus  ---
Tested it now also with true offloading.

For AMD GCN, I get:
  host: max_teams: 2
  tgt: max_teams: 3
  num_teams: 120

For nvptx, I get:
  host: max_teams: 2
  tgt: max_teams: 3
  num_teams: 240

And for completeness, for host fallback:
  host: max_teams: 3
  tgt: max_teams: 3
  num_teams: 1

i.e. the ICV is handled correctly. However, the ICV is not honored for the
target region.

By contrast, an explicit 'num_teams(4)' is honored by GCN/nvptx/host fallback
...

 * * *

The spec wording is:
"If the *num_teams* clause is not specified on a construct then the effect is
as if _upper-bound_ was specified as follows. If the value of the nteams-var
ICV is greater than zero, the effect is as if upper-bound was specified to an
implementation-defined value greater than zero but less than or equal to the
value of the nteams-var ICV."

[Bug libgomp/109875] [OpenMP] nteams-var / OMP_NUM_TEAMS → ICV not passed to the device / default value

2023-05-17 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109875

--- Comment #2 from Tobias Burnus  ---
The host-fallback explicitly sets the number of teams to the lower_bound,
if available, and otherwise to 1 - which is fine.

Regarding changing the default from 0 to the actually used number,
the problem is on the device side it is only known at runtime
→ issue with OMP_DISPLAY_ENV.
BTW, the "[host] OMP_NUM_TEAMS = '0'" should be "[all] OMP_NUM_TEAMS = '0'"
for the default.

For the device side, I think we need (untested):

--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -51 +51,3 @@ GOMP_teams4 (unsigned int num_teams_lower, unsigned int
-num_teams_upper = num_workgroups;
+num_teams_upper = ((GOMP_ADDITIONAL_ICVS.nteams > 0
+   && num_workgroups > GOMP_ADDITIONAL_ICVS.nteams)
+  ? GOMP_ADDITIONAL_ICVS.nteams : num_workgroups);
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index f102d7d02d9..125d92a2ea9 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -58 +58,3 @@ GOMP_teams4 (unsigned int num_teams_lower, unsigned int 
-num_teams_upper = num_blocks;
+num_teams_upper = ((GOMP_ADDITIONAL_ICVS.nteams > 0
+   && num_blocks > GOMP_ADDITIONAL_ICVS.nteams)
+  ? GOMP_ADDITIONAL_ICVS.nteams : num_blocks);
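
As a rough illustration of the clamping rule in the untested patch above, here
is a minimal standalone sketch. The helper name default_teams_upper and its
unsigned parameters are assumptions made for this example; it is not libgomp
code and does not touch GOMP_ADDITIONAL_ICVS.

```cpp
#include <cassert>

// Hypothetical helper, not part of libgomp: choose the default upper bound
// for the number of teams when no num_teams clause is given.
//   nteams_icv : value of the nteams-var ICV (0 means "not set")
//   hw_limit   : what the device would otherwise use
//                (num_workgroups on GCN, num_blocks on nvptx in the patch)
inline unsigned default_teams_upper(unsigned nteams_icv, unsigned hw_limit)
{
  // Same condition as in the proposed patch: clamp only when the ICV is set
  // and is smaller than the hardware default.
  return (nteams_icv > 0 && hw_limit > nteams_icv) ? nteams_icv : hw_limit;
}
```

With the numbers reported in comment #1, default_teams_upper(3, 120) gives 3
rather than 120, which is within the range the spec wording allows.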

[Bug libstdc++/109883] Stack Overflow in functions with types

2023-05-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #1 from Xi Ruoyao  ---
Cannot reproduce for me.  Note that in this case GCC optimizes the entire
function call away (see https://godbolt.org/z/968bPTvh9) even with -O0 so I can
see no way how this will lead to a runtime error.

And GCC for aarch64-darwin target (i. e. "macOS 13.3.1 on M1") is not a part of
this project, so are you using another fork?

[Bug libstdc++/109883] Stack Overflow in functions with types

2023-05-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

Xi Ruoyao  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-05-17
 Status|UNCONFIRMED |WAITING

[Bug c++/109887] Different mangled name for template specialization for clang and gcc

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109887

--- Comment #1 from Jonathan Wakely  ---
(In reply to Chuanqi Xu from comment #0)
> _ZNK1n1S3getIbEENSt9enable_ifIXsrSt11is_integralIT_E5valueEN4llvm8OptionalIS4
> _EEE4typeENS6_9StringRefE
> ```
> 
> and clang will mangle it as:
> 
> ```
> _ZNK1n1S3getIbEENSt9enable_ifIXsr3std11is_integralIT_EE5valueEN4llvm8Optional
> IS3_EEE4typeENS4_9StringRefE

The difference is that GCC mangles std::is_integral as St11is_integral and
Clang mangles it as 3std::is_integral. I think GCC is right.

Clang uses St9enable_if for std::enable_if so I don't know why it doesn't use
the St substitution for std::is_integral.

[Bug middle-end/97048] [meta-bug] bogus/missing -Wstringop-overread warnings

2023-05-17 Thread tonyguil at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97048

--- Comment #3 from Tony Guilfoyle  ---
I jumped through enough hoops already, I think. You can take it from 
here if you want.

All the best,

Tony

On 16/05/2023 18:28, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97048
>
> --- Comment #2 from Jonathan Wakely  ---
> Tony, this is just a meta-bug that has links to the real bugs. Please either
> add that as a comment to an existing bug (if it's the same as one of them) or
> file a new bug (and set "Blocks: 97048" so that it links back here). But since
> your one seems to be about -Wstringop-overflow not -Wstringop-overread I don't
> think it is actually related to this meta-bug at all. Maybe it's related to PR
> 97185 instead.
>

[Bug c++/109888] New: GCC 13 Fails to Compile Code with Explicit Constructor for std::array in Template Class

2023-05-17 Thread vincent.lebourlot at starqube dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109888

Bug ID: 109888
   Summary: GCC 13 Fails to Compile Code with Explicit Constructor
for std::array in Template Class
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincent.lebourlot at starqube dot com
  Target Milestone: ---

In a C++ codebase, a class String is defined with an explicit constructor that
takes a variable number of arguments and constructs a std::array from them.
This constructor is being called when creating a std::pair in a
function call.

While this code compiles successfully with GCC 12, it fails to compile with GCC
13. The error messages indicate that the std::array must be initialized with a
brace-enclosed initializer, which is not what's happening when forwarding the
arguments to the std::array's constructor.

This issue seems to be specific to GCC 13 and does not occur in GCC 12. It's
unclear whether this is due to changes in the C++ standard or in GCC's
implementation of the standard. The code snippet that reproduces this issue is
provided below:

#include 
#include 
#include 
#include 
#include 

class String {
public:
templateString(const char(&s)[n])noexcept:value(){
if constexpr(n<=16){
std::memcpy(value.data(),s,n);}
else{value=std::array{};}}
template...>, int> = 0>
explicit String(Args&&... args) : value(std::forward(args)...) {}
private:
std::arrayvalue;
};

int main() {
auto check=[&](std::vector>textInputs){};
check({{{"Hello", "World"}, {"Foo", "Bar"}}});
return 0;
}


And here's the error message:

test.cpp: In function ‘int main()’:
test.cpp:20:10: error: converting to ‘const String’ from initializer list would
use explicit constructor ‘String::String(Args&& ...) [with Args = {const char
(&)[6], const char (&)[6]}; typename
std::enable_if, std::allocator >, Args>...>, int>::type
 = 0]’
   20 | check({{{"Hello", "World"}, {"Foo", "Bar"}}});
  | ~^~~~
test.cpp:20:10: error: converting to ‘const String’ from initializer list would
use explicit constructor ‘String::String(Args&& ...) [with Args = {const char
(&)[4], const char (&)[4]}; typename
std::enable_if, std::allocator >, Args>...>, int>::type
 = 0]’
test.cpp: In instantiation of ‘String::String(Args&& ...) [with Args = {const
char (&)[6], const char (&)[6]}; typename
std::enable_if, std::allocator >, Args>...>, int>::type
 = 0]’:
test.cpp:20:10:   required from here
test.cpp:12:39: error: no matching function for call to ‘std::array::array(const char [6], const char [6])’
   12 | explicit String(Args&&... args) :
value(std::forward(args)...) {}
  |  
^~
In file included from test.cpp:3:
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note: candidate:
‘std::array::array()’
   94 | struct array
  |^
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note:   candidate expects
0 arguments, 2 provided
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note: candidate:
‘constexpr std::array::array(const std::array&)’
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note:   candidate expects
1 argument, 2 provided
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note: candidate:
‘constexpr std::array::array(std::array&&)’
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note:   candidate expects
1 argument, 2 provided
test.cpp: In instantiation of ‘String::String(Args&& ...) [with Args = {const
char (&)[4], const char (&)[4]}; typename
std::enable_if, std::allocator >, Args>...>, int>::type
 = 0]’:
test.cpp:20:10:   required from here
test.cpp:12:39: error: no matching function for call to ‘std::array::array(const char [4], const char [4])’
   12 | explicit String(Args&&... args) :
value(std::forward(args)...) {}
  |  
^~
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note: candidate:
‘std::array::array()’
   94 | struct array
  |^
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note:   candidate expects
0 arguments, 2 provided
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note: candidate:
‘constexpr std::array::array(const std::array&)’
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note:   candidate expects
1 argument, 2 provided
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note: candidate:
‘constexpr std::array::array(std::array&&)’
/usr/local/gcc/gcc-13/include/c++/13.1.0/array:94:12: note:   candidate expects
1 argument, 2 provided

[Bug c++/109887] Different mangled name for template specialization for clang and gcc

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109887

--- Comment #2 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #1)
> Clang mangles it as 3std::is_integral.

Oops, I mean 3std11is_integral of course.

[Bug c++/109887] Different mangled name for template specialization for clang and gcc

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109887

--- Comment #3 from Jakub Jelinek  ---
So, simpler testcase would be
#include <type_traits>

template <typename T>
std::enable_if_t <std::is_integral <T>::value, int>
foo() { return 0; }

int a = foo <int> ();

GCC mangles this as
_Z3fooIiENSt9enable_ifIXsrSt11is_integralIT_E5valueEiE4typeEv
while clang as
_Z3fooIiENSt9enable_ifIXsr3std11is_integralIT_EE5valueEiE4typeEv
but c++filt is able to demangle both as
std::enable_if<std::is_integral<int>::value, int>::type foo<int>()
So, the difference between the two is that gcc uses substitution St for std::
while clang doesn't.
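In https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling sr is
<unresolved-name> ::= [gs] <base-unresolved-name>                 # x or (with "gs") ::x
                  ::= sr <unresolved-type> <base-unresolved-name> # T::x / decltype(p)::x
...
and
<unresolved-type> ::= <template-param> [ <template-args> ]        # T:: or T<X,Y>::
                  ::= <decltype>                                  # decltype(p)::
                  ::= <substitution>
and
<substitution> ::= St # ::std::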
among other things, so I think St is what should be used instead of 3std.
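
To make the point of divergence concrete, the following sketch compares the
two mangled names quoted above as plain strings. It does not demangle
anything; common_prefix is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the common prefix of two strings.
inline std::size_t common_prefix(const std::string& a, const std::string& b)
{
  std::size_t i = 0;
  while (i < a.size() && i < b.size() && a[i] == b[i])
    ++i;
  return i;
}

// The two manglings of foo<int> quoted in this comment.
const std::string gcc_name =
    "_Z3fooIiENSt9enable_ifIXsrSt11is_integralIT_E5valueEiE4typeEv";
const std::string clang_name =
    "_Z3fooIiENSt9enable_ifIXsr3std11is_integralIT_EE5valueEiE4typeEv";
```

The names agree up to and including the "sr" marker; immediately after it GCC
emits the St substitution while Clang emits the length-prefixed source name
3std.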

[Bug c++/109888] GCC 13 Fails to Compile Code with Explicit Constructor for std::array in Template Class

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109888

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Jonathan Wakely  ---
Another dup of Bug 109247

*** This bug has been marked as a duplicate of bug 109247 ***

[Bug c++/109247] [13/14 Regression] optional o; o = {x}; wants to use explicit optional(U) constructor since r13-6765-ga226590fefb35ed6

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109247

Jonathan Wakely  changed:

   What|Removed |Added

 CC||vincent.lebourlot@starqube.
   ||com

--- Comment #13 from Jonathan Wakely  ---
*** Bug 109888 has been marked as a duplicate of this bug. ***

[Bug libstdc++/109889] New: [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

Bug ID: 109889
   Summary: [13/14 Regression] Segfault in __run_exit_handlers
since r13-5309-gc3c6c307792026
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---
Target: powerpc64le-unknown-linux-gnu

Created attachment 55099
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55099&action=edit
Gzipped preprocessed source

I'm seeing test failures on powerpc64le when using -D_GLIBCXX_DEBUG, which
started with r13-5309-gc3c6c307792026. I don't see anything wrong with that
library change, so if I'm not missing something silly, then it might be a
latent compiler bug that was revealed by reducing the amount of code run in the
library.

The attached preprocessed source crashes when built with -O2
-ffunction-sections -Wl,--gc-sections

It runs OK with -fno-lifetime-dse or with -fsanitize=undefined or if either of
-ffunction-sections or -Wl,--gc-sections is removed.


At the crash GDB shows:
Program received signal SIGSEGV, Segmentation fault.
0x7765b7cc in __run_exit_handlers (status=, 
listp=0x77860ad0 <__exit_funcs>, 
run_list_atexit=run_list_atexit@entry=true, 
run_dtors=run_dtors@entry=true) at exit.c:62
62__exit_funcs_done = true;   
─── Assembly ─
 0x7765b7b8  __run_exit_handlers+600 std r9,0(r24)
 0x7765b7bc  __run_exit_handlers+604 bne 0x7765b8d8
<__run_exit_handlers+888>
 0x7765b7c0  __run_exit_handlers+608 li  r10,1
 0x7765b7c4  __run_exit_handlers+612 nop
 0x7765b7c8  __run_exit_handlers+616 li  r9,0
 0x7765b7cc  __run_exit_handlers+620 stb r10,-18040(r2)
 0x7765b7d0  __run_exit_handlers+624 lwsync
 0x7765b7d4  __run_exit_handlers+628 lwarx   r10,0,r31
 0x7765b7d8  __run_exit_handlers+632 stwcx.  r9,0,r31
 0x7765b7dc  __run_exit_handlers+636 bne-0x7765b7d4
<__run_exit_handlers+628>
─── Registers 
 r0 0x7765b700  r1 0x7fffe8b0
 r2 0x  r3 0x
 r4 0x  r5 0x
 r6 0x  r7 0x
 r8 0x  r9 0x
r10 0x0001 r11 0x2000
r12 0x77a30960 r13 0x77ffc320
r14 0x r15 0x
r16 0x r17 0x
r18 0x r19 0x
r20 0x r21 0x
r22 0x r23 0x0001
r24 0x77860ad0 r25 0x
r26 0x r27 0x77862468
r28 0x0001 r29 0x
r30 0x77862458 r31 0x77862868
 pc 0x7765b7cc msr 0x9000d033
 cr 0x24002422  lr 0x7765b700
ctr 0x xer 0x00dd
  fpscr 0xvscr 0x
 vrsave 0x ppr 0x000c
   dscr 0x0010 tar 0x
  mmcr0 0x   mmcr2 0x
   siar 0xsdar 0x
   sier 0x orig_r3 0x7765b61c
   trap 0x0380
─── Source ───
 57  
 58if (cur == NULL)
 59  {
 60/* Exit processing complete.  We will not allow any more
 61   atexit/on_exit registrations.  */
 62__exit_funcs_done = true;
 63break;
 64  }
 65  
 66while (cur->idx > 0)
─── Stack 
[0] from 0x7765b7cc in __run_exit_handlers+620 at exit.c:62
[1] from 0x7765b948 in __GI_exit+40 at exit.c:143
[2] from 0x77637fb8 in __libc_start_call_main+168 at
../sysdeps/nptl/libc_start_call_main.h:74
[3] from 0x776381ec in generic_start_main+252 at
../csu/libc-start.c:381
[4] from 0x776381ec in __libc_start_main_impl+428 at
../sysdeps/uni

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #1 from Jonathan Wakely  ---
Tulio found out that __gnu_debug::_Safe_iterator_base::_M_reset() is
overwriting the stack where r2 (TOC pointer) was saved by __run_exit_handlers()
(at address 0x7fffe8e8). This function was called with the wrong
address of the object.

He was able to track this value back from
__gnu_debug::_Safe_sequence_base::_M_detach_all() at debug.cc:325

p *this
$1 = {
  _M_iterators = 0x7fffe8e8,
  _M_const_iterators = 0x0,
  _M_version = 1
}

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #2 from Jakub Jelinek  ---
r2 is the toc pointer, so having it 0 is weird.
Looking at glibc-2.36-10.fc37 (not sure if you are using a different one), I
see
0005b560 <__run_exit_handlers>:
   5b560:   21 00 4c 3c addis   r2,r12,33
   5b564:   a0 b9 42 38 addi    r2,r2,-18016
...
   5b5a8:   18 00 41 f8 std r2,24(r1)
so wonder what x/1gx $r1+24 is.  Most likely some call from that function
didn't restore r2 properly?
Though, I believe in PowerPC ELFv2 it is the caller's responsibility to restore
it
and that is why it has the nops after bl (in case the call is guaranteed to be
into code with the same TOC) and ld r2,24(r1) otherwise.

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #3 from Jonathan Wakely  ---
I wonder if we have a static destructor ordering problem.

The libstdc++  test code uses a local static std::map, which will be
constructed on first use and destroyed on exit. When built with
-D_GLIBCXX_DEBUG that is a __gnu_debug::map which uses checked iterators, so
keeps a list of all constructed iterators. On destruction that map locks a
mutex, which is another local static, and .

Since r13-6282-gd70f49e98245f8 the mutexes are created in a char buffer and
never destroyed:

// Use a static buffer, so that the mutexes are not destructed
// before potential users (or at all)
static __attribute__ ((aligned(__alignof__(M))))
  char buffer[(sizeof (M)) * (mask + 1)];
static M *m = new (buffer) M[mask + 1];
return m[i];

But something could be wrong with lifetimes of those statics, causing an
invalid 'this' pointer to be used somewhere.
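
For readers unfamiliar with the pattern in that snippet, here is a
self-contained sketch of a mutex pool built in a static buffer and never
destroyed. It is an illustration only, not the libstdc++ source: pool_mutex,
the mask value and the per-element placement new are assumptions made for
this example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// Hypothetical stand-in for the pattern: a fixed pool of mutexes that is
// constructed on first use and deliberately never destroyed.
inline std::mutex& pool_mutex(std::size_t i)
{
  using M = std::mutex;
  static constexpr std::size_t mask = 0xf;

  // Raw, suitably aligned storage with static storage duration.  No
  // destructor is registered for it, so exit handlers never tear it down.
  alignas(M) static unsigned char buffer[sizeof(M) * (mask + 1)];

  // One-time initialisation (thread-safe for local statics); each mutex is
  // constructed in place inside the buffer.
  static M* const first = [] {
    M* p = ::new (static_cast<void*>(buffer)) M;
    for (std::size_t k = 1; k <= mask; ++k)
      ::new (static_cast<void*>(buffer + k * sizeof(M))) M;
    return p;
  }();

  return first[i & mask];
}
```

Because the mutexes are never destroyed, any static object destroyed during
exit can still lock them, which is the property the quoted comment in the
source describes.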

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #4 from Jonathan Wakely  ---
(In reply to Jakub Jelinek from comment #2)
> r2 is the toc pointer, so having it 0 is weird.
> Looking at glibc-2.36-10.fc37 (not sure if you are using a different one), I

glibc-2.36-9.fc37.ppc64le

> see
> 0005b560 <__run_exit_handlers>:
>5b560:   21 00 4c 3c addis   r2,r12,33
>5b564:   a0 b9 42 38 addi    r2,r2,-18016
> ...
>5b5a8:   18 00 41 f8 std r2,24(r1)
> so wonder what x/1gx $r1+24 is.

(gdb) x/1gx $r1+24
0x7fffe8d8: 0x

[Bug sanitizer/109882] sanitizer/common_interface_defs.h bogusly defines __has_feature

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109882

--- Comment #7 from Jonathan Wakely  ---
I'll do a little more testing and submit it upstream.

[Bug libstdc++/109883] Stack Overflow in functions with types

2023-05-17 Thread matt at mattborland dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

--- Comment #2 from Matt Borland  ---
(In reply to Xi Ruoyao from comment #1)
> Cannot reproduce for me.  Note that in this case GCC optimizes the entire
> function call away (see https://godbolt.org/z/968bPTvh9) even with -O0 so I
> can see no way how this will lead to a runtime error.

Here is an updated reproducer:

#include 
#include 
#include 

int main()
{
auto val = std::pow(0.5F64, 2);
std::cout << val << std::endl;
}

The failure can be seen on godbolt here: https://godbolt.org/z/ej5nPn7o4. Running
this same snippet locally with ASAN yields:

AddressSanitizer:DEADLYSIGNAL
=
==110879==ERROR: AddressSanitizer: stack-overflow on address 0x7fff6e2e7ff8 (pc
0x0040126e bp 0x7fff6e2e8010 sp 0x7fff6e2e8000 T0)

#0 0x40126e in __gnu_cxx::__promote_2::__value>::__type)(0))+((__gnu_cxx::__promote_2<_Float64,
std::__is_integer<_Float64>::__value>::__type)(0))), std::__is_integer::__value>::__type)(0))+((__gnu_cxx::__promote_2<_Float64,
std::__is_integer<_Float64>::__value>::__type)(0)))>::__value>::__type
std::pow<_Float64, _Float64>(_Float64, _Float64)
(/home/mborland/Documents/boost/libs/math/test/so+0x40126e) (BuildId:
6f720390f8d2a24a6dabec3c85e9cf5bb4c192ea)

SUMMARY: AddressSanitizer: stack-overflow
(/home/mborland/Documents/boost/libs/math/test/so+0x40126e) (BuildId:
6f720390f8d2a24a6dabec3c85e9cf5bb4c192ea) in __gnu_cxx::__promote_2::__value>::__type)(0))+((__gnu_cxx::__promote_2<_Float64,
std::__is_integer<_Float64>::__value>::__type)(0))), std::__is_integer::__value>::__type)(0))+((__gnu_cxx::__promote_2<_Float64,
std::__is_integer<_Float64>::__value>::__type)(0)))>::__value>::__type
std::pow<_Float64, _Float64>(_Float64, _Float64)
==110879==ABORTING

For brevity I snipped out 245 more instances of the message next to #0.

> And GCC for aarch64-darwin target (i. e. "macOS 13.3.1 on M1") is not a part
> of this project, so are you using another fork?

It is provided by homebrew as gcc@13. For this reply I am using my Fedora 38
system with "gcc version 13.1.1 20230511 (Red Hat 13.1.1-2) (GCC)"

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

Xi Ruoyao  changed:

   What|Removed |Added

  Known to fail||13.1.0, 14.0
 Status|WAITING |NEW
Summary|Stack Overflow in|Stack Overflow in 
   |functions with|functions with 
   |types   |types and -std=c++23

--- Comment #3 from Xi Ruoyao  ---
Confirmed.  -std=c++23 is needed to reproduce.

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

--- Comment #4 from Xi Ruoyao  ---
It seems the function

__gnu_cxx::__promote_2::__value>::__type)(0))+((__gnu_cxx::__promote_2<_Float64,
std::__is_integer<_Float64>::__value>::__type)(0))), std::__is_integer::__value>::__type)(0))+((__gnu_cxx::__promote_2<_Float64,
std::__is_integer<_Float64>::__value>::__type)(0)))>::__value>::__type
std::pow<_Float64, _Float64>(_Float64, _Float64)

is recursing infinitely.

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #6 from Jan Hubicka  ---
hottest loop in clang's profile is:
  for (size_t y = 0; y < opsin.ysize(); y++) {
for (size_t x = 0; x < opsin.xsize(); x++) {
  if (is_background_row[y * is_background_stride + x]) continue;
  cc.clear();
  stack.clear();
  stack.emplace_back(x, y);
  size_t min_x = x;
  size_t max_x = x;
  size_t min_y = y;
  size_t max_y = y;
  std::pair reference;
  bool found_border = false;
  bool all_similar = true;
  while (!stack.empty()) {
std::pair cur = stack.back();
stack.pop_back();
if (visited_row[cur.second * visited_stride + cur.first]) continue;
^^^
closed by this continue.
visited_row[cur.second * visited_stride + cur.first] = 1;
if (cur.first < min_x) min_x = cur.first;
if (cur.first > max_x) max_x = cur.first;
if (cur.second < min_y) min_y = cur.second;
if (cur.second > max_y) max_y = cur.second;
if (paint_ccs) {
  cc.push_back(cur);
}
for (int dx = -kSearchRadius; dx <= kSearchRadius; dx++) {
  for (int dy = -kSearchRadius; dy <= kSearchRadius; dy++) {
if (dx == 0 && dy == 0) continue;
int next_first = static_cast(cur.first) + dx;
int next_second = static_cast(cur.second) + dy;
if (next_first < 0 || next_second < 0 ||
static_cast(next_first) >= opsin.xsize() ||
static_cast(next_second) >= opsin.ysize()) {
  continue;
}
std::pair next{next_first, next_second};
if (!is_background_row[next.second * is_background_stride +
   next.first]) {
  stack.push_back(next);
} else {
  if (!found_border) {
reference = next;
found_border = true;
  } else {
if (!is_similar_b(next, reference)) all_similar = false;
  }
}
  }
}
  }
  if (!found_border || !all_similar || max_x - min_x >= kMaxPatchSize ||
  max_y - min_y >= kMaxPatchSize) {
continue;
  }
  size_t bpos = background_stride * reference.second + reference.first;
  float ref[3] = {background_rows[0][bpos], background_rows[1][bpos],
  background_rows[2][bpos]};
  bool has_similar = false;
  for (size_t iy = std::max(
   static_cast(min_y) - kHasSimilarRadius, 0);
   iy < std::min(max_y + kHasSimilarRadius + 1, opsin.ysize()); iy++) {
for (size_t ix = std::max(
 static_cast(min_x) - kHasSimilarRadius, 0);
 ix < std::min(max_x + kHasSimilarRadius + 1, opsin.xsize());
 ix++) {
  size_t opos = opsin_stride * iy + ix;
  float px[3] = {opsin_rows[0][opos], opsin_rows[1][opos],
 opsin_rows[2][opos]};
  if (pci.is_similar_v(ref, px, kHasSimilarThreshold)) {
has_similar = true;
  }
}
  }
  if (!has_similar) continue;
  info.emplace_back();
  info.back().second.emplace_back(min_x, min_y);
  QuantizedPatch& patch = info.back().first;
  patch.xsize = max_x - min_x + 1;
  patch.ysize = max_y - min_y + 1;
  int max_value = 0;
  for (size_t c : {1, 0, 2}) {
for (size_t iy = min_y; iy <= max_y; iy++) {
  for (size_t ix = min_x; ix <= max_x; ix++) {
size_t offset = (iy - min_y) * patch.xsize + ix - min_x;
patch.fpixels[c][offset] =
opsin_rows[c][iy * opsin_stride + ix] - ref[c];
int val = pci.Quantize(patch.fpixels[c][offset], c);
patch.pixels[c][offset] = val;
if (std::abs(val) > max_value) max_value = std::abs(val);
  }
}
  }
  if (max_value < kMinPeak) {
info.pop_back();
continue;
  }
  if (paint_ccs) {
float cc_color = rng.UniformF(0.5, 1.0);
for (std::pair p : cc) {
  ccs.Row(p.second)[p.first] = cc_color;
}
  }
}
  }

I guess such a large loop nest with hottest loop not being the innermost is bad
for register pressure. 
Clang's code is:
  0.02 │1196:┌─→cmp  %r10,-0xb8(%rbp)
   │ │jxl::FindBestPatchDictionary(jxl::Image3 const&,
jxl::PassesEncoderState*, JxlCmsInterface co
   │ │while (!stack.empty()) {
  1.39 │ │↓ je   1690
   │ │std::pair cur = stack.back();
   │11a3:│  mov  -0x8(%r10),%rbx

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

--- Comment #5 from Xi Ruoyao  ---
1203
<_ZSt3powIDF64_DF64_EN9__gnu_cxx11__promote_2IDTplcvNS1_IT_XsrSt12__is_integerIS2_E7__valueEE6__typeELi0EcvNS1_IT0_XsrS3_IS7_E7__valueEE6__typeELi0EEXsrS3_ISB_E7__valueEE6__typeES2_S7_>:
1203:   55                      push   %rbp
1204:   48 89 e5                mov    %rsp,%rbp
1207:   48 83 ec 10             sub    $0x10,%rsp
120b:   f2 0f 11 45 f8          movsd  %xmm0,-0x8(%rbp)
1210:   f2 0f 11 4d f0          movsd  %xmm1,-0x10(%rbp)
1215:   f2 0f 10 45 f0          movsd  -0x10(%rbp),%xmm0
121a:   48 8b 45 f8             mov    -0x8(%rbp),%rax
121e:   66 0f 28 c8             movapd %xmm0,%xmm1
1222:   66 48 0f 6e c0          movq   %rax,%xmm0
1227:   e8 d7 ff ff ff          call   1203
<_ZSt3powIDF64_DF64_EN9__gnu_cxx11__promote_2IDTplcvNS1_IT_XsrSt12__is_integerIS2_E7__valueEE6__typeELi0EcvNS1_IT0_XsrS3_IS7_E7__valueEE6__typeELi0EEXsrS3_ISB_E7__valueEE6__typeES2_S7_>
122c:   66 48 0f 7e c0          movq   %xmm0,%rax
1231:   66 48 0f 6e c0          movq   %rax,%xmm0
1236:   c9                      leave
1237:   c3                      ret

This is just stupid...

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread matt at mattborland dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

--- Comment #6 from Matt Borland  ---
(In reply to Xi Ruoyao from comment #4)
> It seems the function
> 
> __gnu_cxx::__promote_2 std::__is_integer<_Float64>::__value>::__type)(0))+((__gnu_cxx::
> __promote_2<_Float64, std::__is_integer<_Float64>::__value>::__type)(0))),
> std::__is_integer std::__is_integer<_Float64>::__value>::__type)(0))+((__gnu_cxx::
> __promote_2<_Float64,
> std::__is_integer<_Float64>::__value>::__type)(0)))>::__value>::__type
> std::pow<_Float64, _Float64>(_Float64, _Float64)
> 
> is recursing infinitely.

For Boost.Math's implementation of promote_2 we found template specializations
to be effective:
https://github.com/boostorg/math/pull/978/files#diff-2463d99030329b154489b8b34ce1068a34e736cab268c3421b058ca0e516680cR189.

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

--- Comment #7 from Jakub Jelinek  ---
I think we need to move those __promote_{2,3} using templates for atan2, fmod,
pow, copysign, fdim, fmax, fmin, hypot, nextafter, remainder, remquo and fma
later, because right now we have the overloads with float, double and long
double arguments, then these templates and later on _Float{16,32,64,128} and
bfloat16_t overloads, and as those __promote_{2,3} templates call themselves with
promoted arguments, they self-recurse if the promoted arguments are
_Float{16,32,64,128} or bfloat16_t.
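
A toy model of why declaration order matters here, assuming nothing about the
actual <cmath> layout: 'combine' is a hypothetical function, and the counter
guard exists only so the demonstration terminates instead of overflowing the
stack. For builtin argument types there is no associated namespace, so the
call inside the template only sees overloads declared before it.

```cpp
#include <cassert>
#include <type_traits>

namespace late {
int calls = 0;

// Generic version: promote both arguments, then call 'combine' again.
template <typename T, typename U>
int combine(T a, U b)
{
  using P = typename std::common_type<T, U>::type;
  if (++calls > 3)
    return -1;                 // guard: stop the runaway self-recursion
  return combine(P(a), P(b));  // sees only the template declared so far
}

// Exact overload declared AFTER the template: the call inside the template
// never considers it when the arguments are builtin types such as long.
inline int combine(long, long) { return 1; }
}  // namespace late

namespace early {
// Exact overload declared BEFORE the template.
inline int combine(long, long) { return 1; }

template <typename T, typename U>
int combine(T a, U b)
{
  using P = typename std::common_type<T, U>::type;
  return combine(P(a), P(b));  // resolves to the non-template overload
}
}  // namespace early
```

Calling late::combine(1, 2L) keeps re-entering the template until the guard
fires, while early::combine(1, 2L) reaches the exact overload after one
promotion step.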

[Bug c++/109532] -fshort-enums does not pick smallest underlying type for scoped enum

2023-05-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109532

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:d8a656d5b6246457e84934bc35115c134bc38def

commit r14-932-gd8a656d5b6246457e84934bc35115c134bc38def
Author: Jonathan Wakely 
Date:   Thu Apr 27 12:02:38 2023 +0100

doc: Describe behaviour of enums with fixed underlying type [PR109532]

gcc/ChangeLog:

PR c++/109532
* doc/invoke.texi (Code Gen Options): Note that -fshort-enums
is ignored for a fixed underlying type.
(C++ Dialect Options): Likewise for -fstrict-enums.

Reviewed-by: Marek Polacek 
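
A small sketch of what "fixed underlying type" means for the documentation
note above. The enum names are made up for this example; the checks rely only
on the language rule that a scoped enum without an enum-base has underlying
type int.

```cpp
#include <cassert>
#include <type_traits>

enum class Scoped { A, B };                 // scoped: underlying type fixed to int
enum class Small : unsigned char { A, B };  // explicitly fixed underlying type
enum Fixed : long { X, Y };                 // unscoped, fixed by the enum-base

static_assert(std::is_same<std::underlying_type<Scoped>::type, int>::value,
              "scoped enums default to int");
static_assert(sizeof(Scoped) == sizeof(int), "size follows the fixed type");
static_assert(sizeof(Small) == 1, "size follows the fixed type");
static_assert(sizeof(Fixed) == sizeof(long), "size follows the fixed type");
```

Only an unscoped enum without an enum-base leaves the choice of underlying
type to the implementation, which is the case -fshort-enums can affect.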

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek  ---
Created attachment 55100
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55100&action=edit
gcc14-pr109883.patch

Untested fix.  Still need to add some testsuite coverage.

[Bug c++/100052] [11/12/13/14 regression] ICE in compiling g++.dg/modules/xtreme-header-3_b.C after r11-8118

2023-05-17 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100052

--- Comment #14 from seurer at gcc dot gnu.org ---
The failures occur erratically so one clean run doesn't mean much.  Scanning
the test results mailing list I see failures for this just today in trunk.

[Bug c++/98202] C++ cannot parse F128 suffix for float128 literals

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98202

--- Comment #5 from Jonathan Wakely  ---
Q can't be used with -std=c++NN strict modes, as noted in bug 87274

limits:2085: error: unable to find numeric literal operator 'operator""Q'

[Bug libstdc++/109890] New: vector's constructor doesn't start object lifetimes during constant evaluation

2023-05-17 Thread barry.revzin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109890

Bug ID: 109890
   Summary: vector's constructor doesn't start object lifetimes
during constant evaluation
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: barry.revzin at gmail dot com
  Target Milestone: ---

From StackOverflow (https://stackoverflow.com/q/76269606/2069064), clang
rejects this code when compiling with libstdc++:

#include <vector>

consteval auto bar(int n){
std::vector<int> v(n);
return v[0];
}
constexpr auto m = bar(5);

This is because libstdc++ basically does something like this:

#include <memory>

class V {
int* p;
int n;
std::allocator<int> alloc;

public:
constexpr V(int n)
: n(n)
{
p = alloc.allocate(n);

// fill with 0s?
for (int i = 0; i != n; ++i) {
p[i] = 0;
}
}

constexpr ~V() {
alloc.deallocate(p, n);
}
};

consteval auto bar(int n) {
V v(n);
return n;
}
static_assert(bar(5) == 5);

And clang is more picky about the assignment there - it doesn't like just
writing p[0] = 0, because the int's lifetime hasn't started yet. gcc accepts
the above though. 

I think that's... technically correct (if pedantic) and libstdc++'s path needs
to do a construct_at somewhere.
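
One possible shape of such a fix for the reduced class V, as a sketch rather
than the actual libstdc++ change: each element is created through
std::allocator_traits::construct, which in C++20 ends up in std::construct_at
for std::allocator. The DEMO_CONSTEXPR macro, the operator[] and the deleted
copy operations are additions made for this example; outside C++20 the same
code simply runs at run time.

```cpp
#include <cassert>
#include <memory>

// Use constexpr only where compiler and library support constexpr
// allocation (C++20); otherwise fall back to ordinary inline functions.
#if defined(__cpp_constexpr_dynamic_alloc) && defined(__cpp_lib_constexpr_dynamic_alloc)
#  define DEMO_CONSTEXPR constexpr
#else
#  define DEMO_CONSTEXPR inline
#endif

class V {
  using Traits = std::allocator_traits<std::allocator<int>>;
  std::allocator<int> alloc;
  int* p = nullptr;
  int n;

public:
  DEMO_CONSTEXPR explicit V(int count) : n(count)
  {
    p = Traits::allocate(alloc, n);
    for (int i = 0; i != n; ++i)
      Traits::construct(alloc, p + i, 0);  // starts the lifetime of each int
  }
  V(const V&) = delete;
  V& operator=(const V&) = delete;

  DEMO_CONSTEXPR int operator[](int i) const { return p[i]; }

  DEMO_CONSTEXPR ~V()
  {
    for (int i = 0; i != n; ++i)
      Traits::destroy(alloc, p + i);
    Traits::deallocate(alloc, p, n);
  }
};

DEMO_CONSTEXPR int bar(int n)
{
  V v(n);
  return v[0] + n;  // v[0] is a constructed, zero-valued int
}

#if defined(__cpp_constexpr_dynamic_alloc) && defined(__cpp_lib_constexpr_dynamic_alloc)
static_assert(bar(5) == 5, "usable during constant evaluation");
#endif
```

The difference from the reduced example is only in the fill loop: the
elements are constructed instead of assigned to, so no int is written before
its lifetime has begun.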

[Bug c++/109532] -fshort-enums does not pick smallest underlying type for scoped enum

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109532

--- Comment #8 from Jonathan Wakely  ---
I've updated the docs to make this clear.

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

JuzheZhong  changed:

   What|Removed |Added

 CC||juzhe.zhong at rivai dot ai

--- Comment #7 from JuzheZhong  ---
(In reply to Jan Hubicka from comment #6)
> hottest loop in clang's profile is:
>   for (size_t y = 0; y < opsin.ysize(); y++) {
> for (size_t x = 0; x < opsin.xsize(); x++) {
>   if (is_background_row[y * is_background_stride + x]) continue;
>   cc.clear();
>   stack.clear();
>   stack.emplace_back(x, y);
>   size_t min_x = x;
>   size_t max_x = x;
>   size_t min_y = y;
>   size_t max_y = y;
>   std::pair reference;
>   bool found_border = false;
>   bool all_similar = true;
>   while (!stack.empty()) {
> std::pair cur = stack.back();
> stack.pop_back();
> if (visited_row[cur.second * visited_stride + cur.first]) continue;
> ^^^
> closed by this continue.
> visited_row[cur.second * visited_stride + cur.first] = 1;
> if (cur.first < min_x) min_x = cur.first;
> if (cur.first > max_x) max_x = cur.first;
> if (cur.second < min_y) min_y = cur.second;
> if (cur.second > max_y) max_y = cur.second;
> if (paint_ccs) {
>   cc.push_back(cur);
> }
> for (int dx = -kSearchRadius; dx <= kSearchRadius; dx++) {
>   for (int dy = -kSearchRadius; dy <= kSearchRadius; dy++) {
> if (dx == 0 && dy == 0) continue;
> int next_first = static_cast(cur.first) + dx;
> int next_second = static_cast(cur.second) + dy;
> if (next_first < 0 || next_second < 0 ||
> static_cast(next_first) >= opsin.xsize() ||
> static_cast(next_second) >= opsin.ysize()) {
>   continue;
> }
> std::pair next{next_first, next_second};
> if (!is_background_row[next.second * is_background_stride +
>next.first]) {
>   stack.push_back(next);
> } else {
>   if (!found_border) {
> reference = next;
> found_border = true;
>   } else {
> if (!is_similar_b(next, reference)) all_similar = false;
>   }
> }
>   }
> }
>   }
>   if (!found_border || !all_similar || max_x - min_x >= kMaxPatchSize ||
>   max_y - min_y >= kMaxPatchSize) {
> continue;
>   }
>   size_t bpos = background_stride * reference.second + reference.first;
>   float ref[3] = {background_rows[0][bpos], background_rows[1][bpos],
>   background_rows[2][bpos]};
>   bool has_similar = false;
>   for (size_t iy = std::max(
>static_cast(min_y) - kHasSimilarRadius, 0);
>iy < std::min(max_y + kHasSimilarRadius + 1, opsin.ysize());
> iy++) {
> for (size_t ix = std::max(
>  static_cast(min_x) - kHasSimilarRadius, 0);
>  ix < std::min(max_x + kHasSimilarRadius + 1, opsin.xsize());
>  ix++) {
>   size_t opos = opsin_stride * iy + ix;
>   float px[3] = {opsin_rows[0][opos], opsin_rows[1][opos],
>  opsin_rows[2][opos]};
>   if (pci.is_similar_v(ref, px, kHasSimilarThreshold)) {
> has_similar = true;
>   }
> }
>   }
>   if (!has_similar) continue;
>   info.emplace_back();
>   info.back().second.emplace_back(min_x, min_y);
>   QuantizedPatch& patch = info.back().first;
>   patch.xsize = max_x - min_x + 1;
>   patch.ysize = max_y - min_y + 1;
>   int max_value = 0;
>   for (size_t c : {1, 0, 2}) {
> for (size_t iy = min_y; iy <= max_y; iy++) {
>   for (size_t ix = min_x; ix <= max_x; ix++) {
> size_t offset = (iy - min_y) * patch.xsize + ix - min_x;
> patch.fpixels[c][offset] =
> opsin_rows[c][iy * opsin_stride + ix] - ref[c];
> int val = pci.Quantize(patch.fpixels[c][offset], c);
> patch.pixels[c][offset] = val;
> if (std::abs(val) > max_value) max_value = std::abs(val);
>   }
> }
>   }
>   if (max_value < kMinPeak) {
> info.pop_back();
> continue;
>   }
>   if (paint_ccs) {
> float cc_color = rng.UniformF(0.5, 1.0);
> for (std::pair p : cc) {
>   ccs.Row(p.second)[p.first] = cc_color;
> }
>   }
> }
>   }
> 
> I guess such a large loop nest with hottest loop not being the innermost is
> bad for register pressure. 
> Clangs code is :
>   0.02 │1196:┌─→cmp  %r10,-0xb8(%rbp)   
>│ │jxl::FindBestPatchDictionary(jxl::Image3 

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #8 from Jan Hubicka  ---
Created attachment 55101
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55101&action=edit
hottest loop

The jpegxl build machinery adds -fno-vectorize and -fno-slp-vectorize to the
clang flags.  Adding -fno-tree-vectorize -fno-tree-slp-vectorize makes the
GCC-generated code more similar.  With this, most of the difference is caused by
FindBestPatchDictionary, or FindTextLikePatches if that function is not inlined.

  15.22%  cjxl libjxl.so.0.7.0   [.] jxl::(anonymous namespace)::FindTextLikePatches
  10.19%  cjxl libjxl.so.0.7.0   [.] jxl::FindBestPatchDictionary
   5.27%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::QuantizeBlockAC
   5.06%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::EstimateEntropy
   4.82%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::EstimateEntropy
   4.35%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::QuantizeBlockAC
   4.21%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::(anonymous namespace)::TransformFromPixels
   3.87%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::(anonymous namespace)::TransformFromPixels
   3.78%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::FindBestMultiplier
   3.27%  cjxl libjxl.so.0.7.0   [.] jxl::N_AVX2::FindBestMultiplier

I think it is mostly register allocation not handling well the internal loop
quoted above.  I am adding preprocessed sources.

[Bug libstdc++/109890] vector's constructor doesn't start object lifetimes during constant evaluation

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109890

Jonathan Wakely  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-05-17
 Status|UNCONFIRMED |NEW

--- Comment #1 from Jonathan Wakely  ---
For trivial types the std::uninitialized_xxx algos elide the constructors and
just do something like memcpy/memset. We need to use
std::is_constant_evaluated() to elide the elision in this case.

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
(In reply to JuzheZhong from comment #7)
> It seems that Clang has better performance than GCC in case of no vectorizer?

That is a very general statement.  On some particular code, some particular arch,
with some particular flags, Clang performs better than GCC; on other code it is
the other way around, and on some it is a wash.  How it performs on larger
amounts of code can be seen from standard benchmarks like SPEC.  The Phoronix
benchmark suite is known not to be a very good benchmark for various reasons,
but that doesn't mean it isn't worth looking at.

[Bug libstdc++/109891] New: Null pointer special handling in ostream's operator << for C-strings

2023-05-17 Thread mimomorin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

Bug ID: 109891
   Summary: Null pointer special handling in ostream's operator <<
for C-strings
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mimomorin at gmail dot com
  Target Milestone: ---

This code

#include <iostream>
int main() { std::cout << (char*)nullptr; }

does not cause any bad things (like SEGV), because libstdc++'s
operator<<(ostream, char const*) has special handling of null pointers: 

template<typename _CharT, typename _Traits>
  inline basic_ostream<_CharT, _Traits>&
  operator<<(basic_ostream<_CharT, _Traits>& __out, const _CharT* __s)
  {
    if (!__s)
      __out.setstate(ios_base::badbit);
    else
      __ostream_insert(...);
    return __out;
  }

Passing a null pointer to this operator is a precondition violation, so the
current implementation perfectly conforms to the C++ standard. But why don't
we remove this special handling? By doing so, we get
- better interoperability with tooling (i.e. sanitizers can find the bug
easily)
- an unnoticeable performance improvement
and we lose
- deterministic behavior (of poor code) on a particular stdlib
I believe the first point makes more sense than the last point.

It seems that old special handling `if (s == NULL) s = "(null)";`
(https://github.com/gcc-mirror/gcc/blob/6599da0/libio/iostream.cc#L638) was
removed in GCC 3.0, but reintroduced (in the current form) in GCC 3.2 in
response to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6518 .
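
For code that should not depend on any particular library's treatment of null pointers, the check can be made on the caller's side. A minimal sketch, where write_cstr and render are hypothetical helpers and the "(null)" text simply mirrors the old libio behaviour mentioned above:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: never hands a null pointer to operator<<.
std::ostream& write_cstr(std::ostream& out, const char* s) {
    return out << (s ? s : "(null)");
}

// Hypothetical convenience wrapper used to inspect the result.
std::string render(const char* s) {
    std::ostringstream os;
    write_cstr(os, s);
    return os.str();
}
```

With such a wrapper the precondition of operator<< is always met, so the outcome is the same whether or not the library keeps its special handling.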

[Bug libstdc++/109883] Stack Overflow in functions with types and -std=c++23

2023-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109883

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #55100|0   |1
is obsolete||
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Created attachment 55102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55102&action=edit
gcc14-pr109883.patch

Updated patch including testsuite coverage.

[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

--- Comment #1 from Jonathan Wakely  ---
Adding more UB to the library doesn't seem wise.

We could make it abort in debug mode, instead of setting badbit, but I don't
think we should just make it UB.

[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

--- Comment #2 from Jonathan Wakely  ---
--- a/libstdc++-v3/include/bits/ostream.tcc
+++ b/libstdc++-v3/include/bits/ostream.tcc
@@ -306,6 +306,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __out, const char* __s)
 {
+  _GLIBCXX_DEBUG_PEDANTIC(__s != 0);
   if (!__s)
__out.setstate(ios_base::badbit);
   else

[Bug tree-optimization/109892] New: SLP failure with explicit fma

2023-05-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892

Bug ID: 109892
   Summary: SLP failure with explicit fma
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

At -O2 -mfma (x86) or -O3 (arm64) we fail to SLP-vectorize 'f', but succeed in
'g':

double f(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = __builtin_fma(x[0], x[0], r0);
        r1 = __builtin_fma(x[1], x[1], r1);
    }
    return r0 + r1;
}
static double muladd(double x, double y, double z)
{
    return x * y + z;
}
double g(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
    }
    return r0 + r1;
}

It seems we are calling vectorizable_reduction for __builtin_fma even though it
would not participate in a reduction when vectorizing for 16-byte vectors?

[Bug fortran/109684] compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-05-17 Thread wangmianzhi1 at linuxmail dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109684

Mianzhi Wang  changed:

   What|Removed |Added

  Attachment #54964|0   |1
is obsolete||
 CC||wangmianzhi1 at linuxmail dot 
org

--- Comment #1 from Mianzhi Wang  ---
Created attachment 55103
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55103&action=edit
slightly more simplified

build with cmake

[Bug c++/97340] Spurious rejection of member variable template of reference type

2023-05-17 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97340

Patrick Palka  changed:

   What|Removed |Added

   Keywords||rejects-valid
 CC||ppalka at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=108848

--- Comment #3 from Patrick Palka  ---
We accept the original testcase (where A is not a template) since
r13-6380-gd3d205ab440886, but we still incorrectly reject the version where A
is a template:

template
struct A {
  template
  static constexpr const int &x=0;
};

template
struct B {
  static constexpr int y=A::template x;
};

template struct B;

[Bug libstdc++/46906] istreambuf_iterator is late?

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46906

--- Comment #13 from Jonathan Wakely  ---
This seems related to https://cplusplus.github.io/LWG/issue2366 and the changes
I'm proposing there.

[Bug rtl-optimization/109858] [14 Regression] r14-172 caused some SPEC2017 bmk to degrade on Power

2023-05-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109858

--- Comment #10 from Segher Boessenkool  ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Segher Boessenkool from comment #7)
> > > The patch will still use GENERAL_REGS when hard_regno_mode_ok for mode and
> > > GENERAL_REGS(which is the case in PR109610), hope it can also fix this
> > > regression.
> > 
> > That sounds more reasonable.  But, why use any heuristics like this?  Can't
> > you
> > just look at the actual costs of using mem and regs?
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109610#c2

That is not an answer to my question at all?

[Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #1 from Andrew Pinski  ---
Just FYI, GCC does better on aarch64 with sum.
GCC:
        ldp     q29, q30, [x0]
        movi    v31.4s, 0x1
        fcmeq   v29.4s, v29.4s, 0
        fcmeq   v30.4s, v30.4s, 0
        and     v31.16b, v31.16b, v29.16b
        sub     v31.4s, v31.4s, v30.4s
        addv    s31, v31.4s
        fmov    w0, s31
        ret

vs this mess:
        sub     sp, sp, #16
        ldp     q1, q0, [x0]
        adrp    x8, .LCPI0_0
        fcmeq   v1.4s, v1.4s, #0.0
        fcmeq   v0.4s, v0.4s, #0.0
        uzp1    v0.8h, v1.8h, v0.8h
        ldr     q1, [x8, :lo12:.LCPI0_0]
        and     v0.16b, v0.16b, v1.16b
        addv    h0, v0.8h
        fmov    w8, s0
        and     w8, w8, #0xff
        fmov    s0, w8
        cnt     v0.8b, v0.8b
        uaddlv  h0, v0.8b
        fmov    w0, s0
        add     sp, sp, #16
        ret

The reason is it looks like clang/LLVM is tuned to try to use movmskps/testps
while GCC is tuned to do just a sum reduction in general.
Though I think GCC could be slightly better here too.
        ldp     q29, q30, [x0]
        fcmeq   v29.4s, v29.4s, 0
        fcmeq   v30.4s, v30.4s, 0
        add     v31.16b, v29.16b, v30.16b
        addv    s31, v31.4s
        fmov    w0, s31
        neg     w0, w0
        ret

I think this might be the best code for an aarch64 reduction of bools.

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

Jan Hubicka  changed:

   What|Removed |Added

 Blocks||109811
 CC||mjambor at suse dot cz

--- Comment #6 from Jan Hubicka  ---
Here is a slightly improved testcase which actually pushes onto the stack and
measures something. The test loops 1000 times and returns.  It also makes the
stack a local variable so race conditions are not a problem.

#include <vector>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
        std::vector<std::pair<uint32_t, uint32_t>> stack;
        stack.push_back (pair);
        while (!stack.empty()) {
                std::pair<uint32_t, uint32_t> cur = stack.back();
                stack.pop_back();
                if (!cur.first)
                {
                        cur.second++;
                        stack.push_back (cur);
                }
                if (cur.second > 1)
                        break;
        }
}
int
main()
{
        for (int i = 0; i < 1; i++)
          test();
}

Clang code is about twice as fast

jan@localhost:/tmp> clang++ -O2 tt.C  -fno-exceptions
jan@localhost:/tmp> g++ -O2 tt.C  -fno-exceptions -o a.out-gcc
jan@localhost:/tmp> perf stat ./a.out

 Performance counter stats for './a.out':

            434.24 msec task-clock:u               #    0.997 CPUs utilized
                 0      context-switches:u         #    0.000 /sec
                 0      cpu-migrations:u           #    0.000 /sec
               129      page-faults:u              #  297.073 /sec
     1,003,191,657      cycles:u                   #    2.310 GHz
            68,927      stalled-cycles-frontend:u  #    0.01% frontend cycles idle
       800,792,619      stalled-cycles-backend:u   #   79.82% backend cycles idle
     1,904,682,933      instructions:u             #    1.90  insn per cycle
                                                   #    0.42  stalled cycles per insn
       500,912,196      branches:u                 #    1.154 G/sec
            23,144      branch-misses:u            #    0.00% of all branches

   0.435340389 seconds time elapsed

   0.431409000 seconds user
   0.003994000 seconds sys


jan@localhost:/tmp> perf stat ./a.out-gcc

 Performance counter stats for './a.out-gcc':

          1,197.28 msec task-clock:u               #    0.999 CPUs utilized
                 0      context-switches:u         #    0.000 /sec
                 0      cpu-migrations:u           #    0.000 /sec
               131      page-faults:u              #  109.415 /sec
     2,903,995,656      cycles:u                   #    2.425 GHz
            86,204      stalled-cycles-frontend:u  #    0.00% frontend cycles idle
     2,690,907,052      stalled-cycles-backend:u   #   92.66% backend cycles idle
     2,005,212,311      instructions:u             #    0.69  insn per cycle
                                                   #    1.34  stalled cycles per insn
       401,007,320      branches:u                 #  334.932 M/sec
            23,290      branch-misses:u            #    0.01% of all branches

   1.198388186 seconds time elapsed

   1.19845 seconds user
   0.0 seconds sys


The problem seems to be, like in first example, that we keep updating in-memory
stack in the main loop.

.L39:
        movl    12(%rsp), %ebx
.L30:
        movq    16(%rsp), %rax
        cmpl    $1, %ebx
        ja      .L33
.L40:
        movq    24(%rsp), %rdi
        cmpq    %rdi, %rax
        je      .L28
.L34:
        movq    -8(%rdi), %rax
        leaq    -8(%rdi), %rsi
        movq    %rsi, 24(%rsp)
        movq    %rax, 8(%rsp)
        testl   %eax, %eax
        jne     .L39

While clang does:

.LBB0_1:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rax, %r14
.LBB0_2:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rbx, %r12
        movq    %r12, %rbx
        cmpl    $10001, %r13d           # imm = 0x2711
        jae     .LBB0_27
.LBB0_4:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_16 Depth 2
                                        #     Child Loop BB0_21 Depth 2
        cmpq    %r14, %rbx
        je      .LBB0_26
# %bb.5:                                #   in Loop: Header=BB0_4 Depth=1
        leaq    -8(%r14), %rax
        movq    -8(%r14), %rcx
        movq    %rcx, %r13
        shrq    $32, %r13
        testl   %ecx, %ecx
        jne     .LBB0_1


Referenced Bugs:

https://gcc.gnu.org/bugzilla/sho

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #10 from Jan Hubicka  ---
Actually, vectorization hurts on both compilers, and a bit more with clang.
It seems that all important loops are hand-vectorized and, since register
pressure is a problem, vectorizing other loops causes enough collateral
damage to register allocation to regress performance.

I believe the core of the problem (or at least one of them) is simply the way we
compile loops popping data from a std::vector-based stack. See PR109849.
We keep updating the stack data structure in the innermost loop because, in the
not-too-common case, reallocation needs to be done, and that is done by
out-of-line code.

[Bug tree-optimization/106900] Regression after memchr optimization

2023-05-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106900

--- Comment #6 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:f65af1eeef670f2c249b1896726ef57bbf65fe2f

commit r14-937-gf65af1eeef670f2c249b1896726ef57bbf65fe2f
Author: Andrew Pinski 
Date:   Tue May 16 14:34:05 2023 -0700

Fix PR 106900: array-bounds warning inside simplify_builtin_call

The problem here is that VRP cannot figure out isize could not be 0
due to using integer_zerop. This patch removes the use of integer_zerop
and instead checks for 0 directly after converting the tree to
an unsigned HOST_WIDE_INT. This allows VRP to figure out isize is not 0
and `isize - 1` will always be >= 0.

This patch is just to avoid the warning that GCC could produce sometimes
and does not change any code generation or even VRP.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (simplify_builtin_call): Check
against 0 instead of calling integer_zerop.

[Bug analyzer/109570] detect fclose on unopened or NULL files

2023-05-17 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109570

--- Comment #5 from Christophe Lyon  ---
Not sure how to update/fix the testcases though?
Since they get the declaration of fclose from stdio.h, we'd need to make
dg-error conditional on the glibc version in use, which seems impractical.

Should we instead remove #include <stdio.h> and provide suitable declarations
in the testcase?

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread tuliom at ascii dot art.br via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #5 from Tulio Magno Quites Machado Filho  ---
(In reply to Jonathan Wakely from comment #3)
> I wonder if we have a static destructor ordering problem.

I'm afraid the issue is happening earlier, when these iterators are being
initialized.
Look at this backtrace taken during initialization:

#0  0x77b536e4 in __gnu_debug::_Safe_sequence_base::_M_attach_single
(this=0x100414c8 <__gnu_cxx::annotate_base::map_alloc()::_S_map>, 
__it=0x7fffe8f8, __constant=false) at
/home/test/src/gcc/libstdc++-v3/src/c++11/debug.cc:396
#1  0x77b5376c in __gnu_debug::_Safe_sequence_base::_M_attach
(this=0x100414c8 <__gnu_cxx::annotate_base::map_alloc()::_S_map>, 
__it=0x7fffe8f8, __constant=false) at
/home/test/src/gcc/libstdc++-v3/src/c++11/debug.cc:383
#2  0x77b53cd8 in __gnu_debug::_Safe_iterator_base::_M_attach
(this=0x7fffe8f8, 
__seq=0x100414c8 <__gnu_cxx::annotate_base::map_alloc()::_S_map>,
__constant=false) at /home/test/src/gcc/libstdc++-v3/src/c++11/debug.cc:430
#3  0x10012244 in __gnu_debug::_Safe_iterator_base::_Safe_iterator_base
(__constant=false, 
__seq=0x100414c8 <__gnu_cxx::annotate_base::map_alloc()::_S_map>,
this=)
at /home/test/gcc-14/include/c++/14.0.0/debug/safe_base.h:91
#4  __gnu_debug::_Safe_iterator > >, std::__debug::map, std::less,
std::allocator >
> >, std::forward_iterator_tag>::_Safe_iterator (__seq=0x100414c8
<__gnu_cxx::annotate_base::map_alloc()::_S_map>, __i=..., this=0x7fffe8f0)
at /home/test/gcc-14/include/c++/14.0.0/debug/safe_iterator.h:162
#5  __gnu_debug::_Safe_iterator > >, std::__debug::map, std::less,
std::allocator >
> >, std::bidirectional_iterator_tag>::_Safe_iterator (__seq=0x100414c8
<__gnu_cxx::annotate_base::map_alloc()::_S_map>, __i=..., this=0x7fffe8f0)
at /home/test/gcc-14/include/c++/14.0.0/debug/safe_iterator.h:539
#6  std::__debug::map,
std::less, std::allocator > > >::find (__x=: 0x0, this=0x100414c8
<__gnu_cxx::annotate_base::map_alloc()::_S_map>)
at /home/test/gcc-14/include/c++/14.0.0/debug/map.h:583
#7  __gnu_cxx::annotate_base::check_allocated (this=, size=4,
p=0x0)
at /home/test/gcc-14/include/c++/14.0.0/ext/throw_allocator.h:177
#8  __gnu_cxx::annotate_base::erase (p=p@entry=0x0, size=size@entry=4,
this=)
at /home/test/gcc-14/include/c++/14.0.0/ext/throw_allocator.h:146
#9  0x10010474 in __gnu_cxx::throw_allocator_base::deallocate (this=, __n=1, 
__p=0x0) at /home/test/gcc-14/include/c++/14.0.0/ext/throw_allocator.h:888
#10 __gnu_test::check_deallocate_null<__gnu_cxx::throw_allocator_random >
()
at /home/test/src/gcc/libstdc++-v3/testsuite/util/testsuite_allocator.h:255
#11 main () at
/home/test/src/gcc/libstdc++-v3/testsuite/ext/throw_allocator/check_deallocate_null.cc:30

Frame #2 references 0x7fffe8f8, which is part of the stack. Frame #5 is
also referencing an object in the stack.
After these functions return, these objects shouldn't be used anymore.

[Bug tree-optimization/106900] Regression after memchr optimization

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106900

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |14.0
 Status|ASSIGNED|RESOLVED

--- Comment #7 from Andrew Pinski  ---
Fixed on the trunk; not really worth backporting since it is only an issue with
--enable-werror-always which almost nobody uses.

[Bug tree-optimization/56456] [meta-bug] bogus/missing -Warray-bounds

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
Bug 56456 depends on bug 106900, which changed state.

Bug 106900 Summary: Regression after memchr optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106900

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread tuliom at ascii dot art.br via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #6 from Tulio Magno Quites Machado Filho  ---
Let me elaborate my previous comment...
When initializing the object at 0x100414c8, one of its members points to an
address in the stack (0x7fffe8f8).
All these functions return and when __run_exit_handlers() is called, the
address 0x7fffe8f8 is used to save the TOC pointer (r2) before calling the
destructors of the library.
The destructors manipulate the object at 0x100414c8, zeroing all its members,
including the address where the TOC pointer was saved.

[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings

2023-05-17 Thread mimomorin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

--- Comment #3 from Michel Morin  ---
From the safety point of view, I agree with you. But, at the same time, I
thought that detectable UB (with the help of sanitizers) is more useful than a
silent bug.

How about `throw`ing as in std::string's constructor?

[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

--- Comment #4 from Andrew Pinski  ---
IIRC this was added to be similar to glibc's nullptr handling for %s:
printf("xyza %s\n", nullptr);

[Bug analyzer/109570] detect fclose on unopened or NULL files

2023-05-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109570

--- Comment #6 from Xi Ruoyao  ---
(In reply to Christophe Lyon from comment #5)
> Not sure how to update/fix the testcases though?
> Since they get the declaration of fclose from stdio.h, we'd need to make
> dg-error conditional to the glibc version in use, which seems unpractical.
> 
> Should we instead remove #include <stdio.h> and provide suitable
> declarations in the testcase?

I guess we need to change

  return ferror (f) || fclose (f) != 0;

to

  return !f || ferror (f) || fclose (f) != 0;

Because "failing to check if the file is opened successfully" is definitely a
bug, and these tests are intended not to raise warnings for a bug-free program.

BTW ferror(f) segfaults as well when f is NULL, so IMO we should mark it
nonnull in Glibc as well.
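
The proposed check can be sketched as a small standalone helper. The name close_failed is hypothetical, and the sketch is written in C++ with <cstdio> rather than as the C testcase itself:

```cpp
#include <cstdio>

// Returns true on any failure, including a stream that was never opened.
// The null check comes first, so ferror and fclose only ever see a valid
// stream.
bool close_failed(std::FILE* f) {
    return !f || std::ferror(f) || std::fclose(f) != 0;
}
```

A caller can pass the result of fopen straight in, for example close_failed(std::fopen(path, "r")), and a failed open is reported as an error instead of reaching fclose.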

[Bug tree-optimization/109893] New: [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-160-gf828503eeb79ad1f1ada6db7deccc5abcc2f3ca3

2023-05-17 Thread theodort at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109893

Bug ID: 109893
   Summary: [14 Regression]  Missed Dead Code Elimination when
using __builtin_unreachable since
r14-160-gf828503eeb79ad1f1ada6db7deccc5abcc2f3ca3
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

void foo(void);
void bar(void);
static char a;
static int b, e, f;
static int *c = &b, *g;
int main() {
    int *j = 0;
    if (a) {
        g = 0;
        if (c)
            bar();
    } else {
        j = &e;
        c = 0;
    }
    if (c == &f == b || c == &e)
        ;
    else
        __builtin_unreachable();
    if (g || e) {
        if (j == &e || j == 0)
            ;
        else
            foo();
    }
    a = 4;
}

gcc -O3: 

main:
        cmpb    $0, a(%rip)
        je      .L2
        xorl    %esi, %esi
        cmpq    $0, c(%rip)
        movq    %rsi, g(%rip)
        je      .L7
        pushq   %rdx
        call    bar
        movb    $4, a(%rip)
        xorl    %eax, %eax
        popq    %rcx
        ret
.L2:
        xorl    %eax, %eax
        movq    %rax, c(%rip)
.L7:
        movb    $4, a(%rip)
        xorl    %eax, %eax
        ret
c:
        .quad   b

gcc-trunk -O3 

main:
        subq    $8, %rsp
        cmpb    $0, a(%rip)
        je      .L2
        xorl    %edx, %edx
        cmpq    $0, c(%rip)
        movq    %rdx, g(%rip)
        je      .L6
        call    bar
        xorl    %eax, %eax
.L4:
        cmpq    $0, g(%rip)
        je      .L9
.L6:
        movb    $4, a(%rip)
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
.L2:
        xorl    %eax, %eax
        movq    %rax, c(%rip)
        movl    $e, %eax
        jmp     .L4
.L9:
        cmpl    $0, e(%rip)
        je      .L6
        testq   %rax, %rax
        je      .L6
        cmpq    $e, %rax
        je      .L6
        call    foo
        jmp     .L6
c:
        .quad   b

Bisects to:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=f828503eeb79ad1f1ada6db7deccc5abcc2f3ca3

[Bug tree-optimization/109893] [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-160-gf828503eeb79ad1f1ada6db7deccc5abcc2f3ca3

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109893

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Keywords||missed-optimization
   Last reconfirmed||2023-05-17
 Status|UNCONFIRMED |NEW
   Target Milestone|--- |14.0

--- Comment #1 from Andrew Pinski  ---
Confirmed.

A minor regression I suspect.

[Bug tree-optimization/109892] SLP failure with explicit fma

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-05-17
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement

--- Comment #1 from Andrew Pinski  ---
Confirmed.

I notice that clang/LLVM does not vectorize the __builtin_fma either.

I also noticed that for aarch64, GCC does not use faddp for the final reduction
(I saw that a patch was submitted for that in 2021, but it has not been updated
for the comments on it ...).

[Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-05-17
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
Confirmed.

[Bug middle-end/90663] [10/11/12/13/14 Regression] strcmp (&a[i], a + i) not folded for arrays and constant index

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90663

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|pinskia at gcc dot gnu.org |unassigned at gcc dot 
gnu.org

--- Comment #11 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #10)
> Created attachment 55097 [details]
> Patch which I am testing

Hmm, one failure
+FAIL: c-c++-common/Wrestrict.c  -Wc++-compat  memcpy (test for warnings, line
120)

GCC's code says:
  /* Avoid diagnosing exact overlap in calls to __builtin_memcpy.
 It's safe and may even be emitted by GCC itself (see bug
 32667).  */

Reduced testcase for the missing warning:
```
/* PR 35503 - Warn about restricted pointers
   { dg-do compile }
   { dg-options "-O2 -Wrestrict -ftrack-macro-expansion=0" } */

void sink (void*, ...);

/* Exercise memcpy with constant or known arguments.  */
void test_memcpy_cst (void *d, const void *s)
{
struct {
  char a[7];
  char b[7];
  char c[7];
} x;
sink (&x);

d = x.a + 7;
s = x.b;
__builtin_memcpy (d, s, 3); /* { dg-warning "\\\[-Wrestrict" "memcpy" } */
sink (&x);
}

```

I am no longer working on this because I am not 100% sure if we want to still
warn here or not ...

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #7 from Jonathan Wakely  ---
When the function returns the iterator's destructor should detach itself from
the sequence's list of iterators, so that it doesn't outlive the stack frame
containing the iterator.

The commit that caused the regression included this change:

_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
  _M_message(__msg_bad_inc)
  ._M_iterator(*this, "this"));
-   _Safe_iterator __ret = *this;
+   _Safe_iterator __ret(*this, _Unchecked());
++*this;
return __ret;
   }

Maybe this affects how/when the __ret object gets destroyed, so it fails to
detach itself.
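
The invariant described here can be shown with a toy model. This is only a sketch: Sequence, TrackedIter, make_iter and attached_after_scope are hypothetical names, not the __gnu_debug classes. Every constructor attaches the new object to the sequence and the destructor detaches it, so no stale address survives once all iterators are gone:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy stand-in for a sequence that remembers its live iterators.
struct Sequence {
    std::vector<const void*> attached;
};

class TrackedIter {
    Sequence* seq_;

    void attach() { seq_->attached.push_back(this); }
    void detach() {
        auto& v = seq_->attached;
        v.erase(std::remove(v.begin(), v.end(),
                            static_cast<const void*>(this)),
                v.end());
    }

public:
    explicit TrackedIter(Sequence& s) : seq_(&s) { attach(); }
    // A copy is a distinct object, so it registers its own address.
    TrackedIter(const TrackedIter& o) : seq_(o.seq_) { attach(); }
    TrackedIter& operator=(const TrackedIter&) = delete;
    ~TrackedIter() { detach(); }
};

// Returns by value; whether or not the copy is elided, every object that
// was attached is detached again by its own destructor.
TrackedIter make_iter(Sequence& s) {
    TrackedIter it(s);
    return it;
}

std::size_t attached_after_scope() {
    Sequence s;
    {
        TrackedIter a = make_iter(s);
        TrackedIter b = a;
    }
    return s.attached.size();
}
```

If any construction path skipped the attach or detach step for the object that is actually returned, the sequence would be left holding the address of a dead stack object, which is the kind of dangling pointer seen in the backtrace.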

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #8 from Jonathan Wakely  ---
With -std=c++14 there's no crash, with -std=c++17, so that confirms it's
something related to copy elision.

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #9 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #8)
> With -std=c++14 there's no crash, with -std=c++17,

Should have said "only with -std=c++17" (and later, of course).

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #10 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #9)
> Should have said "only with -std=c++17" (and later, of course).

Actually, that's wrong, *only* with C++17, not earlier *or* later.

So the further changes to elision rules after C++17 changed the behaviour
again.

[Bug modula2/109894] New: WriteInt in the ISO libraries should not emit the '+' when writing positive values

2023-05-17 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109894

Bug ID: 109894
   Summary: WriteInt in the ISO libraries should not emit the '+'
when writing positive values
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: gaius at gcc dot gnu.org
  Target Milestone: ---

As reported on the gm2 mailing list.  WriteInt in the ISO libraries should not
emit the '+' when writing positive values.

[Bug modula2/109894] WriteInt in the ISO libraries should not emit the '+' when writing positive values

2023-05-17 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109894

Gaius Mulley  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-05-17
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Gaius Mulley  ---
Confirmed.

[Bug modula2/109894] WriteInt in the ISO libraries should not emit the '+' when writing positive values

2023-05-17 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109894

Gaius Mulley  changed:

   What|Removed |Added

 CC||gaius at gcc dot gnu.org

--- Comment #2 from Gaius Mulley  ---
Created attachment 55104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55104&action=edit
Proposed fix

Proposed fix for WriteInt in m2iso.

[Bug modula2/109894] WriteInt in the ISO libraries should not emit the '+' when writing positive values

2023-05-17 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109894

Gaius Mulley  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Gaius Mulley  ---
Closing now that the patch has been applied.

[Bug fortran/109865] different results when routine moved inside the contains statement

2023-05-17 Thread Gary.White at ColoState dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109865

GARY.WHITE at ColoState dot edu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

--- Comment #16 from GARY.WHITE at ColoState dot edu  ---
I resolved the issue.  The parameter ir was declared intent(out) in subroutine
mc11ad, but there was a check in an if statement to see if ir == 0, meaning ir
was defined on input.  This check followed code that set ir when n == 1, and
this was never executed when the code did not produce correct answers.  Anyway,
changing intent(out) to intent(in out) resolved the -O3 optimization issue and
the code works as expected.

I guess it's too much to expect that the compiler would detect that a parameter
was actually being accessed before being set if the parameter is declared
intent(out) only.

[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings

2023-05-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

--- Comment #5 from Jonathan Wakely  ---
(In reply to Michel Morin from comment #3)
> From the safety point of view, I agree with you. But, at the same time, I
> thought that detectable UB (with the help of sanitizers) is useful than
> silent bug. 

Detectable UB doesn't guarantee detection. Sanitizers are not suitable for
production code. Introducing UB here would be strictly less safe, full stop.

And the bug isn't silent, it makes the stream unusable.

> How about `throw`ing as in std::string's constructor?

Set the exception flag on the stream and you get an exception when badbit is
set.

[Bug fortran/109865] different results when routine moved inside the contains statement

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109865

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|FIXED   |INVALID

--- Comment #17 from Andrew Pinski  ---
Since there is no GCC bug, changing the issue status to invalid.
