[Bug libstdc++/94049] New: For better diagnostics CPOs should not use concepts for operator()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94049 Bug ID: 94049 Summary: For better diagnostics CPOs should not use concepts for operator() Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the following code: #include void foo0() { int t = 0; std::ranges::begin(t); } The diagnostics for it are mostly unreadable and point to the internals of libstdc++ https://godbolt.org/z/c-RwuY . This could be significantly improved. Right now the `requires` clause on `std::ranges::__cust_access::_Begin::operator()` duplicates the body of the function. Instead of such duplication, all the requirements could simply be asserted in the body: template constexpr auto operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp>()) { static_assert(__maybe_borrowed_range<_Tp>, "Not a borrowed range or lvalue"); if constexpr (is_array_v>) { ... } else if constexpr (__member_begin<_Tp>) return __t.begin(); else if constexpr (__adl_begin<_Tp>) return begin(__t); else static_assert(!sizeof(_Tp), "_Tp should have either a member begin() or an begin(_Tp&) should be in the namespace of _Tp"); } This gives much better diagnostics: https://godbolt.org/z/kmLGb7 All the CPOs could be improved in that manner.
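The "assert in the body" idea from the report can be tried outside libstdc++ with a small stand-alone sketch. Everything in namespace `sketch` (`has_member_begin`, `always_false`, `begin_fn`) is a hypothetical name made up for this illustration, not a libstdc++ internal, and the ADL branch of the real CPO is left out to keep it short. It assumes C++17.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Detects a usable member begin(); hypothetical helper, not a libstdc++ name.
template <class T, class = void>
struct has_member_begin : std::false_type {};

template <class T>
struct has_member_begin<T, std::void_t<decltype(std::declval<T&>().begin())>>
    : std::true_type {};

// Always false, but dependent on T, so the static_assert below fires only
// when the final `else` branch is actually instantiated.
template <class T>
constexpr bool always_false = false;

struct begin_fn {
    // No requires-clause: the requirements are checked inside the body.
    template <class T>
    constexpr auto operator()(T& t) const {
        if constexpr (std::is_array_v<T>) {
            return t + 0;  // pointer to the first element
        } else if constexpr (has_member_begin<T>::value) {
            return t.begin();
        } else {
            static_assert(always_false<T>,
                          "T needs a member begin() or must be an array");
        }
    }
};

inline constexpr begin_fn begin{};

// Small self-check of the two supported branches.
inline void check() {
    std::vector<int> v{1, 2, 3};
    assert(sketch::begin(v) == v.begin());
    int a[3] = {4, 5, 6};
    assert(sketch::begin(a) == a);
}

}  // namespace sketch
```

Calling `sketch::begin` on an `int` should then stop at the single `static_assert` message rather than at a failed constraint deep inside the library, which is the effect the report asks for.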
[Bug middle-end/94146] New: Merging functions with same bodies stopped working
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94146 Bug ID: 94146 Summary: Merging functions with same bodies stopped working Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: extern int x , y; int ternary(int i) { return i > 0 ? x : y; } int ternary2(int i) { return i > 0 ? x : y; } GCC9 was merging the functions with -O2: ternary2(int): jmp ternary(int) With GCC10 merging at -O2 is missing and function bodies are duplicated even for very big functions: https://godbolt.org/z/2kH8VR
[Bug c++/67302] [C++14] copy elision in return (expression)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67302 Antony Polukhin changed: What|Removed |Added CC||antoshkka at gmail dot com --- Comment #3 from Antony Polukhin --- Can reproduce with GCC 10.1 https://godbolt.org/z/tYvccG
[Bug c++/96004] New: Copy elision with conditional
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96004 Bug ID: 96004 Summary: Copy elision with conditional Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: struct Struct { Struct() = default; Struct(Struct&&); }; Struct question10(bool b) { if (b) { Struct s{}; return s; } else { return {}; } } It is possible to elide move constructor call as the lifetimes of object `s` and `return {}` do not intersect. (some other compilers already do copy elision in that place https://godbolt.org/z/wdpLkT )
[Bug libstdc++/96088] New: Range insertion into unordered_map is less effective than a loop with insertion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96088 Bug ID: 96088 Summary: Range insertion into unordered_map is less effective than a loop with insertion Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the function f1: static constexpr std::initializer_list> lst = { {"long_str_for_dynamic_allocating", 1}}; void f1() { std::unordered_map m(1); m.insert(lst.begin(), lst.end()); } It creates a temporary and as a result makes 4 allocations. Meanwhile f2 does not create a temporary and makes only 3 allocations: void f2() { std::unordered_map m(1); for (const auto& x : lst) { m.insert(x); } } Godbolt playground: https://godbolt.org/z/VapmBU
[Bug c++/96121] New: Uninitialized variable copying not diagnosed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121 Bug ID: 96121 Summary: Uninitialized variable copying not diagnosed Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: struct A { A(); }; struct B { B(A); }; struct composed2 { B b_; A a_; composed2() : b_(a_) {} }; GCC does not diagnose the use of the uninitialized variable `a_` with -Wall and -Wextra. Some other compilers do diagnose it: warning: field 'a_' is uninitialized when used here [-Wuninitialized] composed2() : b_(a_) {} ^ Godbolt playground: https://godbolt.org/z/AbqzjR
[Bug c++/96121] Uninitialized variable copying not diagnosed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121 --- Comment #4 from Antony Polukhin --- Adding members and usage does not make a difference https://godbolt.org/z/VommHu struct A { A(); int i; }; struct B { B(A); int i; }; struct composed2 { B b_; A a_; composed2() : b_(a_) {} }; auto test() { return composed2{}; }
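Why `b_(a_)` touches an unconstructed `a_` follows from the rule that members are constructed in declaration order. A minimal sketch that records the order is given below; the `log()` helper and the reference-taking `B` are assumptions for this illustration (the report's `B` takes `A` by value, which would actually copy the unconstructed object).

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Records the order in which the members are constructed.
inline std::string& log() {
    static std::string s;
    return s;
}

struct A {
    A() { log() += 'A'; }
};

struct B {
    // Only binds a reference and never reads from it, so this stays well
    // defined even though the referenced A has not been constructed yet.
    explicit B(const A&) { log() += 'B'; }
};

struct composed {
    B b_;  // declared first, so constructed first
    A a_;  // declared second, so still unconstructed while b_ is initialized
    composed() : b_(a_), a_() {}
};

inline std::string construction_order() {
    log().clear();
    composed c;
    (void)c;
    return log();
}

inline void check() { assert(construction_order() == "BA"); }

}  // namespace sketch
```

The recorded order is "BA": `b_` is built before `a_` exists, which is the situation the requested warning would point at.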
[Bug c++/96452] New: Narrowing conversion is not rejected
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96452 Bug ID: 96452 Summary: Narrowing conversion is not rejected Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: accepts-invalid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: float test_main(double d) { float f2{d}; return f2; } Narrowing of double to float in brace-init is not rejected, only a warning is issued. Godbolt playground: https://godbolt.org/z/fzPT8r
[Bug c++/96452] Narrowing conversion is not rejected
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96452 --- Comment #5 from Antony Polukhin --- Hm... My reading of http://eel.is/c++draft/dcl.init.list#3.9 is that the program is ill-formed for narrowing conversions. And http://eel.is/c++draft/dcl.init.list#7.2 states that conversion from double to float is a narrowing one, except where the source is a constant expression. Am I missing something?
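The constant-expression exception mentioned in the comment above can be shown with a short sketch. The function names are made up for this illustration; the second function uses an explicit cast so that the snippet stays well-formed on every compiler.

```cpp
#include <cassert>

namespace sketch {

// The source is a constant expression and its value is within the range of
// float, so this double -> float brace-initialization is not narrowing.
inline float from_constant() {
    constexpr double d = 0.5;
    float f{d};
    return f;
}

// With a non-constant source, `float f{d};` would be a narrowing conversion
// (the case from the report); the explicit cast avoids it.
inline float from_runtime(double d) {
    float f{static_cast<float>(d)};
    return f;
}

inline void check() {
    assert(from_constant() == 0.5f);
    assert(from_runtime(0.25) == 0.25f);
}

}  // namespace sketch
```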
[Bug c++/96452] Narrowing conversion is not rejected
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96452 --- Comment #7 from Antony Polukhin --- (In reply to Jonathan Wakely from comment #6) > Your understanding of what a compiler needs to do for ill-formed programs is > wrong. You're right, thank you!
[Bug c++/92375] New: Warn on suspicious taking of function address instead of calling a function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92375 Bug ID: 92375 Summary: Warn on suspicious taking of function address instead of calling a function Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the simple example: bool function(); bool test() { bool result = function; if (result) { return 1; } else { return 2; } } Students and scholars often forget the parentheses needed to actually call the function. Unfortunately GCC does not give the warning that other compilers implement: :4:19: warning: address of function 'function' will always evaluate to 'true' [-Wpointer-bool-conversion] bool result = function; ~~ ^~~~ :4:19: note: prefix with the address-of operator to silence this warning bool result = function; ^ & :4:19: note: suffix with parentheses to turn this into a function call bool result = function; ^ () Please add a warning.
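A small sketch of why the missing parentheses matter: the function below returns false, yet the version without a call still yields true. The names are illustrative only, and `&function` is written explicitly here (the form the quoted note suggests for silencing the warning); `bool result = function;` behaves the same way.

```cpp
#include <cassert>

namespace sketch {

inline bool function() { return false; }

// No call happens: the function's address is converted to bool, and the
// address of a function is never null, so the result is always true.
inline bool without_call() {
    bool result = &function;
    return result;
}

// With the parentheses the function is actually called.
inline bool with_call() {
    bool result = function();
    return result;
}

inline void check() {
    assert(without_call() == true);
    assert(with_call() == false);
}

}  // namespace sketch
```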
[Bug middle-end/92455] New: Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455 Bug ID: 92455 Summary: Unnecessary memory read in a loop Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: typedef struct { int* ptr_; } int_ptr; int_ptr f1(int_ptr* x) { int_ptr* max = x; for (int i =0 ; i < 5; ++ i) { ++ x; if (*max->ptr_ < *x->ptr_) { max = x; } } return *max; } GCC with -O2 generates the following assembly: f1(int_ptr*): lea rsi, [rdi+40] mov rax, rdi .L3: mov rcx, QWORD PTR [rax] ; <== This could be removed from the loop mov rdx, QWORD PTR [rdi+8] add rdi, 8 mov edx, DWORD PTR [rdx] cmp DWORD PTR [rcx], edx cmovl rax, rdi cmp rsi, rdi jne .L3 mov rax, QWORD PTR [rax] ret If we rewrite the example to avoid int_ptr: int* f2(int** x) { int** max = x; for (int i =0 ; i < 5; ++ i) { ++ x; if (**max < **x) { max = x; } } return *max; } Then there'll be less memory accesses in a loop: f2(int**): mov rax, QWORD PTR [rdi] ; <=== Not in a loop any more lea rcx, [rdi+40] .L8: mov rdx, QWORD PTR [rdi+8] add rdi, 8 mov esi, DWORD PTR [rdx] cmp DWORD PTR [rax], esi cmovl rax, rdx cmp rcx, rdi jne .L8 ret Please improve the memory accesses for the first case Godbolt playground: https://godbolt.org/z/CaGbT2
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455 --- Comment #2 from Antony Polukhin --- Can the -ftree-partial-pre flag be enabled by default for -O2?
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455 --- Comment #4 from Antony Polukhin --- (In reply to Richard Biener from comment #3) > But maybe > you can provide benchmark data (including compile-time/memory-use figures)? OK. Is there any GCC specific tool or flag for that?
[Bug target/92592] New: Redundant comparison after subtraction on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92592 Bug ID: 92592 Summary: Redundant comparison after subtraction on x86 Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: int sample(int a, int b) { unsigned diff = (unsigned)b - (unsigned)a; unsigned sign_bit = b < a; return diff + sign_bit; } With -O2 and -O3 GCC produces the assembly: sample(int, int): mov eax, esi ; <=== not required xor edx, edx sub eax, edi cmp esi, edi ; <=== not required setl dl add eax, edx ret However, `sub` changes the status flags and there's no need to call `cmp`: sample(int, int): xor eax, eax sub esi, edi setl al add eax, esi ret The above sample is a minimized version of std::midpoint. Godbolt playground: https://godbolt.org/z/j6FGq4
[Bug c++/90647] Warn on returning a lambda with captured local variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90647 --- Comment #2 from Antony Polukhin --- -Wreturn-local-addr looks good to me
[Bug c++/66139] destructor not called for members of partially constructed anonymous struct/array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66139 Antony Polukhin changed: What|Removed |Added CC||antoshkka at gmail dot com --- Comment #16 from Antony Polukhin --- Can we increase the priority of this issue to P1 or P2? It affects the very basics of C++. BTW, I've minimized the example. It aborts on every version of GCC with -std=c++11 and passes on Clang: int constructed = 0; class lock_guard_ext{ public: lock_guard_ext() { ++constructed; } ~lock_guard_ext() { --constructed; } }; struct Access { lock_guard_ext lock; int value; }; int t() { throw 0; } Access foo1() { return { {}, t() }; } int main () { try { foo1(); } catch (int) {} if (constructed != 0) __builtin_abort(); }
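For comparison, here is a variant of the minimized example in which the members are initialized by a constructor instead of by aggregate initialization. This is only a sketch of the expected cleanup behaviour under that assumption; the thread does not propose it as a fix, and the `constructed()` accessor and `thrower()` are renamed helpers for this illustration.

```cpp
#include <cassert>

namespace sketch {

inline int& constructed() {
    static int n = 0;
    return n;
}

struct lock_guard_ext {
    lock_guard_ext() { ++constructed(); }
    ~lock_guard_ext() { --constructed(); }
};

inline int thrower() { throw 0; }

struct Access {
    lock_guard_ext lock;
    int value;
    // lock is fully constructed before thrower() runs; when it throws, the
    // already-constructed member is destroyed during stack unwinding.
    Access() : lock(), value(thrower()) {}
};

// Returns how many lock_guard_ext objects are still alive after the throw.
inline int alive_after_throw() {
    try {
        Access a;
        (void)a;
    } catch (int) {
    }
    return constructed();
}

inline void check() { assert(alive_after_throw() == 0); }

}  // namespace sketch
```

The counter returns to zero here, which is the result the report expects from the aggregate-initialized `foo1()` as well.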
[Bug c++/93413] New: Destructor definition not found during constant evaluation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93413 Bug ID: 93413 Summary: Destructor definition not found during constant evaluation Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: rejects-valid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: struct Base { constexpr virtual ~Base(){} }; struct Derived: Base {}; constexpr Derived d; The destructor for `Derived` should be implicitly defined. However, the above snippet produces an error message on GCC-10 with the -std=c++2a flag: `error: 'virtual constexpr Derived::~Derived()' used before its definition`.
[Bug c++/93414] New: Bad diagnostics for dynamic_cast during constant evaluation: implementation details leak out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93414 Bug ID: 93414 Summary: Bad diagnostics for dynamic_cast during constant evaluation: implementation details leak out Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example that attempts to throw a std::bad_cast: struct Base { constexpr virtual ~Base(){} }; struct Derived: Base { constexpr ~Derived(){} }; constexpr const Derived& cast(const Base& b) { return dynamic_cast(b); // error! } auto test() { static constexpr Base b; constexpr auto res = cast(b); return res; } The error message is following: : In function 'constexpr const Derived& cast(const Base&)': :10:42: error: call to non-'constexpr' function 'void* __cxa_bad_cast()' 10 | return dynamic_cast(b); // error: call to non-'constexpr' function 'void* __cxa_bad_cast()' That's not informative: users usually know nothing about __cxa_bad_cast Please change the error message to something more informative, for example "During constexpr evaluation attempt to cast a variable `b` with typeid(b) == typeid(Base) to `Derived` was detected"
[Bug c++/55249] New: Multiple copy constructors for template class lead to link errors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55249 Bug #: 55249 Summary: Multiple copy constructors for template class lead to link errors Classification: Unclassified Product: gcc Version: 4.6.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: antosh...@gmail.com Created attachment 28647 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28647 gcc -v -save-temps -std=c++0x -Wall -Wextra main.cpp 1>output.txt 2>&1 Following code leads to linker errors in C++11 mode and in default mode (requires replacement of std::array with boost::array): #include #include template struct inner_type { inner_type() {} inner_type(inner_type& ) {} inner_type(const inner_type& ) {} ~inner_type() {} }; // Uncomment typedef to get undefined reference to // __uninitialized_copyILb0EE13__uninit_copy // Can be workaround by marking inner_type copy constructors with noexcept //typedef std::vector, 3> > type; // Uncomment typedef to get undefined reference to // `inner_type::inner_type(inner_type const&)' //typedef std::array, 3> type; int main() { type t1; type t2 = t1; return 0; }
[Bug c++/55249] Multiple copy constructors for template class lead to link errors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55249 --- Comment #1 from Antony Polukhin 2012-11-09 10:24:49 UTC --- Created attachment 28648 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28648 Preprocessed file that triggers the bug
[Bug c++/55249] Multiple copy constructors for template class lead to link errors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55249 --- Comment #4 from Antony Polukhin 2012-11-09 12:28:11 UTC --- (In reply to comment #3) Yes, thanks. `output.txt` will be the same. Also, reproduced this bug on GCC 4.7.2: [cc@ontos-soa-01 ~]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/cc/dev/gcc-4.7.2/libexec/gcc/x86_64-unknown-linux-gnu/4.7.2/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../downloads/gcc-4.7.2/configure --prefix=/home/cc/dev/gcc-4.7.2 --disable-multilib --enable-languages=c,c++ Thread model: posix gcc version 4.7.2 (GCC)
[Bug c++/88445] New: noexcept(expr) should return true with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88445 Bug ID: 88445 Summary: noexcept(expr) should return true with -fno-exceptions Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the following example: #include struct test { test(); test(test&&); test& operator=(test&&); }; void test_func() { static_assert(noexcept(test{test()}), ""); static_assert(std::is_nothrow_move_constructible::value, ""); static_assert(std::is_nothrow_move_assignable::value, ""); } The static assertions fail with the -fno-exceptions flag however no exception could happen because all the exceptions are disabled. Please adjust the noexcept(expr) logic for the -fno-exceptions flag. Such adjustment is essential because the standard library heavily relies on the type traits and chooses the suboptimal algorithms in -fno-exceptions environments.
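The traits in question depend only on the declared exception specification, not on whether a throw can actually happen. A short sketch with two hypothetical types (`plain` mirrors the report's `test`, `marked` adds `noexcept` by hand) shows the difference that the standard library algorithms observe:

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Same shape as the type in the report: nothing is marked noexcept.
// The members are only declared because they are used in unevaluated contexts.
struct plain {
    plain();
    plain(plain&&);
    plain& operator=(plain&&);
};

// The same operations with an explicit noexcept specification.
struct marked {
    marked() noexcept;
    marked(marked&&) noexcept;
    marked& operator=(marked&&) noexcept;
};

static_assert(!noexcept(plain{plain()}), "");
static_assert(!std::is_nothrow_move_constructible<plain>::value, "");
static_assert(!std::is_nothrow_move_assignable<plain>::value, "");

static_assert(noexcept(marked{marked()}), "");
static_assert(std::is_nothrow_move_constructible<marked>::value, "");
static_assert(std::is_nothrow_move_assignable<marked>::value, "");

}  // namespace sketch
```

Marking the operations `noexcept` by hand is the portable way to get the faster library code paths; the report asks whether -fno-exceptions could make that implicit.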
[Bug c++/88445] noexcept(expr) should return true with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88445 --- Comment #1 from Antony Polukhin --- Hm... This was discussed in Clang, and it looks like such an optimization could break the ABI and cause ODR violations https://bugs.llvm.org/show_bug.cgi?id=27442#c4 If nothing has changed since then, I'm OK with closing this issue as Invalid or Won't Fix.
[Bug libstdc++/87431] valueless_by_exception() should unconditionally return false if all the constructors are noexcept
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87431 --- Comment #11 from Antony Polukhin --- Looks good. Note that boost::variant went further: if all the types are nothrow movable, then the variant always uses the trick of moving from a temporary. That way `valueless_by_exception()`-like states never happen. Such an approach may not fit libstdc++.
[Bug libstdc++/87431] valueless_by_exception() should unconditionally return false if all the constructors are noexcept
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87431 --- Comment #13 from Antony Polukhin --- Yeah... but some whitelist of types to move could be hardcoded. For example std::basic_string, std::vector, std::unique_ptr and std::shared_ptr could be safely moved and `valueless_by_exception()` never happen for them. Those types cover some of the popular std::variant usages and the overhead from `valueless_by_exception()` will be avoided for those cases.
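For readers unfamiliar with the state under discussion, the sketch below shows one way a `std::variant` can end up valueless: `emplace` destroys the old alternative and the construction of the new one throws. `Thrower` and `Result` are hypothetical types for this illustration. The standard only says the variant might not hold a value afterwards, so the `valueless` flag is reported rather than assumed.

```cpp
#include <cassert>
#include <variant>

namespace sketch {

// Throws when constructed from an int. The user-provided copy constructor
// makes the type non-trivially-copyable, so an implementation has no cheap
// "construct a temporary first" path for it.
struct Thrower {
    explicit Thrower(int) { throw 1; }
    Thrower(const Thrower&) {}
};

struct Result {
    bool threw;
    bool valueless;
};

inline Result emplace_that_throws() {
    std::variant<int, Thrower> v{std::in_place_index<0>, 42};
    bool threw = false;
    try {
        v.emplace<1>(0);  // old int is destroyed, Thrower(0) throws
    } catch (int) {
        threw = true;
    }
    // Whether the variant is valueless here is implementation-dependent.
    return {threw, v.valueless_by_exception()};
}

inline void check() { assert(emplace_that_throws().threw); }

}  // namespace sketch
```

The whitelist idea from the comment would apply to alternatives such as `std::string` or `std::vector`, which can be built as a temporary and then moved in without throwing.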
[Bug c++/53294] Optimize out some exception code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53294 --- Comment #3 from Antony Polukhin --- Any progress?
[Bug c++/89036] ICE if destructor has a requires
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89036 --- Comment #1 from Antony Polukhin --- Compile with flags: -std=c++2a -fconcepts
[Bug c++/89036] New: ICE if destructor has a requires
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89036 Bug ID: 89036 Summary: ICE if destructor has a requires Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code: template struct Y { ~Y() requires(true) = default; ~Y() requires(false) {} }; causes ICE: :6:27: internal compiler error: in add_method, at cp/class.c:1137 6 | ~Y() requires(false) {} | ^ Please submit a full bug report, with preprocessed source if appropriate. See <https://gcc.gnu.org/bugs/> for instructions. Compiler returned: 1
[Bug libstdc++/89120] New: std::minmax_element 2.5 times slower than hand written loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120 Bug ID: 89120 Summary: std::minmax_element 2.5 times slower than hand written loop Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- std::minmax_element is slow when there's a lot of data and it does not fit into the CPU cache: http://quick-bench.com/Z0iRfbm2_S9KvQ1C92ydh8USF-8
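The benchmark itself lives behind the quick-bench link and is not reproduced in the report. As a rough idea of what a "hand written loop" for this task can look like, here is a sketch that returns indices instead of iterators; the function name and the index-based interface are assumptions, not the benchmarked code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Returns {index of first minimum, index of last maximum}, matching the
// tie-breaking of std::minmax_element. The input must be non-empty.
inline std::pair<std::size_t, std::size_t>
minmax_index(const std::vector<int>& data) {
    std::size_t lo = 0;
    std::size_t hi = 0;
    for (std::size_t i = 1; i < data.size(); ++i) {
        if (data[i] < data[lo]) lo = i;     // strictly smaller: first minimum
        if (!(data[i] < data[hi])) hi = i;  // greater or equal: last maximum
    }
    return {lo, hi};
}

inline void check() {
    const std::vector<int> v{3, 7, 4, 2, 8, 0, 1};
    const auto idx = minmax_index(v);
    const auto ref = std::minmax_element(v.begin(), v.end());
    assert(idx.first == static_cast<std::size_t>(ref.first - v.begin()));
    assert(idx.second == static_cast<std::size_t>(ref.second - v.begin()));
}

}  // namespace sketch
```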
[Bug libstdc++/89121] New: std::min_element (and max_element) 3.6 times slower than hand written loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89121 Bug ID: 89121 Summary: std::min_element (and max_element) 3.6 times slower than hand written loop Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- std::min_element is slow when there's a lot of data and it does not fit into the CPU cache: http://quick-bench.com/tlgxCx9CUMZgOfYbwhFaEI0WNOg
[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899 --- Comment #8 from Antony Polukhin --- (In reply to Richard Biener from comment #4) > (In reply to Antony Polukhin from comment #2) > > Looks like [class.ctor] paragraph 14 covers this case: > > > > "During the construction of an object, if the value of the object or any of > > its subobjects is accessed through > > a glvalue that is not obtained, directly or indirectly, from the > > constructor’s this pointer, the value of the > > object or subobject thus obtained is unspecified." > > Yeah, sounds like covering this case. Thus we can make 'this' restrict in > constructors (and possibly assignment operators if self-assignment is > forbidden). Self assignment is tricky and is OK to alias in most cases. It could be restricted at some point after the `this != &rhs` check (as proposed in Bug 82918). I'd rather start by "restricting this" for copy and move constructors, leaving assignment as is.
[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899 --- Comment #9 from Antony Polukhin --- There's an identical issue for clang: https://bugs.llvm.org/show_bug.cgi?id=37329 During review of that issue Richard Smith noted that the solution could be made more generic by adding `__restrict` for `this` in any constructor (not just copy and move constructors). Could a violation of noalias in GCC be treated as unspecified behavior, or is it undefined?
[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899 --- Comment #11 from Antony Polukhin --- Seems perfect https://godbolt.org/g/GX3GQd The mov is not generated for any constructor and the following code: extern struct A a; struct A { int m, n; A(const A &v); }; A::A(const A &v) : m(v.m), n((a.m = 1, v.m)) {} Is not optimized to "A::A(int, const A &v) : m(v.m), n(v.m) { a.m = 1; }" (which is a mistake). Are there some tests to make sure that the `mov` won't appear again?
[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899 --- Comment #12 from Antony Polukhin --- (In reply to Marc Glisse from comment #10) > This seems fixed in 8.1 (at least we don't generate the extra mov anymore), > can you check? Actually it still does not work for subobjects. For example https://godbolt.org/g/zPha3U Code struct array { int d[2]; }; struct test { array data1; array data2; test(const array& t); }; test::test(const array& t) : data1{t} , data2{t} {} produces assembly test::test(array const&): mov rax, QWORD PTR [rsi] mov QWORD PTR [rdi], rax mov rax, QWORD PTR [rsi] <== Not required. Could not alias mov QWORD PTR [rdi+8], rax ret [class.ctor] paragraph 14 also covers this case: "During the construction of an object, if the value of the object *or any of its subobjects* is accessed through a glvalue that is not obtained, directly or indirectly, from the constructor’s this pointer, the value of the object or subobject thus obtained is unspecified." Looks like not only `this` should be marked with __restrict, but also all the subobjects of the type.
[Bug c++/85747] New: suboptimal code without constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747 Bug ID: 85747 Summary: suboptimal code without constexpr Product: gcc Version: 8.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the following code snippet: // Bubble-like sort. Anything complex enough will work template constexpr void sort(It first, It last) { for (;first != last; ++first) { auto it = first; ++it; for (; it != last; ++it) { if (*it < *first) { auto tmp = *it; *it = *first; *first = tmp; } } } } static int generate() { int a[7] = {3, 7, 4, 2, 8, 0, 1}; sort(a + 0, a + 7); return a[0] + a[6]; } int no_constexpr() { return generate(); } The above code generates ~30 assembly instructions instead of just generating: no_constexpr(): mov eax, 8 ret But if we change `static` to `constexpr` then the compiler will optimize the code correctly. Could the compiler detect that `a[7]` holds values known at compile time and force the constexpr on `sort(a + 0, a + 7);`? Could the compiler detect that the function `generate()` is an `__attribute__((const))` function without arguments and fully evaluate its body?
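The `constexpr` variant that the report says is optimized correctly can be checked directly with a `static_assert`. In the sketch below the template parameter list `template <class It>` is a reconstruction (the archived text lost it), and the `sketch` namespace is added only to keep the names separate. It assumes C++14 or later.

```cpp
#include <cassert>

namespace sketch {

// Bubble-like sort from the report, usable in constant expressions.
template <class It>
constexpr void sort(It first, It last) {
    for (; first != last; ++first) {
        auto it = first;
        ++it;
        for (; it != last; ++it) {
            if (*it < *first) {
                auto tmp = *it;
                *it = *first;
                *first = tmp;
            }
        }
    }
}

// `static` replaced by `constexpr`, as suggested in the report.
constexpr int generate() {
    int a[7] = {3, 7, 4, 2, 8, 0, 1};
    sort(a + 0, a + 7);
    return a[0] + a[6];  // smallest (0) + largest (8)
}

// Forces compile-time evaluation and documents the expected constant.
static_assert(generate() == 8, "generate() is a constant expression");

inline int no_constexpr() { return generate(); }

inline void check() { assert(no_constexpr() == 8); }

}  // namespace sketch
```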
[Bug c++/85747] suboptimal code without constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747 --- Comment #3 from Antony Polukhin --- (In reply to Richard Biener from comment #1) > What's the reason for writing the code as you pasted it? I've tried to provide a simplified case. In real world `generate()` function will have some arguments and depending on those it could be either constexpr evaluated or not. There's plenty of pre C++14 code that is not well maintained and does not use constexpr a lot, but functions could be treated and evaluated as constexpr in C++14. Main reason for this ticket - is to have some out-of-the-box speedup for such legacy code. Function without arguments seemed to be a good place to start.
[Bug c++/85747] suboptimal code without constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747 --- Comment #4 from Antony Polukhin --- (In reply to Marc Glisse from comment #2) > (In reply to Antony Polukhin from comment #0) > > Could the compiler detect that `a[7]` holds values known at compile time and > > force the constexpr on `sort(a + 0, a + 7);`? > > There has to be a limit. If I write a program that computes the trillion's > decimal of pi, this is a constant, do you expect the compiler to evaluate > the whole program and compile it to just return cst? We are moving into a > realm where we would want to mix compilation and execution, sort of JIT. > For smaller functions, some heuristics could be used to try compile-time > evaluation, but sorting an array of size 7 already seems large to me. Would providing some kind of -Oon-the-fly switch solve the issue with JIT compile times while still allowing more optimizations for the traditional non-JIT -O2 builds?
[Bug c++/85747] suboptimal code without constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747 --- Comment #7 from Antony Polukhin --- (In reply to Jakub Jelinek from comment #6) > IMHO just use constexpr if you care about compile time evaluation > guarantees, that is what it has been added for. Fair point. Overcomplicated logic on the frontend does not seem right. But from my (not experienced) point of view there are some low hanging fruits here. I assume that frontend uses some kind of `__builtin_constant_p` to distinguish between constexpr evaluation or not. Adjusting that function slightly could produce better code out of the box on some optimization levels: static int generate() { int a[7] = {3, 7, 4, 2, 8, 0, 1}; static_assert( __builtin_constant_p(a + 0), "Immediate usage of variable initialized by constant should be a constant expression" ); sort(a + 0, a + 7); // __builtin_constant_p returns `true` => constexpr call static_assert( __builtin_constant_p(a + 0), "Value after constexpr function call should be a constant" ); return a[0] + a[6]; }
[Bug middle-end/91174] New: Suboptimal code for arithmetic with bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91174 Bug ID: 91174 Summary: Suboptimal code for arithmetic with bool Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: int test (bool x) { return '0' + x; } For the above snippet the following suboptimal assembly is generated: test(bool): movzx eax, dil add eax, 48 ret More efficient assembly would be: test(bool): lea eax, [rdi + 48] ret
[Bug target/91174] Suboptimal code for arithmetic with bool and char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91174 --- Comment #2 from Antony Polukhin --- (In reply to Florian Weimer from comment #1) > For which ABI do you propose the change? It's not correct for GNU/Linux: As far as I understand the proposed change does not touch ABI. `lea eax, [rdi + 48]` is equivalent to `movzx+add`
[Bug target/91174] Suboptimal code for arithmetic with bool and char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91174 --- Comment #4 from Antony Polukhin --- Sorry, now I understood that the bug is invalid. Please close.
[Bug c++/91329] New: Unnecessary call to __cxa_throw_bad_array_new_length
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91329 Bug ID: 91329 Summary: Unnecessary call to __cxa_throw_bad_array_new_length Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- For the code int* test(int i) { return new int[i]; } The following assembly is generated: test(int): movsx rdi, edi sub rsp, 8 movabs rax, 2305843009213693950 cmp rdi, rax ja .L2 sal rdi, 2 add rsp, 8 jmp operator new[](unsigned long) test(int) [clone .cold]: .L2: call __cxa_throw_bad_array_new_length However the `i * sizeof(int)` can not be greater than `2305843009213693950`. So the checks should be skipped. Optimal assembly should look close to: test(int): movsx rdi, edi sal rdi, 2 jmp operator new[](unsigned long)
[Bug c++/91329] Unnecessary call to __cxa_throw_bad_array_new_length
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91329 --- Comment #1 from Antony Polukhin --- Oops, sorry. This is invalid. `i` could be negative. Please close as invalid.
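The negative case that makes the check necessary can be observed at run time. A minimal sketch, with hypothetical function names, assuming a C++11 or later implementation that reports an invalid array length via `std::bad_array_new_length`:

```cpp
#include <cassert>
#include <new>

namespace sketch {

// Same shape as the function in the report.
inline int* allocate(int i) { return new int[i]; }

// Returns true if the allocation was rejected because of the array length.
inline bool throws_bad_array_new_length(int i) {
    try {
        int* p = allocate(i);
        delete[] p;
        return false;
    } catch (const std::bad_array_new_length&) {
        return true;
    }
}

inline void check() {
    assert(throws_bad_array_new_length(-1));  // negative length is rejected
    assert(!throws_bad_array_new_length(4));  // small positive length is fine
}

}  // namespace sketch
```

After sign extension a negative `i` becomes a huge unsigned value, which is what the `cmp`/`ja` pair in the generated assembly guards against.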
[Bug middle-end/91358] New: Wrong code with dynamic allocation and optional like class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91358 Bug ID: 91358 Summary: Wrong code with dynamic allocation and optional like class Product: gcc Version: 9.1.1 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The issue is reproduced on GCCs from 5 to 9 with -O2 and -std=c++11. GCC-10 also generates wrong code with -O2 -std=c++11 -fno-allocation-dce. Source code: template struct optional { optional() : m_initialized(false) {} ~optional() { if (m_initialized) reinterpret_cast(m_storage).~T(); } bool m_initialized; alignas(T) unsigned char m_storage[sizeof(T)]; }; struct NoPtr1 { void *ptr = nullptr; ~NoPtr1() { if (ptr) { __builtin_abort(); } } }; static void test(optional ) noexcept { delete new unsigned; } void process(optional state) { return test(state); } int main() { process({}); } The above code generates a conditional jump that depends on uninitialised value. valgrind complains: ==13823==at 0x4007B2: ~NoPtr1 (main.cpp:18) ==13823==by 0x4007B2: ~optional (main.cpp:7) ==13823==by 0x4007B2: process(optional) (main.cpp:29) ==13823==by 0x40067F: main (main.cpp:33) Running the example under GDB confirms that the destructor of NoPtr1 is called: (gdb) break main.cpp:18 Breakpoint 1 at 0x400686: main.cpp:18. (2 locations) (gdb) r Breakpoint 1, NoPtr1::~NoPtr1 (this=, __in_chrg=) at main.cpp:18 18 if (ptr) { (gdb) bt #0 NoPtr1::~NoPtr1 (this=, __in_chrg=) at main.cpp:18 #1 optional::~optional (this=, __in_chrg=) at main.cpp:7 #2 process (state=...) 
at main.cpp:29 #3 0x00400680 in main () at main.cpp:33 (gdb) disassemble Dump of assembler code for function process(optional): 0x00400790 <+0>: push %rbp 0x00400791 <+1>: push %rbx 0x00400792 <+2>: sub$0x8,%rsp 0x00400796 <+6>: mov0x8(%rdi),%rbx 0x0040079a <+10>:movzbl (%rdi),%ebp 0x0040079d <+13>:mov$0x4,%edi 0x004007a2 <+18>:callq 0x400600 <_Znwm@plt> 0x004007a7 <+23>:mov%rax,%rdi 0x004007aa <+26>:callq 0x4005f0 <_ZdlPv@plt> => 0x004007af <+31>:test %rbx,%rbx 0x004007b2 <+34>:je 0x4007b9 )+41> 0x004007b4 <+36>:test %bpl,%bpl 0x004007b7 <+39>:jne0x4007c0 )+48> 0x004007b9 <+41>:add$0x8,%rsp 0x004007bd <+45>:pop%rbx 0x004007be <+46>:pop%rbp 0x004007bf <+47>:retq 0x004007c0 <+48>:callq 0x4005e0
[Bug middle-end/91358] Wrong code with dynamic allocation and optional like class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91358 --- Comment #2 from Antony Polukhin --- (In reply to Michael Matz from comment #1) > So, if you've seen a real problem somewhere (and not just valgrind > complaining about uninitialized registers in comparisons), > then you've reduced the testcase too much. The original test case was not hitting the abort; only valgrind was complaining. The original test case uses boost::variant, boost::optional and std::vector, so it's quite hard to analyze. The part of its assembly with the two checks after the delete looks much the same. The valgrind complaints are distracting, and GDB entering the destructor is misleading. Is there a simple way to change the GCC codegen to avoid the issue without affecting performance? Otherwise, is there some kind of pattern that valgrind/gdb could detect to avoid false positives?
[Bug middle-end/91358] Wrong code with dynamic allocation and optional like class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91358 --- Comment #6 from Antony Polukhin --- (In reply to Michael Matz from comment #3) > I don't really see any, no good idea here :-/ How about moving all the optimizations based on reading uninitialized values under a flag like -funinitialized-logic, so that users could build with -O2 -fno-uninitialized-logic ?
[Bug target/91681] New: Missed optimization for 128 bit arithmetic operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681 Bug ID: 91681 Summary: Missed optimization for 128 bit arithmetic operations Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the function: void multiply128x64x2_3 ( const unsigned long a, const unsigned long b, const unsigned long c, const unsigned long d, __uint128_t o[2] ) noexcept { __uint128_t B0 = __uint128_t{ b } * c; __uint128_t B2 = __uint128_t{ a } * c; __uint128_t B1 = __uint128_t{ b } * d; __uint128_t B3 = __uint128_t{ a } * d; o[0] = B2 + (B0 >> 64); o[1] = B3 + (B1 >> 64); } With compilation flags "-O2 -std=c++17 -mavx" the following assembly is produced: multiply128x64x2_3(unsigned long, unsigned long, unsigned long, unsigned long, unsigned __int128*): mov rax, rdx push rbx mov rbx, rdx mov r9, rdi mul rsi mov rax, rdx xor edx, edx mov r10, rax mov rax, rbx mov r11, rdx pop rbx mul rdi add rax, r10 adc rdx, r11 mov QWORD PTR [r8], rax mov rax, rsi xor edi, edi mov QWORD PTR [r8+8], rdx mul rcx mov rax, rcx mov rsi, rdx mul r9 add rsi, rax adc rdi, rdx mov QWORD PTR [r8+16], rsi mov QWORD PTR [r8+24], rdi ret However, it is sub-optimal. Touching the stack is not necessary and the same result could be achieved with less instructions: multiply128x64x2_3(unsigned long, unsigned long, unsigned long, unsigned long, unsigned __int128*): mov r9, r8 mov r8, rdx mov rax, rsi mul r8 mov rax, r8 mov r10, rdx mul rdi add r10, rax mov rax, rsi mov QWORD PTR [r9], r10 adc rdx, 0 mov QWORD PTR [8+r9], rdx mul rcx mov rax, rdi mov r11, rdx mul rcx add r11, rax mov QWORD PTR [16+r9], r11 adc rdx, 0 mov QWORD PTR [24+r9], rdx ret
[Bug middle-end/91709] New: Missed optimization for multiplication on 1.5 and 1.25
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91709 Bug ID: 91709 Summary: Missed optimization for multiplication on 1.5 and 1.25 Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- On x86_32, for any number X of type unsigned, unsigned short or unsigned char, multiplication by 1.5 followed by a conversion back to unsigned produces, in any rounding mode, exactly the same result as X + (X >> 1). The same holds for 1.25: unsigned(X * 1.25) == unsigned(X + (X >> 2)). This transformation makes it possible to emit short code without floating-point computations: test2(unsigned int): mov eax, edi shr eax add eax, edi ret instead of: test(unsigned int): movl %edi, %edi pxor %xmm0, %xmm0 cvtsi2sdq %rdi, %xmm0 mulsd .LC0(%rip), %xmm0 cvttsd2siq %xmm0, %rax ret .LC0: .long 0 .long 1073217536
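The two identities can be checked by brute force. The sketch below is not from the report; it assumes IEEE double and restricts itself to 16-bit inputs so that every product stays well inside the range of `unsigned`. The function name is made up for illustration.

```cpp
// Exhaustively compares the floating-point form with the shift-and-add form
// for every value an unsigned short can hold (assumed to be 16 bits wide).
bool identities_hold_for_all_u16() {
    for (unsigned x = 0; x <= 0xFFFFu; ++x) {
        if (static_cast<unsigned>(x * 1.5) != x + (x >> 1)) return false;
        if (static_cast<unsigned>(x * 1.25) != x + (x >> 2)) return false;
    }
    return true;
}
```

The check passes because `x * 1.5` and `x * 1.25` are exact in double for such small inputs, so the truncating conversion drops exactly the fraction that the shift drops.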
[Bug middle-end/91709] Missed optimization for multiplication on 1.5 and 1.25
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91709 --- Comment #1 from Antony Polukhin --- Godbolt playground: https://godbolt.org/z/rHQj2w
[Bug middle-end/91709] Missed optimization for multiplication on 1.5 and 1.25
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91709 --- Comment #3 from Antony Polukhin --- (In reply to jos...@codesourcery.com from comment #2) > If the result of multiplying by 1.5 is outside the range of the integer > type, the version with multiplication is required to raise the FE_INVALID > exception for the out-of-range conversion to integer My reading of the C++ standard tells that such conversion is an undefined behavior: http://eel.is/c++draft/conv.fpint#1 Is it really required to raise FE_INVALID ?
[Bug target/91721] New: Missed optimization for checking nan and comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91721 Bug ID: 91721 Summary: Missed optimization for checking nan and comparison Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: int doubleToString_0(double a) { if ( __builtin_isnan( a ) ) return 1; else if ( a == 0. ) return 2; return 3; } A suboptimal assembly with two `ucomisd` is generated for the above sample: doubleToString_0(double): ucomisd xmm0, xmm0 jp .L4 ucomisd xmm0, QWORD PTR .LC0[rip] jnp .L8 .L5: mov eax, 3 ret .L8: jne .L5 mov eax, 2 ret .L4: mov eax, 1 ret .LC0: .long 0 .long 0 More optimal solution would be to do only the second `ucomisd` and check flags for a NaN: doubleToString_0(double): pxor xmm1, xmm1 ucomisd xmm0, xmm1 jp .L4 je .L8 .L5: mov eax, 3 ret .L8: mov eax, 2 ret .L4: mov eax, 1 ret
[Bug middle-end/91739] New: Missed optimization for arithmetic operations of integers and floating point constants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91739 Bug ID: 91739 Summary: Missed optimization for arithmetic operations of integers and floating point constants Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: double foo(unsigned i, unsigned j) { return i * 4.0 + j * 7.0; } Right now GCC emits code that converts the integers to floating point and then does the multiplications: foo(unsigned int, unsigned int): # @foo(unsigned int, unsigned int) mov eax, edi cvtsi2sd xmm1, rax mulsd xmm1, qword ptr [rip + .LCPI0_0] mov eax, esi cvtsi2sd xmm0, rax mulsd xmm0, qword ptr [rip + .LCPI0_1] addsd xmm0, xmm1 ret However, it is possible to do better. If the maximum value of the integer multiplied by the floating-point constant fits into the mantissa, and there is an integral type that can also hold that value, then the multiplication can be done using integers: double foo2(unsigned i, unsigned j) { return i * 4ull + j * 7ull; } This results in much better code: foo2(unsigned int, unsigned int): # @foo2(unsigned int, unsigned int) mov eax, edi mov ecx, esi lea rdx, [8*rcx] sub rdx, rcx lea rax, [rdx + 4*rax] cvtsi2sd xmm0, rax ret
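The equivalence of the two forms can be spot-checked directly. This sketch assumes a 32-bit `unsigned`, a 64-bit `unsigned long long` and an IEEE double with a 53-bit mantissa; under those assumptions the largest possible sum, 11 * (2^32 - 1), is below 2^36, so both computations are exact.

```cpp
// Floating-point form from the report.
double foo(unsigned i, unsigned j) { return i * 4.0 + j * 7.0; }

// Integer form from the report, with the final conversion spelled out.
double foo2(unsigned i, unsigned j) {
    return static_cast<double>(i * 4ull + j * 7ull);
}
```

Even at the extreme inputs `foo(0xFFFFFFFFu, 0xFFFFFFFFu)` and `foo2(0xFFFFFFFFu, 0xFFFFFFFFu)` compare equal, which is the property the proposed transformation relies on.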
[Bug middle-end/91866] New: Sign extend of an int is not recognized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91866 Bug ID: 91866 Summary: Sign extend of an int is not recognized Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: using size_t = unsigned long long; size_t index0(int i) { return size_t(i + 1) - 1; } GCC generates the following assembly: index0(int): lea eax, [rdi+1] cdqe sub rax, 1 ret However a more optimal assembly is possible: index0(int): # @index0(int) movsxd rax, edi ret Godbolt playground: https://godbolt.org/z/3j7_SE
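Why a plain sign extension is enough: `i + 1` overflowing is undefined behaviour, so for every input the compiler has to honour, adding 1 in `int` and subtracting 1 after widening cancel out. A small sketch of that equivalence follows; the alias `u64` stands in for the report's `size_t` alias to avoid clashing with the standard one, and `index0_sign_extend` is a made-up name.

```cpp
#include <climits>

using u64 = unsigned long long;  // stands in for the report's `size_t` alias

// Form from the report.
u64 index0(int i) { return u64(i + 1) - 1; }

// What the shorter assembly computes: widen with sign extension, nothing else.
u64 index0_sign_extend(int i) {
    return static_cast<u64>(static_cast<long long>(i));
}
```

Both functions agree for every `int` except `INT_MAX`, where the first one has no defined result at all.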
[Bug middle-end/91881] New: Value range knowledge of higher bits not used in optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91881 Bug ID: 91881 Summary: Value range knowledge of higher bits not used in optimizations Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: unsigned long long sample2(unsigned long long m) { if (m >= 100) __builtin_unreachable(); m *= 16; return m >> 3; } After the `if` statement we know that the higher bits are 0. So instead of generating the following assembly: sample2(unsigned long long): mov rax, rdi sal rax, 4 shr rax, 3 ret more optimal assembly could be generated: sample2(unsigned long long): lea rax, [rdi + rdi] ret Godbolt playground: https://godbolt.org/z/1iSpTh P.S.: this optimization is important for functions like std::to_chars(..., double), where the significand of a double is extracted into an unsigned long long variable, so its upper bits are always zero.
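The range restriction is what makes the shorter form valid: once the top four bits may be set, `m * 16` wraps and the two expressions diverge. A sketch of both facts, assuming C++14 or later for the `constexpr` loop (the function names are made up for illustration):

```cpp
// Shift-based form from the report, without the range assumption.
constexpr unsigned long long shift_form(unsigned long long m) {
    return (m * 16) >> 3;
}

// What `lea rax, [rdi + rdi]` computes.
constexpr unsigned long long doubled_form(unsigned long long m) {
    return m + m;
}

// True when both forms agree for every m below `limit`.
constexpr bool forms_agree_below(unsigned long long limit) {
    for (unsigned long long m = 0; m < limit; ++m) {
        if (shift_form(m) != doubled_form(m)) return false;
    }
    return true;
}

static_assert(forms_agree_below(100), "the forms agree on the report's range");
static_assert(shift_form(1ull << 60) != doubled_form(1ull << 60),
              "without the range knowledge the forms differ");
```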
[Bug middle-end/91883] New: Division by a constant could be optimized for known variables value range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91883 Bug ID: 91883 Summary: Division by a constant could be optimized for known variables value range Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: unsigned long long kBorder = (1ull<<62); unsigned long long sample(unsigned long long m) { if (m >= kBorder) __builtin_unreachable(); return m / 10; } It produces the following assembly: sample(unsigned long long): movabs rdx, -3689348814741910323 mov rax, rdi mul rdx mov rax, rdx shr rax, 3 ret However, knowing that the higher bits are always 0, the constant could be adjusted to avoid the `shr rax, 3`: sample(unsigned long long): movabs rax, 1844674407370955162 mul rdi mov rax, rdx ret Godbolt playground: https://godbolt.org/z/YU2yAC This issue is probably related to PR 91881. P.S.: this optimization is important for functions like std::to_chars(..., double), where the significand of a double is extracted into an unsigned long long variable, so its upper bits are always zero.
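The adjusted constant is ceil(2^64 / 10), and 10 times that constant equals 2^64 + 4. Taking the high 64 bits of the product therefore overshoots m / 10 by 4m / (10 * 2^64), which stays below 0.1 while m < 2^62, so the floor is unchanged. A sketch of that computation, assuming the `unsigned __int128` extension available in GCC and Clang on 64-bit targets (the names `kMagic` and `div10_mulhi` are made up for illustration):

```cpp
using u64 = unsigned long long;
using u128 = unsigned __int128;  // GCC/Clang extension, assumed available

// ceil(2^64 / 10); the constant from the proposed assembly.
constexpr u64 kMagic = 1844674407370955162ull;

// High 64 bits of m * kMagic, i.e. what `mul rdi; mov rax, rdx` produces.
constexpr u64 div10_mulhi(u64 m) {
    return static_cast<u64>((static_cast<u128>(m) * kMagic) >> 64);
}
```

For inputs below the report's border of 2^62, `div10_mulhi(m)` matches `m / 10` without the trailing shift.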
[Bug middle-end/91899] New: Merge constant literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899 Bug ID: 91899 Summary: Merge constant literals Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: static const char data1[] = {'t','e','s','t'}; static const char data2[] = "test test"; bool index0(const char* cmp) { return cmp == data1 || cmp == data2; } Right now GCC generates suboptimal assembly: index0(char const*): mov eax, offset data1 cmp rdi, rax sete cl mov eax, offset data2 cmp rdi, rax sete al or al, cl ret data1: .ascii "test" data2: .asciz "test test" A more efficient way to generate the code is to merge `data1` and `data2`: index0(char const*): mov eax, offset data cmp rdi, rax sete al ret data: .ascii "test test" Constant literals merging significantly reduces binary size and cache misses.
[Bug middle-end/91899] Merge constant literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899 --- Comment #1 from Antony Polukhin --- Godbolt playground: https://godbolt.org/z/UA_Xsm
[Bug middle-end/91899] Merge constant literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899 --- Comment #4 from Antony Polukhin --- (In reply to Alexander Monakov from comment #3) > unless the compiler somehow proves that overlap is not > observable? Oh, now I see. Here's a valid example: static const char data1[] = "test"; static const char data2[] = "test test"; char lookup1(int i) { return data1[i]; } char lookup2(int i) { return data2[i]; } data1/2 are internal linkage symbols and pointers to them or their content are not returned or passed to any other function. So the overlap is not observable. The above case could be found in std::to_chars, where different internal functions have overlapping `static constexpr char __digits[]` arrays.
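One way such a merge could be laid out, shown only as a sketch: with the terminating NUL included, the five bytes of `"test"` are exactly the last five bytes of `"test test"`, so `data1` could alias the tail of `data2`. The names `kTail` and `lookup1_merged` are made up for illustration and do not describe what GCC does.

```cpp
#include <cstddef>

static const char data1[] = "test";
static const char data2[] = "test test";

// Offset at which the bytes of data1 (terminator included) reappear in data2.
constexpr std::size_t kTail = sizeof(data2) - sizeof(data1);

char lookup1(int i) { return data1[i]; }
char lookup2(int i) { return data2[i]; }

// Same observable result as lookup1, but reading from the shared storage.
char lookup1_merged(int i) { return data2[kTail + i]; }
```

For every valid index of `data1`, `lookup1` and `lookup1_merged` return the same character, which is why the overlap would not be observable here.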
[Bug middle-end/91981] New: Speed degradation because of inlining a register clobbering function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981 Bug ID: 91981 Summary: Speed degradation because of inlining a register clobbering function Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example that is a simplified version of boost::container::small_vector: #define MAKE_INLINING_BAD 1 struct vector { int* data_; int* capacity_; int* size_; void push_back(int v) { if (capacity_ > size_) { *size_ = v; ++size_; } else { reallocate_and_push(v); } } void reallocate_and_push(int v) #if MAKE_INLINING_BAD { // Just some code that clobbers many registers. // You may skip reading it const auto old_cap = capacity_ - data_; const auto old_size = capacity_ - size_; const auto new_cap = old_cap * 2 + 1; auto new_data_1 = new int[new_cap]; auto new_data = new_data_1; for (int* old_data = data_; old_data != size_; ++old_data, ++new_data) { *new_data = *old_data; } delete[] data_; data_ = new_data_1; size_ = new_data_1 + old_size; capacity_ = new_data_1 + new_cap; *size_ = v; ++size_; } #else ; #endif }; void bad_inlining(vector& v) { v.push_back(42); } With `#define MAKE_INLINING_BAD 0` the generated code is quite good: bad_inlining(vector&): mov rax, QWORD PTR [rdi+16] cmp QWORD PTR [rdi+8], rax jbe .L2 mov DWORD PTR [rax], 42 add rax, 4 mov QWORD PTR [rdi+16], rax ret .L2: mov esi, 42 jmp vector::reallocate_and_push(int) However, with `#define MAKE_INLINING_BAD 1` the compiler decides to inline the `reallocate_and_push` function that clobbers many registers. 
So the compiler stores the values of those registers on the stack before doing the cmp+jbe: bad_inlining(vector&): push r13 ; don't need those for the `(capacity_ > size_)` case push r12 ; likewise push rbp ; likewise push rbx ; likewise mov rbx, rdi ; likewise sub rsp, 8 ; likewise mov rdx, QWORD PTR [rdi+8] mov rax, QWORD PTR [rdi+16] cmp rdx, rax jbe .L2 mov DWORD PTR [rax], 42 add rax, 4 mov QWORD PTR [rdi+16], rax add rsp, 8 ; don't need those for the `(capacity_ > size_)` case pop rbx ; likewise pop rbp ; likewise pop r12 ; likewise pop r13 ; likewise ret .L2: ; vector::reallocate_and_push(int) implementation goes here This greatly degrades the performance of the first branch (more than x3 degradation in real code). The possible fix would be to place all the push/pop operations near the inlined `reallocate_and_push`: bad_inlining(vector&): mov rax, QWORD PTR [rdi+16] cmp QWORD PTR [rdi+8], rax jbe .L2 mov DWORD PTR [rax], 42 add rax, 4 mov QWORD PTR [rdi+16], rax ret .L2: push r13 push r12 push rbp push rbx mov rbx, rdi sub rsp, 8 ; vector::reallocate_and_push(int) implementation goes here add rsp, 8 pop rbx pop rbp pop r12 pop r13 ret Godbolt playground: https://godbolt.org/z/oDutOd
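Until the compiler sinks the register saves itself, a source-level workaround is to keep the slow path out of line. The sketch below is not the fix requested in the report; it assumes the `noinline` attribute supported by GCC and Clang, keeps the report's member order, and tidies the reallocation code so that it is self-consistent.

```cpp
#include <cstddef>

struct vector {
    int* data_ = nullptr;
    int* capacity_ = nullptr;  // one past the end of the allocation
    int* size_ = nullptr;      // one past the last stored element

    vector() = default;
    vector(const vector&) = delete;
    vector& operator=(const vector&) = delete;
    ~vector() { delete[] data_; }

    void push_back(int v) {
        if (capacity_ > size_) {  // hot path: no call, no extra registers
            *size_ = v;
            ++size_;
        } else {
            reallocate_and_push(v);
        }
    }

    // Kept out of line so that its register pressure (and the matching
    // push/pop sequence) does not leak into push_back's hot path.
    __attribute__((noinline)) void reallocate_and_push(int v) {
        const std::ptrdiff_t old_size = size_ - data_;
        const std::ptrdiff_t new_cap = (capacity_ - data_) * 2 + 1;
        int* new_data = new int[new_cap];
        for (std::ptrdiff_t k = 0; k < old_size; ++k) {
            new_data[k] = data_[k];
        }
        delete[] data_;
        data_ = new_data;
        size_ = new_data + old_size;
        capacity_ = new_data + new_cap;
        *size_ = v;
        ++size_;
    }
};
```

This trades the benefits of inlining the slow path for a hot path that looks like the `MAKE_INLINING_BAD 0` output.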
[Bug rtl-optimization/91981] Speed degradation because of inlining a register clobbering function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981 --- Comment #4 from Antony Polukhin --- It was broken in GCC-9; GCC-8.3 and below do not have this issue.
[Bug c++/92053] New: Compilation fails or succeeds depending on the optimization flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92053 Bug ID: 92053 Summary: Compilation fails or succeeds depending on the optimization flags Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: accepts-invalid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the following code: #include #include #include struct widget; using variant_t = std::variant; struct my_func { my_func(variant_t&& arg) { std::make_unique(std::move(arg)); } }; struct widget {}; my_func f({}); With `-std=c++2a -O0` it compiles. With `-std=c++2a -O2` it fails on a static assert in instantiation of 'struct std::is_default_constructible'. Godbolt playground: https://godbolt.org/z/-d26aG
[Bug c++/92054] New: `final` does not cause devirtualization of nested calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92054 Bug ID: 92054 Summary: `final` does not cause devirtualization of nested calls Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: struct A { virtual int f() { return 0; } virtual int g() { return f() + 40; } }; struct B2 final : A { int f() override { return 42; } }; int test(B2& b) { return b.g(); } GCC-10 generates the assembly that does a fair vptr call. However, `B2` is final, so any call to the virtual functions of `A` end up with a call to the same function in `B2`. So `B2::g()` should inline the `A::g()` and get optimized to: int test(B2& b) { return B2::f() + 40; } Which is just 82, because `B2::f()` always returns 42. Godbolt playground: https://godbolt.org/z/PJ4nL-
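The expected result can be confirmed with a runnable version of the example. In this sketch `test_devirtualized` is a made-up name; it uses a qualified call, which is never dispatched through the vtable, to spell out by hand what the optimizer could derive from `final`.

```cpp
struct A {
    virtual int f() { return 0; }
    virtual int g() { return f() + 40; }
};

struct B2 final : A {
    int f() override { return 42; }
};

// Form from the report: goes through the vtable unless devirtualized.
int test(B2& b) { return b.g(); }

// Hand-written equivalent: the qualified call binds statically to B2::f.
int test_devirtualized(B2& b) { return b.B2::f() + 40; }
```

Both functions return 82 for any `B2` object, since no class can derive from `B2` and override `f` again.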
[Bug c++/92053] Compilation fails or succeeds depending on the optimization flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92053 Antony Polukhin changed: What|Removed |Added Keywords|needs-reduction,|accepts-invalid |rejects-valid | --- Comment #2 from Antony Polukhin --- Reduced version. Note that Clang refuses to compile it at any -O level, while GCC is fine with it at -O0 https://godbolt.org/z/yTM0a4 : template struct unique_ptr { _Tp* pointer_{}; explicit unique_ptr(_Tp* __p) noexcept : pointer_(__p) {} ~unique_ptr() noexcept { delete pointer_; } unique_ptr(const unique_ptr&) = delete; unique_ptr& operator=(const unique_ptr&) = delete; }; namespace my { template unique_ptr<_Tp> make_unique(_Args&&... __args) { return unique_ptr<_Tp>(new _Tp(static_cast<_Args&&>(__args)...)); } template constexpr Target move(_Tp&& __t) noexcept { return static_cast(__t); } } struct widget; template struct my_variant_impl { T value; my_variant_impl() = default; my_variant_impl(T&& val) : value(val) {}; }; template struct my_variant : my_variant_impl {}; using variant_t = my_variant; struct my_func { my_func(variant_t&& arg) { my::make_unique(my::move(arg)); } }; struct widget {}; my_func f({});
[Bug c++/92067] New: __is_constructible(incomplete_type) should make the program ill-formed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92067 Bug ID: 92067 Summary: __is_constructible(incomplete_type) should make the program ill-formed Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Such change brings consistency with Clang and other built-in traits: struct incomplete; // fails on clang, OK on GCC const bool res = __is_constructible(incomplete); // GCC: invalid use of incomplete type 'struct incomplete' const bool res0 = __is_trivial(incomplete); // GCC: invalid use of incomplete type 'struct incomplete' const bool res1 = __is_final(incomplete); Godbolt playground: https://godbolt.org/z/GVX7mK
[Bug c++/82019] [concepts] ICE if concept is not satisfied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82019 --- Comment #2 from Antony Polukhin --- Updated version of the test (works well on GCC 10): // { dg-options "-std=c++2a" } template concept VectorOperations = requires(T& v, const Data& data) { v += data; requires __is_same_as(T&, decltype(v += data)); }; template requires VectorOperations void compute_vector_optimal(Container& , const Data& ) {} int main() { unsigned v1[] = {1,2,3}; compute_vector_optimal(v1, v1); // { dg-error "cannot call function" } }
[Bug libstdc++/83754] New: Segmentation fault in regex_search
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83754 Bug ID: 83754 Summary: Segmentation fault in regex_search Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code #include #include int main() { std::regex pattern("\\w+\\."); std::string s(100, 'a'); return std::regex_search(s, pattern); } leads to segmentation fault. Backtrace reports the following: #1 0x004174a2 in std::_Function_handler, false, false> >::_M_invoke(std::_Any_data const&, char&&) () #2 0x00415544 in std::function::operator()(char) const () #3 0x00411222 in std::__detail::_State::_M_matches(char) const () #4 0x0040cde3 in std::__detail::_Executor::_M_handle_match #5 0x00409cb0 in std::__detail::_Executor::_M_dfs #6 0x00411656 in std::__detail::_Executor::_M_rep_once_more #7 0x0040ca05 in std::__detail::_Executor::_M_handle_repeat <...> #11350 0x00409cb0 in std::__detail::_Executor::_M_dfs #11351 0x00411656 in std::__detail::_Executor::_M_rep_once_more #11352 0x0040ca05 in std::__detail::_Executor::_M_handle_repeat <...> This issue could be related to the bug 79539
[Bug c++/84099] New: Dynamic initialization is performed in case when constant initialization is permitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84099 Bug ID: 84099 Summary: Dynamic initialization is performed in case when constant initialization is permitted Product: gcc Version: 8.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code struct foo { const char* data_; unsigned size_; foo(const char* data, unsigned size) noexcept : data_(data) , size_(size) {} }; foo test() { static const foo v{"Hello", 5}; return v; } Produces disassembly with dynamic initialization of the `v` variable. However in this case C++ Standard permits constant initialization: "An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that — the dynamic version of the initialization does not change the value of any other object of static or thread storage duration prior to its initialization, and — the static version of the initialization produces the same value in the initialized variable as would be produced by the dynamic initialization if all variables not required to be initialized statically were initialized dynamically. " Optimal assembly would look like .LC0: .string "Hello" test(): mov eax, OFFSET FLAT:.LC0 mov edx, 5 ret
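Independently of what the optimizer is permitted to do, constant initialization can be requested at the source level. The sketch below is a workaround rather than the change the report asks for: it assumes the class may be modified so that its constructor is `constexpr`, after which a `static constexpr` local must be initialized statically.

```cpp
struct foo {
    const char* data_;
    unsigned size_;
    // Assumption: the constructor may be declared constexpr.
    constexpr foo(const char* data, unsigned size) noexcept
        : data_(data), size_(size) {}
};

foo test() {
    // constexpr requires constant initialization, so no guard variable
    // or run-time initializer is needed for `v`.
    static constexpr foo v{"Hello", 5};
    return v;
}
```

The report remains relevant for classes whose constructors cannot be marked `constexpr`, for example when they come from third-party headers.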
[Bug c++/84103] New: Dynamic initialization is performed for non-local variables in case when constant initialization is permitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84103 Bug ID: 84103 Summary: Dynamic initialization is performed for non-local variables in case when constant initialization is permitted Product: gcc Version: 8.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Following code struct foo { const char* data_; unsigned size_; foo(const char* data, unsigned size) noexcept : data_(data) , size_(size) {} }; extern const foo v{"Hello", 5}; Produces assembly with dynamic initialization: .LC0: .string "Hello" _GLOBAL__sub_I_v: mov QWORD PTR v[rip], OFFSET FLAT:.LC0 mov DWORD PTR v[rip+8], 5 ret v: .zero 16 However in this case C++ Standard permits constant initialization: "An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that — the dynamic version of the initialization does not change the value of any other object of static or thread storage duration prior to its initialization, and — the static version of the initialization produces the same value in the initialized variable as would be produced by the dynamic initialization if all variables not required to be initialized statically were initialized dynamically. " Optimal assembly would look like the following v: .quad .L.str .long 5 # 0x5 .zero 4 .L.str: .asciz "Hello" (clang produces the code from above) Bug 84099 may be related to this one. That bug is about local variables initialization, this bug is about non-local variables.
[Bug middle-end/84147] New: RTTI for base class in anonymous namespace could be avoided
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84147 Bug ID: 84147 Summary: RTTI for base class in anonymous namespace could be avoided Product: gcc Version: 8.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example namespace { struct base { virtual int foo() noexcept {return 1;} }; } struct derived1 final : base {}; struct derived2 final : base {}; struct pair { derived1 d1; derived2 d2; }; pair test() { return {}; } `base` is in the anonymous namespace (has internal linkage) and used only for providing some functions to derived classes. There are no complex inheritances, there are no dynamic_casts and typeid(base) calls. RTTI for base class seems useless in that case, but it is still generated in the assembly: .type typeinfo for (anonymous namespace)::base, @object .size typeinfo for (anonymous namespace)::base, 16 typeinfo for (anonymous namespace)::base: .quad vtable for __cxxabiv1::__class_type_info+16 .quad typeinfo name for (anonymous namespace)::base .align 16 .type typeinfo name for (anonymous namespace)::base, @object .size typeinfo name for (anonymous namespace)::base, 23 typeinfo name for (anonymous namespace)::base: .string "*N12_GLOBAL__N_14baseE"
[Bug c++/84306] New: Wrong overload selected with -std=c++17, explicit and {}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84306 Bug ID: 84306 Summary: Wrong overload selected with -std=c++17, explicit and {} Product: gcc Version: 8.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code uses function (1); however, function (2) must be used. struct foo { foo() = default; foo(foo const&); // (1) template <class T> explicit foo(T&&); // (2) }; int main() { foo f1; foo f2{f1}; // (1) - wrong, must be (2) } The compiler chooses the right function if 'explicit' is removed, or if '{f1}' is replaced with '(f1)', or if the -std=c++17 option is changed to -std=c++14.
[Bug c++/89301] New: [concepts] requires clause on a template alias is ignored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89301 Bug ID: 89301 Summary: [concepts] requires clause on a template alias is ignored Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code compiles however it should not: template requires(condition) using enable_if_t = T; template> void foo(); void test() { foo(); } Slightly changed example also compiles on GCC (but fails to compile on Clang): template requires(condition) using enable_if_t = T; template enable_if_t foo(); void test() { foo(); }
[Bug libgcc/89625] New: Freeing memory under the lock in __deregister_frame_info_bases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89625 Bug ID: 89625 Summary: Freeing memory under the lock in __deregister_frame_info_bases Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- `__deregister_frame_info_bases` in file unwind-dw2-fde.c calls `free (ob->u.sort);` under the locked `object_mutex`. This can be avoided by remembering the pointer to free and freeing it outside the critical section. This has been fixed in upstream glibc https://github.com/bminor/glibc/commit/2604882cefd3281679b8177245fdebc7061b8695#diff-17235859a5d2697ce97070a69ab9a602
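The suggested pattern, shown as a generic sketch rather than as a patch to unwind-dw2-fde.c: remember the pointer while holding the mutex and call `free` only after the lock is released. Everything in namespace `sketch` is hypothetical; `sorted_table` merely stands in for `ob->u.sort`, and `std::mutex` stands in for the libgcc locking primitives.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

namespace sketch {

std::mutex object_mutex;       // stands in for libgcc's object_mutex
void* sorted_table = nullptr;  // stands in for ob->u.sort

// Installs a freshly allocated table, releasing any previous one
// outside the critical section.
void register_table(std::size_t bytes) {
    void* fresh = std::malloc(bytes);
    void* stale = nullptr;
    {
        std::lock_guard<std::mutex> lock(object_mutex);
        stale = sorted_table;
        sorted_table = fresh;
    }
    std::free(stale);  // free(nullptr) is a no-op
}

// Returns true if a table was registered and has now been released.
bool deregister_table() {
    void* to_free = nullptr;
    {
        std::lock_guard<std::mutex> lock(object_mutex);
        to_free = sorted_table;  // only remember the pointer under the lock
        sorted_table = nullptr;
    }
    const bool had_table = (to_free != nullptr);
    std::free(to_free);  // the allocator is called with the mutex released
    return had_table;
}

}  // namespace sketch
```

The critical section shrinks to two pointer assignments, so other threads are never blocked behind the allocator.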
[Bug c++/89700] New: Warn if move constructor is not generated and not deleted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89700 Bug ID: 89700 Summary: Warn if move constructor is not generated and not deleted Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- It would be great to have a warning that helps to identify classes with sub-optimal move semantics. For example, it would be nice to have such a warning for cases like the following: struct member { member(); member(const member&); member(member&&); private: int* data_; }; // warning: `my_class(const my_class&)` disables the // implicit move constructor generation. Use // `my_class(my_class&&) = default;` to generate it or // `my_class(my_class&&) = delete;` to disable this warning. struct my_class { my_class() = default; my_class(const my_class&); private: member member1; member member2; }; void foo(my_class c); void test() { my_class c; foo(static_cast<my_class&&>(c)); // copies } The rules for the warning could be the following: issue a warning if at least one of the class members has a move constructor, the class has a copy constructor, and the move constructor is not implicitly deleted.
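The silent fallback to copying can be made visible with a counter. This is a reduced sketch of the situation described above, with a static `copies` member added purely for illustration; note that `std::is_move_constructible` still reports true, because the copy constructor accepts rvalues, which is why the problem is easy to miss.

```cpp
#include <type_traits>

struct my_class {
    static int copies;  // illustration only: counts copy-constructor calls

    my_class() = default;
    // A user-declared copy constructor suppresses the implicit move constructor.
    my_class(const my_class&) { ++copies; }
};

int my_class::copies = 0;

void foo(my_class) {}

void test() {
    my_class c;
    foo(static_cast<my_class&&>(c));  // binds to the copy constructor
}

// The trait cannot detect the problem: rvalues bind to `const my_class&`.
static_assert(std::is_move_constructible<my_class>::value,
              "copy constructor is used for rvalues");
```

After one call to `test()` the counter reads 1: the cast to an rvalue reference did not prevent a copy.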
[Bug libstdc++/89728] New: ctype is underconstrained
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89728 Bug ID: 89728 Summary: ctype is underconstrained Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Because of that, the overloads from [locale.convenience] compile fine with bizarre charT template arguments such as std::string: std::tolower(std::string{}, std::locale::classic()); That leads to a runtime exception (bad cast to ctype<std::string>) instead of a compile-time error. Some other standard library implementations are more restrictive and do not allow such odd template parameters for ctype: error: implicit instantiation of undefined template 'std::__1::ctype >'
[Bug c++/89785] New: Incorrect "not a constant expression" error with switch statement that returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89785 Bug ID: 89785 Summary: Incorrect "not a constant expression" error with switch statement that returns Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: rejects-valid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code fails to compile: constexpr int Addrlen(int domain) { switch (domain) { case 0: return 0; case 2: return 42; } throw 42; } Error message is following: : In function 'constexpr int Addrlen(int)': :8:11: error: expression '' is not a constant expression 8 | throw 42; | ^~
[Bug c++/89785] Incorrect "not a constant expression" error with switch statement that returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89785 --- Comment #2 from Antony Polukhin --- > So you say that Addrlen(0) and Addrlen(2) are proper constexprs? Of course Addrlen(1) is not. Yes. But GCC does not even allow the Addrlen function to be defined: https://godbolt.org/z/xqR2Lr
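The expectation stated in this comment can be written down as compile-time checks. This is a sketch that assumes C++14 or later and a compiler that accepts the definition; the `static_assert` lines are additions for illustration.

```cpp
// The function from the report: only some arguments yield a constant expression.
constexpr int Addrlen(int domain) {
    switch (domain) {
        case 0: return 0;
        case 2: return 42;
    }
    throw 42;  // reached only for other arguments, which are then not constant expressions
}

// These calls never reach the throw, so they are usable in constant expressions.
static_assert(Addrlen(0) == 0, "Addrlen(0) is a constant expression");
static_assert(Addrlen(2) == 42, "Addrlen(2) is a constant expression");
```

A call such as `Addrlen(1)` remains valid at run time, where it throws.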
[Bug libstdc++/89816] New: [9 Regression] std::variant move construction regressed since GCC 8.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89816 Bug ID: 89816 Summary: [9 Regression] std::variant move construction regressed since GCC 8.3 Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code: #include <variant> struct my_type{ my_type(my_type&&) noexcept; }; using V1 = std::variant; V1 test1(V1 v ) { return v; } produced a jump table of size 5 on GCC 8.3. GCC 9 produces huge jump tables with over 30 entries. This leads to 3 times bigger binaries with GCC 9. https://godbolt.org/z/SUWL5T
[Bug c++/89700] Warn if move constructor is not generated and not deleted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89700 --- Comment #6 from Antony Polukhin --- Another way to work around the warning is to use something like `my_class(my_class&&) requires false;`. That's too ugly to use. I'd be fine with closing this issue as a 'won't fix'.
[Bug libstdc++/89816] [9 Regression] std::variant move construction regressed since GCC 8.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89816 --- Comment #6 from Antony Polukhin --- The fix seems pretty trivial: in the function `__variant_construct`, get the address of the storage before entering `__do_visit` and make it switch only on `__rhs`. Pseudo-code: template <typename... _Types, typename _Tp, typename _Up> void __variant_construct(_Tp&& __lhs, _Up&& __rhs) { __lhs._M_index = __rhs._M_index; void* storage = std::addressof(__lhs._M_u); __do_visit([storage](auto&& __rhs_mem) -> __detail::__variant::__variant_cookie { using _Type = remove_reference_t<decltype(__rhs_mem)>; ::new (storage) _Type(std::forward<decltype(__rhs_mem)>(__rhs_mem)); return {}; }, __variant_cast<_Types...>(std::forward<_Up>(__rhs))); }
[Bug libstdc++/89819] New: [9 Regression] std::variant operators regressed since GCC 8.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89819 Bug ID: 89819 Summary: [9 Regression] std::variant operators regressed since GCC 8.3 Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following code: #include <variant> struct my_type{}; bool operator==(const my_type&, const my_type&) noexcept; using V1 = std::variant; auto test1(const V1& v) { return v == v; } produced a jump table of size 5 on GCC 8.3. GCC 9 produces huge jump tables with over 30 entries. This leads to ~15 times bigger binaries with GCC 9 and a ~25% compilation slowdown. https://godbolt.org/z/yoAIrP This could be fixed by changing `_VARIANT_RELATION_FUNCTION_TEMPLATE` from binary visitation to unary: first compare the `index()` of `__lhs` and `__rhs`, and do the visitation only if they match (both hold the same alternative). Pseudo-code: #define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP, __NAME) \ template<typename... _Types> \ constexpr bool operator __OP(const variant<_Types...>& __lhs, \ const variant<_Types...>& __rhs) \ { \ bool __ret = true; \ if ((__lhs.index() + 1) != (__rhs.index() + 1)) { \ return (__lhs.index() + 1) __OP (__rhs.index() + 1); \ } \ __do_visit([&__ret, &__lhs] \ (auto&& __rhs_mem) mutable \ -> __detail::__variant::__variant_cookie \ { \ using __Type = remove_reference_t<decltype(__rhs_mem)>; \ if constexpr (!is_same_v< \ __Type, \ __detail::__variant::__variant_cookie>) \ __ret = __detail::__variant::__get<__detail::__variant::__index_of_v<__Type, _Types...>>(__lhs) __OP __rhs_mem; \ return {}; \ }, __rhs); \ return __ret; \ } \ \ constexpr bool operator __OP(monostate, monostate) noexcept \ { return 0 __OP 0; }
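The index-first idea can be illustrated with the public std::variant API only. This is a sketch, not the libstdc++ implementation: `equal_unary` and `equal_same_index` are hypothetical names, and a fold expression over the indices stands in for the internal jump table. It does N checks for N alternatives instead of dispatching on N*N pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>

// Compares the alternatives stored at the same index. Because of the
// short-circuiting &&, get_if<Is> is dereferenced only for the active index.
template <class Variant, std::size_t... Is>
bool equal_same_index(const Variant& lhs, const Variant& rhs,
                      std::index_sequence<Is...>) {
    return ((lhs.index() == Is &&
             *std::get_if<Is>(&lhs) == *std::get_if<Is>(&rhs)) || ...);
}

// Index check first, then a single "unary" dispatch on the shared index.
template <class... Ts>
bool equal_unary(const std::variant<Ts...>& lhs, const std::variant<Ts...>& rhs) {
    if (lhs.index() != rhs.index()) return false;   // different alternatives
    if (lhs.valueless_by_exception()) return true;  // both valueless
    return equal_same_index(lhs, rhs, std::index_sequence_for<Ts...>{});
}

// Usage example with an assumed variant<int, double>.
inline void run_checks() {
    std::variant<int, double> a{1.5}, b{1.5}, c{2};
    assert(equal_unary(a, b));
    assert(!equal_unary(a, c));
}
```

The same shape would apply to the other relational operators, with the index comparison supplying the result whenever the indices differ.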
[Bug libstdc++/89816] [9 Regression] std::variant move construction regressed since GCC 8.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89816 --- Comment #9 from Antony Polukhin --- BTW, I think there are some other cases where binary visitation could be simplified to unary (significantly reducing the code size and improving the compile times). I've filed Bug 89819, but it looks like assignment and swap could also be optimized.
[Bug c++/89820] New: Returning empty type produces unnecessary instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89820 Bug ID: 89820 Summary: Returning empty type produces unnecessary instructions Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the following code: struct my_type_impl {}; my_type_impl foo0() { return {}; } my_type_impl foo1() { my_type_impl tmp; return tmp; } For `foo0` and `foo1` GCC generates the following assembly: xor eax, eax ret However, xoring `eax` seems unnecessary, and some other compilers generate just the `ret` instruction. The additional `xor` instruction could significantly increase the code size for generic C++ programs. For example, in Bug 89819 and Bug 89816 each of the 36 jump table entries has that additional instruction.
[Bug libstdc++/89824] New: Variant jump table reserves space for __variant_cookie twice
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89824 Bug ID: 89824 Summary: Variant jump table reserves space for __variant_cookie twice Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Space for the `__variant_cookie` state is already reserved in _Multi_array: `_Multi_array<_Tp, __rest...> _M_arr[__first + __do_cookie];`. Reserving it again inside `__gen_vtable` produces a jump table with gaps https://godbolt.org/z/Vx_wEU. Fix: remove the `+ (is_same_v<_Result_type, __variant_cookie> ? 1 : 0)` from `__gen_vtable`. This removes the zeros from the jump table and slightly reduces the binary size https://godbolt.org/z/gyo0-j
[Bug libstdc++/89825] New: Jump table for variant visitation could be shortened for never empty variants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89825 Bug ID: 89825 Summary: Jump table for variant visitation could be shortened for never empty variants Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The `__do_cookie` computation in `_Multi_array` seems suboptimal. There are variant types that are never empty, so they never need the cookie value at all. `_Variant_storage::_M_valid()` already uses that knowledge to always return `true`. The same logic could be used for `__do_cookie`. Pseudo-code: + template + struct _Never_empty; + template + struct _Never_empty> + { + static constexpr bool _S_value = (is_trivially_copyable_v<_Types> && ...); + }; template struct _Multi_array<_Ret(*)(_Visitor, _Variants...), __first, __rest...> { + static constexpr size_t __index = sizeof...(_Variants) - sizeof...(__rest) - 1; + using _Variant_current = __remove_cvref_t::type>; static constexpr int __do_cookie = - is_same_v<_Ret, __variant_cookie> ? 1 : 0; + is_same_v<_Ret, __variant_cookie> && _Never_empty<_Variant_current>::_S_value ? 1 : 0; using _Tp = _Ret(*)(_Visitor, _Variants...); template constexpr const _Tp& _M_access(size_t __first_index, _Args... __rest_indices) const { return _M_arr[__first_index + __do_cookie]._M_access(__rest_indices...); } _Multi_array<_Tp, __rest...> _M_arr[__first + __do_cookie]; }; template static constexpr void _S_apply_all_alts(_Array_type& __vtable, std::index_sequence<__var_indices...>) { - if constexpr (is_same_v<_Result_type, __variant_cookie>) + if constexpr (is_same_v<_Result_type, __variant_cookie> + && !_Never_empty>::_S_value) (_S_apply_single_alt( __vtable._M_arr[__var_indices + 1], &(__vtable._M_arr[0])), ...); else (_S_apply_single_alt( __vtable._M_arr[__var_indices]), ...); } The above patch reduces the jump table size by up to 2*sizeof...(_Types) entries for binary visitations.
[Bug libstdc++/89825] Jump table for variant visitation could be shortened for never empty variants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89825 --- Comment #1 from Antony Polukhin --- There's a typo in proposed solution: it should be `&& !_Never_empty` in `_Multi_array`.
[Bug libstdc++/89825] Jump table for variant visitation could be shortened for never empty variants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89825 --- Comment #4 from Antony Polukhin --- > Would you be willing to complete a copyright assignment for contributions to > GCC? Yes, I can do that. Please send the instructions to my email.
[Bug libstdc++/89851] New: [Regression] std::variant comparison operators violate [variant.relops]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89851 Bug ID: 89851 Summary: [Regression] std::variant comparison operators violate [variant.relops] Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- The following function should return `false` according to [variant.relops]: #include <variant> using V = std::variant; bool test1() { V v1{std::in_place_index<0>, 0}; V v2{std::in_place_index<1>, 0}; return v1 == v2; } std::variant in GCC-8 returned `false`; however, the variant from GCC-9 returns `true`. This could be quickly fixed by comparing the indexes at the start of each operator. Another way to fix it is to pass integral_constants instead of types into the __do_visit function.
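The [variant.relops] behavior can be pinned down in a small self-contained check. The template arguments of `V` were lost in this archive, so `std::variant<int, int>` below is an assumption that matches the two `in_place_index` initializations; the function names are hypothetical.

```cpp
#include <cassert>
#include <variant>

// Assumption: two alternatives of the same type, so only the index differs.
using V = std::variant<int, int>;

// Equal values at different indexes must compare unequal.
bool same_value_different_index() {
    V v1{std::in_place_index<0>, 0};
    V v2{std::in_place_index<1>, 0};
    return v1 == v2;
}

// Equal values at the same index compare equal.
bool same_value_same_index() {
    V v1{std::in_place_index<1>, 0};
    V v2{std::in_place_index<1>, 0};
    return v1 == v2;
}

// For operator<, the variant with the lower index orders first.
bool lower_index_orders_first() {
    V v1{std::in_place_index<0>, 0};
    V v2{std::in_place_index<1>, 0};
    return v1 < v2;
}

// Usage example on a conforming standard library.
inline void run_checks() {
    assert(!same_value_different_index());
    assert(same_value_same_index());
    assert(lower_index_orders_first());
}
```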
[Bug middle-end/89922] New: Loop on fixed size array is not unrolled and poorly optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922 Bug ID: 89922 Summary: Loop on fixed size array is not unrolled and poorly optimized Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: struct array { int data[5]; }; array test(int i) { array a = {1, i, 2, 3, 4}; for (int j = 0; j < 5; ++j) { a.data[j] += j; } return a; } GCC-9 generates ~20 instructions with jmps. Rewriting the same function with unrolled loop makes the assembly much better: array test2(int i) { array a = {1, i, 2, 3, 4}; a.data[0] += 0; a.data[1] += 1; a.data[2] += 2; a.data[3] += 3; a.data[4] += 4; return a; } Assembly for `test2` takes only ~8 instructions: test2(int): add esi, 1 mov DWORD PTR [rdi], 1 mov rax, rdi movabs rdx, 25769803780 mov DWORD PTR [rdi+4], esi mov QWORD PTR [rdi+8], rdx mov DWORD PTR [rdi+16], 8 ret
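The two functions from this report compute the same values, which a short check confirms. `test` and `test2` are taken from the report; the comparison helper `same_result` is a hypothetical addition.

```cpp
#include <cassert>

struct array { int data[5]; };

// Loop version from the report.
array test(int i) {
    array a = {1, i, 2, 3, 4};
    for (int j = 0; j < 5; ++j) {
        a.data[j] += j;
    }
    return a;
}

// Manually unrolled version from the report.
array test2(int i) {
    array a = {1, i, 2, 3, 4};
    a.data[0] += 0;
    a.data[1] += 1;
    a.data[2] += 2;
    a.data[3] += 3;
    a.data[4] += 4;
    return a;
}

// Hypothetical helper: compares both results element by element.
bool same_result(int i) {
    const array a = test(i);
    const array b = test2(i);
    for (int j = 0; j < 5; ++j) {
        if (a.data[j] != b.data[j]) return false;
    }
    return true;
}

// Usage example: the result is {1, i + 1, 4, 6, 8}.
inline void run_checks() {
    assert(same_result(7));
    assert(test(7).data[1] == 8);
}
```

The constants 4, 6 and 8 are the values that appear folded into the stores in the `test2` assembly listing.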
[Bug middle-end/89922] Loop on fixed size array is not unrolled and poorly optimized at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922 --- Comment #2 from Antony Polukhin --- The estimation is very close to the actual result for the loop. But it does not take into account the instructions before the loop that are eliminated by unrolling. A heuristic could help here, for example: "the initialization of a local variable goes away for unrolled loops if the variable is rewritten in the loop or if the variable is not used outside the loop".
[Bug middle-end/89922] Loop on fixed size array is not unrolled and poorly optimized at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922 --- Comment #4 from Antony Polukhin --- > Was the testcase just an artificial one or does it appear (in this > isolated form!) in a real application/benchmark? I was not investigating a particular benchmark or real world application at first. My guess is that the heuristic will affect cryptography (initializing big arrays with magic constants) and math (matrix multiplication with an identity matrix, for example). I've tried to check the validity of the guess. The very first attempt succeeded. Hash computation for a constant string is not well optimized: https://godbolt.org/z/iKi0pb The heuristic may notice that the string is a local variable and may force the loop unrolling. Hash computations on a constant variable are a common case in libstdc++ when working with unordered maps and sets. There's definitely some room for improvement for cases when a local variable is used only in the loop.
[Bug libstdc++/90008] New: [9 Regression] variant attempts to copy rhs in comparison operators
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90008 Bug ID: 90008 Summary: [9 Regression] variant attempts to copy rhs in comparison operators Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: rejects-valid Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- _VARIANT_RELATION_FUNCTION_TEMPLATE accidentally accepts the second visitable by copy in `__do_visit<__detail::__variant::__visit_with_index>`. The following test fails right now, but worked in GCC-8: #include <variant> struct user_defined { user_defined(); user_defined(const user_defined&) = delete; user_defined(user_defined&&) = delete; }; bool operator==(const user_defined& x, const user_defined& y) { return true; } using v_t = std::variant; auto test(const v_t& v, const v_t& v2) { return v == v2; }
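A compilable form of this test, for reference. The alternatives of `v_t` were lost in this archive, so `std::variant<user_defined>` is an assumption; the default constructor is defined here (the report only declares it) so that the example links.

```cpp
#include <cassert>
#include <variant>

// Neither copyable nor movable, but equality comparable.
struct user_defined {
    user_defined() = default;
    user_defined(const user_defined&) = delete;
    user_defined(user_defined&&) = delete;
};

bool operator==(const user_defined&, const user_defined&) { return true; }

// Assumption: a single alternative is enough to exercise the comparison.
using v_t = std::variant<user_defined>;

// Comparison takes both operands by reference and must not copy them.
bool test(const v_t& v, const v_t& v2) { return v == v2; }

// Usage example: both variants hold a default-constructed user_defined.
inline void run_checks() {
    v_t a;
    v_t b;
    assert(test(a, b));
}
```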
[Bug target/90202] New: AVX-512 instructions not used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90202 Bug ID: 90202 Summary: AVX-512 instructions not used Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the following test program: struct v { int val[16]; }; v test(v a, v b) { v res; for (int i = 0; i < 16; i++) res.val[i] = a.val[i] + b.val[i]; return res; } When compiled with `g++ -O3 -march=skylake-avx512` the following assembly is produced: test(v, v): push rbp mov rax, rdi mov rbp, rsp vmovdqu32 ymm1, YMMWORD PTR [rbp+16] vmovdqu32 ymm2, YMMWORD PTR [rbp+48] vpaddd ymm0, ymm1, YMMWORD PTR [rbp+80] vmovdqu32 YMMWORD PTR [rdi], ymm0 vpaddd ymm0, ymm2, YMMWORD PTR [rbp+112] vmovdqu32 YMMWORD PTR [rdi+32], ymm0 vzeroupper pop rbp ret This seems suboptimal, as the 512-bit registers are available and better assembly is possible: test(v, v): vmovdqu32 zmm0, zmmword ptr [rsp + 72] vpaddd zmm0, zmm0, zmmword ptr [rsp + 8] vmovdqu32 zmmword ptr [rdi], zmm0 mov rax, rdi vzeroupper ret
[Bug c/90204] New: [8 Regression] C code is optimized worse than C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204 Bug ID: 90204 Summary: [8 Regression] C code is optimized worse than C++ Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: struct v { int val[16]; }; struct v test(struct v a, struct v b) { struct v res; for (int i = 0; i < 16; i++) res.val[i] = a.val[i] + b.val[i]; return res; } Compiling that snippet with `g++ -O3 -march=skylake-avx512` gives short assembly: test(v, v): push rbp mov rax, rdi mov rbp, rsp vmovdqu32 ymm1, YMMWORD PTR [rbp+16] vmovdqu32 ymm2, YMMWORD PTR [rbp+48] vpaddd ymm0, ymm1, YMMWORD PTR [rbp+80] vmovdqu32 YMMWORD PTR [rdi], ymm0 vpaddd ymm0, ymm2, YMMWORD PTR [rbp+112] vmovdqu32 YMMWORD PTR [rdi+32], ymm0 vzeroupper pop rbp ret Compiling the same sample with the C compiler and the same flags produces ~150 lines of assembly with a lot of jumps and comparisons. The regression appeared after GCC-7.3
[Bug target/90202] AVX-512 instructions not used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90202 --- Comment #2 from Antony Polukhin --- Then I'm fine with the current codegen. However with -mavx512f it produces a few additional instructions for rbp register test(v, v): push rbp ; not necessary mov rax, rdi mov rbp, rsp ; not necessary vmovdqu32 zmm1, ZMMWORD PTR [rbp+16]; could use rsp directly vpaddd zmm0, zmm1, ZMMWORD PTR [rbp+80] ; could use rsp directly vmovdqu32 ZMMWORD PTR [rdi], zmm0 vzeroupper pop rbp ; not necessary ret
[Bug c++/90647] New: Warn on returning a lambda with captured local variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90647 Bug ID: 90647 Summary: Warn on returning a lambda with captured local variables Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: auto test(int s) { return [&s] { return s; }; } `s` is a local variable, so we return a lambda that has a dangling reference. It would be nice to have a warning for such cases.
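For comparison, a sketch of the non-dangling variant: capturing by value copies the parameter into the closure object. The function name `make_by_value` is hypothetical.

```cpp
#include <cassert>

// Dangling version from the report, shown only as a comment:
//   auto test(int s) { return [&s] { return s; }; }  // `s` dies when test() returns

// Capturing by value stores a copy of `s` inside the returned closure.
auto make_by_value(int s) {
    return [s] { return s; };
}

// Usage example: the closure stays valid after make_by_value() returns.
inline void run_checks() {
    auto f = make_by_value(5);
    assert(f() == 5);
}
```

The requested warning would fire for the by-reference capture only, since the by-value closure owns its data.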
[Bug c++/90666] New: Warn if an UB was met during constexpr evaluation attempt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90666 Bug ID: 90666 Summary: Warn if an UB was met during constexpr evaluation attempt Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Consider the example: constexpr int test() { const char* from = "wow"; char dest[1] = {*from}; // assignment to dereferenced one-past-the-end pointer dest[1] = 0; return 0; } const auto r = test(); The `test()` function is a constexpr function, yet any attempt to call it causes UB. It would be very helpful to have a warning when a constexpr evaluation attempt encounters UB and falls back to runtime evaluation. Note that such a warning would be extremely helpful for contracts. It would allow detecting contract violations at compile time: constexpr int impl(int num) [[pre: num > 0]] { return num + 42; } auto test() { // Core constant expression: const auto f0 = impl(1); // Runtime call to __on_contract_violation. // Warning would be very helpful. const auto f1 = impl(0); }
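One way to get the diagnostic today is to declare the result `constexpr` instead of `const`: that forces constant evaluation, where an out-of-bounds write like the one above is a hard error. The sketch below shows an in-bounds variant that passes this stricter check; `first_char_code` and the array size of 2 are assumptions made for illustration (C++14 or later).

```cpp
// In-bounds variant of the example: the array has room for both writes.
constexpr int first_char_code() {
    const char* from = "wow";
    char dest[2] = {*from};
    dest[1] = 0;  // valid index, unlike dest[1] on a one-element array
    return dest[0];
}

// `constexpr` on the variable requires a constant expression, so any
// undefined behavior inside the call would be rejected at compile time.
constexpr int r = first_char_code();
static_assert(r == 'w', "evaluated at compile time");
```

With plain `const auto r = test();` the compiler may silently fall back to dynamic initialization, which is the case the requested warning targets.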
[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579 --- Comment #9 from Antony Polukhin --- (In reply to Jonathan Wakely from comment #8) > Is there more work to do to support the whole of https://wg21.link/lwg2797 ? Looks like I've missed is_nothrow_invocable_r, is_convertible, is_nothrow_convertible, is_swappable_with and is_nothrow_swappable_with. I'll add static asserts in a separate patch. is_base_of is a hard one, but doable. Non-first template arguments of some traits could be hardened further. However, there are doubts about hardening those, and especially the `R` parameter of the is_*invocable_r traits: #include <type_traits> struct X; struct foo{ X operator()(X&, X&); }; // OK on GCC and Clang constexpr bool r0 = std::is_invocable_r::value; struct Y { Y& operator=(X ); }; // OK on GCC, ill-formed on clang constexpr bool r1 = std::is_assignable::value; I'm not sure what to do. We may harden those and make the behavior match the Comments/Preconditions columns in the [meta.*], or relax those preconditions in the WD, or do nothing and leave it as is. Right now I'm in favor of the second approach.
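The kind of completeness check that such static asserts rely on can be sketched with a `sizeof`-based detection idiom. The trait `is_complete` below is a hypothetical illustration, not the helper used inside libstdc++, and the type names are made up.

```cpp
#include <type_traits>

// Primary template: assume the type is incomplete.
template <class T, class = void>
struct is_complete : std::false_type {};

// Chosen only when sizeof(T) is well-formed, i.e. T is complete here.
template <class T>
struct is_complete<T, std::void_t<decltype(sizeof(T))>> : std::true_type {};

struct incomplete_type;  // declared, never defined in this sketch
struct complete_type {};

static_assert(!is_complete<incomplete_type>::value, "forward declaration only");
static_assert(is_complete<complete_type>::value, "definition is visible");
```

The answer is fixed at the first instantiation, so a type that is completed later in the translation unit keeps reporting `false`. That caching is one reason hardening the traits is delicate.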
[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579 --- Comment #12 from Antony Polukhin --- (In reply to Jonathan Wakely from comment #11) > This change broke a compiler test: g++.dg/cpp0x/noexcept15.C > > I'll have to figure out how to update that test to keep testing what it was > meant to test, without triggering the library assertion. Something like the following should do the trick - noexcept(std::is_nothrow_move_constructible::value) + noexcept(noexcept(std::declval() = std::declval()))
[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579 --- Comment #13 from Antony Polukhin --- I meant + noexcept(noexcept(Tp(std::declval( but now I'm not sure that it would test exactly the same thing.