https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121679
Bug ID: 121679
Summary: Much better code at -O1 than at O2
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: rearnsha at gcc dot gnu.org
Target Milestone: ---

#include <random>

struct A {
  int a;
  int b;
  const char *s;
  uint8_t array[64];
};

static A f1(int a, int b)
{
  A obj {
    .a = a + b,
    .b = b * a,
    .s = (a < b) ? "hello": "world",
    .array = {0}
  };
  return obj;
}

int main(void)
{
  std::mt19937 gen32;
  A a = f1 (gen32(), gen32());
  return 0;
}

gcc -std=c++23 <optimize> test.c -S

When compiled at -O1 this code is entirely optimized away to "return 0;", but at -O2 a significant chunk remains.

The relevant pass seems to be dse1:

At -O1:

;; Function main (main, funcdef_no=3127, decl_uid=61567, cgraph_uid=803, symbol_order=1346)

  Deleted dead store: a = f1 (_4, _2); [return slot optimization]
  Deleted trivially dead stmt: _4 = (int) _11;
  Deleted dead store: _11 = std::mersenne_twister_engine<long unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::operator() (&gen32);
  Deleted trivially dead stmt: _2 = (int) _9;
  Deleted dead store: _9 = std::mersenne_twister_engine<long unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::operator() (&gen32);
  Deleted dead store: std::mersenne_twister_engine<long unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::seed (&gen32, 5489);
  Deleted dead store: MEM[(struct mersenne_twister_engine *)&gen32] ={v} {CLOBBER};

But at -O2:

  Deleted dead store: a.s = iftmp.3_20;
  Deleted trivially dead PHI: iftmp.3_20 = PHI <"hello"(3), "world"(4)>
  Deleted dead store: a.b = _19;
  Deleted trivially dead stmt: _19 = _2 * _4;
  Deleted dead store: a.a = _18;
  Deleted trivially dead stmt: _18 = _2 + _4;
  Deleted dead store: MEM <char[64]> [(struct A *)&a + 16B] = {};

It seems that the earlier inlining of the call to f1() is preventing later optimization of the dead calls.
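
One way to probe the inlining hypothesis might be to keep f1() out of line and compare the dse1 dumps again. The sketch below is not part of the report: the [[gnu::noinline]] attribute is an assumption added purely for the experiment, and run_once() is a hypothetical stand-in for the body of main() so that the fragment can be dropped into a larger test file. Whether this restores the -O1 behaviour at -O2 has not been checked here.

```cpp
#include <cstdint>
#include <random>

struct A {
  int a;
  int b;
  const char *s;
  std::uint8_t array[64];
};

// Assumption: noinline is added only for this experiment; the reproducer
// in the report has no attribute on f1.
[[gnu::noinline]] static A f1(int a, int b)
{
  A obj {
    .a = a + b,
    .b = b * a,
    .s = (a < b) ? "hello" : "world",
    .array = {0}
  };
  return obj;
}

// Hypothetical helper mirroring main() from the report: the engine is
// default-seeded, two values are drawn, and the result of f1 is unused.
int run_once()
{
  std::mt19937 gen32;
  A a = f1(gen32(), gen32());
  (void)a;  // deliberately unused, as in the original
  return 0;
}
```

Compiling a file that calls run_once() from main() with something like "g++ -std=c++23 -O2 -S -fdump-tree-dse1-details" should produce a dse1 dump that can be compared line by line with the two listings in the report.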