https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110070
Bug ID: 110070 Summary: Code quality regression with for (int i: {1,2,4,6}) Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: roger at nextmovesoftware dot com Target Milestone: --- The fix for PR c++/70167 (in GCC 11.3) inadvertently introduced a code quality regression for simple range-for using initializer lists. The motivating example is an idiom from the stockfish benchmark [update_continuation_histories in src/search.cpp]: #include <initializer_list> extern void ext(int); void foo() { for (int i: {1,2,4,6}) ext(i); } which currently generates inefficient code by copying the array (to the stack) before use: foo(): pushq %rbp pushq %rbx subq $24, %rsp movdqa .LC0(%rip), %xmm0 movq %rsp, %rbx leaq 16(%rsp), %rbp movaps %xmm0, (%rsp) .L2: movl (%rbx), %edi addq $4, %rbx call ext(int) cmpq %rbp, %rbx jne .L2 addq $24, %rsp popq %rbx popq %rbp ret .LC0: .long 1 .long 2 .long 4 .long 6 In GCC 11.2 and earlier, the initializing array is efficiently used without copying: foo(): pushq %rbx movl $C.0.0, %ebx .L2: movl (%rbx), %edi addq $4, %rbx call ext(int) cmpq $C.0.0+16, %rbx jne .L2 popq %rbx ret C.0.0: .long 1 .long 2 .long 4 .long 6 The underlying cause of the code difference stems from whether the initializer is marked "static" in the middle-end, as shown by the differences between: const int init[4] = {1,2,4,6}; for (int i: init) ... // generates a copy and static const int init[4] = {1,2,4,6}; for (int i: init) ... // doesn't generate a copy Fortunately, there's already code in the depth of the C++ front=end for marking such initializer lists/constructors as static, so I initially tried fixing this myself, at first trying: diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 05df628..a91693d 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -3314,7 +3314,6 @@ finish_compound_literal (tree type, tree compound_literal, /* FIXME all C99 compound literals should be variables rather than C++ temporaries, unless they are used as an aggregate initializer. */ if ((!at_function_scope_p () || CP_TYPE_CONST_P (type)) - && fcl_context == fcl_c99 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type) && initializer_constant_valid_p (compound_literal, type)) and then trying: diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index 2736f55..53220da 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -8557,7 +8557,10 @@ convert_like_internal (conversion *convs, tree expr, tree fn, int argnum, elttype = cp_build_qualified_type (elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST); array = build_array_of_n_type (elttype, len); - array = finish_compound_literal (array, new_ctor, complain); + /* Indicate that a non-lvalue static const array is acceptable + by specifying fcl_c99. */ + array = finish_compound_literal (array, new_ctor, complain, + fcl_c99); /* Take the address explicitly rather than via decay_conversion to avoid the error about taking the address of a temporary. */ array = cp_build_addr_expr (array, complain); Both of which fix/improve code generation for this case, but break the initializer tests in the g++.dg testsuite in interesting ways. At this point I thought I'd give up and leave the fix to the experts. The range_expr passed to cp_convert_range_for is: <constructor 0x7ffff6dc9348 type <lang_type 0x7ffff6dadc78 init list VOID align:1 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6dadc78> constant length:4 val <non_lvalue_expr 0x7ffff6dcd540 type <integer_type 0x7ffff6c415e8 int public type_6 SI size <integer_cst 0x7ffff6c43228 constant 32> unit-size <integer_cst 0x7ffff6c43240 constant 4> align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6c415e8 precision:32 min <integer_cst 0x7ffff6c431e0 -2147483648> max <integer_cst 0x7ffff6c431f8 2147483647> pointer_to_this <pointer_type 0x7ffff6c49b28>> constant public arg:0 <integer_cst 0x7ffff6c43390 constant 1> iter.cc:7:16 start: iter.cc:7:16 finish: iter.cc:7:16> val <non_lvalue_expr 0x7ffff6dcd560 type <integer_type 0x7ffff6c415e8 int> constant public arg:0 <integer_cst 0x7ffff6c43768 constant 2> iter.cc:7:18 start: iter.cc:7:18 finish: iter.cc:7:18> val <non_lvalue_expr 0x7ffff6dcd580 type <integer_type 0x7ffff6c415e8 int> constant public arg:0 <integer_cst 0x7ffff6c43780 constant 4> iter.cc:7:20 start: iter.cc:7:20 finish: iter.cc:7:20> val <non_lvalue_expr 0x7ffff6dcd5a0 type <integer_type 0x7ffff6c415e8 int> constant public arg:0 <integer_cst 0x7ffff6c437b0 constant 6> iter.cc:7:22 start: iter.cc:7:22 finish: iter.cc:7:22>> which contains a lot of non_lvalue_expr, so it's surprising (to me) that we try to turn this into an lvalue, when it should/could be read-only. Thanks in advance. My apologies if this is a duplicate/known issue. Ideally, GCC should be able to unroll this loop, but that's a different issue.