https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502
Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to M Welinder from comment #0)
> Created attachment 44776 [details]
> Preprocessed source code
>
> It appears that gcc is creating quite poor code when "c-style strings"
> are used to construct std::string objects.  Ideally, the result ought
> to be just a few move instructions for small strings.
>
>
> Host: Linux x86_64 4.4.140-62-default (OpenSuSE)
>
> Test code:
> ---------------------------------------------------------------
> #include <string>
>
> extern void foo (const std::string &);
>
> void
> bar ()
> {
>   foo ("abc");
>   foo (std::string("abc"));
> }
> ---------------------------------------------------------------
>
>
>
> # /usr/local/products/gcc/8.2.0/bin/g++ -std=gnu++1z -S -m32 -O3 ttt.C
> # grep 'call.*construct' ttt.s
>         call    _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag.constprop.18
>         call    _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag.constprop.18
>
> Here gcc generates complete calls to the generic string construction
> even though the strings are constructed from small, known strings.

With -O2, -fdump-ipa-inline says:
function not declared inline and code size would grow

>
> "-std=gnu++1z" is important; "-m32" and "-O3" (as opposed to "-m64" and
> "-O2") are not.

With -O3 more inlining happens.

>
> # /usr/local/products/gcc/8.2.0/bin/g++ -S -m32 -O3 ttt.C
> # grep 'call.*construct' ttt.s
> # (nada)
>
> No calls -- good.
> In this case gcc generates this fragment:
>
> _Z3barv:
> .LFB1084:
>         .cfi_startproc
>         .cfi_personality 0,__gxx_personality_v0
>         .cfi_lsda 0,.LLSDA1084
>         pushl   %ebp
>         .cfi_def_cfa_offset 8
>         .cfi_offset 5, -8
>         movl    $25185, %edx
>         movl    %esp, %ebp
>         .cfi_def_cfa_register 5
>         pushl   %edi
>         pushl   %esi
>         .cfi_offset 7, -12
>         .cfi_offset 6, -16
>         leal    -48(%ebp), %esi
>         pushl   %ebx
>         .cfi_offset 3, -20
>         leal    -40(%ebp), %ebx
>         subl    $56, %esp
>         movl    %ebx, -48(%ebp)
>         pushl   %esi
>         movw    %dx, -40(%ebp)
>         movb    $99, -38(%ebp)
>         movl    $3, -44(%ebp)
>         movb    $0, -37(%ebp)
> .LEHB6:
>         .cfi_escape 0x2e,0x10
>         call    _Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> [...]
>
> This is better than a call, but not great:
> 1. The string is moved into position in three chunks (25185, 99, 0).
>    This probably comes from inlined memcpy of 3 bytes, but the source
>    is zero-terminated so rounding the memcpy size up to 4 would have
>    been better.

Yes, we end up with:
  __builtin_memcpy (&D.30710.D.23004._M_local_buf, "abc", 3);

> 2. It's unclear why 25185 is passed through a register.

It's somehow connected to the fact that constants are expensive on x86_64.
Jakub can help here.
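
To make point 1 concrete, here is a minimal sketch of the two copy strategies being compared. It is not libstdc++ code: `LocalBuf`, `copy_three_plus_nul`, `copy_four` and `first_two_bytes` are hypothetical names invented for illustration, and `LocalBuf` only stands in for the string's local buffer. The sketch shows that a 3-byte copy followed by a separate terminator store leaves the same bytes as a single 4-byte copy from the zero-terminated literal, and that 25185 is simply 'a' and 'b' packed little-endian (99 is 'c'). It makes no claim about which instructions gcc would actually emit for either form.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the string's local (small-string) buffer.
struct LocalBuf {
    char data[16];
};

// Mirrors the shape of the current code: copy the 3 characters,
// then store the terminating NUL as a separate write.
inline void copy_three_plus_nul(LocalBuf &buf) {
    std::memcpy(buf.data, "abc", 3);
    buf.data[3] = '\0';
}

// The suggested alternative: the literal "abc" occupies 4 bytes
// including its terminator, so one 4-byte copy covers both steps.
inline void copy_four(LocalBuf &buf) {
    std::memcpy(buf.data, "abc", 4);
}

// The 16-bit immediate from the assembly: 'a' in the low byte and
// 'b' in the high byte, i.e. the first two characters on a
// little-endian target such as x86.
constexpr std::uint16_t first_two_bytes() {
    return static_cast<std::uint16_t>('a' | ('b' << 8));
}

static_assert(first_two_bytes() == 25185, "25185 == 'a' | 'b' << 8");
static_assert('c' == 99, "99 is the third character");
```

Calling both helpers on zero-initialized buffers and comparing the first four bytes with `std::memcmp` gives identical contents, which is why rounding the copy up to 4 is safe for a zero-terminated source.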