https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to M Welinder from comment #0)
> Created attachment 44776 [details]
> Preprocessed source code
> 
> It appears that gcc is creating quite poor code when "c-style strings"
> are used to construct std::string objects.  Ideally, the result ought
> to be just a few move instructions for small strings.
> 
> 
> Host: Linux x86_64 4.4.140-62-default (OpenSuSE)
> 
> Test code:
> ---------------------------------------------------------------
> #include <string>
> 
> extern void foo (const std::string &);
> 
> void
> bar ()
> {
>   foo ("abc");
>   foo (std::string("abc"));
> }
> ---------------------------------------------------------------
> 
> 
> 
> # /usr/local/products/gcc/8.2.0/bin/g++ -std=gnu++1z  -S -m32 -O3 ttt.C
> # grep 'call.*construct' ttt.s 
>       call _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag.constprop.18
>       call _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag.constprop.18
> 
> Here gcc generates complete calls to the generic string construction
> even though the strings are constructed from small, known strings.

With -O2, the -fdump-ipa-inline dump says:
function not declared inline and code size would grow

> 
> "-std=gnu++1z" is important; "-m32" and "-O3" (as opposed to "-m64" and
> "-O2") are not.

With -O3, more inlining happens.

> 
> # /usr/local/products/gcc/8.2.0/bin/g++ -S -m32 -O3 ttt.C
> # grep 'call.*construct' ttt.s
> # (nada)
> 
> No calls -- good.  In this case gcc generates this fragment:
> 
> _Z3barv:
> .LFB1084:
>       .cfi_startproc
>       .cfi_personality 0,__gxx_personality_v0
>       .cfi_lsda 0,.LLSDA1084
>       pushl   %ebp
>       .cfi_def_cfa_offset 8
>       .cfi_offset 5, -8
>       movl    $25185, %edx
>       movl    %esp, %ebp
>       .cfi_def_cfa_register 5
>       pushl   %edi
>       pushl   %esi
>       .cfi_offset 7, -12
>       .cfi_offset 6, -16
>       leal    -48(%ebp), %esi
>       pushl   %ebx
>       .cfi_offset 3, -20
>       leal    -40(%ebp), %ebx
>       subl    $56, %esp
>       movl    %ebx, -48(%ebp)
>       pushl   %esi
>       movw    %dx, -40(%ebp)
>       movb    $99, -38(%ebp)
>       movl    $3, -44(%ebp)
>       movb    $0, -37(%ebp)
> .LEHB6:
>       .cfi_escape 0x2e,0x10
>       call    _Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> [...]
> 
> This is better than a call, but not great:
> 1. The string is moved into position in three chunks (25185, 99, 0).
>    This probably comes from inlined memcpy of 3 bytes, but the source
>    is zero-terminated so rounding the memcpy size up to 4 would have
>    been better.

Yes, we end up with:
  __builtin_memcpy (&D.30710.D.23004._M_local_buf, "abc", 3);
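
To make the three chunks concrete, here is a rough sketch (not the actual
libstdc++ code). LocalBuf and the helper names are made up for illustration,
and the 16-byte size is an assumption. It contrasts the 3-byte copy plus
separate terminator store with the 4-byte copy suggested in comment #0, and
shows that 25185 is just 'a' and 'b' packed little-endian into 16 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the small local buffer inside std::string.
// The name and the 16-byte size are assumptions for illustration only.
struct LocalBuf { char data[16]; };

// Roughly what the inlined constructor does now: copy the 3 characters,
// then store the terminating NUL separately.
inline void copy_three_then_terminate(LocalBuf &b) {
    std::memcpy(b.data, "abc", 3);
    b.data[3] = '\0';
}

// The alternative from comment #0: the literal is already NUL-terminated,
// so one 4-byte copy yields the same buffer contents.
inline void copy_four(LocalBuf &b) {
    std::memcpy(b.data, "abc", 4);
}

// The first two bytes of "abc" read as a little-endian 16-bit value.
inline std::uint16_t first_two_bytes_le() {
    return static_cast<std::uint16_t>('a' | ('b' << 8));
}
```

Both helpers leave the same four bytes in the buffer. With ASCII,
first_two_bytes_le() is 0x6261, i.e. the 25185 in the movw above, and the
following movb $99 is 'c'.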


> 2. It's unclear why 25185 is passed through a register.

It's somehow connected to the fact that constants are considered expensive
on x86_64. Jakub can help here.
