Re: optimize basic_string

Jonathan Wakely Tue, 26 Nov 2024 05:04:22 -0800

On Tue, 26 Nov 2024 at 12:22, Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > On 24/11/24 01:42 +0100, Jan Hubicka wrote:
> > > Hi,
> > > another common source of unnecesary throw_bad_alloc calls is 
> > > basic_string::_M_create.
> > >    basic_string<_CharT, _Traits, _Alloc>::
> > >    _M_create(size_type& __capacity, size_type __old_capacity)
> > >    {
> > >      // _GLIBCXX_RESOLVE_LIB_DEFECTS
> > >      // 83.  String::npos vs. string::max_size()
> > >      if (__capacity > max_size())
> > >     std::__throw_length_error(__N("basic_string::_M_create"));
> > >
> > >      // The below implements an exponential growth policy, necessary to
> > >      // meet amortized linear time requirements of the library: see
> > >      // http://gcc.gnu.org/ml/libstdc++/2001-07/msg00085.html.
> > >      if (__capacity > __old_capacity && __capacity < 2 * __old_capacity)
> > >     {
> > >       __capacity = 2 * __old_capacity;
> > >       // Never allocate a string bigger than max_size.
> > >       if (__capacity > max_size())
> > >         __capacity = max_size();
> > >     }
> > >
> > >      // NB: Need an array of char_type[__capacity], plus a terminating
> > >      // null char_type() element.
> > >      return _S_allocate(_M_get_allocator(), __capacity + 1);
> > >    }
> > >
> > > The code makes it visible that string is never bigger then max_size, 
> > > however + 1
> > > makes it one byte bigger than maximal size allowed by allocator.  I 
> > > believe
> > > this is by miscounting the 0 at the end of string in max_size function:
> > >
> > > -      { return (_Alloc_traits::max_size(_M_get_allocator()) - 1) / 2; }
> > > +      { return _Alloc_traits::max_size(_M_get_allocator()) / 2 - 1; }
> >
> > Ah yes, good catch. That code was copied from ext/sso_string_base.h
> > where it has been present since r0-76546-g3ad7074772808f in 2006!
> it was caught by VRP :)
> I was thinking to try to add warning attribute to that throw_bad_allocv
> and see what comes out.
>
> Thinking of that code, I am not sure it is correct, still.  I see that
> allocator's max_size returns full address space and / 2 is needed to get
> to ptr_diff_t range.  However I think allocator is not required to
> return full address space and in case it returns something smaller,
> diding by 2 is not necessary.


That's true. If the allocator doesn't provide max_size() then the
default is required to be size_t(-1) / sizeof(T), which I consider a
defect in the standard:
https://eel.is/c++draft/allocator.traits.members#8
It probably should have been PTRDIFF_MAX / sizeof(T).


>  In fact I think this already happen for
> wstring.
>
> Vector's max size is:
> static _GLIBCXX20_CONSTEXPR size_type
> _S_max_size(const _Tp_alloc_type& __a) _GLIBCXX_NOEXCEPT
> {
>   // std::distance(begin(), end()) cannot be greater than PTRDIFF_MAX,
>   // and realistically we can't store more than PTRDIFF_MAX/sizeof(T)
>   // (even if std::allocator_traits::max_size says we can).
>   const size_t __diffmax
>     = __gnu_cxx::__numeric_traits<ptrdiff_t>::__max / sizeof(_Tp);
>   const size_t __allocmax = _Alloc_traits::max_size(__a);
>   return (std::min)(__diffmax, __allocmax);
> }
>
> Perhaps we want to do the same here just add -1 for the trailing 0?

Sounds good.

> {
>   const size_t __diffmax
>     = __gnu_cxx::__numeric_traits<ptrdiff_t>::__max / sizeof(_charT);
>   const size_t __allocmax = _Alloc_traits::max_size(_M_get_allocator());
>   return (std::min)(__diffmax, __allocmax) - 1;
> }

I have no objection.

It's theoretically an ABI break, because somebody could do:

constexpr max_string_len = std::string().max_size();
char buf[some_function_of(max_string_len)];

But I think in practice, nobody really cares about max_size anyway. On
64-bit targets you will never be able to use anything close to that
size.

Anybody who has baked std::string::max_size() into their program's ABI
is a fool.



> > > I find it funny that the code special cases a.size () == 1 to copy single 
> > > byte
> > > and a.size () == 0 to bypass memcpy call.  I would say it is job of 
> > > middle-end
> > > to optimize memcpy reasonably even for small blocks.  Was this based on 
> > > some data
> > > showing that many strings have size of 0 and 1 which is not visible to
> > > compiler?
> >
> > That was added as _M_copy in 2004 by r0-62838-gec61e852bc917b and when
> > I implemented the new std::__cxx11::basic_string I copied it, only
> > renaming it to _S_copy because that's our convention for static member
> > functions.
> >
> > There's no explanation in the ChangeLog (obviously, because ChangeLog
> > files are useless at telling you *why* something happened), but this
> > comment was added:
> >
> > +      // When __n = 1 way faster than the general multichar
> > +      // traits_type::copy/move/assign.
> >
> > Back in 2004 char_traits<char>::copy called memcpy (not
> > __builtin_memcpy) and maybe it was badly optimized.
> >
> > I think we can probably drop that "optimization" for __n == 1.
>
> If size of string is not compile-time constant, gcc by default will call
> memcpy (on x86 memcpy is faster for very large memory blocks) which does
> have fast path for small blocks.  So the code special casing _n == 1 may
> get faster for variably sized strings that are very often of size 1, but
> that observation applies to many other copy operations we have around.

I think strings of size one are not common :-)
I think most users would prefer if their strings > 1 get faster even
if it means a tiny performance loss for strings == 1.


> >
> > Please use __sz for these variables, as that's consistent with names
> > used elsewhere.
>
> Updated in my tree.
> >
> > > +   if (__siz > max_size ())
> > > +     __builtin_unreachable ();
> > > +   return __siz;
> > > +      }
> > >
> > >       ///  Returns the number of characters in the string, not including 
> > > any
> > >       ///  null-termination.
> > >       _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> > >       size_type
> > >       length() const _GLIBCXX_NOEXCEPT
> > > -      { return _M_string_length; }
> > > +      {
> > > +   size_type __siz = _M_string_length;
> > > +   if (__siz > max_size ())
> > > +     __builtin_unreachable ();
> > > +   return __siz;
> >
> > Should this just call size() instead of repeating the definition?
>
> I also changed this to call.
> >
> > > +      }
> > >
> > >       ///  Returns the size() of the largest possible %string.
> > >       _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> > >       size_type
> > >       max_size() const _GLIBCXX_NOEXCEPT
> > > -      { return (_Alloc_traits::max_size(_M_get_allocator()) - 1) / 2; }
> > > +      { return _Alloc_traits::max_size(_M_get_allocator()) / 2 - 1; }
> > >
> > >       /**
> > >        *  @brief  Resizes the %string to the specified number of 
> > > characters.
> > > @@ -1184,8 +1194,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> > >       size_type
> > >       capacity() const _GLIBCXX_NOEXCEPT
> > >       {
> > > -   return _M_is_local() ? size_type(_S_local_capacity)
> > > -                        : _M_allocated_capacity;
> > > +   size_t __siz = _M_is_local() ? size_type(_S_local_capacity)
> >
> > size_type __sz = ...
> Also fixed
> >
> > > +                                : _M_allocated_capacity;
> > > +   if (__siz < _S_local_capacity || __siz > max_size () + 1)
> >
> > Is this +1 correct?
> >
> > I think it should be simply `__siz > max_size()`
> >
> > The largest allocation allowed is max_size()+1 but the value stored in
> > _M_allocated_capacity will be just max_size() without the +1.
>
> You are right. I misunderstood the code and tought capacity includes
> also the trailing 0.

Re: optimize basic_string

Reply via email to