Re: [PATCH 1/7] libstdc++: Refactor std::uninitialized_{copy, fill, fill_n} algos [PR68350]

Patrick Palka Wed, 16 Oct 2024 18:39:55 -0700

On Tue, 15 Oct 2024, Jonathan Wakely wrote:

> This is v2 of
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665246.html
> fixing some thinkos in uninitialized_{fill,fill_n}. We don't need to
> worry about overwriting tail-padding in those algos, because we only use
> memset for 1-byte integer types. So they have no tail padding that can
> be reused anyway! So this changes __n > 1 to __n > 0 in a few places
> (which fixes the problem that it was not actually filling anything for
> the n==1 cases).
> 
> Also simplify std::__to_address(__result++) to just __result++ because
> we already have a pointer, and use std::to_address(result++) for a C++20
> std::contiguous_iterator case, instead of addressof(*result++).
> 
> Tested x86_64-linux.
> 
> -- >8 --
> 
> This refactors the std::uninitialized_copy, std::uninitialized_fill and
> std::uninitialized_fill_n algorithms to directly perform memcpy/memset
> optimizations instead of dispatching to std::copy/std::fill/std::fill_n.
> 
> The reasons for this are:
> 
> - Use 'if constexpr' to simplify and optimize compilation throughput, so
>   dispatching to specialized class templates is only needed for C++98
>   mode.
> - Relax the conditions for using memcpy/memset, because the C++20 rules
>   on implicit-lifetime types mean that we can rely on memcpy to begin
>   lifetimes of trivially copyable types.  We don't need to require
>   trivially default constructible, so don't need to limit the
>   optimization to trivial types. See PR 68350 for more details.
> - The conditions on non-overlapping ranges are stronger for
>   std::uninitialized_copy than for std::copy so we can use memcpy instead
>   of memmove, which might be a minor optimization.
> - Avoid including <bits/stl_algobase.h> in <bits/stl_uninitialized.h>.
>   It only needs some iterator utilities from that file now, which belong
>   in <bits/stl_iterator.h> anyway, so this moves them there.
> 
> Several tests need changes to the diagnostics matched by dg-error
> because we no longer use the __constructible() function that had a
> static assert in. Now we just get straightforward errors for attempting
> to use a deleted constructor.
> 
> Two tests needed more signficant changes to the actual expected results
> of executing the tests, because they were checking for old behaviour
> which was incorrect according to the standard.
> 20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
> std::copy to be used for a call to std::uninitialized_copy involving two
> trivially copyable types. That was incorrect behaviour, because a
> non-trivial constructor should have been used, but using std::copy used
> trivial default initialization followed by assignment.
> 20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
> the behaviour with a non-integral Size passed to uninitialized_fill_n,
> but I wrote the test looking at the requirements of uninitialized_copy_n
> which are not the same as uninitialized_fill_n. The former uses --n and
> tests n > 0, but the latter just tests n-- (which will never be false
> for a floating-point value with a fractional part).
> 
> libstdc++-v3/ChangeLog:
> 
>       PR libstdc++/68350
>       PR libstdc++/93059
>       * include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
>       to ...
>       * include/bits/stl_iterator.h: ... here.
>       * include/bits/stl_uninitialized.h (__check_constructible)
>       (_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
>       [C++98] (__unwrappable_niter): New trait.
>       (__uninitialized_copy<true>): Replace use of std::copy.
>       (uninitialized_copy): Fix Doxygen comments. Open-code memcpy
>       optimization for C++11 and later.
>       (__uninitialized_fill<true>): Replace use of std::fill.
>       (uninitialized_fill): Fix Doxygen comments. Open-code memset
>       optimization for C++11 and later.
>       (__uninitialized_fill_n<true>): Replace use of std::fill_n.
>       (uninitialized_fill_n): Fix Doxygen comments. Open-code memset
>       optimization for C++11 and later.
>       * testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
>       Adjust expected behaviour to match what the standard specifies.
>       * 
> testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
>       Likewise.
>       * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
>       Adjust dg-error directives.
>       * testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
>       Likewise.
>       * 
> testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc:
>       Likewise.
>       * testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc:
>       Likewise.
>       * 
> testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc:
>       Likewise.
>       * testsuite/23_containers/vector/cons/89164.cc: Likewise.
>       * testsuite/23_containers/vector/cons/89164_c++17.cc: Likewise.
> ---
>  libstdc++-v3/include/bits/stl_algobase.h      |  45 --
>  libstdc++-v3/include/bits/stl_iterator.h      |  54 +++
>  libstdc++-v3/include/bits/stl_uninitialized.h | 385 +++++++++++++-----
>  .../uninitialized_copy/1.cc                   |   3 +-
>  .../uninitialized_copy/64476.cc               |   6 +-
>  .../uninitialized_copy/89164.cc               |   3 +-
>  .../uninitialized_copy_n/89164.cc             |   3 +-
>  .../uninitialized_fill/89164.cc               |   3 +-
>  .../uninitialized_fill_n/89164.cc             |   3 +-
>  .../uninitialized_fill_n/sizes.cc             |  22 +-
>  .../23_containers/vector/cons/89164.cc        |   5 +-
>  .../23_containers/vector/cons/89164_c++17.cc  |   3 +-
>  12 files changed, 383 insertions(+), 152 deletions(-)
> 
> diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
> b/libstdc++-v3/include/bits/stl_algobase.h
> index 384e5fdcdc9..751b7ad119b 100644
> --- a/libstdc++-v3/include/bits/stl_algobase.h
> +++ b/libstdc++-v3/include/bits/stl_algobase.h
> @@ -308,51 +308,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        return __a;
>      }
>  
> -  // Fallback implementation of the function in bits/stl_iterator.h used to
> -  // remove the __normal_iterator wrapper. See copy, fill, ...
> -  template<typename _Iterator>
> -    _GLIBCXX20_CONSTEXPR
> -    inline _Iterator
> -    __niter_base(_Iterator __it)
> -    
> _GLIBCXX_NOEXCEPT_IF(std::is_nothrow_copy_constructible<_Iterator>::value)
> -    { return __it; }
> -
> -#if __cplusplus < 201103L
> -  template<typename _Ite, typename _Seq>
> -    _Ite
> -    __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
> -              std::random_access_iterator_tag>&);
> -
> - template<typename _Ite, typename _Cont, typename _Seq>
> -    _Ite
> -    __niter_base(const ::__gnu_debug::_Safe_iterator<
> -              ::__gnu_cxx::__normal_iterator<_Ite, _Cont>, _Seq,
> -              std::random_access_iterator_tag>&);
> -#else
> -  template<typename _Ite, typename _Seq>
> -    _GLIBCXX20_CONSTEXPR
> -    decltype(std::__niter_base(std::declval<_Ite>()))
> -    __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
> -              std::random_access_iterator_tag>&)
> -    noexcept(std::is_nothrow_copy_constructible<_Ite>::value);
> -#endif
> -
> -  // Reverse the __niter_base transformation to get a
> -  // __normal_iterator back again (this assumes that __normal_iterator
> -  // is only used to wrap random access iterators, like pointers).
> -  template<typename _From, typename _To>
> -    _GLIBCXX20_CONSTEXPR
> -    inline _From
> -    __niter_wrap(_From __from, _To __res)
> -    { return __from + (std::__niter_base(__res) - 
> std::__niter_base(__from)); }
> -
> -  // No need to wrap, iterator already has the right type.
> -  template<typename _Iterator>
> -    _GLIBCXX20_CONSTEXPR
> -    inline _Iterator
> -    __niter_wrap(const _Iterator&, _Iterator __res)
> -    { return __res; }
> -
>    // All of these auxiliary structs serve two purposes.  (1) Replace
>    // calls to copy with memmove whenever possible.  (Memmove, not memcpy,
>    // because the input and output ranges are permitted to overlap.)
> diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
> b/libstdc++-v3/include/bits/stl_iterator.h
> index 28a600c81cb..85b98ffff61 100644
> --- a/libstdc++-v3/include/bits/stl_iterator.h
> +++ b/libstdc++-v3/include/bits/stl_iterator.h
> @@ -1338,10 +1338,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace
>  
> +namespace __gnu_debug
> +{
> +  template<typename _Iterator, typename _Sequence, typename _Category>
> +    class _Safe_iterator;
> +}
> +
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
> +  // Unwrap a __normal_iterator to get the underlying iterator
> +  // (usually a pointer)
>    template<typename _Iterator, typename _Container>
>      _GLIBCXX20_CONSTEXPR
>      _Iterator
> @@ -1349,6 +1357,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      
> _GLIBCXX_NOEXCEPT_IF(std::is_nothrow_copy_constructible<_Iterator>::value)
>      { return __it.base(); }
>  
> +  // Fallback implementation of the function in bits/stl_iterator.h used to
> +  // remove the __normal_iterator wrapper. See std::copy, std::fill, etc.
> +  template<typename _Iterator>
> +    _GLIBCXX20_CONSTEXPR
> +    inline _Iterator
> +    __niter_base(_Iterator __it)
> +    
> _GLIBCXX_NOEXCEPT_IF(std::is_nothrow_copy_constructible<_Iterator>::value)
> +    { return __it; }
> +
> +  // Overload for _Safe_iterator needs to be declared before __niter_base 
> uses.


Do we also need to declare __niter_base(move_iterator) earlier in this
file so that name lookup finds it when calling __niter_base ...

> +#if __cplusplus < 201103L
> +  template<typename _Ite, typename _Seq>
> +    _Ite
> +    __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
> +              std::random_access_iterator_tag>&);
> +
> + template<typename _Ite, typename _Cont, typename _Seq>
> +    _Ite
> +    __niter_base(const ::__gnu_debug::_Safe_iterator<
> +              ::__gnu_cxx::__normal_iterator<_Ite, _Cont>, _Seq,
> +              std::random_access_iterator_tag>&);
> +#else
> +  template<typename _Ite, typename _Seq>
> +    _GLIBCXX20_CONSTEXPR
> +    decltype(std::__niter_base(std::declval<_Ite>()))

... here and ...

> +    __niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
> +              std::random_access_iterator_tag>&)
> +    noexcept(std::is_nothrow_copy_constructible<_Ite>::value);
> +#endif
> +
> +  // Reverse the __niter_base transformation to get a
> +  // __normal_iterator back again (this assumes that __normal_iterator
> +  // is only used to wrap random access iterators, like pointers).
> +  template<typename _From, typename _To>
> +    _GLIBCXX20_CONSTEXPR
> +    inline _From
> +    __niter_wrap(_From __from, _To __res)
> +    { return __from + (std::__niter_base(__res) - 
> std::__niter_base(__from)); }

... here?

> +
> +  // No need to wrap, iterator already has the right type.
> +  template<typename _Iterator>
> +    _GLIBCXX20_CONSTEXPR
> +    inline _Iterator
> +    __niter_wrap(const _Iterator&, _Iterator __res)
> +    { return __res; }
> +
>  #if __cplusplus >= 201103L && __cplusplus <= 201703L
>    // Need to overload __to_address because the pointer_traits primary 
> template
>    // will deduce element_type of __normal_iterator<T*, C> as T* rather than 
> T.
> diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h 
> b/libstdc++-v3/include/bits/stl_uninitialized.h
> index f663057b1a1..ec980d66ccf 100644
> --- a/libstdc++-v3/include/bits/stl_uninitialized.h
> +++ b/libstdc++-v3/include/bits/stl_uninitialized.h
> @@ -57,16 +57,16 @@
>  #define _STL_UNINITIALIZED_H 1
>  
>  #if __cplusplus >= 201103L
> -#include <type_traits>
> +# include <type_traits>
> +# include <bits/ptr_traits.h>      // __to_address
> +# include <bits/stl_pair.h>        // pair
>  #endif
>  
> -#include <bits/stl_algobase.h>    // copy
> +#include <bits/cpp_type_traits.h> // __is_pointer
> +#include <bits/stl_iterator_base_funcs.h> // distance, advance
> +#include <bits/stl_iterator.h>    // __niter_base
>  #include <ext/alloc_traits.h>     // __alloc_traits
>  
> -#if __cplusplus >= 201703L
> -#include <bits/stl_pair.h>
> -#endif
> -
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> @@ -77,36 +77,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
>    /// @cond undocumented
>  
> -#if __cplusplus >= 201103L
> -  template<typename _ValueType, typename _Tp>
> -    constexpr bool
> -    __check_constructible()
> -    {
> -      // Trivial types can have deleted constructors, but std::copy etc.
> -      // only use assignment (or memmove) not construction, so we need an
> -      // explicit check that construction from _Tp is actually valid,
> -      // otherwise some ill-formed uses of std::uninitialized_xxx would
> -      // compile without errors. This gives a nice clear error message.
> -      static_assert(is_constructible<_ValueType, _Tp>::value,
> -       "result type must be constructible from input type");
> -
> -      return true;
> -    }
> -
> -// If the type is trivial we don't need to construct it, just assign to it.
> -// But trivial types can still have deleted or inaccessible assignment,
> -// so don't try to use std::copy or std::fill etc. if we can't assign.
> -# define _GLIBCXX_USE_ASSIGN_FOR_INIT(T, U) \
> -    __is_trivial(T) && __is_assignable(T&, U) \
> -    && std::__check_constructible<T, U>()
> -#else
> -// No need to check if is_constructible<T, U> for C++98. Trivial types have
> -// no user-declared constructors, so if the assignment is valid, construction
> -// should be too.
> -# define _GLIBCXX_USE_ASSIGN_FOR_INIT(T, U) \
> -    __is_trivial(T) && __is_assignable(T&, U)
> -#endif
> -
>    template<typename _ForwardIterator, typename _Alloc = void>
>      struct _UninitDestroyGuard
>      {
> @@ -160,6 +130,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        _UninitDestroyGuard(const _UninitDestroyGuard&);
>      };
>  
> +  // This is the default implementation of std::uninitialized_copy.
>    template<typename _InputIterator, typename _ForwardIterator>
>      _GLIBCXX20_CONSTEXPR
>      _ForwardIterator
> @@ -173,7 +144,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        return __result;
>      }
>  
> -  template<bool _TrivialValueTypes>
> +#if __cplusplus < 201103L
> +
> +  // True if we can unwrap _Iter to get a pointer by using std::__niter_base.
> +  template<typename _Iter>
> +    struct __unwrappable_niter
> +    {
> +      template<typename> struct __is_ptr { enum { __value = 0 }; };
> +      template<typename _Tp> struct __is_ptr<_Tp*> { enum { __value = 1 }; };
> +
> +      typedef __decltype(std::__niter_base(*(_Iter*)0)) _Base;
> +
> +      enum { __value = __is_ptr<_Base>::__value };
> +    };

It might be slightly cheaper to define this without the nested class
template as:

  template<typename _Iter, typename _Base = 
__decltype(std::__niter_base(*(_Iter*)0))>
  struct __unwrappable_niter
  { enum { __value = false }; };

  template<typename _Iter, typename _Tp>
  struct __unwrappable_niter<_Iter, _Tp*>
  { enum { __value = true }; };

> +
> +  // Use template specialization for C++98 when 'if constexpr' can't be used.
> +  template<bool _CanMemcpy>
>      struct __uninitialized_copy
>      {
>        template<typename _InputIterator, typename _ForwardIterator>
> @@ -186,53 +172,150 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>    template<>
>      struct __uninitialized_copy<true>
>      {
> +      // Overload for generic iterators.
>        template<typename _InputIterator, typename _ForwardIterator>
>          static _ForwardIterator
>          __uninit_copy(_InputIterator __first, _InputIterator __last,
>                     _ForwardIterator __result)
> -        { return std::copy(__first, __last, __result); }
> -    };
> +     {
> +       if (__unwrappable_niter<_InputIterator>::__value
> +             && __unwrappable_niter<_ForwardIterator>::__value)
> +         {
> +           __uninit_copy(std::__niter_base(__first),
> +                         std::__niter_base(__last),
> +                         std::__niter_base(__result));
> +           std::advance(__result, std::distance(__first, __last));
> +           return __result;
> +         }
> +       else
> +         return std::__do_uninit_copy(__first, __last, __result);
> +     }
>  
> +      // Overload for pointers.
> +      template<typename _Tp, typename _Up>
> +     static _Up*
> +     __uninit_copy(_Tp* __first, _Tp* __last, _Up* __result)
> +     {
> +       // Ensure that we don't successfully memcpy in cases that should be
> +       // ill-formed because is_constructible<_Up, _Tp&> is false.
> +       typedef __typeof__(static_cast<_Up>(*__first)) __check
> +         __attribute__((__unused__));
> +
> +       if (const ptrdiff_t __n = __last - __first)

Do we have to worry about the __n == 1 case here like in the C++11 code path?

> +         {
> +           __builtin_memcpy(__result, __first, __n * sizeof(_Tp));
> +           __result += __n;
> +         }
> +       return __result;
> +     }
> +    };
> +#endif
>    /// @endcond
>  
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions"
>    /**
>     *  @brief Copies the range [first,last) into result.
>     *  @param  __first  An input iterator.
>     *  @param  __last   An input iterator.
> -   *  @param  __result An output iterator.
> -   *  @return   __result + (__first - __last)
> +   *  @param  __result A forward iterator.
> +   *  @return   __result + (__last - __first)
>     *
> -   *  Like copy(), but does not require an initialized output range.
> +   *  Like std::copy, but does not require an initialized output range.
>    */
>    template<typename _InputIterator, typename _ForwardIterator>
>      inline _ForwardIterator
>      uninitialized_copy(_InputIterator __first, _InputIterator __last,
>                      _ForwardIterator __result)
>      {
> +      // We can use memcpy to copy the ranges under these conditions:
> +      //
> +      // _ForwardIterator and _InputIterator are both contiguous iterators,
> +      // so that we can turn them into pointers to pass to memcpy.
> +      // Before C++20 we can't detect all contiguous iterators, so we only
> +      // handle built-in pointers and __normal_iterator<T*, C> types.
> +      //
> +      // The value types of both iterators are trivially-copyable types,
> +      // so that memcpy is not undefined and can begin the lifetime of
> +      // new objects in the output range.
> +      //
> +      // Finally, memcpy from the source type, S, to the destination type, D,
> +      // must give the same value as initialization of D from S would give.
> +      // We require is_trivially_constructible<D, S> to be true, but that is
> +      // not sufficient. Some cases of trivial initialization are not just a
> +      // bitwise copy, even when sizeof(D) == sizeof(S),
> +      // e.g. bit_cast<unsigned>(1.0f) != 1u because the corresponding bits
> +      // of the value representations do not have the same meaning.
> +      // We cannot tell when this condition is true in general,
> +      // so we rely on the __memcpyable trait.
> +
> +#if __cplusplus >= 201103L
> +      using _Dest = decltype(std::__niter_base(__result));
> +      using _Src = decltype(std::__niter_base(__first));
> +      using _ValT = typename iterator_traits<_ForwardIterator>::value_type;
> +
> +      if constexpr (!__is_trivially_constructible(_ValT, decltype(*__first)))
> +     return std::__do_uninit_copy(__first, __last, __result);
> +      else if constexpr (__memcpyable<_Dest, _Src>::__value)
> +     {
> +       ptrdiff_t __n = __last - __first;
> +       if (__builtin_expect(__n > 1, true))

Could we use [[__likely__]] instead?

> +         {
> +           using _ValT = typename remove_pointer<_Src>::type;
> +           __builtin_memcpy(std::__niter_base(__result),
> +                            std::__niter_base(__first),
> +                            __n * sizeof(_ValT));
> +           __result += __n;
> +         }
> +       else if (__n == 1) // memcpy could overwrite tail padding
> +         std::_Construct(__result++, *__first);
> +       return __result;
> +     }
> +#if __cpp_lib_concepts
> +      else if constexpr (contiguous_iterator<_ForwardIterator>
> +                        && contiguous_iterator<_InputIterator>)
> +     {
> +       using _DestPtr = decltype(std::to_address(__result));
> +       using _SrcPtr = decltype(std::to_address(__first));
> +       if constexpr (__memcpyable<_DestPtr, _SrcPtr>::__value)
> +         {
> +           if (auto __n = __last - __first; __n > 1) [[likely]]
> +             {
> +               void* __dest = std::to_address(__result);
> +               const void* __src = std::to_address(__first);
> +               size_t __nbytes = __n * sizeof(remove_pointer_t<_DestPtr>);
> +               __builtin_memmove(__dest, __src, __nbytes);

Why do we need to use memmove instead of memcpy here?

> +               __result += __n;
> +             }
> +           else if (__n == 1) // memcpy could overwrite tail padding
> +             std::construct_at(std::to_address(__result++), *__first);
> +           return __result;
> +         }
> +       else
> +         return std::__do_uninit_copy(__first, __last, __result);
> +     }
> +#endif
> +      else
> +     return std::__do_uninit_copy(__first, __last, __result);
> +#else // C++98
>        typedef typename iterator_traits<_InputIterator>::value_type
>       _ValueType1;
>        typedef typename iterator_traits<_ForwardIterator>::value_type
>       _ValueType2;
>  
> -      // _ValueType1 must be trivially-copyable to use memmove, so don't
> -      // bother optimizing to std::copy if it isn't.
> -      // XXX Unnecessary because std::copy would check it anyway?
> -      const bool __can_memmove = __is_trivial(_ValueType1);
> +      const bool __can_memcpy
> +     = __memcpyable<_ValueType1*, _ValueType2*>::__value
> +         && __is_trivially_constructible(_ValueType2, __decltype(*__first));
>  
> -#if __cplusplus < 201103L
> -      typedef typename iterator_traits<_InputIterator>::reference _From;
> -#else
> -      using _From = decltype(*__first);
> +      return __uninitialized_copy<__can_memcpy>::
> +            __uninit_copy(__first, __last, __result);
>  #endif
> -      const bool __assignable
> -     = _GLIBCXX_USE_ASSIGN_FOR_INIT(_ValueType2, _From);
> -
> -      return std::__uninitialized_copy<__can_memmove && __assignable>::
> -     __uninit_copy(__first, __last, __result);
>      }
> +#pragma GCC diagnostic pop
>  
>    /// @cond undocumented
>  
> +  // This is the default implementation of std::uninitialized_fill.
>    template<typename _ForwardIterator, typename _Tp>
>      _GLIBCXX20_CONSTEXPR void
>      __do_uninit_fill(_ForwardIterator __first, _ForwardIterator __last,
> @@ -244,12 +327,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        __guard.release();
>      }
>  
> -  template<bool _TrivialValueType>
> +#if __cplusplus < 201103L
> +  // Use template specialization for C++98 when 'if constexpr' can't be used.
> +  template<bool _CanMemset>
>      struct __uninitialized_fill
>      {
>        template<typename _ForwardIterator, typename _Tp>
> -        static void
> -        __uninit_fill(_ForwardIterator __first, _ForwardIterator __last,
> +     static void
> +     __uninit_fill(_ForwardIterator __first, _ForwardIterator __last,
>                     const _Tp& __x)
>       { std::__do_uninit_fill(__first, __last, __x); }
>      };
> @@ -257,56 +342,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>    template<>
>      struct __uninitialized_fill<true>
>      {
> +      // Overload for generic iterators.
>        template<typename _ForwardIterator, typename _Tp>
> -        static void
> -        __uninit_fill(_ForwardIterator __first, _ForwardIterator __last,
> +     static void
> +     __uninit_fill(_ForwardIterator __first, _ForwardIterator __last,
>                     const _Tp& __x)
> -        { std::fill(__first, __last, __x); }
> -    };
> +     {
> +       if (__unwrappable_niter<_ForwardIterator>::__value)
> +         __uninit_fill(std::__niter_base(__first),
> +                       std::__niter_base(__last),
> +                       __x);
> +       else
> +         std::__do_uninit_copy(__first, __last, __x);
> +     }
>  
> +      // Overload for pointers.
> +      template<typename _Up, typename _Tp>
> +     static void
> +     __uninit_fill(_Up* __first, _Up* __last, const _Tp& __x)
> +     {
> +       // Ensure that we don't successfully memset in cases that should be
> +       // ill-formed because is_constructible<_Up, const _Tp&> is false.
> +       typedef __typeof__(static_cast<_Up>(__x)) __check
> +         __attribute__((__unused__));
> +
> +       if (__first != __last)
> +         __builtin_memset(__first, (int)__x, __last - __first);

Maybe (unsigned char)__x for consistency with the C++11 code path?

> +     }
> +    };
> +#endif
>    /// @endcond
>  
>    /**
>     *  @brief Copies the value x into the range [first,last).
> -   *  @param  __first  An input iterator.
> -   *  @param  __last   An input iterator.
> +   *  @param  __first  A forward iterator.
> +   *  @param  __last   A forward iterator.
>     *  @param  __x      The source value.
>     *  @return   Nothing.
>     *
> -   *  Like fill(), but does not require an initialized output range.
> +   *  Like std::fill, but does not require an initialized output range.
>    */
>    template<typename _ForwardIterator, typename _Tp>
>      inline void
>      uninitialized_fill(_ForwardIterator __first, _ForwardIterator __last,
>                      const _Tp& __x)
>      {
> +      // We would like to use memset to optimize this loop when possible.
> +      // As for std::uninitialized_copy, the optimization requires
> +      // contiguous iterators and trivially copyable value types,
> +      // with the additional requirement that sizeof(_Tp) == 1 because
> +      // memset only writes single bytes.
> +
> +      // FIXME: We could additionally enable this for 1-byte enums.
> +      // Maybe any 1-byte Val if is_trivially_constructible<Val, const T&>?
> +
>        typedef typename iterator_traits<_ForwardIterator>::value_type
>       _ValueType;
>  
> -      // Trivial types do not need a constructor to begin their lifetime,
> -      // so try to use std::fill to benefit from its memset optimization.
> -      const bool __can_fill
> -     = _GLIBCXX_USE_ASSIGN_FOR_INIT(_ValueType, const _Tp&);
> +#if __cplusplus >= 201103L
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions"
> +      if constexpr (__is_byte<_ValueType>::__value)
> +     if constexpr (is_same<_ValueType, _Tp>::value
> +                     || is_integral<_Tp>::value)
> +       {
> +         using _BasePtr = decltype(std::__niter_base(__first));
> +         if constexpr (is_pointer<_BasePtr>::value)
> +           {
> +             void* __dest = std::__niter_base(__first);
> +             ptrdiff_t __n = __last - __first;
> +             if (__builtin_expect(__n > 0, true))
> +               __builtin_memset(__dest, (unsigned char)__x, __n);
> +             return;
> +           }
> +#if __cpp_lib_concepts
> +         else if constexpr (contiguous_iterator<_ForwardIterator>)
> +           {
> +             auto __dest = std::__to_address(__first);
> +             auto __n = __last - __first;
> +             if (__builtin_expect(__n > 0, true))
> +               __builtin_memset(__dest, (unsigned char)__x, __n);
> +             return;
> +           }
> +#endif
> +       }
> +      std::__do_uninit_fill(__first, __last, __x);
> +#pragma GCC diagnostic pop
> +#else // C++98
> +      const bool __can_memset = __is_byte<_ValueType>::__value
> +                               && __is_integer<_Tp>::__value;
>  
> -      std::__uninitialized_fill<__can_fill>::
> -     __uninit_fill(__first, __last, __x);
> +      __uninitialized_fill<__can_memset>::__uninit_fill(__first, __last, 
> __x);
> +#endif
>      }
>  
>    /// @cond undocumented
>  
> +  // This is the default implementation of std::uninitialized_fill_n.
>    template<typename _ForwardIterator, typename _Size, typename _Tp>
>      _GLIBCXX20_CONSTEXPR
>      _ForwardIterator
>      __do_uninit_fill_n(_ForwardIterator __first, _Size __n, const _Tp& __x)
>      {
>        _UninitDestroyGuard<_ForwardIterator> __guard(__first);
> -      for (; __n > 0; --__n, (void) ++__first)
> +#if __cplusplus >= 201103L
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions"
> +      if constexpr (is_integral<_Size>::value)
> +     // Loop will never terminate if __n is negative.
> +     __glibcxx_assert(__n >= 0);
> +      else if constexpr (is_floating_point<_Size>::value)
> +     // Loop will never terminate if __n is not an integer.
> +     __glibcxx_assert(__n >= 0 && static_cast<size_t>(__n) == __n);
> +#pragma GCC diagnostic pop
> +#endif
> +      for (; __n--; ++__first)
>       std::_Construct(std::__addressof(*__first), __x);
>        __guard.release();
>        return __first;
>      }
>  
> -  template<bool _TrivialValueType>
> +#if __cplusplus < 201103L
> +  // Use template specialization for C++98 when 'if constexpr' can't be used.
> +  template<bool _CanMemset>
>      struct __uninitialized_fill_n
>      {
>        template<typename _ForwardIterator, typename _Size, typename _Tp>
> @@ -319,47 +477,92 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>    template<>
>      struct __uninitialized_fill_n<true>
>      {
> +      // Overload for generic iterators.
>        template<typename _ForwardIterator, typename _Size, typename _Tp>
>       static _ForwardIterator
>          __uninit_fill_n(_ForwardIterator __first, _Size __n,
>                       const _Tp& __x)
> -        { return std::fill_n(__first, __n, __x); }
> +     {
> +       if (__unwrappable_niter<_ForwardIterator>::__value)
> +         {
> +           _ForwardIterator __last = __first;
> +           std::advance(__last, __n);
> +           __uninitialized_fill<true>::__uninit_fill(__first, __last, __x);
> +           return __last;
> +         }
> +       else
> +         return std::__do_uninit_fill_n(__first, __n, __x);
> +     }
>      };
> -
> +#endif
>    /// @endcond
>  
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions"
>     // _GLIBCXX_RESOLVE_LIB_DEFECTS
>     // DR 1339. uninitialized_fill_n should return the end of its range
>    /**
>     *  @brief Copies the value x into the range [first,first+n).
> -   *  @param  __first  An input iterator.
> +   *  @param  __first  A forward iterator.
>     *  @param  __n      The number of copies to make.
>     *  @param  __x      The source value.
> -   *  @return   Nothing.
> +   *  @return   __first + __n.
>     *
> -   *  Like fill_n(), but does not require an initialized output range.
> +   *  Like std::fill_n, but does not require an initialized output range.
>    */
>    template<typename _ForwardIterator, typename _Size, typename _Tp>
>      inline _ForwardIterator
>      uninitialized_fill_n(_ForwardIterator __first, _Size __n, const _Tp& __x)
>      {
> +      // See uninitialized_fill conditions. We also require _Size to be
> +      // an integer. The standard only requires _Size to be decrementable
> +      // and contextually convertible to bool, so don't assume first+n works.
> +
> +      // FIXME: We could additionally enable this for 1-byte enums.
> +
>        typedef typename iterator_traits<_ForwardIterator>::value_type
>       _ValueType;
>  
> -      // Trivial types do not need a constructor to begin their lifetime,
> -      // so try to use std::fill_n to benefit from its optimizations.
> -      const bool __can_fill
> -     = _GLIBCXX_USE_ASSIGN_FOR_INIT(_ValueType, const _Tp&)
> -      // For arbitrary class types and floating point types we can't assume
> -      // that __n > 0 and std::__size_to_integer(__n) > 0 are equivalent,
> -      // so only use std::fill_n when _Size is already an integral type.
> -     && __is_integer<_Size>::__value;
> +#if __cplusplus >= 201103L
> +      if constexpr (__is_byte<_ValueType>::__value)
> +     if constexpr (is_integral<_Tp>::value)
> +       if constexpr (is_integral<_Size>::value)
> +         {
> +           using _BasePtr = decltype(std::__niter_base(__first));
> +           if constexpr (is_pointer<_BasePtr>::value)
> +             {
> +               void* __dest = std::__niter_base(__first);
> +               if (__builtin_expect(__n > 0, true))
> +                 {
> +                   __builtin_memset(__dest, (unsigned char)__x, __n);
> +                   __first += __n;
> +                 }
> +               return __first;
> +             }
> +#if __cpp_lib_concepts
> +           else if constexpr (contiguous_iterator<_ForwardIterator>)
> +             {
> +               auto __dest = std::__to_address(__first);
> +               if (__builtin_expect(__n > 0, true))
> +                 {
> +                   __builtin_memset(__dest, (unsigned char)__x, __n);
> +                   __first += __n;
> +                 }
> +               return __first;
> +             }
> +#endif
> +         }
> +      return std::__do_uninit_fill_n(__first, __n, __x);
> +#else // C++98
> +      const bool __can_memset = __is_byte<_ValueType>::__value
> +                               && __is_integer<_Tp>::__value
> +                               && __is_integer<_Size>::__value;
>  
> -      return __uninitialized_fill_n<__can_fill>::
> +      return __uninitialized_fill_n<__can_memset>::
>       __uninit_fill_n(__first, __n, __x);
> +#endif
>      }
> -
> -#undef _GLIBCXX_USE_ASSIGN_FOR_INIT
> +#pragma GCC diagnostic pop
>  
>    /// @cond undocumented
>  
> @@ -619,7 +822,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>           = std::__addressof(*__first);
>         std::_Construct(__val);
>         if (++__first != __last)
> -         std::fill(__first, __last, *__val);
> +         std::uninitialized_fill(__first, __last, *__val);
>       }
>      };
>  
> @@ -653,7 +856,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>               = std::__addressof(*__first);
>             std::_Construct(__val);
>             ++__first;
> -           __first = std::fill_n(__first, __n - 1, *__val);
> +           __first = std::uninitialized_fill_n(__first, __n - 1, *__val);

These last two changes seem to be missing in the ChangeLog.

>           }
>         return __first;
>       }
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc
> index 398d8690b56..27b3100d362 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc
> @@ -34,4 +34,5 @@ test01(T* result)
>    T t[1];
>    std::uninitialized_copy(t, t+1, result); // { dg-error "here" }
>  }
> -// { dg-error "must be constructible from input type" "" { target *-*-* } 0 }
> +// { dg-error "no matching function" "construct_at" { target c++20 } 0 }
> +// { dg-error "use of deleted function" "T::T(const T&)" { target *-*-* } 0 }
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc
> index 2f7dda3417d..e99338dff39 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc
> @@ -54,8 +54,10 @@ test01()
>  
>    std::uninitialized_copy(a, a+10, b);
>  
> -  VERIFY(constructed == 0);
> -  VERIFY(assigned == 10);
> +  // In GCC 14 and older std::uninitialized_copy was optimized to std::copy
> +  // and so used assignments not construction, but that was non-conforming.
> +  VERIFY(constructed == 10);
> +  VERIFY(assigned == 0);
>  }
>  
>  int
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc
> index 48c16da4d32..6e978a7e36c 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc
> @@ -35,4 +35,5 @@ void test01()
>  
>    std::uninitialized_copy(x, x+1, p); // { dg-error "here" }
>  }
> -// { dg-error "must be constructible" "" { target *-*-* } 0 }
> +// { dg-error "no matching function" "construct_at" { target c++20 } 0 }
> +// { dg-error "use of deleted function" "X(const X&)" { target *-*-* } 0 }
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc
> index 4e8fb0f4af2..96156208372 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc
> @@ -32,4 +32,5 @@ void test01()
>  
>    std::uninitialized_copy_n(x, 1, p); // { dg-error "here" }
>  }
> -// { dg-error "must be constructible" "" { target *-*-* } 0 }
> +// { dg-error "no matching function" "construct_at" { target c++20 } 0 }
> +// { dg-error "use of deleted function" "X(const X&)" { target *-*-* } 0 }
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc
> index 8353b5882f0..0dcaa1aa9c3 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc
> @@ -32,4 +32,5 @@ void f()
>  
>    std::uninitialized_fill(p, p+1, x); // { dg-error "here" }
>  }
> -// { dg-error "must be constructible" "" { target *-*-* } 0 }
> +// { dg-error "no matching function" "construct_at" { target c++20 } 0 }
> +// { dg-error "use of deleted function" "X(const X&)" { target *-*-* } 0 }
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc
> index 4b38c673d32..9b61157b934 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc
> @@ -32,4 +32,5 @@ void test01()
>  
>    std::uninitialized_fill_n(p, 1, x); // { dg-error "here" }
>  }
> -// { dg-error "must be constructible" "" { target *-*-* } 0 }
> +// { dg-error "no matching function" "construct_at" { target c++20 } 0 }
> +// { dg-error "use of deleted function" "X(const X&)" { target *-*-* } 0 }
> diff --git 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc
>  
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc
> index e2ba9355c56..876ec5443fb 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc
> @@ -24,21 +24,35 @@ void
>  test01()
>  {
>    int i[4] = { };
> -  std::uninitialized_fill_n(i, 2.0001, 0xabcd);
> +  // Floating-point n should work, but only if it's an integer value.
> +  std::uninitialized_fill_n(i, 3.0, 0xabcd);
>    VERIFY( i[0] == 0xabcd );
>    VERIFY( i[1] == 0xabcd );
>    VERIFY( i[2] == 0xabcd );
>    VERIFY( i[3] == 0 );
>  }
>  
> -// The standard only requires that n>0 and --n are valid expressions.
> +// The standard only requires that `if (n--)` is a valid expression.
>  struct Size
>  {
>    int value;
>  
> -  void operator--() { --value; }
> +  struct testable
> +  {
> +#if __cplusplus >= 201103L
> +    explicit
> +#endif
> +    operator bool() const { return nonzero; }
>  
> -  int operator>(void*) { return value != 0; }
> +    bool nonzero;
> +  };
> +
> +  testable operator--(int)
> +  {
> +    testable t = { value != 0 };
> +    --value;
> +    return t;
> +  }
>  };
>  
>  void
> diff --git a/libstdc++-v3/testsuite/23_containers/vector/cons/89164.cc 
> b/libstdc++-v3/testsuite/23_containers/vector/cons/89164.cc
> index 106963ecbb9..36907dc508e 100644
> --- a/libstdc++-v3/testsuite/23_containers/vector/cons/89164.cc
> +++ b/libstdc++-v3/testsuite/23_containers/vector/cons/89164.cc
> @@ -32,7 +32,7 @@ void test01()
>    X x[1];
>    // Should not be able to create vector using uninitialized_copy:
>    std::vector<X> v1{x, x+1}; // { dg-error "here" "" { target c++17_down } }
> -  // { dg-error "deleted function 'X::X" "" { target c++20 } 0 }
> +  // { dg-error "deleted function 'X::X" "" { target *-*-* } 0 }
>  }
>  
>  void test02()
> @@ -41,8 +41,7 @@ void test02()
>  
>    // Should not be able to create vector using uninitialized_fill_n:
>    std::vector<Y> v2{2u, Y{}};        // { dg-error "here" "" { target 
> c++17_down } }
> -  // { dg-error "deleted function .*Y::Y" "" { target c++20 } 0 }
> +  // { dg-error "deleted function .*Y::Y" "" { target *-*-* } 0 }
>  }
>  
> -// { dg-error "must be constructible from input type" "" { target *-*-* } 0 }
>  // { dg-prune-output "construct_at" }
> diff --git a/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc 
> b/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc
> index 09d3dc6f93d..07d4bab9117 100644
> --- a/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc
> +++ b/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc
> @@ -32,8 +32,7 @@ void test03()
>    // Can create initializer_list<Y> with C++17 guaranteed copy elision,
>    // but shouldn't be able to copy from it with uninitialized_copy:
>    std::vector<X> v3{X{}, X{}, X{}};   // { dg-error "here" "" { target 
> c++17_only } }
> -  // { dg-error "deleted function .*X::X" "" { target c++20 } 0 }
> +  // { dg-error "deleted function .*X::X" "" { target *-*-* } 0 }
>  }
>  
> -// { dg-error "must be constructible from input type" "" { target *-*-* } 0 }
>  // { dg-prune-output "construct_at" }
> -- 
> 2.46.2
> 
>

Re: [PATCH 1/7] libstdc++: Refactor std::uninitialized_{copy, fill, fill_n} algos [PR68350]

Reply via email to