Re: [PATCH] libstdc++: Add support for C++20 barriers

Thomas Rodgers Wed, 04 Nov 2020 11:44:19 -0800

> On Nov 4, 2020, at 10:50 AM, Jonathan Wakely <jwak...@redhat.com> wrote:
> 
> On 04/11/20 09:29 -0800, Thomas Rodgers wrote:
>> From: Thomas Rodgers <trodg...@redhat.com>
>> 
>> Adds <barrier>
>> 
>> libstdc++/ChangeLog:
>> 
>>      * include/Makefile.am (std_headers): Add new header.
>>      * include/Makefile.in: Regenerate.
>>      * include/std/barrier: New file.
>>      * testsuite/30_thread/barrier/1.cc: New test.
>>      * testsuite/30_thread/barrier/2.cc: Likewise.
>>      * testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>>      * testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>>      * testsuite/30_thread/barrier/arrive.cc: Likewise.
>>      * testsuite/30_thread/barrier/completion.cc: Likewise.
>>      * testsuite/30_thread/barrier/max.cc: Likewise.
>> ---
>> libstdc++-v3/include/std/barrier              | 248 ++++++++++++++++++
>> .../testsuite/30_threads/barrier/1.cc         |  27 ++
>> .../testsuite/30_threads/barrier/2.cc         |  27 ++
>> .../testsuite/30_threads/barrier/arrive.cc    |  51 ++++
>> .../30_threads/barrier/arrive_and_drop.cc     |  49 ++++
>> .../30_threads/barrier/arrive_and_wait.cc     |  51 ++++
>> .../30_threads/barrier/completion.cc          |  54 ++++
>> .../testsuite/30_threads/barrier/max.cc       |  44 ++++
>> 8 files changed, 551 insertions(+)
>> create mode 100644 libstdc++-v3/include/std/barrier
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>> 
>> diff --git a/libstdc++-v3/include/std/barrier 
>> b/libstdc++-v3/include/std/barrier
>> new file mode 100644
>> index 00000000000..80e6d668cf5
>> --- /dev/null
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -0,0 +1,248 @@
>> +// <barrier> -*- C++ -*-
>> +
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3.  If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// This implementation is based on libcxx/include/barrier
>> +//===-- barrier.h --------------------------------------------------===//
>> +//
>> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
>> Exceptions.
>> +// See https://llvm.org/LICENSE.txt for license information.
>> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>> +//
>> +//===---------------------------------------------------------------===//
>> +
>> +#ifndef _GLIBCXX_BARRIER
>> +#define _GLIBCXX_BARRIER 1
>> +
>> +#pragma GCC system_header
>> +
>> +#if __cplusplus > 201703L
>> +#define __cpp_lib_barrier 201907L
> 
> This feature test macro will be defined unconditionally, even if
> _GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
> for gthreads.
> 
> You're also missing an edit to <version> (which should depend on the
> same conditions).
> 
> 
>> +#include <bits/c++config.h>
>> +
>> +#if defined(_GLIBCXX_HAS_GTHREADS)
>> +#include <bits/atomic_base.h>
>> +#include <bits/atomic_wait.h>
>> +#include <bits/functional_hash.h>
>> +#include <ext/numeric_traits.h>
>> +
>> +#include <memory>
>> +
>> +namespace std _GLIBCXX_VISIBILITY(default)
>> +{
>> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +
>> +  struct __empty_completion
>> +  {
>> +    _GLIBCXX_ALWAYS_INLINE void
>> +    operator()() noexcept
>> +    { }
>> +  };
>> +
>> +/*
>> +
>> +The default implementation of __barrier_base is a classic tree barrier.
>> +
>> +It looks different from literature pseudocode for two main reasons:
>> + 1. Threads that call into std::barrier functions do not provide indices,
>> +    so a numbering step is added before the actual barrier algorithm,
>> +    appearing as an N+1 round to the N rounds of the tree barrier.
>> + 2. A great deal of attention has been paid to avoid cache line thrashing
>> +    by flattening the tree structure into cache-line sized arrays, that
>> +    are indexed in an efficient way.
>> +
>> +*/
>> +
>> +  using __barrier_phase_t = uint8_t;
> 
> Please add <stdint.h> or <cstdint> since you're using uint8_t
> (it's currently included by <bits/atomic_base.h> but that could
> change).
> 
> Would it work to use a scoped enumeration type here instead? That
> would prevent people accidentally doing arithmetic on it, or passing
> it to functions taking an integer (and prevent it promoting to int in
> arithmetic).
> 
> e.g. define it similar to std::byte:
> 
> enum class __barrier_phase_t : unsigned char { };
> 
> and then cast to an integer on the way in and the way out, so that the
> implementation works with its numeric value, but users have a
> non-arithmetic type that they can't do anything with.
> 
>> +
>> +  template<class _CompletionF>
> 
> s/typename/class/
> 
>> +    class __barrier_base
>> +    {
>> +      struct alignas(64) /* naturally-align the heap state */ __state_t
>> +      {
>> +    struct
>> +    {
>> +      __atomic_base<__barrier_phase_t> __phase = ATOMIC_VAR_INIT(0);
> 
> No need to use ATOMIC_VAR_INIT, we're not trying to be compatible with
> C, it can be just = {0} or similar.
> 
>> +    } __tickets[64];
>> +      };
>> +
>> +      ptrdiff_t _M_expected;
>> +      unique_ptr<char[]> _M_state_allocation;
>> +      __state_t*            _M_state;
>> +      __atomic_base<ptrdiff_t> _M_expected_adjustment;
>> +      _CompletionF _M_completion;
>> +      __atomic_base<__barrier_phase_t> _M_phase;
> 
> Can these just use atomic instead of __atomic_base?
> 
> The order of the members looks suboptimal. If alignof(_CompletionF) is
> less than alignof(atomic<__barrier_phase_t>) then you'll get padding
> bytes between them (or is that intentional, to avoid false sharing?)
> 
> I would put _M_completion last, and add [[no_unique_address]] so it
> doesn't need to waste space if it's an empty type.
> 
>> +
>> +      static __gthread_t
>> +      _S_get_tid() noexcept
>> +      {
>> +#ifdef __GLIBC__
>> +    // For the GNU C library pthread_self() is usable without linking to
>> +    // libpthread.so but returns 0, so we cannot use it in single-threaded
>> +    // programs, because this_thread::get_id() != thread::id{} must be true.
>> +    // We know that pthread_t is an integral type in the GNU C library.
>> +    if (!__gthread_active_p())
>> +      return 1;
>> +#endif
>> +    return __gthread_self();
> 
> This is the third place we're doing this (and all of them need to be
> changed for recent versions of glibc) so we should put it in one
> place.
> 
>> +      }
>> +
>> +      bool
>> +      _M_arrive(__barrier_phase_t __old_phase)
>> +      {
>> +    __barrier_phase_t const __half_step = __old_phase + 1,
>> +                            __full_step = __old_phase + 2;
>> +    size_t __current_expected = _M_expected;
>> +    size_t __current = _Hash_impl::hash(_S_get_tid())
>> +                            % ((_M_expected + 1) >> 1);
>> +    for (int __round = 0;; ++__round)
>> +      {
>> +        if (__current_expected <= 1)
>> +            return true;
>> +        size_t const __end_node = ((__current_expected + 1) >> 1),
>> +                     __last_node = __end_node - 1;
>> +        for (;;++__current)
> 
> A space after the semi-colons please.
> 
>> +          {
>> +            if (__current == __end_node)
>> +              __current = 0;
>> +            __barrier_phase_t __expect = __old_phase;
>> +            if (__current == __last_node && (__current_expected & 1))
>> +              {
>> +                if (_M_state[__current].__tickets[__round].__phase
>> +                    .compare_exchange_strong(__expect, __full_step,
>> +                                             memory_order_acq_rel))
>> +                  break;     // I'm 1 in 1, go to next __round
>> +              }
>> +            else if (_M_state[__current].__tickets[__round].__phase
>> +                     .compare_exchange_strong(__expect, __half_step,
>> +                                              memory_order_acq_rel))
>> +              {
>> +                return false; // I'm 1 in 2, done with arrival
>> +              }
>> +            else if (__expect == __half_step)
>> +              {
>> +                if (_M_state[__current].__tickets[__round].__phase
>> +                    .compare_exchange_strong(__expect, __full_step,
>> +                                             memory_order_acq_rel))
>> +                  break;    // I'm 2 in 2, go to next __round
>> +              }
>> +          }
>> +        __current_expected = __last_node + 1;
>> +        __current >>= 1;
>> +      }
>> +      }
>> +
>> +    public:
>> +      using arrival_token = __barrier_phase_t;
>> +
>> +      static constexpr ptrdiff_t
>> +      max() noexcept
>> +      { return __gnu_cxx::__numeric_traits<ptrdiff_t>::__max; }
> 
> s/__numeric_traits/__int_traits/
> 
> Or just use __PTRDIFF_MAX__ which doesn't require any template
> instantiation and is pre-defined by the compiler.
> 
>> +
>> +      __barrier_base(ptrdiff_t __expected, _CompletionF __completion = 
>> _CompletionF())
> 
> This line is too long.
> 
> As the ctor can be called with a single argument it should be
> 'explicit', although I don't think it needs to be callable with a
> single argument. It's always going to be passed two arguments by the
> derived class, so it doesn't need the default argument for
> __completion (which solves the line length problem!)
> 
> 
>> +      : _M_expected(__expected), _M_expected_adjustment(0),
>> +        _M_completion(move(__completion)), _M_phase(0)
>> +      {
>> +    size_t const __count = (_M_expected + 1) >> 1;
>> +    size_t const __size = sizeof(__state_t) * __count;
>> +    size_t __allocation_size = __size + alignof(__state_t);
>> +    _M_state_allocation = unique_ptr<char[]>(new char[__allocation_size]);
>> +    void* __allocation = _M_state_allocation.get();
>> +    void* const __state = align(alignof(__state_t), __size, __allocation,
>> +                                  __allocation_size);
>> +    _M_state = new (__state) __state_t[__count];
> 
> This looks like a workaround for not having aligned-new. I think you
> can get rid of _M_state_allocation and just do:
> 
> #ifndef __cpp_aligned_new
> # error NO BARRIER FOR YOU
> #endif
>        _M_state = std::make_unique<__state_t[]>(__count);
> 
> Probably a bit more user-friendly to add a check for __cpp_aligned_new
> to the #if at the top of the file though.
> 
> Aligned new is enabled by default in C++20. If anybody wants to
> compile with -fno-aligned-new then they don't get nice things.
> 
> 
>> +      }
>> +
>> +      [[nodiscard]] arrival_token
>> +      arrive(ptrdiff_t __update)
>> +      {
>> +    auto const __old_phase = _M_phase.load(memory_order_relaxed);
>> +    for(; __update; --__update)
>> +      {
>> +        if(_M_arrive(__old_phase))
>> +          {
>> +            _M_completion();
>> +            _M_expected += 
>> _M_expected_adjustment.load(memory_order_relaxed);
>> +            _M_expected_adjustment.store(0, memory_order_relaxed);
>> +            _M_phase.store(__old_phase + 2, memory_order_release);
>> +            _M_phase.notify_all();
>> +          }
>> +      }
>> +    return __old_phase;
>> +      }
>> +
>> +      void
>> +      wait(arrival_token&& __old_phase) const
>> +      {
>> +    auto const __test_fn = [=, this]
>> +      {
>> +        return _M_phase.load(memory_order_acquire) != __old_phase;
>> +      };
>> +    _M_phase._M_wait(__old_phase, __test_fn, memory_order_acquire);
>> +      }
>> +
>> +      void
>> +      arrive_and_drop()
>> +      {
>> +    _M_expected_adjustment.fetch_sub(1, memory_order_relaxed);
>> +    (void)arrive(1);
>> +      }
>> +    };
>> +
>> +  template<class _CompletionF = __empty_completion>
> 
> s/class/typename/
> 
>> +    class barrier
>> +    {
>> +      __barrier_base<_CompletionF> _M_b;
> 
> New line after this member declaration.
> 
> Why is it called _base if it's a member? Why is the impl in a separate
> class anyway? It just seems to add a level of indirection for no real
> benefit.
> 

In the original implementation there are two discrete barrier algorithm 
implementations, the tree barrier algorithm here, and a “central barrier” 
algorithm (which also has two variants based on whether or not the completion 
function is empty), and what is __barrier_base here is split into two parts 
__barrier_algorithm_base and __barrier_base, the former existing as the ABI 
stable interface to the underlying algorithm and the latter being the shared 
bits which depend on the type of _CompletionF. I left most of the structure 
intact but took only the tree barrier algorithm, which will scale better for 
larger core/socket counts, but it’s conceivable we may want to eventually add 
the central barrier algorithm back which is a better choice for small 
core-count targets.  

>> +    public:
>> +      using arrival_token = typename 
>> __barrier_base<_CompletionF>::arrival_token;
>> +
>> +      static constexpr ptrdiff_t
>> +      max() noexcept
>> +      { return __barrier_base<_CompletionF>::max(); }
>> +
>> +      barrier(ptrdiff_t __count, _CompletionF __completion = _CompletionF())
> 
> This is missing 'explicit'
> 
>> +      : _M_b(__count, std::move(__completion))
>> +      { }
>> +
>> +      barrier(barrier const&) = delete;
>> +      barrier& operator=(barrier const&) = delete;
>> +
>> +      [[nodiscard]] arrival_token
>> +      arrive(ptrdiff_t __update = 1)
>> +      { return _M_b.arrive(__update); }
>> +
>> +      void
>> +      wait(arrival_token&& __phase) const
>> +      { _M_b.wait(std::move(__phase)); }
>> +
>> +      void
>> +      arrive_and_wait()
>> +      { wait(arrive()); }
>> +
>> +      void
>> +      arrive_and_drop()
>> +      { _M_b.arrive_and_drop(); }
>> +    };
>> +
>> +_GLIBCXX_END_NAMESPACE_VERSION
>> +} // namespace
>> +#endif // _GLIBCXX_HAS_GTHREADS
>> +#endif // __cplusplus > 201703L
>> +#endif // _GLIBCXX_BARRIER
>> +
>> diff --git a/libstdc++-v3/testsuite/30_threads/barrier/1.cc 
>> b/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> new file mode 100644
>> index 00000000000..0b38160a58b
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> @@ -0,0 +1,27 @@
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3.  If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// { dg-options "-std=gnu++2a" }
>> +// { dg-do compile { target c++2a } }
>> +
>> +#include <barrier>
>> +
>> +#ifndef __cpp_lib_barrier
>> +# error "Feature-test macro for barrier missing in <barrier>"
>> +#elif __cpp_lib_barrier != 201907L
>> +# error "Feature-test macro for barrier has wrong value in <barrier>"
>> +#endif
>> diff --git a/libstdc++-v3/testsuite/30_threads/barrier/2.cc 
>> b/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> new file mode 100644
>> index 00000000000..1d8d83639e0
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> @@ -0,0 +1,27 @@
>> +// Copyright (C) 2019-2020 Free Software Foundation, Inc.
> 
> Just 2020 here
> 
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3.  If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// { dg-options "-std=gnu++2a" }
>> +// { dg-do compile { target c++2a } }
> 
> Once the macro is only defined inside the _GLIBCXX_HAS_GTHREADS guard
> this test should { dg-require-gthreads "" }.
> 
> Now that I think about it, we should probably have negative tests that
> check that the feature test macro *isn't* defined when we don't have a
> gthreads target. But we aren't doing the anywhere else either.
> 
> 
>> +
>> +#include <version>
>> +
>> +#ifndef __cpp_lib_barrier
>> +# error "Feature-test macro for barrier missing in <version>"
>> +#elif __cpp_lib_barrier != 201907L
>> +# error "Feature-test macro for barrier has wrong value in <version>"
>> +#endif
> 
> This is going to currently fail, right? The patch doesn't add anything
> to <version>.
> 
>> diff --git a/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc 
>> b/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> new file mode 100644
>> index 00000000000..c128d5ea794
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> @@ -0,0 +1,51 @@
>> +// { dg-options "-std=gnu++2a -pthread" }
>> +// { dg-do run { target c++2a } }
>> +// { dg-require-effective-target pthread }
>> +// { dg-require-gthreads "" }
> 
> The directives should be:
> 
> // { dg-options "-std=gnu++2a" }
> // { dg-do run { target c++2a } }
> // { dg-require-gthreads "" }
> // { dg-additional-options "-pthread" { target pthread } }
> 
> i.e. don't require pthreads, only add -pthread for pthread targets.
> 
> This allows it to work for targets that have a non-pthreads
> implementation of the gthreads abstraction (e.g. vxworks).
> 
> Similarly for the remaining tests.
> 
>> +
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3.  If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// This implementation is based on 
>> libcxx/test/std/thread/thread.barrier/arrive.pass.cpp
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
>> Exceptions.
>> +// See https://llvm.org/LICENSE.txt for license information.
>> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#include <barrier>
>> +#include <thread>
>> +
>> +#include <testsuite_hooks.h>
> 
> No need for this header if you don't use VERIFY.
> 
>> +int main(int, char**)
> 
> int main()
> 
>> +{
>> +  std::barrier<> b(2);
>> +
>> +  auto tok = b.arrive();
>> +  std::thread t([&](){
>> +    (void)b.arrive();
>> +  });
>> +  b.wait(std::move(tok));
>> +  t.join();
>> +
>> +  auto tok2 = b.arrive(2);
>> +  b.wait(std::move(tok2));
>> +  return 0;
> 
> No need for the return statement.
> 
> And similarly for the remaining tests.
Re: [PATCH] libstdc++: Add support for C++20 barriers

Reply via email to