> On Nov 4, 2020, at 10:50 AM, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> On 04/11/20 09:29 -0800, Thomas Rodgers wrote:
>> From: Thomas Rodgers <trodg...@redhat.com>
>>
>> Adds <barrier>
>>
>> libstdc++/ChangeLog:
>>
>> * include/Makefile.am (std_headers): Add new header.
>> * include/Makefile.in: Regenerate.
>> * include/std/barrier: New file.
>> * testsuite/30_thread/barrier/1.cc: New test.
>> * testsuite/30_thread/barrier/2.cc: Likewise.
>> * testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>> * testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>> * testsuite/30_thread/barrier/arrive.cc: Likewise.
>> * testsuite/30_thread/barrier/completion.cc: Likewise.
>> * testsuite/30_thread/barrier/max.cc: Likewise.
>> ---
>> libstdc++-v3/include/std/barrier | 248 ++++++++++++++++++
>> .../testsuite/30_threads/barrier/1.cc | 27 ++
>> .../testsuite/30_threads/barrier/2.cc | 27 ++
>> .../testsuite/30_threads/barrier/arrive.cc | 51 ++++
>> .../30_threads/barrier/arrive_and_drop.cc | 49 ++++
>> .../30_threads/barrier/arrive_and_wait.cc | 51 ++++
>> .../30_threads/barrier/completion.cc | 54 ++++
>> .../testsuite/30_threads/barrier/max.cc | 44 ++++
>> 8 files changed, 551 insertions(+)
>> create mode 100644 libstdc++-v3/include/std/barrier
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> create mode 100644
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
>> create mode 100644
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>>
>> diff --git a/libstdc++-v3/include/std/barrier
>> b/libstdc++-v3/include/std/barrier
>> new file mode 100644
>> index 00000000000..80e6d668cf5
>> --- /dev/null
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -0,0 +1,248 @@
>> +// <barrier> -*- C++ -*-
>> +
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library. This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3. If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// This implementation is based on libcxx/include/barrier
>> +//===-- barrier.h --------------------------------------------------===//
>> +//
>> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM
>> Exceptions.
>> +// See https://llvm.org/LICENSE.txt for license information.
>> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>> +//
>> +//===---------------------------------------------------------------===//
>> +
>> +#ifndef _GLIBCXX_BARRIER
>> +#define _GLIBCXX_BARRIER 1
>> +
>> +#pragma GCC system_header
>> +
>> +#if __cplusplus > 201703L
>> +#define __cpp_lib_barrier 201907L
>
> This feature test macro will be defined unconditionally, even if
> _GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
> for gthreads.
>
> You're also missing an edit to <version> (which should depend on the
> same conditions).
>
>
>> +#include <bits/c++config.h>
>> +
>> +#if defined(_GLIBCXX_HAS_GTHREADS)
>> +#include <bits/atomic_base.h>
>> +#include <bits/atomic_wait.h>
>> +#include <bits/functional_hash.h>
>> +#include <ext/numeric_traits.h>
>> +
>> +#include <memory>
>> +
>> +namespace std _GLIBCXX_VISIBILITY(default)
>> +{
>> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +
>> + struct __empty_completion
>> + {
>> + _GLIBCXX_ALWAYS_INLINE void
>> + operator()() noexcept
>> + { }
>> + };
>> +
>> +/*
>> +
>> +The default implementation of __barrier_base is a classic tree barrier.
>> +
>> +It looks different from literature pseudocode for two main reasons:
>> + 1. Threads that call into std::barrier functions do not provide indices,
>> + so a numbering step is added before the actual barrier algorithm,
>> + appearing as an N+1 round to the N rounds of the tree barrier.
>> + 2. A great deal of attention has been paid to avoid cache line thrashing
>> + by flattening the tree structure into cache-line sized arrays, that
>> + are indexed in an efficient way.
>> +
>> +*/
>> +
>> + using __barrier_phase_t = uint8_t;
>
> Please add <stdint.h> or <cstdint> since you're using uint8_t
> (it's currently included by <bits/atomic_base.h> but that could
> change).
>
> Would it work to use a scoped enumeration type here instead? That
> would prevent people accidentally doing arithmetic on it, or passing
> it to functions taking an integer (and prevent it promoting to int in
> arithmetic).
>
> e.g. define it similar to std::byte:
>
> enum class __barrier_phase_t : unsigned char { };
>
> and then cast to an integer on the way in and the way out, so that the
> implementation works with its numeric value, but users have a
> non-arithmetic type that they can't do anything with.
>
>> +
>> + template<class _CompletionF>
>
> s/typename/class/
>
>> + class __barrier_base
>> + {
>> + struct alignas(64) /* naturally-align the heap state */ __state_t
>> + {
>> + struct
>> + {
>> + __atomic_base<__barrier_phase_t> __phase = ATOMIC_VAR_INIT(0);
>
> No need to use ATOMIC_VAR_INIT, we're not trying to be compatible with
> C, it can be just = {0} or similar.
>
>> + } __tickets[64];
>> + };
>> +
>> + ptrdiff_t _M_expected;
>> + unique_ptr<char[]> _M_state_allocation;
>> + __state_t* _M_state;
>> + __atomic_base<ptrdiff_t> _M_expected_adjustment;
>> + _CompletionF _M_completion;
>> + __atomic_base<__barrier_phase_t> _M_phase;
>
> Can these just use atomic instead of __atomic_base?
>
> The order of the members looks suboptimal. If alignof(_CompletionF) is
> less than alignof(atomic<__barrier_phase_t>) then you'll get padding
> bytes between them (or is that intentional, to avoid false sharing?)
>
> I would put _M_completion last, and add [[no_unique_address]] so it
> doesn't need to waste space if it's an empty type.
>
>> +
>> + static __gthread_t
>> + _S_get_tid() noexcept
>> + {
>> +#ifdef __GLIBC__
>> + // For the GNU C library pthread_self() is usable without linking to
>> + // libpthread.so but returns 0, so we cannot use it in single-threaded
>> + // programs, because this_thread::get_id() != thread::id{} must be true.
>> + // We know that pthread_t is an integral type in the GNU C library.
>> + if (!__gthread_active_p())
>> + return 1;
>> +#endif
>> + return __gthread_self();
>
> This is the third place we're doing this (and all of them need to be
> changed for recent versions of glibc) so we should put it in one
> place.
>
>> + }
>> +
>> + bool
>> + _M_arrive(__barrier_phase_t __old_phase)
>> + {
>> + __barrier_phase_t const __half_step = __old_phase + 1,
>> + __full_step = __old_phase + 2;
>> + size_t __current_expected = _M_expected;
>> + size_t __current = _Hash_impl::hash(_S_get_tid())
>> + % ((_M_expected + 1) >> 1);
>> + for (int __round = 0;; ++__round)
>> + {
>> + if (__current_expected <= 1)
>> + return true;
>> + size_t const __end_node = ((__current_expected + 1) >> 1),
>> + __last_node = __end_node - 1;
>> + for (;;++__current)
>
> A space after the semi-colons please.
>
>> + {
>> + if (__current == __end_node)
>> + __current = 0;
>> + __barrier_phase_t __expect = __old_phase;
>> + if (__current == __last_node && (__current_expected & 1))
>> + {
>> + if (_M_state[__current].__tickets[__round].__phase
>> + .compare_exchange_strong(__expect, __full_step,
>> + memory_order_acq_rel))
>> + break; // I'm 1 in 1, go to next __round
>> + }
>> + else if (_M_state[__current].__tickets[__round].__phase
>> + .compare_exchange_strong(__expect, __half_step,
>> + memory_order_acq_rel))
>> + {
>> + return false; // I'm 1 in 2, done with arrival
>> + }
>> + else if (__expect == __half_step)
>> + {
>> + if (_M_state[__current].__tickets[__round].__phase
>> + .compare_exchange_strong(__expect, __full_step,
>> + memory_order_acq_rel))
>> + break; // I'm 2 in 2, go to next __round
>> + }
>> + }
>> + __current_expected = __last_node + 1;
>> + __current >>= 1;
>> + }
>> + }
>> +
>> + public:
>> + using arrival_token = __barrier_phase_t;
>> +
>> + static constexpr ptrdiff_t
>> + max() noexcept
>> + { return __gnu_cxx::__numeric_traits<ptrdiff_t>::__max; }
>
> s/__numeric_traits/__int_traits/
>
> Or just use __PTRDIFF_MAX__ which doesn't require any template
> instantiation and is pre-defined by the compiler.
>
>> +
>> + __barrier_base(ptrdiff_t __expected, _CompletionF __completion =
>> _CompletionF())
>
> This line is too long.
>
> As the ctor can be called with a single argument it should be
> 'explicit', although I don't think it needs to be callable with a
> single argument. It's always going to be passed two arguments by the
> derived class, so it doesn't need the default argument for
> __completion (which solves the line length problem!)
>
>
>> + : _M_expected(__expected), _M_expected_adjustment(0),
>> + _M_completion(move(__completion)), _M_phase(0)
>> + {
>> + size_t const __count = (_M_expected + 1) >> 1;
>> + size_t const __size = sizeof(__state_t) * __count;
>> + size_t __allocation_size = __size + alignof(__state_t);
>> + _M_state_allocation = unique_ptr<char[]>(new char[__allocation_size]);
>> + void* __allocation = _M_state_allocation.get();
>> + void* const __state = align(alignof(__state_t), __size, __allocation,
>> + __allocation_size);
>> + _M_state = new (__state) __state_t[__count];
>
> This looks like a workaround for not having aligned-new. I think you
> can get rid of _M_state_allocation and just do:
>
> #ifndef __cpp_aligned_new
> # error NO BARRIER FOR YOU
> #endif
> _M_state = std::make_unique<__state_t[]>(__count);
>
> Probably a bit more user-friendly to add a check for __cpp_aligned_new
> to the #if at the top of the file though.
>
> Aligned new is enabled by default in C++20. If anybody wants to
> compile with -fno-aligned-new then they don't get nice things.
>
>
>> + }
>> +
>> + [[nodiscard]] arrival_token
>> + arrive(ptrdiff_t __update)
>> + {
>> + auto const __old_phase = _M_phase.load(memory_order_relaxed);
>> + for(; __update; --__update)
>> + {
>> + if(_M_arrive(__old_phase))
>> + {
>> + _M_completion();
>> + _M_expected +=
>> _M_expected_adjustment.load(memory_order_relaxed);
>> + _M_expected_adjustment.store(0, memory_order_relaxed);
>> + _M_phase.store(__old_phase + 2, memory_order_release);
>> + _M_phase.notify_all();
>> + }
>> + }
>> + return __old_phase;
>> + }
>> +
>> + void
>> + wait(arrival_token&& __old_phase) const
>> + {
>> + auto const __test_fn = [=, this]
>> + {
>> + return _M_phase.load(memory_order_acquire) != __old_phase;
>> + };
>> + _M_phase._M_wait(__old_phase, __test_fn, memory_order_acquire);
>> + }
>> +
>> + void
>> + arrive_and_drop()
>> + {
>> + _M_expected_adjustment.fetch_sub(1, memory_order_relaxed);
>> + (void)arrive(1);
>> + }
>> + };
>> +
>> + template<class _CompletionF = __empty_completion>
>
> s/class/typename/
>
>> + class barrier
>> + {
>> + __barrier_base<_CompletionF> _M_b;
>
> New line after this member declaration.
>
> Why is it called _base if it's a member? Why is the impl in a separate
> class anyway? It just seems to add a level of indirection for no real
> benefit.
>
In the original implementation there are two discrete barrier algorithm
implementations, the tree barrier algorithm here, and a “central barrier”
algorithm (which also has two variants based on whether or not the completion
function is empty), and what is __barrier_base here is split into two parts
__barrier_algorithm_base and __barrier_base, the former existing as the ABI
stable interface to the underlying algorithm and the latter being the shared
bits which depend on the type of _CompletionF. I left most of the structure
intact but took only the tree barrier algorithm, which will scale better for
larger core/socket counts, but it’s conceivable we may want to eventually add
the central barrier algorithm back which is a better choice for small
core-count targets.
>> + public:
>> + using arrival_token = typename
>> __barrier_base<_CompletionF>::arrival_token;
>> +
>> + static constexpr ptrdiff_t
>> + max() noexcept
>> + { return __barrier_base<_CompletionF>::max(); }
>> +
>> + barrier(ptrdiff_t __count, _CompletionF __completion = _CompletionF())
>
> This is missing 'explicit'
>
>> + : _M_b(__count, std::move(__completion))
>> + { }
>> +
>> + barrier(barrier const&) = delete;
>> + barrier& operator=(barrier const&) = delete;
>> +
>> + [[nodiscard]] arrival_token
>> + arrive(ptrdiff_t __update = 1)
>> + { return _M_b.arrive(__update); }
>> +
>> + void
>> + wait(arrival_token&& __phase) const
>> + { _M_b.wait(std::move(__phase)); }
>> +
>> + void
>> + arrive_and_wait()
>> + { wait(arrive()); }
>> +
>> + void
>> + arrive_and_drop()
>> + { _M_b.arrive_and_drop(); }
>> + };
>> +
>> +_GLIBCXX_END_NAMESPACE_VERSION
>> +} // namespace
>> +#endif // _GLIBCXX_HAS_GTHREADS
>> +#endif // __cplusplus > 201703L
>> +#endif // _GLIBCXX_BARRIER
>> +
>> diff --git a/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> b/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> new file mode 100644
>> index 00000000000..0b38160a58b
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> @@ -0,0 +1,27 @@
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library. This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3. If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// { dg-options "-std=gnu++2a" }
>> +// { dg-do compile { target c++2a } }
>> +
>> +#include <barrier>
>> +
>> +#ifndef __cpp_lib_barrier
>> +# error "Feature-test macro for barrier missing in <barrier>"
>> +#elif __cpp_lib_barrier != 201907L
>> +# error "Feature-test macro for barrier has wrong value in <barrier>"
>> +#endif
>> diff --git a/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> b/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> new file mode 100644
>> index 00000000000..1d8d83639e0
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> @@ -0,0 +1,27 @@
>> +// Copyright (C) 2019-2020 Free Software Foundation, Inc.
>
> Just 2020 here
>
>> +// This file is part of the GNU ISO C++ Library. This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3. If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// { dg-options "-std=gnu++2a" }
>> +// { dg-do compile { target c++2a } }
>
> Once the macro is only defined inside the _GLIBCXX_HAS_GTHREADS guard
> this test should { dg-require-gthreads "" }.
>
> Now that I think about it, we should probably have negative tests that
> check that the feature test macro *isn't* defined when we don't have a
> gthreads target. But we aren't doing the anywhere else either.
>
>
>> +
>> +#include <version>
>> +
>> +#ifndef __cpp_lib_barrier
>> +# error "Feature-test macro for barrier missing in <version>"
>> +#elif __cpp_lib_barrier != 201907L
>> +# error "Feature-test macro for barrier has wrong value in <version>"
>> +#endif
>
> This is going to currently fail, right? The patch doesn't add anything
> to <version>.
>
>> diff --git a/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> b/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> new file mode 100644
>> index 00000000000..c128d5ea794
>> --- /dev/null
>> +++ b/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> @@ -0,0 +1,51 @@
>> +// { dg-options "-std=gnu++2a -pthread" }
>> +// { dg-do run { target c++2a } }
>> +// { dg-require-effective-target pthread }
>> +// { dg-require-gthreads "" }
>
> The directives should be:
>
> // { dg-options "-std=gnu++2a" }
> // { dg-do run { target c++2a } }
> // { dg-require-gthreads "" }
> // { dg-additional-options "-pthread" { target pthread } }
>
> i.e. don't require pthreads, only add -pthread for pthread targets.
>
> This allows it to work for targets that have a non-pthreads
> implementation of the gthreads abstraction (e.g. vxworks).
>
> Similarly for the remaining tests.
>
>> +
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library. This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3. If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// This implementation is based on
>> libcxx/test/std/thread/thread.barrier/arrive.pass.cpp
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM
>> Exceptions.
>> +// See https://llvm.org/LICENSE.txt for license information.
>> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#include <barrier>
>> +#include <thread>
>> +
>> +#include <testsuite_hooks.h>
>
> No need for this header if you don't use VERIFY.
>
>> +int main(int, char**)
>
> int main()
>
>> +{
>> + std::barrier<> b(2);
>> +
>> + auto tok = b.arrive();
>> + std::thread t([&](){
>> + (void)b.arrive();
>> + });
>> + b.wait(std::move(tok));
>> + t.join();
>> +
>> + auto tok2 = b.arrive(2);
>> + b.wait(std::move(tok2));
>> + return 0;
>
> No need for the return statement.
>
> And similarly for the remaining tests.