On Fri, 13 Jun 2025, Tomasz Kaminski wrote: > > > On Thu, Jun 12, 2025 at 9:05 PM Patrick Palka <ppa...@redhat.com> wrote: > On Thu, 12 Jun 2025, Patrick Palka wrote: > > > On Thu, 12 Jun 2025, Jonathan Wakely wrote: > > > > > > > > > > > On Thu, 12 Jun 2025, 16:56 Patrick Palka, <ppa...@redhat.com> wrote: > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > > > > > > I'm sure if introducing a new overload is preferable to > > > introducing a constexpr if branch in the existing overload. > > > > > > > > > I think a constexpr if branch in the existing functions is cheaper. > We need to check the traits either way, but we can avoid doing overload > resolution. > > > > Ack. I opted to just define specialized functors for these helpers > > instead of using lambdas. That way we can easily optimize the case > > where only one of the functions is stateless. > > > > Changes in v2: > > - Use a functor utilizing [[no_unique_address]] instead of a lambda, > > to also concisely handle the case where only one of the functino > > object is stateless. In turn check is_copy_xible instead of > > is_default_xible. > > > > -- >8 -- > > > > When creating a composite comparator/predicate that invokes a given > > projection function, we don't need to capture a stateless function > > object by reference, instead we can capture it by value and use > > [[no_unique_address]] to elide its storage. This makes using > > __make_comp_proj zero-cost in the common case where the comparator or > > projection (or both) are stateless. > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/ranges_algo.h (__detail::_Comp_proj): New. > > (__detail::__make_comp_proj): Use it instead. > > (__detail::_Pred_proj): New. > > (__detail::__make_pred_proj): Use it instead. > > --- > > libstdc++-v3/include/bits/ranges_algo.h | 70 > +++++++++++++++++++------ > > 1 file changed, 54 insertions(+), 16 deletions(-) > > > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h > b/libstdc++-v3/include/bits/ranges_algo.h > > index 9f94c6b1d57e..ce48ca047b69 100644 > > --- a/libstdc++-v3/include/bits/ranges_algo.h > > +++ b/libstdc++-v3/include/bits/ranges_algo.h > > @@ -49,27 +49,65 @@ namespace ranges > > namespace __detail > > { > > template<typename _Comp, typename _Proj> > > - constexpr auto > > + struct _Comp_proj > > + { > > + [[no_unique_address]] > > + __conditional_t<is_empty_v<_Comp> && > is_copy_constructible_v<_Comp>, > > + _Comp, _Comp&> _M_comp; > > Comparators, predicates and projections in ranges algorithms are already > constrained to by copyable, so checking is_copy_constructible_v here > is redundant. > > -- >8 -- > > Subject: [PATCH] libstdc++: Optimize __make_comp_proj and > __make_pred_proj > > Changes in v3: > - Remove redundant is_copy_constructible check when deciding > whether to capture by reference or by value. > > Changes in v2: > - Use a functor utilizing [[no_unique_address]] instead of a lambda, > to also concisely handle the case where only one of the functino > object is stateless. In turn check is_copy_xible instead of > is_default_xible. > > -- >8 -- > > When creating a composite comparator/predicate that invokes a given > projection function, we don't need to capture a stateless function > object by reference, instead we can capture it by value and use > [[no_unique_address]] to elide its storage. This makes using > __make_comp_proj zero-cost in the common case where the comparator > and projection are both stateless. > > libstdc++-v3/ChangeLog: > > * include/bits/ranges_algo.h (__detail::_Comp_proj): New. > (__detail::__make_comp_proj): Use it instead. > (__detail::_Pred_proj): New. > (__detail::__make_pred_proj): Use it instead. > --- > libstdc++-v3/include/bits/ranges_algo.h | 64 ++++++++++++++++++------- > 1 file changed, 48 insertions(+), 16 deletions(-) > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h > b/libstdc++-v3/include/bits/ranges_algo.h > index cc182d110b30..a7df4c9a13ca 100644 > --- a/libstdc++-v3/include/bits/ranges_algo.h > +++ b/libstdc++-v3/include/bits/ranges_algo.h > @@ -49,27 +49,59 @@ namespace ranges > namespace __detail > { > template<typename _Comp, typename _Proj> > - constexpr auto > + struct _Comp_proj > + { > + [[no_unique_address]] > + __conditional_t<is_empty_v<_Comp>, _Comp, _Comp&> _M_comp; > + [[no_unique_address]] > + __conditional_t<is_empty_v<_Proj>, _Proj, _Proj&> _M_proj; > > I would also store function pointers and member pointers (for proj) by value, > instead of by reference. So instead of is_empty_v I would check: > is_empty_v<_Func> || is_pointer_v<_Func> || is_member_pointer_v<_Func> > Or just: > __or_v<is_scalar<_Func>, is_empty<_Func>>.
Sounds good, like so? -- >8 -- Subject: [PATCH] libstdc++: Optimize __make_comp/pred_proj for empty/scalar types Changes in v4: - Also capture scalar (non-empty) functions by value. Changes in v3: - Remove redundant is_copy_constructible check when deciding whether to capture by reference or by value. Changes in v2: - Use a functor utilizing [[no_unique_address]] instead of a lambda, to also concisely handle the case where only one of the functino object is stateless. In turn check is_copy_xible instead of is_default_xible. -- >8 -- When creating a composite comparator/predicate that invokes a given projection function, we don't need to capture a scalar (such as a function pointer or member pointer) or empty object by reference, instead capture it by value and use [[no_unique_address]] to elide its storage in the empty case. This makes using __make_comp_proj zero-cost in the common case where the comparator and projection are both empty/scalars. libstdc++-v3/ChangeLog: * include/bits/ranges_algo.h (__detail::__by_ref_or_value_fn): New. (__detail::_Comp_proj): New. (__detail::__make_comp_proj): Use it instead. (__detail::_Pred_proj): New. (__detail::__make_pred_proj): Use it instead. --- libstdc++-v3/include/bits/ranges_algo.h | 64 ++++++++++++++++++------- 1 file changed, 48 insertions(+), 16 deletions(-) diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h index cc182d110b30..7447fbd21f7e 100644 --- a/libstdc++-v3/include/bits/ranges_algo.h +++ b/libstdc++-v3/include/bits/ranges_algo.h @@ -48,28 +48,60 @@ namespace ranges { namespace __detail { + template<typename _Fp> + using __by_ref_or_value_fn + = __conditional_t<is_scalar_v<_Fp> || is_empty_v<_Fp>, _Fp, _Fp&>; + template<typename _Comp, typename _Proj> - constexpr auto + struct _Comp_proj + { + [[no_unique_address]] __by_ref_or_value_fn<_Comp> _M_comp; + [[no_unique_address]] __by_ref_or_value_fn<_Proj> _M_proj; + + constexpr + _Comp_proj(_Comp& __comp, _Proj& __proj) + : _M_comp(__comp), _M_proj(__proj) + { } + + template<typename _Tp, typename _Up> + constexpr bool + operator()(_Tp&& __x, _Up&& __y) + { + return std::__invoke(_M_comp, + std::__invoke(_M_proj, std::forward<_Tp>(__x)), + std::__invoke(_M_proj, std::forward<_Up>(__y))); + } + }; + + template<typename _Comp, typename _Proj> + constexpr _Comp_proj<_Comp, _Proj> __make_comp_proj(_Comp& __comp, _Proj& __proj) + { return {__comp, __proj}; } + + template<typename _Pred, typename _Proj> + struct _Pred_proj { - return [&] (auto&& __lhs, auto&& __rhs) -> bool { - using _TL = decltype(__lhs); - using _TR = decltype(__rhs); - return std::__invoke(__comp, - std::__invoke(__proj, std::forward<_TL>(__lhs)), - std::__invoke(__proj, std::forward<_TR>(__rhs))); - }; - } + [[no_unique_address]] __by_ref_or_value_fn<_Pred> _M_pred; + [[no_unique_address]] __by_ref_or_value_fn<_Proj> _M_proj; + + constexpr + _Pred_proj(_Pred& __pred, _Proj& __proj) + : _M_pred(__pred), _M_proj(__proj) + { } + + template<typename _Tp> + constexpr bool + operator()(_Tp&& __x) + { + return std::__invoke(_M_pred, + std::__invoke(_M_proj, std::forward<_Tp>(__x))); + } + }; template<typename _Pred, typename _Proj> - constexpr auto + constexpr _Pred_proj<_Pred, _Proj> __make_pred_proj(_Pred& __pred, _Proj& __proj) - { - return [&] <typename _Tp> (_Tp&& __arg) -> bool { - return std::__invoke(__pred, - std::__invoke(__proj, std::forward<_Tp>(__arg))); - }; - } + { return {__pred, __proj}; } } // namespace __detail struct __all_of_fn -- 2.50.0.rc2