On Fri, 13 Jun 2025, Tomasz Kaminski wrote:

> 
> 
> On Thu, Jun 12, 2025 at 9:05 PM Patrick Palka <ppa...@redhat.com> wrote:
>       On Thu, 12 Jun 2025, Patrick Palka wrote:
> 
>       > On Thu, 12 Jun 2025, Jonathan Wakely wrote:
>       >
>       > >
>       > >
>       > > On Thu, 12 Jun 2025, 16:56 Patrick Palka, <ppa...@redhat.com> wrote:
>       > >       Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>       > >
>       > >       I'm sure if introducing a new overload is preferable to
>       > >       introducing a constexpr if branch in the existing overload.
>       > >
>       > >
>       > > I think a constexpr if branch in the existing functions is cheaper. 
> We need to check the traits either way, but we can avoid doing overload 
> resolution. 
>       >
>       > Ack.  I opted to just define specialized functors for these helpers
>       > instead of using lambdas.  That way we can easily optimize the case
>       > where only one of the functions is stateless.
>       >
>       > Changes in v2:
>       >   - Use a functor utilizing [[no_unique_address]] instead of a lambda,
>       >     to also concisely handle the case where only one of the functino
>       >     object is stateless.  In turn check is_copy_xible instead of
>       >     is_default_xible.
>       >
>       > -- >8 --
>       >
>       > When creating a composite comparator/predicate that invokes a given
>       > projection function, we don't need to capture a stateless function
>       > object by reference, instead we can capture it by value and use
>       > [[no_unique_address]] to elide its storage.  This makes using
>       > __make_comp_proj zero-cost in the common case where the comparator or
>       > projection (or both) are stateless.
>       >
>       > libstdc++-v3/ChangeLog:
>       >
>       >       * include/bits/ranges_algo.h (__detail::_Comp_proj): New.
>       >       (__detail::__make_comp_proj): Use it instead.
>       >       (__detail::_Pred_proj): New.
>       >       (__detail::__make_pred_proj): Use it instead.
>       > ---
>       >  libstdc++-v3/include/bits/ranges_algo.h | 70 
> +++++++++++++++++++------
>       >  1 file changed, 54 insertions(+), 16 deletions(-)
>       >
>       > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
>       > index 9f94c6b1d57e..ce48ca047b69 100644
>       > --- a/libstdc++-v3/include/bits/ranges_algo.h
>       > +++ b/libstdc++-v3/include/bits/ranges_algo.h
>       > @@ -49,27 +49,65 @@ namespace ranges
>       >    namespace __detail
>       >    {
>       >      template<typename _Comp, typename _Proj>
>       > -      constexpr auto
>       > +      struct _Comp_proj
>       > +      {
>       > +     [[no_unique_address]]
>       > +       __conditional_t<is_empty_v<_Comp> && 
> is_copy_constructible_v<_Comp>,
>       > +                       _Comp, _Comp&> _M_comp;
> 
>       Comparators, predicates and projections in ranges algorithms are already
>       constrained to by copyable, so checking is_copy_constructible_v here
>       is redundant.
> 
>       -- >8 --
> 
>       Subject: [PATCH] libstdc++: Optimize __make_comp_proj and 
> __make_pred_proj
> 
>       Changes in v3:
>         - Remove redundant is_copy_constructible check when deciding
>           whether to capture by reference or by value.
> 
>       Changes in v2:
>         - Use a functor utilizing [[no_unique_address]] instead of a lambda,
>           to also concisely handle the case where only one of the functino
>           object is stateless.  In turn check is_copy_xible instead of
>           is_default_xible.
> 
>       -- >8 --
> 
>       When creating a composite comparator/predicate that invokes a given
>       projection function, we don't need to capture a stateless function
>       object by reference, instead we can capture it by value and use
>       [[no_unique_address]] to elide its storage.  This makes using
>       __make_comp_proj zero-cost in the common case where the comparator
>       and projection are both stateless.
> 
>       libstdc++-v3/ChangeLog:
> 
>               * include/bits/ranges_algo.h (__detail::_Comp_proj): New.
>               (__detail::__make_comp_proj): Use it instead.
>               (__detail::_Pred_proj): New.
>               (__detail::__make_pred_proj): Use it instead.
>       ---
>        libstdc++-v3/include/bits/ranges_algo.h | 64 ++++++++++++++++++-------
>        1 file changed, 48 insertions(+), 16 deletions(-)
> 
>       diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
>       index cc182d110b30..a7df4c9a13ca 100644
>       --- a/libstdc++-v3/include/bits/ranges_algo.h
>       +++ b/libstdc++-v3/include/bits/ranges_algo.h
>       @@ -49,27 +49,59 @@ namespace ranges
>          namespace __detail
>          {
>            template<typename _Comp, typename _Proj>
>       -      constexpr auto
>       +      struct _Comp_proj
>       +      {
>       +       [[no_unique_address]]
>       +         __conditional_t<is_empty_v<_Comp>, _Comp, _Comp&> _M_comp;
>       +       [[no_unique_address]]
>       +         __conditional_t<is_empty_v<_Proj>, _Proj, _Proj&> _M_proj;
> 
> I would also store function pointers and member pointers (for proj) by value,
> instead of by reference. So instead of is_empty_v I would check:
> is_empty_v<_Func> || is_pointer_v<_Func> || is_member_pointer_v<_Func>
> Or just: 
> __or_v<is_scalar<_Func>, is_empty<_Func>>.

Sounds good, like so?

-- >8 --

Subject: [PATCH] libstdc++: Optimize __make_comp/pred_proj for empty/scalar
 types

Changes in v4:
  - Also capture scalar (non-empty) functions by value.

Changes in v3:
  - Remove redundant is_copy_constructible check when deciding
    whether to capture by reference or by value.

Changes in v2:
  - Use a functor utilizing [[no_unique_address]] instead of a lambda,
    to also concisely handle the case where only one of the functino
    object is stateless.  In turn check is_copy_xible instead of
    is_default_xible.

-- >8 --

When creating a composite comparator/predicate that invokes a given
projection function, we don't need to capture a scalar (such as a
function pointer or member pointer) or empty object by reference,
instead capture it by value and use [[no_unique_address]] to elide its
storage in the empty case.  This makes using __make_comp_proj zero-cost
in the common case where the comparator and projection are both
empty/scalars.

libstdc++-v3/ChangeLog:

        * include/bits/ranges_algo.h (__detail::__by_ref_or_value_fn): New.
        (__detail::_Comp_proj): New.
        (__detail::__make_comp_proj): Use it instead.
        (__detail::_Pred_proj): New.
        (__detail::__make_pred_proj): Use it instead.
---
 libstdc++-v3/include/bits/ranges_algo.h | 64 ++++++++++++++++++-------
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index cc182d110b30..7447fbd21f7e 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -48,28 +48,60 @@ namespace ranges
 {
   namespace __detail
   {
+    template<typename _Fp>
+      using __by_ref_or_value_fn
+       = __conditional_t<is_scalar_v<_Fp> || is_empty_v<_Fp>, _Fp, _Fp&>;
+
     template<typename _Comp, typename _Proj>
-      constexpr auto
+      struct _Comp_proj
+      {
+       [[no_unique_address]] __by_ref_or_value_fn<_Comp> _M_comp;
+       [[no_unique_address]] __by_ref_or_value_fn<_Proj> _M_proj;
+
+       constexpr
+       _Comp_proj(_Comp& __comp, _Proj& __proj)
+       : _M_comp(__comp), _M_proj(__proj)
+       { }
+
+       template<typename _Tp, typename _Up>
+       constexpr bool
+       operator()(_Tp&& __x, _Up&& __y)
+       {
+         return std::__invoke(_M_comp,
+                              std::__invoke(_M_proj, std::forward<_Tp>(__x)),
+                              std::__invoke(_M_proj, std::forward<_Up>(__y)));
+       }
+      };
+
+    template<typename _Comp, typename _Proj>
+      constexpr _Comp_proj<_Comp, _Proj>
       __make_comp_proj(_Comp& __comp, _Proj& __proj)
+      { return {__comp, __proj}; }
+
+    template<typename _Pred, typename _Proj>
+      struct _Pred_proj
       {
-       return [&] (auto&& __lhs, auto&& __rhs) -> bool {
-         using _TL = decltype(__lhs);
-         using _TR = decltype(__rhs);
-         return std::__invoke(__comp,
-                              std::__invoke(__proj, std::forward<_TL>(__lhs)),
-                              std::__invoke(__proj, std::forward<_TR>(__rhs)));
-       };
-      }
+       [[no_unique_address]] __by_ref_or_value_fn<_Pred> _M_pred;
+       [[no_unique_address]] __by_ref_or_value_fn<_Proj> _M_proj;
+
+       constexpr
+       _Pred_proj(_Pred& __pred, _Proj& __proj)
+       : _M_pred(__pred), _M_proj(__proj)
+       { }
+
+       template<typename _Tp>
+         constexpr bool
+         operator()(_Tp&& __x)
+         {
+           return std::__invoke(_M_pred,
+                                std::__invoke(_M_proj, 
std::forward<_Tp>(__x)));
+         }
+      };
 
     template<typename _Pred, typename _Proj>
-      constexpr auto
+      constexpr _Pred_proj<_Pred, _Proj>
       __make_pred_proj(_Pred& __pred, _Proj& __proj)
-      {
-       return [&] <typename _Tp> (_Tp&& __arg) -> bool {
-         return std::__invoke(__pred,
-                              std::__invoke(__proj, std::forward<_Tp>(__arg)));
-       };
-      }
+      { return {__pred, __proj}; }
   } // namespace __detail
 
   struct __all_of_fn
-- 
2.50.0.rc2

Reply via email to