This chain discusses changes to `mapping::operator()`. For concrete
discussion, see below. I have a general question: is there a reason
other than style to prefer folds over recursion?
On 4/30/25 7:13 AM, Tomasz Kaminski wrote:
Hi,
As we will be landing patches for extents, this will become a separate
patch series.
I would prefer if you could commit per layout, starting with layout_right
(the default).
I try to provide prompt responses, so if that works better for you, you
can post a patch with only this layout first, as most of the comments
will apply to all of them.
For the general design, we have constructors that allow conversion between
the rank-0 and rank-1 layout_left and layout_right mappings. This is done
because they essentially represent the same layout. I think we could
benefit from that in code by having base classes for the rank-0 and
rank-1 mappings:
template<typename _Extents>
struct _Rank0_mapping_base
{
  static_assert(_Extents::rank() == 0);

  template<typename _OtherExtents>
  // explicit, requires goes here
  _Rank0_mapping_base(_Rank0_mapping_base<_OtherExtents>);

  // All layout_type members go here.
};
template<typename _Extents>
struct _Rank1_mapping_base
{
  static_assert(_Extents::rank() == 1);
  // The static assert for the product is much simpler here, as we only
  // need to check one extent.

  template<typename _OtherExtents>
  // explicit, requires goes here
  _Rank1_mapping_base(_Rank1_mapping_base<_OtherExtents>);

  // The call operator can also be simplified; the conversion happens on
  // the user side.
  index_type operator()(index_type __i) const;

  // Constructor from a rank-1 strided layout goes here.
  // All layout_type members go here.
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0, and layout_left/right to use _Rank1_mapping_base
as a base for rank() == 1:
template<typename T, unsigned... Ids>
struct extents {};

struct layout
{
  template<typename _Extents>
  struct mapping
  {
    // Static assert that _Extents must be a specialization of extents
    // goes here.
  };
};
template<typename _IndexType>
struct layout::mapping<extents<_IndexType>>
: _Rank0_mapping_base<extents<_IndexType>>
{
  using layout_type = layout;
  // Provides converting constructors.
  using _Rank0_mapping_base<extents<_IndexType>>::_Rank0_mapping_base;
  // This one is implicit:
  mapping(_Rank0_mapping_base<extents<_IndexType>> const&);
};
template<typename _IndexType, unsigned _Ext>
struct layout::mapping<extents<_IndexType, _Ext>>
: _Rank1_mapping_base<extents<_IndexType, _Ext>>
{
  using layout_type = layout;
  // Provides converting constructors.
  using _Rank1_mapping_base<extents<_IndexType, _Ext>>::_Rank1_mapping_base;
  // This one is implicit, and allows construction from layout_right:
  mapping(_Rank1_mapping_base<extents<_IndexType, _Ext>> const&);
};
template<typename _IndexType, unsigned... _Ext>
requires (sizeof...(_Ext) >= 2)
struct layout::mapping<extents<_IndexType, _Ext...>>
The last one is the generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from the
standard text here.
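To make the shape of this design concrete, here is a minimal compilable sketch of the rank-1 base-class idea. `ToyExtents` and `Rank1MappingBase` are stand-ins I made up for illustration; the real code would use `extents` and the `_Rank1_mapping_base` sketched above:

```cpp
#include <cstddef>

// Toy stand-in for a rank-1 extents type, just enough for this sketch.
template<typename IndexType, IndexType Ext>
  struct ToyExtents
  {
    using index_type = IndexType;
    static constexpr std::size_t rank() { return 1; }
    constexpr index_type extent(std::size_t) const { return Ext; }
  };

// Shared rank-1 base (mirroring the hypothetical _Rank1_mapping_base):
// for rank 1, layout_left and layout_right coincide, so both mappings
// could derive from it and inherit its constructors.
template<typename Extents>
  struct Rank1MappingBase
  {
    static_assert(Extents::rank() == 1);
    using index_type = typename Extents::index_type;

    constexpr explicit
    Rank1MappingBase(const Extents& exts) noexcept
    : extents_(exts)
    { }

    // For rank 1 the mapping is the identity on the single index.
    constexpr index_type
    operator()(index_type i) const noexcept
    { return i; }

    constexpr const Extents&
    extents() const noexcept { return extents_; }

    Extents extents_;
  };
```

The derived `layout_left::mapping` / `layout_right::mapping` rank-1 specializations would then only add their `layout_type` alias and the implicit cross-layout converting constructor.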
On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz <luc.groshei...@gmail.com>
wrote:
Implements the parts of layout_left that don't depend on any of the
other layouts.
libstdc++/ChangeLog:
* include/std/mdspan (layout_left): New class.
Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com>
---
libstdc++-v3/include/std/mdspan | 179 ++++++++++++++++++++++++++++++++
1 file changed, 179 insertions(+)
diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 39ced1d6301..e05048a5b93 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
namespace __mdspan
{
+ template<typename _Extents>
+ constexpr typename _Extents::index_type
+ __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+ {
+ typename _Extents::index_type __fwd = 1;
+ for(size_t __i = 0; __i < __r; ++__i)
+ __fwd *= __exts.extent(__i);
+ return __fwd;
+ }
As we are inside the standard library implementation, we can do some tricks
here, and provide two functions:

// Returns std::span(_ExtentsStorage::_Ext).subspan(__f, __l - __f).
// For extents, forward to the _ExtentsStorage overload.
span<typename _Extents::index_type> __static_exts(size_t __f, size_t __l);

// Returns std::span(_ExtentsStorage::_M_dynamic_extents)
//   .subspan(_S_dynamic_index[__f], _S_dynamic_index[__l] - _S_dynamic_index[__f]).
span<typename _Extents::index_type> __dynamic_exts(const _Extents& __exts);

Then you can befriend these functions both to extents and to _ExtentsStorage.
Also add an index_type member to _ExtentsStorage.
Then, instead of having __fwd_prod and __rev_prod, I would have:

template<typename _Extents>
consteval size_t
__static_ext_prod(size_t __f, size_t __l)
{
  // Multiply the extents E != dynamic_extent from __static_exts.
}

template<typename _Extents>
constexpr size_t
__ext_prod(const _Extents& __exts, size_t __f, size_t __l)
{
  // Multiply __static_ext_prod<_Extents>(__f, __l) and each element of
  // __dynamic_exts(__exts, __f, __l).
}

Then __fwd_prod(__exts, __n) would be __ext_prod(__exts, 0, __n), and
__rev_prod(__exts, __n) would be
__ext_prod(__exts, __exts.rank() - __n, __exts.rank()).
+
+ template<typename _Extents>
+ constexpr typename _Extents::index_type
+ __rev_prod(const _Extents& __exts, size_t __r) noexcept
+ {
+ typename _Extents::index_type __rev = 1;
+ for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
+ __rev *= __exts.extent(__i);
+ return __rev;
+ }
+
template<typename _IndexType, size_t... _Counts>
auto __build_dextents_type(integer_sequence<size_t, _Counts...>)
-> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
@@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
explicit extents(_Integrals...) ->
extents<size_t, __mdspan::__dynamic_extent<_Integrals>()...>;
+ struct layout_left
+ {
+ template<typename _Extents>
+ class mapping;
+ };
+
+ namespace __mdspan
+ {
+ template<typename _Tp>
+ constexpr bool __is_extents = false;
+
+ template<typename _IndexType, size_t... _Extents>
+ constexpr bool __is_extents<extents<_IndexType, _Extents...>> = true;
+
+ template<size_t _Count>
+ struct _LinearIndexLeft
+ {
+ template<typename _Extents, typename... _Indices>
+ static constexpr typename _Extents::index_type
+ _S_value(const _Extents& __exts, typename _Extents::index_type __idx,
+ _Indices... __indices) noexcept
+ {
+ return __idx + __exts.extent(_Count)
+ * _LinearIndexLeft<_Count + 1>::_S_value(__exts, __indices...);
+ }
+
+ template<typename _Extents>
+ static constexpr typename _Extents::index_type
+ _S_value(const _Extents&) noexcept
+ { return 0; }
+ };
+
+ template<typename _Extents, typename... _Indices>
+ constexpr typename _Extents::index_type
+ __linear_index_left(const _Extents& __exts, _Indices... __indices)
+ {
+ return _LinearIndexLeft<0>::_S_value(__exts, __indices...);
+ }
This can be eliminated by fold expressions, see below.
+
+ template<typename _IndexType, typename _Tp, size_t _Nm>
+ consteval bool
+ __is_representable_product(array<_Tp, _Nm> __factors)
+ {
+ size_t __rest = numeric_limits<_IndexType>::max();
+ for(size_t __i = 0; __i < _Nm; ++__i)
+ {
+ if (__factors[__i] == 0)
+ return true;
+ __rest /= _IndexType(__factors[__i]);
+ }
+ return __rest > 0;
+ }
I would replace that with:

template<typename _IndexType, size_t _Nm>
consteval size_t
__div_remainder(span<const size_t, _Nm> __factors, size_t __val)
{
  size_t __rest = __val;
  for(size_t __i = 0; __i < _Nm; ++__i)
    {
      if (__factors[__i] == dynamic_extent)
        continue;
      if (__factors[__i] == 0)
        return __val;
      __rest /= _IndexType(__factors[__i]);
      if (__rest == 0)
        return 0;
    }
  return __rest;
}

We can express the representability check as:

static constexpr size_t __dyn_remainder
  = __div_remainder(__static_exts<_Extents>(0, rank()),
                    numeric_limits<index_type>::max());
static_assert(__dyn_remainder > 0);

Moreover, with the __dyn_remainder value, the precondition
https://eel.is/c++draft/mdspan.layout#left.cons-1
can be checked by doing the equivalent of __div_remainder for
__dynamic_exts, with __val being __dyn_remainder.
+
+ template<typename _Extents>
+ consteval array<typename _Extents::index_type, _Extents::rank()>
+ __static_extents_array()
+ {
+ array<typename _Extents::index_type, _Extents::rank()> __exts;
+ for(size_t __i = 0; __i < _Extents::rank(); ++__i)
+ __exts[__i] = _Extents::static_extent(__i);
+ return __exts;
+ }
Replaced by __static_exts accessor, as described above.
+
+ template<typename _Extents, typename _IndexType>
+ concept __representable_size = _Extents::rank_dynamic() != 0
+ || __is_representable_product<_IndexType>(
+ __static_extents_array<_Extents>());
+
+ template<typename _Extents>
+ concept __layout_extent = __representable_size<
+ _Extents, typename _Extents::index_type>;
+ }
+
+ template<typename _Extents>
+ class layout_left::mapping
+ {
+ static_assert(__mdspan::__layout_extent<_Extents>,
+ "The size of extents_type is not representable as index_type.");
+ public:
+ using extents_type = _Extents;
+ using index_type = typename extents_type::index_type;
+ using size_type = typename extents_type::size_type;
+ using rank_type = typename extents_type::rank_type;
+ using layout_type = layout_left;
+
+ constexpr
+ mapping() noexcept = default;
+
+ constexpr
+ mapping(const mapping&) noexcept = default;
+
+ constexpr
+ mapping(const extents_type& __extents) noexcept
+ : _M_extents(__extents)
+ {
+ { }
+
+ template<typename _OExtents>
+ requires (is_constructible_v<extents_type, _OExtents>)
+ constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+ mapping(const mapping<_OExtents>& __other) noexcept
+ : _M_extents(__other.extents())
+ {
Here we could do checks at compile time:
if constexpr (_OExtents::rank_dynamic() == 0)
  static_assert(__div_remainder(...) > 0);
+ }
+
+ constexpr mapping&
+ operator=(const mapping&) noexcept = default;
+
+ constexpr const extents_type&
+ extents() const noexcept { return _M_extents; }
+
+ constexpr index_type
+ required_span_size() const noexcept
+ { return __mdspan::__fwd_prod(_M_extents, _M_extents.rank()); }
+
+ template<__mdspan::__valid_index_type<index_type>... _Indices>
// Because we extracted the rank-0 and rank-1 specializations.
+ requires (sizeof...(_Indices) + 1 == extents_type::rank())
+ constexpr index_type
+ operator()(index_type __idx, _Indices... __indices) const noexcept
+ {
This could be implemented as follows (please synchronize the names):

if constexpr ((!is_same_v<_Indices, index_type> || ...))
  // Reduce the number of instantiations.
  return operator()(__idx,
                    static_cast<index_type>(std::move(__indices))...);
else
  {
    // This can be reused for layout_stride, if you start with __res = 0.
    index_type __res = __idx;
    index_type __mult = _M_extents.extent(0);
    auto __update = [&, __pos = 1u](index_type __i) mutable
    {
      __res += __i * __mult;
      __mult *= _M_extents.extent(__pos);
      ++__pos;
    };
    // Fold over invocations of the lambda.
    (__update(__indices), ...);
    return __res;
  }
The reason I chose the recursive formulation is that I'd like it to
implement `i + n*(j + m*k)`, and because the following code
size_t linear_index1(const std::extents<size_t, 3, 5, 7, 11>& exts,
size_t i, size_t j, size_t k, size_t l)
{
Layout::mapping<std::extents<size_t, 3, 5, 7, 11>> m(exts);
return m(i, j, k, l);
}
compiles to (with -O2):
0: 4a 8d 04 c1 lea rax,[rcx+r8*8]
4: 4c 29 c0 sub rax,r8
7: 48 8d 04 80 lea rax,[rax+rax*4]
b: 48 01 d0 add rax,rdx
e: 48 8d 04 40 lea rax,[rax+rax*2]
12: 48 01 f0 add rax,rsi
15: c3 ret
Whereas the suggested implementation using a lambda and folding
implements `i + n*j + n*m*k` and compiles to:
0: 4d 6b c0 69 imul r8,r8,0x69
4: 48 89 c8 mov rax,rcx
7: 48 c1 e0 04 shl rax,0x4
b: 48 29 c8 sub rax,rcx
e: 49 01 c0 add r8,rax
11: 48 8d 04 52 lea rax,[rdx+rdx*2]
15: 49 01 f0 add r8,rsi
18: 4c 01 c0 add rax,r8
1b: c3 ret
It's not obvious to me which is preferred: one doesn't use `imul`, while
the other could perhaps make better use of a super-scalar processor (for
example, the `imul` could overlap with the next three instructions).
Since this is (likely) a performance-critical part of mdspan, it makes
sense to discuss which formula we'd like to implement, and then pick
whichever dialect of C++ compiles into the preferred assembly.
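For reference, the two candidate formulas for the extents (3, 5, 7, 11) used in the example above; they are algebraically identical and differ only in the generated code:

```cpp
#include <cstddef>

// Nested (Horner) form: i + n*(j + m*(k + p*l)), as produced by the
// recursive implementation.
constexpr std::size_t
index_nested(std::size_t i, std::size_t j, std::size_t k, std::size_t l)
{ return i + 3 * (j + 5 * (k + 7 * l)); }

// Expanded form: i + n*j + (n*m)*k + (n*m*p)*l, as produced by the
// fold-over-lambda implementation.
constexpr std::size_t
index_expanded(std::size_t i, std::size_t j, std::size_t k, std::size_t l)
{ return i + 3 * j + (3 * 5) * k + (3 * 5 * 7) * l; }
```

The expanded form exposes more instruction-level parallelism (the partial products are independent), while the nested form minimizes the number of multiplications; that trade-off is exactly what the two assembly listings above show.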
I checked both implementations on something similar to:
double a = 0.0;
double b = 0.0;
double c = 0.0;
double d = 0.0;
for(size_t i = 0; i < n; i += 4)
{
a += values[m(i+0, j, k)];
b += values[m(i+1, j, k)];
c += values[m(i+2, j, k)];
d += values[m(i+3, j, k)];
}
double total = a + b + c + d;
and with both implementations, as far as I can tell, vectorization works
as expected. In particular, the compiler is perfectly able to see that
the stride in the inner most loop is 1.
I feel this is one of the places where, at a slightly later point, we'll
probably want to measure.
Would it make sense to postpone a detailed discussion of this until we've
settled the other parts of mdspan?
This could be even simpler and written as (usable for layout_stride):
size_t __pos = 0;
return (index_type(0) + ... + (__indices * stride(__pos++)));
Here, I prefer to avoid multiplying multiple times.
(This is already essentially how layout_stride implements it.)
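A standalone version of the fold-over-lambda formulation discussed above (hoisted out of the mapping class for illustration, with the run-time extents passed as a plain array):

```cpp
#include <array>
#include <cstddef>

// Computes the layout_left (column-major) linear index
//   idx0 + i1*e0 + i2*e0*e1 + ...
// by folding a mutable lambda over the index pack.
template<std::size_t Rank, typename... Indices>
  constexpr std::size_t
  linear_index_left(const std::array<std::size_t, Rank>& exts,
                    std::size_t idx0, Indices... indices)
  {
    std::size_t res = idx0;
    std::size_t mult = exts[0];
    auto update = [&res, &mult, &exts,
                   pos = std::size_t(1)](std::size_t idx) mutable
    {
      res += idx * mult;    // add i_r * (extent(0) * ... * extent(r-1))
      mult *= exts[pos];    // extend the running prefix product
      ++pos;
    };
    (update(indices), ...); // fold over the comma operator
    return res;
  }
```

Because the comma operator in a fold expression evaluates left to right, the updates are applied in index order, which is what makes the running prefix product correct.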
+ return __mdspan::__linear_index_left(
+ _M_extents, static_cast<index_type>(__indices)...);
+ }
+
+ static constexpr bool
+ is_always_unique() noexcept { return true; }
+
+ static constexpr bool
+ is_always_exhaustive() noexcept { return true; }
+
+ static constexpr bool
+ is_always_strided() noexcept { return true; }
+
+ static constexpr bool
+ is_unique() noexcept { return true; }
+
+ static constexpr bool
+ is_exhaustive() noexcept { return true; }
+
+ static constexpr bool
+ is_strided() noexcept { return true; }
+
+ constexpr index_type
+ stride(rank_type __i) const noexcept
+ requires (extents_type::rank() > 0)
+ {
+ __glibcxx_assert(__i < extents_type::rank());
+ return __mdspan::__fwd_prod(_M_extents, __i);
+ }
+
+ template<typename _OExtents>
+ requires (extents_type::rank() == _OExtents::rank())
+ friend constexpr bool
+ operator==(const mapping& __self, const mapping<_OExtents>& __other)
+ noexcept
+ { return __self.extents() == __other.extents(); }
+
+ private:
+ [[no_unique_address]] extents_type _M_extents;
+ };
+
_GLIBCXX_END_NAMESPACE_VERSION
}
#endif
--
2.49.0