This chain discusses changes to `mapping::operator()`. For concrete
discussion, see below. I have a general question: is there a reason
other than style to prefer folds over recursion?
On 4/30/25 7:13 AM, Tomasz Kaminski wrote:
Hi,
As we will be landing patches for extents, this will become a separate
patch series.
I would prefer if you could commit per layout, starting with layout_right
(the default).
I try to provide prompt responses, so if that works better for you, you
can post a patch with only this layout first, as most of the comments
will apply to all of them.
For the general design, we have constructors that allow conversion between
the rank-0 and rank-1 layout_left and layout_right mappings. This is done
because they essentially represent the same layout. I think we could
benefit from that in code by having base classes for the rank-0 and
rank-1 mappings:
template<typename _Extents>
struct _Rank0_mapping_base
{
  static_assert(_Extents::rank() == 0);

  template<typename _OtherExtents>
  // explicit, requires goes here
  _Rank0_mapping_base(_Rank0_mapping_base<_OtherExtents>);

  // All layout_type members go here.
};
template<typename _Extents>
struct _Rank1_mapping_base
{
  static_assert(_Extents::rank() == 1);
  // The static assert for the product is much simpler here, as we only
  // need to check one extent.

  template<typename _OtherExtents>
  // explicit, requires goes here
  _Rank1_mapping_base(_Rank1_mapping_base<_OtherExtents>);

  // The call operator can also be simplified; the conversion happens on
  // the user side.
  index_type operator()(index_type __i) const;

  // Constructor from a rank-1 strided layout goes here.
  // All layout_type members go here.
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0, and layout_left/right to use _Rank1_mapping_base
as a base for rank() == 1:
template<typename T, unsigned... Ids>
struct extents {};

struct layout
{
  template<typename _Extents>
  struct mapping
  {
    // Static assert that _Extents must be a specialization of extents
    // goes here.
  };
};
template<typename _IndexType>
struct layout::mapping<extents<_IndexType>>
: _Rank0_mapping_base<extents<_IndexType>>
{
  using layout_type = layout;
  // Provides converting constructors.
  using _Rank0_mapping_base<extents<_IndexType>>::_Rank0_mapping_base;
  // This one is implicit:
  mapping(_Rank0_mapping_base<extents<_IndexType>> const&);
};
template<typename _IndexType, unsigned _Ext>
struct layout::mapping<extents<_IndexType, _Ext>>
: _Rank1_mapping_base<extents<_IndexType, _Ext>>
{
  using layout_type = layout;
  // Provides converting constructors.
  using _Rank1_mapping_base<extents<_IndexType, _Ext>>::_Rank1_mapping_base;
  // This one is implicit, and allows construction from layout_right:
  mapping(_Rank1_mapping_base<extents<_IndexType, _Ext>> const&);
};
template<typename _IndexType, unsigned... _Ext>
requires (sizeof...(_Ext) >= 2)
struct layout::mapping<extents<_IndexType, _Ext...>>
The last one is the generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from the
standard text here.
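To make the shape of this design concrete, here is a minimal compilable sketch of the rank-1 base-class idea. `ToyExtents` and `Rank1MappingBase` are stand-ins I made up for illustration; the real code would use `extents` and the `_Rank1_mapping_base` sketched above:

```cpp
#include <cstddef>

// Toy stand-in for a rank-1 extents type, just enough for this sketch.
template<typename IndexType, IndexType Ext>
  struct ToyExtents
  {
    using index_type = IndexType;
    static constexpr std::size_t rank() { return 1; }
    constexpr index_type extent(std::size_t) const { return Ext; }
  };

// Shared rank-1 base (mirroring the hypothetical _Rank1_mapping_base):
// for rank 1, layout_left and layout_right coincide, so both mappings
// could derive from it and inherit its constructors.
template<typename Extents>
  struct Rank1MappingBase
  {
    static_assert(Extents::rank() == 1);
    using index_type = typename Extents::index_type;

    constexpr explicit
    Rank1MappingBase(const Extents& exts) noexcept
    : extents_(exts)
    { }

    // For rank 1 the mapping is the identity on the single index.
    constexpr index_type
    operator()(index_type i) const noexcept
    { return i; }

    constexpr const Extents&
    extents() const noexcept { return extents_; }

    Extents extents_;
  };
```

The derived `layout_left::mapping` / `layout_right::mapping` rank-1 specializations would then only add their `layout_type` alias and the implicit cross-layout converting constructor.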
On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz <luc.groshei...@gmail.com>
wrote:
Implements the parts of layout_left that don't depend on any of the
other layouts.
libstdc++/ChangeLog:
* include/std/mdspan (layout_left): New class.
Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com>
---
libstdc++-v3/include/std/mdspan | 179 ++++++++++++++++++++++++++++++++
1 file changed, 179 insertions(+)
diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 39ced1d6301..e05048a5b93 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
namespace __mdspan
{
+ template<typename _Extents>
+ constexpr typename _Extents::index_type
+ __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+ {
+ typename _Extents::index_type __fwd = 1;
+ for(size_t __i = 0; __i < __r; ++__i)
+ __fwd *= __exts.extent(__i);
+ return __fwd;
+ }
As we are inside the standard library implementation, we can do some tricks
here, and provide two functions:

// Returns std::span(_ExtentsStorage::_Ext).subspan(__f, __l - __f).
// For extents, forward to the _ExtentsStorage overload.
span<typename _Extents::index_type> __static_exts(size_t __f, size_t __l);

// Returns std::span(_ExtentsStorage::_M_dynamic_extents)
//   .subspan(_S_dynamic_index[__f], _S_dynamic_index[__l] - _S_dynamic_index[__f]).
span<typename _Extents::index_type> __dynamic_exts(const _Extents& __exts);

Then you can befriend these functions both to extents and to _ExtentsStorage.
Also add an index_type member to _ExtentsStorage.
Then, instead of having __fwd_prod and __rev_prod, I would have:

template<typename _Extents>
consteval size_t
__static_ext_prod(size_t __f, size_t __l)
{
  // Multiply the extents E != dynamic_extent from __static_exts.
}

template<typename _Extents>
constexpr size_t
__ext_prod(const _Extents& __exts, size_t __f, size_t __l)
{
  // Multiply __static_ext_prod<_Extents>(__f, __l) and each element of
  // __dynamic_exts(__exts, __f, __l).
}

Then __fwd_prod(__exts, __n) would be __ext_prod(__exts, 0, __n), and
__rev_prod(__exts, __n) would be
__ext_prod(__exts, __exts.rank() - __n, __exts.rank()).
+
+ template<typename _Extents>
+ constexpr typename _Extents::index_type
+ __rev_prod(const _Extents& __exts, size_t __r) noexcept
+ {
+ typename _Extents::index_type __rev = 1;
+ for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
+ __rev *= __exts.extent(__i);
+ return __rev;
+ }
+
template<typename _IndexType, size_t... _Counts>
auto __build_dextents_type(integer_sequence<size_t, _Counts...>)
-> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
@@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
explicit extents(_Integrals...) ->
extents<size_t, __mdspan::__dynamic_extent<_Integrals>()...>;
+ struct layout_left
+ {
+ template<typename _Extents>
+ class mapping;
+ };
+
+ namespace __mdspan
+ {
+ template<typename _Tp>
+ constexpr bool __is_extents = false;
+
+ template<typename _IndexType, size_t... _Extents>
+ constexpr bool __is_extents<extents<_IndexType, _Extents...>> = true;
+
+ template<size_t _Count>
+ struct _LinearIndexLeft
+ {
+ template<typename _Extents, typename... _Indices>
+ static constexpr typename _Extents::index_type
+ _S_value(const _Extents& __exts, typename _Extents::index_type __idx,
+ _Indices... __indices) noexcept
+ {
+ return __idx + __exts.extent(_Count)
+ * _LinearIndexLeft<_Count + 1>::_S_value(__exts, __indices...);
+ }
+
+ template<typename _Extents>
+ static constexpr typename _Extents::index_type
+ _S_value(const _Extents&) noexcept
+ { return 0; }
+ };
+
+ template<typename _Extents, typename... _Indices>
+ constexpr typename _Extents::index_type
+ __linear_index_left(const _Extents& __exts, _Indices... __indices)
+ {
+ return _LinearIndexLeft<0>::_S_value(__exts, __indices...);
+ }
This can be eliminated by fold expressions, see below.
+
+ template<typename _IndexType, typename _Tp, size_t _Nm>
+ consteval bool
+ __is_representable_product(array<_Tp, _Nm> __factors)
+ {
+ size_t __rest = numeric_limits<_IndexType>::max();
+ for(size_t __i = 0; __i < _Nm; ++__i)
+ {
+ if (__factors[__i] == 0)
+ return true;
+ __rest /= _IndexType(__factors[__i]);
+ }
+ return __rest > 0;
+ }
I would replace that with:

template<typename _IndexType, size_t _Nm>
consteval size_t
__div_remainder(span<const size_t, _Nm> __factors, size_t __val)
{
  size_t __rest = __val;
  for(size_t __i = 0; __i < _Nm; ++__i)
    {
      if (__factors[__i] == dynamic_extent)
        continue;
      if (__factors[__i] == 0)
        return __val;
      __rest /= _IndexType(__factors[__i]);
      if (__rest == 0)
        return 0;
    }
  return __rest;
}

We can express the representability check as:

static constexpr size_t __dyn_remainder
  = __div_remainder(__static_exts<_Extents>(0, rank()),
                    numeric_limits<index_type>::max());
static_assert(__dyn_remainder > 0);

Moreover, with the __dyn_remainder value, the precondition
https://eel.is/c++draft/mdspan.layout#left.cons-1
can be checked by doing the equivalent of __div_remainder for
__dynamic_exts, with __val being __dyn_remainder.
+
+ template<typename _Extents>
+ consteval array<typename _Extents::index_type, _Extents::rank()>
+ __static_extents_array()
+ {
+ array<typename _Extents::index_type, _Extents::rank()> __exts;
+ for(size_t __i = 0; __i < _Extents::rank(); ++__i)
+ __exts[__i] = _Extents::static_extent(__i);
+ return __exts;
+ }
Replaced by __static_exts accessor, as described above.
+
+ template<typename _Extents, typename _IndexType>
+ concept __representable_size = _Extents::rank_dynamic() != 0
+ || __is_representable_product<_IndexType>(
+ __static_extents_array<_Extents>());
+
+ template<typename _Extents>
+ concept __layout_extent = __representable_size<
+ _Extents, typename _Extents::index_type>;
+ }
+
+ template<typename _Extents>
+ class layout_left::mapping
+ {
+ static_assert(__mdspan::__layout_extent<_Extents>,
+ "The size of extents_type is not representable as index_type.");
+ public:
+ using extents_type = _Extents;
+ using index_type = typename extents_type::index_type;
+ using size_type = typename extents_type::size_type;
+ using rank_type = typename extents_type::rank_type;
+ using layout_type = layout_left;
+
+ constexpr
+ mapping() noexcept = default;
+
+ constexpr
+ mapping(const mapping&) noexcept = default;
+
+ constexpr
+ mapping(const extents_type& __extents) noexcept
+ : _M_extents(__extents)
+ {
+ { }
+
+ template<typename _OExtents>
+ requires (is_constructible_v<extents_type, _OExtents>)
+ constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+ mapping(const mapping<_OExtents>& __other) noexcept
+ : _M_extents(__other.extents())
+ {
Here we could do checks at compile time:
if constexpr (_OExtents::rank_dynamic() == 0)
  static_assert(__div_remainder(...) > 0);
+ }
+
+ constexpr mapping&
+ operator=(const mapping&) noexcept = default;
+
+ constexpr const extents_type&
+ extents() const noexcept { return _M_extents; }
+
+ constexpr index_type
+ required_span_size() const noexcept
+ { return __mdspan::__fwd_prod(_M_extents, _M_extents.rank()); }
+
+ template<__mdspan::__valid_index_type<index_type>... _Indices>
// Because we extracted the rank-0 and rank-1 specializations.
+ requires (sizeof...(_Indices) + 1 == extents_type::rank())
+ constexpr index_type
+ operator()(index_type __idx, _Indices... __indices) const noexcept
+ {
This could be implemented as follows (please synchronize the names):

if constexpr ((!is_same_v<_Indices, index_type> || ...))
  // Reduce the number of instantiations.
  return operator()(__idx,
                    static_cast<index_type>(std::move(__indices))...);
else
  {
    // This can be reused for layout_stride, if you start with __res = 0.
    index_type __res = __idx;
    index_type __mult = _M_extents.extent(0);
    auto __update = [&, __pos = 1u](index_type __i) mutable
    {
      __res += __i * __mult;
      __mult *= _M_extents.extent(__pos);
      ++__pos;
    };
    // Fold over invocations of the lambda.
    (__update(__indices), ...);
    return __res;
  }
The reason I chose the recursive formulation is that I'd like it to
implement `i + n*(j + m*k)`, and because the following code
size_t linear_index1(const std::extents<size_t, 3, 5, 7, 11>& exts,
size_t i, size_t j, size_t k, size_t l)
{
Layout::mapping<std::extents<size_t, 3, 5, 7, 11>> m(exts);
return m(i, j, k, l);
}
compiles to (with -O2):
0: 4a 8d 04 c1 lea rax,[rcx+r8*8]
4: 4c 29 c0 sub rax,r8
7: 48 8d 04 80 lea rax,[rax+rax*4]
b: 48 01 d0 add rax,rdx
e: 48 8d 04 40 lea rax,[rax+rax*2]
12: 48 01 f0 add rax,rsi
15: c3 ret
Whereas the suggested implementation using a lambda and folding
implements `i + n*j + n*m*k` and compiles to:
0: 4d 6b c0 69 imul r8,r8,0x69
4: 48 89 c8 mov rax,rcx
7: 48 c1 e0 04 shl rax,0x4
b: 48 29 c8 sub rax,rcx
e: 49 01 c0 add r8,rax
11: 48 8d 04 52 lea rax,[rdx+rdx*2]
15: 49 01 f0 add r8,rsi
18: 4c 01 c0 add rax,r8
1b: c3 ret
It's not obvious to me which is preferred: one doesn't use `imul`, while
the other could perhaps make better use of a super-scalar processor (for
example, the `imul` could overlap with the next three instructions).
Since this is (likely) a performance-critical part of mdspan, it makes
sense to discuss which formula we'd like to implement, and then pick
whichever dialect of C++ compiles into the preferred assembly.
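For reference, the two candidate formulas for the extents (3, 5, 7, 11) used in the example above; they are algebraically identical and differ only in the generated code:

```cpp
#include <cstddef>

// Nested (Horner) form: i + n*(j + m*(k + p*l)), as produced by the
// recursive implementation.
constexpr std::size_t
index_nested(std::size_t i, std::size_t j, std::size_t k, std::size_t l)
{ return i + 3 * (j + 5 * (k + 7 * l)); }

// Expanded form: i + n*j + (n*m)*k + (n*m*p)*l, as produced by the
// fold-over-lambda implementation.
constexpr std::size_t
index_expanded(std::size_t i, std::size_t j, std::size_t k, std::size_t l)
{ return i + 3 * j + (3 * 5) * k + (3 * 5 * 7) * l; }
```

The expanded form exposes more instruction-level parallelism (the partial products are independent), while the nested form minimizes the number of multiplications; that trade-off is exactly what the two assembly listings above show.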
I checked both implementations on something similar to:
double a = 0.0;
double b = 0.0;
double c = 0.0;
double d = 0.0;
for(size_t i = 0; i < n; i += 4)
{
a += values[m(i+0, j, k)];
b += values[m(i+1, j, k)];
c += values[m(i+2, j, k)];
d += values[m(i+3, j, k)];
}
double total = a + b + c + d;
and with both implementations, as far as I can tell, vectorization works
as expected. In particular, the compiler is perfectly able to see that
the stride in the inner most loop is 1.
I feel this is one of the places where, at a slightly later point, we'll
probably want to measure.
Would it make sense to postpone a detailed discussion of this until we've
settled the other parts of mdspan?
This could be even simpler and written as (usable for layout_stride):
size_t __pos = 0;
return (index_type(0) + ... + (__indices * stride(__pos++)));
Here, I prefer to avoid multiplying multiple times.
(This is already essentially how layout_stride implements it.)
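A standalone version of the fold-over-lambda formulation discussed above (hoisted out of the mapping class for illustration, with the run-time extents passed as a plain array):

```cpp
#include <array>
#include <cstddef>

// Computes the layout_left (column-major) linear index
//   idx0 + i1*e0 + i2*e0*e1 + ...
// by folding a mutable lambda over the index pack.
template<std::size_t Rank, typename... Indices>
  constexpr std::size_t
  linear_index_left(const std::array<std::size_t, Rank>& exts,
                    std::size_t idx0, Indices... indices)
  {
    std::size_t res = idx0;
    std::size_t mult = exts[0];
    auto update = [&res, &mult, &exts,
                   pos = std::size_t(1)](std::size_t idx) mutable
    {
      res += idx * mult;    // add i_r * (extent(0) * ... * extent(r-1))
      mult *= exts[pos];    // extend the running prefix product
      ++pos;
    };
    (update(indices), ...); // fold over the comma operator
    return res;
  }
```

Because the comma operator in a fold expression evaluates left to right, the updates are applied in index order, which is what makes the running prefix product correct.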
+ return __mdspan::__linear_index_left(
+ _M_extents, static_cast<index_type>(__indices)...);
+ }
+
+ static constexpr bool
+ is_always_unique() noexcept { return true; }
+
+ static constexpr bool
+ is_always_exhaustive() noexcept { return true; }
+
+ static constexpr bool
+ is_always_strided() noexcept { return true; }
+
+ static constexpr bool
+ is_unique() noexcept { return true; }
+
+ static constexpr bool
+ is_exhaustive() noexcept { return true; }
+
+ static constexpr bool
+ is_strided() noexcept { return true; }
+
+ constexpr index_type
+ stride(rank_type __i) const noexcept
+ requires (extents_type::rank() > 0)
+ {
+ __glibcxx_assert(__i < extents_type::rank());
+ return __mdspan::__fwd_prod(_M_extents, __i);
+ }
+
+ template<typename _OExtents>
+ requires (extents_type::rank() == _OExtents::rank())
+ friend constexpr bool
+ operator==(const mapping& __self, const mapping<_OExtents>& __other)
+ noexcept
+ { return __self.extents() == __other.extents(); }
+
+ private:
+ [[no_unique_address]] extents_type _M_extents;
+ };
+
_GLIBCXX_END_NAMESPACE_VERSION
}
#endif
--
2.49.0