I think I will have a few additional suggestions on how we could use it here, and I think it would be a separate commit on top of this series.
So, I will only add "Precondition" comments to this PR. On Mon, Aug 4, 2025 at 7:57 PM Luc Grosheintz <luc.groshei...@gmail.com> wrote: > > > On 8/4/25 17:26, Tomasz Kaminski wrote: > > On Sun, Aug 3, 2025 at 11:07 PM Luc Grosheintz <luc.groshei...@gmail.com > > > > wrote: > > > >> The methods layout_{left,right}::mapping::stride are defined > >> as > >> > >> \prod_{i = 0}^r E[i] > >> \prod_{i = r+1}^n E[i] > >> > >> This is computed as the product of a precomputed static product and the > >> product of the required dynamic extents. > >> > >> Disassembly shows that even for low-rank extents, i.e. rank == 1 and > >> rank == 2, with at least one dynamic extent, the generated code loads > >> two values; and then runs the loop over at most one element, e.g. for > >> stride_left_d5 defined below the generated code is: > >> > >> 220: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] > >> 227: 00 > >> 228: 31 d2 xor edx,edx > >> 22a: 48 85 c0 test rax,rax > >> 22d: 74 23 je 252 <stride_left_d5+0x32> > >> 22f: 48 8b 0c f5 00 00 00 mov rcx,QWORD PTR [rsi*8+0x0] > >> 236: 00 > >> 237: 48 c1 e1 02 shl rcx,0x2 > >> 23b: 74 13 je 250 <stride_left_d5+0x30> > >> 23d: 48 01 f9 add rcx,rdi > >> 240: 48 63 17 movsxd rdx,DWORD PTR [rdi] > >> 243: 48 83 c7 04 add rdi,0x4 > >> 247: 48 0f af c2 imul rax,rdx > >> 24b: 48 39 f9 cmp rcx,rdi > >> 24e: 75 f0 jne 240 <stride_left_d5+0x20> > >> 250: 89 c2 mov edx,eax > >> 252: 89 d0 mov eax,edx > >> 254: c3 ret > >> > >> If there's no dynamic extents, it simply loads the precomputed product > >> of static extents. > >> > >> For rank == 1 the answer is the constant `1`; for rank == 2 it's either > 1 > >> or > >> extents.extent(k), with k == 0 for layout_left and k == 1 for > >> layout_right. > >> > >> Consider, > >> > >> using Ed = std::extents<int, dyn>; > >> int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r) > >> { return m.stride(r); } > >> > >> using E3d = std::extents<int, 3, dyn>; > >> int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t r) > >> { return m.stride(r); } > >> > >> using Ed5 = std::extents<int, dyn, 5>; > >> int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t r) > >> { return m.stride(r); } > >> > >> The optimized code for these three cases is: > >> > >> 0000000000000060 <stride_left_d>: > >> 60: b8 01 00 00 00 mov eax,0x1 > >> 65: c3 ret > >> > >> 0000000000000090 <stride_left_3d>: > >> 90: 48 83 fe 01 cmp rsi,0x1 > >> 94: 19 c0 sbb eax,eax > >> 96: 83 e0 fe and eax,0xfffffffe > >> 99: 83 c0 03 add eax,0x3 > >> 9c: c3 ret > >> > >> 00000000000000a0 <stride_left_d5>: > >> a0: b8 01 00 00 00 mov eax,0x1 > >> a5: 48 85 f6 test rsi,rsi > >> a8: 74 02 je ac <stride_left_d5+0xc> > >> aa: 8b 07 mov eax,DWORD PTR [rdi] > >> ac: c3 ret > >> > >> For rank == 1 it simply returns 1 (as expected). For rank == 2, it > >> either implements a branchless formula, or conditionally loads one > >> value. In all cases involving a dynamic extent this seems like it's > >> always doing clearly less work, both in terms of computation and loads. > >> In cases not involving a dynamic extent, it replaces loading one value > >> with a branchless sequence of four instructions. > >> > >> This commit also refactors __size to no use any of the precomputed > >> arrays. This prevents instantiating __{fwd,rev}_partial_prods for > >> low-rank extents. This results in a further size reduction of a > >> reference object file (described two commits prior) by 9% from 46.0kB to > >> 41.9kB. > >> > >> In a prior commit we optimized __size to produce better object code by > >> precomputing the static products. This refactor enables the optimizer to > >> generate the same optimized code. > >> > >> libstdc++-v3/ChangeLog: > >> > >> * include/std/mdspan (__mdspan::__fwd_prod): Optimize > >> for rank <= 2. > >> (__mdspan::__rev_prod): Ditto. > >> (__mdspan::__size): Refactor to use a pre-computed product, not > >> a partial product. > >> > >> Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com> > >> --- > > > > LGTM, with two small suggestions that I can to implement locally, > > if you agree with them: > > * add precondition comment in form: > > // Preconditions: _r < _Extents::rank() > > * reduce size of __fwd_partial_prods to rank-1. > > // Pre-compute: \prod_{i = 0}^r _Extents[i], for r = 0,..., n > > (exclusive) > > template<array _Extents> > > constexpr auto __fwd_partial_prods = [] consteval > > { > > constexpr size_t __rank = _Extents.size(); > > std::array<size_t, __rank> __ret; > > for(size_t __r = 0; __r < __rank; ++__r) > > __ret[__r] = __static_prod<_Extents>(0, __r); > > return __ret; > > }(); > > We can also do the second change separately. > > Very annoying! I'd fixed that already; but I see it got reverted > somehow. There's a lot of assembly in the commit messages; and > object file sizes. (This change could absolute affect the unimportant > details.) > > Let's see first see if you find more issues. If there's nothing > else we can decide which option we prefer. > > > > > > > > > libstdc++-v3/include/std/mdspan | 32 ++++++++++++++++++++++++++------ > >> 1 file changed, 26 insertions(+), 6 deletions(-) > >> > >> diff --git a/libstdc++-v3/include/std/mdspan > >> b/libstdc++-v3/include/std/mdspan > >> index dc1b44ee35c..11a5063f60d 100644 > >> --- a/libstdc++-v3/include/std/mdspan > >> +++ b/libstdc++-v3/include/std/mdspan > >> @@ -423,25 +423,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> constexpr typename _Extents::index_type > >> __fwd_prod(const _Extents& __exts, size_t __r) noexcept > >> { > >> + constexpr size_t __rank = _Extents::rank(); > >> constexpr auto __sta_exts = __static_extents<_Extents>(); > >> - size_t __sta_prod = __fwd_partial_prods<__sta_exts>[__r]; > >> - return __extents_prod(__exts, __sta_prod, 0, __r); > >> + if constexpr (__rank == 1) > >> + return 1; > >> + else if constexpr (__rank == 2) > >> + return __r == 0 ? 1 : __exts.extent(0); > >> + else > >> + { > >> + size_t __sta_prod = __fwd_partial_prods<__sta_exts>[__r]; > >> + return __extents_prod(__exts, __sta_prod, 0, __r); > >> + } > >> } > >> > >> template<typename _Extents> > >> constexpr typename _Extents::index_type > >> __rev_prod(const _Extents& __exts, size_t __r) noexcept > >> { > >> - constexpr auto __sta_exts = __static_extents<_Extents>(); > >> constexpr size_t __rank = _Extents::rank(); > >> - size_t __sta_prod = __rev_partial_prods<__sta_exts>[__r]; > >> - return __extents_prod(__exts, __sta_prod, __r + 1, __rank); > >> + constexpr auto __sta_exts = __static_extents<_Extents>(); > >> + if constexpr (__rank == 1) > >> + return 1; > >> + else if constexpr (__rank == 2) > >> + return __r == 0 ? __exts.extent(1) : 1; > >> + else > >> + { > >> + size_t __sta_prod = __rev_partial_prods<__sta_exts>[__r]; > >> + return __extents_prod(__exts, __sta_prod, __r + 1, __rank); > >> + } > >> } > >> > >> template<typename _Extents> > >> constexpr typename _Extents::index_type > >> __size(const _Extents& __exts) noexcept > >> - { return __fwd_prod(__exts, __exts.rank()); } > >> + { > >> + constexpr auto __sta_exts = __static_extents<_Extents>(); > >> + constexpr size_t __rank = _Extents::rank(); > >> + constexpr size_t __sta_prod = __static_prod<__sta_exts>(0, > __rank); > >> + return __extents_prod(__exts, __sta_prod, 0, __rank); > >> + } > >> > >> template<typename _IndexType, size_t... _Counts> > >> auto __build_dextents_type(integer_sequence<size_t, _Counts...>) > >> -- > >> 2.50.0 > >> > >> > > > >