Hi, I have posted v3 patches with changes I have made locally for first 6 patches, and I think this series is ready to land, in addition to https://gcc.gnu.org/pipermail/libstdc++/2025-July/062727.html, that I already reviewed. I will keep aligned accessor on top of it.
As the separate commit, we should update the __fwd_partial_prod and __rev_partial_prod computation to use fact that this is partial prod, and we have previous partial product computed: template<array _Extents> constexpr auto __fwd_partial_prods = [] consteval { constexpr size_t __rank = _Extents.size(); std::array<size_t, __rank + 1> __ret; __ret[0] = 1; for(size_t __r = 1; __r < __rank + 1; ++__r) if (_Extents[__r] != dynamic_extents) __ret[__r] = __ret[0] * _Extents[__r]; return __ret; }(); We are doing this at compile time, but this should help composition speed. I would then inline __static_prod loop into the __mdspan::__size function and remove that function entirely. I would be interested if we could reduce the __fwd/rev_partial_prod sizes, but not keep the outermost dimensions there, i.e. adding a check for __r == 0 early in the __fwd_prod funciton. constexpr size_t __rank = _Extents::rank(); constexpr auto& __sta_exts = __static_extents<_Extents>(); if constexpr (__rank == 1) return 1; // new if here, that is extracted from __rank == 2 else if (__r == 0) return 1; else if constexpr (__rank == 2) return _exts.extent(0); else if constexpr (__all_dynamic(std::span(__sta_exts).first(__rank-1))) return __extents_prod(__exts, 1, 0, __r); else { size_t __sta_prod = __fwd_partial_prods<__sta_exts>[__r-1]; // size reduce by one here return __extents_prod(__exts, __sta_prod, 0, __r); } Similarly for __rev_prod: constexpr size_t __rank = _Extents::rank(); constexpr auto& __sta_exts = __static_extents<_Extents>(); if constexpr (__rank == 1) return 1; else if (__r == __rank - 1) return 1; else if constexpr (__rank == 2) return __exts.extent(1); else if constexpr (__all_dynamic(std::span(__sta_exts).last(__rank-1))) return __extents_prod(__exts, 1, __r + 1, __rank); else { size_t __sta_prod = __rev_partial_prods<__sta_exts>[__r-1]; // size reduced by one here. return __extents_prod(__exts, __sta_prod, __r + 1, __rank); } Regards, Tomasz On Mon, Aug 4, 2025 at 7:51 PM Luc Grosheintz <luc.groshei...@gmail.com> wrote: > > > On 8/4/25 17:42, Tomasz Kaminski wrote: > > On Mon, Aug 4, 2025 at 1:14 PM Tomasz Kaminski <tkami...@redhat.com> > wrote: > > > >> > >> > >> On Mon, Aug 4, 2025 at 1:08 PM Luc Grosheintz <luc.groshei...@gmail.com > > > >> wrote: > >> > >>> Hi Tomasz, > >>> > >>> Thank you for the review! Sorry about the missing parens, even after > >>> "Ctrl+F"ing each of the emails I can't find the requires clause (the > >>> two I found both need the parens). > >>> > >> I was mentioning this one, but it is possible that I am wrong on this > one, > >> I haven't yet checked it, as I was planning to do it during full review. > >> template<array _Extents> > >> requires (__all_dynamic<_Extents>()) > >> class _StaticExtents<_Extents>; > >> But regardless, I will make that change locally if it does work, so you > do > >> not > >> need to update patch series for it. > >> > > Parenthesis seem to be indeed required here, so my comment was a red > > herring. > > I'm pretty sure that this is the reason why I wrongly concluded > that the pattern is `requires(cond)`, i.e. the parens are always > needed. > > > > >> > >>> Do you think it's possible to merge this series before the other two > >>> outstanding patch series? There's some risk of collision; making > changes > >>> to this patch series is much more time consuming than rebasing (and > >>> retesting the other two patch series). > >>> > >> This sounds very reasonable. I will prepare a stack of patches in your > >> suggested > >> order locally, and push them once they are approved. But it still make > >> > >>> > >>> Thank you, > >>> > >> Thank you for your continued contributions. > >> > >>> Luc > >>> > >>> On 8/4/25 12:43, Tomasz Kaminski wrote: > >>>> Hi, > >>>> > >>>> I this time I made a quick pass through all changes, before commenting > >>> on > >>>> first commits, > >>>> and they look solid to me, and I haven't noticed anything I would like > >>> to > >>>> change (except parentheses > >>>> around requires, but I will handle that locally). I will try to do a > >>> full > >>>> review during this week. > >>>> > >>>> Regards, > >>>> Tomasz > >>>> > >>>> On Sun, Aug 3, 2025 at 10:59 PM Luc Grosheintz < > >>> luc.groshei...@gmail.com> > >>>> wrote: > >>>> > >>>>> The combined effect of this sequence of change is: > >>>>> > >>>>> * a reduction in the number of template instantiations, by > >>>>> - avoiding needless dependency of IndexType, > >>>>> - special formulas for low-rank extents, > >>>>> - special formulas for (nearly) fully dynamic extents. > >>>>> > >>>>> * improved code quality, by > >>>>> - precomputing partial products of the static extents, > >>>>> - special cases for low-rank extents, > >>>>> - rewriting the condition E[i] == dynamic_extent in a more > >>>>> optimizer friendly manner. > >>>>> - effectively loop-unrolling extents::operator==. > >>>>> > >>>>> While simplistic micro-benchmarking shows the effectiveness of these > >>>>> changes, likely the stronger argument is presented in each commit: > >>>>> a) each change removes needless complexity, > >>>>> b) before/after examples of generated code show the > effectiveness. > >>>>> > >>>>> Luc Grosheintz (8): > >>>>> libstdc++: Reduce template instantiations in <mdspan>. > >>>>> libstdc++: Precompute products of static extents. > >>>>> libstdc++: Improve low-rank layout_{left,right}::stride. > >>>>> libstdc++: Improve fully dynamic extents in mdspan. > >>>>> libstdc++: Improve nearly fully dynamic extents in mdspan. > >>>>> libstdc++: Reduce indirection in extents::extent. > >>>>> libstdc++: Improve extents::operator==. > >>>>> libstdc++: Replace numeric_limit with __int_traits in mdspan. > >>>>> > >>>>> libstdc++-v3/include/std/mdspan | 282 > >>> +++++++++++++----- > >>>>> .../mdspan/extents/class_mandates_neg.cc | 3 + > >>>>> 2 files changed, 208 insertions(+), 77 deletions(-) > >>>>> > >>>>> -- > >>>>> 2.50.0 > >>>>> > >>>>> > >>>> > >>> > >>> > > > >