On Tue, Aug 5, 2025 at 1:14 PM Luc Grosheintz <luc.groshei...@gmail.com> wrote:
> > > On 8/5/25 10:16, Tomasz Kaminski wrote: > > Hi, > > > > I have posted v3 patches with changes I have made locally for first 6 > > patches, and I think this series > > is ready to land, in addition to > > https://gcc.gnu.org/pipermail/libstdc++/2025-July/062727.html, that I > > already reviewed. > > I will keep aligned accessor on top of it. > > > > As the separate commit, we should update the __fwd_partial_prod and > > __rev_partial_prod computation > > to use fact that this is partial prod, and we have previous partial > > product computed: > > template<array _Extents> > > constexpr auto __fwd_partial_prods = [] consteval > > { > > constexpr size_t __rank = _Extents.size(); > > std::array<size_t, __rank + 1> __ret; > > __ret[0] = 1; > > for(size_t __r = 1; __r < __rank + 1; ++__r) > > if (_Extents[__r] != dynamic_extents) > > __ret[__r] = __ret[0] * _Extents[__r]; > > return __ret; > > }(); > > We are doing this at compile time, but this should help composition > speed. > > I would then inline __static_prod loop into the __mdspan::__size function > > and remove that function > > entirely. > > I checked both approaches, and it didn't affect the compile time in > the slightest, so I decided it wasn't worth it; and went with the > less error prone solution. OTOH, you're right, knowingly leaving in > something that's accidentally quadratic is begging for some obscure > problem to arise. Moreover, measuring these can be deceptive, because > they have a tendency to be fast enough, right up to the point where it > becomes prohibitively expensive. > Note that we are talking about the compile-time computation, that is done by an interpreter. It will never be costfull at runtime. And remove the unnecessary element from __fwd_partial_prod that should be the size of __rank instead of __rank + 1. > > If you want I can fix this up in a separate patch. > > > > > > > I would be interested if we could reduce the __fwd/rev_partial_prod > sizes, > > but not keep the outermost dimensions there, > > i.e. adding a check for __r == 0 early in the __fwd_prod funciton. > > constexpr size_t __rank = _Extents::rank(); > > constexpr auto& __sta_exts = __static_extents<_Extents>(); > > if constexpr (__rank == 1) > > return 1; > > // new if here, that is extracted from __rank == 2 > > else if (__r == 0) > > return 1; > > else if constexpr (__rank == 2) > > return _exts.extent(0); > > else if constexpr > > (__all_dynamic(std::span(__sta_exts).first(__rank-1))) > > return __extents_prod(__exts, 1, 0, __r); > > else > > { > > size_t __sta_prod = __fwd_partial_prods<__sta_exts>[__r-1]; > // > > size reduce by one here > > return __extents_prod(__exts, __sta_prod, 0, __r); > > } > > To me it's not obvious how this one will go. Intuitively, this adds a > comparison and branching to avoid loading 8 bytes. Compared to the other > size reductions, this has a lower upper limit as to how effective it's > going to be. > My idea here is that we will avoid storing 8 bytes per possible combination of extents values, to store a constant that always has a value of 1. This seems wasteful to me. So, I would go for it if: * produces the same code for rank == 0, as before * does not prevent using vectorization for all dynamic extents with, just with addition of single if. > I don't think I'll work on this, since I don't have the required test > cases to properly measure this to justify the change; and I don't trust > myself to make the right call without very exhaustive measurements. I > also don't have access to a suitably large variety of hardware. > > > > > Similarly for __rev_prod: > > constexpr size_t __rank = _Extents::rank(); > > constexpr auto& __sta_exts = __static_extents<_Extents>(); > > if constexpr (__rank == 1) > > return 1; > > else if (__r == __rank - 1) > > return 1; > > else if constexpr (__rank == 2) > > return __exts.extent(1); > > else if constexpr > > (__all_dynamic(std::span(__sta_exts).last(__rank-1))) > > return __extents_prod(__exts, 1, __r + 1, __rank); > > else > > { > > size_t __sta_prod = __rev_partial_prods<__sta_exts>[__r-1]; > // > > size reduced by one here. > > return __extents_prod(__exts, __sta_prod, __r + 1, __rank); > > } > > > > Regards, > > Tomasz > > > > > > On Mon, Aug 4, 2025 at 7:51 PM Luc Grosheintz <luc.groshei...@gmail.com> > > wrote: > > > >> > >> > >> On 8/4/25 17:42, Tomasz Kaminski wrote: > >>> On Mon, Aug 4, 2025 at 1:14 PM Tomasz Kaminski <tkami...@redhat.com> > >> wrote: > >>> > >>>> > >>>> > >>>> On Mon, Aug 4, 2025 at 1:08 PM Luc Grosheintz < > luc.groshei...@gmail.com > >>> > >>>> wrote: > >>>> > >>>>> Hi Tomasz, > >>>>> > >>>>> Thank you for the review! Sorry about the missing parens, even after > >>>>> "Ctrl+F"ing each of the emails I can't find the requires clause (the > >>>>> two I found both need the parens). > >>>>> > >>>> I was mentioning this one, but it is possible that I am wrong on this > >> one, > >>>> I haven't yet checked it, as I was planning to do it during full > review. > >>>> template<array _Extents> > >>>> requires (__all_dynamic<_Extents>()) > >>>> class _StaticExtents<_Extents>; > >>>> But regardless, I will make that change locally if it does work, so > you > >> do > >>>> not > >>>> need to update patch series for it. > >>>> > >>> Parenthesis seem to be indeed required here, so my comment was a red > >>> herring. > >> > >> I'm pretty sure that this is the reason why I wrongly concluded > >> that the pattern is `requires(cond)`, i.e. the parens are always > >> needed. > >> > >>> > >>>> > >>>>> Do you think it's possible to merge this series before the other two > >>>>> outstanding patch series? There's some risk of collision; making > >> changes > >>>>> to this patch series is much more time consuming than rebasing (and > >>>>> retesting the other two patch series). > >>>>> > >>>> This sounds very reasonable. I will prepare a stack of patches in your > >>>> suggested > >>>> order locally, and push them once they are approved. But it still make > >>>> > >>>>> > >>>>> Thank you, > >>>>> > >>>> Thank you for your continued contributions. > >>>> > >>>>> Luc > >>>>> > >>>>> On 8/4/25 12:43, Tomasz Kaminski wrote: > >>>>>> Hi, > >>>>>> > >>>>>> I this time I made a quick pass through all changes, before > commenting > >>>>> on > >>>>>> first commits, > >>>>>> and they look solid to me, and I haven't noticed anything I would > like > >>>>> to > >>>>>> change (except parentheses > >>>>>> around requires, but I will handle that locally). I will try to do a > >>>>> full > >>>>>> review during this week. > >>>>>> > >>>>>> Regards, > >>>>>> Tomasz > >>>>>> > >>>>>> On Sun, Aug 3, 2025 at 10:59 PM Luc Grosheintz < > >>>>> luc.groshei...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> The combined effect of this sequence of change is: > >>>>>>> > >>>>>>> * a reduction in the number of template instantiations, by > >>>>>>> - avoiding needless dependency of IndexType, > >>>>>>> - special formulas for low-rank extents, > >>>>>>> - special formulas for (nearly) fully dynamic extents. > >>>>>>> > >>>>>>> * improved code quality, by > >>>>>>> - precomputing partial products of the static extents, > >>>>>>> - special cases for low-rank extents, > >>>>>>> - rewriting the condition E[i] == dynamic_extent in a more > >>>>>>> optimizer friendly manner. > >>>>>>> - effectively loop-unrolling extents::operator==. > >>>>>>> > >>>>>>> While simplistic micro-benchmarking shows the effectiveness of > these > >>>>>>> changes, likely the stronger argument is presented in each commit: > >>>>>>> a) each change removes needless complexity, > >>>>>>> b) before/after examples of generated code show the > >> effectiveness. > >>>>>>> > >>>>>>> Luc Grosheintz (8): > >>>>>>> libstdc++: Reduce template instantiations in <mdspan>. > >>>>>>> libstdc++: Precompute products of static extents. > >>>>>>> libstdc++: Improve low-rank layout_{left,right}::stride. > >>>>>>> libstdc++: Improve fully dynamic extents in mdspan. > >>>>>>> libstdc++: Improve nearly fully dynamic extents in mdspan. > >>>>>>> libstdc++: Reduce indirection in extents::extent. > >>>>>>> libstdc++: Improve extents::operator==. > >>>>>>> libstdc++: Replace numeric_limit with __int_traits in mdspan. > >>>>>>> > >>>>>>> libstdc++-v3/include/std/mdspan | 282 > >>>>> +++++++++++++----- > >>>>>>> .../mdspan/extents/class_mandates_neg.cc | 3 + > >>>>>>> 2 files changed, 208 insertions(+), 77 deletions(-) > >>>>>>> > >>>>>>> -- > >>>>>>> 2.50.0 > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>> > >> > >> > > > >