On Tue, Aug 5, 2025 at 2:14 PM Luc Grosheintz <luc.groshei...@gmail.com> wrote:
> > > On 8/5/25 13:25, Tomasz Kaminski wrote: > > On Tue, Aug 5, 2025 at 1:14 PM Luc Grosheintz <luc.groshei...@gmail.com> > > wrote: > > > >> > >> > >> On 8/5/25 10:16, Tomasz Kaminski wrote: > >>> Hi, > >>> > >>> I have posted v3 patches with changes I have made locally for first 6 > >>> patches, and I think this series > >>> is ready to land, in addition to > >>> https://gcc.gnu.org/pipermail/libstdc++/2025-July/062727.html, that I > >>> already reviewed. > >>> I will keep aligned accessor on top of it. > >>> > >>> As the separate commit, we should update the __fwd_partial_prod and > >>> __rev_partial_prod computation > >>> to use fact that this is partial prod, and we have previous partial > >>> product computed: > >>> template<array _Extents> > >>> constexpr auto __fwd_partial_prods = [] consteval > >>> { > >>> constexpr size_t __rank = _Extents.size(); > >>> std::array<size_t, __rank + 1> __ret; > >>> __ret[0] = 1; > >>> for(size_t __r = 1; __r < __rank + 1; ++__r) > >>> if (_Extents[__r] != dynamic_extents) > >>> __ret[__r] = __ret[0] * _Extents[__r]; > >>> return __ret; > >>> }(); > >>> We are doing this at compile time, but this should help composition > >> speed. > >>> I would then inline __static_prod loop into the __mdspan::__size > function > >>> and remove that function > >>> entirely. > >> > >> I checked both approaches, and it didn't affect the compile time in > >> the slightest, so I decided it wasn't worth it; and went with the > >> less error prone solution. OTOH, you're right, knowingly leaving in > >> something that's accidentally quadratic is begging for some obscure > >> problem to arise. Moreover, measuring these can be deceptive, because > >> they have a tendency to be fast enough, right up to the point where it > >> becomes prohibitively expensive. > >> > > Note that we are talking about the compile-time computation, that is > done > > by an interpreter. It will never be costfull at runtime. And remove the > > unnecessary > > element from __fwd_partial_prod that should be the size of __rank instead > > of __rank + 1. > > Yes, but leaving in code that could bring the compiler / compilation > to a grinding halt doesn't seems smart either (in hindsight) :) > > > > >> > >> If you want I can fix this up in a separate patch. > >> > >>> > >>> > >>> I would be interested if we could reduce the __fwd/rev_partial_prod > >> sizes, > >>> but not keep the outermost dimensions there, > >>> i.e. adding a check for __r == 0 early in the __fwd_prod funciton. > >>> constexpr size_t __rank = _Extents::rank(); > >>> constexpr auto& __sta_exts = __static_extents<_Extents>(); > >>> if constexpr (__rank == 1) > >>> return 1; > >>> // new if here, that is extracted from __rank == 2 > >>> else if (__r == 0) > >>> return 1; > >>> else if constexpr (__rank == 2) > >>> return _exts.extent(0); > >>> else if constexpr > >>> (__all_dynamic(std::span(__sta_exts).first(__rank-1))) > >>> return __extents_prod(__exts, 1, 0, __r); > >>> else > >>> { > >>> size_t __sta_prod = > __fwd_partial_prods<__sta_exts>[__r-1]; > >> // > >>> size reduce by one here > >>> return __extents_prod(__exts, __sta_prod, 0, __r); > >>> } > >> > >> To me it's not obvious how this one will go. Intuitively, this adds a > >> comparison and branching to avoid loading 8 bytes. Compared to the other > >> size reductions, this has a lower upper limit as to how effective it's > >> going to be. > >> > > My idea here is that we will avoid storing 8 bytes per possible > combination > > of extents values, to store a constant that always has a value of 1. > > This seems wasteful to me. So, I would go for it if: > > * produces the same code for rank == 0, as before > > * does not prevent using vectorization for all dynamic extents with, just > > with > > addition of single if. > > Since you really don't like wasting space: in many cases the 7 high > bytes of size_t will be zero (or 0xff...ff). What's stopping us from > storing [3, 5, 7] in an array<uint8_t>? We need to be very careful > handling dynamic_extent; but I think it's only used in checks, e.g. > > _Extent[i] == dynamic_extent > > and that can be translated to > > _Extent[i] == decltype(_Extent[i])(-1) > > and we've saved 8x .rodata space =) > Actually, using the product of all static extents to determine the element type __fwd_partial_prod/__rev_partial_prod sounds like something we could explore if someone complains about the sizes. Same for the _S_dynamic_index_data. > > > > > > >> I don't think I'll work on this, since I don't have the required test > >> cases to properly measure this to justify the change; and I don't trust > >> myself to make the right call without very exhaustive measurements. I > >> also don't have access to a suitably large variety of hardware. > >> > >>> > >>> Similarly for __rev_prod: > >>> constexpr size_t __rank = _Extents::rank(); > >>> constexpr auto& __sta_exts = __static_extents<_Extents>(); > >>> if constexpr (__rank == 1) > >>> return 1; > >>> else if (__r == __rank - 1) > >>> return 1; > >>> else if constexpr (__rank == 2) > >>> return __exts.extent(1); > >>> else if constexpr > >>> (__all_dynamic(std::span(__sta_exts).last(__rank-1))) > >>> return __extents_prod(__exts, 1, __r + 1, __rank); > >>> else > >>> { > >>> size_t __sta_prod = > __rev_partial_prods<__sta_exts>[__r-1]; > >> // > >>> size reduced by one here. > >>> return __extents_prod(__exts, __sta_prod, __r + 1, > __rank); > >>> } > >>> > >>> Regards, > >>> Tomasz > >>> > >>> > >>> On Mon, Aug 4, 2025 at 7:51 PM Luc Grosheintz < > luc.groshei...@gmail.com> > >>> wrote: > >>> > >>>> > >>>> > >>>> On 8/4/25 17:42, Tomasz Kaminski wrote: > >>>>> On Mon, Aug 4, 2025 at 1:14 PM Tomasz Kaminski <tkami...@redhat.com> > >>>> wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> On Mon, Aug 4, 2025 at 1:08 PM Luc Grosheintz < > >> luc.groshei...@gmail.com > >>>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi Tomasz, > >>>>>>> > >>>>>>> Thank you for the review! Sorry about the missing parens, even > after > >>>>>>> "Ctrl+F"ing each of the emails I can't find the requires clause > (the > >>>>>>> two I found both need the parens). > >>>>>>> > >>>>>> I was mentioning this one, but it is possible that I am wrong on > this > >>>> one, > >>>>>> I haven't yet checked it, as I was planning to do it during full > >> review. > >>>>>> template<array _Extents> > >>>>>> requires (__all_dynamic<_Extents>()) > >>>>>> class _StaticExtents<_Extents>; > >>>>>> But regardless, I will make that change locally if it does work, so > >> you > >>>> do > >>>>>> not > >>>>>> need to update patch series for it. > >>>>>> > >>>>> Parenthesis seem to be indeed required here, so my comment was a red > >>>>> herring. > >>>> > >>>> I'm pretty sure that this is the reason why I wrongly concluded > >>>> that the pattern is `requires(cond)`, i.e. the parens are always > >>>> needed. > >>>> > >>>>> > >>>>>> > >>>>>>> Do you think it's possible to merge this series before the other > two > >>>>>>> outstanding patch series? There's some risk of collision; making > >>>> changes > >>>>>>> to this patch series is much more time consuming than rebasing (and > >>>>>>> retesting the other two patch series). > >>>>>>> > >>>>>> This sounds very reasonable. I will prepare a stack of patches in > your > >>>>>> suggested > >>>>>> order locally, and push them once they are approved. But it still > make > >>>>>> > >>>>>>> > >>>>>>> Thank you, > >>>>>>> > >>>>>> Thank you for your continued contributions. > >>>>>> > >>>>>>> Luc > >>>>>>> > >>>>>>> On 8/4/25 12:43, Tomasz Kaminski wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I this time I made a quick pass through all changes, before > >> commenting > >>>>>>> on > >>>>>>>> first commits, > >>>>>>>> and they look solid to me, and I haven't noticed anything I would > >> like > >>>>>>> to > >>>>>>>> change (except parentheses > >>>>>>>> around requires, but I will handle that locally). I will try to > do a > >>>>>>> full > >>>>>>>> review during this week. > >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> Tomasz > >>>>>>>> > >>>>>>>> On Sun, Aug 3, 2025 at 10:59 PM Luc Grosheintz < > >>>>>>> luc.groshei...@gmail.com> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> The combined effect of this sequence of change is: > >>>>>>>>> > >>>>>>>>> * a reduction in the number of template instantiations, by > >>>>>>>>> - avoiding needless dependency of IndexType, > >>>>>>>>> - special formulas for low-rank extents, > >>>>>>>>> - special formulas for (nearly) fully dynamic extents. > >>>>>>>>> > >>>>>>>>> * improved code quality, by > >>>>>>>>> - precomputing partial products of the static extents, > >>>>>>>>> - special cases for low-rank extents, > >>>>>>>>> - rewriting the condition E[i] == dynamic_extent in a > more > >>>>>>>>> optimizer friendly manner. > >>>>>>>>> - effectively loop-unrolling extents::operator==. > >>>>>>>>> > >>>>>>>>> While simplistic micro-benchmarking shows the effectiveness of > >> these > >>>>>>>>> changes, likely the stronger argument is presented in each > commit: > >>>>>>>>> a) each change removes needless complexity, > >>>>>>>>> b) before/after examples of generated code show the > >>>> effectiveness. > >>>>>>>>> > >>>>>>>>> Luc Grosheintz (8): > >>>>>>>>> libstdc++: Reduce template instantiations in <mdspan>. > >>>>>>>>> libstdc++: Precompute products of static extents. > >>>>>>>>> libstdc++: Improve low-rank layout_{left,right}::stride. > >>>>>>>>> libstdc++: Improve fully dynamic extents in mdspan. > >>>>>>>>> libstdc++: Improve nearly fully dynamic extents in mdspan. > >>>>>>>>> libstdc++: Reduce indirection in extents::extent. > >>>>>>>>> libstdc++: Improve extents::operator==. > >>>>>>>>> libstdc++: Replace numeric_limit with __int_traits in > mdspan. > >>>>>>>>> > >>>>>>>>> libstdc++-v3/include/std/mdspan | 282 > >>>>>>> +++++++++++++----- > >>>>>>>>> .../mdspan/extents/class_mandates_neg.cc | 3 + > >>>>>>>>> 2 files changed, 208 insertions(+), 77 deletions(-) > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> 2.50.0 > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>>> > >>>> > >>> > >> > >> > > > >