On Tue, Aug 5, 2025 at 1:14 PM Luc Grosheintz <luc.groshei...@gmail.com>
wrote:

>
>
> On 8/5/25 10:16, Tomasz Kaminski wrote:
> > Hi,
> >
> > I have posted v3 patches with changes I have made locally for first 6
> > patches, and I think this series
> > is ready to land, in addition to
> > https://gcc.gnu.org/pipermail/libstdc++/2025-July/062727.html, that I
> > already reviewed.
> > I will keep aligned accessor on top of it.
> >
> > As the separate commit, we should update the __fwd_partial_prod and
> > __rev_partial_prod computation
> > to use fact that this is partial prod, and we have previous partial
> > product computed:
> >      template<array _Extents>
> >        constexpr auto __fwd_partial_prods = [] consteval
> >          {
> >            constexpr size_t __rank = _Extents.size();
> >            std::array<size_t, __rank + 1> __ret;
> >            __ret[0] = 1;
> >            for(size_t __r = 1; __r < __rank + 1; ++__r)
> >              if (_Extents[__r] != dynamic_extents)
> >                __ret[__r] = __ret[0] * _Extents[__r];
> >            return __ret;
> >          }();
> > We are doing this at compile time, but this should help composition
> speed.
> > I would then inline __static_prod loop into the __mdspan::__size function
> > and remove that function
> > entirely.
>
> I checked both approaches, and it didn't affect the compile time in
> the slightest, so I decided it wasn't worth it; and went with the
> less error prone solution. OTOH, you're right, knowingly leaving in
> something that's accidentally quadratic is begging for some obscure
> problem to arise. Moreover, measuring these can be deceptive, because
> they have a tendency to be fast enough, right up to the point where it
> becomes prohibitively expensive.
>
Note that we are talking about the compile-time computation, that  is done
by an interpreter. It will never be costfull at runtime. And remove the
unnecessary
element from __fwd_partial_prod that should be the size of __rank instead
of __rank + 1.

>
> If you want I can fix this up in a separate patch.
>
> >
> >
> > I would be interested if we could reduce the __fwd/rev_partial_prod
> sizes,
> > but not keep the outermost dimensions there,
> > i.e. adding a check for __r == 0 early in the __fwd_prod funciton.
> >          constexpr size_t __rank = _Extents::rank();
> >          constexpr auto& __sta_exts = __static_extents<_Extents>();
> >          if constexpr (__rank == 1)
> >            return 1;
> >         // new if here, that is extracted from __rank == 2
> >          else if (__r == 0)
> >            return 1;
> >          else if constexpr (__rank == 2)
> >            return _exts.extent(0);
> >          else if constexpr
> > (__all_dynamic(std::span(__sta_exts).first(__rank-1)))
> >            return __extents_prod(__exts, 1, 0, __r);
> >          else
> >            {
> >              size_t __sta_prod = __fwd_partial_prods<__sta_exts>[__r-1];
> //
> > size reduce by one here
> >              return __extents_prod(__exts, __sta_prod, 0, __r);
> >            }
>
> To me it's not obvious how this one will go. Intuitively, this adds a
> comparison and branching to avoid loading 8 bytes. Compared to the other
> size reductions, this has a lower upper limit as to how effective it's
> going to be.
>
My idea here is that we will avoid storing 8 bytes per possible combination
of extents values, to store a constant that always has a value of 1.
This seems wasteful to me. So, I would go for it if:
* produces the same code for rank == 0, as before
* does not prevent using vectorization for all dynamic extents with, just
with
  addition of single if.


> I don't think I'll work on this, since I don't have the required test
> cases to properly measure this to justify the change; and I don't trust
> myself to make the right call without very exhaustive measurements. I
> also don't have access to a suitably large variety of hardware.
>
> >
> > Similarly for __rev_prod:
> >          constexpr size_t __rank = _Extents::rank();
> >          constexpr auto& __sta_exts = __static_extents<_Extents>();
> >          if constexpr (__rank == 1)
> >            return 1;
> >         else if (__r == __rank - 1)
> >            return 1;
> >          else if constexpr (__rank == 2)
> >            return __exts.extent(1);
> >          else if constexpr
> > (__all_dynamic(std::span(__sta_exts).last(__rank-1)))
> >            return __extents_prod(__exts, 1, __r + 1, __rank);
> >          else
> >            {
> >              size_t __sta_prod = __rev_partial_prods<__sta_exts>[__r-1];
> //
> > size reduced by one here.
> >              return __extents_prod(__exts, __sta_prod, __r + 1, __rank);
> >            }
> >
> > Regards,
> > Tomasz
> >
> >
> > On Mon, Aug 4, 2025 at 7:51 PM Luc Grosheintz <luc.groshei...@gmail.com>
> > wrote:
> >
> >>
> >>
> >> On 8/4/25 17:42, Tomasz Kaminski wrote:
> >>> On Mon, Aug 4, 2025 at 1:14 PM Tomasz Kaminski <tkami...@redhat.com>
> >> wrote:
> >>>
> >>>>
> >>>>
> >>>> On Mon, Aug 4, 2025 at 1:08 PM Luc Grosheintz <
> luc.groshei...@gmail.com
> >>>
> >>>> wrote:
> >>>>
> >>>>> Hi Tomasz,
> >>>>>
> >>>>> Thank you for the review! Sorry about the missing parens, even after
> >>>>> "Ctrl+F"ing each of the emails I can't find the requires clause (the
> >>>>> two I found both need the parens).
> >>>>>
> >>>> I was mentioning this one, but it is possible that I am wrong on this
> >> one,
> >>>> I haven't yet checked it, as I was planning to do it during full
> review.
> >>>>       template<array _Extents>
> >>>>         requires (__all_dynamic<_Extents>())
> >>>>         class _StaticExtents<_Extents>;
> >>>> But regardless, I will make that change locally if it does work, so
> you
> >> do
> >>>> not
> >>>> need to update patch series for it.
> >>>>
> >>> Parenthesis seem to be indeed required here, so my comment was a red
> >>> herring.
> >>
> >> I'm pretty sure that this is the reason why I wrongly concluded
> >> that the pattern is `requires(cond)`, i.e. the parens are always
> >> needed.
> >>
> >>>
> >>>>
> >>>>> Do you think it's possible to merge this series before the other two
> >>>>> outstanding patch series? There's some risk of collision; making
> >> changes
> >>>>> to this patch series is much more time consuming than rebasing (and
> >>>>> retesting the other two patch series).
> >>>>>
> >>>> This sounds very reasonable. I will prepare a stack of patches in your
> >>>> suggested
> >>>> order locally, and push them once they are approved. But it still make
> >>>>
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>> Thank you for your continued contributions.
> >>>>
> >>>>> Luc
> >>>>>
> >>>>> On 8/4/25 12:43, Tomasz Kaminski wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I this time I made a quick pass through all changes, before
> commenting
> >>>>> on
> >>>>>> first commits,
> >>>>>> and they look solid to me, and I haven't noticed anything I would
> like
> >>>>> to
> >>>>>> change (except parentheses
> >>>>>> around requires, but I will handle that locally). I will try to do a
> >>>>> full
> >>>>>> review during this week.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Tomasz
> >>>>>>
> >>>>>> On Sun, Aug 3, 2025 at 10:59 PM Luc Grosheintz <
> >>>>> luc.groshei...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The combined effect of this sequence of change is:
> >>>>>>>
> >>>>>>>      * a reduction in the number of template instantiations, by
> >>>>>>>        - avoiding needless dependency of IndexType,
> >>>>>>>        - special formulas for low-rank extents,
> >>>>>>>        - special formulas for (nearly) fully dynamic extents.
> >>>>>>>
> >>>>>>>      * improved code quality, by
> >>>>>>>        - precomputing partial products of the static extents,
> >>>>>>>        - special cases for low-rank extents,
> >>>>>>>        - rewriting the condition E[i] == dynamic_extent in a more
> >>>>>>>          optimizer friendly manner.
> >>>>>>>        - effectively loop-unrolling extents::operator==.
> >>>>>>>
> >>>>>>> While simplistic micro-benchmarking shows the effectiveness of
> these
> >>>>>>> changes, likely the stronger argument is presented in each commit:
> >>>>>>>      a) each change removes needless complexity,
> >>>>>>>      b) before/after examples of generated code show the
> >> effectiveness.
> >>>>>>>
> >>>>>>> Luc Grosheintz (8):
> >>>>>>>      libstdc++: Reduce template instantiations in <mdspan>.
> >>>>>>>      libstdc++: Precompute products of static extents.
> >>>>>>>      libstdc++: Improve low-rank layout_{left,right}::stride.
> >>>>>>>      libstdc++: Improve fully dynamic extents in mdspan.
> >>>>>>>      libstdc++: Improve nearly fully dynamic extents in mdspan.
> >>>>>>>      libstdc++: Reduce indirection in extents::extent.
> >>>>>>>      libstdc++: Improve extents::operator==.
> >>>>>>>      libstdc++: Replace numeric_limit with __int_traits in mdspan.
> >>>>>>>
> >>>>>>>     libstdc++-v3/include/std/mdspan               | 282
> >>>>> +++++++++++++-----
> >>>>>>>     .../mdspan/extents/class_mandates_neg.cc      |   3 +
> >>>>>>>     2 files changed, 208 insertions(+), 77 deletions(-)
> >>>>>>>
> >>>>>>> --
> >>>>>>> 2.50.0
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >>
> >
>
>

Reply via email to