Sorry to react so late on this patch.

I'm only surprised that the expected result of the added __builtin_expect
is 0.

It means we consider the hasher not to be perfect, i.e. that we have
several entries in the same bucket. Shouldn't we reward those who spend
time on their hasher to make it as perfect as possible?

Put differently, does using 1 in the __builtin_expect change the produced
figures a lot?

François

On Wed, Dec 18, 2024 at 5:01 PM Jonathan Wakely <jwak...@redhat.com> wrote:

> On Wed, 18 Dec 2024 at 14:14, Tamar Christina <tamar.christ...@arm.com>
> wrote:
> >
> > > > > index e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931affcf71d826c88094 100644
> > > > > --- a/libstdc++-v3/include/bits/hashtable.h
> > > > > +++ b/libstdc++-v3/include/bits/hashtable.h
> > > > > @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > >           if (this->_M_equals(__k, __code, *__p))
> > > > >             return __prev_p;
> > > > >
> > > > > -         if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > > +         if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> > > > >             break;
> > > > >           __prev_p = __p;
> > > > >         }
> > > > > @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > >             if (this->_M_equals_tr(__k, __code, *__p))
> > > > >               return __prev_p;
> > > > >
> > > > > -           if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > > +           if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> > > > >               break;
> > > > >             __prev_p = __p;
> > > > >           }
> > > > > @@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > >               pointer_to(const_cast<__node_base&>(_M_before_begin));
> > > > >           while (__loc._M_before->_M_nxt)
> > > > >             {
> > > > > -             if (this->_M_key_equals(__k, *__loc._M_node()))
> > > > > +             if (__builtin_expect (this->_M_key_equals(__k, *__loc._M_node()), 1))
> > > > >                 return __loc;
> > > > >               __loc._M_before = __loc._M_before->_M_nxt;
> > > > >             }
> > > >
> > > > The first two changes make sense to me: we're typically going to loop
> > > > more than once when searching for an element, so the break is
> > > > unlikely. The third change is a bit more surprising, because although
> > > > it's likely we'll find an element *eventually* it is only likely once
> > > > out of N iterations, not likely on every iteration. But if the numbers
> > > > show it helps, then that just shows my intuition is probably wrong.
> > > >
> > > > If this is optimized for 'find' is it possible that this change
> > > > pessimizes insertion (which also has to use _M_locate to find where
> > > > to insert)?
> > >
> > > You're right, there seems to be a uniform slowdown for insertions in
> > > small sized hashtables of ~10 entries.  They're about 10-14% slower
> > > with that change.
> > >
> > > As expected, just the ones in _M_find_before_node don't cause any
> > > issues.
> > >
> > > Since the insertions into small hashtables don't run long enough, I've
> > > also increased the number of iterations to ~1 million to even out the
> > > score a bit more, but there are still 5 sizable regressions.
> > >
> > > I've kicked off a longer run removing the change on _M_locate and with
> > > some more variable size finds/inserts.
> > >
> > > So far it looks like the additional benefits from the _M_locate change
> > > were mainly on two tests.  I'll look at the perf traces to figure out
> > > exactly why, but I'll respin the patch without the _M_locate change
> > > and send it tomorrow morning once the benchmarks finish.
> > >
> >
> > Hi,
> >
> > Here's a new patch and new numbers showing the improvements over the
> > previous one, and the total improvement with the two together.  This
> > change seems to be mostly beneficial on larger sized hashmaps, with
> > otherwise no real losses on Insert or Find (around the region of ~1%,
> > which looks like noise).
> >
> > This results in ~0-10% extra on top of the previous patch.
> >
> > In table form:
> >
> >
> > +-------------+---------------+-------+--------------------+-------------------+-----------------+
> > | benchmark   | Type          | Size  | Inline vs baseline | final vs baseline | final vs inline |
> > +-------------+---------------+-------+--------------------+-------------------+-----------------+
> > | find many   | uint64_t      | 11253 | -15.67%            | -22.96%           | -8.65%          |
> > | find many   | uint64_t      | 11253 | -16.74%            | -23.37%           | -7.96%          |
> > | find single | uint64_t      | 345   | -5.88%             | -11.54%           | -6.02%          |
> > | find many   | string        | 11253 | -4.50%             | -9.56%            | -5.29%          |
> > | find single | uint64_t      | 345   | -4.38%             | -9.41%            | -5.26%          |
> > | find single | shared string | 11253 | -6.67%             | -11.00%           | -4.64%          |
> > | find single | shared string | 11253 | -4.63%             | -9.03%            | -4.61%          |
> > | find single | shared string | 345   | -10.41%            | -14.44%           | -4.50%          |
> > | find many   | string        | 11253 | -3.41%             | -7.51%            | -4.24%          |
> > | find many   | shared string | 11253 | -2.30%             | -5.72%            | -3.50%          |
> > | find many   | string        | 13    | 2.86%              | -0.30%            | -3.07%          |
> > | find single | string        | 11253 | 4.47%              | 1.34%             | -3.00%          |
> > | find many   | custom string | 11253 | 0.25%              | -2.75%            | -2.99%          |
> > | find single | uint64_t      | 345   | 2.99%              | 0.01%             | -2.90%          |
> > | find single | shared string | 345   | -11.53%            | -13.67%           | -2.41%          |
> > | find single | uint64_t      | 11253 | 0.49%              | -1.59%            | -2.07%          |
> > +-------------+---------------+-------+--------------------+-------------------+-----------------+
> >
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
>
> OK, thanks!
>
>
> >
> > Thanks,
> > Tamar
> >
> > libstdc++-v3/ChangeLog:
> >
> >         * include/bits/hashtable.h (_M_find_before_node): Make it likely
> >         that the map has at least one entry and so we do at least one
> >         iteration.
> >
> > -- inline copy of patch --
> >
> > diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
> > index e791e52ec329277474f3218d8a44cd37ded14ac3..51437619b44ce8b93be6a1d223828fda5a4a4c45 100644
> > --- a/libstdc++-v3/include/bits/hashtable.h
> > +++ b/libstdc++-v3/include/bits/hashtable.h
> > @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >           if (this->_M_equals(__k, __code, *__p))
> >             return __prev_p;
> >
> > -         if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > +         if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> >             break;
> >           __prev_p = __p;
> >         }
> > @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >             if (this->_M_equals_tr(__k, __code, *__p))
> >               return __prev_p;
> >
> > -           if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > +           if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> >               break;
> >             __prev_p = __p;
> >           }
>
>
