On Fri, 13 Dec 2024 at 17:12, Tamar Christina <tamar.christ...@arm.com> wrote: > > Hi All, > > In GCC 12 there was a ~40% regression in the performance of hashmap->find. > > This regression came about accidentally: > > Before GCC 12 the find function was small enough that IPA would inline it even > though it wasn't marked inline. In GCC-12 an optimization was added to > perform > a linear search when the entries in the hashmap are small. > > This increased the size of the function enough that IPA would no longer > inline. > Inlining had two benefits: > > 1. The return value is a reference. so it has to be returned and dereferenced > even though the search loop may have already dereference it. > 2. The pattern is a hard pattern to track for branch predictors. This causes > a large number of branch misses if the value is immediately checked and > branched on. i.e. if (a != m.end()) which is a common pattern. > > The patch fixes both these issues by adding the inline keyword to _M_locate > to allow the inliner to consider inlining again. > > This and the other patches have been ran through serveral benchmarks where > the size, number of elements searched for and type (reference vs value) etc > were tested. > > The change shows no statistical regression, but an average find improvement of > ~27% and a range between ~10-60% improvements. A selection of the results: > > +-----------+--------------------+-------+----------+ > | Group | Benchmark | Size | % Inline | > +-----------+--------------------+-------+----------+ > | Find | unord<uint64_t | 11264 | 53.52% | > | Find | unord<uint64_t | 11264 | 47.98% | > | Find Many | unord<uint64_t | 11 | 47.62% | > | Find Many | unord<std::string | 11 | 44.94% | > | Find Many | unord<std::string | 11 | 44.89% | > | Find Many | unord<uint64_t | 11 | 40.90% | > | Find | unord<NonSSOString | 11 | 31.80% | > | Find | unord<NonSSOString | 11 | 31.17% | > | Find Many | unord<uint64_t | 352 | 30.57% | > | Find | unord<uint64_t | 352 | 28.27% | > | Find Many | unord<uint64_t | 352 | 26.80% | > | Find Many | unord<NonSSOString | 352 | 26.59% | > | Find Many | unord<NonSSOString | 352 | 25.93% | > | Find | unord<std::string | 11 | 25.66% | > | Find Many | unord<NonSSOString | 11 | 23.21% | > | Find Many | unord<std::string | 352 | 23.12% | > | Find Many | unord<NonSSOString | 11 | 22.04% | > | Find Many | unord<uint64_t | 11264 | 21.33% | > | Find Many | unord<uint64_t | 11264 | 20.71% | > | Find | unord<std::string | 11 | 20.36% | > | Find Many | unord<std::string | 352 | 19.23% | > | Find | unord<std::string | 352 | 18.59% | > | Find | unord<uint64_t | 352 | 15.43% | > | Find | unord<NonSSOString | 352 | 13.79% | > | Find | unord<std::string | 11264 | 11.80% | > | Find | unord<std::string | 352 | 11.12% | > | Find | val<uint64_t | 11 | 9.98% | > | Find | unord<std::string | 11264 | 9.97% | > | Find | unord<NonSSOString | 352 | 9.69% | > | Find | unord<NonSSOString | 11264 | 9.19% | > +-----------+--------------------+-------+----------+ > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? and possible backports?
Thanks for the thorough benchmarking. OK for trunk. A different patch will be needed for the release branches because _M_locate only exists on trunk. > > Thanks, > Tamar > > libstdc++-v3/ChangeLog: > > * include/bits/hashtable.h: Inline _M_locate. > > --- > diff --git a/libstdc++-v3/include/bits/hashtable.h > b/libstdc++-v3/include/bits/hashtable.h > index > b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3 > 100644 > --- a/libstdc++-v3/include/bits/hashtable.h > +++ b/libstdc++-v3/include/bits/hashtable.h > @@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > typename _ExtractKey, typename _Equal, > typename _Hash, typename _RangeHash, typename _Unused, > typename _RehashPolicy, typename _Traits> > - auto > + inline auto > _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, > _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>:: > _M_locate(const key_type& __k) const > > > > > --