Hi All, In GCC 12 there was a ~40% regression in the performance of hashmap->find.
This regression came about accidentally: Before GCC 12 the find function was small enough that IPA would inline it even though it wasn't marked inline. In GCC-12 an optimization was added to perform a linear search when the entries in the hashmap are small. This increased the size of the function enough that IPA would no longer inline. Inlining had two benefits: 1. The return value is a reference. so it has to be returned and dereferenced even though the search loop may have already dereference it. 2. The pattern is a hard pattern to track for branch predictors. This causes a large number of branch misses if the value is immediately checked and branched on. i.e. if (a != m.end()) which is a common pattern. The patch fixes both these issues by adding the inline keyword to _M_locate to allow the inliner to consider inlining again. This and the other patches have been ran through serveral benchmarks where the size, number of elements searched for and type (reference vs value) etc were tested. The change shows no statistical regression, but an average find improvement of ~27% and a range between ~10-60% improvements. A selection of the results: +-----------+--------------------+-------+----------+ | Group | Benchmark | Size | % Inline | +-----------+--------------------+-------+----------+ | Find | unord<uint64_t | 11264 | 53.52% | | Find | unord<uint64_t | 11264 | 47.98% | | Find Many | unord<uint64_t | 11 | 47.62% | | Find Many | unord<std::string | 11 | 44.94% | | Find Many | unord<std::string | 11 | 44.89% | | Find Many | unord<uint64_t | 11 | 40.90% | | Find | unord<NonSSOString | 11 | 31.80% | | Find | unord<NonSSOString | 11 | 31.17% | | Find Many | unord<uint64_t | 352 | 30.57% | | Find | unord<uint64_t | 352 | 28.27% | | Find Many | unord<uint64_t | 352 | 26.80% | | Find Many | unord<NonSSOString | 352 | 26.59% | | Find Many | unord<NonSSOString | 352 | 25.93% | | Find | unord<std::string | 11 | 25.66% | | Find Many | unord<NonSSOString | 11 | 23.21% | | Find Many | unord<std::string | 352 | 23.12% | | Find Many | unord<NonSSOString | 11 | 22.04% | | Find Many | unord<uint64_t | 11264 | 21.33% | | Find Many | unord<uint64_t | 11264 | 20.71% | | Find | unord<std::string | 11 | 20.36% | | Find Many | unord<std::string | 352 | 19.23% | | Find | unord<std::string | 352 | 18.59% | | Find | unord<uint64_t | 352 | 15.43% | | Find | unord<NonSSOString | 352 | 13.79% | | Find | unord<std::string | 11264 | 11.80% | | Find | unord<std::string | 352 | 11.12% | | Find | val<uint64_t | 11 | 9.98% | | Find | unord<std::string | 11264 | 9.97% | | Find | unord<NonSSOString | 352 | 9.69% | | Find | unord<NonSSOString | 11264 | 9.19% | +-----------+--------------------+-------+----------+ Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? and possible backports? Thanks, Tamar libstdc++-v3/ChangeLog: * include/bits/hashtable.h: Inline _M_locate. --- diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typename _ExtractKey, typename _Equal, typename _Hash, typename _RangeHash, typename _Unused, typename _RehashPolicy, typename _Traits> - auto + inline auto _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>:: _M_locate(const key_type& __k) const --
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typename _ExtractKey, typename _Equal, typename _Hash, typename _RangeHash, typename _Unused, typename _RehashPolicy, typename _Traits> - auto + inline auto _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>:: _M_locate(const key_type& __k) const