On Fri, 13 Dec 2024 at 17:12, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi All,
>
> In GCC 12 there was a ~40% regression in the performance of hashmap->find.
>
> This regression came about accidentally:
>
> Before GCC 12 the find function was small enough that IPA would inline it even
> though it wasn't marked inline.  In GCC-12 an optimization was added to 
> perform
> a linear search when the entries in the hashmap are small.
>
> This increased the size of the function enough that IPA would no longer 
> inline.
> Inlining had two benefits:
>
> 1.  The return value is a reference. so it has to be returned and dereferenced
>     even though the search loop may have already dereference it.
> 2.  The pattern is a hard pattern to track for branch predictors.  This causes
>     a large number of branch misses if the value is immediately checked and
>     branched on. i.e. if (a != m.end()) which is a common pattern.
>
> The patch fixes both these issues by adding the inline keyword to _M_locate
> to allow the inliner to consider inlining again.
>
> This and the other patches have been ran through serveral benchmarks where
> the size, number of elements searched for and type (reference vs value) etc
> were tested.
>
> The change shows no statistical regression, but an average find improvement of
> ~27% and a range between ~10-60% improvements.  A selection of the results:
>
> +-----------+--------------------+-------+----------+
> | Group     | Benchmark          | Size  | % Inline |
> +-----------+--------------------+-------+----------+
> | Find      | unord<uint64_t     | 11264 | 53.52%   |
> | Find      | unord<uint64_t     | 11264 | 47.98%   |
> | Find Many | unord<uint64_t     | 11    | 47.62%   |
> | Find Many | unord<std::string  | 11    | 44.94%   |
> | Find Many | unord<std::string  | 11    | 44.89%   |
> | Find Many | unord<uint64_t     | 11    | 40.90%   |
> | Find      | unord<NonSSOString | 11    | 31.80%   |
> | Find      | unord<NonSSOString | 11    | 31.17%   |
> | Find Many | unord<uint64_t     | 352   | 30.57%   |
> | Find      | unord<uint64_t     | 352   | 28.27%   |
> | Find Many | unord<uint64_t     | 352   | 26.80%   |
> | Find Many | unord<NonSSOString | 352   | 26.59%   |
> | Find Many | unord<NonSSOString | 352   | 25.93%   |
> | Find      | unord<std::string  | 11    | 25.66%   |
> | Find Many | unord<NonSSOString | 11    | 23.21%   |
> | Find Many | unord<std::string  | 352   | 23.12%   |
> | Find Many | unord<NonSSOString | 11    | 22.04%   |
> | Find Many | unord<uint64_t     | 11264 | 21.33%   |
> | Find Many | unord<uint64_t     | 11264 | 20.71%   |
> | Find      | unord<std::string  | 11    | 20.36%   |
> | Find Many | unord<std::string  | 352   | 19.23%   |
> | Find      | unord<std::string  | 352   | 18.59%   |
> | Find      | unord<uint64_t     | 352   | 15.43%   |
> | Find      | unord<NonSSOString | 352   | 13.79%   |
> | Find      | unord<std::string  | 11264 | 11.80%   |
> | Find      | unord<std::string  | 352   | 11.12%   |
> | Find      | val<uint64_t       | 11    | 9.98%    |
> | Find      | unord<std::string  | 11264 | 9.97%    |
> | Find      | unord<NonSSOString | 352   | 9.69%    |
> | Find      | unord<NonSSOString | 11264 | 9.19%    |
> +-----------+--------------------+-------+----------+
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and possible backports?

Thanks for the thorough benchmarking.

OK for trunk.

A different patch will be needed for the release branches because
_M_locate only exists on trunk.

>
> Thanks,
> Tamar
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/hashtable.h: Inline _M_locate.
>
> ---
> diff --git a/libstdc++-v3/include/bits/hashtable.h 
> b/libstdc++-v3/include/bits/hashtable.h
> index 
> b8bd8c2f41816a800bab9b3589fe609b16285ad1..e791e52ec329277474f3218d8a44cd37ded14ac3
>  100644
> --- a/libstdc++-v3/include/bits/hashtable.h
> +++ b/libstdc++-v3/include/bits/hashtable.h
> @@ -2213,7 +2213,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>            typename _ExtractKey, typename _Equal,
>            typename _Hash, typename _RangeHash, typename _Unused,
>            typename _RehashPolicy, typename _Traits>
> -    auto
> +    inline auto
>      _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>                _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>      _M_locate(const key_type& __k) const
>
>
>
>
> --

Reply via email to