http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57885
--- Comment #5 from François Dumont <fdumont at gcc dot gnu.org> --- The biggest performance regression introduced in version 4.8 is coming from an attempt to enhance unordered containers global performances. The data model has been modified because the erase operation had become very expensive in some situations since the Standard requires it to return the iterator following the erased one. To do so we moved from a vector<forward_list> like data model to a combination of vector<forward_list::iterator> for buckets and a single forward_list for elements. In the former model when erasing an element we needed to look for the next non-empty bucket potentially looping through a lot of empty ones resulting in very bad performances. In the new model all elements are linked to the next one and the access is direct. The drawback is that when looking for an element you need to get its bucket and then loop on the bucket element. In the former model the loop end condition was the forward_list::end(). In the new model you need to compute the next element bucket to check that you can stop because you moved to another bucket. This explains the additional divq instruction and the performance hint. Of course your bench is isolating the main drawback of the new data model. There are benches in libstdc++ testsuite/performance that compares tr1 and std implementations and globally std ones are better even if you can already see the performance hint you are reporting here. So we chose to move from a data model very good on some use cases but very bad on some others to a data model good in all situations but not very good anymore in your use case. I agree however that your use case is quite important and the only way I can think of to restore its performance would be to introduce a concept of empty node in the data model so that we detect end of bucket without relying on a divq instruction. The drawback will be a memory overhead, one bool per node to flag empty/non-empty node and an additional empty node per non empty bucket. This is perhaps more acceptable for such a container which purpose is not to be memory efficient but rather performance efficient. The performance hint of version 4.9 comes from C++11 allocator conformity. I will avoid details but for exception safety reasons I had to correctly manage an unordered container instance having no bucket allocated. In this case the number of bucket is 0 and doing a 'hash code % 0' operation is bad so I had to introduce some checks on the bucket number at the beginning of some methods like the find one. It explains the additional testq operation which is I fear unavoidable for a Standard conforming implementation.