https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98781
Bug ID: 98781 Summary: Poor std::hash for bitset and vector<bool> Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: ABI Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: redi at gcc dot gnu.org Target Milestone: --- The std::hash specializations for std::bitset and std::vector<bool> do not differentiate between sequences that have different number of zero bits at the end, if they don't spill over into a new byte. e.g. the following all have the same representation, a single byte with the low bit set: std::bitset<1> b1(1); std::bitset<2> b2(1); std::bitset<8> b8(1); As a result, they all hash to the same value. This is less problematic for std::bitset where they have different types anyway, but it's more noticeable for std::vector<bool>: std::vector<bool> b{true}; std::hash<std::vector<bool>> h; for (int i = 0; i < 10; ++i) { std::cout << b.size() << '\t' << h(b) << '\n'; b.push_back(false); } Each iteration produces different values (which don't compare equal) but the hash value only changes when the size increases by a multiple of CHAR_BIT: 1 4334672815104069193 2 4334672815104069193 3 4334672815104069193 4 4334672815104069193 5 4334672815104069193 6 4334672815104069193 7 4334672815104069193 8 4334672815104069193 9 9994027557101822035 10 9994027557101822035 The hash function could combine the result of hashing the value bytes with the hash of the sequence length. That would produce a distinct hash for each value above. That would be an ABI change though, as different translation units could give different hash results for the same inputs. FWIW MSVC has the same problem, but the hash only changes for multiples of 32, and libc++ never changes the result for any number of trailing zeros (it appears to hash vector<bool> as a simple XOR of the integer words).