Hello, I'm curious about something which may qualify as a stupid question.
What bad thing would happen if we used OIDs directly as hash values in internal hash tables (that is, instead of uint32_hash() we'd use uint32_identity(), or somehow optimise it away entirely, as you can see in some C++ standard libraries for eg std::hash<int>)? From what I know of this subject, the main risk is that, in general, the distribution of integer keys stored in a hash table might *accidentally* happen to have regular gaps with a stride that shares a common factor with the number of buckets leading to an unbalanced hash table (memory addresses are an example because they tend to have a stride corresponding to hardware alignment rules, as are some distributed sequence generation tricks), whereas it probably takes a non-accidental attack to get higher collision rates with a decent hash function. A good hash function would defend against that kind of thing (but cost cycles to compute), and alternatively a prime number of buckets would defend against it (but cost more cycles to %). However, as far as I can see OIDs are expected to have an even distribution (or at least we don't expect regular sized gaps), so the hazard doesn't seem to apply. One problem could be that, although collisions are not expected with high probability when the hash table is big enough, when they do occur, hash tables using linear probing-based collision strategies (simplehash, but not dynahash) probably work less well if the chance of non-empty buckets having non-empty neighbours (AKA clustering) increases. I'm not sure whether to consider OIDs to be clustered in general or not (though obviously the ones created by initdb are). -- Thomas Munro http://www.enterprisedb.com