On 2017-03-01 09:21:47 +0530, Robert Haas wrote: > On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <and...@anarazel.de> wrote: > >> BTW, another option to consider is lowering the target fillfactor. > >> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the > >> issue. Obviously, it's better to detect when we need a lower > >> fillfactor than to always use a lower one, but obviously the tighter > >> you pack the hash table, the more likely it is that you're going to > >> have these kinds of problems. > > > > Yea, that'd be one approach, but I feel better dynamically increasing > > the fillfactor like in the patch. Even with a lower fillfactor you could > > see issues, and as you say a higher fillfactor is nice [TM]; after the > > patch I played with *increasing* the fillfactor, without being able to > > measure a downside. > > I am going to bet that increasing the fillfactor would be a mistake. > The expected length of a clump in the table is going to be about > 1/(1-ff), which increases very quickly beyond the current value of > 0.8. For example, if the actual fillfactor is 0.9 and the keys are > perfectly distributed -- nine in a row and then one empty slot, > lather, rinse, repeat -- probing for a key that's not there has a 10% > chance of hitting an empty slot, a 10% chance of hitting the slot just > before an empty slot, and so on. So the expected number of probes to > find that there's no match is 4.5. However, it could easily happen > that you have 3 or 4 empty slots in a row in one place and 27 or 36 > occupied slots in a row in another place, and that could cause all > kinds of grief. Moreover, real-world hash functions on real-world > data aren't always perfect, which increases the chances of things > going off track. I'm not exactly sure how high a fillfactor is too > high, but note that > https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that > "performance dramatically degrades when the load factor grows beyond > 0.7 or so", which isn't cited but the same text appears in > https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's > worth.
That's without the "trick" of "robin hood" hashing though: https://en.wikipedia.org/wiki/Hash_table#Robin_Hood_hashing http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/ https://www.pvk.ca/Blog/2013/11/26/the-other-robin-hood-hashing/ it probably depends a bit on the scenario how high you want to have the fillfactor - for hashaggs a high fill factor would probably be good (they're often "read" heavy), for bitmapscans less so. But I guess there's not much point in immediately increasing the fillfactor, can do that in a second step. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers