Hi Stephen,

On 3.7.2014 20:10, Stephen Frost wrote:
> Tomas,
>
> * Tomas Vondra (t...@fuzzy.cz) wrote:
>> However it's likely there are queries where this may not be the case,
>> i.e. where rebuilding the hash table is not worth it. Let me know if you
>> can construct such query (I wasn't).
>
> Thanks for working on this! I've been thinking on this for a while
> and this seems like it may be a good approach. Have you considered a
> bloom filter over the buckets..?

I know you've experimented with it, but I haven't looked into that yet.
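Just so we're talking about the same thing, this is roughly how I picture
the idea: one filter built over the hash values of the inner tuples,
consulted before walking a bucket, so that probes which can't possibly
match skip the bucket entirely. A very rough sketch - all the names and
the bitmap size below are made up, none of this is from the patch:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_NBITS (1 << 20)           /* 128 kB bitmap, sized arbitrarily */

typedef struct BloomFilter
{
    uint8_t     bits[BLOOM_NBITS / 8];
} BloomFilter;

static void
bloom_init(BloomFilter *bf)
{
    memset(bf->bits, 0, sizeof(bf->bits));
}

/* called once per inner tuple, right after computing its hash value */
static void
bloom_add(BloomFilter *bf, uint32_t hashvalue)
{
    uint32_t    h1 = hashvalue % BLOOM_NBITS;
    uint32_t    h2 = (hashvalue * 0x9e3779b1U) % BLOOM_NBITS;

    bf->bits[h1 / 8] |= 1 << (h1 % 8);
    bf->bits[h2 / 8] |= 1 << (h2 % 8);
}

/* called before probing a bucket; false means "definitely no match" */
static bool
bloom_maybe_present(const BloomFilter *bf, uint32_t hashvalue)
{
    uint32_t    h1 = hashvalue % BLOOM_NBITS;
    uint32_t    h2 = (hashvalue * 0x9e3779b1U) % BLOOM_NBITS;

    return (bf->bits[h1 / 8] & (1 << (h1 % 8))) != 0 &&
           (bf->bits[h2 / 8] & (1 << (h2 % 8))) != 0;
}

Whether maintaining that bitmap actually beats simply having more (and
therefore shorter) buckets is exactly what I'd have to measure, especially
for joins where most outer tuples do find a match.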
> Also, I'd suggest you check the
> archives from about this time last year for test cases that I was
> using which showed cases where hashing the larger table was a better
> choice - those same cases may also show regression here (or at least
> would be something good to test).

Good idea, I'll look at the test cases - thanks.

> Have you tried to work out what a 'worst case' regression for this
> change would look like? Also, how does the planning around this
> change? Are we more likely now to hash the smaller table (I'd guess
> 'yes' just based on the reduction in NTUP_PER_BUCKET, but did you
> make any changes due to the rehashing cost?)?

The case I was thinking about is an underestimated cardinality of the
inner table combined with a small outer table. That leads to a large
hash table (so the rebuild has to touch a lot of tuples) but very few
lookups, so the rehash never pays for itself.

I.e. something like this:

  Hash Join
    Seq Scan on small_table (rows=100) (actual rows=100)
    Hash
      Seq Scan on bad_estimate (rows=100) (actual rows=1000000000)
        Filter: ((a < 100) AND (b < 100))

But I wasn't able to reproduce this in a reasonable way, because in
practice such a misestimate makes the planner pick a nested loop or
something like that (which is a planning issue, impossible to fix in the
hashjoin code).

Tomas
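PS: To make the "not worth it" intuition a bit more concrete, the
back-of-the-envelope comparison I have in mind looks about like this.
It's purely illustrative - the function and its inputs are made up, and
it's not what the patch actually does:

#include <stdbool.h>

/*
 * Rebuilding the bucket array re-links every tuple already in the hash
 * table, while the benefit is a shorter average bucket chain for each
 * outer-side probe that is still to come.
 */
static bool
rebuild_is_worthwhile(double inner_tuples, double outer_probes_remaining,
                      double nbuckets_old, double nbuckets_new)
{
    double      rebuild_cost = inner_tuples;
    double      saving_per_probe = (inner_tuples / nbuckets_old
                                    - inner_tuples / nbuckets_new) / 2.0;

    return outer_probes_remaining * saving_per_probe > rebuild_cost;
}

With only ~100 outer probes and a billion tuples stuffed into the hash
table, the savings never come anywhere near the rebuild cost - which is
exactly the regression I was trying (and failing) to reproduce above.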