Hi,

On 2017-03-01 10:39:11 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 9:33 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
> wrote:
> > On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <and...@anarazel.de> wrote:
> >> That's without the patch in
> >> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
> >> ? With that patch it should complete without that change, right?
> >>
> > No, it's with the patch in the above link, which is
> > 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But I've
> > kept SH_FILLFACTOR at 0.8 and SH_MAX_FILLFACTOR at 0.95. I'll do
> > another round of testing with the constants provided by you.
> >
> I've tested with the patch
> 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch and the
> results are the same, i.e., the hash table grows even when it is only
> 10-12% filled. I've attached the logfile for reference.
So, I observed similar things, where the *leader backend* reports a lot of
"long runs":

WARNING:  clumping, growing this thing
LOG:  size: 1048576, members: 158071, filled: 0.150748, total chain: 95622, max chain: 35, avg chain: 0.604931, total_collisions: 35909, max_collisions: 5, avg_collisions: 0.227170
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING:  clumping, growing this thing
LOG:  size: 2097152, members: 238992, filled: 0.113960, total chain: 1850141, max chain: 422, avg chain: 7.741435, total_collisions: 55363, max_collisions: 5, avg_collisions: 0.231652
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING:  clumping, growing this thing
LOG:  size: 4194304, members: 881089, filled: 0.210068, total chain: 567907, max chain: 29, avg chain: 0.644551, total_collisions: 174681, max_collisions: 6, avg_collisions: 0.198256
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING:  clumping, growing this thing
LOG:  size: 8388608, members: 5397649, filled: 0.643450, total chain: 5752690, max chain: 22, avg chain: 1.065777, total_collisions: 1437957, max_collisions: 8, avg_collisions: 0.266404
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;

The pertinent part is that there are "forced" resizes due to long runs at
fill factors of only 0.15, 0.11, 0.21 and 0.64. Note that there are also a
good number of "ordinary" resizes, and at the end there are 7500000 tuples
in the hashtable. I.e. there are a number of resizes after the last one
forced by clumping, all of them guided entirely by the fillfactor; the final
hash-table is entirely reasonably sized.

Trying to do the math, that just didn't sit well with me. There's just no
way there could be that many, and such long, runs without something being
broken. And indeed: thinking about this even more, I realized the reason
this happens is that b81b5a96f4 didn't actually address the problem of
inserting in hash-table order.

To understand, let's first look at the query plan:

 Finalize HashAggregate  (cost=899227.75..903176.55 rows=394880 width=4) (actual time=6255.957..7415.152 rows=7500000 loops=1)
   Group Key: l_orderkey
   ->  Gather  (cost=652427.75..893304.55 rows=2369280 width=4) (actual time=1741.706..3699.248 rows=7940881 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial HashAggregate  (cost=651427.75..655376.55 rows=394880 width=4) (actual time=1731.468..1956.693 rows=11344
               Group Key: l_orderkey
               ->  Parallel Seq Scan on lineitem  (cost=0.00..638927.80 rows=4999980 width=4) (actual time=0.025..711.771 rows
 Planning time: 0.192 ms
 Execution time: 7700.027 ms

The important part is the estimated rows=394880 vs. the actual rows=7500000.
That means the hash-table on the leader backend will first be created sized
suitably for 394880 rows - but we obviously get a lot more.

The problem then is that, even after introducing the hash_iv stuff
(b81b5a96f), we essentially still iterate over the tuples in hash order.
TupleHashTableHash() computes the hash for a single-column group condition
as:

	uint32 hashkey = hashtable->hash_iv;

	hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
	attr = slot_getattr(slot, att, &isNull);
	if (!isNull)			/* treat nulls as having hash key 0 */
	{
		uint32 hkey;

		hkey = DatumGetUInt32(FunctionCall1(&hashfunctions[i], attr));
		hashkey ^= hkey;
	}

The resulting hash-values aren't actually meaningfully influenced by the IV.
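To make that concrete, here's a tiny standalone C sketch (not PostgreSQL
code; the IV, the hash values and the table size are all made up): xor-ing
the same constant into two hash values that share a bucket can't separate
them, it only relocates them.

#include <stdint.h>
#include <stdio.h>

/*
 * Standalone illustration, not PostgreSQL code: xor-ing a constant IV into
 * hash values is a bijection, so two hashes that collide in a power-of-two
 * sized table before the xor still collide after it - the whole cluster
 * just moves to a different bucket.
 */
int
main(void)
{
	uint32_t	iv = 0xdeadbeef;	/* made-up IV; any constant behaves the same */
	uint32_t	mask = 1024 - 1;	/* bucket mask of a 1024-bucket table */

	/* two made-up hashes sharing their low-order bits, i.e. one bucket */
	uint32_t	h1 = 0x12345400;
	uint32_t	h2 = 0xabcde400;

	printf("before IV: bucket(h1)=%u bucket(h2)=%u\n",
		   (unsigned) (h1 & mask), (unsigned) (h2 & mask));
	printf("after  IV: bucket(h1)=%u bucket(h2)=%u\n",
		   (unsigned) ((h1 ^ iv) & mask), (unsigned) ((h2 ^ iv) & mask));
	/* prints the same bucket for h1 and h2, both before and after the xor */
	return 0;
}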
Because we just xor with the IV, most hash-values that would have fallen
into a single hash-bucket without the IV fall into a single hash-bucket
afterwards as well; just somewhere else in the hash-range. Which then leads
to the issue that we insert hundreds of rows into the leader's hash-table
that all fall into the same hash-bucket (the leader's hashtable is quite
small at this point, so its buckets end up with *far* more entries per
bucket than the buckets of the worker tables). Which then leads to
super-long runs.

As soon as the table gets to its "real" size, the issue of the overlong
runs is gone, because now the hash-table has the appropriate size for its
contents.

So that explains why there are a lot of resizes while the fillfactor is low
(as you've observed, in the 10-20% range) - in specific hash-ranges the
fillfactor is extremely high, but in other parts the table is effectively
empty.

The question remains what to do about that... I think it's pretty clear
that we need a "defense" against such unbalanced hashtables - now that I
understand how we can get into the situation of these being so unbalanced,
I'm a lot less uncomfortable adding the necessary logic.

In addition to that it seems quite worthwhile to provide an iterator that's
not vulnerable to this. An approach that I am, seemingly successfully,
testing is to iterate the hashtable in multiple (in my case 23, because why
not) passes, accessing only every nth element (a rough sketch of the idea is
appended below). That allows the data to be inserted in a lot less "dense"
fashion. But that's more an optimization, so I'll just push something like
the patch mentioned in the thread already.

Makes some sense?

- Andres
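For illustration, a rough standalone sketch of that multi-pass iteration
idea - simplified, made-up types and names, not the actual simplehash.h
iterator:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_PASSES 23			/* "23, because why not" */

/* made-up, simplified bucket layout - not simplehash.h's real one */
typedef struct
{
	bool		used;
	uint32_t	hash;
} Bucket;

/*
 * Walk the table in NUM_PASSES passes, visiting only every NUM_PASSES-th
 * bucket per pass, so a consumer inserting the visited elements into
 * another (smaller) hashtable doesn't receive them in dense hash order.
 */
static void
iterate_striped(Bucket *buckets, size_t nbuckets, void (*visit) (Bucket *))
{
	for (size_t pass = 0; pass < NUM_PASSES; pass++)
		for (size_t i = pass; i < nbuckets; i += NUM_PASSES)
			if (buckets[i].used)
				visit(&buckets[i]);
}

static void
print_bucket(Bucket *b)
{
	printf("%u\n", (unsigned) b->hash);
}

int
main(void)
{
	Bucket		table[100] = {{0}};

	/* occupy every third bucket; hash == bucket index for simplicity */
	for (size_t i = 0; i < 100; i += 3)
	{
		table[i].used = true;
		table[i].hash = (uint32_t) i;
	}

	/* output is striped across the table, not the dense order 0, 3, 6, ... */
	iterate_striped(table, 100, print_bucket);
	return 0;
}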