On Thu, Dec 8, 2016 at 5:23 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund <and...@anarazel.de> wrote: >> On 2016-11-18 08:00:40 -0500, Robert Haas wrote: >>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <and...@anarazel.de> wrote: >>> > I've a working fix for this, and for a similar issue Robert found. I'm >>> > still playing around with it, but basically the fix is to make the >>> > growth policy a bit more adaptive. >>> >>> Any chance you can post a patch soon? >> >> Here's my WIP series addressing this and related problems. With this >> we're again noticeably faster than the dynahash implementation, in both >> the case here, and the query you brought up over IM. >> >> This definitely needs some more TLC, but the general approach seems >> good. I particularly like that it apparently allows us to increase the >> default fillfactor without much downside according to my measurements. > > Are you going to commit something here? At least enough to make > Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite > time? Because the fact that it doesn't really sucks.
I took a look at Andres's patches today and saw that they can't really be applied as-is, because they "temporarily" change the master's ParallelWorkerNumber but have no provision for restoring it after an ERROR. I think that approach isn't right anyway; it seems to me that what we want to do is set hash_iv based on we're in a Partial HashAgg, not whether we're in a parallel worker. For instance, if we're somehow in a nodeSubplan.c there's no need for this just because we happen to be in a worker -- I think. That led me to develop the attached patch. This may not be perfect, but it causes TPC-H Q18 to complete instead of hanging forever, so I'm going to commit it RSN unless there are loud objections combined with promising steps toward some alternative fix. It's been over a month since these problems were reported, and it is not good that the tree has been in a broken state for that entire time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c index 94cc59d..3149fbe 100644 --- a/src/backend/executor/execGrouping.c +++ b/src/backend/executor/execGrouping.c @@ -18,6 +18,8 @@ */ #include "postgres.h" +#include "access/hash.h" +#include "access/parallel.h" #include "executor/executor.h" #include "miscadmin.h" #include "utils/lsyscache.h" @@ -289,7 +291,8 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx, FmgrInfo *eqfunctions, FmgrInfo *hashfunctions, long nbuckets, Size additionalsize, - MemoryContext tablecxt, MemoryContext tempcxt) + MemoryContext tablecxt, MemoryContext tempcxt, + bool use_variable_hash_iv) { TupleHashTable hashtable; Size entrysize = sizeof(TupleHashEntryData) + additionalsize; @@ -314,6 +317,19 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx, hashtable->in_hash_funcs = NULL; hashtable->cur_eq_funcs = NULL; + /* + * If parallelism is in use, even if the master backend is performing the + * scan itself, we don't want to create the hashtable exactly the same way + * in all workers. As hashtables are iterated over in keyspace-order, + * doing so in all processes in the same way is likely to lead to + * "unbalanced" hashtables when the table size initially is + * underestimated. + */ + if (use_variable_hash_iv) + hashtable->hash_iv = hash_uint32(ParallelWorkerNumber); + else + hashtable->hash_iv = 0; + hashtable->hashtab = tuplehash_create(tablecxt, nbuckets); hashtable->hashtab->private_data = hashtable; @@ -450,7 +466,7 @@ TupleHashTableHash(struct tuplehash_hash *tb, const MinimalTuple tuple) TupleHashTable hashtable = (TupleHashTable) tb->private_data; int numCols = hashtable->numCols; AttrNumber *keyColIdx = hashtable->keyColIdx; - uint32 hashkey = 0; + uint32 hashkey = hashtable->hash_iv; TupleTableSlot *slot; FmgrInfo *hashfunctions; int i; diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index eefb3d6..a093862 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -1723,7 +1723,8 @@ build_hash_table(AggState *aggstate) node->numGroups, additionalsize, aggstate->aggcontexts[0]->ecxt_per_tuple_memory, - tmpmem); + tmpmem, + !DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit)); } /* diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c index acded07..5b734c0 100644 --- a/src/backend/executor/nodeRecursiveunion.c +++ b/src/backend/executor/nodeRecursiveunion.c @@ -43,7 +43,8 @@ build_hash_table(RecursiveUnionState *rustate) node->numGroups, 0, rustate->tableContext, - rustate->tempContext); + rustate->tempContext, + false); } diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c index e94555e..760b935 100644 --- a/src/backend/executor/nodeSetOp.c +++ b/src/backend/executor/nodeSetOp.c @@ -130,7 +130,8 @@ build_hash_table(SetOpState *setopstate) node->numGroups, 0, setopstate->tableContext, - setopstate->tempContext); + setopstate->tempContext, + false); } /* diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index 8ca8fc4..d343600 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -510,7 +510,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext) nbuckets, 0, node->hashtablecxt, - node->hashtempcxt); + node->hashtempcxt, + false); if (!subplan->unknownEqFalse) { @@ -529,7 +530,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext) nbuckets, 0, node->hashtablecxt, - node->hashtempcxt); + node->hashtempcxt, + false); } /* diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index b4d09f9..3f649fa 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -143,7 +143,7 @@ extern TupleHashTable BuildTupleHashTable(int numCols, AttrNumber *keyColIdx, FmgrInfo *hashfunctions, long nbuckets, Size additionalsize, MemoryContext tablecxt, - MemoryContext tempcxt); + MemoryContext tempcxt, bool use_variable_hash_iv); extern TupleHashEntry LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot, bool *isnew); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 1de5c81..703604a 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -533,6 +533,7 @@ typedef struct TupleHashTableData TupleTableSlot *inputslot; /* current input tuple's slot */ FmgrInfo *in_hash_funcs; /* hash functions for input datatype(s) */ FmgrInfo *cur_eq_funcs; /* equality functions for input vs. table */ + uint32 hash_iv; /* hash-function IV */ } TupleHashTableData; typedef tuplehash_iterator TupleHashIterator;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers