Hi,

On 2019-06-13 16:23:34 -0700, Andres Freund wrote:
> On June 13, 2019 3:38:47 PM PDT, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >Andres Freund <and...@anarazel.de> writes:
> >> I am too tired to look further into this. I suspect the only reason
> >we
> >> didn't previously run into trouble with the executor stashing
> >hashkeys
> >> manually at a different tree level with:
> >> ((HashState *) innerPlanState(hjstate))->hashkeys
> >> is that hashkeys itself isn't printed...
> >
> >TBH, I think 5f32b29c is just wrong and should be reverted for now.
> >If there's a need to handle those expressions differently, it will
> >require some cooperation from the planner not merely a two-line hack
> >in executor startup.  That commit didn't include any test case or
> >other demonstration that it was solving a live problem, so I think
> >we can leave it for v13 to address the issue.
> 
> I'm pretty sure you'd get an assertion failure if you reverted it
> (that's why it was added). So it's a bit more complicated than that.
> Unfortunately I'll not get back to work until Monday, but I'll spend
> time on this then.

Indeed, there are assertion failures when initializing the expression
with HashJoinState as parent - that's because when computing the
hashvalue for nodeHash input, we expect the slot from the node below to
be of the type that HashState returns (as that's what INNER_VAR for an
expression at the HashJoin level refers to), rather than the type of the
input to HashState.  We could work around that by marking the slots from
underlying nodes as being of an unknown type, but that'd slow down
execution.

I briefly played with the dirty hack of set_deparse_planstate()
setting dpns->inner_planstate = ps for IsA(ps, HashState), but that
seems just too ugly.

I think the most straight-forward fix might just be to just properly
split the expression at plan time. Adding workarounds for things as
dirty as building an expression for a subsidiary node in the parent, and
then modifying the subsidiary node from the parent, doesn't seem like a
better way forward.

The attached *prototype* does so.

If we go that way, we probably need to:
- Add a test for the failure case at hand
- check a few of the comments around inner/outer in nodeHash.c
- consider moving the setrefs.c code into its own function?
- probably clean up the naming scheme in createplan.c

I think there's a few more things we could do, although it's not clear
that that needs to happen in v12:
- Consider not extracting hj_OuterHashKeys, hj_HashOperators,
  hj_Collations out of HashJoin->hashclauses, and instead just directly
  handing them individually in the planner.  create_mergejoin_plan()
  already partially does that.

Greetings,

Andres Freund

PS: If I were to write hashjoin today, it sure wouldn't be as two nodes
- it seems pretty clear that the boundaries are just too fuzzy. To the
point that I wonder if it'd not be worth merging them at some point.
>From ba351470f7d6837a4a86c24ce87ee33919314a77 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 17 Jun 2019 23:59:38 -0700
Subject: [PATCH v1] wip-fix-hash-key-computation

---
 src/backend/executor/nodeHash.c         | 19 +++++++++++--------
 src/backend/executor/nodeHashjoin.c     | 18 +++---------------
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c | 13 +++++++++++++
 src/backend/optimizer/plan/setrefs.c    | 16 ++++++++++++++++
 src/include/nodes/execnodes.h           |  3 ---
 src/include/nodes/plannodes.h           |  1 +
 9 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index d16120b9c48..181f45a5b1c 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -157,7 +157,8 @@ MultiExecPrivateHash(HashState *node)
 	econtext = node->ps.ps_ExprContext;
 
 	/*
-	 * get all inner tuples and insert into the hash table (or temp files)
+	 * Get all tuples from subsidiary node and insert into the hash table (or
+	 * temp files).
 	 */
 	for (;;)
 	{
@@ -165,7 +166,7 @@ MultiExecPrivateHash(HashState *node)
 		if (TupIsNull(slot))
 			break;
 		/* We have to compute the hash value */
-		econtext->ecxt_innertuple = slot;
+		econtext->ecxt_outertuple = slot;
 		if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
 								 false, hashtable->keepNulls,
 								 &hashvalue))
@@ -281,7 +282,7 @@ MultiExecParallelHash(HashState *node)
 				slot = ExecProcNode(outerNode);
 				if (TupIsNull(slot))
 					break;
-				econtext->ecxt_innertuple = slot;
+				econtext->ecxt_outertuple = slot;
 				if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
 										 false, hashtable->keepNulls,
 										 &hashvalue))
@@ -388,8 +389,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	/*
 	 * initialize child expressions
 	 */
-	hashstate->ps.qual =
-		ExecInitQual(node->plan.qual, (PlanState *) hashstate);
+	hashstate->hashkeys =
+		ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
 
 	return hashstate;
 }
@@ -1773,9 +1774,11 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
  * ExecHashGetHashValue
  *		Compute the hash value for a tuple
  *
- * The tuple to be tested must be in either econtext->ecxt_outertuple or
- * econtext->ecxt_innertuple.  Vars in the hashkeys expressions should have
- * varno either OUTER_VAR or INNER_VAR.
+ * The tuple to be tested must be in econtext->ecxt_outertuple and Vars in the
+ * hashkeys expressions should have a varno of OUTER_VAR (but note that the
+ * econtext, expression, and slot might belong to the HashJoin's outer rel, or
+ * be from the Hash, and thus refer to the join's outer/inner side
+ * respectively).
  *
  * A true result means the tuple's hash value has been successfully computed
  * and stored at *hashvalue.  A false result means the tuple cannot match
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 8484a287e73..f9262b1aa38 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -601,8 +601,6 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	Plan	   *outerNode;
 	Hash	   *hashNode;
 	List	   *lclauses;
-	List	   *rclauses;
-	List	   *rhclauses;
 	List	   *hoperators;
 	List	   *hcollations;
 	TupleDesc	outerDesc,
@@ -731,14 +729,11 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate->hj_CurTuple = NULL;
 
 	/*
-	 * Deconstruct the hash clauses into outer and inner argument values, so
-	 * that we can evaluate those subexpressions separately.  Also make a list
-	 * of the hash operator OIDs, in preparation for looking up the hash
-	 * functions to use.
+	 * Deconstruct the hash clauses into outer argument values, so that we can
+	 * evaluate those subexpressions separately.  Also make a list of the hash
+	 * operator OIDs, in preparation for looking up the hash functions to use.
 	 */
 	lclauses = NIL;
-	rclauses = NIL;
-	rhclauses = NIL;
 	hoperators = NIL;
 	hcollations = NIL;
 	foreach(l, node->hashclauses)
@@ -747,19 +742,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 
 		lclauses = lappend(lclauses, ExecInitExpr(linitial(hclause->args),
 												  (PlanState *) hjstate));
-		rclauses = lappend(rclauses, ExecInitExpr(lsecond(hclause->args),
-												  (PlanState *) hjstate));
-		rhclauses = lappend(rhclauses, ExecInitExpr(lsecond(hclause->args),
-													innerPlanState(hjstate)));
 		hoperators = lappend_oid(hoperators, hclause->opno);
 		hcollations = lappend_oid(hcollations, hclause->inputcollid);
 	}
 	hjstate->hj_OuterHashKeys = lclauses;
-	hjstate->hj_InnerHashKeys = rclauses;
 	hjstate->hj_HashOperators = hoperators;
 	hjstate->hj_Collations = hcollations;
-	/* child Hash node needs to evaluate inner hash keys, too */
-	((HashState *) innerPlanState(hjstate))->hashkeys = rhclauses;
 
 	hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
 	hjstate->hj_MatchedOuter = false;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 78deade89b4..65098f04712 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1066,6 +1066,7 @@ _copyHash(const Hash *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_NODE_FIELD(hashkeys);
 	COPY_SCALAR_FIELD(skewTable);
 	COPY_SCALAR_FIELD(skewColumn);
 	COPY_SCALAR_FIELD(skewInherit);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8400dd319e2..567a26fd636 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -863,6 +863,7 @@ _outHash(StringInfo str, const Hash *node)
 
 	_outPlanInfo(str, (const Plan *) node);
 
+	WRITE_NODE_FIELD(hashkeys);
 	WRITE_OID_FIELD(skewTable);
 	WRITE_INT_FIELD(skewColumn);
 	WRITE_BOOL_FIELD(skewInherit);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6c2626ee62b..6ae136e0bb3 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2274,6 +2274,7 @@ _readHash(void)
 
 	ReadCommonPlan(&local_node->plan);
 
+	READ_NODE_FIELD(hashkeys);
 	READ_OID_FIELD(skewTable);
 	READ_INT_FIELD(skewColumn);
 	READ_BOOL_FIELD(skewInherit);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 608d5adfed2..f7d486ed8e1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -225,6 +225,7 @@ static HashJoin *make_hashjoin(List *tlist,
 							   Plan *lefttree, Plan *righttree,
 							   JoinType jointype, bool inner_unique);
 static Hash *make_hash(Plan *lefttree,
+					   List *hashkeys,
 					   Oid skewTable,
 					   AttrNumber skewColumn,
 					   bool skewInherit);
@@ -4381,9 +4382,11 @@ create_hashjoin_plan(PlannerInfo *root,
 	List	   *joinclauses;
 	List	   *otherclauses;
 	List	   *hashclauses;
+	List	   *hashkeys = NIL;
 	Oid			skewTable = InvalidOid;
 	AttrNumber	skewColumn = InvalidAttrNumber;
 	bool		skewInherit = false;
+	ListCell	*lc;
 
 	/*
 	 * HashJoin can project, so we don't have to demand exact tlists from the
@@ -4475,10 +4478,18 @@ create_hashjoin_plan(PlannerInfo *root,
 		}
 	}
 
+	foreach(lc, hashclauses)
+	{
+		OpExpr	   *hclause = (OpExpr *) lfirst(lc);
+
+		hashkeys = lappend(hashkeys, lsecond(hclause->args));
+	}
+
 	/*
 	 * Build the hash node and hash join node.
 	 */
 	hash_plan = make_hash(inner_plan,
+						  hashkeys,
 						  skewTable,
 						  skewColumn,
 						  skewInherit);
@@ -5568,6 +5579,7 @@ make_hashjoin(List *tlist,
 
 static Hash *
 make_hash(Plan *lefttree,
+		  List *hashkeys,
 		  Oid skewTable,
 		  AttrNumber skewColumn,
 		  bool skewInherit)
@@ -5580,6 +5592,7 @@ make_hash(Plan *lefttree,
 	plan->lefttree = lefttree;
 	plan->righttree = NULL;
 
+	node->hashkeys = hashkeys;
 	node->skewTable = skewTable;
 	node->skewColumn = skewColumn;
 	node->skewInherit = skewInherit;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dc11f098e0f..1b0564337f9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -646,6 +646,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			break;
 
 		case T_Hash:
+			{
+				Hash       *hplan = (Hash *) plan;
+				Plan	   *outer_plan = plan->lefttree;
+				indexed_tlist *outer_itlist;
+
+				outer_itlist = build_tlist_index(outer_plan->targetlist);
+
+				hplan->hashkeys = (List *)
+					fix_upper_expr(root,
+								   (Node *) hplan->hashkeys,
+								   outer_itlist,
+								   OUTER_VAR,
+								   rtoffset);
+			}
+			/* fall through */
+
 		case T_Material:
 		case T_Sort:
 		case T_Unique:
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 99b9fa414f1..4c4194617ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1852,7 +1852,6 @@ typedef struct MergeJoinState
  *
  *		hashclauses				original form of the hashjoin condition
  *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_InnerHashKeys		the inner hash keys in the hashjoin condition
  *		hj_HashOperators		the join operators in the hashjoin condition
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
@@ -1883,7 +1882,6 @@ typedef struct HashJoinState
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
 	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_InnerHashKeys;	/* list of ExprState nodes */
 	List	   *hj_HashOperators;	/* list of operator OIDs */
 	List	   *hj_Collations;
 	HashJoinTable hj_HashTable;
@@ -2222,7 +2220,6 @@ typedef struct HashState
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
 	List	   *hashkeys;		/* list of ExprState nodes */
-	/* hashkeys is same as parent's hj_InnerHashKeys */
 
 	SharedHashInfo *shared_info;	/* one entry per worker */
 	HashInstrumentation *hinstrument;	/* this worker's entry */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 70f8b8e22b7..f1bce9eb66c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -904,6 +904,7 @@ typedef struct Hash
 	bool		skewInherit;	/* is outer join rel an inheritance tree? */
 	/* all other info is in the parent HashJoin node */
 	double		rows_total;		/* estimate total rows if parallel_aware */
+	List	   *hashkeys;		/* hash keys from the hashjoin condition */
 } Hash;
 
 /* ----------------
-- 
2.22.0.dirty

Reply via email to