I wrote: > For a query like this, where the hash join is being done repeatedly, > it might be useful for the executor itself to track how often each > subplan has been seen to be empty.
I implemented a simple form of this, and it made 8.1 faster than 8.0 on the test case I was using. Give it a try ... regards, tom lane Index: src/backend/executor/nodeHashjoin.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v retrieving revision 1.75.2.2 diff -c -r1.75.2.2 nodeHashjoin.c *** src/backend/executor/nodeHashjoin.c 28 Nov 2005 17:14:47 -0000 1.75.2.2 --- src/backend/executor/nodeHashjoin.c 28 Nov 2005 23:41:28 -0000 *************** *** 120,135 **** * since we aren't going to be able to skip the join on the strength * of an empty inner relation anyway.) * * The only way to make the check is to try to fetch a tuple from the * outer plan node. If we succeed, we have to stash it away for later * consumption by ExecHashJoinOuterGetTuple. */ ! if (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost || ! node->js.jointype == JOIN_LEFT) { node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode); if (TupIsNull(node->hj_FirstOuterTupleSlot)) return NULL; } else node->hj_FirstOuterTupleSlot = NULL; --- 120,147 ---- * since we aren't going to be able to skip the join on the strength * of an empty inner relation anyway.) * + * If we are rescanning the join, we make use of information gained + * on the previous scan: don't bother to try the prefetch if the + * previous scan found the outer relation nonempty. This is not + * 100% reliable since with new parameters the outer relation might + * yield different results, but it's a good heuristic. + * * The only way to make the check is to try to fetch a tuple from the * outer plan node. If we succeed, we have to stash it away for later * consumption by ExecHashJoinOuterGetTuple. */ ! if (node->js.jointype == JOIN_LEFT || ! (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost && ! !node->hj_OuterNotEmpty)) { node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode); if (TupIsNull(node->hj_FirstOuterTupleSlot)) + { + node->hj_OuterNotEmpty = false; return NULL; + } + else + node->hj_OuterNotEmpty = true; } else node->hj_FirstOuterTupleSlot = NULL; *************** *** 159,164 **** --- 171,183 ---- * scanning the outer relation */ hashtable->nbatch_outstart = hashtable->nbatch; + + /* + * Reset OuterNotEmpty for scan. (It's OK if we fetched a tuple + * above, because ExecHashJoinOuterGetTuple will immediately + * set it again.) + */ + node->hj_OuterNotEmpty = false; } /* *************** *** 454,459 **** --- 473,479 ---- hjstate->js.ps.ps_TupFromTlist = false; hjstate->hj_NeedNewOuter = true; hjstate->hj_MatchedOuter = false; + hjstate->hj_OuterNotEmpty = false; return hjstate; } *************** *** 546,551 **** --- 566,574 ---- *hashvalue = ExecHashGetHashValue(hashtable, econtext, hjstate->hj_OuterHashKeys); + /* remember outer relation is not empty for possible rescan */ + hjstate->hj_OuterNotEmpty = true; + return slot; } *************** *** 810,816 **** if (node->hj_HashTable->nbatch == 1 && ((PlanState *) node)->righttree->chgParam == NULL) { ! /* okay to reuse the hash table; needn't rescan inner, either */ } else { --- 833,851 ---- if (node->hj_HashTable->nbatch == 1 && ((PlanState *) node)->righttree->chgParam == NULL) { ! /* ! * okay to reuse the hash table; needn't rescan inner, either. ! * ! * What we do need to do is reset our state about the emptiness ! * of the outer relation, so that the new scan of the outer will ! * update it correctly if it turns out to be empty this time. ! * (There's no harm in clearing it now because ExecHashJoin won't ! * need the info. In the other cases, where the hash table ! * doesn't exist or we are destroying it, we leave this state ! * alone because ExecHashJoin will need it the first time ! * through.) ! */ ! node->hj_OuterNotEmpty = false; } else { Index: src/include/nodes/execnodes.h =================================================================== RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v retrieving revision 1.139.2.2 diff -c -r1.139.2.2 execnodes.h *** src/include/nodes/execnodes.h 22 Nov 2005 18:23:28 -0000 1.139.2.2 --- src/include/nodes/execnodes.h 28 Nov 2005 23:41:28 -0000 *************** *** 1101,1106 **** --- 1101,1107 ---- * hj_FirstOuterTupleSlot first tuple retrieved from outer plan * hj_NeedNewOuter true if need new outer tuple on next call * hj_MatchedOuter true if found a join match for current outer + * hj_OuterNotEmpty true if outer relation known not empty * ---------------- */ *************** *** 1125,1130 **** --- 1126,1132 ---- TupleTableSlot *hj_FirstOuterTupleSlot; bool hj_NeedNewOuter; bool hj_MatchedOuter; + bool hj_OuterNotEmpty; } HashJoinState; ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq