On 12/11/19 8:49 PM, Tom Lane wrote:
> Andrey Lepikhov <a.lepik...@postgrespro.ru> writes:
>> During NestLoop execution we have a bad corner case: if the outer subtree
>> contains tuples, the join node rescans the inner subtree for each of them,
>> even if the inner subtree does not return any tuples.
> So the first question about corner-case optimizations like this is always
> "how much overhead does it add in the normal case where it fails to gain
> anything?". I see no performance numbers in your proposal.
I thought it was trivial, but a quick check shows no measurable
difference.
> I do not much like anything about the code, either: as written it's
> only helpful for an especially narrow corner case (so narrow that
> I wonder if it really ever helps at all: surely calling a nodeMaterial
> whose tuplestore is empty doesn't cost much).
Scanning a large outer subtree can be very costly. If you experiment with
analytical queries, you can find cases where a nested loop materializes
zero tuples; finding gaps in data is at least one such case.
Also, hash join already implements this optimization.
> But that doesn't stop it
> from adding a bool to the generic PlanState struct, with global
> implications. What I'd expected from your text description is that
> nodeNestLoop would remember whether its inner child had returned zero rows
> the first time, and assume that subsequent executions could be skipped
> unless the inner child's parameters change.
This is the remark I was waiting for. I agree with you that adding a bool
field to PlanState is excessive. Attached is another version of
the optimization.
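To make the cost argument concrete, here is a minimal, self-contained C toy model (not PostgreSQL executor code; the type and function names are hypothetical) of an unparameterized nested-loop inner join whose inner side returns the same number of rows on every rescan. It counts outer fetches and inner rescans with and without an early exit that remembers whether the inner side has ever produced a row, in the spirit of the nl_InnerEmpty flag in the attached patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters reported by the toy model (hypothetical type, not PostgreSQL). */
typedef struct ToyNestLoopStats
{
    int outer_fetched;   /* outer rows pulled from the outer side */
    int inner_rescans;   /* times the inner side was (re)scanned */
    int output_rows;     /* joined rows emitted */
} ToyNestLoopStats;

/*
 * Toy inner join: every outer row pairs with every inner row, and the inner
 * side returns n_inner rows on every rescan (it takes no parameters from the
 * outer side).  With stop_if_inner_empty set, the loop remembers whether the
 * inner side has ever produced a row and stops reading outer rows once the
 * inner side has come back empty.
 */
static ToyNestLoopStats
toy_nestloop_inner_join(int n_outer, int n_inner, bool stop_if_inner_empty)
{
    ToyNestLoopStats stats = {0, 0, 0};
    bool inner_empty = true;    /* no inner row seen yet */

    assert(n_outer >= 0 && n_inner >= 0);

    for (int o = 0; o < n_outer; o++)
    {
        stats.outer_fetched++;
        stats.inner_rescans++;  /* rescan the inner side for this outer row */

        for (int i = 0; i < n_inner; i++)
        {
            inner_empty = false;
            stats.output_rows++;
        }

        /*
         * Inner side exhausted for this outer row.  An inner join emits
         * nothing for unmatched outer rows, and in this model an inner side
         * that was empty once stays empty, so no further output is possible.
         */
        if (stop_if_inner_empty && inner_empty)
            break;
    }
    return stats;
}
```

With 1000 outer rows and an empty inner side, `toy_nestloop_inner_join(1000, 0, false)` reports 1000 outer fetches and 1000 inner rescans, while `toy_nestloop_inner_join(1000, 0, true)` stops after one of each. The model assumes the inner side returns the same rows on every rescan; the patch targets that situation by requiring an empty nestParams list.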
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
From a92617b82d922d5ebac7342bd8c212e0eb5b4553 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepik...@postgrespro.ru>
Date: Mon, 9 Dec 2019 18:25:04 +0500
Subject: [PATCH] Skip scan of outer subtree if inner of the NestedLoop node is
guaranteed empty.
---
src/backend/executor/nodeNestloop.c | 8 ++++++++
src/include/nodes/execnodes.h | 1 +
src/test/regress/expected/partition_prune.out | 8 ++++----
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc6667ef82..4a7da5406d 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -164,6 +164,11 @@ ExecNestLoop(PlanState *pstate)
{
ENL1_printf("no inner tuple, need new outer tuple");
+ if (node->nl_InnerEmpty && list_length(nl->nestParams) == 0 &&
+ (node->js.jointype == JOIN_INNER ||
+ node->js.jointype == JOIN_SEMI))
+ return NULL;
+
node->nl_NeedNewOuter = true;
if (!node->nl_MatchedOuter &&
@@ -200,6 +205,8 @@ ExecNestLoop(PlanState *pstate)
*/
continue;
}
+ else
+ node->nl_InnerEmpty = false;
/*
* at this point we have a new pair of inner and outer tuples so we
@@ -327,6 +334,7 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ nlstate->nl_InnerEmpty = true;
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 692438d6df..8829433347 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1847,6 +1847,7 @@ typedef struct NestLoopState
JoinState js; /* its first field is NodeTag */
bool nl_NeedNewOuter;
bool nl_MatchedOuter;
+ bool nl_InnerEmpty;
TupleTableSlot *nl_NullInnerTupleSlot;
} NestLoopState;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f9eeda60e6..04cfe0944e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2455,9 +2455,9 @@ update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
Heap Blocks: exact=1
-> Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
Index Cond: (a = 1)
- -> Bitmap Heap Scan on ab_a1_b3 ab_2 (actual rows=0 loops=1)
+ -> Bitmap Heap Scan on ab_a1_b3 ab_2 (never executed)
Recheck Cond: (a = 1)
- -> Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
+ -> Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
Index Cond: (a = 1)
-> Materialize (actual rows=0 loops=1)
-> Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
@@ -2496,9 +2496,9 @@ update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
Heap Blocks: exact=1
-> Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
Index Cond: (a = 1)
- -> Bitmap Heap Scan on ab_a1_b3 ab_2 (actual rows=0 loops=1)
+ -> Bitmap Heap Scan on ab_a1_b3 ab_2 (never executed)
Recheck Cond: (a = 1)
- -> Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+ -> Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
Index Cond: (a = 1)
-> Materialize (actual rows=0 loops=1)
-> Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
--
2.17.1
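The guard added to ExecNestLoop only fires for JOIN_INNER and JOIN_SEMI, and only when `list_length(nl->nestParams) == 0`. A minimal C sketch of the reasoning behind that guard follows (hypothetical names, not PostgreSQL code). When the inner side is empty, inner and semi joins emit nothing, whereas left and anti joins still emit one row per outer row, so stopping the outer scan early would drop their output. A nonempty nestParams list means the inner side is re-evaluated with values taken from each outer row, so one empty result says nothing about the next.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the four JoinType values the patch distinguishes. */
typedef enum ToyJoinType
{
    TOY_JOIN_INNER,
    TOY_JOIN_SEMI,
    TOY_JOIN_LEFT,
    TOY_JOIN_ANTI
} ToyJoinType;

/*
 * Same shape as the patch's guard: stop early only if the inner side has
 * produced no rows, the nestloop passes no parameters to it, and the join
 * type cannot emit an outer row without an inner match.
 */
static bool
toy_can_stop_early(bool inner_empty, int n_nest_params, ToyJoinType jointype)
{
    return inner_empty && n_nest_params == 0 &&
        (jointype == TOY_JOIN_INNER || jointype == TOY_JOIN_SEMI);
}

/* Rows each join type emits for n_outer outer rows when the inner side is empty. */
static int
toy_rows_with_empty_inner(int n_outer, ToyJoinType jointype)
{
    assert(n_outer >= 0);
    switch (jointype)
    {
        case TOY_JOIN_INNER:
        case TOY_JOIN_SEMI:
            return 0;           /* no inner match, so no output */
        case TOY_JOIN_LEFT:     /* each outer row is null-extended */
        case TOY_JOIN_ANTI:     /* each outer row lacks a match, so it is kept */
            return n_outer;
    }
    return -1;                  /* not reached for valid join types */
}
```

For example, `toy_rows_with_empty_inner(5, TOY_JOIN_LEFT)` is 5, which is why `toy_can_stop_early(true, 0, TOY_JOIN_LEFT)` must be false.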