On 12/10/21 00:51, Arne Roland wrote:
Hi,
thanks for the reply!
From: Tomas Vondra <tomas.von...@enterprisedb.com>
Sent: Thursday, December 2, 2021 20:58
Subject: Re: PATCH: generate fractional cheapest paths in
generate_orderedappend_path
> [...]
> Well, I mentioned three open questions in my first message, and I don't
> think we've really addressed them yet. We've agreed we don't need to add
> the incremental sort here, but that leaves
>
>
> 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we
> default to cheapest_startup or cheapest_total?
I think it's reasonable to use cheapest_total like we are doing now. I
hardly see any reason to change it.
True, it's a reasonable first step.
Either we generate the same plan as today (with cheapest_total), or
maybe a better one (if we find a cheaper fractional path with matching
pathkeys). It's true we could do better, but that's life - it's not like
we consider every possible path everywhere.
The incremental sort case you mentioned, seems like the only case that
plan might be interesting. If we really want that to happen, we probably
should check for that separately, i.e. having startup_fractional. Even
though this is a fairly special case as it's mostly relevant for
partitionwise joins, I'm still not convinced it's worth the cpu cycles.
The point is that in most cases factional and startup_fractional will be
the same anyways.
I don't think we can simply check for startup_fractional (although, I'm
not sure I fully understand what would that be). I think what we should
really do in this case is walking all the paths, ensuring it's properly
sorted (with either full or incremental sort), and then pick the
cheapest fractional path from these sorted paths. But I agree that seems
pretty expensive.
And I suspect, even if they aren't, solving that from an application
developers point of view, is in most cases not that difficult. One can
usually match the pathkey. I personally had a lot of real world issues
with missing fractional paths using primary keys. I think it's worth
noting that everything will likely match the partition keys anyways,
because otherwise there is no chance of doing a merge append.
Yeah, I think you're right.
If I am not mistaken, in case we end up doing a full sort, the
cheapest_total path should be completely sufficient.
Definitely true.
> 2) Should get_cheapest_fractional_path_for_pathkeys worry about
> require_parallel_safe? I think yes, but we need to update the patch.
I admit, that such a version of
get_cheapest_fractional_path_for_pathkeys could be consistent with other
functions. And I was confused about that before. But I am not sure what
to use require_parallel_safe for. build_minmax_path doesn't care, they
are never parallel safe. And afaict generate_orderedappend_paths cares
neither, it considers all plans. For instance when it calls
get_cheapest_path_for_pathkeys, it sets require_parallel_safe just to
false as well.
True as well. It's looks a bit strange, but you're right neither place
cares about parallel safety.
> I'll take a closer look next week, once I get home from NYC, and I'll
> submit an improved version for the January CF.
Thank you for your work! The current patch, apart from the
comments/regression tests, seems pretty reasonable to me.
Attached is a cleaned-up patch, with a simple regression test. I'll mark
this as RFC and get it committed in a couple days.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 296dd75c1b..4a5f52287e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1716,6 +1716,7 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *pathkeys = (List *) lfirst(lcp);
List *startup_subpaths = NIL;
List *total_subpaths = NIL;
+ List *fractional_subpaths = NIL;
bool startup_neq_total = false;
ListCell *lcr;
bool match_partition_order;
@@ -1745,7 +1746,8 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
{
RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
Path *cheapest_startup,
- *cheapest_total;
+ *cheapest_total,
+ *cheapest_fractional = NULL;
/* Locate the right paths, if they are available. */
cheapest_startup =
@@ -1773,6 +1775,33 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
Assert(cheapest_total->param_info == NULL);
}
+ /*
+ * When needed (building fractional path), determine the cheapest
+ * fractional path too.
+ */
+ if (root->tuple_fraction > 0)
+ {
+ double path_fraction = (1.0 / root->tuple_fraction);
+
+ cheapest_fractional =
+ get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
+ pathkeys,
+ NULL,
+ path_fraction);
+
+ /*
+ * If we found no path with matching pathkeys, use the cheapest
+ * total path instead.
+ *
+ * XXX We might be smarter and consider partially sorted paths,
+ * with just an incremental sort on top. That might be cheaper
+ * than using cheapest_total with regular sort, but we'd have
+ * to build all the incremental paths.
+ */
+ if (!cheapest_fractional)
+ cheapest_fractional = cheapest_total;
+ }
+
/*
* Notice whether we actually have different paths for the
* "cheapest" and "total" cases; frequently there will be no point
@@ -1799,6 +1828,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
startup_subpaths = lappend(startup_subpaths, cheapest_startup);
total_subpaths = lappend(total_subpaths, cheapest_total);
+
+ if (cheapest_fractional)
+ {
+ cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+ fractional_subpaths = lappend(fractional_subpaths, cheapest_fractional);
+ }
}
else if (match_partition_order_desc)
{
@@ -1812,6 +1847,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
startup_subpaths = lcons(cheapest_startup, startup_subpaths);
total_subpaths = lcons(cheapest_total, total_subpaths);
+
+ if (cheapest_fractional)
+ {
+ cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+ fractional_subpaths = lcons(cheapest_fractional, fractional_subpaths);
+ }
}
else
{
@@ -1823,6 +1864,10 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
&startup_subpaths, NULL);
accumulate_append_subpath(cheapest_total,
&total_subpaths, NULL);
+
+ if (cheapest_fractional)
+ accumulate_append_subpath(cheapest_fractional,
+ &fractional_subpaths, NULL);
}
}
@@ -1849,6 +1894,17 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
0,
false,
-1));
+
+ if (fractional_subpaths)
+ add_path(rel, (Path *) create_append_path(root,
+ rel,
+ fractional_subpaths,
+ NIL,
+ pathkeys,
+ NULL,
+ 0,
+ false,
+ -1));
}
else
{
@@ -1864,6 +1920,13 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
total_subpaths,
pathkeys,
NULL));
+
+ if (fractional_subpaths)
+ add_path(rel, (Path *) create_merge_append_path(root,
+ rel,
+ fractional_subpaths,
+ pathkeys,
+ NULL));
}
}
}
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 27f7525b3e..bb5b7c47a4 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4862,3 +4862,51 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2
1 | 209 | 0009 | 1 | 209 | 0009
(8 rows)
+-- partitionwise join with fractional paths
+CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+-- insert data
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_t;
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Limit
+ -> Merge Append
+ Sort Key: x.id
+ -> Merge Left Join
+ Merge Cond: (x_1.id = y_1.id)
+ -> Index Only Scan using fract_t0_pkey on fract_t0 x_1
+ -> Index Only Scan using fract_t0_pkey on fract_t0 y_1
+ -> Merge Left Join
+ Merge Cond: (x_2.id = y_2.id)
+ -> Index Only Scan using fract_t1_pkey on fract_t1 x_2
+ -> Index Only Scan using fract_t1_pkey on fract_t1 y_2
+(11 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Limit
+ -> Merge Append
+ Sort Key: x.id DESC
+ -> Nested Loop Left Join
+ -> Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1
+ -> Index Only Scan using fract_t0_pkey on fract_t0 y_1
+ Index Cond: (id = x_1.id)
+ -> Nested Loop Left Join
+ -> Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2
+ -> Index Only Scan using fract_t1_pkey on fract_t1 y_2
+ Index Cond: (id = x_2.id)
+(11 rows)
+
+-- cleanup
+DROP TABLE fract_t;
+RESET max_parallel_workers_per_gather;
+RESET enable_partitionwise_join;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index d97b5b69ff..67f506361f 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1142,3 +1142,28 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.c = t2
EXPLAIN (COSTS OFF)
SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
+
+-- partitionwise join with fractional paths
+CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+
+-- insert data
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_t;
+
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+
+-- cleanup
+DROP TABLE fract_t;
+
+RESET max_parallel_workers_per_gather;
+RESET enable_partitionwise_join;