On 12/10/21 00:51, Arne Roland wrote:
Hi,

thanks for the reply!

From: Tomas Vondra <tomas.von...@enterprisedb.com>
Sent: Thursday, December 2, 2021 20:58
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
 > [...]
 > Well, I mentioned three open questions in my first message, and I don't
 > think we've really addressed them yet. We've agreed we don't need to add
 > the incremental sort here, but that leaves
 >
 >
 > 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we
 > default to cheapest_startup or cheapest_total?

I think it's reasonable to use cheapest_total like we are doing now. I hardly see any reason to change it.

True, it's a reasonable first step.

Either we generate the same plan as today (with cheapest_total), or maybe a better one (if we find a cheaper fractional path with matching pathkeys). It's true we could do better, but that's life - it's not like we consider every possible path everywhere.

The incremental sort case you mentioned, seems like the only case that plan might be interesting. If we really want that to happen, we probably should check for that separately, i.e. having startup_fractional. Even though this is a fairly special case as it's mostly relevant for partitionwise joins, I'm still not convinced it's worth the cpu cycles. The point is that in most cases factional and startup_fractional will be the same anyways.

I don't think we can simply check for startup_fractional (although, I'm not sure I fully understand what would that be). I think what we should really do in this case is walking all the paths, ensuring it's properly sorted (with either full or incremental sort), and then pick the cheapest fractional path from these sorted paths. But I agree that seems pretty expensive.

And I suspect, even if they aren't, solving that from an application developers point of view, is in most cases not that difficult. One can usually match the pathkey. I personally had a lot of real world issues with missing fractional paths using primary keys. I think it's worth noting that everything will likely match the partition keys anyways, because otherwise there is no chance of doing a merge append.

Yeah, I think you're right.

If I am not mistaken, in case we end up doing a full sort, the cheapest_total path should be completely sufficient.


Definitely true.

 > 2) Should get_cheapest_fractional_path_for_pathkeys worry about
 > require_parallel_safe? I think yes, but we need to update the patch.

I admit, that such a version of get_cheapest_fractional_path_for_pathkeys could be consistent with other functions. And I was confused about that before. But I am not sure what to use require_parallel_safe for. build_minmax_path doesn't care, they are never parallel safe. And afaict generate_orderedappend_paths cares neither, it considers all plans. For instance when it calls get_cheapest_path_for_pathkeys, it sets require_parallel_safe just to false as well.


True as well. It's looks a bit strange, but you're right neither place cares about parallel safety.

 > I'll take a closer look next week, once I get home from NYC, and I'll
 > submit an improved version for the January CF.

Thank you for your work! The current patch, apart from the comments/regression tests, seems pretty reasonable to me.


Attached is a cleaned-up patch, with a simple regression test. I'll mark this as RFC and get it committed in a couple days.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 296dd75c1b..4a5f52287e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1716,6 +1716,7 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		List	   *pathkeys = (List *) lfirst(lcp);
 		List	   *startup_subpaths = NIL;
 		List	   *total_subpaths = NIL;
+		List	   *fractional_subpaths = NIL;
 		bool		startup_neq_total = false;
 		ListCell   *lcr;
 		bool		match_partition_order;
@@ -1745,7 +1746,8 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		{
 			RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
 			Path	   *cheapest_startup,
-					   *cheapest_total;
+					   *cheapest_total,
+					   *cheapest_fractional = NULL;
 
 			/* Locate the right paths, if they are available. */
 			cheapest_startup =
@@ -1773,6 +1775,33 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 				Assert(cheapest_total->param_info == NULL);
 			}
 
+			/*
+			 * When needed (building fractional path), determine the cheapest
+			 * fractional path too.
+			 */
+			if (root->tuple_fraction > 0)
+			{
+				double	path_fraction = (1.0 / root->tuple_fraction);
+
+				cheapest_fractional =
+					get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
+															  pathkeys,
+															  NULL,
+															  path_fraction);
+
+				/*
+				 * If we found no path with matching pathkeys, use the cheapest
+				 * total path instead.
+				 *
+				 * XXX We might be smarter and consider partially sorted paths,
+				 * with just an incremental sort on top. That might be cheaper
+				 * than using cheapest_total with regular sort, but we'd have
+				 * to build all the incremental paths.
+				 */
+				if (!cheapest_fractional)
+					cheapest_fractional = cheapest_total;
+			}
+
 			/*
 			 * Notice whether we actually have different paths for the
 			 * "cheapest" and "total" cases; frequently there will be no point
@@ -1799,6 +1828,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 
 				startup_subpaths = lappend(startup_subpaths, cheapest_startup);
 				total_subpaths = lappend(total_subpaths, cheapest_total);
+
+				if (cheapest_fractional)
+				{
+					cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+					fractional_subpaths = lappend(fractional_subpaths, cheapest_fractional);
+				}
 			}
 			else if (match_partition_order_desc)
 			{
@@ -1812,6 +1847,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 
 				startup_subpaths = lcons(cheapest_startup, startup_subpaths);
 				total_subpaths = lcons(cheapest_total, total_subpaths);
+
+				if (cheapest_fractional)
+				{
+					cheapest_fractional = get_singleton_append_subpath(cheapest_fractional);
+					fractional_subpaths = lcons(cheapest_fractional, fractional_subpaths);
+				}
 			}
 			else
 			{
@@ -1823,6 +1864,10 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										  &startup_subpaths, NULL);
 				accumulate_append_subpath(cheapest_total,
 										  &total_subpaths, NULL);
+
+				if (cheapest_fractional)
+					accumulate_append_subpath(cheapest_fractional,
+											  &fractional_subpaths, NULL);
 			}
 		}
 
@@ -1849,6 +1894,17 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 														  0,
 														  false,
 														  -1));
+
+			if (fractional_subpaths)
+				add_path(rel, (Path *) create_append_path(root,
+														  rel,
+														  fractional_subpaths,
+														  NIL,
+														  pathkeys,
+														  NULL,
+														  0,
+														  false,
+														  -1));
 		}
 		else
 		{
@@ -1864,6 +1920,13 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 																total_subpaths,
 																pathkeys,
 																NULL));
+
+			if (fractional_subpaths)
+				add_path(rel, (Path *) create_merge_append_path(root,
+																rel,
+																fractional_subpaths,
+																pathkeys,
+																NULL));
 		}
 	}
 }
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 27f7525b3e..bb5b7c47a4 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4862,3 +4862,51 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2
   1 | 209 | 0009 |  1 | 209 | 0009
 (8 rows)
 
+-- partitionwise join with fractional paths
+CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+-- insert data
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_t;
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: x.id
+         ->  Merge Left Join
+               Merge Cond: (x_1.id = y_1.id)
+               ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
+               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
+         ->  Merge Left Join
+               Merge Cond: (x_2.id = y_2.id)
+               ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
+               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
+(11 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: x.id DESC
+         ->  Nested Loop Left Join
+               ->  Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1
+               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
+                     Index Cond: (id = x_1.id)
+         ->  Nested Loop Left Join
+               ->  Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2
+               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
+                     Index Cond: (id = x_2.id)
+(11 rows)
+
+-- cleanup
+DROP TABLE fract_t;
+RESET max_parallel_workers_per_gather;
+RESET enable_partitionwise_join;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index d97b5b69ff..67f506361f 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1142,3 +1142,28 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.c = t2
 EXPLAIN (COSTS OFF)
 SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
 SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
+
+-- partitionwise join with fractional paths
+CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+
+-- insert data
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_t;
+
+-- verify plan; nested index only scans
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+
+-- cleanup
+DROP TABLE fract_t;
+
+RESET max_parallel_workers_per_gather;
+RESET enable_partitionwise_join;

Reply via email to