Re: [PATCH] Teach planner to further optimize sort in distinct

Ankit Kumar Pandey Thu, 19 Jan 2023 11:26:36 -0800

On 19/01/23 18:49, David Rowley wrote:

I think you should write a function like:

bool pathkeys_count_contained_in_unordered(List *keys1, List *keys2,
List **reorderedkeys, int *n_common)

which works very similarly to pathkeys> _count_contained_in, but
populates *reorderedkeys so it contains all of the keys in keys1, but
put the matching ones in the same order as they are in keys2.  If you
find a keys2 that does not exist in keys1 then just add the additional
unmatched keys1 keys to *reorderedkeys.  Set *n_common to the number
of common keys excluding any that come after a key2 key that does not
exist as a key1 key.

You can just switch to using that function in
create_final_distinct_paths(). You'll need to consider if the query is
a DISTINCT ON query and not try the unordered version of the function
in that case.


Tried this, it worked as expected. Tests are green as well.

I also just noticed that in build_index_paths() we'll leave the index
path's pathkeys empty if we deem the pathkeys as useless.  I'm not
sure what the repercussions of setting those to the return value of
build_index_pathkeys() if useful_pathkeys is otherwise empty.


This is very rigid indeed.

It's possible that truncate_useless_pathkeys() needs to be modified to
check if the pathkeys might be useful for DISTINCT


We have pathkeys_useful_for_merging and pathkeys_useful_for_ordering.

Can we not have pathkeys_useful_for_distinct?

Also, pathkeys_useful_for_ordering calls pathkeys_count_contained_in.

We can add code path on similar lines.

When I
thought about it I assumed that we always set IndexPath's pathkeys to
whatever (if any) sort order that the index provides.


Can we not added original path keys in IndexPath? It could be useful

at other places as well. Atleast, I can see it useful in sorting cases.

makes me wonder if this entire patch is a good idea.


We are still getting some benefit even without index paths for now.


I have attached patch with pathkeys_count_contained_in_unordered

and corresponding changes in create_final_distinct_paths for reference.


Thanks,

Ankit

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index d2e241c983..f0bc977eff 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2014,3 +2014,52 @@ has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel)
 		return true;			/* might be able to use them for ordering */
 	return false;				/* definitely useless */
 }
+
+/*
+ * pathkeys_count_contained_in_unordered
+ *		reorders keys1 to match keys2
+ * Populates *reorderedkeys so it contains all of the keys in keys1, but
+ * put the matching ones in the same order as they are in keys2.  If keys
+ * in keys1 which do not match keys2 are appended in the end.
+ * n_common denotes number of matched keys.
+ */
+bool
+pathkeys_count_contained_in_unordered(List *keys1, List *keys2,
+									  List **reorderedkeys, int *n_common)
+{
+	ListCell*	lc1;
+	ListCell*	lc2;
+	int			n = 0;
+	List*		unmatched_keys = NIL;
+
+	foreach(lc1, keys1)
+	{
+		bool matched = false;
+		PathKey    *pathkey1 = (PathKey *) lfirst(lc1);
+		foreach(lc2, keys2)
+		{
+			PathKey    *pathkey2 = (PathKey *) lfirst(lc2);
+			if (pathkey1 == pathkey2)
+			{
+				*reorderedkeys = lappend(*reorderedkeys, pathkey2);
+				n++;
+				matched = true;
+				break;
+			}
+
+		}
+		/*
+		 * If no match is found, collect path key for appending it later
+		 */
+		if (!matched)
+			unmatched_keys = lappend(unmatched_keys, pathkey1);
+	}
+
+	Assert(n == list_length(*reorderedkeys));
+
+	*reorderedkeys = list_concat_unique(*reorderedkeys, unmatched_keys);
+	Assert(list_length(*reorderedkeys) == list_length(keys1));
+	*n_common = n;
+
+	return *n_common == list_length(keys1);
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 05f44faf6e..d9904b046a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4936,10 +4936,28 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			Path	   *sorted_path;
 			bool		is_sorted;
 			int			presorted_keys;
+			List*		reordered_keys = NIL;
 
-			is_sorted = pathkeys_count_contained_in(needed_pathkeys,
+			/*
+			 * Attempt to optimize non distinct-on queries
+			 * if sorted keys are present but not in required order.
+			 * We can modify needed_path keys as per as input path key
+			 * ordering and reuse input sort.
+			 */
+			if (!parse->hasDistinctOn)
+			{
+				is_sorted = pathkeys_count_contained_in_unordered(needed_pathkeys,
 													input_path->pathkeys,
+													&reordered_keys,
 													&presorted_keys);
+				needed_pathkeys = reordered_keys;
+			}
+			else
+			{
+				is_sorted = pathkeys_count_contained_in(needed_pathkeys,
+													input_path->pathkeys,
+													&presorted_keys);
+			}
 
 			if (is_sorted)
 				sorted_path = input_path;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b38627efd..532174aeb5 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -253,6 +253,10 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,
 									   RelOptInfo *rel,
 									   List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+bool pathkeys_count_contained_in_unordered(List *keys1,
+										   List *keys2,
+									       List **reorderedkeys,
+										   int *n_common);
 extern List *append_pathkeys(List *target, List *source);
 extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   EquivalenceClass *eclass, Oid opfamily,

Re: [PATCH] Teach planner to further optimize sort in distinct

Reply via email to