Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra Sat, 28 Mar 2020 16:00:09 -0700

Hi,

Attached is my take on simplification of the useful pathkeyes thing. It
keeps the function, but it truncates query_pathkeys to only members with
EC members in the relation. I think that's essentially the optimization
you've proposed.


I've also noticed an issue in explain output. EXPLAIN ANALYZE on a simple
query gives me this:

                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=66.30..816060.48 rows=8333226 width=24) (actual 
time=6.464..19091.006 rows=10000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Incremental Sort  (cost=66.28..729188.13 rows=4166613 width=24) (actual 
time=1.836..13401.109 rows=3333333 loops=3)
         Sort Key: a, b, c
         Presorted Key: a, b
         Full-sort Groups: 4156 (Methods: quicksort) Memory: 30kB (avg), 30kB 
(max)
         Presorted Groups: 4137 (Methods: quicksort) Memory: 108kB (avg), 111kB 
(max)
         Full-sort Groups: 6888 (Methods: ) Memory: 30kB (avg), 30kB (max)
         Presorted Groups: 6849 (Methods: ) Memory: 121kB (avg), 131kB (max)
         Full-sort Groups: 6869 (Methods: ) Memory: 30kB (avg), 30kB (max)
         Presorted Groups: 6816 (Methods: ) Memory: 128kB (avg), 132kB (max)
         ->  Parallel Index Scan using t_a_b_idx on t  (cost=0.43..382353.69 
rows=4166613 width=24) (actual time=0.033..9346.679 rows=3333333 loops=3)
 Planning Time: 0.133 ms
 Execution Time: 23998.669 ms
(15 rows)

while with incremental sort disabled it looks like this:

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=734387.50..831676.35 rows=8333226 width=24) (actual 
time=5597.978..14967.246 rows=10000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=734387.47..744804.00 rows=4166613 width=24) (actual 
time=5584.616..7645.711 rows=3333333 loops=3)
         Sort Key: a, b, c
         Sort Method: external merge  Disk: 111216kB
         Worker 0:  Sort Method: external merge  Disk: 109552kB
         Worker 1:  Sort Method: external merge  Disk: 112112kB
         ->  Parallel Seq Scan on t  (cost=0.00..105361.13 rows=4166613 
width=24) (actual time=0.011..1753.128 rows=3333333 loops=3)
 Planning Time: 0.048 ms
 Execution Time: 19682.582 ms
(11 rows)

So I think there's a couple of issues:

1) Missing worker identification (Worker #).

2) Missing method for workers (we have it for the leader, though).

3) I'm not sure why the lable is "Methods" instead of "Sort Method", and
why it's in parenthesis.

4) Not sure having two lines for each worker is a great idea.

5) I'd probably prefer having multiple labels for avg/max memory values,
instead of (avg) and (max) notes. Also, I think we use "peak" in this
context instead of "max".


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 32bf734820..49d4990e66 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2748,7 +2748,6 @@ static List *
 get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
 {
        List       *useful_pathkeys_list = NIL;
-       ListCell   *lc;
 
        /*
         * Considering query_pathkeys is always worth it, because it might 
allow us
@@ -2756,29 +2755,27 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, 
RelOptInfo *rel)
         */
        if (root->query_pathkeys)
        {
-               bool            query_pathkeys_ok = true;
+               ListCell   *lc;
+               List       *pathkeys = NIL;
 
                foreach(lc, root->query_pathkeys)
                {
                        PathKey    *pathkey = (PathKey *) lfirst(lc);
                        EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-                       Expr       *em_expr;
 
                        /*
-                        * We can't use incremental sort for pathkeys 
containing volatile
-                        * expressions. We could walk the exppression itself, 
but checking
-                        * ec_has_volatile here saves some cycles.
+                        * We can't build Incremental Sort when pathkeys 
contain elements
+                        * without an EC member not contained in the current 
relation, so
+                        * just ignore anything after the first such pathkey.
                         */
-                       if (pathkey_ec->ec_has_volatile ||
-                               !(em_expr = find_em_expr_for_rel(pathkey_ec, 
rel)))
-                       {
-                               query_pathkeys_ok = false;
+                       if (!find_em_expr_for_rel(pathkey_ec, rel))
                                break;
-                       }
+
+                       pathkeys = lappend(pathkeys, pathkey);
                }
 
-               if (query_pathkeys_ok)
-                       useful_pathkeys_list = 
list_make1(list_copy(root->query_pathkeys));
+               if (pathkeys)
+                       useful_pathkeys_list = lappend(useful_pathkeys_list, 
pathkeys);
        }
 
        return useful_pathkeys_list;

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Reply via email to