Hi, Attached is my take on simplification of the useful pathkeyes thing. It keeps the function, but it truncates query_pathkeys to only members with EC members in the relation. I think that's essentially the optimization you've proposed.
I've also noticed an issue in explain output. EXPLAIN ANALYZE on a simple query gives me this: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=66.30..816060.48 rows=8333226 width=24) (actual time=6.464..19091.006 rows=10000000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Incremental Sort (cost=66.28..729188.13 rows=4166613 width=24) (actual time=1.836..13401.109 rows=3333333 loops=3) Sort Key: a, b, c Presorted Key: a, b Full-sort Groups: 4156 (Methods: quicksort) Memory: 30kB (avg), 30kB (max) Presorted Groups: 4137 (Methods: quicksort) Memory: 108kB (avg), 111kB (max) Full-sort Groups: 6888 (Methods: ) Memory: 30kB (avg), 30kB (max) Presorted Groups: 6849 (Methods: ) Memory: 121kB (avg), 131kB (max) Full-sort Groups: 6869 (Methods: ) Memory: 30kB (avg), 30kB (max) Presorted Groups: 6816 (Methods: ) Memory: 128kB (avg), 132kB (max) -> Parallel Index Scan using t_a_b_idx on t (cost=0.43..382353.69 rows=4166613 width=24) (actual time=0.033..9346.679 rows=3333333 loops=3) Planning Time: 0.133 ms Execution Time: 23998.669 ms (15 rows) while with incremental sort disabled it looks like this: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=734387.50..831676.35 rows=8333226 width=24) (actual time=5597.978..14967.246 rows=10000000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=734387.47..744804.00 rows=4166613 width=24) (actual time=5584.616..7645.711 rows=3333333 loops=3) Sort Key: a, b, c Sort Method: external merge Disk: 111216kB Worker 0: Sort Method: external merge Disk: 109552kB Worker 1: Sort Method: external merge Disk: 112112kB -> Parallel Seq Scan on t (cost=0.00..105361.13 rows=4166613 width=24) (actual time=0.011..1753.128 rows=3333333 loops=3) Planning Time: 0.048 ms Execution Time: 19682.582 ms (11 rows) So I think there's a couple of issues: 1) Missing worker identification (Worker #). 2) Missing method for workers (we have it for the leader, though). 3) I'm not sure why the lable is "Methods" instead of "Sort Method", and why it's in parenthesis. 4) Not sure having two lines for each worker is a great idea. 5) I'd probably prefer having multiple labels for avg/max memory values, instead of (avg) and (max) notes. Also, I think we use "peak" in this context instead of "max". regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 32bf734820..49d4990e66 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -2748,7 +2748,6 @@ static List * get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel) { List *useful_pathkeys_list = NIL; - ListCell *lc; /* * Considering query_pathkeys is always worth it, because it might allow us @@ -2756,29 +2755,27 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel) */ if (root->query_pathkeys) { - bool query_pathkeys_ok = true; + ListCell *lc; + List *pathkeys = NIL; foreach(lc, root->query_pathkeys) { PathKey *pathkey = (PathKey *) lfirst(lc); EquivalenceClass *pathkey_ec = pathkey->pk_eclass; - Expr *em_expr; /* - * We can't use incremental sort for pathkeys containing volatile - * expressions. We could walk the exppression itself, but checking - * ec_has_volatile here saves some cycles. + * We can't build Incremental Sort when pathkeys contain elements + * without an EC member not contained in the current relation, so + * just ignore anything after the first such pathkey. */ - if (pathkey_ec->ec_has_volatile || - !(em_expr = find_em_expr_for_rel(pathkey_ec, rel))) - { - query_pathkeys_ok = false; + if (!find_em_expr_for_rel(pathkey_ec, rel)) break; - } + + pathkeys = lappend(pathkeys, pathkey); } - if (query_pathkeys_ok) - useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys)); + if (pathkeys) + useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys); } return useful_pathkeys_list;