On 4/15/21 7:35 PM, James Coleman wrote: > On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <l...@swarm64.com> wrote: >> >> On 15-04-2021 04:01, James Coleman wrote: >>> On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc...@gmail.com> wrote: >>>> >>>> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra >>>> <tomas.von...@enterprisedb.com> wrote: >>>>> >>>>> On 4/12/21 2:24 PM, Luc Vlaming wrote: >>>>>> Hi, >>>>>> >>>>>> When trying to run on master (but afaik also PG-13) TPC-DS queries 94, >>>>>> 95 and 96 on a SF10 I get the error "could not find pathkey item to >>>>>> sort". >>>>>> When I disable enable_gathermerge the problem goes away and then the >>>>>> plan for query 94 looks like below. I tried figuring out what the >>>>>> problem is but to be honest I would need some pointers as the code that >>>>>> tries to matching equivalence members in prepare_sort_from_pathkeys is >>>>>> something i'm really not familiar with. >>>>>> >>>>> >>>>> Could be related to incremental sort, which allowed some gather merge >>>>> paths that were impossible before. We had a couple issues related to >>>>> that fixed in November, IIRC. >>>>> >>>>>> To reproduce you can either ingest and test using the toolkit I used too >>>>>> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or >>>>>> alternatively just use the schema (see >>>>>> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native) >>>>>> >>>>> >>>>> Thanks, I'll see if I can reproduce that with your schema. >>>>> >>>>> >>>>> regards >>>>> >>>>> -- >>>>> Tomas Vondra >>>>> EnterpriseDB: http://www.enterprisedb.com >>>>> The Enterprise PostgreSQL Company >>>> >>>> The query in question is: >>>> >>>> select count(*) >>>> from store_sales >>>> ,household_demographics >>>> ,time_dim, store >>>> where ss_sold_time_sk = time_dim.t_time_sk >>>> and ss_hdemo_sk = household_demographics.hd_demo_sk >>>> and ss_store_sk = s_store_sk >>>> and time_dim.t_hour = 15 >>>> and time_dim.t_minute >= 30 >>>> and household_demographics.hd_dep_count = 7 >>>> and store.s_store_name = 'ese' >>>> order by count(*) >>>> limit 100; >>>> >>>> From debugging output it looks like this is the plan being chosen >>>> (cheapest total path): >>>> Gather(store_sales household_demographics time_dim) rows=60626 >>>> cost=3145.73..699910.15 >>>> HashJoin(store_sales household_demographics time_dim) >>>> rows=25261 cost=2145.73..692847.55 >>>> clauses: store_sales.ss_hdemo_sk = >>>> household_demographics.hd_demo_sk >>>> HashJoin(store_sales time_dim) rows=252609 >>>> cost=1989.73..692028.08 >>>> clauses: store_sales.ss_sold_time_sk = >>>> time_dim.t_time_sk >>>> SeqScan(store_sales) rows=11998564 >>>> cost=0.00..658540.64 >>>> SeqScan(time_dim) rows=1070 >>>> cost=0.00..1976.35 >>>> SeqScan(household_demographics) rows=720 >>>> cost=0.00..147.00 >>>> >>>> prepare_sort_from_pathkeys fails to find a pathkey because >>>> tlist_member_ignore_relabel returns null -- which seemed weird because >>>> the sortexpr is an Aggref (in a single member equivalence class) and >>>> the tlist contains a single member that's also an Aggref. It turns out >>>> that the only difference between the two Aggrefs is that the tlist >>>> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has >>>> aggsplit = AGGSPLIT_SIMPLE. >>>> >>>> That's as far as I've gotten so far, but I figured I'd get that info >>>> out to see if it means anything obvious to anyone else. >>> >>> This really goes back to [1] where we fixed a similar issue by making >>> find_em_expr_usable_for_sorting_rel parallel the rules in >>> prepare_sort_from_pathkeys. >>> >>> Most of those conditions got copied, and the case we were trying to >>> handle is the fact that prepare_sort_from_pathkeys can generate a >>> target list entry under those conditions if one doesn't exist. However >>> there's a further restriction there I don't remember looking at: it >>> uses pull_var_clause and tlist_member_ignore_relabel to ensure that >>> all of the vars that feed into the sort expression are found in the >>> target list. As I understand it, that is: it will build a target list >>> entry for something like "md5(column)" if "column" (and that was one >>> of our test cases for the previous fix) is in the target list already. >>> >>> But there's an additional detail here: the call to pull_var_clause >>> requests aggregates, window functions, and placeholders be treated as >>> vars. That means for our Aggref case it would require that the two >>> Aggrefs be fully equal, so the differing aggsplit member would cause a >>> target list entry not to be built, hence our error here. >>> >>> I've attached a quick and dirty patch that encodes that final rule >>> from prepare_sort_from_pathkeys into >>> find_em_expr_usable_for_sorting_rel. I can't help but think that >>> there's a cleaner way to do with this with less code duplication, but >>> hindering that is that prepare_sort_from_pathkeys is working with a >>> TargetList while find_em_expr_usable_for_sorting_rel is working with a >>> list of expressions. >>>
Yeah, I think it'll be difficult to reuse code from later planner stages exactly because it operates on different representation. So something like your patch is likely necessary. As for the patch, I have a couple comments: 1) expr_list_member_ignore_relabel would deserve a better comment, and maybe a reference to tlist_member_ignore_relabel which it copies 2) I suppose the comment before "if (ec->ec_has_volatile)" needs updating, because now it says we're done as long as the expression is not volatile (but we're doing more stuff). 3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what prepare_sort_from_pathkeys does? That is, try to match the entries directly first, before the new pull_vars() business? 4) I've simplified the foreach() loop a bit. prepare_sort_from_pathkeys does it differently, but that's because there are multiple foreach levels, I think. Yes, we'll not free the list, but I that's what most other places in planner do ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>From 57fa7a782fb1b7456c37ed64aded6410cd876258 Mon Sep 17 00:00:00 2001 From: jcoleman <jtc...@gmail.com> Date: Thu, 15 Apr 2021 01:48:36 +0000 Subject: [PATCH 1/2] Fix find_em_expr_usable_for_sorting_rel --- src/backend/optimizer/path/equivclass.c | 44 ++++++++++++++++++- .../regress/expected/incremental_sort.out | 26 +++++++++++ src/test/regress/sql/incremental_sort.sql | 7 +++ 3 files changed, 75 insertions(+), 2 deletions(-) diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 0188c1e9a1..27867e16e8 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -68,6 +68,7 @@ static Bitmapset *get_eclass_indexes_for_relids(PlannerInfo *root, Relids relids); static Bitmapset *get_common_eclass_indexes(PlannerInfo *root, Relids relids1, Relids relids2); +static Expr* expr_list_member_ignore_relabel(Expr *node, List *exprlist); /* @@ -798,6 +799,27 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel) return NULL; } +static Expr* +expr_list_member_ignore_relabel(Expr *node, List *exprlist) +{ + ListCell *temp; + + while (node && IsA(node, RelabelType)) + node = ((RelabelType *) node)->arg; + + foreach(temp, exprlist) + { + Expr *tlexpr = (Expr *) lfirst(temp); + + while (tlexpr && IsA(tlexpr, RelabelType)) + tlexpr = ((RelabelType *) tlexpr)->arg; + + if (equal(node, tlexpr)) + return node; + } + return NULL; +} + /* * Find an equivalence class member expression that can be safely used to build * a sort node using the provided relation. The rules are a subset of those @@ -819,6 +841,9 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec, { EquivalenceMember *em = lfirst(lc_em); Expr *em_expr = em->em_expr; + List *exprvars; + ListCell *k; + PathTarget *target = rel->reltarget; /* * We shouldn't be trying to sort by an equivalence class that @@ -858,8 +883,23 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec, * valuable here to expend the work trying to find a match in the * target's exprs since such a sort will have to be postponed anyway. */ - if (!ec->ec_has_volatile) - return em->em_expr; + if (ec->ec_has_volatile) + return NULL; + + + exprvars = pull_var_clause((Node *) em_expr, + PVC_INCLUDE_AGGREGATES | + PVC_INCLUDE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(k, exprvars) + { + if (!expr_list_member_ignore_relabel(lfirst(k), target->exprs)) + break; + } + list_free(exprvars); + + if (!k) /* If we made it through exprvars finding a tlist member for each */ + return em->em_expr; /* found usable expression */ } /* We didn't find any suitable equivalence class expression */ diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out index a417b566d9..fd18af4c0c 100644 --- a/src/test/regress/expected/incremental_sort.out +++ b/src/test/regress/expected/incremental_sort.out @@ -1579,6 +1579,32 @@ order by 1, 2; -> Function Scan on generate_series (7 rows) +-- Parallel sort with an aggregate that be safely generated partially in parallel, +-- but we can't sort by the partial aggregates. +explain (costs off) select count(*) +from tenk1 t1 +join tenk1 t2 on t1.unique1 = t2.unique2 +join tenk1 t3 on t2.unique1 = t3.unique1 +order by count(*); + QUERY PLAN +----------------------------------------------------------------------------------------------- + Sort + Sort Key: (count(*)) + -> Finalize Aggregate + -> Gather + Workers Planned: 2 + -> Partial Aggregate + -> Parallel Hash Join + Hash Cond: (t2.unique1 = t3.unique1) + -> Parallel Hash Join + Hash Cond: (t1.unique1 = t2.unique2) + -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t1 + -> Parallel Hash + -> Parallel Index Scan using tenk1_unique2 on tenk1 t2 + -> Parallel Hash + -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t3 +(15 rows) + -- Parallel sort but with expression (correlated subquery) that -- is prohibited in parallel plans. explain (costs off) select distinct diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql index 81429164d4..67748bb64b 100644 --- a/src/test/regress/sql/incremental_sort.sql +++ b/src/test/regress/sql/incremental_sort.sql @@ -257,6 +257,13 @@ from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub; explain (costs off) select sub.unique1, md5(stringu1) from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub order by 1, 2; +-- Parallel sort with an aggregate that be safely generated partially in parallel, +-- but we can't sort by the partial aggregates. +explain (costs off) select count(*) +from tenk1 t1 +join tenk1 t2 on t1.unique1 = t2.unique2 +join tenk1 t3 on t2.unique1 = t3.unique1 +order by count(*); -- Parallel sort but with expression (correlated subquery) that -- is prohibited in parallel plans. explain (costs off) select distinct -- 2.30.2
>From 68740de04e3bb21b15ed239def61474428a79cf4 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <to...@2ndquadrant.com> Date: Fri, 16 Apr 2021 03:09:48 +0200 Subject: [PATCH 2/2] tweaks --- src/backend/optimizer/path/equivclass.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 27867e16e8..bf1c95c847 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -799,6 +799,14 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel) return NULL; } +/* + * Checks if node is member of the list. RelabelType is ignored for expressions + * in the list. + * + * XXX We have list_member_XXX for different types, so maybe make that consistent? + * XXX Perhaps explain why the relabel is ignored. Why is it ignored recursively? + * XXX Maybe reference tlist_member_ignore_relabel, which is doing the same thing? + */ static Expr* expr_list_member_ignore_relabel(Expr *node, List *exprlist) { @@ -882,24 +890,31 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec, * up a volatile expression to the proper tlist entry, it doesn't seem * valuable here to expend the work trying to find a match in the * target's exprs since such a sort will have to be postponed anyway. + * + * XXX Doesn't this comment need an update? + * + * XXX Maybe we should start matching the tlist entries? After all, + * prepare_sort_from_pathkeys matches directly, and only if not found + * it does the pull_var_clause stuff, no? */ if (ec->ec_has_volatile) return NULL; - + /* XXX comment would be nice ... */ exprvars = pull_var_clause((Node *) em_expr, PVC_INCLUDE_AGGREGATES | PVC_INCLUDE_WINDOWFUNCS | PVC_INCLUDE_PLACEHOLDERS); + + /* All expressions have to be present in the target list. */ foreach(k, exprvars) { if (!expr_list_member_ignore_relabel(lfirst(k), target->exprs)) - break; + return NULL; } - list_free(exprvars); - if (!k) /* If we made it through exprvars finding a tlist member for each */ - return em->em_expr; /* found usable expression */ + /* all the expressions have a matching target entry. */ + return em->em_expr; } /* We didn't find any suitable equivalence class expression */ -- 2.30.2