On Thu, Apr 15, 2021 at 6:27 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> > > On 4/15/21 7:35 PM, James Coleman wrote: > > On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <l...@swarm64.com> wrote: > >> > >> On 15-04-2021 04:01, James Coleman wrote: > >>> On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc...@gmail.com> > wrote: > >>>> > >>>> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra > >>>> <tomas.von...@enterprisedb.com> wrote: > >>>>> > >>>>> On 4/12/21 2:24 PM, Luc Vlaming wrote: > >>>>>> Hi, > >>>>>> > >>>>>> When trying to run on master (but afaik also PG-13) TPC-DS queries > 94, > >>>>>> 95 and 96 on a SF10 I get the error "could not find pathkey item to > sort". > >>>>>> When I disable enable_gathermerge the problem goes away and then the > >>>>>> plan for query 94 looks like below. I tried figuring out what the > >>>>>> problem is but to be honest I would need some pointers as the code > that > >>>>>> tries to matching equivalence members in prepare_sort_from_pathkeys > is > >>>>>> something i'm really not familiar with. > >>>>>> > >>>>> > >>>>> Could be related to incremental sort, which allowed some gather merge > >>>>> paths that were impossible before. We had a couple issues related to > >>>>> that fixed in November, IIRC. > >>>>> > >>>>>> To reproduce you can either ingest and test using the toolkit I > used too > >>>>>> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or > >>>>>> alternatively just use the schema (see > >>>>>> > https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native > ) > >>>>>> > >>>>> > >>>>> Thanks, I'll see if I can reproduce that with your schema. > >>>>> > >>>>> > >>>>> regards > >>>>> > >>>>> -- > >>>>> Tomas Vondra > >>>>> EnterpriseDB: http://www.enterprisedb.com > >>>>> The Enterprise PostgreSQL Company > >>>> > >>>> The query in question is: > >>>> > >>>> select count(*) > >>>> from store_sales > >>>> ,household_demographics > >>>> ,time_dim, store > >>>> where ss_sold_time_sk = time_dim.t_time_sk > >>>> and ss_hdemo_sk = household_demographics.hd_demo_sk > >>>> and ss_store_sk = s_store_sk > >>>> and time_dim.t_hour = 15 > >>>> and time_dim.t_minute >= 30 > >>>> and household_demographics.hd_dep_count = 7 > >>>> and store.s_store_name = 'ese' > >>>> order by count(*) > >>>> limit 100; > >>>> > >>>> From debugging output it looks like this is the plan being chosen > >>>> (cheapest total path): > >>>> Gather(store_sales household_demographics time_dim) > rows=60626 > >>>> cost=3145.73..699910.15 > >>>> HashJoin(store_sales household_demographics time_dim) > >>>> rows=25261 cost=2145.73..692847.55 > >>>> clauses: store_sales.ss_hdemo_sk = > >>>> household_demographics.hd_demo_sk > >>>> HashJoin(store_sales time_dim) rows=252609 > >>>> cost=1989.73..692028.08 > >>>> clauses: store_sales.ss_sold_time_sk = > >>>> time_dim.t_time_sk > >>>> SeqScan(store_sales) rows=11998564 > >>>> cost=0.00..658540.64 > >>>> SeqScan(time_dim) rows=1070 > >>>> cost=0.00..1976.35 > >>>> SeqScan(household_demographics) rows=720 > >>>> cost=0.00..147.00 > >>>> > >>>> prepare_sort_from_pathkeys fails to find a pathkey because > >>>> tlist_member_ignore_relabel returns null -- which seemed weird because > >>>> the sortexpr is an Aggref (in a single member equivalence class) and > >>>> the tlist contains a single member that's also an Aggref. It turns out > >>>> that the only difference between the two Aggrefs is that the tlist > >>>> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has > >>>> aggsplit = AGGSPLIT_SIMPLE. > >>>> > >>>> That's as far as I've gotten so far, but I figured I'd get that info > >>>> out to see if it means anything obvious to anyone else. > >>> > >>> This really goes back to [1] where we fixed a similar issue by making > >>> find_em_expr_usable_for_sorting_rel parallel the rules in > >>> prepare_sort_from_pathkeys. > >>> > >>> Most of those conditions got copied, and the case we were trying to > >>> handle is the fact that prepare_sort_from_pathkeys can generate a > >>> target list entry under those conditions if one doesn't exist. However > >>> there's a further restriction there I don't remember looking at: it > >>> uses pull_var_clause and tlist_member_ignore_relabel to ensure that > >>> all of the vars that feed into the sort expression are found in the > >>> target list. As I understand it, that is: it will build a target list > >>> entry for something like "md5(column)" if "column" (and that was one > >>> of our test cases for the previous fix) is in the target list already. > >>> > >>> But there's an additional detail here: the call to pull_var_clause > >>> requests aggregates, window functions, and placeholders be treated as > >>> vars. That means for our Aggref case it would require that the two > >>> Aggrefs be fully equal, so the differing aggsplit member would cause a > >>> target list entry not to be built, hence our error here. > >>> > >>> I've attached a quick and dirty patch that encodes that final rule > >>> from prepare_sort_from_pathkeys into > >>> find_em_expr_usable_for_sorting_rel. I can't help but think that > >>> there's a cleaner way to do with this with less code duplication, but > >>> hindering that is that prepare_sort_from_pathkeys is working with a > >>> TargetList while find_em_expr_usable_for_sorting_rel is working with a > >>> list of expressions. > >>> > > Yeah, I think it'll be difficult to reuse code from later planner stages > exactly because it operates on different representation. So something > like your patch is likely necessary. > > As for the patch, I have a couple comments: > > 1) expr_list_member_ignore_relabel would deserve a better comment, and > maybe a reference to tlist_member_ignore_relabel which it copies > > 2) I suppose the comment before "if (ec->ec_has_volatile)" needs > updating, because now it says we're done as long as the expression is > not volatile (but we're doing more stuff). > > 3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what > prepare_sort_from_pathkeys does? That is, try to match the entries > directly first, before the new pull_vars() business? > > 4) I've simplified the foreach() loop a bit. prepare_sort_from_pathkeys > does it differently, but that's because there are multiple foreach > levels, I think. Yes, we'll not free the list, but I that's what most > other places in planner do ... > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Hi, if (!expr_list_member_ignore_relabel(lfirst(k), target->exprs)) - break; + return NULL; I think it would be better if list_free(exprvars) is called before the return. Cheers