On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <l...@swarm64.com> wrote: > > On 15-04-2021 04:01, James Coleman wrote: > > On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc...@gmail.com> wrote: > >> > >> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra > >> <tomas.von...@enterprisedb.com> wrote: > >>> > >>> On 4/12/21 2:24 PM, Luc Vlaming wrote: > >>>> Hi, > >>>> > >>>> When trying to run on master (but afaik also PG-13) TPC-DS queries 94, > >>>> 95 and 96 on a SF10 I get the error "could not find pathkey item to > >>>> sort". > >>>> When I disable enable_gathermerge the problem goes away and then the > >>>> plan for query 94 looks like below. I tried figuring out what the > >>>> problem is but to be honest I would need some pointers as the code that > >>>> tries to matching equivalence members in prepare_sort_from_pathkeys is > >>>> something i'm really not familiar with. > >>>> > >>> > >>> Could be related to incremental sort, which allowed some gather merge > >>> paths that were impossible before. We had a couple issues related to > >>> that fixed in November, IIRC. > >>> > >>>> To reproduce you can either ingest and test using the toolkit I used too > >>>> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or > >>>> alternatively just use the schema (see > >>>> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native) > >>>> > >>> > >>> Thanks, I'll see if I can reproduce that with your schema. > >>> > >>> > >>> regards > >>> > >>> -- > >>> Tomas Vondra > >>> EnterpriseDB: http://www.enterprisedb.com > >>> The Enterprise PostgreSQL Company > >> > >> The query in question is: > >> > >> select count(*) > >> from store_sales > >> ,household_demographics > >> ,time_dim, store > >> where ss_sold_time_sk = time_dim.t_time_sk > >> and ss_hdemo_sk = household_demographics.hd_demo_sk > >> and ss_store_sk = s_store_sk > >> and time_dim.t_hour = 15 > >> and time_dim.t_minute >= 30 > >> and household_demographics.hd_dep_count = 7 > >> and store.s_store_name = 'ese' > >> order by count(*) > >> limit 100; > >> > >> From debugging output it looks like this is the plan being chosen > >> (cheapest total path): > >> Gather(store_sales household_demographics time_dim) rows=60626 > >> cost=3145.73..699910.15 > >> HashJoin(store_sales household_demographics time_dim) > >> rows=25261 cost=2145.73..692847.55 > >> clauses: store_sales.ss_hdemo_sk = > >> household_demographics.hd_demo_sk > >> HashJoin(store_sales time_dim) rows=252609 > >> cost=1989.73..692028.08 > >> clauses: store_sales.ss_sold_time_sk = > >> time_dim.t_time_sk > >> SeqScan(store_sales) rows=11998564 > >> cost=0.00..658540.64 > >> SeqScan(time_dim) rows=1070 > >> cost=0.00..1976.35 > >> SeqScan(household_demographics) rows=720 > >> cost=0.00..147.00 > >> > >> prepare_sort_from_pathkeys fails to find a pathkey because > >> tlist_member_ignore_relabel returns null -- which seemed weird because > >> the sortexpr is an Aggref (in a single member equivalence class) and > >> the tlist contains a single member that's also an Aggref. It turns out > >> that the only difference between the two Aggrefs is that the tlist > >> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has > >> aggsplit = AGGSPLIT_SIMPLE. > >> > >> That's as far as I've gotten so far, but I figured I'd get that info > >> out to see if it means anything obvious to anyone else. > > > > This really goes back to [1] where we fixed a similar issue by making > > find_em_expr_usable_for_sorting_rel parallel the rules in > > prepare_sort_from_pathkeys. > > > > Most of those conditions got copied, and the case we were trying to > > handle is the fact that prepare_sort_from_pathkeys can generate a > > target list entry under those conditions if one doesn't exist. However > > there's a further restriction there I don't remember looking at: it > > uses pull_var_clause and tlist_member_ignore_relabel to ensure that > > all of the vars that feed into the sort expression are found in the > > target list. As I understand it, that is: it will build a target list > > entry for something like "md5(column)" if "column" (and that was one > > of our test cases for the previous fix) is in the target list already. > > > > But there's an additional detail here: the call to pull_var_clause > > requests aggregates, window functions, and placeholders be treated as > > vars. That means for our Aggref case it would require that the two > > Aggrefs be fully equal, so the differing aggsplit member would cause a > > target list entry not to be built, hence our error here. > > > > I've attached a quick and dirty patch that encodes that final rule > > from prepare_sort_from_pathkeys into > > find_em_expr_usable_for_sorting_rel. I can't help but think that > > there's a cleaner way to do with this with less code duplication, but > > hindering that is that prepare_sort_from_pathkeys is working with a > > TargetList while find_em_expr_usable_for_sorting_rel is working with a > > list of expressions. > > > > James > > > > 1: > > https://www.postgresql.org/message-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_%3DNS9GWXuNiFANg%40mail.gmail.com > > > > Hi, > > The patch seems to make the planner proceed and not error out anymore. > Cannot judge if it's doing the right thing however or if its enough :) > It works for me for all reported queries however (queries 94-96). > > And sorry for the confusion wrt the stacktrace and plan. I tried to > produce a plan to possibly help with debugging but that would ofc then > not have the problem of the missing sortkey as otherwise i cannot > present a plan :) The stacktrace was however correct, and the plan > considered involved a gather-merge with a sort. Unfortunately I could > not (easily) get the plan outputted in the end; even when setting the > costs to 0 somehow... > > Regards, > Luc
Same patch, but with a test case now. James
v2-0001-Fix-find_em_expr_usable_for_sorting_rel.patch
Description: Binary data