On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > On 4/12/21 2:24 PM, Luc Vlaming wrote: > > Hi, > > > > When trying to run on master (but afaik also PG-13) TPC-DS queries 94, > > 95 and 96 on a SF10 I get the error "could not find pathkey item to sort". > > When I disable enable_gathermerge the problem goes away and then the > > plan for query 94 looks like below. I tried figuring out what the > > problem is but to be honest I would need some pointers as the code that > > tries to matching equivalence members in prepare_sort_from_pathkeys is > > something i'm really not familiar with. > > > > Could be related to incremental sort, which allowed some gather merge > paths that were impossible before. We had a couple issues related to > that fixed in November, IIRC. > > > To reproduce you can either ingest and test using the toolkit I used too > > (see https://github.com/swarm64/s64da-benchmark-toolkit/), or > > alternatively just use the schema (see > > https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native) > > > > Thanks, I'll see if I can reproduce that with your schema. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
The query in question is: select count(*) from store_sales ,household_demographics ,time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 15 and time_dim.t_minute >= 30 and household_demographics.hd_dep_count = 7 and store.s_store_name = 'ese' order by count(*) limit 100; >From debugging output it looks like this is the plan being chosen (cheapest total path): Gather(store_sales household_demographics time_dim) rows=60626 cost=3145.73..699910.15 HashJoin(store_sales household_demographics time_dim) rows=25261 cost=2145.73..692847.55 clauses: store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk HashJoin(store_sales time_dim) rows=252609 cost=1989.73..692028.08 clauses: store_sales.ss_sold_time_sk = time_dim.t_time_sk SeqScan(store_sales) rows=11998564 cost=0.00..658540.64 SeqScan(time_dim) rows=1070 cost=0.00..1976.35 SeqScan(household_demographics) rows=720 cost=0.00..147.00 prepare_sort_from_pathkeys fails to find a pathkey because tlist_member_ignore_relabel returns null -- which seemed weird because the sortexpr is an Aggref (in a single member equivalence class) and the tlist contains a single member that's also an Aggref. It turns out that the only difference between the two Aggrefs is that the tlist entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has aggsplit = AGGSPLIT_SIMPLE. That's as far as I've gotten so far, but I figured I'd get that info out to see if it means anything obvious to anyone else. James