On Thu, Oct 13, 2022 at 2:48 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Thu, 13 Oct 2022 at 16:47, Richard Guo <guofengli...@gmail.com> wrote: > > Currently in the patch the optimization is done before we check for > > presorted paths or do the explicit sort of the cheapest path. How about > > we move this optimization into the branch where we've found any > > presorted paths? Maybe something like: > > I've attached a patch to that effect, but it turns out a bit more > complex than what you imagined. We still need to handle the case for > when there's no path that has the required pathkeys and we must add a > SortPath to the cheapest path. That requires adding some similar code > to add the LimitPath after the foreach loop over the pathlist is over. Thanks for the new patch. Previously I considered we just apply this optimization for adequately-presorted paths so that we can just fetch the first row from that path. But yes we can also do this optimization for explicit-sort case so that we can get the result from a top-1 heapsort, just like the new patch does. I was also getting some weird plans like: > > regression=# explain select distinct on (four) four,hundred from tenk1 > where four=0 order by 1,2; > QUERY PLAN > ---------------------------------------------------------------------- > Sort (cost=0.20..0.20 rows=1 width=8) > Sort Key: hundred > -> Limit (cost=0.00..0.19 rows=1 width=8) > -> Seq Scan on tenk1 (cost=0.00..470.00 rows=2500 width=8) > Filter: (four = 0) > > To stop the planner from adding that final sort, I opted to hack the > LimitPath's pathkeys to say that it's already sorted by the > PlannerInfo's sort_pathkeys. That feels slightly icky, but it does > seem a little wasteful to initialise a sort node on every execution of > the plan to sort a single tuple. I don't get how this plan comes out. It seems not correct because Limit node above an unsorted path would give us an unpredictable row. I tried this query without the hack to LimitPath's pathkeys and I get plans below, with or with index scan: explain (costs off) select distinct on (four) four,hundred from tenk1 where four=0 order by 1,2; QUERY PLAN ----------------------------------------------------- Result -> Limit -> Index Scan using tenk1_hundred on tenk1 Filter: (four = 0) explain (costs off) select distinct on (four) four,hundred from tenk1 where four=0 order by 1,2; QUERY PLAN ---------------------------------- Limit -> Sort Sort Key: hundred -> Seq Scan on tenk1 Filter: (four = 0) Thanks Richard