Michael, David, thanks for your quick replies.

*@Michael* I initially dismissed writing this query using joins or subselects because the real query has about 80 columns, and I was afraid that having 80 joins/subselects would cause issues with postgresql, including the planner falling back to GEQO. I'll test it anyway with real data and see how it behaves.
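For reference, the kind of rewrite I was considering looks roughly like this (a sketch only, using the test table t and columns c1/c2 from the examples below; the DISTINCT ON subselect is just one possible formulation of a per-partition first value):

-- One window column such as
--   first_value(c2) OVER (PARTITION BY c1 ORDER BY c2)
-- rewritten as a join against a per-partition subselect.
-- With ~80 output columns this means ~80 such joins/subselects,
-- which is what I feared would push the planner into GEQO.
SELECT t.c1,
       f.first_c2
FROM t
JOIN (
    SELECT DISTINCT ON (c1) c1, c2 AS first_c2
    FROM t
    ORDER BY c1, c2
) AS f ON f.c1 = t.c1;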
*@David About subsequent sorts*
From my experiments, the time of subsequent sorts is not reduced by a previous sort; it is in fact higher. In the following example:
- In Query 1, the Sort node on key c1, c2 takes ~1893 ms excluding child nodes.
- In Query 2, the Sort node on key c1 takes ~1558 ms excluding child nodes, and the Sort node on key c1, c2 takes ~2964 ms excluding child nodes, which is higher than the time for the same sort key in Query 1, even though in Query 2 postgresql could have benefited from the preliminary sort on c1.

I can't figure out why it would be slower.

*Query 1*

explain analyze select first_value(c2) OVER (PARTITION BY c1 order by c2) from t;

*Explain Plan Query 1*

WindowAgg  (cost=135042.46..155042.48 rows=1000001 width=47) (actual time=1555.005..2639.659 rows=1000001 loops=1)
  ->  Sort  (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1554.989..2070.462 rows=1000001 loops=1)
        Sort Key: c1, c2
        Sort Method: external merge  Disk: 25960kB
        ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.013..177.037 rows=1000001 loops=1)
Planning Time: 0.115 ms
Execution Time: 2693.935 ms

*Query 2*

explain analyze select first_value(c2) OVER (PARTITION BY c1), first_value(c2) OVER (PARTITION BY c1 order by c2) from t;

*Explain Plan Query 2*

WindowAgg  (cost=313730.43..333730.45 rows=1000001 width=79) (actual time=4692.257..5702.226 rows=1000001 loops=1)
  ->  Sort  (cost=313730.43..316230.43 rows=1000001 width=47) (actual time=4692.152..5174.928 rows=1000001 loops=1)
        Sort Key: c1, c2
        Sort Method: external merge  Disk: 32688kB
        ->  WindowAgg  (cost=135042.46..152542.48 rows=1000001 width=47) (actual time=1188.109..2210.845 rows=1000001 loops=1)
              ->  Sort  (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1188.099..1709.253 rows=1000001 loops=1)
                    Sort Key: c1
                    Sort Method: external merge  Disk: 25960kB
                    ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.014..150.435 rows=1000001 loops=1)
Planning Time: 0.059 ms
Execution Time: 5756.591 ms

*@David About using an index*
You are correct that a simple index cannot be leveraged efficiently. However, a covering index including all columns and starting with the partition column c1 could be used to avoid sorting on c1, right? But in my tests it is not used. In the following example covering_index could be used to avoid the sort on c1, but it isn't:

*Covering Index*

create index covering_index on t (c1, c2, c3, c4);

*Query*

explain analyze select
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2, c4) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from t;

*Plan*

WindowAgg  (cost=417853.93..440353.95 rows=1000001 width=123) (actual time=3071.980..4184.030 rows=1000001 loops=1)
  ->  Sort  (cost=417853.93..420353.93 rows=1000001 width=91) (actual time=3071.962..3575.641 rows=1000001 loops=1)
        Sort Key: c1, (COALESCE(c4, '000'::character varying)), c3
        Sort Method: external merge  Disk: 52912kB
        ->  WindowAgg  (cost=193152.96..215652.98 rows=1000001 width=91) (actual time=1268.373..2271.463 rows=1000001 loops=1)
              ->  Sort  (cost=193152.96..195652.96 rows=1000001 width=59) (actual time=1268.357..1711.526 rows=1000001 loops=1)
                    Sort Key: c1, c2, c4
                    Sort Method: external merge  Disk: 46168kB
                    ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=59) (actual time=0.014..163.347 rows=1000001 loops=1)
Planning Time: 0.081 ms
Execution Time: 4250.367 ms

However, the index is used if the partition by + order by of one of the window functions matches the beginning of the index entirely. For instance the query:

select
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from t;

uses covering_index to avoid the sort on c1, c2.

On Tue, Jul 7, 2020 at 12:48 AM David G.
Johnston < david.g.johns...@gmail.com> wrote:

> On Monday, July 6, 2020, Sebastien Arod <sebastien.a...@gmail.com> wrote:
>
>> I would have expected postgresql to "share" a preliminary sort on c1 that
>> would then be useful to reduce the work on all window functions but it
>> doesn't.
>
> The plan shown does share - the output of one sort goes into another.
> Subsequent sorts still have to happen but they should be faster as the
> first field is already grouped. Doesn’t change the plan though.
>
>> I even created an index on c1 hoping that postgresql would be able to use
>> it in order to minimize the cost of the sorts but I couldn't make it use it.
>
> Use it how? You are still evaluating 250k groups so doing anything
> piece-wise seems like an expected loss compared to sequential scan.
>
> David J.