A performance issue in ROW_NUMBER() OVER(ORDER BY NULL) [27 times slow than OVER()] V14.5

Kirk Wolak Sun, 19 Feb 2023 11:44:06 -0800

Okay,
  I have some converted code that uses this syntax.
  For 20 Million rows it was taking 15-20 minutes!  (versus 3 minutes) on
live data.
  See here: https://explain.depesz.com/s/VQFJ  [There are 2 optimizations,
removing the ORDER BY NULL, and just using a sequence]
  (The above is a live data run)  The IO with the ORDER BY NULL is crazy
high.  10x the READ in GB!


  The solution is to remove the ORDER BY NULL.  [since that is not
sortable, should it be ignored?]

  This does NOT SHOW UP with 1 million rows.

simple example (slow):
explain (verbose, buffers, analyze) select row_number() over(*ORDER BY NULL*)
as rn, x, random() from generate_series(1,20000000) x order by x desc,
random();

 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3365975.67..3415975.67 rows=20000000 width=52) (actual
time=816050.580..817984.447 rows=20000000 loops=1)
   Output: (row_number() OVER (?)), x, (random()), NULL::text
   Sort Key: x.x DESC, (random())
   Sort Method: external merge  Disk: 665376kB
   Buffers: temp read=20136158 written=161301
   ->  WindowAgg  (cost=0.00..550000.00 rows=20000000 width=52) (actual
time=12585.463..791440.272 rows=20000000 loops=1)
         Output: row_number() OVER (?), x, random(), (NULL::text)
         Buffers: temp read=20052986 written=78126
         ->  Function Scan on pg_catalog.generate_series x
 (cost=0.00..200000.00 rows=20000000 width=36) (actual
time=3282.882..6343.470 rows=20000000 loops=1)
               Output: NULL::text, x
               Function Call: generate_series(1, 20000000)
               Buffers: temp read=34180 written=34180
 Query Identifier: 7415410443092007025
 Planning Time: 0.036 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming
true
   Timing: Generation 0.231 ms, Inlining 5.021 ms, Optimization 18.850 ms,
Emission 12.934 ms, Total 37.036 ms
 Execution Time: 819003.415 ms
(19 rows)

Time: 819004.041 ms (13:39.004)


the fixed version:
explain (verbose, buffers, analyze) select *row_number() over()* as rn, x,
random() from generate_series(1,20000000) x order by x desc, random();

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3159725.67..3209725.67 rows=20000000 width=20) (actual
time=23361.306..25466.728 rows=20000000 loops=1)
   Output: (row_number() OVER (?)), x, (random())
   Sort Key: x.x DESC, (random())
   Sort Method: external merge  Disk: 665376kB
   Buffers: temp read=117352 written=117355
   ->  WindowAgg  (cost=0.00..500000.00 rows=20000000 width=20) (actual
time=3409.882..9789.217 rows=20000000 loops=1)
         Output: row_number() OVER (?), x, random()
         Buffers: temp read=34180 written=34180
         ->  Function Scan on pg_catalog.generate_series x
 (cost=0.00..200000.00 rows=20000000 width=4) (actual
time=3409.853..6768.737 rows=20000000 loops=1)
               Output: x
               Function Call: generate_series(1, 20000000)
               Buffers: temp read=34180 written=34180
 Query Identifier: -8739706385719272689
 Planning Time: 0.067 ms
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming
true
   Timing: Generation 0.180 ms, Inlining 32.829 ms, Optimization 5.890 ms,
Emission 3.899 ms, Total 42.798 ms
 Execution Time: 26070.514 ms

Time: 26071.112 ms (00:26.071)

A performance issue in ROW_NUMBER() OVER(ORDER BY NULL) [27 times slow than OVER()] V14.5

Reply via email to