how is cost calculated? postgres/costsize.c at master · postgres/postgres (github.com) <https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c>
row estimation PostgreSQL: Documentation: 13: 70.1. Row Estimation Examples <https://www.postgresql.org/docs/current/row-estimation-examples.html> FYI, I just read it when I had queries like why rows were estimated when the table has now rows. but i think it requires a deeper reading into the source for cost calculation which i did not do at that time as i did not understand :) but incase above helps to start. On Fri, 4 Jun 2021 at 16:29, Pól Ua Laoínecháin <lineh...@tcd.ie> wrote: > Hi all, > > Noticed this today - relatively simple query - table with 7 records > (all code is shown at the bottom of this post and on the fiddle here): > > > https://dbfiddle.uk/?rdbms=postgres_12&fiddle=efe73a37d29af43f33d2bc79de2b6c97 > > Sample (2 of 7 records); > > ====================== > INSERT INTO test > VALUES > (1, 'Pól' , '2021-06-01', '2021-06-06'), > (2, 'Bill' , '2021-06-02', '2021-06-10'); > ========================= > > Query: > > ======================== > SELECT > id, > GENERATE_SERIES > (t.start_date, t.end_date, '1 DAY')::DATE AS sd, > t.end_date > FROM test t > ORDER BY t.id, t.start_date; > ========================= > > Now, the EXPLAIN (ANALYZE, BUFFERS) of this query is as follow - see > fiddle - with 7 records: > > ==================== > QUERY PLAN > Sort (cost=165708.62..168608.62 rows=1160000 width=10) (actual > time=0.083..0.087 rows=42 loops=1) > Sort Key: id, (((generate_series((start_date)::timestamp with time > zone, (end_date)::timestamp with time zone, '1 > day'::interval)))::date) > Sort Method: quicksort Memory: 26kB > Buffers: shared hit=1 > -> Result (cost=0.00..29036.10 rows=1160000 width=10) (actual > time=0.019..0.056 rows=42 loops=1) > Buffers: shared hit=1 > -> ProjectSet (cost=0.00..5836.10 rows=1160000 width=14) (actual > time=0.018..0.042 rows=42 loops=1) > Buffers: shared hit=1 > -> Seq Scan on test t (cost=0.00..21.60 rows=1160 width=10) (actual > time=0.008..0.010 rows=7 loops=1) > Buffers: shared hit=1 > Planning Time: 0.061 ms > Execution Time: 0.131 ms > 12 rows > =========================== > > Now, the first line of the PLAN has > > > Sort (cost=165708.62..168608.62 rows=1160000 width=10) > > and two other lines below this also contain the figure 1,160,000. > > Where *_on earth_* is PostgreSQL obtaining 1.16M rows? And where do > the cost numbers come from for a query on 7 records? > > This query - a recursive CTE (no GENERATE_SERIES) as follows: > > =================== > WITH RECURSIVE cte (id, sd, ed) AS > ( > SELECT t.id, t.start_date, t.end_date > FROM test t > UNION ALL > SELECT c.id, (c.sd + INTERVAL '1 DAY')::DATE, c.ed > FROM cte c > WHERE c.sd < (SELECT z.end_date FROM test z WHERE z.id = c.id) > ) > SELECT * FROM cte c2 > ORDER BY c2.id, c2.sd, c2.ed; > ============================= > > gives a PLAN as follows: > > ========================== > QUERY PLAN > Sort (cost=955181.47..955281.05 rows=39830 width=10) (actual > time=0.262..0.266 rows=42 loops=1) > Sort Key: c2.id, c2.sd, c2.ed > Sort Method: quicksort Memory: 26kB > Buffers: shared hit=85 > CTE cte > -> Recursive Union (cost=0.00..951341.55 rows=39830 width=10) (actual > time=0.010..0.216 rows=42 loops=1) > Buffers: shared hit=85 > -> Seq Scan on test t (cost=0.00..21.60 rows=1160 width=10) (actual > time=0.009..0.010 rows=7 loops=1) > Buffers: shared hit=1 > -> WorkTable Scan on cte c (cost=0.00..95052.33 rows=3867 width=10) > (actual time=0.006..0.020 rows=4 loops=9) > Filter: (sd < (SubPlan 1)) > Rows Removed by Filter: 1 > Buffers: shared hit=84 > SubPlan 1 > -> Index Scan using test_pkey on test z (cost=0.15..8.17 rows=1 > width=4) (actual time=0.003..0.003 rows=1 loops=42) > Index Cond: (id = c.id) > Buffers: shared hit=84 > -> CTE Scan on cte c2 (cost=0.00..796.60 rows=39830 width=10) (actual > time=0.011..0.233 rows=42 loops=1) > Buffers: shared hit=85 > Planning Time: 0.137 ms > Execution Time: 0.324 ms > 21 rows > =============================== > > Now, this PLAN is more complicated - and I totally get that! However, > where do these numbers: > > > (cost=955181.47..955281.05 rows=39830 width=10) > > come from? > > Now, we've gone from 1,160,000 to 39,830 rows (progress? :-) ) and a > cost of ~ 1M (compared with 168k for the first query). > > I probably need to read up on the EXPLAIN (ANALYZE, BUFFERS) > functionality - but I would appreciate: > > a) an (overview) explanation of what's going on here in particular and > > b) any good references to URLs, papers whatever which would be of > benefit to a (hopefully) reasonably competent SQL programmer with a > desire to grasp internals - how to interpret PostgreSQL's EXPLAIN > functionality. > > TIA and rgs, > > > Pól... > > > -- Thanks, Vijay Mumbai, India