Is your transaction_id more or less monotonic with respect to the date? If so, something like the following can help:
with tenders_filtered as (
    select * from tenders
    where country = 'Mexico'
      and "date" >= '2023-01-01' and "date" < '2024-01-01'
)
Select * from tenders_filtered
inner join items on transaction_id = tender_transaction_id
where tender_transaction_id
    between (select min(transaction_id) from tenders_filtered)
        and (select max(transaction_id) from tenders_filtered)

This assumes you have an index on items(tender_transaction_id) and that the
transaction_id range covers only a small subset (less than, say, 5%) of the
table.

If your transaction_id is not monotonic, you can consider introducing something
that is, or even adding denormalized field(s) with country and/or date to your
items table (see the sketch after the quoted message below).

Another option is to use a window function to get the count, e.g.

Select *, count(*) OVER () as cnt from tenders
inner join items on transaction_id = tender_transaction_id
where country = 'Colombia'
  and "date" >= '2023-01-01' and "date" < '2024-01-01'

This would at least save you from doing a second call.

Mon, 26 Feb 2024 at 16:26, Chema <ch...@interneta.org> wrote:

> Dear pgsqlers,
>
> I'm trying to optimize simple queries on two tables (tenders & items) with
> a couple million records. Besides the resulting records, the app also
> displays the count of total results. Doing count() takes as much time as
> the other query (which can be 30+ secs), so it's an obvious target for
> optimization. I'm already caching count() results for the most common
> conditions (country & year) in a materialized table, which practically
> halves response time. The tables are updated sparingly, and only with bulk
> COPYs. Now I'm looking for ways to optimize queries with other conditions.
>
> Reading around, it seems many people are still using this 2005 snippet
> <https://www.postgresql.org/message-id/20050810133157.ga46...@winnie.fuhr.org>
> to obtain the row count estimate from Explain:
>
> CREATE FUNCTION count_estimate(query text) RETURNS integer AS $$
> DECLARE
>     rec record;
>     rows integer;
> BEGIN
>     FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
>         rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
>         EXIT WHEN rows IS NOT NULL;
>     END LOOP;
>     RETURN rows;
> END;
> $$ LANGUAGE plpgsql VOLATILE STRICT;
>
> Is this still the current best practice? Any tips to increase precision?
> Currently it can over- *or* under-estimate the actual row count by about a
> million, as seen in the sample queries (1,955,297 instead of 1,001,200;
> 162,080 instead of 1,292,010).
>
> Any other tips to improve the query are welcome, of course. There's a big
> disparity between the two sample query plans even though only the
> filtered country changes.
>
> I already raised default_statistics_target up to 2k (the planner wasn't
> using indexes at all with low values). Gotta get it even higher?
>
> These are my custom settings:
>
> shared_buffers = 256MB                  # min 128kB
> work_mem = 128MB                        # min 64kB
> maintenance_work_mem = 254MB            # min 1MB
> effective_cache_size = 2GB
> default_statistics_target = 2000
> random_page_cost = 1.0                  # same scale as above
>
> Sample query:
>
> Explain Analyze
> Select * from tenders inner join items on transaction_id = tender_transaction_id
> where country = 'Colombia'
>   and "date" >= '2023-01-01' and "date" < '2024-01-01'
>
>                                      QUERY PLAN
> Gather  (cost=253837.99..1506524.32 rows=1955297 width=823) (actual time=51433.592..63239.809 rows=1001200 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   ->  Parallel Hash Join  (cost=252837.99..1309994.62 rows=814707 width=823) (actual time=51361.920..61729.142 rows=333733 loops=3)
>         Hash Cond: (items.tender_transaction_id = tenders.transaction_id)
>         ->  Parallel Seq Scan on items  (cost=0.00..1048540.46 rows=3282346 width=522) (actual time=1.689..56887.108 rows=2621681 loops=3)
>         ->  Parallel Hash  (cost=247919.56..247919.56 rows=393475 width=301) (actual time=2137.473..2137.476 rows=333733 loops=3)
>               Buckets: 1048576  Batches: 1  Memory Usage: 219936kB
>               ->  Parallel Bitmap Heap Scan on tenders  (cost=16925.75..247919.56 rows=393475 width=301) (actual time=385.315..908.865 rows=333733 loops=3)
>                     Recheck Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
>                     Heap Blocks: exact=24350
>                     ->  Bitmap Index Scan on tenders_country_and_date_index  (cost=0.00..16689.67 rows=944339 width=0) (actual time=423.213..423.214 rows=1001200 loops=1)
>                           Index Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
> Planning Time: 12.784 ms
> JIT:
>   Functions: 33
>   Options: Inlining true, Optimization true, Expressions true, Deforming true
>   Timing: Generation 14.675 ms, Inlining 383.349 ms, Optimization 1023.521 ms, Emission 651.442 ms, Total 2072.987 ms
> Execution Time: 63378.033 ms
>
> Explain Analyze
> Select * from tenders inner join items on transaction_id = tender_transaction_id
> where country = 'Mexico'
>   and "date" >= '2023-01-01' and "date" < '2024-01-01'
>
>                                      QUERY PLAN
> Gather  (cost=1000.99..414258.70 rows=162080 width=823) (actual time=52.538..7006.128 rows=1292010 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   ->  Nested Loop  (cost=0.99..397050.70 rows=67533 width=823) (actual time=40.211..4087.081 rows=430670 loops=3)
>         ->  Parallel Index Scan using tenders_country_and_date_index on tenders  (cost=0.43..45299.83 rows=32616 width=301) (actual time=4.376..59.760 rows=1218 loops=3)
>               Index Cond: ((country = 'Mexico'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
>         ->  Index Scan using items_tender_transaction_id_index on items  (cost=0.56..10.67 rows=11 width=522) (actual time=0.321..3.035 rows=353 loops=3655)
>               Index Cond: (tender_transaction_id = tenders.transaction_id)
> Planning Time: 7.808 ms
> JIT:
>   Functions: 27
>   Options: Inlining false, Optimization false, Expressions true, Deforming true
>   Timing: Generation 17.785 ms, Inlining 0.000 ms, Optimization 5.080 ms, Emission 93.274 ms, Total 116.138 ms
> Execution Time: 7239.427 ms
>
> Thanks in advance!
>
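
On the denormalization idea mentioned above, here is a minimal sketch of what I
mean, assuming items can simply carry a copy of the tender's country (the column
and index names are made up, and the column would have to be maintained on every
bulk COPY):

-- Hypothetical: copy the filter column onto items so items can be scanned alone.
ALTER TABLE items ADD COLUMN country text;

-- One-time backfill from tenders.
UPDATE items i
SET country = t.country
FROM tenders t
WHERE t.transaction_id = i.tender_transaction_id;

-- Index to support the country filter directly on items.
CREATE INDEX items_country_index ON items (country);

With that in place the join (and the count) no longer has to visit tenders just
to apply the country filter.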
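As for the count_estimate() snippet you quoted: it still works, and its accuracy
is bounded by the planner's row estimates, so raising statistics is the main
lever. Just as an illustration, it would be called like this for one of your
sample queries (dollar-quoting avoids having to double the embedded quotes):

SELECT count_estimate($q$
    SELECT * FROM tenders
    INNER JOIN items ON transaction_id = tender_transaction_id
    WHERE country = 'Colombia'
      AND "date" >= '2023-01-01' AND "date" < '2024-01-01'
$q$);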
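And since the tables only change with bulk COPYs, the count cache you describe
could be a materialized view refreshed after each load. A minimal sketch,
assuming you want the joined result count per country and year (the view name is
made up):

CREATE MATERIALIZED VIEW tender_item_counts AS
SELECT t.country,
       extract(year FROM t."date") AS year,
       count(*) AS cnt
FROM tenders t
INNER JOIN items i ON i.tender_transaction_id = t.transaction_id
GROUP BY t.country, extract(year FROM t."date");

-- After each bulk COPY:
REFRESH MATERIALIZED VIEW tender_item_counts;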