Hi,

Kindly requesting an update on this. Thanks.

-Aditya
On Tue, Oct 20, 2020 at 6:26 PM aditya desai <admad...@gmail.com> wrote:

On Mon, Oct 19, 2020 at 9:50 PM Michael Lewis <mle...@entrata.com> wrote:

> Reply to the group, not just me please. Btw, when you do reply to the
> group, it is best practice on these lists to reply in-line and not just
> reply on top with all prior messages quoted.

Hi Michael,
Please see my inline responses below. I tried all of this on the dev
environment (2 vCPU, 8 GB RAM). Still waiting for the PST environment :(
with a better configuration.

> On Sun, Oct 18, 2020 at 3:23 AM aditya desai <admad...@gmail.com> wrote:
>
>> I tried VACUUM FULL and the execution time came down to half.
>
> Great to hear.
>
>> However, it still consumes CPU. Setting max_parallel_workers_per_gather
>> to 0 did not help much.
>
> You didn't answer all of my questions, particularly about disabling
> sequential scans. If you still have the default random_page_cost of 4, it
> might be that 1.5 allows better cost estimates for an index (random) scan
> vs a sequential scan of a table.

Please see the next inline answer.

> Laurenz is a brilliant guy. I would implement the indexes he suggests if
> you don't have them already and report back. If the indexes don't get
> used, try "set enable_seqscan = false;" before the query, and if it is
> way faster, then reduce random_page_cost to maybe 1-2, depending on how
> your overall cache hit ratio is across the system.

Query plan with enable_seqscan=off and random_page_cost=1: with these
settings, the execution time and cost of the query are less than half of
what they were under the original settings. I also applied Laurenz's
suggestions: rewrote the query with the UNION operator and created the
indexes he recommended.
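The exact statements used for that run are not shown in the mail; a
minimal sketch of the session-level experiment Michael describes, with
assumed values, would be:

-- Sketch only; the poster's exact settings script was not posted.
SET enable_seqscan = off;                 -- force index plans, for comparison only
SET random_page_cost = 1;                 -- favour random (index) I/O
SET max_parallel_workers_per_gather = 0;  -- the earlier experiment that "did not help much"
-- then run the EXPLAIN (ANALYZE, BUFFERS) query below in the same session

One common formulation of the system-wide cache hit ratio Michael
mentions:

SELECT sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit) + sum(blks_read), 0)
       AS cache_hit_pct
FROM pg_stat_database;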
lmp_delivery_jobs=> explain (analyze, buffers)
with JobCount as (
  (select jobstatuscode, count(1) stat_count
     from job j
    where 1=1
      and j.countrycode = 'TH'
      and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1',
                             'THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
      and j.jobstartdatetime between '2020-08-01 00:00:00' and '2020-09-30 00:00:00'
    group by j.jobstatuscode)
  UNION
  (select jobstatuscode, count(1) stat_count
     from job j
    where 1=1
      and j.countrycode = 'TH'
      and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1',
                             'THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
      and j.jobstartdatetime is null
    group by j.jobstatuscode))
select js.jobstatuscode, COALESCE(stat_count, 0) stat_count
  from JobCount jc
 right outer join jobstatus js on jc.jobstatuscode = js.jobstatuscode;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Right Join  (cost=68652.52..68652.76 rows=10 width=12) (actual time=676.477..676.495 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=11897
   CTE jobcount
     ->  HashAggregate  (cost=68650.01..68650.11 rows=10 width=24) (actual time=676.451..676.454 rows=8 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=11895
           ->  Append  (cost=68645.89..68649.96 rows=10 width=24) (actual time=676.346..676.441 rows=8 loops=1)
                 Buffers: shared hit=11895
                 ->  Finalize GroupAggregate  (cost=68645.89..68648.17 rows=9 width=12) (actual time=676.345..676.379 rows=8 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=11889
                       ->  Gather Merge  (cost=68645.89..68647.99 rows=18 width=12) (actual time=676.330..676.403 rows=24 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=29067 read=1
                             I/O Timings: read=0.038
                             ->  Sort  (cost=67645.87..67645.89 rows=9 width=12) (actual time=669.544..669.548 rows=8 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=29067 read=1
                                   I/O Timings: read=0.038
                                   ->  Partial HashAggregate  (cost=67645.63..67645.72 rows=9 width=12) (actual time=669.506..669.511 rows=8 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=29051 read=1
                                         I/O Timings: read=0.038
                                         ->  Parallel Index Scan using job_list_test1 on job j  (cost=0.43..66135.88 rows=301950 width=4) (actual time=0.040..442.373 rows=244800 loops=3)
                                               Index Cond: (((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00:00:00'::timestamp without time zone))
                                               Buffers: shared hit=29051 read=1
                                               I/O Timings: read=0.038
                 ->  GroupAggregate  (cost=1.62..1.64 rows=1 width=12) (actual time=0.043..0.043 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=6
                       ->  Sort  (cost=1.62..1.62 rows=1 width=4) (actual time=0.041..0.041 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=6
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..1.61 rows=1 width=4) (actual time=0.034..0.034 rows=0 loops=1)
                                   Index Cond: ((countrycode)::text = 'TH'::text)
                                   Filter: ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[]))
                                   Rows Removed by Filter: 26
                                   Buffers: shared hit=6
   ->  CTE Scan on jobcount jc  (cost=0.00..0.20 rows=10 width=24) (actual time=676.454..676.461 rows=8 loops=1)
         Buffers: shared hit=11895
   ->  Hash  (cost=2.29..2.29 rows=10 width=4) (actual time=0.015..0.015 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=2
         ->  Index Only Scan using jobstatus_jobstatuscode_unq on jobstatus js  (cost=0.14..2.29 rows=10 width=4) (actual time=0.005..0.009 rows=10 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=2
 Planning Time: 0.812 ms
 Execution Time: 676.642 ms
(55 rows)
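The plan above references job_list_test1 and job_list_test3. Their
definitions were not posted; judging from the Index Cond and Filter lines,
they would presumably be along these lines (hypothetical DDL, not taken
from the thread):

-- Hypothetical; inferred from the plan's Index Cond / Filter lines.
CREATE INDEX job_list_test1 ON job (countrycode, facilitycode);
CREATE INDEX job_list_test3 ON job (countrycode)
    WHERE jobstartdatetime IS NULL;   -- partial index for the IS NULL branch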
Query with random_page_cost=4 and enable_seqscan=on (same query as above):

lmp_delivery_jobs=> set random_page_cost=4;
SET
lmp_delivery_jobs=> explain (analyze, buffers) with JobCount as ( ... );

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Right Join  (cost=128145.44..128145.67 rows=10 width=12) (actual time=1960.823..1960.842 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=20586 read=8706
   I/O Timings: read=49.250
   CTE jobcount
     ->  HashAggregate  (cost=128144.11..128144.21 rows=10 width=24) (actual time=1960.786..1960.788 rows=8 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=20585 read=8706
           I/O Timings: read=49.250
           ->  Append  (cost=128135.68..128144.06 rows=10 width=24) (actual time=1960.634..1960.774 rows=8 loops=1)
                 Buffers: shared hit=20585 read=8706
                 I/O Timings: read=49.250
                 ->  Finalize GroupAggregate  (cost=128135.68..128137.96 rows=9 width=12) (actual time=1960.632..1960.689 rows=8 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=20579 read=8706
                       I/O Timings: read=49.250
                       ->  Gather Merge  (cost=128135.68..128137.78 rows=18 width=12) (actual time=1960.616..1960.690 rows=24 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=58214 read=30130
                             I/O Timings: read=152.485
                             ->  Sort  (cost=127135.66..127135.68 rows=9 width=12) (actual time=1941.131..1941.134 rows=8 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=58214 read=30130
                                   I/O Timings: read=152.485
                                   ->  Partial HashAggregate  (cost=127135.43..127135.52 rows=9 width=12) (actual time=1941.088..1941.092 rows=8 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=58198 read=30130
                                         I/O Timings: read=152.485
                                         ->  Parallel Seq Scan on job j  (cost=0.00..125625.68 rows=301950 width=4) (actual time=0.015..1698.223 rows=244800 loops=3)
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00:00:00'::timestamp without time zone) AND ((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Rows Removed by Filter: 673444
                                               Buffers: shared hit=58198 read=30130
                                               I/O Timings: read=152.485
                 ->  GroupAggregate  (cost=5.93..5.95 rows=1 width=12) (actual time=0.077..0.077 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=6
                       ->  Sort  (cost=5.93..5.94 rows=1 width=4) (actual time=0.075..0.075 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=6
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..5.92 rows=1 width=4) (actual time=0.065..0.065 rows=0 loops=1)
                                   Index Cond: ((countrycode)::text = 'TH'::text)
                                   Filter: ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[]))
                                   Rows Removed by Filter: 26
                                   Buffers: shared hit=6
   ->  CTE Scan on jobcount jc  (cost=0.00..0.20 rows=10 width=24) (actual time=1960.789..1960.797 rows=8 loops=1)
         Buffers: shared hit=20585 read=8706
         I/O Timings: read=49.250
   ->  Hash  (cost=1.10..1.10 rows=10 width=4) (actual time=0.023..0.023 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on jobstatus js  (cost=0.00..1.10 rows=10 width=4) (actual time=0.007..0.013 rows=10 loops=1)
               Buffers: shared hit=1
 Planning Time: 3.019 ms
 Execution Time: 1961.024 ms

>> Auto vacuuming is catching up just fine. No issues in that area.
>
> If the time came down by half after VACUUM FULL, I would question that
> statement.

I checked the last autovacuum on the underlying tables before the load
tests and it was very recent. I also explicitly ran VACUUM (FREEZE,
ANALYZE) on the underlying tables before the load test just to make sure.
It did not help much.

>> Temp table size is less than the original tables without indexes.
>
> Significantly less would indicate the regular table still being bloated,
> I think. Maybe someone else will suggest otherwise.

Please see below.

SELECT relname    AS TableName,
       n_live_tup AS LiveTuples,
       n_dead_tup AS DeadTuples
FROM pg_stat_user_tables WHERE relname = 'job';

 tablename | livetuples | deadtuples
-----------+------------+------------
 job       |    2754980 |        168
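Note that n_live_tup and n_dead_tup are estimates and say nothing about
reclaimable free space, which is what bloat really is. If bloat is still
in question, the pgstattuple extension measures it directly; a sketch,
using the job table from the thread:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Full-scan statistics: low tuple_percent with high free_percent
-- indicates bloat even when dead_tuple_percent is near zero.
SELECT table_len, tuple_percent, dead_tuple_percent, free_percent
  FROM pgstattuple('job');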
>> Does this mean we need to upgrade the hardware? Also, by caching data,
>> do you mean caching at the application side (microservices side)? Or on
>> the Postgres side? I tried pg_prewarm; it did not help much.
>
> I can't say about hardware. Until you have exhausted options like
> configs and indexing, spending more money forever onwards seems
> premature. I meant pre-aggregated data, wherever it makes sense to do
> that. I wouldn't expect pg_prewarm to do a ton since you already show
> high cache hits.

Understood, thanks.

>> It is actually the CPU consumption which is the issue. The query is
>> fast otherwise.
>
> Sure, but that is a symptom of reading and processing a lot of data.

As per the application team, it is a business requirement to show the
last 60 days' worth of data. This particular query computes the counts
per jobstatuscode (the GROUP BY), which may be what takes the compute
(CPU spikes). I have tried the indexing suggested by Laurenz as well;
cost and execution time are still high.
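If the counts only ever need to cover a rolling 60-day window, one way to
act on Michael's pre-aggregation suggestion would be a materialized view
refreshed on a schedule. A sketch only, not from the thread; all names
here are illustrative:

-- Hypothetical pre-aggregation of the hot GROUP BY.
CREATE MATERIALIZED VIEW job_status_counts AS
SELECT countrycode, facilitycode, jobstatuscode, count(*) AS stat_count
  FROM job
 WHERE jobstartdatetime >= now() - interval '60 days'
    OR jobstartdatetime IS NULL
 GROUP BY countrycode, facilitycode, jobstatuscode;

-- A unique index is required to refresh without blocking readers.
CREATE UNIQUE INDEX ON job_status_counts (countrycode, facilitycode, jobstatuscode);

-- Run periodically (e.g. from cron); readers then scan a tiny table
-- instead of aggregating millions of rows on every call.
REFRESH MATERIALIZED VIEW CONCURRENTLY job_status_counts;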