Hi,

Kindly requesting an update on this. Thanks.

-Aditya
On Tue, Oct 20, 2020 at 6:26 PM aditya desai <admad...@gmail.com> wrote:

On Mon, Oct 19, 2020 at 9:50 PM Michael Lewis <mle...@entrata.com> wrote:

> Reply to the group, not just me please. Btw, when you do reply to the
> group, it is best practice on these lists to reply in-line and not just
> reply on top with all prior messages quoted.

Hi Michael,
Please see my inline responses below. I tried all of this on the dev
environment (2 vCPU, 8 GB RAM). Still waiting for the PST environment :(
with a better configuration.

> On Sun, Oct 18, 2020 at 3:23 AM aditya desai <admad...@gmail.com> wrote:
>
>> I tried VACUUM FULL and the execution time came down to half.
>
> Great to hear.
>
>> However, it still consumes CPU. Setting max_parallel_workers_per_gather
>> to 0 did not help much.
>
> You didn't answer all of my questions, particularly about disabling
> sequential scans. If you still have the default random_page_cost of 4, it
> might be that 1.5 allows better cost estimates for an index (random) scan
> vs a sequential scan of a table.

Please see the next inline answer.

> Laurenz is a brilliant guy. I would implement the indexes he suggests if
> you don't have them already and report back. If the indexes don't get
> used, try "set enable_seqscan = false;" before the query, and if it is
> way faster, then reduce random_page_cost to maybe 1-2, depending on how
> your overall cache hit ratio is across the system.

Query plan with enable_seqscan=off and random_page_cost=1: with these
settings, the execution time and cost of the query are less than half of
what they were under the original settings. I also applied Laurenz's
suggestions: rewrote the query with the UNION operator and created the
indexes he recommended.
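The exact statements used for that run are not shown in the mail; a
minimal sketch of the session-level experiment Michael describes, with
assumed values, would be:

-- Sketch only; the poster's exact settings script was not posted.
SET enable_seqscan = off;                 -- force index plans, for comparison only
SET random_page_cost = 1;                 -- favour random (index) I/O
SET max_parallel_workers_per_gather = 0;  -- the earlier experiment that "did not help much"
-- then run the EXPLAIN (ANALYZE, BUFFERS) query below in the same session

One common formulation of the system-wide cache hit ratio Michael
mentions:

SELECT sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit) + sum(blks_read), 0)
       AS cache_hit_pct
FROM pg_stat_database;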
lmp_delivery_jobs=> explain (analyze, buffers)
with JobCount as (
  (select jobstatuscode, count(1) stat_count
     from job j
    where 1=1
      and j.countrycode = 'TH'
      and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1',
                             'THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
      and j.jobstartdatetime between '2020-08-01 00:00:00' and '2020-09-30 00:00:00'
    group by j.jobstatuscode)
  UNION
  (select jobstatuscode, count(1) stat_count
     from job j
    where 1=1
      and j.countrycode = 'TH'
      and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1',
                             'THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
      and j.jobstartdatetime is null
    group by j.jobstatuscode))
select js.jobstatuscode, COALESCE(stat_count, 0) stat_count
  from JobCount jc
 right outer join jobstatus js on jc.jobstatuscode = js.jobstatuscode;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Right Join  (cost=68652.52..68652.76 rows=10 width=12) (actual time=676.477..676.495 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=11897
   CTE jobcount
     ->  HashAggregate  (cost=68650.01..68650.11 rows=10 width=24) (actual time=676.451..676.454 rows=8 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=11895
           ->  Append  (cost=68645.89..68649.96 rows=10 width=24) (actual time=676.346..676.441 rows=8 loops=1)
                 Buffers: shared hit=11895
                 ->  Finalize GroupAggregate  (cost=68645.89..68648.17 rows=9 width=12) (actual time=676.345..676.379 rows=8 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=11889
                       ->  Gather Merge  (cost=68645.89..68647.99 rows=18 width=12) (actual time=676.330..676.403 rows=24 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=29067 read=1
                             I/O Timings: read=0.038
                             ->  Sort  (cost=67645.87..67645.89 rows=9 width=12) (actual time=669.544..669.548 rows=8 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=29067 read=1
                                   I/O Timings: read=0.038
                                   ->  Partial HashAggregate  (cost=67645.63..67645.72 rows=9 width=12) (actual time=669.506..669.511 rows=8 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=29051 read=1
                                         I/O Timings: read=0.038
                                         ->  Parallel Index Scan using job_list_test1 on job j  (cost=0.43..66135.88 rows=301950 width=4) (actual time=0.040..442.373 rows=244800 loops=3)
                                               Index Cond: (((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00:00:00'::timestamp without time zone))
                                               Buffers: shared hit=29051 read=1
                                               I/O Timings: read=0.038
                 ->  GroupAggregate  (cost=1.62..1.64 rows=1 width=12) (actual time=0.043..0.043 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=6
                       ->  Sort  (cost=1.62..1.62 rows=1 width=4) (actual time=0.041..0.041 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=6
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..1.61 rows=1 width=4) (actual time=0.034..0.034 rows=0 loops=1)
                                   Index Cond: ((countrycode)::text = 'TH'::text)
                                   Filter: ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[]))
                                   Rows Removed by Filter: 26
                                   Buffers: shared hit=6
   ->  CTE Scan on jobcount jc  (cost=0.00..0.20 rows=10 width=24) (actual time=676.454..676.461 rows=8 loops=1)
         Buffers: shared hit=11895
   ->  Hash  (cost=2.29..2.29 rows=10 width=4) (actual time=0.015..0.015 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=2
         ->  Index Only Scan using jobstatus_jobstatuscode_unq on jobstatus js  (cost=0.14..2.29 rows=10 width=4) (actual time=0.005..0.009 rows=10 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=2
 Planning Time: 0.812 ms
 Execution Time: 676.642 ms
(55 rows)
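The plan above references job_list_test1 and job_list_test3. Their
definitions were not posted; judging from the Index Cond and Filter lines,
they would presumably be along these lines (hypothetical DDL, not taken
from the thread):

-- Hypothetical; inferred from the plan's Index Cond / Filter lines.
CREATE INDEX job_list_test1 ON job (countrycode, facilitycode);
CREATE INDEX job_list_test3 ON job (countrycode)
    WHERE jobstartdatetime IS NULL;   -- partial index for the IS NULL branch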
Query with random_page_cost=4 and enable_seqscan=on (same query as above):

lmp_delivery_jobs=> set random_page_cost=4;
SET
lmp_delivery_jobs=> explain (analyze, buffers) with JobCount as ( ... );

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Right Join  (cost=128145.44..128145.67 rows=10 width=12) (actual time=1960.823..1960.842 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=20586 read=8706
   I/O Timings: read=49.250
   CTE jobcount
     ->  HashAggregate  (cost=128144.11..128144.21 rows=10 width=24) (actual time=1960.786..1960.788 rows=8 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=20585 read=8706
           I/O Timings: read=49.250
           ->  Append  (cost=128135.68..128144.06 rows=10 width=24) (actual time=1960.634..1960.774 rows=8 loops=1)
                 Buffers: shared hit=20585 read=8706
                 I/O Timings: read=49.250
                 ->  Finalize GroupAggregate  (cost=128135.68..128137.96 rows=9 width=12) (actual time=1960.632..1960.689 rows=8 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=20579 read=8706
                       I/O Timings: read=49.250
                       ->  Gather Merge  (cost=128135.68..128137.78 rows=18 width=12) (actual time=1960.616..1960.690 rows=24 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=58214 read=30130
                             I/O Timings: read=152.485
                             ->  Sort  (cost=127135.66..127135.68 rows=9 width=12) (actual time=1941.131..1941.134 rows=8 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=58214 read=30130
                                   I/O Timings: read=152.485
                                   ->  Partial HashAggregate  (cost=127135.43..127135.52 rows=9 width=12) (actual time=1941.088..1941.092 rows=8 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=58198 read=30130
                                         I/O Timings: read=152.485
                                         ->  Parallel Seq Scan on job j  (cost=0.00..125625.68 rows=301950 width=4) (actual time=0.015..1698.223 rows=244800 loops=3)
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00:00:00'::timestamp without time zone) AND ((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Rows Removed by Filter: 673444
                                               Buffers: shared hit=58198 read=30130
                                               I/O Timings: read=152.485
                 ->  GroupAggregate  (cost=5.93..5.95 rows=1 width=12) (actual time=0.077..0.077 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=6
                       ->  Sort  (cost=5.93..5.94 rows=1 width=4) (actual time=0.075..0.075 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=6
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..5.92 rows=1 width=4) (actual time=0.065..0.065 rows=0 loops=1)
                                   Index Cond: ((countrycode)::text = 'TH'::text)
                                   Filter: ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[]))
                                   Rows Removed by Filter: 26
                                   Buffers: shared hit=6
   ->  CTE Scan on jobcount jc  (cost=0.00..0.20 rows=10 width=24) (actual time=1960.789..1960.797 rows=8 loops=1)
         Buffers: shared hit=20585 read=8706
         I/O Timings: read=49.250
   ->  Hash  (cost=1.10..1.10 rows=10 width=4) (actual time=0.023..0.023 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on jobstatus js  (cost=0.00..1.10 rows=10 width=4) (actual time=0.007..0.013 rows=10 loops=1)
               Buffers: shared hit=1
 Planning Time: 3.019 ms
 Execution Time: 1961.024 ms

>> Auto vacuuming is catching up just fine. No issues in that area.
>
> If the time came down by half after VACUUM FULL, I would question that
> statement.

I checked the last autovacuum on the underlying tables before the load
tests and it was very recent. I also explicitly ran VACUUM (FREEZE,
ANALYZE) on the underlying tables before the load test just to make sure.
It did not help much.

>> Temp table size is less than the original tables without indexes.
>
> Significantly less would indicate the regular table still being bloated,
> I think. Maybe someone else will suggest otherwise.

Please see below.

SELECT relname    AS TableName,
       n_live_tup AS LiveTuples,
       n_dead_tup AS DeadTuples
FROM pg_stat_user_tables WHERE relname = 'job';

 tablename | livetuples | deadtuples
-----------+------------+------------
 job       |    2754980 |        168
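Note that n_live_tup and n_dead_tup are estimates and say nothing about
reclaimable free space, which is what bloat really is. If bloat is still
in question, the pgstattuple extension measures it directly; a sketch,
using the job table from the thread:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Full-scan statistics: low tuple_percent with high free_percent
-- indicates bloat even when dead_tuple_percent is near zero.
SELECT table_len, tuple_percent, dead_tuple_percent, free_percent
  FROM pgstattuple('job');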
>> Does this mean we need to upgrade the hardware? Also, by caching data,
>> do you mean caching at the application side (microservices side)? Or on
>> the Postgres side? I tried pg_prewarm; it did not help much.
>
> I can't say about hardware. Until you have exhausted options like
> configs and indexing, spending more money forever onwards seems
> premature. I meant pre-aggregated data, wherever it makes sense to do
> that. I wouldn't expect pg_prewarm to do a ton since you already show
> high cache hits.

Understood, thanks.

>> It is actually the CPU consumption which is the issue. The query is
>> fast otherwise.
>
> Sure, but that is a symptom of reading and processing a lot of data.

As per the application team, it is a business requirement to show the
last 60 days' worth of data. This particular query computes the counts
per jobstatuscode (the GROUP BY), which may be what takes the compute
(CPU spikes). I have tried the indexing suggested by Laurenz as well;
cost and execution time are still high.
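If the counts only ever need to cover a rolling 60-day window, one way to
act on Michael's pre-aggregation suggestion would be a materialized view
refreshed on a schedule. A sketch only, not from the thread; all names
here are illustrative:

-- Hypothetical pre-aggregation of the hot GROUP BY.
CREATE MATERIALIZED VIEW job_status_counts AS
SELECT countrycode, facilitycode, jobstatuscode, count(*) AS stat_count
  FROM job
 WHERE jobstartdatetime >= now() - interval '60 days'
    OR jobstartdatetime IS NULL
 GROUP BY countrycode, facilitycode, jobstatuscode;

-- A unique index is required to refresh without blocking readers.
CREATE UNIQUE INDEX ON job_status_counts (countrycode, facilitycode, jobstatuscode);

-- Run periodically (e.g. from cron); readers then scan a tiny table
-- instead of aggregating millions of rows on every call.
REFRESH MATERIALIZED VIEW CONCURRENTLY job_status_counts;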