Is this on Windows? I see a thread that mentions a performance penalty due to parallel workers.
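If it is Windows, a quick way to test that theory (a sketch, not something that was run in this thread) is to disable parallel query for one session and re-run the slow query, so the Gather Merge and its two workers drop out of the plan. The table and column names below are the ones used elsewhere in the thread; the column list is elided:

SET max_parallel_workers_per_gather = 0;    -- session-only; 0 disables parallel plans

EXPLAIN (ANALYZE, BUFFERS)
SELECT column1, /* ... */ column54          -- column list elided, as elsewhere in the thread
FROM logtable
WHERE (entrytype = 4000 OR entrytype = 4001 OR entrytype = 4002)
  AND archivestatus <= 1
ORDER BY timestampcol DESC;

RESET max_parallel_workers_per_gather;      -- restore the default afterwards

If the non-parallel plan is not much slower than the parallel one (or is faster), the Windows parallel-worker overhead is a likely contributor.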
There is a mailing list thread with the subject line "Huge performance penalty with parallel queries in Windows x64 v. Linux x64".

On Fri, 7 May 2021 at 2:33 PM, Semen Yefimenko <semen.yefime...@gmail.com> wrote:

> As mentioned in the Slack comments:
>
> SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size,
>        pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size,
>        (pgstattuple('logtable')).dead_tuple_percent;
>
> table_size | index_size | dead_tuple_percent
> -----------+------------+--------------------
> 3177 MB    | 289 MB     | 0
>
> I have roughly 6 indexes which all have around 300 MB.
>
> SELECT pg_relation_size('logtable') as table_size,
>        pg_relation_size('idx_entrytype') as index_size,
>        100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio
>
> table_size | index_size | bloat_ratio
> -----------+------------+-------------------
> 3331694592 | 302555136  | 5.219999999999999
>
> Your queries:
>
> n_live_tup   n_dead_tup
> 14118380     0
>
> For testing, I've also been running VACUUM and ANALYZE pretty much before
> every test run.
>
> On Fri, 7 May 2021 at 10:44, Vijaykumar Jain <vijaykumarjain.git...@gmail.com> wrote:
>
>> OK, one last thing, not to be a PITA, but just in case this helps.
>>
>> postgres=# SELECT * FROM pg_stat_user_indexes where relname = 'logtable';
>> postgres=# SELECT * FROM pg_stat_user_tables where relname = 'logtable';
>>
>> Basically, I just want to verify that the table is not bloated;
>> looking at *n_live_tup* vs *n_dead_tup* would help understand it.
>>
>> If you see too many dead tuples,
>>
>> vacuum (ANALYZE, verbose) logtable; -- would get rid of dead tuples (if
>> there are no transactions still using the dead tuples)
>>
>> and then run your query.
>>
>> Thanks,
>> Vijay
>>
>> On Fri, 7 May 2021 at 13:34, Semen Yefimenko <semen.yefime...@gmail.com> wrote:
>>
>>> Sorry if I'm cumulatively answering everyone in one e-mail; I'm not sure
>>> how I'm supposed to do it (single e-mails vs. many).
>>>
>>>> Can you try tuning by increasing the shared_buffers slowly in steps of
>>>> 500MB, and running explain analyze against the query.
>>>
>>> -- 2500 MB shared buffers - random_page_cost = 1;
>>> Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2076.329..3737.050 rows=516517 loops=1)
>>>   Output: column1, .. , column54
>>>   Workers Planned: 2
>>>   Workers Launched: 2
>>>   Buffers: shared hit=295446
>>>   ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=2007.487..2202.707 rows=172172 loops=3)
>>>         Output: column1, .. , column54
>>>         Sort Key: logtable.timestampcol DESC
>>>         Sort Method: quicksort  Memory: 65154kB
>>>         Worker 0:  Sort Method: quicksort  Memory: 55707kB
>>>         Worker 1:  Sort Method: quicksort  Memory: 55304kB
>>>         Buffers: shared hit=295446
>>>         Worker 0:  actual time=1963.969..2156.624 rows=161205 loops=1
>>>           Buffers: shared hit=91028
>>>         Worker 1:  actual time=1984.700..2179.697 rows=161935 loops=1
>>>           Buffers: shared hit=92133
>>>         ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=322.125..1618.971 rows=172172 loops=3)
>>>               Output: column1, .. , column54
>>>               Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>               Filter: (logtable.archivestatus <= 1)
>>>               Heap Blocks: exact=110951
>>>               Buffers: shared hit=295432
>>>               Worker 0:  actual time=282.201..1595.117 rows=161205 loops=1
>>>                 Buffers: shared hit=91021
>>>               Worker 1:  actual time=303.671..1623.299 rows=161935 loops=1
>>>                 Buffers: shared hit=92126
>>>               ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=199.119..199.119 rows=0 loops=1)
>>>                     Buffers: shared hit=1334
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857 rows=65970 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4000)
>>>                           Buffers: shared hit=172
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872 rows=225283 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4001)
>>>                           Buffers: shared hit=581
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377 rows=225264 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4002)
>>>                           Buffers: shared hit=581
>>> Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
>>> Planning Time: 0.940 ms
>>> Execution Time: 4188.083 ms
>>>
>>> ----------------------------------------------------------------------------------------------------
>>> -- 3000 MB shared buffers - random_page_cost = 1;
>>> Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2062.280..3763.408 rows=516517 loops=1)
>>>   Output: column1, .. , column54
>>>   Workers Planned: 2
>>>   Workers Launched: 2
>>>   Buffers: shared hit=295446
>>>   ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=1987.933..2180.422 rows=172172 loops=3)
>>>         Output: column1, .. , column54
>>>         Sort Key: logtable.timestampcol DESC
>>>         Sort Method: quicksort  Memory: 66602kB
>>>         Worker 0:  Sort Method: quicksort  Memory: 55149kB
>>>         Worker 1:  Sort Method: quicksort  Memory: 54415kB
>>>         Buffers: shared hit=295446
>>>         Worker 0:  actual time=1963.059..2147.916 rows=159556 loops=1
>>>           Buffers: shared hit=89981
>>>         Worker 1:  actual time=1949.726..2136.200 rows=158554 loops=1
>>>           Buffers: shared hit=90141
>>>         ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=340.705..1603.796 rows=172172 loops=3)
>>>               Output: column1, .. , column54
>>>               Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>               Filter: (logtable.archivestatus <= 1)
>>>               Heap Blocks: exact=113990
>>>               Buffers: shared hit=295432
>>>               Worker 0:  actual time=317.918..1605.548 rows=159556 loops=1
>>>                 Buffers: shared hit=89974
>>>               Worker 1:  actual time=304.744..1589.221 rows=158554 loops=1
>>>                 Buffers: shared hit=90134
>>>               ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=218.972..218.973 rows=0 loops=1)
>>>                     Buffers: shared hit=1334
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742 rows=65970 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4000)
>>>                           Buffers: shared hit=172
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=119.120..119.121 rows=225283 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4001)
>>>                           Buffers: shared hit=581
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=62.097..62.098 rows=225264 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4002)
>>>                           Buffers: shared hit=581
>>> Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
>>> Planning Time: 2.717 ms
>>> Execution Time: 4224.670 ms
>>>
>>> ----------------------------------------------------------------------------------------------------
>>> -- 3500 MB shared buffers - random_page_cost = 1;
>>> Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=3578.155..4932.858 rows=516517 loops=1)
>>>   Output: column1, .. , column54
>>>   Workers Planned: 2
>>>   Workers Launched: 2
>>>   Buffers: shared hit=14 read=295432 written=67
>>>   ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=3482.159..3677.227 rows=172172 loops=3)
>>>         Output: column1, .. , column54
>>>         Sort Key: logtable.timestampcol DESC
>>>         Sort Method: quicksort  Memory: 58533kB
>>>         Worker 0:  Sort Method: quicksort  Memory: 56878kB
>>>         Worker 1:  Sort Method: quicksort  Memory: 60755kB
>>>         Buffers: shared hit=14 read=295432 written=67
>>>         Worker 0:  actual time=3435.131..3632.985 rows=166842 loops=1
>>>           Buffers: shared hit=7 read=95783 written=25
>>>         Worker 1:  actual time=3441.545..3649.345 rows=179354 loops=1
>>>           Buffers: shared hit=5 read=101608 written=20
>>>         ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=345.111..3042.932 rows=172172 loops=3)
>>>               Output: column1, .. , column54
>>>               Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>               Filter: (logtable.archivestatus <= 1)
>>>               Heap Blocks: exact=96709
>>>               Buffers: shared hit=2 read=295430 written=67
>>>               Worker 0:  actual time=300.525..2999.403 rows=166842 loops=1
>>>                 Buffers: shared read=95783 written=25
>>>               Worker 1:  actual time=300.552..3004.859 rows=179354 loops=1
>>>                 Buffers: shared read=101606 written=20
>>>               ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=241.996..241.997 rows=0 loops=1)
>>>                     Buffers: shared hit=2 read=1332
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.129..37.130 rows=65970 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4000)
>>>                           Buffers: shared read=172
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=131.051..131.052 rows=225283 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4001)
>>>                           Buffers: shared hit=1 read=580
>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=73.800..73.800 rows=225264 loops=1)
>>>                           Index Cond: (logtable.entrytype = 4002)
>>>                           Buffers: shared hit=1 read=580
>>> Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
>>> Planning Time: 0.597 ms
>>> Execution Time: 5389.811 ms
>>>
>>> This doesn't seem to have had an effect.
>>> Thanks for the suggestion.
>>>
>>>> Have you tried excluding NOT NULL from the index? Can you give the
>>>> distribution of archivestatus?
>>>
>>> Yes I have; it yielded the same performance boost as:
>>>
>>> create index test on logtable(entrytype) where archivestatus <= 1;
>>>
>>>> I wonder what the old query plan was...
>>>> Would you include links to your prior correspondence?
>>>
>>> The prior execution plans are present in the SO post.
>>> The other forums I've tried are the official Slack channel:
>>> https://postgresteam.slack.com/archives/C0FS3UTAP/p1620286295228600
>>> and SO:
>>> https://stackoverflow.com/questions/67401792/slow-running-postgresql-query
>>> But I think most of the points discussed in these posts have already
>>> been mentioned by you, except bloating of indexes.
>>>
>>>> Oracle is apparently doing a single scan on "entrytype".
>>>> As a test, you could try forcing that, like:
>>>> begin; SET enable_bitmapscan=off ; explain (analyze) [...]; rollback;
>>>> or
>>>> begin; DROP INDEX idx_arcstatus; explain (analyze) [...]; rollback;
>>>
>>> I've tried enable_bitmapscan=off but it didn't yield any good results.
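For completeness, the suggested enable_bitmapscan test written out as a sketch (the SELECT stands in for the full slow query; the column list is elided as elsewhere in the thread):

begin;
SET enable_bitmapscan = off;               -- force the planner away from the BitmapOr plan
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT column1, /* ... */ column54
FROM logtable
WHERE (entrytype = 4000 OR entrytype = 4001 OR entrytype = 4002)
  AND archivestatus <= 1
ORDER BY timestampcol DESC;
rollback;                                  -- the SET is undone when the transaction is rolled back

The plans that came out of that test follow below.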
>>> -- 2000 MB shared buffers - random_page_cost = 4 - enable_bitmapscan = off
>>> Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7716.031..9043.399 rows=516517 loops=1)
>>>   Output: column1, .., column54
>>>   Workers Planned: 2
>>>   Workers Launched: 2
>>>   Buffers: shared hit=192 read=406605
>>>   ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7642.666..7835.527 rows=172172 loops=3)
>>>         Output: column1, .., column54
>>>         Sort Key: logtable.timestampcol DESC
>>>         Sort Method: quicksort  Memory: 58803kB
>>>         Worker 0:  Sort Method: quicksort  Memory: 60376kB
>>>         Worker 1:  Sort Method: quicksort  Memory: 56988kB
>>>         Buffers: shared hit=192 read=406605
>>>         Worker 0:  actual time=7610.482..7814.905 rows=177637 loops=1
>>>           Buffers: shared hit=78 read=137826
>>>         Worker 1:  actual time=7607.645..7803.561 rows=167316 loops=1
>>>           Buffers: shared hit=80 read=132672
>>>         ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=1.669..7189.365 rows=172172 loops=3)
>>>               Output: column1, .., column54
>>>               Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
>>>               Rows Removed by Filter: 4533459
>>>               Buffers: shared hit=96 read=406605
>>>               Worker 0:  actual time=1.537..7158.286 rows=177637 loops=1
>>>                 Buffers: shared hit=30 read=137826
>>>               Worker 1:  actual time=1.414..7161.670 rows=167316 loops=1
>>>                 Buffers: shared hit=32 read=132672
>>> Settings: enable_bitmapscan = 'off', temp_buffers = '80MB', work_mem = '1GB'
>>> Planning Time: 0.725 ms
>>> Execution Time: 9500.928 ms
>>>
>>> ----------------------------------------------------------------------------------------------------
>>> -- 2000 MB shared buffers - random_page_cost = 1 - enable_bitmapscan = off
>>> Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7519.032..8871.433 rows=516517 loops=1)
>>>   Output: column1, .., column54
>>>   Workers Planned: 2
>>>   Workers Launched: 2
>>>   Buffers: shared hit=576 read=406221
>>>   ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7451.958..7649.480 rows=172172 loops=3)
>>>         Output: column1, .., column54
>>>         Sort Key: logtable.timestampcol DESC
>>>         Sort Method: quicksort  Memory: 58867kB
>>>         Worker 0:  Sort Method: quicksort  Memory: 58510kB
>>>         Worker 1:  Sort Method: quicksort  Memory: 58788kB
>>>         Buffers: shared hit=576 read=406221
>>>         Worker 0:  actual time=7438.271..7644.241 rows=172085 loops=1
>>>           Buffers: shared hit=203 read=135166
>>>         Worker 1:  actual time=7407.574..7609.922 rows=172948 loops=1
>>>           Buffers: shared hit=202 read=135225
>>>         ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=2.839..7017.729 rows=172172 loops=3)
>>>               Output: column1, .., column54
>>>               Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
>>>               Rows Removed by Filter: 4533459
>>>               Buffers: shared hit=480 read=406221
>>>               Worker 0:  actual time=2.628..7006.420 rows=172085 loops=1
>>>                 Buffers: shared hit=155 read=135166
>>>               Worker 1:  actual time=3.948..6978.154 rows=172948 loops=1
>>>                 Buffers: shared hit=154 read=135225
>>> Settings: enable_bitmapscan = 'off', random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
>>> Planning Time: 0.621 ms
>>> Execution Time: 9339.457 ms
>>>
>>>> Have you tuned shared_buffers enough? Each block is 8 kB by default.
>>>> BTW, please try to reset random_page_cost.
>>>
>>> Look above.
>>>
>>> I will try upgrading the minor version next.
>>> I will also try setting up a 13.x version locally, import the data
>>> from 12.2 into 13.x, and see if it might be faster.
>>>
>>> On Thu, 6 May 2021 at 23:16, Imre Samu <pella.s...@gmail.com> wrote:
>>>
>>>> *> Postgres Version : *PostgreSQL 12.2,
>>>> > ... ON ... USING btree
>>>>
>>>> IMHO:
>>>> The next minor (bugfix & security) release is near (expected ~ May 13th, 2021):
>>>> https://www.postgresql.org/developer/roadmap/
>>>> so you can update your PostgreSQL to 12.7 (+ full reindexing recommended!).
>>>>
>>>> You can find a lot of B-tree index-related fixes:
>>>> https://www.postgresql.org/docs/12/release-12-3.html  (release date: 2020-05-14)
>>>> - Fix possible undercounting of deleted B-tree index pages in VACUUM VERBOSE output
>>>> - Fix wrong bookkeeping for oldest deleted page in a B-tree index
>>>> - Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
>>>> https://www.postgresql.org/docs/12/release-12-4.html
>>>> - Avoid repeated marking of dead btree index entries as dead
>>>> https://www.postgresql.org/docs/12/release-12-5.html
>>>> - Fix failure of parallel B-tree index scans when the index condition is unsatisfiable
>>>> https://www.postgresql.org/docs/12/release-12-6.html  (release date: 2021-02-11)
>>>>
>>>> > COLLATE pg_catalog."default"
>>>>
>>>> You could test the "C" collation on some columns (keys?); in theory, it should be faster:
>>>> "The drawback of using locales other than C or POSIX in PostgreSQL is
>>>> its performance impact. It slows character handling and prevents ordinary
>>>> indexes from being used by LIKE. For this reason use locales only if you
>>>> actually need them."
>>>> https://www.postgresql.org/docs/12/locale.html
>>>> https://www.postgresql.org/message-id/flat/CAF6DVKNU0vb4ZeQQ-%3Dagg69QJU3wdjPnMYYrPYY7CKc6iOU7eQ%40mail.gmail.com
>>>>
>>>> Best,
>>>> Imre
>>>>
>>>> On Thu, 6 May 2021 at 16:38, Semen Yefimenko <semen.yefime...@gmail.com> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I've recently been involved in migrating our old system to SQL Server
>>>>> and then PostgreSQL. Everything has been working fine so far, but after
>>>>> executing our tests on Postgres, we saw a very slow running query on a
>>>>> large table in our database.
>>>>> I have tried asking on other platforms but no one has been able to
>>>>> give me a satisfying answer.
>>>>>
>>>>> *Postgres Version:* PostgreSQL 12.2, compiled by Visual C++ build 1914, 64-bit
>>>>> No notable errors in the server log or from the Postgres server itself.
>>>>>
>>>>> The table structure:
>>>>>
>>>>> CREATE TABLE logtable
>>>>> (
>>>>>     key character varying(20) COLLATE pg_catalog."default" NOT NULL,
>>>>>     id integer,
>>>>>     column3 integer,
>>>>>     column4 integer,
>>>>>     column5 integer,
>>>>>     column6 integer,
>>>>>     column7 integer,
>>>>>     column8 integer,
>>>>>     column9 character varying(128) COLLATE pg_catalog."default",
>>>>>     column10 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column11 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column12 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column13 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column14 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column15 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column16 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column17 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column18 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column19 character varying(2048) COLLATE pg_catalog."default",
>>>>>     column21 character varying(256) COLLATE pg_catalog."default",
>>>>>     column22 character varying(256) COLLATE pg_catalog."default",
>>>>>     column23 character varying(256) COLLATE pg_catalog."default",
>>>>>     column24 character varying(256) COLLATE pg_catalog."default",
>>>>>     column25 character varying(256) COLLATE pg_catalog."default",
>>>>>     column26 character varying(256) COLLATE pg_catalog."default",
>>>>>     column27 character varying(256) COLLATE pg_catalog."default",
>>>>>     column28 character varying(256) COLLATE pg_catalog."default",
>>>>>     column29 character varying(256) COLLATE pg_catalog."default",
>>>>>     column30 character varying(256) COLLATE pg_catalog."default",
>>>>>     column31 character varying(256) COLLATE pg_catalog."default",
>>>>>     column32 character varying(256) COLLATE pg_catalog."default",
>>>>>     column33 character varying(256) COLLATE pg_catalog."default",
>>>>>     column34 character varying(256) COLLATE pg_catalog."default",
>>>>>     column35 character varying(256) COLLATE pg_catalog."default",
>>>>>     entrytype integer,
>>>>>     column37 bigint,
>>>>>     column38 bigint,
>>>>>     column39 bigint,
>>>>>     column40 bigint,
>>>>>     column41 bigint,
>>>>>     column42 bigint,
>>>>>     column43 bigint,
>>>>>     column44 bigint,
>>>>>     column45 bigint,
>>>>>     column46 bigint,
>>>>>     column47 character varying(128) COLLATE pg_catalog."default",
>>>>>     timestampcol timestamp without time zone,
>>>>>     column49 timestamp without time zone,
>>>>>     column50 timestamp without time zone,
>>>>>     column51 timestamp without time zone,
>>>>>     column52 timestamp without time zone,
>>>>>     archivestatus integer,
>>>>>     column54 integer,
>>>>>     column55 character varying(20) COLLATE pg_catalog."default",
>>>>>     CONSTRAINT pkey PRIMARY KEY (key)
>>>>>         USING INDEX TABLESPACE tablespace
>>>>> )
>>>>> TABLESPACE tablespace;
>>>>>
>>>>> ALTER TABLE schema.logtable
>>>>>     OWNER to user;
>>>>>
>>>>> CREATE INDEX idx_timestampcol
>>>>>     ON schema.logtable USING btree
>>>>>     (timestampcol ASC NULLS LAST)
>>>>>     TABLESPACE tablespace;
>>>>>
>>>>> CREATE INDEX idx_test2
>>>>>     ON schema.logtable USING btree
>>>>>     (entrytype ASC NULLS LAST)
>>>>>     TABLESPACE tablespace
>>>>>     WHERE archivestatus <= 1;
>>>>>
>>>>> CREATE INDEX idx_arcstatus
>>>>>     ON schema.logtable USING btree
>>>>>     (archivestatus ASC NULLS LAST)
>>>>>     TABLESPACE tablespace;
>>>>>
>>>>> CREATE INDEX idx_entrytype
>>>>>     ON schema.logtable USING btree
>>>>>     (entrytype ASC NULLS LAST)
>>>>>     TABLESPACE tablespace;
>>>>>
>>>>> The table contains 14.000.000 entries and has about 3.3 GB of data.
>>>>> No triggers; inserts are probably 5-20 K per day.
>>>>>
>>>>> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
>>>>>        relhassubclass, reloptions, pg_table_size(oid)
>>>>> FROM pg_class WHERE relname = 'logtable';
>>>>>
>>>>> relname  |relpages|reltuples|relallvisible|relkind|relnatts|relhassubclass|reloptions|pg_table_size|
>>>>> ---------|--------|---------|-------------|-------|--------|--------------|----------|-------------|
>>>>> logtable |  405988| 14091424|       405907|r      |      54|false         |NULL      |   3326803968|
>>>>>
>>>>> The slow running query:
>>>>>
>>>>> SELECT column1, ..., column54 FROM logtable
>>>>> WHERE ((entrytype = 4000 OR entrytype = 4001 OR entrytype = 4002) AND (archivestatus <= 1))
>>>>> ORDER BY timestampcol DESC;
>>>>>
>>>>> This query runs in about 45-60 seconds.
>>>>> The same query runs in about 289 ms in Oracle and 423 ms in SQL Server.
>>>>> Now I understand that actually loading all results would take a while
>>>>> (about 520 K rows or so).
>>>>> But that shouldn't be exactly what happens, right? There should be a
>>>>> result-set iterator which can retrieve all data but doesn't do so from
>>>>> the get-go.
>>>>>
>>>>> With the help of some people in the Slack and SO threads, I've found a
>>>>> configuration parameter which helps performance:
>>>>>
>>>>> set random_page_cost = 1;
>>>>>
>>>>> This improved performance from 45-60 s to 15-35 s (since we are using SSDs).
>>>>> Still not acceptable, but definitely an improvement.
>>>>> Some maybe relevant system parameters:
>>>>>
>>>>> effective_cache_size    4GB
>>>>> maintenance_work_mem    1GB
>>>>> shared_buffers          2GB
>>>>> work_mem                1GB
>>>>>
>>>>> Currently I'm accessing the data through DBeaver (JDBC -
>>>>> postgresql-42.2.5.jar) and our Java application (JDBC -
>>>>> postgresql-42.2.19.jar). Both use defaultRowFetchSize=5000 to not load
>>>>> everything into memory and to limit the results.
>>>>>
>>>>> The explain plan:
>>>>>
>>>>> EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE) ... (above query)
>>>>>
>>>>> Gather Merge  (cost=347142.71..397196.91 rows=429006 width=2558) (actual time=21210.019..22319.444 rows=515841 loops=1)
>>>>>   Output: column1, .. , column54
>>>>>   Workers Planned: 2
>>>>>   Workers Launched: 2
>>>>>   Buffers: shared hit=141487 read=153489
>>>>>   ->  Sort  (cost=346142.69..346678.95 rows=214503 width=2558) (actual time=21148.887..21297.428 rows=171947 loops=3)
>>>>>         Output: column1, .. , column54
>>>>>         Sort Key: logtable.timestampcol DESC
>>>>>         Sort Method: quicksort  Memory: 62180kB
>>>>>         Worker 0:  Sort Method: quicksort  Memory: 56969kB
>>>>>         Worker 1:  Sort Method: quicksort  Memory: 56837kB
>>>>>         Buffers: shared hit=141487 read=153489
>>>>>         Worker 0:  actual time=21129.973..21296.839 rows=166776 loops=1
>>>>>           Buffers: shared hit=45558 read=49514
>>>>>         Worker 1:  actual time=21114.439..21268.117 rows=165896 loops=1
>>>>>           Buffers: shared hit=45104 read=49506
>>>>>         ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5652.74..327147.77 rows=214503 width=2558) (actual time=1304.813..20637.462 rows=171947 loops=3)
>>>>>               Output: column1, .. , column54
>>>>>               Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>>>               Filter: (logtable.archivestatus <= 1)
>>>>>               Heap Blocks: exact=103962
>>>>>               Buffers: shared hit=141473 read=153489
>>>>>               Worker 0:  actual time=1280.472..20638.620 rows=166776 loops=1
>>>>>                 Buffers: shared hit=45551 read=49514
>>>>>               Worker 1:  actual time=1275.274..20626.219 rows=165896 loops=1
>>>>>                 Buffers: shared hit=45097 read=49506
>>>>>               ->  BitmapOr  (cost=5652.74..5652.74 rows=520443 width=0) (actual time=1179.438..1179.438 rows=0 loops=1)
>>>>>                     Buffers: shared hit=9 read=1323
>>>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..556.61 rows=54957 width=0) (actual time=161.939..161.940 rows=65970 loops=1)
>>>>>                           Index Cond: (logtable.entrytype = 4000)
>>>>>                           Buffers: shared hit=1 read=171
>>>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2243.22 rows=221705 width=0) (actual time=548.849..548.849 rows=224945 loops=1)
>>>>>                           Index Cond: (logtable.entrytype = 4001)
>>>>>                           Buffers: shared hit=4 read=576
>>>>>                     ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2466.80 rows=243782 width=0) (actual time=468.637..468.637 rows=224926 loops=1)
>>>>>                           Index Cond: (logtable.entrytype = 4002)
>>>>>                           Buffers: shared hit=4 read=576
>>>>> Settings: random_page_cost = '1', search_path = '"$user", schema, public', temp_buffers = '80MB', work_mem = '1GB'
>>>>> Planning Time: 0.578 ms
>>>>> Execution Time: 22617.351 ms
>>>>>
>>>>> As mentioned before, Oracle does this much faster.
>>>>>
>>>>> -------------------------------------------------------------------------------------------------------------------------
>>>>> | Id  | Operation                             | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
>>>>> -------------------------------------------------------------------------------------------------------------------------
>>>>> |   0 | SELECT STATEMENT                      |               |  6878 |  2491K|       |  2143   (1)| 00:00:01 |
>>>>> |   1 |  SORT ORDER BY                        |               |  6878 |  2491K|  3448K|  2143   (1)| 00:00:01 |
>>>>> |   2 |   INLIST ITERATOR                     |               |       |       |       |            |          |
>>>>> |*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| logtable      |  6878 |  2491K|       |  1597   (1)| 00:00:01 |
>>>>> |*  4 |     INDEX RANGE SCAN                  | idx_entrytype |  6878 |       |       |    23   (0)| 00:00:01 |
>>>>> -------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Is there much I can analyze, any information you might need to further analyze this?
>>>>>
>>
>> --
>> Thanks,
>> Vijay
>> Mumbai, India
>

--
Thanks,
Vijay
Mumbai, India