Performance of a Query
Hello Gurus,

I am struggling to tune a query which does a join on top of aggregates over around 3 million rows. The plan and SQL are attached to the email. Below are the system details:

PGSQL version - 10.1
OS - RHEL 3.10.0-693.5.2.el7.x86_64
Binary - Downloaded from postgres.org, compiled and installed.
Hardware - Virtual machine with 8 vCPU and 32GB of RAM, on an XFS filesystem.

Please let me know if you need more information.

Regards,
Virendra

Aggregate  (cost=4794108.63..4794108.64 rows=1 width=232) (actual time=54451.640..54451.641 rows=1 loops=1)
  Buffers: shared hit=3771629 read=46189, temp read=20532 written=20551
  ->  Merge Join  (cost=491054.79..2272373.76 rows=34782550 width=200) (actual time=3278.909..49768.897 rows=3704652 loops=1)
        Merge Cond: ((se.peril_id = pe.peril_id) AND (se.account_id = pe.account_id))
        Buffers: shared hit=3771629 read=46189, temp read=20532 written=20551
        ->  Finalize GroupAggregate  (cost=491054.35..953772.46 rows=407064 width=152) (actual time=3278.847..17842.707 rows=3704652 loops=1)
              Group Key: se.peril_id, se.portfolio_id, se.account_id
              Buffers: shared hit=66 read=46189, temp read=20532 written=20551
              ->  Gather Merge  (cost=491054.35..827582.62 rows=2442384 width=152) (actual time=3278.833..10355.885 rows=3731100 loops=1)
                    Workers Planned: 6
                    Workers Launched: 6
                    Buffers: shared hit=66 read=46189, temp read=20532 written=20551
                    ->  Partial GroupAggregate  (cost=490054.25..529743.04 rows=407064 width=152) (actual time=3264.819..4961.425 rows=533014 loops=7)
                          Group Key: se.peril_id, se.portfolio_id, se.account_id
                          Buffers: shared hit=676 read=306596, temp read=135840 written=135972
                          ->  Sort  (cost=490054.25..491750.35 rows=678441 width=148) (actual time=3264.796..3731.927 rows=581538 loops=7)
                                Sort Key: se.peril_id, se.account_id
                                Sort Method: external merge  Disk: 97240kB
                                Buffers: shared hit=676 read=306596, temp read=135840 written=135972
                                ->  Result  (cost=0.00..322308.92 rows=678441 width=148) (actual time=0.036..2205.533 rows=581538 loops=7)
                                      Buffers: shared hit=448 read=306596
                                      ->  Append  (cost=0.00..315524.51 rows=678441 width=148) (actual time=0.033..1523.998 rows=581538 loops=7)
                                            Buffers: shared hit=448 read=306596
                                            ->  Parallel Seq Scan on site_exposure_1192662 se  (cost=0.00..315524.51 rows=678441 width=148) (actual time=0.030..896.668 rows=581538 loops=7)
                                                  Filter: (portfolio_id = '-1192662'::integer)
                                                  Buffers: shared hit=448 read=306596
        ->  Materialize  (cost=0.44..712345.45 rows=3417895 width=80) (actual time=0.055..24504.782 rows=3704652 loops=1)
              Buffers: shared hit=3771563
              ->  GroupAggregate  (cost=0.44..669621.76 rows=3417895 width=80) (actual time=0.051..20359.291 rows=3704652 loops=1)
                    Group Key: pe.peril_id, pe.portfolio_id, pe.account_id
                    Buffers: shared hit=3771563
                    ->  Result  (cost=0.44..532014.98 rows=3761012 width=80) (actual time=0.040..13908.465 rows=3761146 loops=1)
                          Buffers: shared hit=3771563
                          ->  Merge Append  (cost=0.44..494404.86 rows=3761012 width=80) (actual time=0.038..10094.668 rows=3761146 loops=1)
                                Sort Key: pe.peril_id, pe.account_id
                                Buffers: shared hit=3771563
                                ->  Index Scan using policy_exposure_1192662_portfolio_id_peril_id_account_id_idx on policy_exposure_1192662 pe  (cost=0.43..466197.26 rows=3761012 width=80) (actual time=0.036..6448.117 rows=3761146 loops=1)
                                      Index Cond: (portfolio_id = '-1192662'::integer)
                                      Buffers: shared hit=3771563
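One detail worth noting in the plan: the policy_exposure side of the merge join arrives presorted through policy_exposure_1192662_portfolio_id_peril_id_account_id_idx, while the site_exposure side needs an external-merge sort (97240kB on disk) in every worker. A matching composite index on the site_exposure partition might let the planner feed the merge join presorted input and skip that sort. A minimal sketch; the index name is hypothetical and the column choice is inferred from the Group Key and Sort Key above, not from the actual schema:

-- hypothetical index; mirrors the existing policy_exposure index
CREATE INDEX site_exposure_1192662_portfolio_id_peril_id_account_id_idx
    ON site_exposure_1192662 (portfolio_id, peril_id, account_id);

With portfolio_id fixed by the filter, rows read through such an index come out ordered by (peril_id, account_id), which is exactly the Sort Key the external merge is producing.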
RE: Performance of a Query
Thank you Scott!
I have work_mem currently set to 4MB, shared_buffers to 8GB, and huge pages on. I gradually increased work_mem to 1GB but it did not help a bit. Am I missing something obvious?

Regards,
Virendra

-----Original Message-----
From: Scott Marlowe [mailto:scott.marl...@gmail.com]
Sent: Tuesday, January 09, 2018 5:08 PM
To: Kumar, Virendra
Cc: pgsql-performa...@postgresql.org
Subject: Re: Performance of a Query

On Tue, Jan 9, 2018 at 2:18 PM, Kumar, Virendra wrote:
> Hello Gurus,
>
> I am struggling to tune a query which does a join on top of aggregates
> over around 3 million rows. The plan and SQL are attached to the email.
>
> Below are the system details:
>
> PGSQL version - 10.1
> OS - RHEL 3.10.0-693.5.2.el7.x86_64
> Binary - Downloaded from postgres.org, compiled and installed.
> Hardware - Virtual machine with 8 vCPU and 32GB of RAM, on an XFS filesystem.

I uploaded your query plan here: https://explain.depesz.com/s/14r6

The most expensive part is the merge join at the end. Lines like this one:

"Buffers: shared hit=676 read=306596, temp read=135840 written=135972"

tell me that your sorts etc. are spilling to disk, so the first thing to try is upping work_mem a bit. Don't go crazy, as it can run your machine out of memory if you do, but doubling or tripling it and seeing the effect on query performance is a good place to start.

The good news is that most of your row estimates are about right, so the query planner is doing what it can to make the query fast, but I'm guessing that if you get work_mem high enough it will switch from a merge join to a hash join or something more efficient for large numbers of rows.
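For reference, a work_mem experiment like this can be scoped to a single session, so the server-wide setting stays untouched while testing. A minimal sketch (the 64MB figure is just an example value from this thread):

-- session-local only; reverts when the session ends
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- rerun the query under test here
RESET work_mem;

If the "Sort Method: external merge  Disk: ..." lines change to "quicksort  Memory: ...", the sorts fit in memory at that setting.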
RE: Performance of a Query
It did not seem to help. See attachment.

Regards,
Virendra

-----Original Message-----
From: Scott Marlowe [mailto:scott.marl...@gmail.com]
Sent: Tuesday, January 09, 2018 6:00 PM
To: Kumar, Virendra
Cc: pgsql-performa...@postgresql.org
Subject: Re: Performance of a Query

On Tue, Jan 9, 2018 at 3:25 PM, Kumar, Virendra wrote:
> Thank you Scott!
> I have work_mem currently set to 4MB, shared_buffers to 8GB, and huge pages on.
> I gradually increased work_mem to 1GB but it did not help a bit. Am I
> missing something obvious?
>
> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> Sent: Tuesday, January 09, 2018 5:08 PM
> To: Kumar, Virendra
> Cc: pgsql-performa...@postgresql.org
> Subject: Re: Performance of a Query

Try it with something reasonable like 64MB and then post your query plans to explain.depesz and then here and let's compare. Note that some queries are just slow, and this one is handling a lot of data, so there's only so much to do if an index won't fix it.

ap_poc_db=# show work_mem;
 work_mem
----------
 64MB
(1 row)

Time: 0.285 ms
ap_poc_db=# explain (analyze,buffers)
SELECT SUM ((((se_cov1val + se_cov2val) + se_cov3val) + se_cov4val)) AS "10",
       SUM (se_site_limit)            AS "11",
       SUM (se_cov1val)               AS "12",
       SUM (se_cov2val)               AS "13",
       SUM (se_cov3val)               AS "14",
       SUM (se_cov4val)               AS "15",
       SUM (se_site_deduct)           AS "17",
       SUM (se_risk_count)            AS "21",
       SUM (se_cov1limit)             AS "143",
       SUM (se_cov1deduct)            AS "144",
       SUM (se_cov2limit)             AS "145",
       SUM (se_cov3limit)             AS "147",
       SUM (se_cov3deduct)            AS "148",
       SUM (se_cov4limit)             AS "149",
       SUM (se_cov4deduct)            AS "150",
       SUM (se_site_bl_deduct)        AS "212",
       SUM (se_agg_limit)             AS "213",
       SUM (se_site_bl_limit)         AS "211",
       SUM (pe_premium)               AS "93",
       SUM (pe_policy_deduct)         AS "92",
       SUM (pe_undercover)            AS "127",
       SUM (pe_prorata)               AS "126",
       SUM (pe_policy_bl_deduct)      AS "139",
       SUM (pe_policy_bl_grosslimit)  AS "142",
       SUM (pe_policy_limit)          AS "128",
       SUM (pe_agg_deduct)            AS "155"
FROM (SELECT SUM (se.site_limit)     AS se_site_limit,
             SUM (se.cov1val)        AS se_cov1val,
             SUM (se.cov2val)        AS se_cov2val,
             SUM (se.cov3val)        AS se_cov3val,
             SUM (se.cov4val)        AS se_cov4val,
             SUM (se.site_deduct)    AS se_site_deduct,
             SUM (se.risk_count)     AS se_risk_count,
             SUM (se.cov1limit)      AS se_cov1limit,
             SUM (se.cov1deduct)     AS se_cov1deduct,
             SUM (se.cov2limit)      AS se_cov2limit,
             SUM (se.cov3limit)      AS se_cov3limit,
             SUM (se.cov3deduct)     AS se_cov3deduct,
             SUM (se.cov4limit)      AS se_cov4limit,
             SUM (se.cov4deduct)     AS se_cov4deduct,
             SUM (se.site_bl_deduct) AS se_site_bl_deduct,
             SUM (se.agg_limit)      AS se_agg_limit,
             SUM (se.site_bl_limit)  AS se_site_bl_limit,
             se.peril_id             AS se_
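The transcript breaks off above, but the shape of the query is visible: both tables are aggregated by group key and then joined. When raising work_mem does not change such a plan, a hedged alternative is to materialize each side's aggregate into an analyzed temporary table first; with accurate statistics on the small pre-aggregated sets, the planner will often choose a hash join for the final combination. A sketch only; the temp-table names are made up, the ap schema is assumed from a later post in this archive, and only a couple of the SUM columns are shown:

-- sketch: pre-aggregate one side into a temp table; se_agg is a hypothetical name
CREATE TEMP TABLE se_agg AS
SELECT se.peril_id, se.portfolio_id, se.account_id,
       SUM(se.cov1val) AS se_cov1val  -- ...remaining SUM columns as in the query above
  FROM ap.site_exposure se
 WHERE se.portfolio_id = -1192662
 GROUP BY se.peril_id, se.portfolio_id, se.account_id;
ANALYZE se_agg;
-- build pe_agg from ap.policy_exposure the same way, then join se_agg to pe_agg
-- on (peril_id, account_id) and take the outer SUMs over the join result.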
Nested Loops
Can somebody help me avoid the nested loop in the query below?

ap_poc_db=# explain (analyze,buffers)
select site_id, account_id
  FROM ap.site_exposure se
 WHERE se.portfolio_id = -1191836
   AND EXISTS (select 1 from ap.catevent_flood_sc_split sp
                where sp.migration_sourcename = 'KatRisk_SC_Flood_2015_v9'
                  AND ST_Intersects(se.shape, sp.shape))
 group by site_id, account_id;

                              QUERY PLAN
------------------------------------------------------------------------
Group  (cost=23479854.04..23479880.06 rows=206 width=16) (actual time=1387.825..1389.134 rows=1532 loops=1)
  Group Key: se.site_id, se.account_id
  Buffers: shared hit=172041
  ->  Gather Merge  (cost=23479854.04..23479879.04 rows=205 width=16) (actual time=1387.823..1388.676 rows=1532 loops=1)
        Workers Planned: 5
        Workers Launched: 5
        Buffers: shared hit=172041
        ->  Group  (cost=23478853.96..23478854.27 rows=41 width=16) (actual time=1346.044..1346.176 rows=255 loops=6)
              Group Key: se.site_id, se.account_id
              Buffers: shared hit=864280
              ->  Sort  (cost=23478853.96..23478854.07 rows=41 width=16) (actual time=1346.041..1346.079 rows=255 loops=6)
                    Sort Key: se.site_id, se.account_id
                    Sort Method: quicksort  Memory: 37kB
                    Buffers: shared hit=864280
                    ->  Nested Loop Semi Join  (cost=4.53..23478852.87 rows=41 width=16) (actual time=34.772..1345.489 rows=255 loops=6)
                          Buffers: shared hit=864235
                          ->  Append  (cost=0.00..156424.56 rows=123645 width=48) (actual time=1.011..204.748 rows=102990 loops=6)
                                Buffers: shared hit=154879
                                ->  Parallel Seq Scan on site_exposure_1191836 se  (cost=0.00..156424.56 rows=123645 width=48) (actual time=1.004..187.702 rows=102990 loops=6)
                                      Filter: (portfolio_id = '-1191836'::integer)
                                      Buffers: shared hit=154879
                          ->  Bitmap Heap Scan on catevent_flood_sc_split sp  (cost=4.53..188.54 rows=15 width=492) (actual time=0.007..0.007 rows=0 loops=617937)
                                Recheck Cond: (se.shape && shape)
                                Filter: ((migration_sourcename = 'KatRisk_SC_Flood_2015_v9'::bpchar) AND _st_intersects(se.shape, shape))
                                Rows Removed by Filter: 0
                                Heap Blocks: exact=1060
                                Buffers: shared hit=709356
                                ->  Bitmap Index Scan on catevent_flood_sc_split_shape_mig_src_gix  (cost=0.00..4.52 rows=45 width=0) (actual time=0.005..0.005 rows=0 loops=617937)
                                      Index Cond: (se.shape && shape)
                                      Buffers: shared hit=691115
Planning time: 116.141 ms
Execution time: 1391.785 ms
(32 rows)

ap_poc_db=#

Thank you in advance!

Regards,
Virendra
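One hedged way to sidestep the row-by-row semi join is to union the flood shapes for that migration source into a single geometry once, then test each site against it. This assumes the unioned geometry stays reasonably small and that a GiST index exists on ap.site_exposure (shape) for the spatial test to use; neither is confirmed by this thread. A sketch:

-- sketch; assumes a GiST index on ap.site_exposure (shape)
WITH flood AS (
  SELECT ST_Union(shape) AS shape
    FROM ap.catevent_flood_sc_split
   WHERE migration_sourcename = 'KatRisk_SC_Flood_2015_v9'
)
SELECT se.site_id, se.account_id
  FROM ap.site_exposure se, flood
 WHERE se.portfolio_id = -1191836
   AND ST_Intersects(se.shape, flood.shape)
 GROUP BY se.site_id, se.account_id;

If the union turns out to be a very complex multipolygon, each ST_Intersects call gets expensive; running the unioned geometry through ST_Subdivide and testing against the pieces is the usual counter-measure.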
RE: Why the index is not used ?
You can consider encryption outside the database, which is less of a performance worry, and the data at rest will still be encrypted.

Regards,
Virendra

-----Original Message-----
From: ROS Didier [mailto:didier@edf.fr]
Sent: Sunday, October 07, 2018 2:33 PM
To: fola...@peoplecall.com
Cc: pavel.steh...@gmail.com; pgsql-...@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-gene...@lists.postgresql.org
Subject: RE: Why the index is not used ?

Hi Francisco

Thank you for your remark. You're right, but it's the only procedure I have found that makes searches on encrypted fields with good response times (using an index)!

Regarding access to the file system: our servers are in protected network areas, and few people can connect to them. It's not the best solution, but we need data encryption and good performance too. I do not know how to do it other than with the procedure described. If anyone has a proposal to put this in place, I'm interested.

Thanks in advance
Best Regards
Didier ROS

-----Original Message-----
From: fola...@peoplecall.com [mailto:fola...@peoplecall.com]
Sent: Sunday, October 7, 2018 17:58
To: ROS Didier
Cc: pavel.steh...@gmail.com; pgsql-...@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-gene...@lists.postgresql.org
Subject: Re: Why the index is not used ?

ROS:

On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier wrote:
> -INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id,
> pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2,
> cipher-algo=aes256') FROM generate_series(1,10) AS x(id);
> -CREATE INDEX idx_cartedecredit_cc02 ON
> cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2,
> cipher-algo=aes256'));

If my French is not too rusty, you are encrypting a credit card and then storing an UNENCRYPTED copy in the index. So getting it from the server is trivial for anyone with filesystem access.

Francisco Olarte.
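On the indexed-search-over-encrypted-data point, a common compromise is to index a keyed hash (HMAC) of the plaintext instead of the plaintext itself, so equality searches can use the index while the index stores nothing readable. A sketch only, not from this thread: the column name, key, and probe value below are made up, and the HMAC key needs the same protection as the encryption passphrase:

-- requires the pgcrypto extension; cc_hmac and both keys are hypothetical
ALTER TABLE cartedecredit ADD COLUMN cc_hmac bytea;
UPDATE cartedecredit
   SET cc_hmac = hmac(pgp_sym_decrypt(cc, 'motdepasse'), 'separate-index-key', 'sha256');
CREATE INDEX idx_cartedecredit_cc_hmac ON cartedecredit (cc_hmac);

-- equality lookup: hash the probe value, never decrypting the column
SELECT username
  FROM cartedecredit
 WHERE cc_hmac = hmac('test value 3', 'separate-index-key', 'sha256');

This serves only equality lookups; range or LIKE searches over the plaintext cannot be answered from such an index.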