Performance of a Query

2018-01-09 Thread Kumar, Virendra
Hello Gurus,

I am struggling to tune a query which does a join on top of an aggregate for 
around 3 million rows. The plan and SQL are attached to the email.

Below are the system details:

PGSQL version - 10.1
OS - RHEL 3.10.0-693.5.2.el7.x86_64
Binary - Downloaded from postgres.org, compiled and installed.
Hardware - Virtual Machine with 8vCPU and 32GB of RAM, on XFS filesystem.


Please let me know if you need more information.


Regards,
Virendra



This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.

 Aggregate  (cost=4794108.63..4794108.64 rows=1 width=232) (actual time=54451.640..54451.641 rows=1 loops=1)
   Buffers: shared hit=3771629 read=46189, temp read=20532 written=20551
   ->  Merge Join  (cost=491054.79..2272373.76 rows=34782550 width=200) (actual time=3278.909..49768.897 rows=3704652 loops=1)
         Merge Cond: ((se.peril_id = pe.peril_id) AND (se.account_id = pe.account_id))
         Buffers: shared hit=3771629 read=46189, temp read=20532 written=20551
         ->  Finalize GroupAggregate  (cost=491054.35..953772.46 rows=407064 width=152) (actual time=3278.847..17842.707 rows=3704652 loops=1)
               Group Key: se.peril_id, se.portfolio_id, se.account_id
               Buffers: shared hit=66 read=46189, temp read=20532 written=20551
               ->  Gather Merge  (cost=491054.35..827582.62 rows=2442384 width=152) (actual time=3278.833..10355.885 rows=3731100 loops=1)
                     Workers Planned: 6
                     Workers Launched: 6
                     Buffers: shared hit=66 read=46189, temp read=20532 written=20551
                     ->  Partial GroupAggregate  (cost=490054.25..529743.04 rows=407064 width=152) (actual time=3264.819..4961.425 rows=533014 loops=7)
                           Group Key: se.peril_id, se.portfolio_id, se.account_id
                           Buffers: shared hit=676 read=306596, temp read=135840 written=135972
                           ->  Sort  (cost=490054.25..491750.35 rows=678441 width=148) (actual time=3264.796..3731.927 rows=581538 loops=7)
                                 Sort Key: se.peril_id, se.account_id
                                 Sort Method: external merge  Disk: 97240kB
                                 Buffers: shared hit=676 read=306596, temp read=135840 written=135972
                                 ->  Result  (cost=0.00..322308.92 rows=678441 width=148) (actual time=0.036..2205.533 rows=581538 loops=7)
                                       Buffers: shared hit=448 read=306596
                                       ->  Append  (cost=0.00..315524.51 rows=678441 width=148) (actual time=0.033..1523.998 rows=581538 loops=7)
                                             Buffers: shared hit=448 read=306596
                                             ->  Parallel Seq Scan on site_exposure_1192662 se  (cost=0.00..315524.51 rows=678441 width=148) (actual time=0.030..896.668 rows=581538 loops=7)
                                                   Filter: (portfolio_id = '-1192662'::integer)
                                                   Buffers: shared hit=448 read=306596
         ->  Materialize  (cost=0.44..712345.45 rows=3417895 width=80) (actual time=0.055..24504.782 rows=3704652 loops=1)
               Buffers: shared hit=3771563
               ->  GroupAggregate  (cost=0.44..669621.76 rows=3417895 width=80) (actual time=0.051..20359.291 rows=3704652 loops=1)
                     Group Key: pe.peril_id, pe.portfolio_id, pe.account_id
                     Buffers: shared hit=3771563
                     ->  Result  (cost=0.44..532014.98 rows=3761012 width=80) (actual time=0.040..13908.465 rows=3761146 loops=1)
                           Buffers: shared hit=3771563
                           ->  Merge Append  (cost=0.44..494404.86 rows=3761012 width=80) (actual time=0.038..10094.668 rows=3761146 loops=1)
                                 Sort Key: pe.peril_id, pe.account_id
                                 Buffers: shared hit=3771563
                                 ->  Index Scan using policy_exposure_1192662_portfolio_id_peril_id_account_id_idx on policy_exposure_1192662 pe  (cost=0.43..466197.26 rows=3761012 width=80) (actual time=0.036..6448.117 rows=3761146 loops=1)
                                       Index Cond: (portfolio_id = '-1192662'::integer)
                                       Buffers: shared hit=3771563


AggQuery.sql
Description: AggQuery.sql


RE: Performance of a Query

2018-01-09 Thread Kumar, Virendra
Thank you Scott!
I currently have work_mem set to 4MB, shared_buffers to 8GB, and hugepages on.
I gradually increased work_mem to 1GB but it did not help a bit. Am I missing 
something obvious?

Regards,
Virendra
-Original Message-
From: Scott Marlowe [mailto:scott.marl...@gmail.com]
Sent: Tuesday, January 09, 2018 5:08 PM
To: Kumar, Virendra
Cc: pgsql-performa...@postgresql.org
Subject: Re: Performance of a Query

On Tue, Jan 9, 2018 at 2:18 PM, Kumar, Virendra wrote:
> Hello Gurus,
>
> I am struggling to tune a query which is doing join on top of
> aggregate for around 3 million rows. The plan and SQL is attached to the 
> email.
>
> Below is system Details:
>
> PGSQL version – 10.1
>
> OS – RHEL 3.10.0-693.5.2.el7.x86_64
>
> Binary – Downloaded from postgres.org, compiled and installed.
>
> Hardware – Virtual Machine with 8vCPU and 32GB of RAM, on XFS filesystem.

I uploaded your query plan here: https://explain.depesz.com/s/14r6

The most expensive part is the merge join at the end.

Lines like this one: "Buffers: shared hit=676 read=306596, temp
read=135840 written=135972"

tell me that your sorts etc. are spilling to disk, so the first thing to try is 
upping work_mem a bit. Don't go crazy, as it can run your machine out of memory 
if you do, but doubling or tripling it and seeing the effect on query 
performance is a good place to start.

The good news is that most of your row estimates are about right, so the query 
planner is doing what it can to make the query fast, but I'm guessing if you 
get work_mem high enough it will switch from a merge join to a hash join or 
something more efficient for large numbers of rows.
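
For reference, this kind of experiment can be scoped to a single session so the
server-wide default is untouched (a sketch; the SELECT stands in for the
attached AggQuery.sql):

    -- Session-local change only; other connections keep the default.
    SET work_mem = '64MB';

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ...;  -- the aggregate/join query under test

    -- If "Sort Method: external merge  Disk: ..." in the plan becomes
    -- "Sort Method: quicksort  Memory: ...", the sort now fits in memory.

    RESET work_mem;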






RE: Performance of a Query

2018-01-09 Thread Kumar, Virendra
It did not seem to help.
See attachment.


Regards,
Virendra
-Original Message-
From: Scott Marlowe [mailto:scott.marl...@gmail.com]
Sent: Tuesday, January 09, 2018 6:00 PM
To: Kumar, Virendra
Cc: pgsql-performa...@postgresql.org
Subject: Re: Performance of a Query

On Tue, Jan 9, 2018 at 3:25 PM, Kumar, Virendra wrote:
> Thank you Scott!
> I currently have work_mem set to 4MB, shared_buffers to 8GB, and hugepages on.
> I gradually increased work_mem to 1GB but it did not help a bit. Am I 
> missing something obvious?
> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> Sent: Tuesday, January 09, 2018 5:08 PM
> To: Kumar, Virendra
> Cc: pgsql-performa...@postgresql.org
> Subject: Re: Performance of a Query

Try it with something reasonable like 64MB, then post your query plans to 
explain.depesz and then here, and let's compare. Note that some queries are just 
slow, and this one is handling a lot of data, so there's only so much to do if 
an index won't fix it.




ap_poc_db=# show work_mem;
 work_mem
--
 64MB
(1 row)

Time: 0.285 ms
ap_poc_db=# explain (analyze,buffers)
ap_poc_db-# SELECT SUM se_cov1val + se_cov2val) + se_cov3val) + se_cov4val))
ap_poc_db-#AS "10",
ap_poc_db-#SUM (se_site_limit)
ap_poc_db-#AS "11",
ap_poc_db-#SUM (se_cov1val)
ap_poc_db-#AS "12",
ap_poc_db-#SUM (se_cov2val)
ap_poc_db-#AS "13",
ap_poc_db-#SUM (se_cov3val)
ap_poc_db-#AS "14",
ap_poc_db-#SUM (se_cov4val)
ap_poc_db-#AS "15",
ap_poc_db-#SUM (se_site_deduct)
ap_poc_db-#AS "17",
ap_poc_db-#SUM (se_risk_count)
ap_poc_db-#AS "21",
ap_poc_db-#SUM (se_cov1limit)
ap_poc_db-#AS "143",
ap_poc_db-#SUM (se_cov1deduct)
ap_poc_db-#AS "144",
ap_poc_db-#SUM (se_cov2limit)
ap_poc_db-#AS "145",
ap_poc_db-#SUM (se_cov3limit)
ap_poc_db-#AS "147",
ap_poc_db-#SUM (se_cov3deduct)
ap_poc_db-#AS "148",
ap_poc_db-#SUM (se_cov4limit)
ap_poc_db-#AS "149",
ap_poc_db-#SUM (se_cov4deduct)
ap_poc_db-#AS "150",
ap_poc_db-#SUM (se_site_bl_deduct)
ap_poc_db-#AS "212",
ap_poc_db-#SUM (se_agg_limit)
ap_poc_db-#AS "213",
ap_poc_db-#SUM (se_site_bl_limit)
ap_poc_db-#AS "211",
ap_poc_db-#SUM (pe_premium)
ap_poc_db-#AS "93",
ap_poc_db-#SUM (pe_policy_deduct)
ap_poc_db-#AS "92",
ap_poc_db-#SUM (pe_undercover)
ap_poc_db-#AS "127",
ap_poc_db-#SUM (pe_prorata)
ap_poc_db-#AS "126",
ap_poc_db-#SUM (pe_policy_bl_deduct)
ap_poc_db-#AS "139",
ap_poc_db-#SUM (pe_policy_bl_grosslimit)
ap_poc_db-#AS "142",
ap_poc_db-#SUM (pe_policy_limit)
ap_poc_db-#AS "128",
ap_poc_db-#SUM (pe_agg_deduct)
ap_poc_db-#AS "155"
ap_poc_db-#   FROM (SELECT SUM (se.site_limit) AS se_site_limit,
ap_poc_db(#SUM (se.cov1val)AS se_cov1val,
ap_poc_db(#SUM (se.cov2val)AS se_cov2val,
ap_poc_db(#SUM (se.cov3val)AS se_cov3val,
ap_poc_db(#SUM (se.cov4val)AS se_cov4val,
ap_poc_db(#   SUM (se.site_deduct)AS se_site_deduct,
ap_poc_db(#SUM (se.risk_count) AS se_risk_count,
ap_poc_db(#SUM (se.cov1limit)  AS se_cov1limit,
ap_poc_db(#SUM (se.cov1deduct) AS se_cov1deduct,
ap_poc_db(#SUM (se.cov2limit)  AS se_cov2limit,
ap_poc_db(#SUM (se.cov3limit)  AS se_cov3limit,
ap_poc_db(#SUM (se.cov3deduct) AS se_cov3deduct,
ap_poc_db(#SUM (se.cov4limit)  AS se_cov4limit,
ap_poc_db(#SUM (se.cov4deduct) AS se_cov4deduct,
ap_poc_db(#SUM (se.site_bl_deduct) AS se_site_bl_deduct,
ap_poc_db(#SUM (se.agg_limit)  AS se_agg_limit,
ap_poc_db(#SUM (se.site_bl_limit)  AS se_site_bl_limit,
ap_poc_db(#se.peril_id AS se_

Nested Loops

2018-01-30 Thread Kumar, Virendra
Can somebody help me avoid nested loops in the query below?
--
ap_poc_db=# explain (analyze,buffers)
ap_poc_db-# select site_id, account_id FROM ap.site_exposure se
ap_poc_db-# WHERE se.portfolio_id=-1191836
ap_poc_db-# AND EXISTS (select 1 from ap.catevent_flood_sc_split sp 
where sp.migration_sourcename= 'KatRisk_SC_Flood_2015_v9' AND 
ST_Intersects(se.shape, sp.shape))
ap_poc_db-# group by site_id, account_id;

  QUERY PLAN
--
 Group  (cost=23479854.04..23479880.06 rows=206 width=16) (actual time=1387.825..1389.134 rows=1532 loops=1)
   Group Key: se.site_id, se.account_id
   Buffers: shared hit=172041
   ->  Gather Merge  (cost=23479854.04..23479879.04 rows=205 width=16) (actual time=1387.823..1388.676 rows=1532 loops=1)
         Workers Planned: 5
         Workers Launched: 5
         Buffers: shared hit=172041
         ->  Group  (cost=23478853.96..23478854.27 rows=41 width=16) (actual time=1346.044..1346.176 rows=255 loops=6)
               Group Key: se.site_id, se.account_id
               Buffers: shared hit=864280
               ->  Sort  (cost=23478853.96..23478854.07 rows=41 width=16) (actual time=1346.041..1346.079 rows=255 loops=6)
                     Sort Key: se.site_id, se.account_id
                     Sort Method: quicksort  Memory: 37kB
                     Buffers: shared hit=864280
                     ->  Nested Loop Semi Join  (cost=4.53..23478852.87 rows=41 width=16) (actual time=34.772..1345.489 rows=255 loops=6)
                           Buffers: shared hit=864235
                           ->  Append  (cost=0.00..156424.56 rows=123645 width=48) (actual time=1.011..204.748 rows=102990 loops=6)
                                 Buffers: shared hit=154879
                                 ->  Parallel Seq Scan on site_exposure_1191836 se  (cost=0.00..156424.56 rows=123645 width=48) (actual time=1.004..187.702 rows=102990 loops=6)
                                       Filter: (portfolio_id = '-1191836'::integer)
                                       Buffers: shared hit=154879
                           ->  Bitmap Heap Scan on catevent_flood_sc_split sp  (cost=4.53..188.54 rows=15 width=492) (actual time=0.007..0.007 rows=0 loops=617937)
                                 Recheck Cond: (se.shape && shape)
                                 Filter: ((migration_sourcename = 'KatRisk_SC_Flood_2015_v9'::bpchar) AND _st_intersects(se.shape, shape))
                                 Rows Removed by Filter: 0
                                 Heap Blocks: exact=1060
                                 Buffers: shared hit=709356
                                 ->  Bitmap Index Scan on catevent_flood_sc_split_shape_mig_src_gix  (cost=0.00..4.52 rows=45 width=0) (actual time=0.005..0.005 rows=0 loops=617937)
                                       Index Cond: (se.shape && shape)
                                       Buffers: shared hit=691115
 Planning time: 116.141 ms
 Execution time: 1391.785 ms
(32 rows)


ap_poc_db=#

Thank you in advance!
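
One rewrite sometimes worth experimenting with is sketched below; it is not a
guaranteed win, since a GiST-indexed nested loop is often the expected plan
shape for spatial joins. In PostgreSQL 10 a CTE is an optimization fence, so
the split table is filtered on migration_sourcename only once up front:

    WITH flood AS (          -- computed once (CTEs are fences in PG 10)
      SELECT shape
      FROM ap.catevent_flood_sc_split
      WHERE migration_sourcename = 'KatRisk_SC_Flood_2015_v9'
    )
    SELECT se.site_id, se.account_id
    FROM ap.site_exposure se
    JOIN flood sp ON ST_Intersects(se.shape, sp.shape)
    WHERE se.portfolio_id = -1191836
    GROUP BY se.site_id, se.account_id;   -- GROUP BY deduplicates the join

Note the plain JOIN can produce duplicate (site_id, account_id) pairs where
EXISTS would not, but the GROUP BY collapses them again.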


Regards,
Virendra






RE: Why the index is not used ?

2018-10-07 Thread Kumar, Virendra
You can consider encryption outside the DB, which is less of a performance 
worry, and data at rest will still be encrypted.

Regards,
Virendra
-Original Message-
From: ROS Didier [mailto:didier@edf.fr]
Sent: Sunday, October 07, 2018 2:33 PM
To: fola...@peoplecall.com
Cc: pavel.steh...@gmail.com; pgsql-...@lists.postgresql.org; 
pgsql-performance@lists.postgresql.org; pgsql-gene...@lists.postgresql.org
Subject: RE: Why the index is not used ?

Hi Francisco

Thank you for your remark.
You're right, but it's the only procedure I found that allows searches on 
encrypted fields with good response times (using an index)!

Regarding access to the file system, our servers are in protected network 
areas; few people can connect to them.

It's not the best solution, but we need both data encryption and good 
performance. I do not know how to achieve that other than with the specified 
procedure. If anyone has a proposal to put this in place, I'm interested.

Thanks in advance

Best Regards
Didier ROS

-Original Message-
From: fola...@peoplecall.com [mailto:fola...@peoplecall.com]
Sent: Sunday, October 7, 2018 5:58 PM
To: ROS Didier 
Cc: pavel.steh...@gmail.com; pgsql-...@lists.postgresql.org; 
pgsql-performance@lists.postgresql.org; pgsql-gene...@lists.postgresql.org
Subject: Re: Why the index is not used ?

ROS:

On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier  wrote:

> -INSERT INTO cartedecredit(username,cc)  SELECT 'individu ' || x.id, 
> pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, 
> cipher-algo=aes256') FROM generate_series(1,10) AS x(id);
> -CREATE INDEX idx_cartedecredit_cc02 ON 
> cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, 
> cipher-algo=aes256'));

If my French is not too rusty, you are encrypting a credit card and then 
storing an UNENCRYPTED copy in the index. So getting it from the server is 
trivial for anyone with filesystem access.

Francisco Olarte.
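
One commonly suggested alternative is sketched below, under the assumption that
only equality lookups are needed: store and index a keyed HMAC of the plaintext
instead of a decrypting functional index, so the index never contains the clear
value. The column name and 'index-key' are illustrative, and the HMAC key must
itself be protected:

    -- Requires the pgcrypto extension.
    ALTER TABLE cartedecredit ADD COLUMN cc_hmac bytea;

    UPDATE cartedecredit
       SET cc_hmac = hmac(pgp_sym_decrypt(cc, 'motdepasse',
                                          'compress-algo=2, cipher-algo=aes256'),
                          'index-key', 'sha256');

    CREATE INDEX idx_cartedecredit_cc_hmac ON cartedecredit (cc_hmac);

    -- Equality search without any plaintext in the index:
    SELECT username
      FROM cartedecredit
     WHERE cc_hmac = hmac('test value 5', 'index-key', 'sha256');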





This message and any attachments (the 'Message') are intended solely for the 
addressees. The information contained in this Message is confidential. Any use 
of information contained in this Message not in accord with its purpose, any 
dissemination or disclosure, either whole or partial, is prohibited except 
formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any 
part of it. If you have received this message in error, please delete it and 
all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely secure, error or 
virus-free.


