[PERFORM] FW: performance issue with a 2.5gb joinded table
Hi Listers, we migrated an oracle datawarehouse to postgresql 9.1 ( ppas 9.1.7.12 ) and are facing massive issues with response times in postgres when compared to the oracle system. Both database run on the same hardware and storage ( rhel5.8 64bit ). Oracle memory parameters are: SGA=1gb PGA=200mb Postgres currently runs with 15gb of shared buffers ( that’s because the big table in question is around 2.5gb in size and one suggestion was to increase that much so postgresql will cache the complete table. and this is the case now ). explain (analyze,buffers) SELECT test1.slsales_batch , test1.slsales_checksum , test1.slsales_reg_id , test1.slsales_prod_id , test1.slsales_date_id , test1.slsales_pos_id , test1.slsales_amt_sales_gross , test1.slsales_amt_sales_discount , test1.slsales_units_sales_gross , test1.slsales_amt_returns , test1.slsales_amt_returns_discount , test1.slsales_units_returns , (test1.slsales_amt_sales_gross - test1.slsales_amt_returns) * mgmt_fact_winratio.winratio_ratio AS slsales_amt_est_winnings , mgmt_fact_winratio.winratio_ratio AS slsales_ratio FROM mgmtt_own.test1 LEFT JOIN mgmtt_own.mgmt_fact_winratio ON mgmt_fact_winratio.winratio_date_id = test1.slsales_date_id Oracle’s explain plan looks like this: | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | | 0 | SELECT STATEMENT ||25M| 1527M| | 115K (3)| 00:23:10 | |* 1 | HASH JOIN RIGHT OUTER||25M| 1527M| 4376K| 115K (3)| 00:23:10 | | 2 | TABLE ACCESS FULL | MGMT_FACT_WINRATIO | 159K| 2498K| | 167 (5)| 00:00:03 | | 3 | TABLE ACCESS FULL | TEST1 |25M| 1139M| | 43435 (5)| 00:08:42 | Predicate Information (identified by operation id): --- 1 - access("MGMT_FACT_WINRATIO"."WINRATIO_PROD_ID"(+)="TEST1"."SLSALES_PROD_ID" AND "MGMT_FACT_WINRATIO"."WINRATIO_DATE_ID"(+)="TEST1"."SLSALES_DATE_ID") Somehow oracle seems to know that a right join is the better way to go. Postgres’s explain plan: QUERY PLAN Hash Left Join (cost=3948.52..13646089.21 rows=25262160 width=61) (actual time=260.642..81240.692 rows=25262549 loops=1) Hash Cond: ((test1.slsales_date_id = mgmt_fact_winratio.winratio_date_id) AND (test1.slsales_prod_id = mgmt_fact_winratio.winratio_prod_id)) Buffers: shared hit=306590 -> Seq Scan on test1 (cost=0.00..254148.75 rows=25262160 width=56) (actual time=0.009..15674.535 rows=25262161 loops=1) Buffers: shared hit=305430 -> Hash (cost=1582.89..1582.89 rows=157709 width=19) (actual time=260.564..260.564 rows=157709 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 7855kB Buffers: shared hit=1160 -> Seq Scan on mgmt_fact_winratio (cost=0.00..1582.89 rows=157709 width=19) (actual time=0.008..114.406 rows=157709 loops=1) Buffers: shared hit=1160 Total runtime: 95762.025 ms (11 rows) Tried to modify the statement according to oracle’s plan, but this did not help: explain (analyze,buffers) SELECT test1.slsales_batch , test1.slsales_checksum , test1.slsales_reg_id , test1.slsales_prod_id , test1.slsales_date_id , test1.slsales_pos_id , test1.slsales_amt_sales_gross , test1.slsales_amt_sales_discount , test1.slsales_units_sales_gross , test1.slsales_amt_returns , test1.slsales_amt_returns_discount , test1.slsales_units_returns , (test1.slsales_amt_sales_gross - test1.slsales_amt_returns) * mgmt_fact_winratio.winratio_ratio AS slsales_amt_est_winnings , mgmt_fact_winratio.winratio_ratio AS slsales_ratio FROM mgmtt_own.test1 , mgmtt_own.mgmt_fact_winratio WHERE mgmt_fact_winratio.winratio_prod_id(+) = test1.slsales_prod_id AND mgmt_fact_winratio.winratio_date_id(+) = test1.slsales_date_id ; QUERY PLAN Hash Left Join (cost=3948.52..13646089.21 rows=25262160 width=61) (actual time=276.605..80629.400 rows=25262549 loops=1) Hash Cond: ((test1.slsales_prod_id = mgmt_fact_winratio.winratio_prod_id) AND (test1.slsales_date_id = mg
Re: [PERFORM] FW: performance issue with a 2.5gb joinded table
-Original Message- From: Heikki Linnakangas [mailto:hlinnakan...@vmware.com] Sent: Donnerstag, 3. Januar 2013 18:02 To: Daniel Westermann Cc: 'pgsql-performance@postgresql.org' Subject: Re: [PERFORM] FW: performance issue with a 2.5gb joinded table On 03.01.2013 15:30, Daniel Westermann wrote: > What additionally makes me wonder is, that the same table in oracle is taking > much less space than in postgresql: > > SQL> select sum(bytes) from dba_extents where segment_name = > SQL> 'TEST1'; > SUM(BYTES) > -- > 1610612736 > > select pg_relation_size('mgmtt_own.test1'); > pg_relation_size > -- > 2502082560 > (1 row) > > (sysdba@[local]:) [bi_dwht]> \d+ mgmtt_own.test1 > Table "mgmtt_own.test1" > Column| Type | Modifiers | Storage | > Description > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > --+---+---+-+- > slsales_batch| numeric(8,0) | | main| > slsales_checksum | numeric(8,0) | | main| > slsales_reg_id | numeric(8,0) | | main| > slsales_prod_id | numeric(8,0) | | main| > slsales_date_id | numeric(8,0) | | main| > slsales_pos_id | numeric(8,0) | | main| > slsales_amt_sales_gross | numeric(16,6) | | main| > slsales_amt_sales_discount | numeric(16,6) | | main| > slsales_units_sales_gross| numeric(8,0) | | main| > slsales_amt_returns | numeric(16,6) | | main| > slsales_amt_returns_discount | numeric(16,6) | | main| > slsales_units_returns| numeric(8,0) | | main| > slsales_amt_est_winnings | numeric(16,6) | | main| > Indexes: > "itest1" btree (slsales_date_id) CLUSTER, tablespace "mgmtt_idx" > "itest2" btree (slsales_prod_id), tablespace "mgmtt_idx" > Has OIDs: no > Tablespace: "mgmtt_dat" One difference is that numerics are stored more tightly packed on Oracle. Which is particularly good for Oracle as they don't have other numeric data types than number. On PostgreSQL, you'll want to use int4 for ID-fields, where possible. An int4 always takes up 4 bytes, while a numeric holding an integer value in the same range is typically 5-9 bytes. - Heikki Thanks for poiting that out, Heikki. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] FW: performance issue with a 2.5gb joinded table
-Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Freitag, 4. Januar 2013 21:41 To: Heikki Linnakangas Cc: Daniel Westermann; 'pgsql-performance@postgresql.org' Subject: Re: [PERFORM] FW: performance issue with a 2.5gb joinded table Heikki Linnakangas writes: > One difference is that numerics are stored more tightly packed on > Oracle. Which is particularly good for Oracle as they don't have other > numeric data types than number. On PostgreSQL, you'll want to use int4 > for ID-fields, where possible. An int4 always takes up 4 bytes, while > a numeric holding an integer value in the same range is typically 5-9 bytes. >> Replacing those numeric(8) and numeric(16) fields with int4 and int8 would >> be greatly beneficial to comparison and hashing performance, not just table >> size. I'm a >> bit surprised that EDB's porting tools evidently don't do >> this automatically (I infer from the reference to PPAS that the OP is using >> EDB ...) >> >> regards, tom lane Thanks, tom. Any clue where there remaining around 500mb difference come from ? converted all the numeric(8) to int and this saved around 380mb of storage and around 10 secs exectution time... both databases have their files on standard ext3, same fs options. Given that the table has around 25'000'000 rows this is still approx. 20 bytes more per row on average Regards Daniel -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance