Hi Pavel,

See the output of EXPLAIN (ANALYZE, TIMING OFF) below; the total runtime is close to 
the runtime with timing enabled.

dev=# EXPLAIN (ANALYZE, TIMING OFF)  select cha_type,sum(visits) from 
weekly_non_hstore a  join seg1 b on a.ref_id=b.ref_id group by cha_type order 
by sum(visits) desc;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3674118.09..3674476.91 rows=143528 width=27) (actual rows=3639539 loops=1)
   Sort Key: (sum(a.visits))
   Sort Method: quicksort  Memory: 391723kB
   ->  HashAggregate  (cost=3660388.94..3661824.22 rows=143528 width=27) (actual rows=3639539 loops=1)
         ->  Hash Join  (cost=12029.58..3301288.46 rows=71820096 width=27) (actual rows=36962761 loops=1)
               Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
               ->  Seq Scan on weekly_non_hstore a  (cost=0.00..1852856.96 rows=71820096 width=75) (actual rows=71818882 loops=1)
               ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual rows=371759 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 28951kB
                     ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual rows=371759 loops=1)
Total runtime: 42914.194 ms
(11 rows)


dev=# explain (analyze, timing off) select cha_type, sum(visits) from (select 
(each(visits)).key as cha_type,(each(visits)).value::numeric as visits from 
weekly_hstore a  join seg1 b on a.ref_id=b.ref_id )foo  group by cha_type  
order by sum(visits) desc;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7599039.89..7599040.39 rows=200 width=64) (actual rows=3639539 loops=1)
   Sort Key: (sum((((each(a.visits)).value)::numeric)))
   Sort Method: quicksort  Memory: 394779kB
   ->  HashAggregate  (cost=7599030.24..7599032.24 rows=200 width=64) (actual rows=3639539 loops=1)
         ->  Hash Join  (cost=12029.58..2022645.24 rows=371759000 width=186) (actual rows=36962761 loops=1)
               Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
               ->  Seq Scan on weekly_hstore a  (cost=0.00..133321.14 rows=1292314 width=232) (actual rows=1292314 loops=1)
               ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual rows=371759 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 28951kB
                     ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual rows=371759 loops=1)
Total runtime: 69521.570 ms
(11 rows)

Thanks,
Suya

From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
Sent: Monday, September 01, 2014 5:07 PM
To: Huang, Suya
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query performance with hstore vs. non-hstore



2014-09-01 8:54 GMT+02:00 Huang, Suya <suya.hu...@au.experian.com>:
Thank you Pavel.

The cost of unpacking hstore, compared to non-hstore, can be estimated as the seq scan 
on the table plus the hash join with the seg1 table:
Hstore: 416.741 + 34619.879 =~ 35 seconds
Non-hstore: 8858.594 + 26477.652 =~ 35 seconds

The subsequent hash aggregate and sort operations should be working on the unpacked 
hstore rows, which have the same row count as the non-hstore table; however, the 
timing of those operations is where the big difference actually appears.

I don’t quite get why…

These values can be misleading -- the per-node timing instrumentation in EXPLAIN 
ANALYZE adds a relatively large overhead, and the overhead differs between plan node 
types.
Try comparing the total runtime with EXPLAIN (ANALYZE, TIMING OFF).


Thanks,
Suya

From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
Sent: Monday, September 01, 2014 4:22 PM
To: Huang, Suya
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query performance with hstore vs. non-hstore

Hi
In this use case hstore is unlikely to help -- there is a relatively high overhead 
related to unpacking hstore, so the classic schema is better.
Hstore is not meant to replace a well-normalized schema; it is a replacement for 
semi-normalized structures such as EAV.
Hstore can get some benefit from TOAST -- compression and less per-row system 
overhead -- but this advantage only kicks in beyond a certain data length. You should 
see it in the table size: when the table with hstore is smaller than the one without, 
there is a benefit from hstore. The last benefit of hstore is indexes over the 
(key, value) pairs .. but you don't use those here.
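For example, the size comparison, and the (key, value) index if you ever need lookups
by key, can be checked like this (a sketch against the tables in this thread; the
index name is illustrative):

```sql
-- Compare the on-disk footprint of the two layouts,
-- including TOAST and index segments
SELECT pg_size_pretty(pg_total_relation_size('weekly_non_hstore')) AS non_hstore_size,
       pg_size_pretty(pg_total_relation_size('weekly_hstore'))     AS hstore_size;

-- GIN index over the hstore (key, value) pairs; it speeds up the
-- @>, ?, ?| and ?& operators -- not the full-table aggregation above
CREATE INDEX weekly_hstore_visits_idx ON weekly_hstore USING gin (visits);
```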
Regards

Pavel

2014-09-01 8:10 GMT+02:00 Huang, Suya <suya.hu...@au.experian.com>:
Hi ,

I’m tweaking the table layout to get better query performance. One table doesn’t use 
hstore but expands all metrics of cha_type into separate rows. The other table keeps 
the metrics in an hstore column as cha_type->metrics, so it has fewer records than 
the first one.

I would expect the query on the second table to perform better than on the first one. 
However, that’s not the case at all. I’m wondering if there’s something wrong with my 
execution plan. With the hstore table, the optimizer’s row-count estimate at the hash 
aggregate stage is completely wrong, and the query spends 34 seconds on the hash 
join, 25 seconds on the hash aggregate, and 10 seconds on the sort. With the 
non-hstore table, it spends 17 seconds on the hash join, 18 seconds on the hash 
aggregate, and 2 seconds on the sort.

Can someone help explain why this is happening? And is there a way to fine-tune the 
query?

Table structure

dev=# \d+ weekly_non_hstore
                     Table "test.weekly_non_hstore"
  Column  |          Type          | Modifiers | Storage  | Stats target | Description
----------+------------------------+-----------+----------+--------------+-------------
 date     | date                   |           | plain    |              |
 ref_id   | character varying(256) |           | extended |              |
 cha_type | text                   |           | extended |              |
 visits   | double precision       |           | plain    |              |
 pages    | double precision       |           | plain    |              |
 duration | double precision       |           | plain    |              |
Has OIDs: no
Tablespace: "tbs_data"

dev=# \d+ weekly_hstore
                       Table "test.weekly_hstore"
  Column  |          Type          | Modifiers | Storage  | Stats target | Description
----------+------------------------+-----------+----------+--------------+-------------
 date     | date                   |           | plain    |              |
 ref_id   | character varying(256) |           | extended |              |
 visits   | hstore                 |           | extended |              |
 pages    | hstore                 |           | extended |              |
 duration | hstore                 |           | extended |              |
Has OIDs: no
Tablespace: "tbs_data"

dev=# select count(*) from weekly_non_hstore;
  count
----------
71818882
(1 row)


dev=# select count(*) from weekly_hstore;
  count
---------
1292314
(1 row)


Query
dev=# explain analyze select cha_type,sum(visits) from weekly_non_hstore a  
join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits) desc;
                                                                           
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3674073.37..3674431.16 rows=143115 width=27) (actual time=47520.637..47969.658 rows=3639539 loops=1)
   Sort Key: (sum(a.visits))
   Sort Method: quicksort  Memory: 391723kB
   ->  HashAggregate  (cost=3660386.70..3661817.85 rows=143115 width=27) (actual time=43655.637..44989.202 rows=3639539 loops=1)
         ->  Hash Join  (cost=12029.58..3301286.54 rows=71820032 width=27) (actual time=209.789..26477.652 rows=36962761 loops=1)
               Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
               ->  Seq Scan on weekly_non_hstore a  (cost=0.00..1852856.32 rows=71820032 width=75) (actual time=0.053..8858.594 rows=71818882 loops=1)
               ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=209.189..209.189 rows=371759 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 28951kB
                     ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual time=0.014..64.695 rows=371759 loops=1)
Total runtime: 48172.405 ms
(11 rows)

Time: 48173.569 ms

dev=# explain analyze select cha_type, sum(visits) from (select 
(each(visits)).key as cha_type,(each(visits)).value::numeric as visits from 
weekly_hstore a  join seg1 b on a.ref_id=b.ref_id )foo  group by cha_type  
order by sum(visits) desc;
                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7599039.89..7599040.39 rows=200 width=64) (actual time=70424.561..70986.202 rows=3639539 loops=1)
   Sort Key: (sum((((each(a.visits)).value)::numeric)))
   Sort Method: quicksort  Memory: 394779kB
   ->  HashAggregate  (cost=7599030.24..7599032.24 rows=200 width=64) (actual time=59267.120..60502.647 rows=3639539 loops=1)
         ->  Hash Join  (cost=12029.58..2022645.24 rows=371759000 width=184) (actual time=186.140..34619.879 rows=36962761 loops=1)
               Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
               ->  Seq Scan on weekly_hstore a  (cost=0.00..133321.14 rows=1292314 width=230) (actual time=0.107..416.741 rows=1292314 loops=1)
               ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=185.742..185.742 rows=371759 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 28951kB
                     ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual time=0.016..62.123 rows=371759 loops=1)
Total runtime: 71177.675 ms
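
One thing worth trying on the hstore query itself: in its target list, 
(each(visits)).key and (each(visits)).value expand the set-returning each() twice 
per row. A LATERAL join (available since PostgreSQL 9.3) unpacks each hstore only 
once -- a sketch against the tables above, untested:

```sql
-- Unpack each hstore value a single time via LATERAL, instead of
-- calling each() once per column reference in the target list
SELECT kv.key AS cha_type, sum(kv.value::numeric) AS visits
FROM weekly_hstore a
JOIN seg1 b ON a.ref_id = b.ref_id
CROSS JOIN LATERAL each(a.visits) AS kv(key, value)
GROUP BY kv.key
ORDER BY 2 DESC;
```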

