Re: [PERFORM] Hardware suggestions for maximum read performance

Mike McCann Mon, 13 May 2013 16:59:03 -0700

On May 13, 2013, at 4:24 PM, Jeff Janes wrote:

> On Mon, May 13, 2013 at 3:36 PM, Mike McCann <mcc...@mbari.org> wrote:
> 
> Increasing work_mem to 355 MB improves the performance by a factor of 2:
> 
> stoqs_march2013_s=# set work_mem='355MB';
> SET
> stoqs_march2013_s=# explain analyze select * from stoqs_measuredparameter 
> order by datavalue;
>                                                                 QUERY PLAN    
>                                                             
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=422106.15..430560.68 rows=3381814 width=20) (actual 
> time=2503.078..2937.130 rows=3381814 loops=1)
>    Sort Key: datavalue
>    Sort Method: quicksort  Memory: 362509kB
>    ->  Seq Scan on stoqs_measuredparameter  (cost=0.00..55359.14 rows=3381814 
> width=20) (actual time=0.016..335.745 rows=3381814 loops=1)
>  Total runtime: 3094.601 ms
> (5 rows)
> 
> I tried changing random_page_cost to from 4 to 1 and saw no change.
> 
> I'm wondering now what changes might get this query to run in less than one 
> second.  
> 
> 
> I think you are worrying about the wrong thing here.  What is a web app going 
> to do with 3,381,814 rows, once it obtains them?  Your current testing is not 
> testing the time it takes to stream that data to the client, or for the 
> client to do something meaningful with that data.
> 
> If you only plan to actually fetch a few dozen of those rows, then you 
> probably need to incorporate that into your test, either by using a LIMIT, or 
> by using a mock-up of the actual application to do some timings.
> 
> Also, what is the type and collation of the column you are sorting on?  
> non-'C' collations of text columns sort about 3 times slower than 'C' 
> collation does.
> 
>  
> If all the data is in memory, then will faster CPU and memory be the things 
> that help?
> 
> Yes, those would help (it is not clear to me which of the two would help 
> more), but I think you need to rethink your design of sending the entire 
> database table to the application server for each page-view.
> 
> 
> Cheers,
> 
> Jeff


Hi Jeff,

The datavalue column is double precision:

stoqs_march2013_s=# \d+ stoqs_measuredparameter
                                              Table 
"public.stoqs_measuredparameter"
     Column     |       Type       |                              Modifiers     
                          | Storage | Description 
----------------+------------------+----------------------------------------------------------------------+---------+-------------
 id             | integer          | not null default 
nextval('stoqs_measuredparameter_id_seq'::regclass) | plain   | 
 measurement_id | integer          | not null                                   
                          | plain   | 
 parameter_id   | integer          | not null                                   
                          | plain   | 
 datavalue      | double precision | not null                                   
                          | plain   | 
Indexes:
    "stoqs_measuredparameter_pkey" PRIMARY KEY, btree (id)
    "stoqs_measuredparameter_measurement_id_parameter_id_key" UNIQUE 
CONSTRAINT, btree (measurement_id, parameter_id)
    "stoqs_measuredparameter_datavalue" btree (datavalue)
    "stoqs_measuredparameter_measurement_id" btree (measurement_id)
    "stoqs_measuredparameter_parameter_id" btree (parameter_id)
Foreign-key constraints:
    "stoqs_measuredparameter_measurement_id_fkey" FOREIGN KEY (measurement_id) 
REFERENCES stoqs_measurement(id) DEFERRABLE INITIALLY DEFERRED
    "stoqs_measuredparameter_parameter_id_fkey" FOREIGN KEY (parameter_id) 
REFERENCES stoqs_parameter(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no


Thanks for the suggestion and advice to examine the web app performance.  We've 
actually taken quite a few steps to optimize how the web app works. The example 
query I provided is a simple worst-case one that we can use to help us decide 
on the proper hardware.  An actual query performed by the web app is:

stoqs_march2013_s=# explain analyze SELECT stoqs_measuredparameter.id,
stoqs_march2013_s-#        stoqs_parameter.name AS parameter__name,
stoqs_march2013_s-#        stoqs_parameter.standard_name AS 
parameter__standard_name,
stoqs_march2013_s-#        stoqs_measurement.depth AS measurement__depth,
stoqs_march2013_s-#        stoqs_measurement.geom AS measurement__geom,
stoqs_march2013_s-#        stoqs_instantpoint.timevalue AS 
measurement__instantpoint__timevalue,
stoqs_march2013_s-#        stoqs_platform.name AS 
measurement__instantpoint__activity__platform__name,
stoqs_march2013_s-#        stoqs_measuredparameter.datavalue AS datavalue,
stoqs_march2013_s-#        stoqs_parameter.units AS parameter__units
stoqs_march2013_s-# FROM stoqs_parameter p1,
stoqs_march2013_s-#      stoqs_measuredparameter
stoqs_march2013_s-#      INNER JOIN stoqs_measurement ON 
(stoqs_measuredparameter.measurement_id = stoqs_measurement.id)
stoqs_march2013_s-#      INNER JOIN stoqs_instantpoint ON 
(stoqs_measurement.instantpoint_id = stoqs_instantpoint.id)
stoqs_march2013_s-#      INNER JOIN stoqs_parameter ON 
(stoqs_measuredparameter.parameter_id = stoqs_parameter.id)
stoqs_march2013_s-#      INNER JOIN stoqs_activity ON 
(stoqs_instantpoint.activity_id = stoqs_activity.id)
stoqs_march2013_s-#      INNER JOIN stoqs_platform ON 
(stoqs_activity.platform_id = stoqs_platform.id)
stoqs_march2013_s-#      INNER JOIN stoqs_measuredparameter mp1 ON 
mp1.measurement_id = stoqs_measuredparameter.measurement_id
stoqs_march2013_s-# WHERE (p1.name = 'sea_water_sigma_t')
stoqs_march2013_s-#       AND (mp1.datavalue > 25.19)
stoqs_march2013_s-#       AND (mp1.datavalue < 26.01)
stoqs_march2013_s-#       AND (mp1.parameter_id = p1.id)
stoqs_march2013_s-#       AND (stoqs_instantpoint.timevalue <= '2013-03-17 
19:05:06'
stoqs_march2013_s(#       AND stoqs_instantpoint.timevalue >= '2013-03-17 
15:35:13'
stoqs_march2013_s(#       AND stoqs_parameter.name IN ('fl700_uncorr')
stoqs_march2013_s(#       AND stoqs_measurement.depth >= -1.88
stoqs_march2013_s(#       AND stoqs_platform.name IN ('dorado')
stoqs_march2013_s(#       AND stoqs_measurement.depth <= 83.57)
stoqs_march2013_s-# ORDER BY stoqs_activity.name ASC, 
stoqs_instantpoint.timevalue ASC;
                                                                                
                QUERY PLAN                                                      
                 
                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------
 Sort  (cost=10741.41..10741.42 rows=1 width=1282) (actual 
time=770.211..770.211 rows=0 loops=1)
   Sort Key: stoqs_activity.name, stoqs_instantpoint.timevalue
   Sort Method: quicksort  Memory: 25kB
   ->  Hash Join  (cost=3002.89..10741.40 rows=1 width=1282) (actual 
time=770.200..770.200 rows=0 loops=1)
         Hash Cond: (stoqs_instantpoint.activity_id = stoqs_activity.id)
         ->  Nested Loop  (cost=2983.69..10722.19 rows=3 width=954) (actual 
time=770.036..770.036 rows=0 loops=1)
               ->  Nested Loop  (cost=2983.69..9617.36 rows=191 width=946) 
(actual time=91.369..680.072 rows=20170 loops=1)
                     ->  Hash Join  (cost=2983.69..8499.07 rows=193 width=842) 
(actual time=91.346..577.633 rows=20170 loops=1)
                           Hash Cond: (stoqs_measuredparameter.parameter_id = 
stoqs_parameter.id)
                           ->  Nested Loop  (cost=2982.38..8478.47 rows=4628 
width=24) (actual time=91.280..531.408 rows=197746 loops=1)
                                 ->  Nested Loop  (cost=2982.38..4862.37 
rows=512 width=4) (actual time=91.202..116.140 rows=20170 loops=1)
                                       ->  Seq Scan on stoqs_parameter p1  
(cost=0.00..1.30 rows=1 width=4) (actual time=0.002..0.011 rows=1 loops=1)
                                             Filter: ((name)::text = 
'sea_water_sigma_t'::text)
                                       ->  Bitmap Heap Scan on 
stoqs_measuredparameter mp1  (cost=2982.38..4854.40 rows=534 width=8) (actual 
time=91.194..109.846 rows=20170 loop
s=1)
                                             Recheck Cond: ((datavalue > 
25.19::double precision) AND (datavalue < 26.01::double precision) AND 
(parameter_id = p1.id))
                                             ->  BitmapAnd  
(cost=2982.38..2982.38 rows=534 width=0) (actual time=90.794..90.794 rows=0 
loops=1)
                                                   ->  Bitmap Index Scan on 
stoqs_measuredparameter_datavalue  (cost=0.00..259.54 rows=12292 width=0) 
(actual time=62.769..62.769
 rows=23641 loops=1)
                                                         Index Cond: 
((datavalue > 25.19::double precision) AND (datavalue < 26.01::double 
precision))
                                                   ->  Bitmap Index Scan on 
stoqs_measuredparameter_parameter_id  (cost=0.00..2719.38 rows=147035 width=0) 
(actual time=27.412..2
7.412 rows=34750 loops=1)
                                                         Index Cond: 
(parameter_id = p1.id)
                                 ->  Index Scan using 
stoqs_measuredparameter_measurement_id on stoqs_measuredparameter  
(cost=0.00..6.98 rows=7 width=20) (actual time=0.008..0.
017 rows=10 loops=20170)
                                       Index Cond: (measurement_id = 
mp1.measurement_id)
                           ->  Hash  (cost=1.30..1.30 rows=1 width=826) (actual 
time=0.012..0.012 rows=1 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                 ->  Seq Scan on stoqs_parameter  
(cost=0.00..1.30 rows=1 width=826) (actual time=0.007..0.010 rows=1 loops=1)
                                       Filter: ((name)::text = 
'fl700_uncorr'::text)
                     ->  Index Scan using stoqs_measurement_pkey on 
stoqs_measurement  (cost=0.00..5.78 rows=1 width=116) (actual time=0.004..0.004 
rows=1 loops=20170)
                           Index Cond: (id = 
stoqs_measuredparameter.measurement_id)
                           Filter: ((depth >= (-1.88)::double precision) AND 
(depth <= 83.57::double precision))
               ->  Index Scan using stoqs_instantpoint_pkey on 
stoqs_instantpoint  (cost=0.00..5.77 rows=1 width=16) (actual time=0.004..0.004 
rows=0 loops=20170)
                     Index Cond: (id = stoqs_measurement.instantpoint_id)
                     Filter: ((timevalue <= '2013-03-17 19:05:06-07'::timestamp 
with time zone) AND (timevalue >= '2013-03-17 15:35:13-07'::timestamp with time 
zone))
         ->  Hash  (cost=18.82..18.82 rows=30 width=336) (actual 
time=0.151..0.151 rows=7 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Hash Join  (cost=1.09..18.82 rows=30 width=336) (actual 
time=0.035..0.145 rows=7 loops=1)
                     Hash Cond: (stoqs_activity.platform_id = stoqs_platform.id)
                     ->  Seq Scan on stoqs_activity  (cost=0.00..16.77 rows=177 
width=66) (actual time=0.005..0.069 rows=177 loops=1)
                     ->  Hash  (cost=1.07..1.07 rows=1 width=278) (actual 
time=0.014..0.014 rows=1 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 1kB
                           ->  Seq Scan on stoqs_platform  (cost=0.00..1.07 
rows=1 width=278) (actual time=0.008..0.012 rows=1 loops=1)
                                 Filter: ((name)::text = 'dorado'::text)
 Total runtime: 770.445 ms
(42 rows)


We assume that steps taken to improve the worst-case query scenario will also 
improve these kind of queries.  If anything above pops out as needing better 
planning please let us know that too!

Thanks,
Mike

--
Mike McCann
Software Engineer
Monterey Bay Aquarium Research Institute
7700 Sandholdt Road
Moss Landing, CA 95039-9644
Voice: 831.775.1769  Fax: 831.775.1736 http://www.mbari.org

Re: [PERFORM] Hardware suggestions for maximum read performance

Reply via email to