From: [email protected] [mailto:[email protected]] On
Behalf Of Robert Buck
Sent: Monday, October 01, 2012 8:47 PM
To: [email protected]
Subject: [SQL] [noob] How to optimize this double pivot query?
I have two tables that contain key-value data that I want to combine in pivoted
form into a single result set. They are related to two separate tables.
The tables are: test_results, test_variables, metric_def, metadata_key. The
latter two tables are enum-like tables, basic descriptors of data stored in
other tables. The former two tables are basically key-value tables (with ids as
well); these k-v tables are related to the latter two tables via foreign keys.
The following SQL takes about 11 seconds to run on a high-end laptop. The
largest table is about 54k records, pretty puny.
Can someone provide a hint as to why this is so slow? Again, I am a noob to
SQL, so the SQL is probably poorly written.
Your query, while maybe not great, isn’t the cause of your problem. It is the
table schema, specifically the “key-value” aspect, that is killing you.
You may want to try:
SELECT *
FROM (SELECT id FROM …) id_master
NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype = ‘’) f1
NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype = ‘’) f2
[repeat one left join for every field; though you will then need to decide
if/how to deal with NULL – not that you are currently doing anything special
anyway…]
Mainly the above avoids the use of “max()” and instead uses direct joins
between the relevant tables. I have no clue whether that will improve things
but if you are going to lie in this bed you should at least try different
positions.
The better option is to educate yourself on better ways of constructing the
tables so that you do not have to write this kind of god-awful query. In some
cases key-value has merit but usually only when done in moderation. Not for
the entire database. You likely should simply have a table that looks like the
result of the query below.
As a second (not necessarily mutually exclusive) alternative: install and use
the hstore extension.
David J.
Thanks in advance,
Bob
select
t.id_name,
max(t.begin_time) as begin_time,
max(t.end_time) as end_time,
max(case when (m.id_name = 'package-version') then v.value end) as
package_version,
max(case when (m.id_name = 'database-vendor') then v.value end) as
database_vendor,
max(case when (m.id_name = 'bean-name') then v.value end) as bean_name,
max(case when (m.id_name = 'request-distribution') then v.value end) as
request_distribution,
max(case when (m.id_name = 'ycsb-workload') then v.value end) as
ycsb_workload,
max(case when (m.id_name = 'record-count') then v.value end) as
record_count,
max(case when (m.id_name = 'transaction-engine-count') then v.value end) as
transaction_engine_count,
max(case when (m.id_name = 'transaction-engine-maxmem') then v.value end)
as transaction_engine_maxmem,
max(case when (m.id_name = 'storage-manager-count') then v.value end) as
storage_manager_count,
max(case when (m.id_name = 'test-instance-count') then v.value end) as
test_instance_count,
max(case when (m.id_name = 'operation-count') then v.value end) as
operation_count,
max(case when (m.id_name = 'update-percent') then v.value end) as
update_percent,
max(case when (m.id_name = 'thread-count') then v.value end) as
thread_count,
max(case when (d.id_name = 'tps') then r.value end) as tps,
max(case when (d.id_name = 'Memory') then r.value end) as memory,
max(case when (d.id_name = 'DiskWritten') then r.value end) as disk_written,
max(case when (d.id_name = 'PercentUserTime') then r.value end) as
percent_user,
max(case when (d.id_name = 'PercentCpuTime') then r.value end) as
percent_cpu,
max(case when (d.id_name = 'UserMilliseconds') then r.value end) as
user_milliseconds,
max(case when (d.id_name = 'YcsbUpdateLatencyMicrosecs') then r.value end)
as update_latency,
max(case when (d.id_name = 'YcsbReadLatencyMicrosecs') then r.value end) as
read_latency,
max(case when (d.id_name = 'Updates') then r.value end) as updates,
max(case when (d.id_name = 'Deletes') then r.value end) as deletes,
max(case when (d.id_name = 'Inserts') then r.value end) as inserts,
max(case when (d.id_name = 'Commits') then r.value end) as commits,
max(case when (d.id_name = 'Rollbacks') then r.value end) as rollbacks,
max(case when (d.id_name = 'Objects') then r.value end) as objects,
max(case when (d.id_name = 'ObjectsCreated') then r.value end) as
objects_created,
max(case when (d.id_name = 'FlowStalls') then r.value end) as flow_stalls,
max(case when (d.id_name = 'NodeApplyPingTime') then r.value end) as
node_apply_ping_time,
max(case when (d.id_name = 'NodePingTime') then r.value end) as
node_ping_time,
max(case when (d.id_name = 'ClientCncts') then r.value end) as
client_connections,
max(case when (d.id_name = 'YcsbSuccessCount') then r.value end) as
success_count,
max(case when (d.id_name = 'YcsbWarnCount') then r.value end) as warn_count,
max(case when (d.id_name = 'YcsbFailCount') then r.value end) as fail_count
from test as t
left join test_results as r on r.test_id = t.id
left join test_variables as v on v.test_id = t.id
left join metric_def as d on d.id = r.metric_def_id
left join metadata_key as m on m.id = v.metadata_key_id
group by t.id_name
;
"GroupAggregate (cost=5.87..225516.43 rows=926 width=81)"
" -> Nested Loop Left Join (cost=5.87..53781.24 rows=940964 width=81)"
" -> Nested Loop Left Join (cost=1.65..1619.61 rows=17235 width=61)"
" -> Index Scan using test_uc on test t (cost=0.00..90.06
rows=926 width=36)"
" -> Hash Right Join (cost=1.65..3.11 rows=19 width=29)"
" Hash Cond: (m.id = v.metadata_key_id)"
" -> Seq Scan on metadata_key m (cost=0.00..1.24 rows=24
width=21)"
" -> Hash (cost=1.41..1.41 rows=19 width=16)"
" -> Index Scan using test_variables_test_id_idx on
test_variables v (cost=0.00..1.41 rows=19 width=16)"
" Index Cond: (test_id = t.id)"
" -> Hash Right Join (cost=4.22..6.69 rows=55 width=28)"
" Hash Cond: (d.id = r.metric_def_id)"
" -> Seq Scan on metric_def d (cost=0.00..1.71 rows=71 width=20)"
" -> Hash (cost=3.53..3.53 rows=55 width=16)"
" -> Index Scan using test_results_test_id_idx on
test_results r (cost=0.00..3.53 rows=55 width=16)"
" Index Cond: (test_id = t.id)"