On Wed, 11 Jan 2012 19:40:34 -0500, Andres wrote:
> Unfortunately the pastebins where raptelan provided plans expired by
> now... Perhaps he can provide them again?
Sure, the original curiosity I noticed was that adjusting the block
size of results returned by the CTE had widely different effects, 5000
seemed to be some sort of "magic number", while either 6000 or 4000
worked poorly. Originally, our design used blocks of 100,000.
I then noticed a regression with the 5,000 block size. The difference
in the queries between a prototype (that was pretty fast (5-10s) and
what was generated (slow, (5m+)) was that the order of columns in one
of the join conditions in the main query was reversed. ON
source.column = joined.column was slow, while ON joined.column =
source.column was fast. Apparently this is enough to get a different
possible plan to hit the planner first, while another slow plan with
nearly identical estimates is hitting the planner first in other cases.
Attached are three files - one shows the fast plan, another the slow
plan, and another with the query in question. The ON clause where
reversal makes a difference is the td_13 one. I use a different range
in the CTE for both queries as otherwise filesystem cache makes the
timings look better, but in both cases, the CTE returns exactly 5,000
results.
Yes, the query could be written a lot better, but currently it's
generated this way to conform to an expectation of user-defined custom
where clauses that do not qualify column names. Breaking that
compatibility and redoing this better is a longer-term plan.
Please let me know if I'm omitting any important details.
Regards,
--
Casey Allen Shobe | Senior Software Engineer/DBA
Message Systems | http://messagesystems.com
casey.sh...@messagesystems.com | 443-656-3311 x248
with "cte_contacts" as (
select distinct on ("c"."contactid")
"c"."contactid" "id"
from "contacts" "c"
join "domains" "d"
using ("domainid")
join "contactdetails" "cd"
using ("contactid")
join "contenttypes" "ct"
using ("contenttypeid")
join "activemailings_0000230" "a"
using ("contactid")
left join "activemailingsgenerated_0000230" "g"
using ("progresscounter")
where "a"."progresscounter" between 415000 and 419999
and "g"."contactid" is null
order by "c"."contactid"
)
select distinct on ("cd"."contactid", "cd"."progresscounter")
"cd".*,
td_13.td_13,
relation_4.xd_mw_subscriber_domainname
from (
select "am"."progresscounter",
"am"."json",
"am"."segmentid",
"c"."localpart"||'@'||"d"."domainname" as "emailaddress",
"c"."localpart",
"d"."domainname" as "domain",
"c"."statusid",
"d"."domainid",
"d"."domainname",
"ct"."description" as "contenttype",
"cd".*,
"cd"."firstname"||' '||"cd"."lastname" as "fullname"
from "contacts" "c"
join "domains" "d"
using ("domainid")
join "contactdetails" "cd"
using ("contactid")
join "contenttypes" "ct"
using ("contenttypeid")
join "activemailings_0000230" "am"
using ("contactid")
where "c"."contactid" in (select "id" from "cte_contacts")
) "cd"
left join (
select "contactid",
"tagid" td_13
from "tagdetails"
where "tagid" = '13'
and "contactid" in (select "id" from "cte_contacts")
) as td_13
on td_13."contactid" = "cd"."contactid"
left join (
select "join_contactid",
mw_subscriber.subscriberid as xd_mw_subscriber_subscriberid,
mw_subscriber.countryid as xd_mw_subscriber_countryid,
mw_subscriber.email as xd_mw_subscriber_email,
mw_subscriber.firstname as xd_mw_subscriber_firstname,
mw_subscriber.lastname as xd_mw_subscriber_lastname,
mw_subscriber.address1 as xd_mw_subscriber_address1,
mw_subscriber.address2 as xd_mw_subscriber_address2,
mw_subscriber.city as xd_mw_subscriber_city,
mw_subscriber.stateorprovince as xd_mw_subscriber_stateorprovince,
mw_subscriber.postalcode as xd_mw_subscriber_postalcode,
mw_subscriber.workphone as xd_mw_subscriber_workphone,
mw_subscriber.homephone as xd_mw_subscriber_homephone,
mw_subscriber.dateofbirth as xd_mw_subscriber_dateofbirth,
mw_subscriber.gender as xd_mw_subscriber_gender,
mw_subscriber.issendable as xd_mw_subscriber_issendable,
mw_subscriber.subscribertypeid as xd_mw_subscriber_subscribertypeid,
mw_subscriber.md5_email as xd_mw_subscriber_md5_email,
mw_subscriber.domainname as xd_mw_subscriber_domainname,
mw_subscriber.isdeleted as xd_mw_subscriber_isdeleted,
mw_subscriber.datecreated as xd_mw_subscriber_datecreated,
mw_subscriber.datemodified as xd_mw_subscriber_datemodified
from (
select relation_1."xd_contacts_contactid" as "join_contactid",
subscriber_map.subscriberid as xd_subscriber_map_subscriberid,
subscriber_map.contactid as xd_subscriber_map_contactid
from (
select contacts.contactid as xd_contacts_contactid,
contacts.statusid as xd_contacts_statusid,
contacts.localpart as xd_contacts_localpart,
contacts.domainid as xd_contacts_domainid,
"contacts"."contactid" as "join_contactid"
from org0000001.contacts
where "contactid" in (select "id" from "cte_contacts")
) relation_1
left join relational.subscriber_map
on xd_contacts_contactid = subscriber_map.contactid
) relation_3
left join relational.mw_subscriber
on xd_subscriber_map_subscriberid = mw_subscriber.subscriberid
) relation_4
on relation_4."join_contactid" = "cd"."contactid"
where "cd"."statusid" = 1
and (
(
( td_13 is not null)
)
)
order by "cd"."progresscounter";
Unique (cost=4101.93..4101.94 rows=1 width=1370) (actual
time=4364.751..4368.303 rows=5000 loops=1)
CTE cte_contacts
-> Unique (cost=1795.10..1795.22 rows=25 width=8) (actual
time=811.605..815.486 rows=5000 loops=1)
-> Sort (cost=1795.10..1795.16 rows=25 width=8) (actual
time=811.604..812.618 rows=5000 loops=1)
Sort Key: c.contactid
Sort Method: quicksort Memory: 427kB
-> Nested Loop (cost=41.50..1794.52 rows=25 width=8) (actual
time=34.992..807.514 rows=5000 loops=1)
-> Nested Loop (cost=41.50..1639.66 rows=25 width=12)
(actual time=34.978..210.534 rows=5000 loops=1)
-> Nested Loop (cost=41.50..950.43 rows=25
width=16) (actual time=34.893..148.094 rows=5000 loops=1)
Join Filter: (cd.contenttypeid =
ct.contenttypeid)
-> Nested Loop (cost=41.50..948.27 rows=25
width=18) (actual time=34.873..132.554 rows=5000 loops=1)
-> Hash Left Join
(cost=41.50..259.01 rows=25 width=8) (actual time=34.785..45.214 rows=5000
loops=1)
Hash Cond: (a.progresscounter =
g.progresscounter)
Filter: (g.contactid IS NULL)
-> Index Scan using
activemailings_0000230_pc_idx on activemailings_0000230 a (cost=0.00..184.44
rows=5085 width=12) (act
ual time=22.838..28.119 rows=5000 loops=1)
Index Cond:
((progresscounter >= 415000) AND (progresscounter <= 419999))
-> Hash (cost=24.00..24.00
rows=1400 width=12) (actual time=11.895..11.895 rows=1 loops=1)
Buckets: 1024 Batches: 1
Memory Usage: 1kB
-> Seq Scan on
activemailingsgenerated_0000230 g (cost=0.00..24.00 rows=1400 width=12)
(actual time=11.884..11.885
rows=1 loops=1)
-> Index Scan using
contactdetails_pkey on contactdetails cd (cost=0.00..27.56 rows=1 width=10)
(actual time=0.015..0.016 rows=
1 loops=5000)
Index Cond: (cd.contactid =
a.contactid)
-> Materialize (cost=0.00..1.04 rows=3
width=2) (actual time=0.000..0.001 rows=3 loops=5000)
-> Seq Scan on contenttypes ct
(cost=0.00..1.03 rows=3 width=2) (actual time=0.010..0.012 rows=3 loops=1)
-> Index Scan using idx_contacts1 on contacts c
(cost=0.00..27.56 rows=1 width=12) (actual time=0.011..0.011 rows=1 loops=5000)
Index Cond: (c.contactid = cd.contactid)
-> Index Scan using domains_pkey on domains d
(cost=0.00..6.18 rows=1 width=4) (actual time=0.117..0.118 rows=1 loops=5000)
Index Cond: (d.domainid = c.domainid)
-> Sort (cost=2306.71..2306.71 rows=1 width=1370) (actual
time=4364.749..4365.398 rows=5000 loops=1)
Sort Key: am.progresscounter, cd.contactid
Sort Method: quicksort Memory: 1521kB
-> Hash Left Join (cost=2067.18..2306.70 rows=1 width=1370) (actual
time=4154.976..4354.979 rows=5000 loops=1)
Hash Cond: (cd.contactid = contacts.contactid)
-> Nested Loop (cost=1.69..241.18 rows=1 width=1360) (actual
time=879.632..1069.496 rows=5000 loops=1)
-> Nested Loop (cost=1.69..233.90 rows=1 width=1388)
(actual time=858.193..1016.353 rows=5000 loops=1)
Join Filter: (cd.contenttypeid = ct.contenttypeid)
-> Nested Loop (cost=1.69..232.83 rows=1
width=1226) (actual time=858.179..995.615 rows=5000 loops=1)
-> Nested Loop (cost=1.69..205.25 rows=1
width=104) (actual time=858.163..958.404 rows=5000 loops=1)
-> Nested Loop (cost=1.69..198.85
rows=1 width=89) (actual time=858.152..933.784 rows=5000 loops=1)
-> Hash Join (cost=1.69..171.26
rows=1 width=64) (actual time=858.127..897.187 rows=5000 loops=1)
Hash Cond: (am.contactid =
cte_contacts.id)
-> Nested Loop
(cost=0.56..170.04 rows=25 width=56) (actual time=848.757..881.732 rows=5000
loops=1)
-> HashAggregate
(cost=0.56..0.81 rows=25 width=8) (actual time=822.087..825.069 rows=5000
loops=1)
-> CTE Scan on
cte_contacts (cost=0.00..0.50 rows=25 width=8) (actual time=811.609..818.588
rows=5000 loops=1)
-> Index Scan using
activemailings_0000230_c_idx on activemailings_0000230 am (cost=0.00..6.76
rows=1 width=48) (actual time=0.010..0.010 rows=1 loops=5000)
Index Cond:
(am.contactid = cte_contacts.id)
-> Hash (cost=0.81..0.81
rows=25 width=8) (actual time=9.340..9.340 rows=5000 loops=1)
Buckets: 1024
Batches: 1 Memory Usage: 196kB
-> HashAggregate
(cost=0.56..0.81 rows=25 width=8) (actual time=5.037..6.903 rows=5000 loops=1)
-> CTE Scan on
cte_contacts (cost=0.00..0.50 rows=25 width=8) (actual time=0.001..1.451
rows=5000 loops=1)
-> Index Scan using idx_contacts1
on contacts c (cost=0.00..27.57 rows=1 width=25) (actual time=0.006..0.006
rows=1 loops=5000)
Index Cond: (c.contactid =
cte_contacts.id)
Filter: (c.statusid = 1)
-> Index Scan using domains_pkey on
domains d (cost=0.00..6.38 rows=1 width=19) (actual time=0.004..0.004 rows=1
loops=5000)
Index Cond: (d.domainid =
c.domainid)
-> Index Scan using contactdetails_pkey on
contactdetails cd (cost=0.00..27.57 rows=1 width=1122) (actual
time=0.006..0.006 rows=1 loops=5000)
Index Cond: (cd.contactid =
cte_contacts.id)
-> Seq Scan on contenttypes ct (cost=0.00..1.03
rows=3 width=164) (actual time=0.001..0.001 rows=3 loops=5000)
-> Index Scan using idx_tagdetails1 on tagdetails
(cost=0.00..7.27 rows=1 width=12) (actual time=0.009..0.010 rows=1 loops=5000)
Index Cond: (tagdetails.contactid = cte_contacts.id)
Filter: ((tagdetails.tagid IS NOT NULL) AND
(tagdetails.tagid = 13))
-> Hash (cost=2065.18..2065.18 rows=25 width=18) (actual
time=3275.289..3275.289 rows=5000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 249kB
-> Nested Loop Left Join (cost=0.56..2065.18 rows=25
width=18) (actual time=42.041..3267.321 rows=5000 loops=1)
-> Nested Loop Left Join (cost=0.56..1377.73
rows=25 width=12) (actual time=29.429..3120.531 rows=5000 loops=1)
-> Nested Loop (cost=0.56..690.41 rows=25
width=8) (actual time=5.126..64.963 rows=5000 loops=1)
-> HashAggregate (cost=0.56..0.81
rows=25 width=8) (actual time=5.110..9.553 rows=5000 loops=1)
-> CTE Scan on cte_contacts
(cost=0.00..0.50 rows=25 width=8) (actual time=0.002..1.447 rows=5000 loops=1)
-> Index Scan using idx_contacts1 on
contacts (cost=0.00..27.57 rows=1 width=8) (actual time=0.009..0.010 rows=1
loops=5000)
Index Cond: (contacts.contactid =
cte_contacts.id)
-> Index Scan using
subscriber_map_contactid_idx on subscriber_map (cost=0.00..27.48 rows=1
width=12) (actual time=0.609..0.610 rows=1 loops=5000)
Index Cond: (contacts.contactid =
subscriber_map.contactid)
-> Index Scan using mw_subscriber_subscriberid_idx
on mw_subscriber (cost=0.00..27.49 rows=1 width=14) (actual time=0.026..0.027
rows=1 loops=5000)
Index Cond: (subscriber_map.subscriberid =
mw_subscriber.subscriberid)
Total runtime: 4369.824 ms
Unique (cost=4271.07..4271.08 rows=1 width=1370) (actual
time=369032.656..369037.280 rows=5000 loops=1)
CTE cte_contacts
-> Unique (cost=1865.00..1865.13 rows=26 width=8) (actual
time=485.087..489.009 rows=5000 loops=1)
-> Sort (cost=1865.00..1865.07 rows=26 width=8) (actual
time=485.086..486.234 rows=5000 loops=1)
Sort Key: c.contactid
Sort Method: quicksort Memory: 427kB
-> Nested Loop (cost=41.50..1864.39 rows=26 width=8) (actual
time=35.338..480.995 rows=5000 loops=1)
Join Filter: (cd.contenttypeid = ct.contenttypeid)
-> Nested Loop (cost=41.50..1862.18 rows=26 width=10)
(actual time=35.318..465.963 rows=5000 loops=1)
-> Nested Loop (cost=41.50..1145.37 rows=26
width=16) (actual time=35.226..366.942 rows=5000 loops=1)
-> Nested Loop (cost=41.50..984.32 rows=26
width=20) (actual time=35.213..125.133 rows=5000 loops=1)
-> Hash Left Join
(cost=41.50..267.52 rows=26 width=8) (actual time=35.121..61.289 rows=5000
loops=1)
Hash Cond: (a.progresscounter =
g.progresscounter)
Filter: (g.contactid IS NULL)
-> Index Scan using
activemailings_0000230_pc_idx on activemailings_0000230 a (cost=0.00..192.24
rows=5275 width=12) (actual time=35.028..55.868 rows=5000 loops=1)
Index Cond:
((progresscounter >= 515000) AND (progresscounter <= 519999))
-> Hash (cost=24.00..24.00
rows=1400 width=12) (actual time=0.027..0.027 rows=1 loops=1)
Buckets: 1024 Batches: 1
Memory Usage: 1kB
-> Seq Scan on
activemailingsgenerated_0000230 g (cost=0.00..24.00 rows=1400 width=12)
(actual time=0.020..0.021 rows=1 loops=1)
-> Index Scan using idx_contacts1 on
contacts c (cost=0.00..27.56 rows=1 width=12) (actual time=0.011..0.011 rows=1
loops=5000)
Index Cond: (c.contactid =
a.contactid)
-> Index Scan using domains_pkey on domains
d (cost=0.00..6.18 rows=1 width=4) (actual time=0.047..0.047 rows=1 loops=5000)
Index Cond: (d.domainid = c.domainid)
-> Index Scan using contactdetails_pkey on
contactdetails cd (cost=0.00..27.56 rows=1 width=10) (actual time=0.017..0.018
rows=1 loops=5000)
Index Cond: (cd.contactid = c.contactid)
-> Materialize (cost=0.00..1.04 rows=3 width=2)
(actual time=0.000..0.001 rows=3 loops=5000)
-> Seq Scan on contenttypes ct (cost=0.00..1.03
rows=3 width=2) (actual time=0.009..0.010 rows=3 loops=1)
-> Sort (cost=2405.94..2405.94 rows=1 width=1370) (actual
time=369032.654..369033.741 rows=5000 loops=1)
Sort Key: am.progresscounter, cd.contactid
Sort Method: quicksort Memory: 1521kB
-> Nested Loop Left Join (cost=1.75..2405.93 rows=1 width=1370)
(actual time=616.017..369009.240 rows=5000 loops=1)
Join Filter: (contacts.contactid = cd.contactid)
-> Nested Loop (cost=1.17..257.80 rows=1 width=1360) (actual
time=538.120..7806.007 rows=5000 loops=1)
-> Nested Loop (cost=1.17..250.52 rows=1 width=1388)
(actual time=511.913..7476.828 rows=5000 loops=1)
Join Filter: (cd.contenttypeid = ct.contenttypeid)
-> Nested Loop (cost=1.17..249.45 rows=1
width=1226) (actual time=511.901..7433.186 rows=5000 loops=1)
-> Nested Loop (cost=1.17..221.87 rows=1
width=104) (actual time=511.884..7375.435 rows=5000 loops=1)
-> Nested Loop (cost=1.17..215.47
rows=1 width=89) (actual time=511.874..7339.516 rows=5000 loops=1)
-> Nested Loop
(cost=1.17..187.89 rows=1 width=64) (actual time=511.855..7296.137 rows=5000
loops=1)
Join Filter: (am.contactid =
cte_contacts.id)
-> HashAggregate
(cost=0.58..0.84 rows=26 width=8) (actual time=495.631..501.471 rows=5000
loops=1)
-> CTE Scan on
cte_contacts (cost=0.00..0.52 rows=26 width=8) (actual time=485.090..491.975
rows=5000 loops=1)
-> Materialize
(cost=0.58..176.97 rows=26 width=56) (actual time=0.003..0.499 rows=5000
loops=5000)
-> Nested Loop
(cost=0.58..176.84 rows=26 width=56) (actual time=16.213..95.466 rows=5000
loops=1)
->
HashAggregate (cost=0.58..0.84 rows=26 width=8) (actual time=5.094..8.226
rows=5000 loops=1)
-> CTE
Scan on cte_contacts (cost=0.00..0.52 rows=26 width=8) (actual
time=0.004..1.472 rows=5000 loops=1)
-> Index Scan
using activemailings_0000230_c_idx on activemailings_0000230 am
(cost=0.00..6.76 rows=1 width=48) (actual time=0.016..0.016 rows=1 loops=5000)
Index
Cond: (am.contactid = cte_contacts.id)
-> Index Scan using idx_contacts1
on contacts c (cost=0.00..27.57 rows=1 width=25) (actual time=0.006..0.007
rows=1 loops=5000)
Index Cond: (c.contactid =
am.contactid)
Filter: (c.statusid = 1)
-> Index Scan using domains_pkey on
domains d (cost=0.00..6.38 rows=1 width=19) (actual time=0.005..0.005 rows=1
loops=5000)
Index Cond: (d.domainid =
c.domainid)
-> Index Scan using contactdetails_pkey on
contactdetails cd (cost=0.00..27.57 rows=1 width=1122) (actual
time=0.009..0.010 rows=1 loops=5000)
Index Cond: (cd.contactid = c.contactid)
-> Seq Scan on contenttypes ct (cost=0.00..1.03
rows=3 width=164) (actual time=0.001..0.002 rows=3 loops=5000)
-> Index Scan using idx_tagdetails1 on tagdetails
(cost=0.00..7.27 rows=1 width=12) (actual time=0.063..0.064 rows=1 loops=5000)
Index Cond: (tagdetails.contactid = cd.contactid)
Filter: ((tagdetails.tagid IS NOT NULL) AND
(tagdetails.tagid = 13))
-> Nested Loop Left Join (cost=0.58..2147.79 rows=26 width=18)
(actual time=0.032..71.270 rows=5000 loops=5000)
-> Nested Loop Left Join (cost=0.58..1432.84 rows=26
width=12) (actual time=0.020..45.707 rows=5000 loops=5000)
-> Nested Loop (cost=0.58..718.03 rows=26 width=8)
(actual time=0.006..23.370 rows=5000 loops=5000)
-> HashAggregate (cost=0.58..0.84 rows=26
width=8) (actual time=0.002..1.443 rows=5000 loops=5000)
-> CTE Scan on cte_contacts
(cost=0.00..0.52 rows=26 width=8) (actual time=0.002..1.431 rows=5000 loops=1)
-> Index Scan using idx_contacts1 on contacts
(cost=0.00..27.57 rows=1 width=8) (actual time=0.004..0.004 rows=1
loops=25000000)
Index Cond: (contacts.contactid =
cte_contacts.id)
-> Index Scan using subscriber_map_contactid_idx on
subscriber_map (cost=0.00..27.48 rows=1 width=12) (actual time=0.004..0.004
rows=1 loops=25000000)
Index Cond: (contacts.contactid =
subscriber_map.contactid)
-> Index Scan using mw_subscriber_subscriberid_idx on
mw_subscriber (cost=0.00..27.49 rows=1 width=14) (actual time=0.004..0.004
rows=1 loops=25000000)
Index Cond: (subscriber_map.subscriberid =
mw_subscriber.subscriberid)
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs