The original query:

EXPLAIN ANALYZE
SELECT *,
       (SELECT jlgstatus
          FROM pgagent.pga_joblog jl
         WHERE jl.jlgjobid = j.jobid
         ORDER BY jlgid DESC
         LIMIT 1) AS joblastresult
  FROM pgagent.pga_job j
  JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
  LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
  -- + restriction +
 ORDER BY jobname;

Sort  (cost=5359.18..5360.33 rows=460 width=221) (actual time=0.295..0.300 rows=2 loops=1)
  Sort Key: j.jobname
  ->  Hash Join  (cost=69.50..5338.84 rows=460 width=221) (actual time=0.189..0.249 rows=2 loops=1)
        Hash Cond: (j.jobjclid = cl.jclid)
        ->  Hash Left Join  (cost=33.40..54.33 rows=460 width=185) (actual time=0.068..0.084 rows=2 loops=1)
              Hash Cond: (j.jobagentid = ag.jagpid)
              ->  Seq Scan on pga_job j  (cost=0.00..14.60 rows=460 width=141) (actual time=0.025..0.030 rows=2 loops=1)
              ->  Hash  (cost=20.40..20.40 rows=1040 width=44) (actual time=0.019..0.019 rows=1 loops=1)
                    ->  Seq Scan on pga_jobagent ag  (cost=0.00..20.40 rows=1040 width=44) (actual time=0.005..0.008 rows=1 loops=1)
        ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (actual time=0.050..0.050 rows=5 loops=1)
              ->  Seq Scan on pga_jobclass cl  (cost=0.00..21.60 rows=1160 width=36) (actual time=0.011..0.022 rows=5 loops=1)
  SubPlan
    ->  Limit  (cost=0.00..11.40 rows=1 width=9) (actual time=0.023..0.025 rows=1 loops=2)
          ->  Index Scan Backward using pga_joblog_pkey on pga_joblog jl  (cost=0.00..68.38 rows=6 width=9) (actual time=0.014..0.014 rows=1 loops=2)
                Filter: (jlgjobid = $0)
Total runtime: 0.519 ms

My revised query:

EXPLAIN ANALYZE
select j.*, cl.*, ag.*, sub3.jlgstatus
  from pgagent.pga_job j
  join pgagent.pga_jobclass cl on cl.jclid=jobjclid
  left outer join pgagent.pga_jobagent ag on ag.jagpid=jobagentid
  join (select j2.jlgstatus, sub.jlgjobid
          from pgagent.pga_joblog j2
          join (select jl.jlgjobid, max(jl.jlgid) as max_jlgid
                  from pgagent.pga_joblog jl
                 group by jl.jlgjobid) sub
            on sub.jlgjobid = j2.jlgjobid
           and sub.max_jlgid = j2.jlgid) sub3
    on sub3.jlgjobid = j.jobid
  -- + restriction +
 order by jobname;

Sort  (cost=81.07..81.07 rows=1 width=226) (actual time=0.780..0.784 rows=2 loops=1)
  Sort Key: j.jobname
  ->  Nested Loop  (cost=66.00..81.06 rows=1 width=226) (actual time=0.633..0.736 rows=2 loops=1)
        ->  Nested Loop Left Join  (cost=66.00..80.67 rows=1 width=190) (actual time=0.613..0.684 rows=2 loops=1)
              ->  Nested Loop  (cost=66.00..80.29 rows=1 width=146) (actual time=0.598..0.651 rows=2 loops=1)
                    ->  Hash Join  (cost=66.00..72.01 rows=1 width=13) (actual time=0.566..0.583 rows=2 loops=1)
                          Hash Cond: ((sub.jlgjobid = j2.jlgjobid) AND (sub.max_jlgid = j2.jlgid))
                          ->  HashAggregate  (cost=27.25..29.75 rows=200 width=8) (actual time=0.252..0.257 rows=2 loops=1)
                                ->  Seq Scan on pga_joblog jl  (cost=0.00..21.50 rows=1150 width=8) (actual time=0.012..0.111 rows=44 loops=1)
                          ->  Hash  (cost=21.50..21.50 rows=1150 width=13) (actual time=0.283..0.283 rows=44 loops=1)
                                ->  Seq Scan on pga_joblog j2  (cost=0.00..21.50 rows=1150 width=13) (actual time=0.023..0.145 rows=44 loops=1)
                    ->  Index Scan using pga_job_pkey on pga_job j  (cost=0.00..8.27 rows=1 width=141) (actual time=0.014..0.017 rows=1 loops=2)
                          Index Cond: (sub.jlgjobid = j.jobid)
              ->  Index Scan using pga_jobagent_pkey on pga_jobagent ag  (cost=0.00..0.37 rows=1 width=44) (actual time=0.002..0.002 rows=0 loops=2)
                    Index Cond: (ag.jagpid = j.jobagentid)
        ->  Index Scan using pga_jobclass_pkey on pga_jobclass cl  (cost=0.00..0.37 rows=1 width=36) (actual time=0.006..0.009 rows=1 loops=2)
              Index Cond: (cl.jclid = j.jobjclid)
Total runtime: 1.096 ms

My table only has 2 records in it, so it might be different when we have several hundred jobs. The cost is significantly lower but the total runtime is higher. This is on a PostgreSQL database installed on my desktop. It has nothing to do with Greenplum.
I can't even run an explain plan on GP with that first query because it fails.

Another solution would be to call a function in the database rather than embedding the SQL in C++. If you look at pgagent.sql, there are a few functions created to support jobs. Maybe this could be another function call, so it could easily be modified to support Greenplum and would make it easier for users to tweak the tool. (A rough sketch of such a function is at the end of this message.)

Thanks!

Jon

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 10, 2007 4:24 PM
To: Roberts, Jon
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails

"Roberts, Jon" <[EMAIL PROTECTED]> writes:
> What if the new query has a significantly lower cost compared to the
> older one?

Much as I'd like the planner to be infallible, it ain't; estimated costs are no proof of any real-world performance difference. Better show EXPLAIN ANALYZE numbers if you want to be taken seriously --- and *not* ones from a Greenplum-modified PG.

In any case "I can make this particular query faster" seems a rather different argument from "you guys should eliminate all use of correlated subqueries".

        regards, tom lane
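Here is the rough, untested sketch of the function idea I mentioned above. pga_job_last_result is just a name I made up (it is not in pgagent.sql today), and the body simply wraps the same correlated lookup:

-- Hypothetical helper, not part of pgagent.sql: returns the status of the
-- most recent pga_joblog row for a job, or NULL if the job has never run.
CREATE OR REPLACE FUNCTION pgagent.pga_job_last_result(pjobid int4)
RETURNS char AS $$
DECLARE
    vstatus char;
BEGIN
    SELECT jlgstatus
      INTO vstatus
      FROM pgagent.pga_joblog
     WHERE jlgjobid = pjobid
     ORDER BY jlgid DESC
     LIMIT 1;

    RETURN vstatus;  -- NULL when no log rows exist for the job
END;
$$ LANGUAGE plpgsql STABLE;

The query in the C++ code would then shrink to something like

SELECT j.*, cl.*, ag.*, pgagent.pga_job_last_result(j.jobid) AS joblastresult
  FROM pgagent.pga_job j
  JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
  LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
 ORDER BY jobname;

and only the function body would need to change for Greenplum (for example, to the join-against-max form above).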