The original query:

EXPLAIN ANALYZE
SELECT *,
       (SELECT jlgstatus
          FROM pgagent.pga_joblog jl
         WHERE jl.jlgjobid = j.jobid
         ORDER BY jlgid DESC
         LIMIT 1) AS joblastresult
  FROM pgagent.pga_job j
  JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
  LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
  -- + restriction +
 ORDER BY jobname;

Sort  (cost=5359.18..5360.33 rows=460 width=221) (actual time=0.295..0.300 rows=2 loops=1)
  Sort Key: j.jobname
  ->  Hash Join  (cost=69.50..5338.84 rows=460 width=221) (actual time=0.189..0.249 rows=2 loops=1)
        Hash Cond: (j.jobjclid = cl.jclid)
        ->  Hash Left Join  (cost=33.40..54.33 rows=460 width=185) (actual time=0.068..0.084 rows=2 loops=1)
              Hash Cond: (j.jobagentid = ag.jagpid)
              ->  Seq Scan on pga_job j  (cost=0.00..14.60 rows=460 width=141) (actual time=0.025..0.030 rows=2 loops=1)
              ->  Hash  (cost=20.40..20.40 rows=1040 width=44) (actual time=0.019..0.019 rows=1 loops=1)
                    ->  Seq Scan on pga_jobagent ag  (cost=0.00..20.40 rows=1040 width=44) (actual time=0.005..0.008 rows=1 loops=1)
        ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (actual time=0.050..0.050 rows=5 loops=1)
              ->  Seq Scan on pga_jobclass cl  (cost=0.00..21.60 rows=1160 width=36) (actual time=0.011..0.022 rows=5 loops=1)
  SubPlan
    ->  Limit  (cost=0.00..11.40 rows=1 width=9) (actual time=0.023..0.025 rows=1 loops=2)
          ->  Index Scan Backward using pga_joblog_pkey on pga_joblog jl  (cost=0.00..68.38 rows=6 width=9) (actual time=0.014..0.014 rows=1 loops=2)
                Filter: (jlgjobid = $0)
Total runtime: 0.519 ms

My revised query:

EXPLAIN ANALYZE
select j.*, cl.*, ag.*, sub3.jlgstatus
  from pgagent.pga_job j
  join pgagent.pga_jobclass cl on cl.jclid=jobjclid
  left outer join pgagent.pga_jobagent ag on ag.jagpid=jobagentid
  join (select j2.jlgstatus, sub.jlgjobid
          from pgagent.pga_joblog j2
          join (select jl.jlgjobid, max(jl.jlgid) as max_jlgid
                  from pgagent.pga_joblog jl
                 group by jl.jlgjobid) sub
            on sub.jlgjobid = j2.jlgjobid
           and sub.max_jlgid = j2.jlgid) sub3
    on sub3.jlgjobid = j.jobid
  -- + restriction +
 order by jobname;

Sort  (cost=81.07..81.07 rows=1 width=226) (actual time=0.780..0.784 rows=2 loops=1)
  Sort Key: j.jobname
  ->  Nested Loop  (cost=66.00..81.06 rows=1 width=226) (actual time=0.633..0.736 rows=2 loops=1)
        ->  Nested Loop Left Join  (cost=66.00..80.67 rows=1 width=190) (actual time=0.613..0.684 rows=2 loops=1)
              ->  Nested Loop  (cost=66.00..80.29 rows=1 width=146) (actual time=0.598..0.651 rows=2 loops=1)
                    ->  Hash Join  (cost=66.00..72.01 rows=1 width=13) (actual time=0.566..0.583 rows=2 loops=1)
                          Hash Cond: ((sub.jlgjobid = j2.jlgjobid) AND (sub.max_jlgid = j2.jlgid))
                          ->  HashAggregate  (cost=27.25..29.75 rows=200 width=8) (actual time=0.252..0.257 rows=2 loops=1)
                                ->  Seq Scan on pga_joblog jl  (cost=0.00..21.50 rows=1150 width=8) (actual time=0.012..0.111 rows=44 loops=1)
                          ->  Hash  (cost=21.50..21.50 rows=1150 width=13) (actual time=0.283..0.283 rows=44 loops=1)
                                ->  Seq Scan on pga_joblog j2  (cost=0.00..21.50 rows=1150 width=13) (actual time=0.023..0.145 rows=44 loops=1)
                    ->  Index Scan using pga_job_pkey on pga_job j  (cost=0.00..8.27 rows=1 width=141) (actual time=0.014..0.017 rows=1 loops=2)
                          Index Cond: (sub.jlgjobid = j.jobid)
              ->  Index Scan using pga_jobagent_pkey on pga_jobagent ag  (cost=0.00..0.37 rows=1 width=44) (actual time=0.002..0.002 rows=0 loops=2)
                    Index Cond: (ag.jagpid = j.jobagentid)
        ->  Index Scan using pga_jobclass_pkey on pga_jobclass cl  (cost=0.00..0.37 rows=1 width=36) (actual time=0.006..0.009 rows=1 loops=2)
              Index Cond: (cl.jclid = j.jobjclid)
Total runtime: 1.096 ms

My table only has 2 records in it, so it might be different when we have several hundred jobs. The cost is significantly lower but the total runtime is higher. This is on a PostgreSQL database installed on my desktop. It has nothing to do with Greenplum.
I can't even run an explain plan on GP with that first query because it fails.

Another solution would be to call a function in the database rather than embedding the SQL in C++. If you look at pgagent.sql, there are a few functions created to support jobs. Maybe this could be another function call, so it could easily be modified to support Greenplum and would make it easier for users to tweak the tool. (A rough sketch of such a function is at the end of this message.)

Thanks!

Jon

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 10, 2007 4:24 PM
To: Roberts, Jon
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails

"Roberts, Jon" <[EMAIL PROTECTED]> writes:
> What if the new query has a significantly lower cost compared to the
> older one?

Much as I'd like the planner to be infallible, it ain't; estimated costs are no proof of any real-world performance difference. Better show EXPLAIN ANALYZE numbers if you want to be taken seriously --- and *not* ones from a Greenplum-modified PG.

In any case "I can make this particular query faster" seems a rather different argument from "you guys should eliminate all use of correlated subqueries".

        regards, tom lane
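Here is the rough, untested sketch of the function idea I mentioned above. pga_job_last_result is just a name I made up (it is not in pgagent.sql today), and the body simply wraps the same correlated lookup:

-- Hypothetical helper, not part of pgagent.sql: returns the status of the
-- most recent pga_joblog row for a job, or NULL if the job has never run.
CREATE OR REPLACE FUNCTION pgagent.pga_job_last_result(pjobid int4)
RETURNS char AS $$
DECLARE
    vstatus char;
BEGIN
    SELECT jlgstatus
      INTO vstatus
      FROM pgagent.pga_joblog
     WHERE jlgjobid = pjobid
     ORDER BY jlgid DESC
     LIMIT 1;

    RETURN vstatus;  -- NULL when no log rows exist for the job
END;
$$ LANGUAGE plpgsql STABLE;

The query in the C++ code would then shrink to something like

SELECT j.*, cl.*, ag.*, pgagent.pga_job_last_result(j.jobid) AS joblastresult
  FROM pgagent.pga_job j
  JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
  LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
 ORDER BY jobname;

and only the function body would need to change for Greenplum (for example, to the join-against-max form above).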