Re: [PERFORM] What is the postgres sql command for last_user_id ???

2005-03-02 Thread Michael Fuhr
On Tue, Mar 01, 2005 at 10:46:02PM +0700, [EMAIL PROTECTED] wrote:

> I would like to know whether there is any command by which the server will
> give the record ID back to the client when the client inserts the data and
> the server generates an autoincrement ID for that record.

See "How do I get the value of a SERIAL insert?" and the question
immediately following it in the FAQ:

http://www.postgresql.org/files/documentation/faqs/FAQ.html#4.11.2
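
In short, the usual pattern (table and sequence names below are just
placeholders) is to read the sequence value in the same session right after
the INSERT:

    INSERT INTO mytable (name) VALUES ('example');
    SELECT currval('mytable_id_seq');  -- the id just assigned by this session

currval() is local to your session, so it is safe even with concurrent
inserts from other clients.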

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Richard Huxton
Ken Egervari wrote:
I've tried to use Dan Tow's tuning method
Who? What?
> and created all the right
indexes from his diagraming method, but the query still performs
quite slow both inside the application and just inside pgadmin III.
Can anyone be kind enough to help me tune it so that it performs
better in postgres?  I don't think it's using the right indexes, or
maybe postgres needs special treatment.
I've converted the below query to SQL from a Hibernate query, so the
syntax is probably not perfect but its semantics are exactly the
same.  I've done so by looking at the source code, but I can't run it
to get the exact SQL since I don't have the database on my home
machine.
Hibernate is a java thing, no? It'd be helpful to have the actual SQL 
the hibernate class (or whatever) generates. One of the problems with 
SQL is that you can have multiple ways to get the same results and it's 
not always possible for the planner to convert from one to the other.

Anyway, people will want to see EXPLAIN ANALYSE for the query in 
question. Obviously, make sure you've vacuumed and analysed the tables 
in question recently. Oh, and make sure you say what version of PG you're 
running.

select s.* from shipment s inner join carrier_code cc on
s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id =
c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner
join person p on p.id = ctp.person_id inner join shipment_status cs
on s.current_status_id = cs.id inner join release_code rc on
cs.release_code_id = rc.id left join shipment_status ss on
ss.shipment_id = s.id where p.id = :personId and s.is_purged = false
and rc.number = '9' and cs is not null and cs.date >= current_date -
31 order by cs.date desc
1. Why are you quoting the 9 when checking against rc.number?
2. The "cs is not null" doesn't appear to be qualified - which table?
Just assume I have no indexes for the moment because while some of
the indexes I made make it work faster, it's still around 250
milliseconds and under heavy load, the query performs very badly (6-7
seconds).
3. If you rewrite the "current_date - 31" as a suitable ago(31) function 
then you can use an index on cs.date (a sketch follows below).
4. Are you familiar with the configuration setting "join_collapse_limit"?
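
For point 3, a minimal sketch of such a helper - the name ago() and its
signature are just an illustration, not an existing function:

    CREATE OR REPLACE FUNCTION ago(integer) RETURNS date AS
      'SELECT current_date - $1'
      LANGUAGE sql IMMUTABLE;

    -- the filter then becomes:  cs.date >= ago(31)

Note it is not truly immutable (current_date changes), so a plan cached
across midnight would keep the old cutoff; that is the trade-off of the
trick.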

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Bricklen Anderson
Richard Huxton wrote:
Ken Egervari wrote:
I've tried to use Dan Tow's tuning method
Who? What?
http://www.singingsql.com/
Dan has written some remarkable papers on sql tuning. Some of it is pretty complex, but his book 
"SQL Tuning" is an excellent resource.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Bruce Momjian
Bricklen Anderson wrote:
> Richard Huxton wrote:
> > Ken Egervari wrote:
> > 
> >> I've tried to use Dan Tow's tuning method
> > Who? What?
> 
> http://www.singingsql.com/

That URL is invalid for me.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread John Arbash Meinel
Ken Egervari wrote:
I've tried to use Dan Tow's tuning method and created all the right
indexes from his diagraming method, but the query still performs quite
slow both inside the application and just inside pgadmin III.  Can
anyone be kind enough to help me tune it so that it performs better in
postgres?  I don't think it's using the right indexes, or maybe
postgres needs special treatment.
First, what version of postgres, and have you run VACUUM ANALYZE recently?
Also, please attach the result of running EXPLAIN ANALYZE.
(eg, explain analyze select s.* from shipment ...)
It's very possible that you don't have up-to-date statistics, which
causes postgres to make a bad estimate of what the fastest plan is.
Also, if you are using an older version of postgres (like 7.1) you
really should upgrade. There have been quite a few performance improvements and real bug fixes.
I've converted the below query to SQL from a Hibernate query, so the
syntax is probably not perfect but its semantics are exactly the
same.  I've done so by looking at the source code, but I can't run it
to get the exact SQL since I don't have the database on my home machine.
I don't know how to make Hibernate do what you want, but if you change
the query to using subselects (not all databases support this, so
hibernate might not let you), you can see a performance improvement.
Also sometimes using explicit joins can be worse than just letting the
query manager figure it out. So something like
select s.* from shipment s, carrier_code cc, carrier c, ...
   where s.carrier_code_id = cc.id and c.id = cc.carrier_id and 
But again, since this is generated from another program (Hibernate), I
really don't know how you tell it how to tune the SQL. Probably the
biggest "non-bug" performance improvements are from tuning the SQL.
But if postgres isn't using the right indexes, etc, you can probably fix
that.
John
=:->
select s.*
from shipment s
inner join carrier_code cc on s.carrier_code_id = cc.id
inner join carrier c on cc.carrier_id = c.id
inner join carrier_to_person ctp on ctp.carrier_id = c.id
inner join person p on p.id = ctp.person_id
inner join shipment_status cs on s.current_status_id = cs.id
inner join release_code rc on cs.release_code_id = rc.id
left join shipment_status ss on ss.shipment_id = s.id
where
p.id = :personId and
s.is_purged = false and
rc.number = '9' and
cs is not null and
cs.date >= current_date - 31
order by cs.date desc
Just assume I have no indexes for the moment because while some of the
indexes I made make it work faster, it's still around 250 milliseconds
and under heavy load, the query performs very badly (6-7 seconds).
For your information:
shipment contains 40,000 rows
shipment_status contains 80,000 rows
release_code contains 8 rows
person contains 300 rows
carrier contains 60 rows
carrier_code contains 70 rows
The filter ratios are:
rc.number = '9' (0.125)
cs.date >= current_date - 31 (.10)
p.id = ? (0.003)
s.is_purged = false (.98)
I really hope someone can help since I'm pretty much stuck.
Best regards and many thanks,
Ken





Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Ken Egervari
First, what version of postgres, and have you run VACUUM ANALYZE recently?
Also, please attach the result of running EXPLAIN ANALYZE.
(eg, explain analyze select s.* from shipment ...)
I'm using postgres 8.0.  I wish I could paste explain analyze, but I won't 
be at work for a few days.  I was hoping some Postgres/SQL experts here 
would be able to simply look at the query and make recommendations because 
it's not a very difficult or unique query.

It's very possible that you don't have up-to-date statistics, which
causes postgres to make a bad estimate of what the fastest plan is.
I run VACUUM ANALYZE religiously.  I even dumped the production database and 
used it as my test database after a full vacuum analyze.  It's really as 
fresh as it can be.

I don't know how to make Hibernate do what you want, but if you change
the query to using subselects (not all databases support this, so
hibernate might not let you), you can see a performance improvement.
Yes, Hibernate supports sub-selects.  In fact, I can even drop down to JDBC 
explicitly, so whatever SQL tricks out there I can use will work on 
Hibernate.  In what way will sub-selects improve this query?

Also sometimes using explicit joins can be worse than just letting the
query manager figure it out. So something like
select s.* from shipment s, carrier_code cc, carrier c, ...
   where s.carrier_code_id = cc.id and c.id = cc.carrier_id and 
I think I can avoid using joins in Hibernate, but it makes the query harder 
to maintain.  How much of a performance benefit are we talking with this 
change?  Since hibernate is an object language, you don't actually have to 
specify many joins.  You can use the "dot" notation.

Query query = session.createQuery(
"select shipment " +
"from Shipment shipment " +
"   inner join 
shipment.cargoControlNumber.carrierCode.carrier.persons person " +
"   inner join shipment.currentStatus currentStatus " +
"   inner join currentStatus.releaseCode releaseCode " +
"   left join fetch shipment.currentStatus " +
"where " +
"   person.id = :personId and " +
"   shipment.isPurged = false and " +
"   releaseCode.number = '9' and " +
"   currentStatus is not null and " +
"   currentStatus.date >= current_date - 31 " +
"order by currentStatus.date desc"
);

query.setParameter( "personId", personId );
query.setFirstResult( firstResult );
query.setMaxResults( maxResults );
return query.list();
As you can see, it's fairly elegant language and maps to SQL quite well.
But again, since this is generated from another program (Hibernate), I
really don't know how you tell it how to tune the SQL. Probably the
biggest "non-bug" performance improvements are from tuning the SQL.
I agree, but the ones I've tried aren't good enough.  I have made these 
indexes that apply to this query as well as others, from looking at my 
SQL scripts.  Many of my queries have really sped up to 14 milliseconds from 
these indexes.  But I can't make this query run any faster.

CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person USING 
btree (person_id);
CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person USING 
btree (carrier_id);
CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree 
(carrier_id);
CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree 
(carrier_code_id);
CREATE INDEX current_status_date_idx ON shipment_status USING btree (date);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree 
(current_status_id);
CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING btree 
(shipment_id);

Thanks for your responses everyone.  I'll try and get you that explain 
analyze.  I'm just not at work at the moment but this is a problem that I'm 
simply puzzled and worried about.  I'm getting all of this from CVS on my 
work server.

Ken 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread John Arbash Meinel
Ken Egervari wrote:
First, what version of postgres, and have you run VACUUM ANALYZE
recently?
Also, please attach the result of running EXPLAIN ANALYZE.
(eg, explain analyze select s.* from shipment ...)

I'm using postgres 8.0.  I wish I could paste explain analyze, but I
won't be at work for a few days.  I was hoping some Postgres/SQL
experts here would be able to simply look at the query and make
recommendations because it's not a very difficult or unique query.
That's the problem. Without explain analyze, it's hard to say why it is
performing weird, because it *does* look like a straightforward query.
It's very possible that you don't have up-to-date statistics, which
causes postgres to make a bad estimate of what the fastest plan is.

I run VACUUM ANALYZE religiously.  I even dumped the production
database and used it as my test database after a full vacuum analyze.
It's really as fresh as it can be.
Good. Again, this is just the first precaution, as not everyone is as
careful as you. And without the explain analyze, you can't tell what the
planner estimates are.
I don't know how to make Hibernate do what you want, but if you change
the query to using subselects (not all databases support this, so
hibernate might not let you), you can see a performance improvement.

Yes, Hibernate supports sub-selects.  In fact, I can even drop down to
JDBC explicitly, so whatever SQL tricks out there I can use will work
on Hibernate.  In what way will sub-selects improve this query?
When doing massive joins across multiple tables (as you are doing) it is
frequently faster to do a couple of small joins where you only need a
couple of rows as input to the rest. Something like:
select * from shipment s
where s.carrier_code_id in
   (select cc.id from carrier_code cc join carrier c on
cc.carrier_id = c.id)
and s.current_status_id in (select cs.id from shipment_status cs where ...)
Again it's something that you can try. I have found quite a few of my
queries performed much better with subselects.
I'm guessing it's because with big queries it has a harder time figuring
out how to refactor (the decision tree becomes big). But I'm not really
sure. I just know it can work.
Also sometimes using explicit joins can be worse than just letting the
query manager figure it out. So something like
select s.* from shipment s, carrier_code cc, carrier c, ...
   where s.carrier_code_id = cc.id and c.id = cc.carrier_id and 

I think I can avoid using joins in Hibernate, but it makes the query
harder to maintain.  How much of a performance benefit are we talking
with this change?  Since hibernate is an object language, you don't
actually have to specify many joins.  You can use the "dot" notation.
I'm not saying this *will* improve performance. It is just something to
try. It very easily could not be worth the overhead.
Query query = session.createQuery(
"select shipment " +
"from Shipment shipment " +
"   inner join
shipment.cargoControlNumber.carrierCode.carrier.persons person " +
"   inner join shipment.currentStatus currentStatus " +
"   inner join currentStatus.releaseCode releaseCode " +
"   left join fetch shipment.currentStatus " +
"where " +
"   person.id = :personId and " +
"   shipment.isPurged = false and " +
"   releaseCode.number = '9' and " +
"   currentStatus is not null and " +
"   currentStatus.date >= current_date - 31 " +
"order by currentStatus.date desc"
);
query.setParameter( "personId", personId );
query.setFirstResult( firstResult );
query.setMaxResults( maxResults );
return query.list();
As you can see, it's fairly elegant language and maps to SQL quite well.
But again, since this is generated from another program (Hibernate), I
really don't know how you tell it how to tune the SQL. Probably the
biggest "non-bug" performance improvements are from tuning the SQL.

I agree, but the ones I've tried aren't good enough.  I have made
these indexes that apply to this query as well as others, from
looking at my SQL scripts.  Many of my queries have really sped up to
14 milliseconds from these indexes.  But I can't make this query run
any faster.
CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person
USING btree (person_id);
CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person
USING btree (carrier_id);
CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree
(carrier_id);
CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
(carrier_code_id);
CREATE INDEX current_status_date_idx ON shipment_status USING btree
(date);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
(current_status_id);
CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING
btree (shipment_id);
Thanks for your responses everyone.  I'll try and get you that explain
analyze.  I'm just not at work at the moment but this is a problem
that 

[PERFORM] Performance tradeoff

2005-03-02 Thread Shawn Chisholm
Hi All,

I am wondering about the relative performance of  "insert into table1 select 
distinct a,b from ..." and "insert into table1 select a,b from ... group by 
a,b" when querying tables of different sizes (10K, 100K, 1s, 10s, 100s of 
millions of rows).  

The distinct way tends to sort/unique and the group by tends to hash 
aggregate... any opinions on which is better?

I can also change the schema to a certain extent, so would it be worthwhile to 
put indices on the queried tables (or refactor them) hoping the distinct does 
an index scan instead of sort...  would the query planner take advantage of 
that?

Thanks,

Shawn


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Ragnar Hafstað
On Wed, 2005-03-02 at 01:51 -0500, Ken Egervari wrote:
>  
> select s.*
> from shipment s
> inner join carrier_code cc on s.carrier_code_id = cc.id
> inner join carrier c on cc.carrier_id = c.id
> inner join carrier_to_person ctp on ctp.carrier_id = c.id
> inner join person p on p.id = ctp.person_id
> inner join shipment_status cs on s.current_status_id = cs.id
> inner join release_code rc on cs.release_code_id = rc.id
> left join shipment_status ss on ss.shipment_id = s.id
> where
> p.id = :personId and
> s.is_purged = false and
> rc.number = '9' and
> cs is not null and
> cs.date >= current_date - 31
> order by cs.date desc
> ... 
> shipment contains 40,000 rows
> shipment_status contains 80,000 rows

I may be missing something, but it looks like the second join
on shipment_status (the left join) is not adding anything to your
results, except more work. ss is not used for output, nor in the where
clause, so what is its purpose ?

if cs.date has an upper limit, it might be helpful to change the
condition to a BETWEEN
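
for example, if nothing is dated in the future, something like

    cs.date BETWEEN current_date - 31 AND current_date

gives the planner a bounded range to estimate instead of an open-ended one.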

in any case, i would think you might need an index on
  shipment(carrier_code_id)
  shipment(current_status_id)
  shipment_status(id)

gnari




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Ken Egervari
select s.*
from shipment s
inner join carrier_code cc on s.carrier_code_id = cc.id
inner join carrier c on cc.carrier_id = c.id
inner join carrier_to_person ctp on ctp.carrier_id = c.id
inner join person p on p.id = ctp.person_id
inner join shipment_status cs on s.current_status_id = cs.id
inner join release_code rc on cs.release_code_id = rc.id
left join shipment_status ss on ss.shipment_id = s.id
where
p.id = :personId and
s.is_purged = false and
rc.number = '9' and
cs is not null and
cs.date >= current_date - 31
order by cs.date desc
...
shipment contains 40,000 rows
shipment_status contains 80,000 rows
I may be missing something, but it looks like the second join
on shipment_status (the left join) is not adding anything to your
results, except more work. ss is not used for output, nor in the where
clause, so what is its purpose ?
It does look strange, doesn't it?  I would think the same thing if it were 
the first time I looked at it.  But rest assured, it's done by design.  A 
shipment relates to many shipment_status rows, but only 1 is the current 
shipment_status for the shipment.  The first does queries on the current 
status only and doesn't analyze the rest of the related items.  The second 
left join is for eager loading so that I don't have to run a separate query 
to fetch the children for each shipment.  This really does improve 
performance because otherwise you'll have to make N+1 queries to the 
database, and that's just too much overhead.  Since I need all the 
shipment_status children along with the shipment for the domain logic to 
work on them, I have to load them all.

On average, a shipment will have 2 shipment_status rows.  So if the query 
selects 100 shipments, the query returns 200 rows.  Hibernate is intelligent 
enough to map the shipment_status children to the appropriate shipment 
automatically.

if cs.date has an upper limit, it might be helpful to change the
condition to a BETWEEN
Well, I could create an upper limit. It would be the current date.  Would 
adding in this redundant condition improve performance?  I've clustered the 
shipment table so that the dates are together, which has improved 
performance.  I'm not sure adding in this implicit condition will speed up 
anything, but I will definitely try it.

in any case, i would think you might need an index on
 shipment(carrier_code_id)
 shipment(current_status_id)
 shipment_status(id)
Unfortunately, I have indexes on all three (Postgres implicitly creates 
indexes for unique keys).  Here are the other 2 that are already created:

CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree 
(carrier_code_id);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree 
(current_status_id);

So I guess we've been thinking the same thing.  Don't get me wrong.  These 
indexes speed up the query from 1.6 seconds to 250 milliseconds.  I just 
need to be around 30 milliseconds.

Another idea that had occurred to me was trying to force postgres to drive 
on the person table because that filter ratio is so great compared to 
everything else, but I do remember looking at the explain days ago and it 
was one of the last tables being filtered/joined.  Is there any way to force 
postgres to pick person?  The reason I ask is because this would really 
reduce the number of rows it pulls out from the shipment table.

Thanks for comments.  I'll try making that date explicit and change the 
query to use between to see if that does anything.

Regards and many thanks,
Ken 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Ragnar Hafstað
On Wed, 2005-03-02 at 13:28 -0500, Ken Egervari wrote:
> >> select s.*
> >> from shipment s
> >> inner join carrier_code cc on s.carrier_code_id = cc.id
> >> inner join carrier c on cc.carrier_id = c.id
> >> inner join carrier_to_person ctp on ctp.carrier_id = c.id
> >> inner join person p on p.id = ctp.person_id
> >> inner join shipment_status cs on s.current_status_id = cs.id
> >> inner join release_code rc on cs.release_code_id = rc.id
> >> left join shipment_status ss on ss.shipment_id = s.id
> >> where
> >> p.id = :personId and
> >> s.is_purged = false and
> >> rc.number = '9' and
> >> cs is not null and
> >> cs.date >= current_date - 31
> >> order by cs.date desc
> >
> > I may be missing something, but it looks like the second join
> > on shipment_status (the left join) is not adding anything to your
> > results, except more work. ss is not used for output, nor in the where
> > clause, so what is its purpose ?
> ...  The second 
> left join is for eager loading so that I don't have to run a separate query 
> to fetch the children for each shipment.  This really does improve 
> performance because otherwise you'll have to make N+1 queries to the 
> database, and that's just too much overhead.

are you saying that you are actually doing a
  select s.*,ss.* ...
?

> > if cs.date has an upper limit, it might be helpful to change the
> > condition to a BETWEEN
> 
> Well, I could create an upper limit. It would be the current date.  Would 
> adding in this redundant condition improve performance?

it might help the planner estimate better the number of cs rows 
affected. whether this improves performance depends on whether
the best plans are sensitive to this.

an EXPLAIN ANALYSE might reduce the guessing.

gnari



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Ken Egervari
left join is for eager loading so that I don't have to run a separate
query
to fetch the children for each shipment.  This really does improve
performance because otherwise you'll have to make N+1 queries to the
database, and that's just too much overhead.
are you saying that you are actually doing a
 select s.*,ss.* ...
?
Yes, this is how the SQL should be written.  When I manually converted the
query, I forgot to include this detail.  In hibernate, you don't need to
specify the ss.* because you are dealing with objects, so you just say
shipment.  The ss.* is indicated in the "fetch" part of the Hibernate query.
That was my mistake.
it might help the planner estimate better the number of cs rows
affected. whether this improves performance depends on whether
the best plans are sensitive to this.
This sounds like a good idea since cs rows are quite large.  shipment and
shipment_status are the largest tables in the database and they will grow
very large over time.
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Ken Egervari
it might help the planner estimate better the number of cs rows
affected. whether this improves performance depends on whether
the best plans are sensitive to this.
I managed to try this and see if it did anything.  Unfortunately, it made no 
difference.  It's still 250 milliseconds.  It was a good suggestion though. 
I believed it would work too.

an EXPLAIN ANALYSE might reduce the guessing.
Okay, here is the explain analyze I managed to get from work.  It came out 
to 312ms here, but without the analyze it actually runs at ~250ms.  It is 
using indexes, so my guess is that there are too many joins or it's not 
driving on person fast enough.  Release code is such a small table that I 
don't think that the sequential scan matters.  Thanks for taking the time to 
analyze this.

Sort  (cost=1902.27..1902.31 rows=17 width=91) (actual time=312.000..312.000 
rows=39 loops=1)
 Sort Key: ss.date
 ->  Hash Join  (cost=617.07..1901.92 rows=17 width=91) (actual 
time=234.000..312.000 rows=39 loops=1)
   Hash Cond: ("outer".carrier_code_id = "inner".id)
   ->  Merge Join  (cost=602.54..1882.73 rows=870 width=91) (actual 
time=234.000..312.000 rows=310 loops=1)
 Merge Cond: ("outer".current_status_id = "inner".id)
 ->  Index Scan using shipment_current_status_id_idx on 
shipment s  (cost=0.00..2552.13 rows=60327 width=66) (actual 
time=0.000..61.000 rows=27711 loops=1)
   Filter: (is_purged = false)
 ->  Sort  (cost=602.54..607.21 rows=1866 width=25) (actual 
time=125.000..125.000 rows=6934 loops=1)
   Sort Key: ss.id
   ->  Hash Join  (cost=1.11..501.17 rows=1866 width=25) 
(actual time=0.000..78.000 rows=6934 loops=1)
 Hash Cond: ("outer".release_code_id = "inner".id)
 ->  Index Scan using current_status_date_idx on 
shipment_status ss  (cost=0.00..406.78 rows=14924 width=25) (actual 
time=0.000..47.000 rows=15053 loops=1)
   Index Cond: (date >= (('now'::text)::date - 
31))
   Filter: (id IS NOT NULL)
 ->  Hash  (cost=1.10..1.10 rows=1 width=4) (actual 
time=0.000..0.000 rows=0 loops=1)
   ->  Seq Scan on release_code rc 
(cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
 Filter: ((number)::text = '9'::text)
   ->  Hash  (cost=14.53..14.53 rows=2 width=4) (actual 
time=0.000..0.000 rows=0 loops=1)
 ->  Nested Loop  (cost=4.92..14.53 rows=2 width=4) (actual 
time=0.000..0.000 rows=2 loops=1)
   ->  Index Scan using person_pkey on person p 
(cost=0.00..5.75 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
 Index Cond: (id = 355)
   ->  Hash Join  (cost=4.92..8.75 rows=2 width=8) (actual 
time=0.000..0.000 rows=2 loops=1)
 Hash Cond: ("outer".id = "inner".carrier_id)
 ->  Seq Scan on carrier c  (cost=0.00..3.54 
rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
 ->  Hash  (cost=4.92..4.92 rows=2 width=16) 
(actual time=0.000..0.000 rows=0 loops=1)
   ->  Hash Join  (cost=3.04..4.92 rows=2 
width=16) (actual time=0.000..0.000 rows=2 loops=1)
 Hash Cond: ("outer".carrier_id = 
"inner".carrier_id)
 ->  Seq Scan on carrier_code cc 
(cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1)
 ->  Hash  (cost=3.04..3.04 rows=1 
width=8) (actual time=0.000..0.000 rows=0 loops=1)
   ->  Index Scan using 
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04 
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
 Index Cond: (355 = 
person_id)
Total runtime: 312.000 ms

Ken 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Tom Lane
"Ken Egervari" <[EMAIL PROTECTED]> writes:
> Okay, here is the explain analyze I managed to get from work.

What platform is this on?  It seems very strange/fishy that all the
actual-time values are exact integral milliseconds.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Ken Egervari
"Ken Egervari" <[EMAIL PROTECTED]> writes:
Okay, here is the explain analyze I managed to get from work.
What platform is this on?  It seems very strange/fishy that all the
actual-time values are exact integral milliseconds.
regards, tom lane
My machine is WinXP Professional, Athlon XP 2100, but I get similar results 
on my Intel P4 3.0Ghz as well (which is also running WinXP).  Why do you 
ask? 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Tom Lane
"Ken Egervari" <[EMAIL PROTECTED]> writes:
>> What platform is this on?  It seems very strange/fishy that all the
>> actual-time values are exact integral milliseconds.

> My machine is WinXP Professional, Athlon XP 2100, but I get similar results 
> on my Intel P4 3.0Ghz as well (which is also running WinXP).  Why do you 
> ask? 

Well, what it suggests is that gettimeofday() is only returning a result
good to the nearest millisecond.  (Win32 hackers, does that sound right?)

If so, I'd have to take the EXPLAIN ANALYZE results with a big grain of
salt, because what it's trying to do is add up a lot of
mostly-sub-millisecond intervals.  What would essentially happen is that
whichever plan node had control at a particular millisecond boundary
would get charged for the whole preceding millisecond, and any other
nodes (which might have actually eaten most of the millisecond) would
get charged nothing.

Over a sufficiently long query run, the errors would average out, but
this wasn't that long --- 312 milliseconds, so in essence we are trying
to estimate the query's behavior from only 312 samples of where it was
at the millisecond boundaries.  I don't trust profiles based on less
than a few thousand samples ...

Most modern machines seem to have clocks that can count elapsed time
down to near the microsecond level.  Anyone know if it's possible to get
such numbers out of Windows, or are we stuck with milliseconds?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Dave Held
> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 02, 2005 4:30 PM
> To: Ken Egervari
> Cc: pgsql-performance@postgresql.org; 
> [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Help with tuning this query (with 
> explain analyze
> finally)
> 
> [...]
> Well, what it suggests is that gettimeofday() is only 
> returning a result good to the nearest millisecond.  (Win32
> hackers, does that sound right?)

No.  There's no such thing as gettimeofday() in Win32.  So it
must be making some other call, or perhaps an emulation.

> [...]
> Most modern machines seem to have clocks that can count elapsed
> time down to near the microsecond level.  Anyone know if it's 
> possible to get such numbers out of Windows, or are we stuck with
> milliseconds?

QueryPerformanceCounter() is your friend.

http://lists.boost.org/MailArchives/boost/msg45626.php

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread John A Meinel
Tom Lane wrote:
"Ken Egervari" <[EMAIL PROTECTED]> writes:
 

Okay, here is the explain analyze I managed to get from work.
   

What platform is this on?  It seems very strange/fishy that all the
actual-time values are exact integral milliseconds.

I always get round milliseconds when running it. In fact, I think I've seen 
cases where it was actually rounding to 15/16ms, which is the resolution 
of the "clock()" call (IIRC).

This is the function I have for returning time better than clock(), but 
it looks like it is still stuck no better than 1ms.
/*
* MSVC has a function called _ftime64, which is in
* "sys/timeb.h", which should be accurate to milliseconds
*/

#include <sys/types.h>
#include <sys/timeb.h>
double mf::getTime()
{
   struct __timeb64 timeNow;
   _ftime64(&timeNow);
   return timeNow.time + timeNow.millitm / 1000.0;
}
I did, however, find this page:
http://www.wideman-one.com/gw/tech/dataacq/wintiming.htm
Which talks about the high performance counter, which is supposed to be 
able to get better than 1us resolution.

GetSystemTimes() returns the idle/kernel/user times, and seems to have a 
resolution of about 100ns (.1us).  GetLocalTime()/GetSystemTime() only has 
a resolution of milliseconds.

In my simple test, I was actually getting timings with a resolution of 
.3us for the QueryPerformanceCounter(). That was the overhead of just 
the call, since it was called either in a bare loop, or just one after 
the other.

So probably we just need to switch to QueryPerformanceCounter() 
[/Frequency].

John
=:->




Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Ken Egervari
If so, I'd have to take the EXPLAIN ANALYZE results with a big grain of
salt, because what it's trying to do is add up a lot of
mostly-sub-millisecond intervals.  What would essentially happen is that
whichever plan node had control at a particular millisecond boundary
would get charged for the whole preceding millisecond, and any other
nodes (which might have actually eaten most of the millisecond) would
get charged nothing.
Well, we do know that it's at least 75% accurate.  I'm only looking for a 
relative increase in performance.  My goal is to try and get this query down 
to 30 milliseconds. But even 125 or 75 would be an improvement.  Any 
improvement, even based on fuzzy data, is still an improvement.  Being 
precise isn't really that important, at least not to me or the people using 
the application.  I can see how rounding can throw off results in the inner 
parts of the plan though, but I think we should try and work with the 
explain as it is.  If there is anything else I can give you to help me out, 
please ask and I will kindly do it.  I want to make this easy for you.

Over a sufficiently long query run, the errors would average out, but
this wasn't that long --- 312 milliseconds, so in essence we are trying
to estimate the query's behavior from only 312 samples of where it was
at the millisecond boundaries.  I don't trust profiles based on less
than a few thousand samples ...
I'm just using data from the production database, which only has 5 digits 
worth of rows in the main tables.  I don't think I can get millions of rows 
in these tables, although I wish I could.  I'd have to write a program to 
insert the data randomly and try to make it distributed the way a real 
production database might look in a few years if I wanted the most accurate 
results.  I would try to make the dates bunched up correctly and add more 
carriers and shipments over time (as more customers would use the system) 
exponentially.

But I'm trying to be practical too.  This query is too slow for 5 digits of 
rows in the database.  Imagine how bad it would be with millions! 
Unfortunately, this query gets ran by hundreds of people logged in every 60 
seconds on average.  It must be as fast as possible.  During peak times, 
people have to wait 5 or 6 seconds just to see the results of this query.

I understand the app may be at fault too, but if this query performed 
faster, I'm sure that would solve that problem because it's inherently slow 
and the app is very well layered.  It makes good use of frameworks like 
Spring, Hibernate and database pooling, which have been used on many 
applications and have been running very well for us.   The fact that the 
query is slow in PgAdmin III or phpPgAdmin suggests that the query can be 
tuned better.

I am no master tuner.  I have read as much as I could about database tuning 
in general, about the proper use of Hibernate and so on.  Frankly, I am not 
experienced enough to solve this problem and I wish to learn from the 
experts, like you Tom, John, Ragnar and others that have responded kindly to 
my request.

Most modern machines seem to have clocks that can count elapsed time
down to near the microsecond level.  Anyone know if it's possible to get
such numbers out of Windows, or are we stuck with milliseconds?
These results came from PgAdmin III directly.  I'm not sure how I can get 
different results even if I knew of a way. 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Help with tuning this query (more musings)

2005-03-02 Thread Ken Egervari
I took John's advice and tried to work with sub-selects.  I tried this 
variation, which actually seems like it would make a difference 
conceptually since it drives on the person table quickly.  But to my 
surprise, the query runs at about 375 milliseconds.  I think it's because 
it's going over that shipment table multiple times, which is where the 
results are coming from.
I also made a version that runs over shipment a single time, but it's 
exactly 250 milliseconds.  I guess the planner does the exact same thing.

select s.*, ss.*
from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
 select cc.id
 from person p
  inner join carrier_to_person ctp on p.id=ctp.person_id
  inner join carrier c on ctp.carrier_id=c.id
  inner join carrier_code cc on cc.carrier_id = c.id
 where p.id = 355
)
and s.current_status_id is not null
and s.is_purged=false
and(rc.number='9' )
and(ss.date>=current_date-31 )
order by ss.date desc 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Help with tuning this query (Some musings)

2005-03-02 Thread Ken Egervari
I took John's advice and tried to work with sub-selects.  I tried this
variation, which actually seems like it would make a difference conceptually
since it drives on the person table quickly.  But to my surprise, the query
runs at about 375 milliseconds.  I think it's because it's going over that
shipment table multiple times, which is where the results are coming from.
select s.*, ss.*
from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.id in (
 select s.id
 from person p
  inner join carrier_to_person ctp on p.id=ctp.person_id
  inner join carrier c on ctp.carrier_id=c.id
  inner join carrier_code cc on cc.carrier_id = c.id
  inner join shipment s on s.carrier_code_id = cc.id
 where p.id = 355
 and s.current_status_id is not null
 and s.is_purged=false
)
and(rc.number='9' )
and(ss.date>=current_date-31 )
order by ss.date desc
*** Musing 1
Also, "s.current_status_id is not null" is an important filter that I forgot
to mention.  In this example where p.id = 355, it filters out 90% of the
rows.  In general, that filter ratio is 0.46 though, which is not quite so
high.  However, this filter gets better over time because more and more
users will use a filter that will make this value null.  It's still not as
strong as person though and probably never will be.  But I thought I'd
mention it nonetheless.
*** Musing 2
I do think that the filter "ss.date>=current_date-31" is slowing this query
down.  I don't think it's the mention of "current_date" or even that it's
dynamic instead of static.  I think the range is just too big.  For example,
if I use:
and ss.date between '2005-02-01 00:00:00' and '2005-02-28 23:59:59'
The query still results in 250 milliseconds.  But if I make the range very
small - say Feb 22nd of 2005:
and ss.date between '2005-02-22 00:00:00' and '2005-02-22 23:59:59'
Now the entire query runs in 47 milliseconds on average.  If I can't make
this query perform any better, should I change the user interface to select
the date instead of showing the last 31 days to benefit from this single-day
filter?  This causes more clicks to select the day (like from a calendar),
but most users probably aren't interested in seeing the entire listing
anyway.  However, it's a very important requirement that users know that
shipment entries exist in the last 31 days (because they are usually
sure-fire problems if they are still in this query after a few days).
I guess I'm wondering if tuning the query is futile and I should get the
requirements changed, or is there something I can do to really speed it up?
Thanks again,
Ken
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Mark Kirkwood
Ken Egervari wrote:
I've tried to use Dan Tow's tuning method and created all the right indexes 
from his diagraming method, but the query still performs quite slow both inside 
the application and just inside pgadmin III.  Can anyone be kind enough to help 
me tune it so that it performs better in postgres?  I don't think it's using 
the right indexes, or maybe postgres needs special treatment.
I've converted the below query to SQL from a Hibernate query, so the syntax is 
probably not perfect but its semantics are exactly the same.  I've done so by 
looking at the source code, but I can't run it to get the exact SQL since I 
don't have the database on my home machine.
select s.*
from shipment s
inner join carrier_code cc on s.carrier_code_id = cc.id
inner join carrier c on cc.carrier_id = c.id
inner join carrier_to_person ctp on ctp.carrier_id = c.id
inner join person p on p.id = ctp.person_id
inner join shipment_status cs on s.current_status_id = cs.id
inner join release_code rc on cs.release_code_id = rc.id
left join shipment_status ss on ss.shipment_id = s.id
where
p.id = :personId and
s.is_purged = false and
rc.number = '9' and
cs is not null and
cs.date >= current_date - 31
order by cs.date desc
You might be able to coerce the planner to drive off person by
rearranging the join orders, plus a few other bits... hopefully I have
not brutalized the query to the point where it does not work :-)  :
select p.id, s.*, ss.*
from person p
inner join carrier_to_person ctp on p.id = ctp.person_id
inner join carrier c on ctp.carrier_id = c.id
inner join carrier_code cc on cc.carrier_id = c.id
inner join shipment s on s.carrier_code_id = cc.id
inner join shipment_status cs on s.current_status_id = cs.id
inner join release_code rc on cs.release_code_id = rc.id
left join shipment_status ss on ss.shipment_id = s.id
where
p.id = :personId and
s.is_purged = false and
rc.number = 9 and
cs is not null and
cs.date between current_date - 31 and current_date
order by cs.date desc
I have added the 'p.id' in the select list in the hope that that might
encourage the planner to take seriously the idea of getting the person
row(?) first. In addition I made 9 a number and closed the inequality
(just in case it helps a bit).


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Help with tuning this query (more musings)

2005-03-02 Thread John A Meinel
Ken Egervari wrote:
I took John's advice and tried to work with sub-selects.  I tried 
this variation, which actually seems like it would make a difference 
conceptually since it drives on the person table quickly.  But to my 
surprise, the query runs at about 375 milliseconds.  I think it's 
because it's going over that shipment table multiple times, which is 
where the results are coming from.

I also made a version that runs over shipment a single time, but it's 
exactly 250 milliseconds.  I guess the planner does the exact same thing.

Why are you now left joining driver and carrier code, but inner joining 
shipment_status? I assume this is the *real* query that you are executing.

From the earlier explain analyze, and your statements, the initial 
person p should be the heavily selective portion.

And what does "driver" get you? It isn't in the return, and it isn't 
part of a selectivity clause.
You are also double joining against carrier code, once as a left outer 
join, and once in the inner join.

This query doesn't seem quite right. Are you sure it is generating the 
rows you are expecting?

select s.*, ss.*
from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
 select cc.id
 from person p
  inner join carrier_to_person ctp on p.id=ctp.person_id
  inner join carrier c on ctp.carrier_id=c.id
  inner join carrier_code cc on cc.carrier_id = c.id
 where p.id = 355
)
and s.current_status_id is not null
and s.is_purged=false
and(rc.number='9' )
and(ss.date>=current_date-31 )
order by ss.date desc
You might want to post the explain analyze of this query to have a point 
of reference, but what about something like this:
select s.*, ss.*

from shipment_status ss on s.current_status_id=ss.id
join (select s.* from shipment s
  where s.carrier_code_id in
   (select cc.id
  from person p
 inner join carrier_to_person ctp on p.id=ctp.person_id
 inner join carrier c on ctp.carrier_id=c.id
 inner join carrier_code cc on cc.carrier_id = c.id
where p.id = 355
   )
and s.current_status_id is not null
and s.is_purged=false
) as i -- Just a name for the subselect since it is in a join
inner join release_code rc on ss.release_code_id=rc.id
where (rc.number='9' )
and(ss.date between current_date-31 and current_date())
order by ss.date desc
My idea with this query is to minimize the number of shipment rows that 
need to be generated before joining with the other rows. My syntax is 
probably a little bit off, since I can't actually run it against real 
tables.
But looking at your *original* query, you were getting 15000 rows out of 
shipment_status, and then 27700 rows out of shipment, which was then 
being merge-joined down to only 300 rows, and then hash-joined down to 39.

I'm just trying to think of ways to prevent it from blossoming into 27k 
rows to start with.

Please double check your query, because it seems to be grabbing 
unnecessary rows with the left joins, and then post another explain 
analyze with one (or several) different subselect forms.

John
=:->




Re: [PERFORM] Performance tradeoff

2005-03-02 Thread Josh Berkus
Shawn,

> I can also change the schema to a certain extent, so would it be worthwhile
> to put indices on the queried tables (or refactor them) hoping the distinct
> does an index scan instead of sort...  would the query planner take
> advantage of that?

Use the GROUP BY, with an index on the grouped columns and lots of work_mem 
(sort_mem in 7.4).   This will give the planner the option of a hashaggregate 
which could be significantly faster than the other methods.
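
A minimal sketch of that approach, with placeholder table/column names and
an arbitrary memory setting:

    CREATE INDEX src_a_b_idx ON src (a, b);
    SET work_mem = 65536;   -- in KB; the setting is called sort_mem on 7.4
    INSERT INTO table1
    SELECT a, b FROM src GROUP BY a, b;

EXPLAIN on the SELECT will show whether a HashAggregate or a GroupAggregate
was chosen.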

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Help with tuning this query

2005-03-02 Thread Josh Berkus
Ken,

> I've tried to use Dan Tow's tuning method and created all the right indexes
> from his diagraming method, but the query still performs quite slow both
> inside the application and just inside pgadmin III.  Can anyone be kind
> enough to help me tune it so that it performs better in postgres?  I don't
> think it's using the right indexes, or maybe postgres needs special
> treatment.

FWIW, I picked up Dan Tow's book to give it a read, and the guy isn't 
qualified to author "SQL Tuning".   You should chuck that book, it won't help 
you -- not with Oracle or SQL Server, and certainly not with PostgreSQL.   
O'Reilly continues to have trouble turning out quality database books.

Also, if you *were* using Dan's method, you'd be driving off Person, not 
Shipment.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Josh Berkus
Ken,

>         ->  Merge Join  (cost=602.54..1882.73 rows=870 width=91) (actual
> time=234.000..312.000 rows=310 loops=1)
>               Merge Cond: ("outer".current_status_id = "inner".id)

Hmmm ... this merge join appears to be the majority of your execution 
time, at least within the resolution that PGWin allows us.  Please try 
two things, and give us Explain Analyzes:

1) To determine your query order ala Dan Tow and drive off of person, please 
SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the 
query.  (Not that I believe in Dan Tow ... see previous message ... but it 
would be interesting to see the results.)

2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;

Also, please let us know some about the server you're using and your 
configuration parameters, particularly:
shared_buffers
work_mem
effective_cache_size
random_page_cost
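
If it's easier, all four can be read with a single query against the
standard pg_settings view:

    SELECT name, setting FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem',
                   'effective_cache_size', 'random_page_cost');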

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Help with tuning this query (more musings)

2005-03-02 Thread Ken Egervari
John,
Why are you now left joining driver and carrier code, but inner joining
shipment_status? I assume this is the *real* query that you are executing.
Well, the old and new versions are real queries.  I changed the query a bit 
because I noticed for some users, the listing was pulling out many different 
drivers.  Each separate query on the driver took about 10 milliseconds.  For 
a listing of 39 results, that's a possible 390 milliseconds assuming all the 
drivers are different and none of them are cached.  So, I just left joined 
the driver and it added about 5 milliseconds of overhead to this query.  I 
apologize for not communicating this change, but I had to make it to speed 
this stuff up during the day until I could fix the root of the problem.  One 
thing that I learned is that left joining and including lots of columns 
rarely slows the query.  The same was done for the carrier_code, although 
this only saved 15 milliseconds.

The end result is still high because the query we are talking about is very 
expensive, but at least the following queries that appeared after are 
eliminated altogether.  The overhead and separate queries really places a 
hamper on overall performance.  For the person 355, the overhead was about 
300 milliseconds since 10 of the drivers were null.  I hope this makes 
sense.

From the earlier explain analyze, and your statements, the initial
person p should be the heavily selective portion.
I totally agree.  I just never really figured out how to tell postgres my 
intentions.

You are also double joining against carrier code, once as a left outer
join, and once in the inner join.
Yes, that was my mistake since Hibernate didn't generate that - I manually 
put in those sub-selects.

This query doesn't seem quite right. Are you sure it is generating the
rows you are expecting?
Yes, the results are the same with the left joins.  I didn't include d.* and 
cc.* in the select, which again, is my mistake.  The main problem is when I 
make changes to the query, I don't think about it in terms of how SQL does 
it.  I think about how Hibernate does it.  Eager loading rows is different from 
selecting the main row at the top of the query.  I bet this comes as very 
strange, but in Hibernate they are two-different things.  I've been using 
Hibernate for so long that working with SQL is not so natural for me.  This 
is my mistake and I apologize.

You might want to post the explain analyze of this query to have a point
of reference, but what about something like this:
select s.*, ss.*
Okay.  Here is a syntax-corrected version of your very creative query.  I 
wouldn't have thought of doing something like this at all.  It makes perfect 
sense that you are commanding the database to do what it should be doing, 
which is something I really like since the concept of a planner picking 
stuff for me makes me unsettled (even if it is doing it right).

select i.*, ss.*
from shipment_status ss
inner join release_code rc on ss.release_code_id=rc.id,
(
 select s.*
 from shipment s
 where s.current_status_id is not null
  and s.is_purged=false
  and s.carrier_code_id in (
   select cc.id
   from person p
inner join carrier_to_person ctp on p.id=ctp.person_id
inner join carrier c on ctp.carrier_id=c.id
inner join carrier_code cc on cc.carrier_id = c.id
   where p.id = 355
  )
) as i
where (rc.number='9' )
and(i.current_status_id = ss.id)
and(ss.date between current_date-31 and current_date);
When running this on my production database, the speed is 265 milliseconds 
on average running it 20 times (lowest was 250, highest was 281).  Not quite 
what we want, but I'm sure the tuning of this new query hasn't really 
started.  Here is the EXPLAIN ANALYZE.  It seems very similiar to the one 
postgres picked out but it's a bit shorter.

Hash IN Join  (cost=676.15..1943.11 rows=14 width=91) (actual 
time=250.000..328.000 rows=39 loops=1)
 Hash Cond: ("outer".carrier_code_id = "inner".id)
 ->  Merge Join  (cost=661.65..1926.51 rows=392 width=91) (actual 
time=250.000..328.000 rows=310 loops=1)
   Merge Cond: ("outer".current_status_id = "inner".id)
   ->  Index Scan using shipment_current_status_id_idx on shipment s 
(cost=0.00..2702.56 rows=27257 width=66) (actual time=0.000..110.000 
rows=27711 loops=1)
 Filter: ((current_status_id IS NOT NULL) AND (is_purged = 
false))
   ->  Sort  (cost=661.65..666.46 rows=1922 width=25) (actual 
time=140.000..172.000 rows=6902 loops=1)
 Sort Key: ss.id
 ->  Hash Join  (cost=1.11..556.82 rows=1922 width=25) (actual 
time=0.000..94.000 rows=6902 loops=1)
   Hash Cond: ("outer".release_code_id = "inner".id)
   ->  Index Scan using current_status_date_idx on 
shipment_status ss  (cost=0.01..459.64 rows=15372 width=25) (actual 
time=0.000..94.000 rows=14925 loops=1)
 Index Cond: ((date >= (('now'::text)::date - 31)) 
AND (date <= ('now'::text)::date))
   

Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-02 Thread Ken Egervari
Josh,
1) To determine your query order ala Dan Tow and drive off of person, 
please
SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the
query.  (Not that I believe in Dan Tow ... see previous message ... but it
would be interesting to see the results.
Unfortunately, the query still takes 250 milliseconds.  I tried it with 
other queries and the results are the same as before.  Here is the explain 
analyze anyway:

Sort  (cost=2036.83..2036.87 rows=16 width=103) (actual 
time=328.000..328.000 rows=39 loops=1)
 Sort Key: cs.date
 ->  Nested Loop Left Join  (cost=620.61..2036.51 rows=16 width=103) 
(actual time=250.000..328.000 rows=39 loops=1)
   ->  Hash Join  (cost=620.61..1984.90 rows=16 width=78) (actual 
time=250.000..328.000 rows=39 loops=1)
 Hash Cond: ("outer".carrier_code_id = "inner".id)
 ->  Merge Join  (cost=606.11..1965.99 rows=825 width=74) 
(actual time=250.000..328.000 rows=310 loops=1)
   Merge Cond: ("outer".current_status_id = "inner".id)
   ->  Index Scan using shipment_current_status_id_idx on 
shipment s  (cost=0.00..2701.26 rows=60307 width=66) (actual 
time=0.000..77.000 rows=27711 loops=1)
 Filter: (is_purged = false)
   ->  Sort  (cost=606.11..610.50 rows=1756 width=12) 
(actual time=141.000..141.000 rows=6902 loops=1)
 Sort Key: cs.id
 ->  Hash Join  (cost=1.11..511.48 rows=1756 
width=12) (actual time=0.000..109.000 rows=6902 loops=1)
   Hash Cond: ("outer".release_code_id = 
"inner".id)
   ->  Index Scan Backward using 
current_status_date_idx on shipment_status cs  (cost=0.01..422.58 rows=14047 
width=16) (actual time=0.000..78.000 rows=14925 loops=1)
 Index Cond: ((date >= 
(('now'::text)::date - 31)) AND (date <= ('now'::text)::date))
 Filter: (cs.* IS NOT NULL)
   ->  Hash  (cost=1.10..1.10 rows=1 width=4) 
(actual time=0.000..0.000 rows=0 loops=1)
 ->  Seq Scan on release_code rc 
(cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
   Filter: ((number)::text = 
'9'::text)
 ->  Hash  (cost=14.49..14.49 rows=2 width=8) (actual 
time=0.000..0.000 rows=0 loops=1)
   ->  Nested Loop  (cost=6.87..14.49 rows=2 width=8) 
(actual time=0.000..0.000 rows=2 loops=1)
 ->  Index Scan using person_pkey on person p 
(cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
   Index Cond: (id = 355)
 ->  Hash Join  (cost=6.87..8.74 rows=2 width=8) 
(actual time=0.000..0.000 rows=2 loops=1)
   Hash Cond: ("outer".carrier_id = 
"inner".carrier_id)
   ->  Seq Scan on carrier_code cc 
(cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1)
   ->  Hash  (cost=6.86..6.86 rows=1 width=12) 
(actual time=0.000..0.000 rows=0 loops=1)
 ->  Hash Join  (cost=3.04..6.86 rows=1 
width=12) (actual time=0.000..0.000 rows=1 loops=1)
   Hash Cond: ("outer".id = 
"inner".carrier_id)
   ->  Seq Scan on carrier c 
(cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
   ->  Hash  (cost=3.04..3.04 
rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1)
 ->  Index Scan using 
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04 
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
   Index Cond: (355 = 
person_id)
   ->  Index Scan using shipment_status_shipment_id_idx on 
shipment_status ss  (cost=0.00..3.20 rows=2 width=25) (actual 
time=0.000..0.000 rows=1 loops=39)
 Index Cond: (ss.shipment_id = "outer".id)
Total runtime: 328.000 ms

2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;
Setting this option had no effect either.  In fact, the query is a bit slower 
(266 milliseconds but 250 came up once in 20 executions).

Also, please let us know some about the server you're using and your
configuration parameters, particularly:
shared_buffers
work_mem
effective_cache_size
random_page_cost
Well, I'm on a test machine so the settings haven't changed one bit from the 
defaults.  This may sound embarrassing, but I bet the production server is 
not custom configured either.  The computer I'm running these queries on is 
just a simple Athlon XP 2100+ on WinXP with 1GB of RAM.  The production 
server is a faster P4, but the rest is the same.  Here 

Re: [PERFORM] Help with tuning this query (more musings)

2005-03-02 Thread Richard Huxton
Ken Egervari wrote:
Hash IN Join  (cost=676.15..1943.11 rows=14 width=91) (actual 
time=250.000..328.000 rows=39 loops=1)
 Hash Cond: ("outer".carrier_code_id = "inner".id)
 ->  Merge Join  (cost=661.65..1926.51 rows=392 width=91) (actual 
time=250.000..328.000 rows=310 loops=1)
   Merge Cond: ("outer".current_status_id = "inner".id)
   ->  Index Scan using shipment_current_status_id_idx on shipment s 
(cost=0.00..2702.56 rows=27257 width=66) (actual time=0.000..110.000 
rows=27711 loops=1)
 Filter: ((current_status_id IS NOT NULL) AND (is_purged = 
false))
There's a feature in PG called partial indexes - see CREATE INDEX 
reference for details. Basically you can do something like:

CREATE INDEX foo_idx ON shipment (carrier_code_id)
WHERE current_status_id IS NOT NULL
AND is_purged = FALSE;
Something similar may be a win here, although the above index might not 
be quite right - sorry, a bit tired at the moment.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster