[PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi all,

A question on how to read and interpret the explain analyse statement (and what 
to do)

I have a query "SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER 
JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;"

Both tables have a btree index on klantnummer (int4, the column the join is 
on). I have vacuumed and analyzed both tables. The explain analyse output is:

QUERY PLAN
Sort  (cost=220539.32..223291.41 rows=1100836 width=12) (actual time=51834.128..56065.126 rows=1104380 loops=1)
  Sort Key: a.klantnummer
  ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)
        Hash Cond: ("outer".klantnummer = "inner".klantnummer)
        ->  Seq Scan on orders a  (cost=0.00..46495.36 rows=1100836 width=8) (actual time=5.986..7378.488 rows=1104380 loops=1)
        ->  Hash  (cost=40635.14..40635.14 rows=368914 width=4) (actual time=21256.683..21256.683 rows=0 loops=1)
              ->  Seq Scan on klt_alg b  (cost=0.00..40635.14 rows=368914 width=4) (actual time=8.880..18910.120 rows=368914 loops=1)
Total runtime: 61478.077 ms


Questions:
  ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)

0. What exactly are the numbers in "cost=41557.43..110069.51"? (For the other 
questions I assume that 41557.43 is the estimated number of ms the query will 
take; what are the others?)

1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the 
estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the 
actual cost. Is the difference acceptable?

2. If not, what can I do about it?

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 



Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Richard Huxton
Joost Kraaijeveld wrote:
Hi all,
A question on how to read and interpret the explain analyse statement
(and what to do)
I have a query "SELECT A.ordernummer, B.klantnummer FROM orders A
LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY
A.klantnummer;"
Both tables have a btree index on klantnummer (int4, the column the
join is on). I have vacuumed and analyzed both tables. The explain
analyse is:
Indexes are not necessarily useful here, since you're fetching all rows in A 
and presumably much of B.

Sort
  Hash Left Join
    Seq Scan on orders a
    Hash
      Seq Scan on klt_alg b
I've trimmed the above from your explain output. It's sequentially 
scanning "b" and using a hash to join to "a" before sorting the results.

Questions:
->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)

0. What exactly are the numbers in "cost=41557.43..110069.51"? (For the other
questions I assume that 41557.43 is the estimated number of ms the query will
take; what are the others?)
The cost numbers represent "effort" rather than time. They're only 
really useful in that you can compare one part of the query to another. 
There are two numbers because the first is the estimated cost before the 
first row can be returned (startup), the second the estimated cost to 
return all the rows. So - the "outer" parts of the query will have 
increasing startup values, since the "inner" parts will have to do their 
work first.

The "actual time" is measured in ms, but remember to multiply it by the 
"loops" value. Oh, and actually measuring the time slows the query down too.

1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is
the estimated cost and (actual time=21263.858..42845.158 rows=1104380
loops=1) the actual cost. Is the difference acceptable?
2. If not, what can I do about it?
The key thing to look for here is the number of rows. If PG expects say 
100 rows but there are instead 10,000 then it may choose the wrong plan. 
In this case the estimate is 1,100,836 and the actual is 1,104,380 - 
very close.

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] Benchmark

2005-02-11 Thread Bruno Wolff III
On Fri, Feb 11, 2005 at 02:22:39 -0500,
  Jaime Casanova <[EMAIL PROTECTED]> wrote:
> What about the free speech rights, in USA they are in the constitution
> and cannot be denied or revoked, IANAL.

You can voluntarily give up your rights to free speech in the US.

> And like stated by Mitch just numbers are not lies that can be pursued
> in a court of law.

I think part of the reason they don't want people doing this is that if
you don't configure their database well, you can make it look bad
when it shouldn't.

> Think about it, In USA you can speak and publish about the President
> but cannot say anything about M$ or Oracles' DBMS?

Not if you signed a contract that says you can't.

If you didn't actually sign an agreement saying you wouldn't publish
benchmarks, then you might have a case. You might argue that a click
through eula isn't a valid contract or that you are a third party
who isn't bound by whatever agreement the person who installed Oracle
made. However it probably would cost you a bundle to have a chance
at winning.



Re: [PERFORM] Benchmark

2005-02-11 Thread Richard Huxton
Bruno Wolff III wrote:
On Fri, Feb 11, 2005 at 02:22:39 -0500,
  Jaime Casanova <[EMAIL PROTECTED]> wrote:
Think about it, In USA you can speak and publish about the President
but cannot say anything about M$ or Oracles' DBMS?

Not if you signed a contract that says you can't.
If you didn't actually sign an agreement saying you wouldn't publish
benchmarks, then you might have a case. You might argue that a click
through eula isn't a valid contract or that you are a third party
who isn't bound by whatever agreement the person who installed Oracle
made. However it probably would cost you a bundle to have a chance
at winning.
IANAL etc, but the key fear is more likely that Oracle merely cancel 
your licence(s). And deny you any more. And prevent your software from 
running on top of Oracle. At which point, you have to sue Oracle and 
prove restraint of trade or unfair competition or similar. Don't forget 
that you have no right to purchase Oracle licences; they are free to 
sell to whomever they choose and under whatever conditions.

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] Benchmark

2005-02-11 Thread Jeff
On Feb 11, 2005, at 2:04 AM, Mitch Pirtle wrote:
I did do the research, but couldn't find one instance where someone
was actually taken to task over it. So far it appears to be bluster.
Horrifying to some, but still bluster.
They may not have done that yet, but they _COULD_.  And if they decide 
to, they have more money and power than you likely have and would drive 
you into financial ruin for the rest of your life (even if you are 
correct).  It is a big risk.  I think that clause is in there so MS, 
etc. can't say "Use FooSQL, it's 428% faster than that Oracle POS! Just 
look!"

After using oracle in the last few months..  I can see why they'd want 
to prevent those numbers.. Oracle really isn't that good.  I had been 
under the impression that it was holy smokes amazingly fast.  It just 
isn't.  At least, in my experience it isn't.  but that is another 
story.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: [PERFORM] Benchmark

2005-02-11 Thread Greg Stark
Jeff <[EMAIL PROTECTED]> writes:

> After using oracle in the last few months..  I can see why they'd want to
> prevent those numbers.. Oracle really isn't that good.  I had been under the
> impression that it was holy smokes amazingly fast.  It just isn't.  At least,
> in my experience it isn't.  but that is another story.

Oracle's claim to performance comes not from tight coding and low overhead.
For that you use Mysql :)

Oracle's claim to performance comes from how you can throw it at a machine
with 4-16 processors and it really does get 4-16x as fast. Features like
partitioned tables, parallel query, materialized views, etc make it possible
to drive it further up the performance curve than Sybase/MSSQL or Postgres.

In terms of performance, Oracle is to Postgres as Postgres is to Mysql: More
complexity, more overhead, more layers of abstraction, but in the long run it
pays off when you need it. (Only without the user-friendliness of either
open-source softwares.)

-- 
greg




Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Tom Lane
Richard Huxton  writes:
> Joost Kraaijeveld wrote:
>> 2. If not, what can I do about it?

> The key thing to look for here is the number of rows. If PG expects say 
> 100 rows but there are instead 10,000 then it may choose the wrong plan. 
> In this case the estimate is 1,100,836 and the actual is 1,104,380 - 
> very close.

On the surface this looks like a reasonable plan choice.  If you like
you can try the other two basic types of join plan by turning off
enable_hashjoin, which will likely drive the planner to use a merge
join, and then also turn off enable_mergejoin to get a nested loop
(or if it thinks nested loop is second best, turn off enable_nestloop
to see the behavior with a merge join).
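
A sketch of such a session, reusing the query from the original post 
(illustrative only; remember to put the settings back afterwards):

-- force the planner away from a hash join; it will probably pick a merge join
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT A.ordernummer, B.klantnummer
FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer = B.Klantnummer
ORDER BY A.klantnummer;

-- now also rule out the merge join to see the nested-loop plan
SET enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT A.ordernummer, B.klantnummer
FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer = B.Klantnummer
ORDER BY A.klantnummer;

-- put things back to normal for the rest of the session
RESET enable_hashjoin;
RESET enable_mergejoin;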

What's important in comparing different plan alternatives is the ratios
of estimated costs to actual elapsed times.  If the planner is doing its
job well, those ratios should be similar across all the alternatives
(which implies of course that the cheapest-estimate plan is also the
cheapest in reality).  If not, it may be appropriate to fool with the
planner's cost estimate parameters to try to line up estimates and
reality a bit better.
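
By way of illustration, these are the kind of parameters meant here (a sketch 
only; the values below are invented, and the right ones depend entirely on 
your hardware and data):

-- planner cost-estimate parameters (PostgreSQL 8.0 names); values are
-- illustrative, not recommendations
SET random_page_cost = 3;          -- default 4; lower if most data is cached
SET effective_cache_size = 50000;  -- in 8 kB pages; roughly the OS cache size
SET cpu_tuple_cost = 0.01;         -- cost charged per row processed
-- then re-run EXPLAIN ANALYZE and compare estimated cost vs. actual time again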

See
http://www.postgresql.org/docs/8.0/static/performance-tips.html
for more detail.

regards, tom lane



Re: [PERFORM] Benchmark

2005-02-11 Thread Mike Benoit
I have never used Oracle myself, nor have I read its license agreement,
but what if you didn't name Oracle directly? ie:

TPS Database
---
112 MySQL
120 PgSQL
90  Sybase
95  "Other database that *may* start with a letter after N"
50  "Other database that *may* start with a letter after L"

As far as I know there are only a couple of databases that don't allow you
to post benchmarks, but if they remain "unnamed", can legal action be
taken?

Just like all those commercials on TV where they advertise: "Cleans 10x
better than the other leading brand".


On Fri, 2005-02-11 at 00:22 -0500, Mitch Pirtle wrote:
> On Thu, 10 Feb 2005 08:21:09 -0500, Jeff <[EMAIL PROTECTED]> wrote:
> > 
> > If you plan on making your results public be very careful with the
> > license agreements on the other db's.  I know Oracle forbids the
> > release of benchmark numbers without their approval.
> 
> ...as all of the other commercial databases do. This may be off-topic,
> but has anyone actually suffered any consequences of a published
> benchmark without permission?
> 
> For example, I am a developer of Mambo, a PHP-based CMS application,
> and am porting the mysql functions to ADOdb so I can use grown-up
> databases ;-)
> 
> What is keeping me from running a copy of Mambo on a donated server
> for testing and performance measures (including the commercial
> databases) and then publishing the results based on Mambo's
> performance on each?
> 
> It would be really useful to know if anyone has ever been punished for
> doing this, as IANAL but that restriction is going to be very, VERY
> difficult to back up in court without precedent. Is this just a
> deterrent, or is it real?
> 
> -- Mitch
> 
-- 
Mike Benoit <[EMAIL PROTECTED]>




Re: [PERFORM] Benchmark (slightly off topic but oh well)

2005-02-11 Thread PFC

For example, I am a developer of Mambo, a PHP-based CMS application,
and am porting the mysql functions to ADOdb so I can use grown-up
databases ;-)
	Just yesterday I "optimized" a query for a website running MySQL. It's  
the 'new products' type query :

SELECT product_id, pd.product_name, p.price, COALESCE( s.specials_price,  
p.price ) as real_price
FROM products p, products_descriptions pd LEFT join specials s ON  
(p.product_id = s.product_id)
WHERE p.product_id = pd.product_id
AND pd.language_id=(constant)
AND p.product_visible=TRUE
AND s.is_active = TRUE
ORDER BY p.date_added DESC LIMIT 6

	With ~100 products everything went smoothly, about 0.5 ms. I decided to
test with 20,000 because we have a client with a large catalog coming.
Wow. It took half a second to yield six products. Note that there are
appropriate indexes all over the place (for getting the new products, I
have an index on product_visible, date_added).

	I tested with Postgres: with 100 products it takes 0.4 ms, with 20,000
it takes 0.6 ms...

	Postgres needs a bit of query massaging (putting an extra ORDER BY  
product_visible to use the index). With MySQL no amount of query rewriting  
would do.
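
	Roughly, the massaged version looks like this (a sketch: the joins are
written out explicitly so Postgres accepts them, the (constant) placeholder is
kept from above, and the extra leading ORDER BY column is only there so the
(product_visible, date_added) index can be used):

SELECT p.product_id, pd.product_name, p.price,
       COALESCE(s.specials_price, p.price) AS real_price
FROM products p
JOIN products_descriptions pd ON p.product_id = pd.product_id
LEFT JOIN specials s ON p.product_id = s.product_id
WHERE pd.language_id = (constant)
  AND p.product_visible = TRUE
  AND s.is_active = TRUE
ORDER BY p.product_visible DESC, p.date_added DESC
LIMIT 6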
	I noted sometimes MySQL would never use a multicolumn index for an ORDER  
BY LIMIT unless one specifies a dummy condition on the missing parameter.

	So I had to split the query in two: fetch the six product_ids, store
them in a PHP variable, implode(',', $ids), and SELECT ... WHERE product_id
IN (x,y,z).

UGLY! And a lot slower.
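
	In SQL terms the two-step version is roughly this (a sketch; the IN list
stands in for whatever implode() produced, and the numbers are placeholders):

-- step 1: get just the six ids
SELECT product_id
FROM products
WHERE product_visible = TRUE
ORDER BY date_added DESC
LIMIT 6;

-- step 2: fetch the details for those ids
SELECT p.product_id, pd.product_name, p.price,
       COALESCE(s.specials_price, p.price) AS real_price
FROM products p
JOIN products_descriptions pd ON p.product_id = pd.product_id
LEFT JOIN specials s ON p.product_id = s.product_id
WHERE pd.language_id = (constant)
  AND p.product_id IN (18231, 18230, 18229, 18228, 18227, 18226);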
Note this is with MySQL 4.0.23 or something. Maybe 4.1 would be faster.
	Here's the URL to the site. There is a query log if you wanna look just  
for laughs. Note that all the products boxes are active which makes a very  
long page time... There are 42000 fictive products and about 60 real  
products. Don't use the search form unless you have a good book to read !  
You can click on "Nouveautés" to see the old "new products" query in  
action, but please, only one people at a time.

	http://pinceau-d-or.com/gros/product_info.php?products_id=164
	Ah, you can buy stuff with the test version if you like, just don't use  
the credit card because ... it works ;)

This is the un-messed-up version (production) :
http://pinceau-d-or.com/product_info.php?products_id=164
	If some day I can recode this mess to use Postgres... this would be nice,  
so nice... the other day my database went apeshit and in the absence of  
foreign keys... and the absence of PHP checking anything...  !
	

test=# CREATE TABLE suicide (id INT NOT NULL, moment TIMESTAMP NOT NULL);
CREATE TABLE
test=# INSERT INTO suicide (id,moment) VALUES (0,now());
INSERT 6145577 1
test=# INSERT INTO suicide (id,moment) VALUES (0,0);
ERROR:  column "moment" is of type timestamp without time zone
but expression is of type integer
HINT:  You will need to rewrite or cast the expression.
test=# INSERT INTO suicide (id,moment) VALUES (NULL,1);
ERROR:  column "moment" is of type timestamp without time zone
but expression is of type integer
HINT:  You will need to rewrite or cast the expression.
test=# INSERT INTO suicide (id,moment) VALUES (NULL,now());
ERROR:  null value in column "id" violates not-null constraint
test=# SELECT * FROM suicide;
 id |           moment
----+----------------------------
  0 | 2005-02-11 19:16:21.262359

mysql> CREATE TABLE suicide (id INT NOT NULL, moment DATETIME NOT NULL);
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO suicide (id,moment) VALUES (0,now());
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO suicide (id,moment) VALUES (0,0);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO suicide (id,moment) VALUES (NULL,1);
ERROR 1048: Column 'id' cannot be null
mysql> INSERT INTO suicide (moment) VALUES (now());
Query OK, 1 row affected (0.00 sec)
hey, did I specify a default value ?
mysql> SELECT * FROM suicide;
+----+---------------------+
| id | moment              |
+----+---------------------+
|  0 | 2005-02-11 19:17:49 |
|  0 | 0000-00-00 00:00:00 |
|  0 | 2005-02-11 19:18:45 |
+----+---------------------+
3 rows in set (0.00 sec)


Re: [PERFORM] Benchmark

2005-02-11 Thread PFC

In terms of performance, Oracle is to Postgres as Postgres is to Mysql:
More complexity, more overhead, more layers of abstraction, but in the
long run it pays off when you need it. (Only without the user-friendliness
of either open-source softwares.)

	I don't find postgres complex... I find it nice, well-behaved, very easy  
to use, very powerful, user-friendly... there are a lot of features but  
somehow it's well integrated and makes a coherent set. It also has some  
very useful secret passages (like the whole GiST family) which can save  
you from some things at which SQL really sucks. It certainly is complex on  
the inside but I think the devs have done a very good job at hiding that.

	It's also polite : it will say 'I have a function with the name you said  
but the parameter types don't match' ; mysql will just say 'syntax error,  
RTFM', or insert its favorite value of 0.



Re: [PERFORM] Benchmark

2005-02-11 Thread Tom Lane
Mike Benoit <[EMAIL PROTECTED]> writes:
> I have never used Oracle myself, nor have I read its license agreement,
> but what if you didn't name Oracle directly? ie:

> TPS   Database
> ---
> 112   MySQL
> 120   PgSQL
> 90    Sybase
> 95    "Other database that *may* start with a letter after N"
> 50    "Other database that *may* start with a letter after L"

Great Bridge did essentially that years ago, but I think we only got
away with it because we didn't say which DBs "Commercial Database A"
and "Commercial Database B" actually were.  Even off the record, we
were only allowed to tell people that the commercial DBs were Oracle
and SQL Server ... but not which was which.

regards, tom lane



Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi Tom,

Tom Lane schreef:
> On the surface this looks like a reasonable plan choice.  If you like
> you can try the other two basic types of join plan by turning off
> enable_hashjoin, which will likely drive the planner to use a merge
> join, and then also turn off enable_mergejoin to get a nested loop
> (or if it thinks nested loop is second best, turn off enable_nestloop
> to see the behavior with a merge join).

The problem is that the query logically requests all records (as in "select * 
from a join") from the database, but in practice the application displays only 
the first 1000 records 97% of the time, and at most the first 50,000 records 
99.99% of the time, by scrolling (using "page down") in the GUI and through an 
occasional "jump to record" via something called a locator (both percentages 
tested!).

If I do the same query with a "limit 60000", or if I do a "set enable_seqscan = 
off", the query returns in 0.3 secs. Otherwise it lasts for 20 secs (which is 
too much for the user to wait for, given the circumstances).

I cannot change the query (it is generated by a tool called Clarion), but it is 
something like this (from the psqlodbc_xxx.log):
"...
declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
..."

PostgreSQL does the planning (and then executes accordingly) for the query 
itself and not for the "fetch 100". Changing the query with a "limit whatever" 
prevents scrolling beyond that size of the result set. If Postgres delayed the 
planning of the actual query until the fetch, it could choose the quick 
solution. Another solution would be to "advise" PostgreSQL which index etc. 
(whatever "etc." means ;-)) to use (as in the mail from Silke Trissl on the 
performance list on 09-02-05).

> What's important in comparing different plan alternatives is the ratios
> of estimated costs to actual elapsed times.  If the planner is doing its
> job well, those ratios should be similar across all the alternatives
> (which implies of course that the cheapest-estimate plan is also the
> cheapest in reality).  If not, it may be appropriate to fool with the
> planner's cost estimate parameters to try to line up estimates and
> reality a bit better. 
If I really do a "select *" and display the full result, the planner is right 
(tested with "set enable_seqscan = off" and "set enable_seqscan = on").

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl



Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Tom Lane
"Joost Kraaijeveld" <[EMAIL PROTECTED]> writes:
> I cannot change the query (it is generated by a tool called Clarion), but it is 
> something like this (from the psqlodbc_xxx.log):
> "...
> declare SQL_CUR01 cursor for 
> SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" 
> B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
> fetch 100 in SQL_CUR01;
> ..."

Well, the planner does put some emphasis on startup time when dealing
with a DECLARE CURSOR plan; the problem you face is just that that
correction isn't large enough.  (From memory, I think it optimizes on
the assumption that 10% of the estimated rows will actually be fetched;
you evidently want a setting of 1% or even less.)

We once talked about setting up a GUC variable to control the percentage
of a cursor that is estimated to be fetched:
http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
It never got done but that seems like the most reasonable solution to
me.
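
In the meantime, a possible stopgap is to disable seqscans just around the 
cursor (a sketch only; whether the ODBC/Clarion layer lets you inject the 
extra statements is another question):

BEGIN;
SET LOCAL enable_seqscan = off;    -- only affects this transaction
DECLARE SQL_CUR01 CURSOR FOR
    SELECT A.ordernummer, B.klantnummer
    FROM "orders" A LEFT OUTER JOIN "klt_alg" B
         ON A.Klantnummer = B.Klantnummer
    ORDER BY A.klantnummer;
FETCH 100 IN SQL_CUR01;
-- ... further FETCHes as the user scrolls ...
CLOSE SQL_CUR01;
COMMIT;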

regards, tom lane



Re: [PERFORM] Benchmark

2005-02-11 Thread Merlin Moncure
> I have never used Oracle myself, nor have I read its license agreement,
> but what if you didn't name Oracle directly? ie:
> 
> TPS   Database
> ---
> 112   MySQL
> 120   PgSQL
> 90    Sybase
> 95    "Other database that *may* start with a letter after N"
> 50    "Other database that *may* start with a letter after L"
> 
> As far as I know there are only a couple of databases that don't allow you
> to post benchmarks, but if they remain "unnamed", can legal action be
> taken?
> 
> Just like all those commercials on TV where they advertise: "Cleans 10x
> better than the other leading brand".

Instead of measuring transactions/second, let's put everything in terms
of transactions/dollar.  This will make it quite easy to determine which
database is which from the results.  Since PostgreSQL is free, which would
invalidate our test on mathematical grounds (division by zero), we will sub
in the $19.99 price of a T-shirt (http://www.sourcewear.com/) for the price
of the database.

TP$     Database
----------------
25      A
.5      B
.01     C
.001    D
.1      E

Merlin



Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi Tom,

Tom Lane schreef:
> Well, the planner does put some emphasis on startup time when dealing
> with a DECLARE CURSOR plan; the problem you face is just that that
> correction isn't large enough.  (From memory, I think it optimizes on
> the assumption that 10% of the estimated rows will actually
> be fetched; you evidently want a setting of 1% or even less.)
I wish I had your memory ;-). The tables contain 1,100,000 records, by the way 
(what actually gets fetched is nowhere near 10% of that; my math is not that good).


> We once talked about setting up a GUC variable to control the
> percentage of a cursor that is estimated to be fetched:
> http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
> It never got done but that seems like the most reasonable solution to
> me. 
If the proposal means that the cursor is not limited by a limit in the query 
but by the fetch, then I support the proposal. A bit late, I presume.

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl
