[PERFORM] How to interpret this explain analyse?
Hi all, A question on how to read and interpret the explain analyse statement (and what to do) I have a query "SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;" Both tables have an btree index on klantnummer (int4, the column the join is on). I have vacuumed and analyzed both tables. The explain analyse is: QUERY PLAN Sort (cost=220539.32..223291.41 rows=1100836 width=12) (actual time=51834.128..56065.126 rows=1104380 loops=1) Sort Key: a.klantnummer -> Hash Left Join (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1) Hash Cond: (""outer"".klantnummer = ""inner"".klantnummer) -> Seq Scan on orders a (cost=0.00..46495.36 rows=1100836 width=8) (actual time=5.986..7378.488 rows=1104380 loops=1) -> Hash (cost=40635.14..40635.14 rows=368914 width=4) (actual time=21256.683..21256.683 rows=0 loops=1) -> Seq Scan on klt_alg b (cost=0.00..40635.14 rows=368914 width=4) (actual time=8.880..18910.120 rows=368914 loops=1) Total runtime: 61478.077 ms Questions: -> Hash Left Join (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1) 0. What exactly are the numbers in "cost=41557.43..110069.51" ( I assume for the other questions that 41557.43 is the estimated MS the query will take, what are the others)? 1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the actual cost. Is the difference acceptable? 2. If not, what can I do about it? Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] How to interpret this explain analyse?
Joost Kraaijeveld wrote: Hi all, A question on how to read and interpret the explain analyse statement (and what to do) I have a query "SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;" Both tables have an btree index on klantnummer (int4, the column the join is on). I have vacuumed and analyzed both tables. The explain analyse is: Indexes not necessarily useful here since you're fetching all rows in A and presumably much of B Sort Hash Left Join Seq Scan on orders a Hash Seq Scan on klt_alg b I've trimmed the above from your explain output. It's sequentially scanning "b" and using a hash to join to "a" before sorting the results. Questions: -> Hash Left Join (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1) 0. What exactly are the numbers in "cost=41557.43..110069.51" ( I assume for the other questions that 41557.43 is the estimated MS the query will take, what are the others)? The cost numbers represent "effort" rather than time. They're only really useful in that you can compare one part of the query to another. There are two numbers because the first shows startup, the second final time. So - the "outer" parts of the query will have increasing startup values since the "inner" parts will have to do their work first. The "actual time" is measured in ms, but remember to multiply it by the "loops" value. Oh, and actually measuring the time slows the query down too. 1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the actual cost. Is the difference acceptable? 2. If not, what can I do about it? The key thing to look for here is the number of rows. If PG expects say 100 rows but there are instead 10,000 then it may choose the wrong plan. In this case the estimate is 1,100,836 and the actual is 1,104,380 - very close. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Benchmark
On Fri, Feb 11, 2005 at 02:22:39 -0500, Jaime Casanova <[EMAIL PROTECTED]> wrote: > What about the free speech rigths, in USA they are in the constitution > and cannot be denied or revoked, IANAL. You can voluntarily give up your rights to free speech in the US. > And like stated by Mitch just numbers are not lies that can be pursued > in a court of law. I think part of the reason they don't want people doing this, is because if you don't configure their database well, you can make it look bad when it shouldn't. > Think anout it, In USA you can speak and publish about the President > but cannot say anything about M$ or Oracles' DBMS? Not if you signed a contract that says you can't. If you didn't actually sign an agreement saying you wouldn't publish benchmarks, then you might have a case. You might argue that a click through eula isn't a valid contract or that you are a third party who isn't bound by whatever agreement the person who installed Oracle made. However it probably would cost you a bundle to have a chance at winning. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark
Bruno Wolff III wrote: On Fri, Feb 11, 2005 at 02:22:39 -0500, Jaime Casanova <[EMAIL PROTECTED]> wrote: Think anout it, In USA you can speak and publish about the President but cannot say anything about M$ or Oracles' DBMS? Not if you signed a contract that says you can't. If you didn't actually sign an agreement saying you wouldn't publish benchmarks, then you might have a case. You might argue that a click through eula isn't a valid contract or that you are a third party who isn't bound by whatever agreement the person who installed Oracle made. However it probably would cost you a bundle to have a chance at winning. IANAL etc, but the key fear is more likely that Oracle merely cancel your licence(s). And deny you any more. And prevent your software from running on top of Oracle. At which point, you have to sue Oracle and prove restraint of trade or unfair competition or similar. Don't forget that you have no right to purchase Oracle licences, they are free to sell to whoever they choose and under whatever conditions. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Benchmark
On Feb 11, 2005, at 2:04 AM, Mitch Pirtle wrote: I did do the research, but couldn't find one instance where someone was actually taken to task over it. So far it appears to be bluster. Horrifying to some, but still bluster. They may not have done that yet, but they _COULD_. And if they decide to they have more money and power than you likely have and would drive you into financial ruin for the rest of your life (Even if you are correct). It is a big risk. I think that clause is in there so MS, etc. can't say "Use FooSQL, its 428% faster than that Oracle POS Just look!" After using oracle in the last few months.. I can see why they'd want to prevent those numbers.. Oracle really isn't that good. I had been under the impression that it was holy smokes amazingly fast. It just isn't. At least, in my experience it isn't. but that is another story. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark
Jeff <[EMAIL PROTECTED]> writes: > After using oracle in the last few months.. I can see why they'd want to > prevent those numbers.. Oracle really isn't that good. I had been under the > impression that it was holy smokes amazingly fast. It just isn't. At least, > in my experience it isn't. but that is another story. Oracle's claim to performance comes not from tight coding and low overhead. For that you use Mysql :) Oracle's claim to performance comes from how you can throw it at a machine with 4-16 processors and it really does get 4-16x as fast. Features like partitioned tables, parallel query, materialized views, etc make it possible to drive it further up the performance curve than Sybase/MSSQL or Postgres. In terms of performance, Oracle is to Postgres as Postgres is to Mysql: More complexity, more overhead, more layers of abstraction, but in the long run it pays off when you need it. (Only without the user-friendliness of either open-source softwares.) -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How to interpret this explain analyse?
Richard Huxton writes: > Joost Kraaijeveld wrote: >> 2. If not, what can I do about it? > The key thing to look for here is the number of rows. If PG expects say > 100 rows but there are instead 10,000 then it may choose the wrong plan. > In this case the estimate is 1,100,836 and the actual is 1,104,380 - > very close. On the surface this looks like a reasonable plan choice. If you like you can try the other two basic types of join plan by turning off enable_hashjoin, which will likely drive the planner to use a merge join, and then also turn off enable_mergejoin to get a nested loop (or if it thinks nested loop is second best, turn off enable_nestloop to see the behavior with a merge join). What's important in comparing different plan alternatives is the ratios of estimated costs to actual elapsed times. If the planner is doing its job well, those ratios should be similar across all the alternatives (which implies of course that the cheapest-estimate plan is also the cheapest in reality). If not, it may be appropriate to fool with the planner's cost estimate parameters to try to line up estimates and reality a bit better. See http://www.postgresql.org/docs/8.0/static/performance-tips.html for more detail. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark
I have never used Oracle myself, nor have I read its license agreement, but what if you didn't name Oracle directly? ie: TPS Database --- 112 MySQL 120 PgSQL 90 Sybase 95 "Other database that *may* start with a letter after N" 50 "Other database that *may* start with a letter after L" As far as I know there are only a couple databases that don't allow you to post benchmarks, but if they remain "unnamed" can legal action be taken? Just like all those commercials on TV where they advertise: "Cleans 10x better then the other leading brand". On Fri, 2005-02-11 at 00:22 -0500, Mitch Pirtle wrote: > On Thu, 10 Feb 2005 08:21:09 -0500, Jeff <[EMAIL PROTECTED]> wrote: > > > > If you plan on making your results public be very careful with the > > license agreements on the other db's. I know Oracle forbids the > > release of benchmark numbers without their approval. > > ...as all of the other commercial databases do. This may be off-topic, > but has anyone actually suffered any consequences of a published > benchmark without permission? > > For example, I am a developer of Mambo, a PHP-based CMS application, > and am porting the mysql functions to ADOdb so I can use grown-up > databases ;-) > > What is keeping me from running a copy of Mambo on a donated server > for testing and performance measures (including the commercial > databases) and then publishing the results based on Mambo's > performance on each? > > It would be really useful to know if anyone has ever been punished for > doing this, as IANAL but that restriction is going to be very, VERY > difficult to back up in court without precedence. Is this just a > deterrent, or is it real? > > -- Mitch > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Mike Benoit <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [PERFORM] Benchmark (slightly off topic but oh well)
For example, I am a developer of Mambo, a PHP-based CMS application, and am porting the mysql functions to ADOdb so I can use grown-up databases ;-) Just yesterday I "optimized" a query for a website running MySQL. It's the 'new products' type query : SELECT product_id, pd.product_name, p.price, COALESCE( s.specials_price, p.price ) as real_price FROM products p, products_descriptions pd LEFT join specials s ON (p.product_id = s.product_id) WHERE p.product_id = pd.product_id AND pd.language_id=(constant) AND p.product_visible=TRUE AND s.is_active = TRUE ORDER BY p.date_added DESC LIMIT 6 With ~100 products everything went smooth, about 0.5 ms. I decided to test with 20.000 because we have a client with a large catalog coming. Wow. It took half a second, to yield six products. Note that there are appropriate indexes all over the place (for getting the new products, I have an index on product_visible, date_added) I tested with Postgres : with 100 products it takes 0.4 ms, with 20.000 it takes 0.6 ms... Postgres needs a bit of query massaging (putting an extra ORDER BY product_visible to use the index). With MySQL no amount of query rewriting would do. I noted sometimes MySQL would never use a multicolumn index for an ORDER BY LIMIT unless one specifies a dummy condition on the missing parameter. So I had to split the query in two : fetch the six product_ids, store them in a PHP variable, implode(',',$ids), and SELECT ... WHERE product_id IN (x,y,z) UGLY ! And a lot slower. Note this is with MySQL 4.0.23 or something. Maybe 4.1 would be faster. Here's the URL to the site. There is a query log if you wanna look just for laughs. Note that all the products boxes are active which makes a very long page time... There are 42000 fictive products and about 60 real products. Don't use the search form unless you have a good book to read ! You can click on "Nouveautés" to see the old "new products" query in action, but please, only one people at a time. http://pinceau-d-or.com/gros/product_info.php?products_id=164 Ah, you can buy stuff with the test version if you like, just don't use the credit card because ... it works ;) This is the un-messed-up version (production) : http://pinceau-d-or.com/product_info.php?products_id=164 If some day I can recode this mess to use Postgres... this would be nice, so nice... the other day my database went apeshit and in the absence of foreign keys... and the absence of PHP checking anything... ! test=# CREATE TABLE suicide (id INT NOT NULL, moment TIMESTAMP NOT NULL); CREATE TABLE test=# INSERT INTO suicide (id,moment) VALUES (0,now()); INSERT 6145577 1 test=# INSERT INTO suicide (id,moment) VALUES (0,0); ERREUR: La colonne <> est de type timestamp without time zone mais l'expression est de type integer HINT: Vous devez reecrire l'expression ou lui appliquer une transformation de type. test=# INSERT INTO suicide (id,moment) VALUES (NULL,1); ERREUR: La colonne <> est de type timestamp without time zone mais l'expression est de type integer HINT: Vous devez reecrire l'expression ou lui appliquer une transformation de type. test=# INSERT INTO suicide (id,moment) VALUES (NULL,now()); ERREUR: Une valeur NULL dans la colonne <> viole la contrainte NOT NULL test=# SELECT * FROM suicide; id | moment + 0 | 2005-02-11 19:16:21.262359 mysql> CREATE TABLE suicide (id INT NOT NULL, moment DATETIME NOT NULL); Query OK, 0 rows affected (0.02 sec) mysql> INSERT INTO suicide (id,moment) VALUES (0,now()); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO suicide (id,moment) VALUES (0,0); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO suicide (id,moment) VALUES (NULL,1); ERROR 1048: Column 'id' cannot be null mysql> INSERT INTO suicide (moment) VALUES (now()); Query OK, 1 row affected (0.00 sec) hey, did I specify a default value ? mysql> SELECT * FROM suicide; ++-+ | id | moment | ++-+ | 0 | 2005-02-11 19:17:49 | | 0 | -00-00 00:00:00 | | 0 | 2005-02-11 19:18:45 | ++-+ 3 rows in set (0.00 sec) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Benchmark
In terms of performance, Oracle is to Postgres as Postgres is to Mysql: More complexity, more overhead, more layers of abstraction, but in the long run it pays off when you need it. (Only without the user-friendliness of either open-source softwares.) I don't find postgres complex... I find it nice, well-behaved, very easy to use, very powerful, user-friendly... there are a lot of features but somehow it's well integrated and makes a coherent set. It also has some very useful secret passages (like the whole GiST family) which can save you from some things at which SQL really sucks. It certainly is complex on the inside but I think the devs have done a very good job at hiding that. It's also polite : it will say 'I have a function with the name you said but the parameter types don't match' ; mysql will just say 'syntax error, RTFM', or insert its favorite value of 0. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Benchmark
Mike Benoit <[EMAIL PROTECTED]> writes: > I have never used Oracle myself, nor have I read its license agreement, > but what if you didn't name Oracle directly? ie: > TPS Database > --- > 112 MySQL > 120 PgSQL > 90Sybase > 95"Other database that *may* start with a letter after N" > 50"Other database that *may* start with a letter after L" Great Bridge did essentially that years ago, but I think we only got away with it because we didn't say which DBs "Commercial Database A" and "Commercial Database B" actually were. Even off the record, we were only allowed to tell people that the commercial DBs were Oracle and SQL Server ... but not which was which. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to interpret this explain analyse?
Hi Tom, Tom Lane schreef: > On the surface this looks like a reasonable plan choice. If you like > you can try the other two basic types of join plan by turning off > enable_hashjoin, which will likely drive the planner to use a merge > join, and then also turn off enable_mergejoin to get a nested loop > (or if it thinks nested loop is second best, turn off enable_nestloop > to see the behavior with a merge join). The problem is that the query logically requests all records ( as in "select * from a join") from the database but actually displays (in practise) in 97% of the time the first 1000 records and at most the first 50.000 records 99.99% of the time by scrolling (using "page down) in the gui and an occasional "jump to record " through something called a locator) (both percentages tested!). If I do the same query with a "limit 60.000" or if I do a "set enable_seqscan = off" the query returns in 0.3 secs. Otherwise it lasts for 20 secs (which is too much for the user to wait for, given the circumstances). I cannot change the query (it is geneated by a tool called Clarion) but it something like (from the psqlodbc_xxx.log): "... declare SQL_CUR01 cursor for SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer; fetch 100 in SQL_CUR01; ..." PostgreSQL does the planning (and than executes accordingly) to the query and not the "fetch 100". Changing the query with a "limit whatever" prohibits scrolling after the size of the resultset. If Postgres should delay the planning of the actual query untill the fetch it could choose the quick solution. Another solution would be to "advise" PostgreSQL which index etc (whatever etc means ;-)) to use ( as in the mailing from Silke Trissl in the performance list on 09-02-05). > What's important in comparing different plan alternatives is the ratios > of estimated costs to actual elapsed times. If the planner is doing its > job well, those ratios should be similar across all the alternatives > (which implies of course that the cheapest-estimate plan is also the > cheapest in reality). If not, it may be appropriate to fool with the > planner's cost estimate parameters to try to line up estimates and > reality a bit better. I I really do a "select *" and display the result, the planner is right (tested with "set enable_seqscan = off" and "set enable_seqscan = on). Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] How to interpret this explain analyse?
"Joost Kraaijeveld" <[EMAIL PROTECTED]> writes: > I cannot change the query (it is geneated by a tool called Clarion) but it > something like (from the psqlodbc_xxx.log): > "... > declare SQL_CUR01 cursor for > SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" > B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer; > fetch 100 in SQL_CUR01; > ..." Well, the planner does put some emphasis on startup time when dealing with a DECLARE CURSOR plan; the problem you face is just that that correction isn't large enough. (From memory, I think it optimizes on the assumption that 10% of the estimated rows will actually be fetched; you evidently want a setting of 1% or even less.) We once talked about setting up a GUC variable to control the percentage of a cursor that is estimated to be fetched: http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php It never got done but that seems like the most reasonable solution to me. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Benchmark
> I have never used Oracle myself, nor have I read its license agreement, > but what if you didn't name Oracle directly? ie: > > TPS Database > --- > 112 MySQL > 120 PgSQL > 90Sybase > 95"Other database that *may* start with a letter after N" > 50"Other database that *may* start with a letter after L" > > As far as I know there are only a couple databases that don't allow you > to post benchmarks, but if they remain "unnamed" can legal action be > taken? > > Just like all those commercials on TV where they advertise: "Cleans 10x > better then the other leading brand". Instead of measuring transactions/second, let's put everything in terms of transactions/dollar. This will make it quite easy to determine which database is which from the results. Since postgresql is free and would invalidate our test on mathematical terms, we will sub in the $19.99 price of a T-Shirt (http://www.sourcewear.com/) for the price of the database. TP$ Database --- 25 A .5 B .01 C .001D .1 E Merlin ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How to interpret this explain analyse?
Hi Tom, Tom Lane schreef: > Well, the planner does put some emphasis on startup time when dealing > with a DECLARE CURSOR plan; the problem you face is just that that > correction isn't large enough. (From memory, I think it optimizes on > the assumption that 10% of the estimated rows will actually > be fetched; you evidently want a setting of 1% or even less.) I wish I had your mnemory ;-) . The tables contain 1.100.000 records by the way (that is not nearly 10 %, my math is not that good)) > We once talked about setting up a GUC variable to control the > percentage of a cursor that is estimated to be fetched: > http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php > It never got done but that seems like the most reasonable solution to > me. If the proposal means that the cursor is not limited to ths limit in the query but is limited to the fetch than I support the proposal. A bit late I presume. Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster