[PERFORM] index usage makes problem
I had 50 lakh records in my table... while counting that am using that row in where condition... which makes problem, cpu is waiting for device... Debian OS, postresql 7.4, 50 lakh records. Query is EXPLAIN ANALYZE select count(call_id) from call_log where call_id > 1; while seeing the top, cpu is waiting for i/o, and without this call_id condition if i do EXPLAIN ANALYZE select count(oid) from call_log where oid > 1; it executed in 21 seconds
Re: [PERFORM] count * performance issue
> Yes it is the latest stable version. > is there any article saying the difference between this 7.3 and 8.4
Re: [PERFORM] count * performance issue
Of course, the official documentation covers that information in its release notes http://www.postgresql.org/docs/8.3/static/release.html best wishes Harald On Thu, Mar 6, 2008 at 1:43 PM, sathiya psql <[EMAIL PROTECTED]> wrote: > > > > > > > > > > Yes it is the latest stable version. > > is there any article saying the difference between this 7.3 and 8.4 > > > -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 fx 01212-5-13695179 - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On 6-Mar-08, at 1:43 AM, sathiya psql wrote: is there any way to explicitly force the postgres to use index scan If you want to count all the rows in the table there is only one way to do it (without keeping track yourself with a trigger ); a seq scan. An index will not help you. The only thing that is going to help you is really fast disks, and more memory, and you should consider moving to 8.3 for all the other performance benefits. Dave -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
am Thu, dem 06.03.2008, um 18:13:50 +0530 mailte sathiya psql folgendes: > > Yes it is the latest stable version. > > > is there any article saying the difference between this 7.3 and 8.4 http://developer.postgresql.org/pgdocs/postgres/release.html Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
Hi, On 6-Mar-08, at 6:58 AM, sathiya psql wrote: The only thing that is going to help you is really fast disks, and more memory, and you should consider moving to 8.3 for all the other performance benefits. Is 8.3 is a stable version or what is the latest stable version of postgres ?? Yes it is the latest stable version. moving my database from 7.4 to 8.3 will it do any harm ?? You will have to test this yourself. There may be issues what are all the advantages of moving from 7.4 to 8.3 Every version of postgresql has improved performance, and robustness; so you will get better overall performance. However I want to caution you this is not a panacea. It will NOT solve your seq scan problem. Dave
Re: [PERFORM] count * performance issue
sathiya psql escribió: > > Yes it is the latest stable version. > > is there any article saying the difference between this 7.3 and 8.4 http://www.postgresql.org/docs/8.3/static/release.html In particular, http://www.postgresql.org/docs/8.3/static/release-8-3.html http://www.postgresql.org/docs/8.3/static/release-8-2.html http://www.postgresql.org/docs/8.3/static/release-8-1.html http://www.postgresql.org/docs/8.3/static/release-8-0.html which are all the major releases between 7.4 and 8.3. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] postgresql Explain command output
On Thu, 6 Mar 2008, RaviRam Kolipaka wrote: My goal is create a visual representation of the expain plan. This problem has been solved already by code that's in pgadmin and you might look at that source code for hints if you want to write your own implementation. There's a good intro to using that at http://www.postgresonline.com/journal/index.php?/archives/27-Reading-PgAdmin-Graphical-Explain-Plans.html -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
A. Kretschmer wrote: am Thu, dem 06.03.2008, um 12:17:55 +0530 mailte sathiya psql folgendes: TRIGGER i can use if i want the count of the whole table, but i require for some of the rows with WHERE condition so how to do that ??? Okay, in this case a TRIGGER are a bad idea. You can use an INDEX on this row. Can you show us the output for a EXPLAIN ANALYSE SELECT count(*) from WHERE = ... ? Actually - in this case, TRIGGER can be a good idea. If your count table can include the where information, then you no longer require an effective table-wide lock for updates. In the past I have used sequential articles numbers within a topic for an online community. Each topic row had an article_count. To generate a new article, I could update the article_count and use the number I had generated as the article number. To query the number of articles in a particular topic, article_count was available. Given thousands of topics, and 10s of thousands of articles, the system worked pretty good. Not in the millions range as the original poster, but I saw no reason why this wouldn't scale. For the original poster: You might be desperate and looking for help from the only place you know to get it from, but some of your recent answers have shown that you are either not reading the helpful responses provided to you, or you are unwilling to do your own research. If that continues, I won't be posting to aid you. Cheers, mark -- Mark Mielke <[EMAIL PROTECTED]>
Re: [PERFORM] count * performance issue
On Thu, 6 Mar 2008, sathiya psql wrote: is there any article saying the difference between this 7.3 and 8.4 I've collected a list of everything on this topic I've seen at http://www.postgresqldocs.org/index.php/Version_8.3_Changes The Feature Matrix linked to there will be a quicker way to see what's happened than sorting through the release notes. None of these changes change the fact that getting an exact count in this situation takes either a sequential scan or triggers. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 6 Mar 2008, sathiya psql wrote: any way will you explain., what is this COST, actual time and other stuffs There's a long list of links to tools and articles on this subject at http://www.postgresqldocs.org/index.php/Using_EXPLAIN -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] postgresql Explain command output
Greg Smith <[EMAIL PROTECTED]> writes: > On Thu, 6 Mar 2008, RaviRam Kolipaka wrote: >> My goal is create a visual representation of the expain plan. > This problem has been solved already by code that's in pgadmin and you > might look at that source code for hints if you want to write your own > implementation. It's been solved more than once actually --- Red Hat did a "Visual Explain" tool several years ago, which is unmaintained now but still available for download (http://sources.redhat.com/rhdb/). I've heard that EDB picked it up and is now maintaining their own fork, but I don't know the status of that for sure. That code is in Java, if it makes a difference to you. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] postgresql Explain command output
On Thu, 6 Mar 2008, Tom Lane wrote: Red Hat did a "Visual Explain" tool several years ago, which is unmaintained now but still available for download (http://sources.redhat.com/rhdb/). I've heard that EDB picked it up and is now maintaining their own fork, but I don't know the status of that for sure. I know I wrote this down somewhere...ah ha, it was in the MySQL comparision paper: Visual Explain, originally a RedHat component that has been kept current and improved by Enterprise DB, comes bundled with the EnterpriseDB Advanced Server package. It can be built to run against other PostgreSQL installations using the source code to their Developer Studio package: http://www.enterprisedb.com/products/download.do -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall, and each time the answer is, "It's a sequential scan -- redesign your application." My question is: What do the other databases do that Postgres can't do, and why not? Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and so forth. On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a second. It's hard for me to believe there isn't a better way. This is a real problem. Countless people (including me) have spent significant effort rewriting applications because of this performance flaw in Postgres. Over and over, the response is, "You don't really need to do that ... change your application." Well, sure, it's always possible to change the application, but that misses the point. To most of us users, count() seems like it should be a trivial operation. On other relational database systems, it is a trivial operation. This is really a significant flaw on an otherwise excellent relational database system. My rant for today... Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
Craig James wrote: > This is a real problem. Countless people (including me) have > spent significant effort rewriting applications because of this > performance flaw in Postgres. Over and over, the response is, > "You don't really need to do that ... change your application." > Well, sure, it's always possible to change the application, but > that misses the point. To most of us users, count() seems like > it should be a trivial operation. On other relational database > systems, it is a trivial operation. > > This is really a significant flaw on an otherwise excellent > relational database system. Have you read the TODO items related to this? -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote: > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.) /* Steinar */ -- Homepage: http://www.sesse.net/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
In response to Craig James <[EMAIL PROTECTED]>: > In the 3 years I've been using Postgres, the problem of count() performance > has come up more times than I can recall, and each time the answer is, "It's > a sequential scan -- redesign your application." > > My question is: What do the other databases do that Postgres can't do, and > why not? > > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? I don't know about Oracle, but MySQL has this problem as well. Use innodb tables and see how slow it is. The only reason myisam tables don't have this problem is because they don't implement any of the features that make the problem difficult to solve. > On the one hand, I understand that Postgres has its architecture, and I > understand the issue of row visibility, and so forth. On the other hand, my > database is just sitting there, nothing going on, no connections except me, > and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that > either Oracle or MySQL would answer in a fraction of a second. It's hard for > me to believe there isn't a better way. There's been discussion about putting visibility information in indexes. I don't know how far along that effort is, but I expect that will improve count() performance significantly. > This is a real problem. Countless people (including me) have spent > significant effort rewriting applications because of this performance flaw in > Postgres. Over and over, the response is, "You don't really need to do that > ... change your application." Well, sure, it's always possible to change the > application, but that misses the point. To most of us users, count() seems > like it should be a trivial operation. On other relational database systems, > it is a trivial operation. > > This is really a significant flaw on an otherwise excellent relational > database system. Not really. It really is a design flaw in your application ... it doesn't make relational sense to use the number of rows in a table for anything. Just because other people do it frequently doesn't make it right. That being said, it's still a useful feature, and I don't hear anyone denying that. As I said, google around a bit WRT to PG storing visibility information in indexes, as I think that's the way this will be improved. > My rant for today... Feel better now? -- Bill Moran -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 6 Mar 2008, Steinar H. Gunderson wrote: On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.) Exactly. There is a good discussion of this at http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/ and I found the comments from Ken Jacobs were the most informative. In short, if you want any reasonable database integrity you have to use InnoDB with MySQL, and once you make that choice it has the same problem. You only get this accelerated significantly when using MyISAM, which can tell you an exact count of all the rows it hasn't corrupted yet. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, Mar 6, 2008 at 3:49 PM, Greg Smith <[EMAIL PROTECTED]> wrote: > > You only get this accelerated significantly when using MyISAM, which can > tell you an exact count of all the rows it hasn't corrupted yet. Please don't do that again. I'm going to have to spend the next hour cleaning coffee out of my laptop keyboard. :-) -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 2008-03-06 at 07:28 -0800, Craig James wrote: ... > My question is: What do the other databases do that Postgres can't do, and > why not? > > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? ... I can vouch that Oracle can still take linear time to perform a count(*), at least in some cases. I have also seen count(*) fast in some cases too... my understanding is that they maintain a list of "interested transactions" on a per-relation basis. Perhaps they do an optimization based on the index size if there are no pending DML transactions? -- Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] oid...any optimizations
On Thu, 6 Mar 2008 12:43:57 +0530 "sathiya psql" <[EMAIL PROTECTED]> wrote: > > > > Actually it isn't obvious as oids have been deprecated for years. > > > no in my version it is now also available I didn't say they were gone. I said they are deprecated. You should not be using them. > > > > > > > What version of ancient PostgreSQL are you running exactly? > > > postgresql 7.4 That is god awful ancient. Upgrade to something remotely new, like 8.2.6. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit signature.asc Description: PGP signature
Re: [PERFORM] count * performance issue
On Thu, 06 Mar 2008 07:28:50 -0800 Craig James <[EMAIL PROTECTED]> wrote: > In the 3 years I've been using Postgres, the problem of count() performance > has come up more times than I can recall, and each time the answer is, "It's > a sequential scan -- redesign your application." > > My question is: What do the other databases do that Postgres can't do, and > why not? > > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? It's a tradeoff. The only way to get that information quickly is to maintain it internally when you insert or delete a row. So when do you want to take your hit. It sounds like Oracle has made this decision for you. In PostgreSQL you can use triggers and rules to manage this information if you need it. You can even do stuff like track how many of each type of something you have. That's something you can't do if your database engine has done a generic speedup for you. You would still have to create your own table for something like that and then you get the hit twice. -- D'Arcy J.M. Cain <[EMAIL PROTECTED]> | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Performance of aggregates over set-returning functions
This this a bug or TODO item? --- Tom Lane wrote: > "John Smith" <[EMAIL PROTECTED]> writes: > >> It's pipelined either way. But int8 is a pass-by-reference data type, > >> and it sounds like we have a memory leak for this case. > > > Thanks for your reply. How easy is it to fix this? Which portion of > > the code should we look to change? > > I was just looking at that. The issue looks to me that nodeResult.c > (and other plan node types that support SRFs in their targetlists) > do this: > > /* > * Check to see if we're still projecting out tuples from a previous scan > * tuple (because there is a function-returning-set in the projection > * expressions). If so, try to project another one. > */ > if (node->ps.ps_TupFromTlist) > { > resultSlot = ExecProject(node->ps.ps_ProjInfo, &isDone); > if (isDone == ExprMultipleResult) > return resultSlot; > /* Done with that source tuple... */ > node->ps.ps_TupFromTlist = false; > } > > /* > * Reset per-tuple memory context to free any expression evaluation > * storage allocated in the previous tuple cycle. Note this can't happen > * until we're done projecting out tuples from a scan tuple. > */ > ResetExprContext(econtext); > > whereas there would be no memory leak if these two chunks of code were > in the other order. The question is whether resetting the context first > would throw away any data that we *do* still need for the repeated > ExecProject calls. That second comment block implies there's something > we do need. > > I'm not sure why it's like this. Some digging in the CVS history shows > that indeed the code used to be in the other order, and I switched it > (and added the second comment block) in this old commit: > > http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php > > I suppose that the SQL-function support at the time required that its > calling memory context be persistent until it returned ExprEndResult, > but I sure don't recall any details. It's entirely possible that that > requirement no longer exists, or could easily be eliminated given all > the other changes that have happened since then. nodeFunctionscan.c > seems to reset the current context for each call of a SRF, so I'd think > that anything that can't cope with that should have been flushed out > by now. > > If you feel like poking at this, I *strongly* recommend doing your > testing in an --enable-cassert build. You'll have no idea whether you > freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled. > > regards, tom lane > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Why the difference in plans ?
Dave, > Below I have two almost identical queries. Strangely enough the one > that uses the index is slower ??? My first guess would be that records are highly correlated by DOB and not at all by name. However, it would help if you supplied both the index definitions and what changed between the two queries to cause the index to be used. -- Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
Craig James <[EMAIL PROTECTED]> writes: > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Performance of aggregates over set-returning functions
Bruce Momjian <[EMAIL PROTECTED]> writes: > This this a bug or TODO item? TODO, I think. I wouldn't want to risk pushing a change in this into back branches. regards, tom lane >> I'm not sure why it's like this. Some digging in the CVS history shows >> that indeed the code used to be in the other order, and I switched it >> (and added the second comment block) in this old commit: >> >> http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php >> >> I suppose that the SQL-function support at the time required that its >> calling memory context be persistent until it returned ExprEndResult, >> but I sure don't recall any details. It's entirely possible that that >> requirement no longer exists, or could easily be eliminated given all >> the other changes that have happened since then. nodeFunctionscan.c >> seems to reset the current context for each call of a SRF, so I'd think >> that anything that can't cope with that should have been flushed out >> by now. >> >> If you feel like poking at this, I *strongly* recommend doing your >> testing in an --enable-cassert build. You'll have no idea whether you >> freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Performance of aggregates over set-returning functions
OK, added to TODO: * Reduce memory usage of aggregates in set returning functions http://archives.postgresql.org/pgsql-performance/2008-01/msg00031.php --- Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > This this a bug or TODO item? > > TODO, I think. I wouldn't want to risk pushing a change in this into > back branches. > > regards, tom lane > > >> I'm not sure why it's like this. Some digging in the CVS history shows > >> that indeed the code used to be in the other order, and I switched it > >> (and added the second comment block) in this old commit: > >> > >> http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php > >> > >> I suppose that the SQL-function support at the time required that its > >> calling memory context be persistent until it returned ExprEndResult, > >> but I sure don't recall any details. It's entirely possible that that > >> requirement no longer exists, or could easily be eliminated given all > >> the other changes that have happened since then. nodeFunctionscan.c > >> seems to reset the current context for each call of a SRF, so I'd think > >> that anything that can't cope with that should have been flushed out > >> by now. > >> > >> If you feel like poking at this, I *strongly* recommend doing your > >> testing in an --enable-cassert build. You'll have no idea whether you > >> freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Why the difference in plans ?
On 6-Mar-08, at 12:26 PM, Josh Berkus wrote: Dave, Below I have two almost identical queries. Strangely enough the one that uses the index is slower ??? My first guess would be that records are highly correlated by DOB and not at all by name. However, it would help if you supplied both the index definitions and what changed between the two queries to cause the index to be used. The two queries were run 2 seconds apart, there were no changes between. I'll get the index definitions. Dave -- Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org ) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] More shared buffers causes lower performances
Tom Lane wrote: > Greg Smith <[EMAIL PROTECTED]> writes: > > On Wed, 26 Dec 2007, Guillaume Smet wrote: > >> beta RPMs are by default compiled with --enable-debug and > >> --enable-cassert which doesn't help them to fly fast... > > > Got that right. Last time I was going crazy after running pgbench with > > those options and not having realized what I changed, I was getting a 50% > > slowdown on results that way compared to without the debugging stuff. > > Didn't realize it scaled with shared_buffers though. > > See AtEOXact_Buffers(). There are probably any number of other > interesting scaling behaviors --- in my tests, AllocSetCheck() is > normally a major cycle-eater if --enable-cassert is set, and that costs > time proportional to the number of memory chunks allocated by the query. > > Currently the docs say that --enable-cassert > > Enables assertion checks in the server, which test for > many cannot happen conditions. This is invaluable for > code development purposes, but the tests slow things down a little. > > Maybe we ought to put that more strongly --- s/a little/significantly/, > perhaps? Docs updated with attached patch, backpatched to 8.3.X. -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/installation.sgml === RCS file: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v retrieving revision 1.302 diff -c -c -r1.302 installation.sgml *** doc/src/sgml/installation.sgml 17 Feb 2008 16:36:43 - 1.302 --- doc/src/sgml/installation.sgml 6 Mar 2008 21:36:39 - *** *** 1144,1157 Enables assertion checks in the server, which test for many cannot happen conditions. This is invaluable for ! code development purposes, but the tests slow things down a little. Also, having the tests turned on won't necessarily enhance the stability of your server! The assertion checks are not categorized for severity, and so what might be a relatively harmless bug will still lead to server restarts if it triggers an assertion ! failure. Currently, this option is not recommended for ! production use, but you should have it on for development work ! or when running a beta version. --- 1144,1158 Enables assertion checks in the server, which test for many cannot happen conditions. This is invaluable for ! code development purposes, but the tests can slow down the ! server significantly. Also, having the tests turned on won't necessarily enhance the stability of your server! The assertion checks are not categorized for severity, and so what might be a relatively harmless bug will still lead to server restarts if it triggers an assertion ! failure. This option is not recommended for production use, but ! you should have it on for development work or when running a beta ! version. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Why the difference in plans ?
Dave Cramer wrote: > I have two almost identical queries. Strangely enough the one > that uses the index is slower ??? The index scan is being used so that it can retrieve the rows in the name order. It expects that if it was to retrieve every row via the index, it would get about 1010 rows that matched the filter, and it knows it can stop after 250, so assuming the matching rows are evenly distributed it thinks it can stop after having read only a quarter of the rows. However only 129 rows matched. Consequently it had to read every row in the table anyway, seeking a fair bit as the read order was specified by the index rather than in sequential order, and it also had to read the index. These extra costs were much larger than reading the lot sequentially, and sorting 129 resulting rows. The first query picked a sequential scan as it thought it was only going to get 11 results, so was expecting that the limit wasn't going to come into play, and that every row would have to be read anyway. Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/services/bqem.htm for details. __ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] Why the difference in plans ?
On 6-Mar-08, at 5:10 PM, Stephen Denne wrote: Dave Cramer wrote: I have two almost identical queries. Strangely enough the one that uses the index is slower ??? The index scan is being used so that it can retrieve the rows in the name order. It expects that if it was to retrieve every row via the index, it would get about 1010 rows that matched the filter, and it knows it can stop after 250, so assuming the matching rows are evenly distributed it thinks it can stop after having read only a quarter of the rows. However only 129 rows matched. Consequently it had to read every row in the table anyway, seeking a fair bit as the read order was specified by the index rather than in sequential order, and it also had to read the index. These extra costs were much larger than reading the lot sequentially, and sorting 129 resulting rows. The first query picked a sequential scan as it thought it was only going to get 11 results, so was expecting that the limit wasn't going to come into play, and that every row would have to be read anyway. The strange thing of course is that the data is exactly the same for both runs, the tables have not been changed between runs, and I did them right after another. Even more strange is that the seq scan is faster than the index scan. Dave Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/services/bqem.htm for details. __ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why the difference in plans ?
> The strange thing of course is that the data is exactly the same for > both runs, the tables have not been changed between runs, and I did > them right after another. Even more strange is that the seq scan is > faster than the index scan. It is not strange at all, since both queries read ALL the rows in your table, checking each and every row to see whether it matched your predicates. The sequential scan read them in the order they are on the disk, meaning your disk didn't have to seek as much (assuming low file fragmentation). The index scan again reads all the rows in your table, but reads them in the order they were in the index, which is probably quite different from the order that they are on the disk, so the disk had to seek a lot. In addition, it had to read the index. Taking some wild guesses about the distribution of your data, I'd hazard a guess that this specific query could be sped up a great deal by creating an index on lower(firstname). Regards, Stephen. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/services/bqem.htm for details. __ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Craig James wrote: My question is: What do the other databases do that Postgres can't do, and why not? Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? I think Mysql can only do that for the myisam engine - innodb and falcon are similar to Postgres. I don't believe Oracle optimizes bare count(*) on a table either - tho it may be able to use a suitable index (if present) to get the answer quicker. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Tom Lane wrote: Craig James <[EMAIL PROTECTED]> writes: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude. It may be that Oracle has a way to detect when there's no transaction and use a faster method. If so, this was a clever optimization -- in my experience, that represents the vast majority of the times you want to use count(). It's not very useful to count the rows of a table that many apps are actively modifying since the result may change the moment your transaction completes. Most of the time when you use count(), it's because you're the only one modifying the table, so the count will be meaningful. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Improve Full text rank in a query
Hi all, I'm running the following query to match a supplied text string to an actual place name which is recorded in a table with extra info like coordinates, etc. SELECT ts_rank_cd(textsearchable_index_col , query, 32 /* rank/(rank+1) */) AS rank,* FROM gazetteer, to_tsquery('Gunbower|Island|Vic') query WHERE query @@ textsearchable_index_col order by rank desc, concise_ga desc, auda_alloc desc LIMIT 10 When I run this I get the following top two results: Pos RankName State 1 0.23769 Gunbower Island Primary School Vic 2 0.23769 Gunbower Island Vic The textsearchable_index_col for each of these looks like this: 'vic':6 '':5 'gunbow':1 'island':2 'school':4 'primari':3 'victoria':7 'vic':4 '':3 'gunbow':1 'island':2 'victoria':5 I'm new to this, but I can't figure out why the "Gunbower Island Primary School" is getting top place. How do I get the query to improve the ranking so that an exact match (like "Gunbower|Island|Vic") gets a higher position? Thanks, bw No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.516 / Virus Database: 269.21.4/1309 - Release Date: 3/03/2008 6:50 PM -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] database design for large data.
Hi, I am using postgresql for application. daily i will get more than 5,00,000 records. i have done the partitioning of the table for each month. while generating reports, i will do join on some other table with the large table it takes too much time to get the data so i am planning design star schema for report for last month so report module directly will pick from that stale data. please suggest me any other way to generate report from very large data Regards, Shilpa The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com
Re: [PERFORM] count * performance issue
Craig James wrote: Tom Lane wrote: Craig James <[EMAIL PROTECTED]> writes: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude. It may be that Oracle has a way to detect when there's no transaction and use a faster method. If so, this was a clever optimization -- in my experience, that represents the vast majority of the times you want to use count(). It's not very useful to count the rows of a table that many apps are actively modifying since the result may change the moment your transaction completes. Most of the time when you use count(), it's because you're the only one modifying the table, so the count will be meaningful. Craig Oracle will use a btree index on a not null set of columns to do a fast full index scan, which can be an order of magnitude or faster compared to a table scan. Also, Oracle can use a bitmap index (in cases where a bitmap index isn't otherwise silly) for a bitmap fast index scan/bitmap conversion for similar dramatic results. For "large" tables, Oracle is not going to be as fast as MyISAM tables in MySQL, even with these optimizations, since MyISAM doesn't have to scan even index pages to get a count(*) answer against the full table. Paul -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Improve Full text rank in a query
"b wragg" <[EMAIL PROTECTED]> writes: > I'm new to this, but I can't figure out why the "Gunbower Island Primary > School" is getting top place. How do I get the query to improve the ranking > so that an exact match (like "Gunbower|Island|Vic") gets a higher position? I'm new at this too, but AFAICS these are both exact matches: they have the same matching lexemes at the same positions, so the basic rank calculation is going to come out exactly the same. Normalization option 32 doesn't help (as the manual notes, it's purely cosmetic). So it's random chance which one comes out first. What I think you might want is one of the other normalization options, so that shorter documents are preferred. Either 1, 2, 8, or 16 would do fine for this simple example --- which one you want depends on just how heavily you want to favor shorter documents. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time
I've came across this issue while writing report-like query for 2 not very large tables. I've tried several methods to resolve this one (see below). But now I'm really stuck... PostgreSQL 8.3, default configuration There are 2 tables (structure was simplified to show only problematic place): create table c ( id bigint primary key cdate date ); create index c_cdate_idx on c (cdate); create table i ( id bigint primary key, id_c bigint references c(id) ); select count(*) from c count 636 565 select count(*) from i count 4 646 145 analyze i; analyze c; explain analyze select id from c join i on i.idc = c.id where c.cdate between '2007-02-01' and '2007-02-16' QUERY PLAN - Merge Join (cost=738.95..57864.63 rows=14479 width=8) (actual time=13954.681..14358.731 rows=14583 loops=1) Merge Cond: (i.idc = c.id) -> Index Scan using fki_i_c_fk on i (cost=0.00..194324.34 rows=4646145 width=8) (actual time=17.254..12061.414 rows=1042599 loops=1) -> Sort (cost=738.94..756.88 rows=7178 width=8) (actual time=53.942..75.013 rows=14583 loops=1) Sort Key: c.id Sort Method: quicksort Memory: 404kB -> Index Scan using c_cdate_idx on c (cost=0.00..279.21 rows=7178 width=8) (actual time=23.595..41.470 rows=7064 loops=1) Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date)) Total runtime: 14379.461 ms set enable_mergejoin to off; set enable_hashjoin to off; QUERY PLAN -- Nested Loop (cost=0.00..59833.70 rows=14479 width=8) (actual time=0.129..153.038 rows=14583 loops=1) -> Index Scan using c_cdate_idx on c (cost=0.00..279.21 rows=7178 width=8) (actual time=0.091..14.468 rows=7064 loops=1) Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date)) -> Index Scan using fki_i_c_fk on i (cost=0.00..8.13 rows=13 width=8) (actual time=0.007..0.011 rows=2 loops=7064) Index Cond: (i.idc = c.id) Total runtime: 172.599 ms Ok, the first problem is here: -> Index Scan using fki_i_c_fk on i (cost=0.00..8.13 rows=13 width=8) (actual time=0.007..0.011 rows=2 loops=7064) I collected statistics for these tables at level 1000 for all columns. select attname, null_frac, avg_width, n_distinct, correlation from pg_stats where tablename = 'i' attname null_frac avg_width n_distinct correlation -- -- - -- id 0 8 -1 0,849796295166 idc0,7236369848251343 8 95583 0,999763011932373 Nice stats except of n_distinct for idc column. select count(distinct idc) from i count 633 864 Of course it is not correct solution but... update pg_statistic set stadistinct = 633864 where starelid = ... and staattnum = ... Reconnect and execute: explain analyze select id from c join i on i.idc = c.id where c.cdate between '2007-02-01' and '2007-02-16' QUERY PLAN -- Nested Loop (cost=0.00..57342.39 rows=14479 width=8) (actual time=0.133..151.426 rows=14583 loops=1) -> Index Scan using c_cdate_idx on c (cost=0.00..279.21 rows=7178 width=8) (actual time=0.094..14.242 rows=7064 loops=1) Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date)) -> Index Scan using fki_i_c_fk on i (cost=0.00..7.92 rows=2 width=8) (actual time=0.007..0.011 rows=2 loops=7064) Index Cond: (i.idc = c.id) Total runtime: 170.911 ms But the reason of this issue is not the incorrect value of n_distinct. Let's expand dates interval in WHERE clause. explain analyze select id from c join i on i.idc = c.id where c.cdate between '2007-02-01' and '2007-02-19' QUERY PLAN Merge Join (cost=831.16..57981.98 rows=16155 width=8) (actual time=11691.156..12155.201 rows=16357 loops=1) Merge Cond: (i.idc = c.id) -> Index Scan using fki_i_c_fk on i (cost=0.00..194324.34 rows=4646145 width=8) (actual time=22.236..9928.489 rows=1044373 loops=1) -> Sort (cost=831.15..851.17 rows=8009 width=8) (actual time=31.660..55.277 rows=16357 loops=1) Sort Key: c.id Sort Method: quicksort Memory: 438kB -> Index Scan using c_cdate_idx on c (cost=0.00..311.87 rows=8009 width=8) (actual time=0.116..17.050 rows=7918 loops=1) Index Cond: ((cdate >= '2007-02-01'::date) AND (cdat
Re: [PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time
Vlad Arkhipov <[EMAIL PROTECTED]> writes: > I've came across this issue while writing report-like query for 2 not > very large tables. I've tried several methods to resolve this one (see > below). But now I'm really stuck... It looks like you are wishing to optimize for all-in-memory situations, in which case the traditional advice is to reduce random_page_cost to something close to 1. AFAICS all the rowcount estimates you're seeing are spot on, or as close to spot on as you could realistically hope for, and so the problem lies with the cost parameters. Fooling with the statistics is not going to help if the rowcount estimates are already good. (Note: the apparent undercounts you're seeing on indexscans on the outer side of a mergejoin seem to be because the mergejoin terminates early due to limited range of the other input join key. The planner is expecting this, as we can see because the predicted cost of the join is actually much less than the predicted cost of running the input indexscan to completion. The cost ratio is about consistent with the rowcount ratio, which makes me think it got these right too.) regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance