[PERFORM] index usage makes problem

2008-03-06 Thread sathiya psql
I had 50 lakh records in my table...

while counting that am using that row in where condition... which makes
problem, cpu is waiting for device...

Debian OS, postresql 7.4, 50 lakh records.

Query is

EXPLAIN ANALYZE select count(call_id) from call_log where call_id > 1;

while seeing the top, cpu is waiting for i/o, and without this call_id
condition if i do
   EXPLAIN ANALYZE select count(oid) from call_log where oid > 1;
it executed in 21 seconds


Re: [PERFORM] count * performance issue

2008-03-06 Thread sathiya psql
> Yes it is the latest stable version.
>

is there any article saying the difference between this 7.3 and 8.4


Re: [PERFORM] count * performance issue

2008-03-06 Thread Harald Armin Massa
Of course, the official documentation covers that information in its
release notes

http://www.postgresql.org/docs/8.3/static/release.html

best wishes

Harald

On Thu, Mar 6, 2008 at 1:43 PM, sathiya psql <[EMAIL PROTECTED]> wrote:
>
>
> >
> >
> >
> > Yes it is the latest stable version.
>
> is there any article saying the difference between this 7.3 and 8.4
>
>
>



-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 1:43 AM, sathiya psql wrote:


is there any way to explicitly force the postgres to use index scan




If you want to count all the rows in the table there is only one way  
to do it (without keeping track yourself with a trigger ); a seq scan.


An index will not help you.

The only thing that is going to help you is really fast disks, and  
more memory, and you should consider moving to 8.3 for all the other  
performance benefits.


Dave 


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread A. Kretschmer
am  Thu, dem 06.03.2008, um 18:13:50 +0530 mailte sathiya psql folgendes:
> 
> Yes it is the latest stable version.
> 
> 
> is there any article saying the difference between this 7.3 and 8.4

http://developer.postgresql.org/pgdocs/postgres/release.html


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Cramer

Hi,

On 6-Mar-08, at 6:58 AM, sathiya psql wrote:


The only thing that is going to help you is really fast disks, and
more memory, and you should consider moving to 8.3 for all the other
performance benefits.
Is 8.3 is a stable version or what is the latest stable version of  
postgres ??



Yes it is the latest stable version.

moving my database from 7.4 to 8.3 will it do any harm ??


You will have to test this yourself. There may be issues

what are all the advantages of moving from 7.4 to 8.3

Every version of postgresql has improved performance, and robustness;  
so you will get better overall performance. However I want to caution  
you this is not a panacea. It will NOT solve your seq scan problem.




Dave





Re: [PERFORM] count * performance issue

2008-03-06 Thread Alvaro Herrera
sathiya psql escribió:
> > Yes it is the latest stable version.
> 
> is there any article saying the difference between this 7.3 and 8.4

http://www.postgresql.org/docs/8.3/static/release.html

In particular,
http://www.postgresql.org/docs/8.3/static/release-8-3.html
http://www.postgresql.org/docs/8.3/static/release-8-2.html
http://www.postgresql.org/docs/8.3/static/release-8-1.html
http://www.postgresql.org/docs/8.3/static/release-8-0.html
which are all the major releases between 7.4 and 8.3.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] postgresql Explain command output

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, RaviRam Kolipaka wrote:


My goal is create a visual representation of the expain plan.


This problem has been solved already by code that's in pgadmin and you 
might look at that source code for hints if you want to write your own 
implementation.  There's a good intro to using that at 
http://www.postgresonline.com/journal/index.php?/archives/27-Reading-PgAdmin-Graphical-Explain-Plans.html


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Mark Mielke

A. Kretschmer wrote:

am  Thu, dem 06.03.2008, um 12:17:55 +0530 mailte sathiya psql folgendes:
  

TRIGGER i can use if i want the count of the whole table, but i require for
some of the rows with WHERE condition

so how to do that ???



Okay, in this case a TRIGGER are a bad idea. You can use an INDEX on
this row. Can you show us the output for a EXPLAIN ANALYSE SELECT
count(*) from  WHERE  = ... ?
  


Actually - in this case, TRIGGER can be a good idea. If your count table 
can include the where information, then you no longer require an 
effective table-wide lock for updates.


In the past I have used sequential articles numbers within a topic for 
an online community. Each topic row had an article_count. To generate a 
new article, I could update the article_count and use the number I had 
generated as the article number. To query the number of articles in a 
particular topic, article_count was available. Given thousands of 
topics, and 10s of thousands of articles, the system worked pretty good. 
Not in the millions range as the original poster, but I saw no reason 
why this wouldn't scale.


For the original poster: You might be desperate and looking for help 
from the only place you know to get it from, but some of your recent 
answers have shown that you are either not reading the helpful responses 
provided to you, or you are unwilling to do your own research. If that 
continues, I won't be posting to aid you.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [PERFORM] count * performance issue

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, sathiya psql wrote:


is there any article saying the difference between this 7.3 and 8.4


I've collected a list of everything on this topic I've seen at 
http://www.postgresqldocs.org/index.php/Version_8.3_Changes


The Feature Matrix linked to there will be a quicker way to see what's 
happened than sorting through the release notes.


None of these changes change the fact that getting an exact count in this 
situation takes either a sequential scan or triggers.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, sathiya psql wrote:

any way will you explain., what is this COST, actual time and other 
stuffs


There's a long list of links to tools and articles on this subject at 
http://www.postgresqldocs.org/index.php/Using_EXPLAIN


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] postgresql Explain command output

2008-03-06 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes:
> On Thu, 6 Mar 2008, RaviRam Kolipaka wrote:
>> My goal is create a visual representation of the expain plan.

> This problem has been solved already by code that's in pgadmin and you 
> might look at that source code for hints if you want to write your own 
> implementation.

It's been solved more than once actually --- Red Hat did a "Visual
Explain" tool several years ago, which is unmaintained now but still
available for download (http://sources.redhat.com/rhdb/).  I've heard
that EDB picked it up and is now maintaining their own fork, but I
don't know the status of that for sure.  That code is in Java, if it
makes a difference to you.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] postgresql Explain command output

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, Tom Lane wrote:

Red Hat did a "Visual Explain" tool several years ago, which is 
unmaintained now but still available for download 
(http://sources.redhat.com/rhdb/).  I've heard that EDB picked it up and 
is now maintaining their own fork, but I don't know the status of that 
for sure.


I know I wrote this down somewhere...ah ha, it was in the MySQL 
comparision paper:


Visual Explain, originally a RedHat component that has been kept current 
and improved by Enterprise DB, comes bundled with the EnterpriseDB 
Advanced Server package.  It can be built to run against other PostgreSQL 
installations using the source code to their Developer Studio package: 
http://www.enterprisedb.com/products/download.do


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Craig James

In the 3 years I've been using Postgres, the problem of count() performance has come up 
more times than I can recall, and each time the answer is, "It's a sequential scan 
-- redesign your application."

My question is: What do the other databases do that Postgres can't do, and why 
not?

Count() on Oracle and MySQL is almost instantaneous, even for very large 
tables. So why can't Postgres do what they do?

On the one hand, I understand that Postgres has its architecture, and I 
understand the issue of row visibility, and so forth.  On the other hand, my 
database is just sitting there, nothing going on, no connections except me, 
and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that 
either Oracle or MySQL would answer in a fraction of a second.  It's hard for 
me to believe there isn't a better way.

This is a real problem.  Countless people (including me) have spent significant effort 
rewriting applications because of this performance flaw in Postgres.  Over and over, the 
response is, "You don't really need to do that ... change your application."  
Well, sure, it's always possible to change the application, but that misses the point.  
To most of us users, count() seems like it should be a trivial operation.  On other 
relational database systems, it is a trivial operation.

This is really a significant flaw on an otherwise excellent relational database 
system.

My rant for today...
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Bruce Momjian
Craig James wrote:
> This is a real problem.  Countless people (including me) have
> spent significant effort rewriting applications because of this
> performance flaw in Postgres.  Over and over, the response is,
> "You don't really need to do that ... change your application."
> Well, sure, it's always possible to change the application, but
> that misses the point.  To most of us users, count() seems like
> it should be a trivial operation.  On other relational database
> systems, it is a trivial operation.
> 
> This is really a significant flaw on an otherwise excellent
> relational database system.

Have you read the TODO items related to this?

--
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Steinar H. Gunderson
On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)

/* Steinar */
-- 
Homepage: http://www.sesse.net/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Bill Moran
In response to Craig James <[EMAIL PROTECTED]>:

> In the 3 years I've been using Postgres, the problem of count() performance 
> has come up more times than I can recall, and each time the answer is, "It's 
> a sequential scan -- redesign your application."
> 
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

I don't know about Oracle, but MySQL has this problem as well.  Use
innodb tables and see how slow it is.  The only reason myisam tables
don't have this problem is because they don't implement any of the
features that make the problem difficult to solve.

> On the one hand, I understand that Postgres has its architecture, and I 
> understand the issue of row visibility, and so forth.  On the other hand, my 
> database is just sitting there, nothing going on, no connections except me, 
> and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that 
> either Oracle or MySQL would answer in a fraction of a second.  It's hard for 
> me to believe there isn't a better way.

There's been discussion about putting visibility information in indexes.
I don't know how far along that effort is, but I expect that will improve
count() performance significantly.

> This is a real problem.  Countless people (including me) have spent 
> significant effort rewriting applications because of this performance flaw in 
> Postgres.  Over and over, the response is, "You don't really need to do that 
> ... change your application."  Well, sure, it's always possible to change the 
> application, but that misses the point.  To most of us users, count() seems 
> like it should be a trivial operation.  On other relational database systems, 
> it is a trivial operation.
> 
> This is really a significant flaw on an otherwise excellent relational 
> database system.

Not really.  It really is a design flaw in your application ... it doesn't
make relational sense to use the number of rows in a table for anything.
Just because other people do it frequently doesn't make it right.

That being said, it's still a useful feature, and I don't hear anyone
denying that.  As I said, google around a bit WRT to PG storing
visibility information in indexes, as I think that's the way this will
be improved.

> My rant for today...

Feel better now?

-- 
Bill Moran

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, Steinar H. Gunderson wrote:


On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:

Count() on Oracle and MySQL is almost instantaneous, even for very large
tables. So why can't Postgres do what they do?


In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)


Exactly.  There is a good discussion of this at 
http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/ and I 
found the comments from Ken Jacobs were the most informative.


In short, if you want any reasonable database integrity you have to use 
InnoDB with MySQL, and once you make that choice it has the same problem. 
You only get this accelerated significantly when using MyISAM, which can 
tell you an exact count of all the rows it hasn't corrupted yet.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Page
On Thu, Mar 6, 2008 at 3:49 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
>
>  You only get this accelerated significantly when using MyISAM, which can
>  tell you an exact count of all the rows it hasn't corrupted yet.

Please don't do that again. I'm going to have to spend the next hour
cleaning coffee out of my laptop keyboard.

:-)

-- 
Dave Page
EnterpriseDB UK Ltd: http://www.enterprisedb.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Mark Lewis
On Thu, 2008-03-06 at 07:28 -0800, Craig James wrote:
...
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?
...

I can vouch that Oracle can still take linear time to perform a
count(*), at least in some cases.

I have also seen count(*) fast in some cases too... my understanding is
that they maintain a list of "interested transactions" on a per-relation
basis.  Perhaps they do an optimization based on the index size if there
are no pending DML transactions?

-- Mark



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] oid...any optimizations

2008-03-06 Thread Joshua D. Drake
On Thu, 6 Mar 2008 12:43:57 +0530
"sathiya psql" <[EMAIL PROTECTED]> wrote:

> >
> > Actually it isn't obvious as oids have been deprecated for years.
> 
> 
> no in my version it is now also available

I didn't say they were gone. I said they are deprecated. You should not
be using them.

> 
> >
> >
> > What version of ancient PostgreSQL are you running exactly?
> 
> 
> postgresql 7.4

That is god awful ancient. Upgrade to something remotely new, like
8.2.6.

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [PERFORM] count * performance issue

2008-03-06 Thread D'Arcy J.M. Cain
On Thu, 06 Mar 2008 07:28:50 -0800
Craig James <[EMAIL PROTECTED]> wrote:
> In the 3 years I've been using Postgres, the problem of count() performance 
> has come up more times than I can recall, and each time the answer is, "It's 
> a sequential scan -- redesign your application."
> 
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

It's a tradeoff.  The only way to get that information quickly is to
maintain it internally when you insert or delete a row.  So when do you
want to take your hit.  It sounds like Oracle has made this decision
for you.  In PostgreSQL you can use triggers and rules to manage this
information if you need it.  You can even do stuff like track how many
of each type of something you have.  That's something you can't do if
your database engine has done a generic speedup for you.  You would
still have to create your own table for something like that and then
you get the hit twice.

-- 
D'Arcy J.M. Cain <[EMAIL PROTECTED]> |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Performance of aggregates over set-returning functions

2008-03-06 Thread Bruce Momjian

This this a bug or TODO item?

---

Tom Lane wrote:
> "John Smith" <[EMAIL PROTECTED]> writes:
> >> It's pipelined either way.  But int8 is a pass-by-reference data type,
> >> and it sounds like we have a memory leak for this case.
> 
> > Thanks for your reply. How easy is it to fix this? Which portion of
> > the code should we look to change?
> 
> I was just looking at that.  The issue looks to me that nodeResult.c
> (and other plan node types that support SRFs in their targetlists)
> do this:
> 
> /*
>  * Check to see if we're still projecting out tuples from a previous scan
>  * tuple (because there is a function-returning-set in the projection
>  * expressions).  If so, try to project another one.
>  */
> if (node->ps.ps_TupFromTlist)
> {
> resultSlot = ExecProject(node->ps.ps_ProjInfo, &isDone);
> if (isDone == ExprMultipleResult)
> return resultSlot;
> /* Done with that source tuple... */
> node->ps.ps_TupFromTlist = false;
> }
> 
> /*
>  * Reset per-tuple memory context to free any expression evaluation
>  * storage allocated in the previous tuple cycle.  Note this can't happen
>  * until we're done projecting out tuples from a scan tuple.
>  */
> ResetExprContext(econtext);
> 
> whereas there would be no memory leak if these two chunks of code were
> in the other order.  The question is whether resetting the context first
> would throw away any data that we *do* still need for the repeated
> ExecProject calls.  That second comment block implies there's something
> we do need.
> 
> I'm not sure why it's like this.  Some digging in the CVS history shows
> that indeed the code used to be in the other order, and I switched it
> (and added the second comment block) in this old commit:
> 
> http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php
> 
> I suppose that the SQL-function support at the time required that its
> calling memory context be persistent until it returned ExprEndResult,
> but I sure don't recall any details.  It's entirely possible that that
> requirement no longer exists, or could easily be eliminated given all
> the other changes that have happened since then.  nodeFunctionscan.c
> seems to reset the current context for each call of a SRF, so I'd think
> that anything that can't cope with that should have been flushed out
> by now.
> 
> If you feel like poking at this, I *strongly* recommend doing your
> testing in an --enable-cassert build.  You'll have no idea whether you
> freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Josh Berkus
Dave,

> Below I have two almost identical queries.  Strangely enough the one
> that uses the index is slower ???

My first guess would be that records are highly correlated by DOB and not at 
all by name.  However, it would help if you supplied both the index 
definitions and what changed between the two queries to cause the index to be 
used.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Tom Lane
Craig James <[EMAIL PROTECTED]> writes:
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Performance of aggregates over set-returning functions

2008-03-06 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> This this a bug or TODO item?

TODO, I think.  I wouldn't want to risk pushing a change in this into
back branches.

regards, tom lane

>> I'm not sure why it's like this.  Some digging in the CVS history shows
>> that indeed the code used to be in the other order, and I switched it
>> (and added the second comment block) in this old commit:
>> 
>> http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php
>> 
>> I suppose that the SQL-function support at the time required that its
>> calling memory context be persistent until it returned ExprEndResult,
>> but I sure don't recall any details.  It's entirely possible that that
>> requirement no longer exists, or could easily be eliminated given all
>> the other changes that have happened since then.  nodeFunctionscan.c
>> seems to reset the current context for each call of a SRF, so I'd think
>> that anything that can't cope with that should have been flushed out
>> by now.
>> 
>> If you feel like poking at this, I *strongly* recommend doing your
>> testing in an --enable-cassert build.  You'll have no idea whether you
>> freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Performance of aggregates over set-returning functions

2008-03-06 Thread Bruce Momjian

OK, added to TODO:

* Reduce memory usage of aggregates in set returning functions

  http://archives.postgresql.org/pgsql-performance/2008-01/msg00031.php


---

Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > This this a bug or TODO item?
> 
> TODO, I think.  I wouldn't want to risk pushing a change in this into
> back branches.
> 
>   regards, tom lane
> 
> >> I'm not sure why it's like this.  Some digging in the CVS history shows
> >> that indeed the code used to be in the other order, and I switched it
> >> (and added the second comment block) in this old commit:
> >> 
> >> http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php
> >> 
> >> I suppose that the SQL-function support at the time required that its
> >> calling memory context be persistent until it returned ExprEndResult,
> >> but I sure don't recall any details.  It's entirely possible that that
> >> requirement no longer exists, or could easily be eliminated given all
> >> the other changes that have happened since then.  nodeFunctionscan.c
> >> seems to reset the current context for each call of a SRF, so I'd think
> >> that anything that can't cope with that should have been flushed out
> >> by now.
> >> 
> >> If you feel like poking at this, I *strongly* recommend doing your
> >> testing in an --enable-cassert build.  You'll have no idea whether you
> >> freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled.
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 12:26 PM, Josh Berkus wrote:


Dave,


Below I have two almost identical queries.  Strangely enough the one
that uses the index is slower ???


My first guess would be that records are highly correlated by DOB  
and not at

all by name.  However, it would help if you supplied both the index
definitions and what changed between the two queries to cause the  
index to be

used.


The two queries were run 2 seconds apart, there were no changes  
between. I'll get the index definitions.


Dave



--
Josh Berkus
PostgreSQL @ Sun
San Francisco

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] More shared buffers causes lower performances

2008-03-06 Thread Bruce Momjian
Tom Lane wrote:
> Greg Smith <[EMAIL PROTECTED]> writes:
> > On Wed, 26 Dec 2007, Guillaume Smet wrote:
> >> beta RPMs are by default compiled with --enable-debug and
> >> --enable-cassert which doesn't help them to fly fast...
> 
> > Got that right.  Last time I was going crazy after running pgbench with 
> > those options and not having realized what I changed, I was getting a 50% 
> > slowdown on results that way compared to without the debugging stuff. 
> > Didn't realize it scaled with shared_buffers though.
> 
> See AtEOXact_Buffers().  There are probably any number of other
> interesting scaling behaviors --- in my tests, AllocSetCheck() is
> normally a major cycle-eater if --enable-cassert is set, and that costs
> time proportional to the number of memory chunks allocated by the query.
> 
> Currently the docs say that --enable-cassert
> 
>  Enables assertion checks in the server, which test for
>  many cannot happen conditions.  This is invaluable for
>  code development purposes, but the tests slow things down a little.
> 
> Maybe we ought to put that more strongly --- s/a little/significantly/,
> perhaps?

Docs updated with attached patch, backpatched to 8.3.X.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/installation.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v
retrieving revision 1.302
diff -c -c -r1.302 installation.sgml
*** doc/src/sgml/installation.sgml	17 Feb 2008 16:36:43 -	1.302
--- doc/src/sgml/installation.sgml	6 Mar 2008 21:36:39 -
***
*** 1144,1157 
  
   Enables assertion checks in the server, which test for
   many cannot happen conditions.  This is invaluable for
!  code development purposes, but the tests slow things down a little.
   Also, having the tests turned on won't necessarily enhance the
   stability of your server!  The assertion checks are not categorized
   for severity, and so what might be a relatively harmless bug will
   still lead to server restarts if it triggers an assertion
!  failure.  Currently, this option is not recommended for
!  production use, but you should have it on for development work
!  or when running a beta version.
  
 

--- 1144,1158 
  
   Enables assertion checks in the server, which test for
   many cannot happen conditions.  This is invaluable for
!  code development purposes, but the tests can slow down the
!  server significantly.
   Also, having the tests turned on won't necessarily enhance the
   stability of your server!  The assertion checks are not categorized
   for severity, and so what might be a relatively harmless bug will
   still lead to server restarts if it triggers an assertion
!  failure.  This option is not recommended for production use, but
!  you should have it on for development work or when running a beta
!  version.
  
 


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Stephen Denne
Dave Cramer wrote:
> I have two almost identical queries.  Strangely enough the one  
> that uses the index is slower ???

The index scan is being used so that it can retrieve the rows in the name order.
It expects that if it was to retrieve every row via the index, it would get 
about 1010 rows that matched the filter, and it knows it can stop after 250, so 
assuming the matching rows are evenly distributed it thinks it can stop after 
having read only a quarter of the rows.

However only 129 rows matched. Consequently it had to read every row in the 
table anyway, seeking a fair bit as the read order was specified by the index 
rather than in sequential order, and it also had to read the index. These extra 
costs were much larger than reading the lot sequentially, and sorting 129 
resulting rows.

The first query picked a sequential scan as it thought it was only going to get 
11 results, so was expecting that the limit wasn't going to come into play, and 
that every row would have to be read anyway.

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer 
focus, and courage. This email with any attachments is confidential and may be 
subject to legal privilege.  If it is not intended for you please advise by 
reply immediately, destroy it and do not copy, disclose or use it in any way.

__
  This email has been scanned by the DMZGlobal Business Quality 
  Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 5:10 PM, Stephen Denne wrote:


Dave Cramer wrote:

I have two almost identical queries.  Strangely enough the one
that uses the index is slower ???


The index scan is being used so that it can retrieve the rows in the  
name order.
It expects that if it was to retrieve every row via the index, it  
would get about 1010 rows that matched the filter, and it knows it  
can stop after 250, so assuming the matching rows are evenly  
distributed it thinks it can stop after having read only a quarter  
of the rows.


However only 129 rows matched. Consequently it had to read every row  
in the table anyway, seeking a fair bit as the read order was  
specified by the index rather than in sequential order, and it also  
had to read the index. These extra costs were much larger than  
reading the lot sequentially, and sorting 129 resulting rows.


The first query picked a sequential scan as it thought it was only  
going to get 11 results, so was expecting that the limit wasn't  
going to come into play, and that every row would have to be read  
anyway.


The strange thing of course is that the data is exactly the same for  
both runs, the tables have not been changed between runs, and I did  
them right after another. Even more strange is that the seq scan is  
faster than the index scan.


Dave

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect,  
achievement, customer focus, and courage. This email with any  
attachments is confidential and may be subject to legal privilege.   
If it is not intended for you please advise by reply immediately,  
destroy it and do not copy, disclose or use it in any way.


__
 This email has been scanned by the DMZGlobal Business Quality
 Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Stephen Denne
> The strange thing of course is that the data is exactly the same for  
> both runs, the tables have not been changed between runs, and I did  
> them right after another. Even more strange is that the seq scan is  
> faster than the index scan.

It is not strange at all, since both queries read ALL the rows in your table, 
checking each and every row to see whether it matched your predicates.

The sequential scan read them in the order they are on the disk, meaning your 
disk didn't have to seek as much (assuming low file fragmentation).

The index scan again reads all the rows in your table, but reads them in the 
order they were in the index, which is probably quite different from the order 
that they are on the disk, so the disk had to seek a lot. In addition, it had 
to read the index.

Taking some wild guesses about the distribution of your data, I'd hazard a 
guess that this specific query could be sped up a great deal by creating an 
index on lower(firstname).

Regards,
Stephen.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer 
focus, and courage. This email with any attachments is confidential and may be 
subject to legal privilege.  If it is not intended for you please advise by 
reply immediately, destroy it and do not copy, disclose or use it in any way.

__
  This email has been scanned by the DMZGlobal Business Quality 
  Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Mark Kirkwood

Craig James wrote:


My question is: What do the other databases do that Postgres can't do, 
and why not?


Count() on Oracle and MySQL is almost instantaneous, even for very 
large tables. So why can't Postgres do what they do?




I think Mysql can only do that for the myisam engine - innodb and 
falcon  are similar to Postgres.


I don't believe Oracle optimizes bare count(*) on a table either - tho 
it may be able to use a suitable index (if present) to get the answer 
quicker.


regards

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Craig James

Tom Lane wrote:

Craig James <[EMAIL PROTECTED]> writes:

Count() on Oracle and MySQL is almost instantaneous, even for very large 
tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.


My experience doesn't match this claim.  When I ported my application from 
Oracle to Postgres, this was the single biggest performance problem.  count() 
in Oracle was always very fast.  We're not talking about a 20% or 50% 
difference, we're talking about a small fraction of a second (Oracle) versus a 
minute (Postgres) -- something like two or three orders of magnitude.

It may be that Oracle has a way to detect when there's no transaction and use a 
faster method.  If so, this was a clever optimization -- in my experience, that 
represents the vast majority of the times you want to use count().  It's not 
very useful to count the rows of a table that many apps are actively modifying 
since the result may change the moment your transaction completes.  Most of the 
time when you use count(), it's because you're the only one modifying the 
table, so the count will be meaningful.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Improve Full text rank in a query

2008-03-06 Thread b wragg
Hi all,

I'm running the following query to match a supplied text string to an actual
place name which is recorded in a table with extra info like coordinates,
etc.

SELECT ts_rank_cd(textsearchable_index_col , query, 32 /* rank/(rank+1) */)
AS rank,*
FROM gazetteer, to_tsquery('Gunbower|Island|Vic') query
WHERE query @@ textsearchable_index_col order by rank desc, concise_ga desc,
auda_alloc desc LIMIT 10

When I run this I get the following top two results:

Pos RankName
State
1   0.23769 Gunbower Island Primary School  Vic 
2   0.23769 Gunbower Island Vic

The textsearchable_index_col for each of these looks like this:

'vic':6 '':5 'gunbow':1 'island':2 'school':4 'primari':3 'victoria':7
'vic':4 '':3 'gunbow':1 'island':2 'victoria':5

I'm new to this, but I can't figure out why the "Gunbower Island Primary
School" is getting top place. How do I get the query to improve the ranking
so that an exact match (like "Gunbower|Island|Vic") gets a higher position?

Thanks,

bw




No virus found in this outgoing message.
Checked by AVG Free Edition. 
Version: 7.5.516 / Virus Database: 269.21.4/1309 - Release Date: 3/03/2008
6:50 PM
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] database design for large data.

2008-03-06 Thread shilpa.raghavendra
Hi,

I am using postgresql for application. daily i will get more
than 5,00,000 records.

i have done the partitioning of the table for each month.

while generating reports, i will do join on some other table
with the large table

it takes too much time to get the data so i am planning design
star schema for report for last month so report module directly will
pick from that stale data.

   please suggest me any other way to generate report from very
large data

Regards,
Shilpa

The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments. 

WARNING: Computer viruses can be transmitted via email. The recipient should 
check this email and any attachments for the presence of viruses. The company 
accepts no liability for any damage caused by any virus transmitted by this 
email.

www.wipro.com



Re: [PERFORM] count * performance issue

2008-03-06 Thread paul rivers

Craig James wrote:

Tom Lane wrote:

Craig James <[EMAIL PROTECTED]> writes:
Count() on Oracle and MySQL is almost instantaneous, even for very 
large tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.


My experience doesn't match this claim.  When I ported my application 
from Oracle to Postgres, this was the single biggest performance 
problem.  count() in Oracle was always very fast.  We're not talking 
about a 20% or 50% difference, we're talking about a small fraction of 
a second (Oracle) versus a minute (Postgres) -- something like two or 
three orders of magnitude.


It may be that Oracle has a way to detect when there's no transaction 
and use a faster method.  If so, this was a clever optimization -- in 
my experience, that represents the vast majority of the times you want 
to use count().  It's not very useful to count the rows of a table 
that many apps are actively modifying since the result may change the 
moment your transaction completes.  Most of the time when you use 
count(), it's because you're the only one modifying the table, so the 
count will be meaningful.


Craig




Oracle will use a btree index on a not null set of columns to do a fast 
full index scan, which can be an order of magnitude or faster compared 
to a table scan.  Also, Oracle can use a bitmap index (in cases where a 
bitmap index isn't otherwise silly) for a bitmap fast index scan/bitmap 
conversion for similar dramatic results.  

For "large" tables, Oracle is not going to be as fast as MyISAM tables 
in MySQL, even with these optimizations, since MyISAM doesn't have to 
scan even index pages to get a count(*) answer against the full table.


Paul



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Improve Full text rank in a query

2008-03-06 Thread Tom Lane
"b wragg" <[EMAIL PROTECTED]> writes:
> I'm new to this, but I can't figure out why the "Gunbower Island Primary
> School" is getting top place. How do I get the query to improve the ranking
> so that an exact match (like "Gunbower|Island|Vic") gets a higher position?

I'm new at this too, but AFAICS these are both exact matches: they have
the same matching lexemes at the same positions, so the basic rank
calculation is going to come out exactly the same.  Normalization option
32 doesn't help (as the manual notes, it's purely cosmetic).  So it's
random chance which one comes out first.

What I think you might want is one of the other normalization options,
so that shorter documents are preferred.  Either 1, 2, 8, or 16 would
do fine for this simple example --- which one you want depends on just
how heavily you want to favor shorter documents.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time

2008-03-06 Thread Vlad Arkhipov

I've came across this issue while writing report-like query for 2 not
very large tables. I've tried several methods to resolve this one (see
below). But now I'm really stuck...
PostgreSQL 8.3, default configuration

There are 2 tables (structure was simplified to show only problematic
place):
create table c
(
 id bigint primary key
 cdate date
);

create index c_cdate_idx on c (cdate);

create table i
(
 id bigint primary key,
 id_c bigint references c(id)
);

select count(*) from c

count

636 565

select count(*) from i

count

4 646 145

analyze i;
analyze c;

explain analyze
select id
from c
 join i on i.idc = c.id
where c.cdate between '2007-02-01' and '2007-02-16'

QUERY
PLAN 




- 



Merge Join  (cost=738.95..57864.63 rows=14479 width=8) (actual
time=13954.681..14358.731 rows=14583
loops=1)
 Merge Cond: (i.idc =
c.id) 



 ->  Index Scan using fki_i_c_fk on i  (cost=0.00..194324.34
rows=4646145 width=8) (actual time=17.254..12061.414 rows=1042599 loops=1)
 ->  Sort  (cost=738.94..756.88 rows=7178 width=8) (actual
time=53.942..75.013 rows=14583
loops=1)
   Sort Key:
c.id 



   Sort Method:  quicksort  Memory:
404kB 



   ->  Index Scan using c_cdate_idx on c  (cost=0.00..279.21
rows=7178 width=8) (actual time=23.595..41.470 rows=7064 loops=1)
 Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <=
'2007-02-16'::date))
Total runtime: 14379.461
ms 




set enable_mergejoin to off;
set enable_hashjoin to off;

QUERY
PLAN 



-- 



Nested Loop  (cost=0.00..59833.70 rows=14479 width=8) (actual
time=0.129..153.038 rows=14583
loops=1)
 ->  Index Scan using  c_cdate_idx on c  (cost=0.00..279.21 rows=7178
width=8) (actual time=0.091..14.468 rows=7064 loops=1)
   Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <=
'2007-02-16'::date))
 ->  Index Scan using fki_i_c_fk on i  (cost=0.00..8.13 rows=13
width=8) (actual time=0.007..0.011 rows=2 loops=7064)
   Index Cond: (i.idc =
c.id) 



Total runtime: 172.599 ms

Ok, the first problem is here:
 ->  Index Scan using fki_i_c_fk on i  (cost=0.00..8.13 rows=13
width=8) (actual time=0.007..0.011 rows=2 loops=7064)

I collected statistics for these tables at level 1000 for all columns.

select attname, null_frac, avg_width, n_distinct, correlation
from pg_stats
where tablename = 'i'

attname null_frac   avg_width n_distinct
correlation
--  --    -
--
id 0   8 -1 0,849796295166
idc0,7236369848251343  8 95583  0,999763011932373

Nice stats except of n_distinct for idc column.

select count(distinct idc)
from i

count

633 864

Of course it is not correct solution but...

update pg_statistic
set stadistinct = 633864
where starelid = ... and staattnum = ...

Reconnect and execute:

explain analyze
select id
from c
 join i on i.idc = c.id
where c.cdate between '2007-02-01' and '2007-02-16'

QUERY
PLAN 



-- 



Nested Loop  (cost=0.00..57342.39 rows=14479 width=8) (actual
time=0.133..151.426 rows=14583
loops=1)
 ->  Index Scan using c_cdate_idx on c  (cost=0.00..279.21 rows=7178
width=8) (actual time=0.094..14.242 rows=7064 loops=1)
   Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <=
'2007-02-16'::date))
 ->  Index Scan using fki_i_c_fk on i  (cost=0.00..7.92 rows=2 width=8)
(actual time=0.007..0.011 rows=2 loops=7064)
   Index Cond: (i.idc =
c.id) 



Total runtime: 170.911
ms 




But the reason of this issue is not the incorrect value of n_distinct.
Let's expand dates interval in WHERE clause.


explain analyze
select id
from c
 join i on i.idc = c.id
where c.cdate between '2007-02-01' and '2007-02-19'

QUERY
PLAN 



 



Merge Join  (cost=831.16..57981.98 rows=16155 width=8) (actual
time=11691.156..12155.201 rows=16357
loops=1)
 Merge Cond: (i.idc =
c.id) 



 ->  Index Scan using fki_i_c_fk on i  (cost=0.00..194324.34
rows=4646145 width=8) (actual time=22.236..9928.489 rows=1044373 loops=1)
 ->  Sort  (cost=831.15..851.17 rows=8009 width=8) (actual
time=31.660..55.277 rows=16357
loops=1)
   Sort Key:
c.id 



   Sort Method:  quicksort  Memory:
438kB 



   ->  Index Scan using c_cdate_idx on c  (cost=0.00..311.87
rows=8009 width=8) (actual time=0.116..17.050 rows=7918 loops=1)
 Index Cond: ((cdate >= '2007-02-01'::date) AND (cdat

Re: [PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time

2008-03-06 Thread Tom Lane
Vlad Arkhipov <[EMAIL PROTECTED]> writes:
> I've came across this issue while writing report-like query for 2 not
> very large tables. I've tried several methods to resolve this one (see
> below). But now I'm really stuck...

It looks like you are wishing to optimize for all-in-memory situations,
in which case the traditional advice is to reduce random_page_cost to
something close to 1.  AFAICS all the rowcount estimates you're seeing
are spot on, or as close to spot on as you could realistically hope for,
and so the problem lies with the cost parameters.  Fooling with the
statistics is not going to help if the rowcount estimates are already
good.

(Note: the apparent undercounts you're seeing on indexscans on the outer
side of a mergejoin seem to be because the mergejoin terminates early
due to limited range of the other input join key.  The planner is
expecting this, as we can see because the predicted cost of the join is
actually much less than the predicted cost of running the input
indexscan to completion.  The cost ratio is about consistent with the
rowcount ratio, which makes me think it got these right too.)

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance