[PERFORM] Slow performance with no apparent reason
I am in the process of adding PostgreSQL support to an application, in addition to Oracle and MS SQL. I am using PostgreSQL version 7.3.2 on Red Hat 9.0, on an Intel Pentium III board.

I have a query that generally looks like this:

    SELECT t1.col1, t2.col1
    FROM t1, t2
    WHERE t1.x = t2.y AND t2.p = 'string' AND t2.q = 1

This query is strikingly slow: about 100 sec when both t1 and t2 have about 1,200 records, compared with less than 4 sec on MS SQL and Oracle. The strange thing is that if I remove either of the last two conditions (it doesn't matter which one), I get the same performance as with the other databases. Since in this particular case both conditions (t2.p='string', t2.q=1) are not required, I can't understand why having both makes the query so slow. A query on table t2 alone is fast with or without the two conditions.

I tried several alternatives; this one works pretty well:

    SELECT t1.col1, t2.col1
    FROM t1, t2
    WHERE t1.x = t2.y
      AND EXISTS (
        SELECT * FROM t2 t2a
        WHERE t2a.p = 'string' AND t2a.q = 1 AND t2a.y = t2.y
      )

Since the first query is simpler than the second, this looks like a bug to me. Please advise.

Yonatan
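[A sketch of the diagnostics a report like this usually needs, using the table and column names from the query above; the psql \d commands simply show the definitions the planner is working with:]

    EXPLAIN ANALYZE
    SELECT t1.col1, t2.col1
    FROM t1, t2
    WHERE t1.x = t2.y AND t2.p = 'string' AND t2.q = 1;

    \d t1
    \d t2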
Re: [PERFORM] Slow performance with no apparent reason
I ran ANALYZE and the problem is resolved. Thanks.

> Yonatan Goraly wrote on Sun, 26.10.2003 at 00:25:
> > I am in the process of adding PostgreSQL support to an application, in
> > addition to Oracle and MS SQL.
> > I am using PostgreSQL version 7.3.2 on Red Hat 9.0, on an Intel Pentium III board.
> >
> > I have a query that generally looks like this:
> >
> > SELECT t1.col1, t2.col1 FROM t1, t2 WHERE t1.x=t2.y AND t2.p='string' AND t2.q=1
> >
> > This query is strikingly slow (about 100 sec when both t1 and t2 have
> > about 1,200 records, compared with less than 4 sec on MS SQL and Oracle)
>
> Always send the results of EXPLAIN ANALYZE if you ask for help on [PERFORM].
> Knowing which indexes you have would also help. And you should have run
> ANALYZE too.
>
> - Hannu
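[For reference, a minimal sketch of the fix described above, assuming the tables are named t1 and t2 as in the original query:]

    -- refresh the planner's statistics so its row estimates become realistic
    ANALYZE t1;
    ANALYZE t2;

    -- or reclaim dead space and update statistics in one pass
    VACUUM ANALYZE t1;
    VACUUM ANALYZE t2;

    -- then re-check the plan
    EXPLAIN ANALYZE
    SELECT t1.col1, t2.col1
    FROM t1, t2
    WHERE t1.x = t2.y AND t2.p = 'string' AND t2.q = 1;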
[PERFORM] Various performance questions
Hi,

We're in the process of setting up a new database server. The application is an online RSS aggregator, which you can see at www.fastbuzz.com (still running on the old hardware). The new machine is a dual Xeon with 2 GB of RAM; the OS is FreeBSD 4.9.

    shared_buffers = 1
    sort_mem = 32768
    effective_cache_size = 25520    -- FreeBSD formula: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is correct. 25520*8 = 204160, or about 200 MB. On a machine with 2 GB it seems like the number should be more like 1-1.5 GB.

2. The main performance challenges are with the items table, which has around five million rows and grows at a rate of more than 100,000 rows a day. If I do a select count(*) from the items table it takes 55-60 seconds to execute. I find it interesting that it takes that long whether it's doing it the first time and fetching the pages from disk, or on subsequent requests where it fetches the pages from memory. I know that it's not touching the disks because I'm running iostat in a different window. Here's the explain analyze:

    explain analyze select count(*) from items;
                                   QUERY PLAN
    --------------------------------------------------------------------------
     Aggregate  (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
       ->  Seq Scan on items  (cost=0.00..233100.62 rows=4910762 width=0) (actual time=0.054..30220.641 rows=4910762 loops=1)
     Total runtime: 55246.129 ms
    (3 rows)

and the number of pages:

    select relpages from pg_class where relname = 'items';
     relpages
    ----------
       183993

So does it make sense that it would take close to a minute to count the 5 million rows even if all pages are in memory?

3. relpages is 183993, so the file size should be 183993*8192 = 1507270656, roughly 1.5 GB. The actual disk size is 1073741824, or roughly 1 GB. Why the difference?

4. If I put a certain filter/condition on the query it tells me that it's doing a sequential scan, and yet it takes less time than the full sequential scan:

    explain analyze select count(*) from items where channel < 5000;
                                   QUERY PLAN
    --------------------------------------------------------------------------
     Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
       ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
             Filter: (channel < 5000)
     Total runtime: 26224.703 ms

How can it do a sequential scan and apply a filter to it in less time than the full sequential scan? Is it actually using an index without really telling me?

Here's the structure of the items table:

        Column     |           Type           | Modifiers
    ---------------+--------------------------+-----------
     articlenumber | integer                  | not null
     channel       | integer                  | not null
     title         | character varying        |
     link          | character varying        |
     description   | character varying        |
     comments      | character varying(500)   |
     dtstamp       | timestamp with time zone |
     signature     | character varying(32)    |
     pubdate       | timestamp with time zone |
    Indexes:
     "item_channel_link" btree (channel, link)
     "item_created" btree (dtstamp)
     "item_signature" btree (signature)
     "items_channel_article" btree (channel, articlenumber)
     "items_channel_tstamp" btree (channel, dtstamp)

5. Any other comments/suggestions on the above setup?

Thanks,

Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com
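[A sketch of the arithmetic behind questions 1 and 3, written as queries so the numbers are easy to reproduce; the literal 8192 assumes the default 8 kB block size:]

    -- question 1: effective_cache_size is counted in 8 kB pages
    SELECT 25520 * 8 AS kb;         -- 204160 kB, i.e. roughly 200 MB, as noted above

    -- question 3: the planner's idea of the table size, in bytes
    SELECT relpages * 8192 AS approx_bytes, reltuples
    FROM pg_class
    WHERE relname = 'items';

[On the question 3 discrepancy: relpages is only refreshed by VACUUM/ANALYZE, and PostgreSQL splits tables larger than 1 GB into 1 GB segment files, so a measured size of exactly 1073741824 bytes may be just the first segment.]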
Re: [PERFORM] Various performance questions
Dror Matalon <[EMAIL PROTECTED]> writes:

> explain analyze select count(*) from items where channel < 5000;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
>    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
>          Filter: (channel < 5000)
>  Total runtime: 26224.703 ms
>
> How can it do a sequential scan and apply a filter to it in less time
> than the full sequential scan? Is it actually using an index without
> really telling me?

It's not using the index, and not telling you.

It's possible the count(*) operator itself is taking some time. Postgres doesn't have to call it on the rows that don't match the where clause. How long does "explain analyze select 1 from items" take, with and without the where clause?

What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an int8 to store its count, so it's not limited to 4 billion records. Unfortunately int8 is somewhat inefficient, as it has to be dynamically allocated repeatedly. It's possible it's making a noticeable difference, especially with all the pages in cache, though I'm a bit surprised. There's some thought about optimizing this in 7.5.

--
greg
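[A sketch of the comparison being suggested, using the items table from the earlier post:]

    EXPLAIN ANALYZE SELECT 1 FROM items;
    EXPLAIN ANALYZE SELECT 1 FROM items WHERE channel < 5000;

    -- for contrast, the aggregate versions
    EXPLAIN ANALYZE SELECT count(*) FROM items;
    EXPLAIN ANALYZE SELECT count(*) FROM items WHERE channel < 5000;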
[PERFORM] Duplicate in pg_user table
Hi,

Currently we are running PostgreSQL v7.3.2 on Red Hat Linux 9.0. We have Windows 2000 client machines inserting records into the PostgreSQL tables via the PostgreSQL ODBC driver v7.3.0100.

After a few weeks of usage, when we do a \d at the SQL prompt, there is a duplicate object name in the same schema; it can be a duplicate row for an index or a table. When we do a \d table_name, it shows a duplication of the column names inside the table. We discovered that the schema entry in the pg_user table was duplicated as well, which causes pg_dump to fail.

Thank you. Regards.
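[A sketch of the kind of catalog queries that can confirm what \d is showing; these only detect duplicate names, they don't repair anything:]

    -- duplicate user names
    SELECT usename, count(*)
    FROM pg_user
    GROUP BY usename
    HAVING count(*) > 1;

    -- duplicate object names within the same schema
    SELECT relnamespace, relname, count(*)
    FROM pg_class
    GROUP BY relnamespace, relname
    HAVING count(*) > 1;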
Re: [PERFORM] Various performance questions
On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
> Dror Matalon <[EMAIL PROTECTED]> writes:
>
> > explain analyze select count(*) from items where channel < 5000;
> >                                QUERY PLAN
> > --------------------------------------------------------------------------
> >  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
> >    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
> >          Filter: (channel < 5000)
> >  Total runtime: 26224.703 ms
> >
> > How can it do a sequential scan and apply a filter to it in less time
> > than the full sequential scan? Is it actually using an index without
> > really telling me?
>
> It's not using the index and not telling you.
>
> It's possible the count(*) operator itself is taking some time. Postgres

I find it hard to believe that the actual counting would take a significant amount of time.

> doesn't have to call it on the rows that don't match the where clause. How
> long does "explain analyze select 1 from items" with and without the where
> clause take?

Same as count(*). Around 55 secs with no where clause, around 25 secs with.

> What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an

This is 7.4.

> int8 to store its count so it's not limited to 4 billion records.
> Unfortunately int8 is somewhat inefficient as it has to be dynamically
> allocated repeatedly. It's possible it's making a noticeable difference,
> especially with all the pages in cache, though I'm a bit surprised. There's
> some thought about optimizing this in 7.5.

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com
Re: [PERFORM] Various performance questions
[EMAIL PROTECTED] (Dror Matalon) wrote:
> On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
>> Dror Matalon <[EMAIL PROTECTED]> writes:
>>
>> > explain analyze select count(*) from items where channel < 5000;
>> >                                QUERY PLAN
>> > --------------------------------------------------------------------------
>> >  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
>> >    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
>> >          Filter: (channel < 5000)
>> >  Total runtime: 26224.703 ms
>> >
>> > How can it do a sequential scan and apply a filter to it in less time
>> > than the full sequential scan? Is it actually using an index without
>> > really telling me?
>>
>> It's not using the index and not telling you.
>>
>> It's possible the count(*) operator itself is taking some time. Postgres
>
> I find it hard to believe that the actual counting would take a
> significant amount of time.

Most of the time involves:

a) Reading each page of the table, and
b) Figuring out which records on those pages are still "live."

What work were you thinking was involved in doing the counting?

>> doesn't have to call it on the rows that don't match the where clause. How
>> long does "explain analyze select 1 from items" with and without the where
>> clause take?
>
> Same as count(*). Around 55 secs with no where clause, around 25 secs
> with.

Good; at least that's consistent...

--
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/postgresql.html
Signs of a Klingon Programmer #2: "You question the worthiness of my code? I should kill you where you stand!"
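[When an exact figure isn't required, a commonly used shortcut is to read the planner's row estimate instead of scanning the whole table; it is only as fresh as the last VACUUM or ANALYZE:]

    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relname = 'items';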
Re: [PERFORM] slow select
Vivek Khera wrote:
> > "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:
>
> JB> Actually, what OS's can't use all idle ram for kernel cache? I
> JB> should note that in my performance docs
>
> FreeBSD. Limited by the value of "sysctl vfs.hibufspace" from what I
> understand. This value is set at boot based on available RAM and some
> other tuning parameters.

Actually, I have wanted to ask this question for a long time. Can we have guidelines on how to set effective_cache_size for various OSs?

Linux is pretty simple: everything free is buffer cache. FreeBSD is not so straightforward, but there is a sysctl. How about HP-UX, Solaris and AIX? The other BSDs? And, most importantly, Windows?

That could add much value to the tuning guide, couldn't it?

Shridhar
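[A sketch of the FreeBSD case mentioned above; the sysctl value below is made up for illustration, and the parameter is expressed in 8 kB pages:]

    -- suppose `sysctl vfs.hibufspace` reports 209059840 (bytes)
    -- 209059840 / 8192 = 25520 pages
    SET effective_cache_size = 25520;   -- or put the same value in postgresql.conf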
Re: [PERFORM] Slow performance with no apparent reason
Yonatan Goraly wrote on Sun, 26.10.2003 at 00:25:
> I am in the process of adding PostgreSQL support to an application,
> in addition to Oracle and MS SQL.
> I am using PostgreSQL version 7.3.2 on Red Hat 9.0, on an Intel Pentium III
> board.
>
> I have a query that generally looks like this:
>
> SELECT t1.col1, t2.col1 FROM t1, t2 WHERE t1.x=t2.y AND t2.p='string'
> AND t2.q=1
>
> This query is strikingly slow (about 100 sec when both t1 and t2 have
> about 1,200 records, compared with less than 4 sec on MS SQL and
> Oracle)

Always send the results of EXPLAIN ANALYZE if you ask for help on [PERFORM]. Knowing which indexes you have would also help. And you should have run ANALYZE too.

- Hannu
[PERFORM] explicit casting required for index use
Here's the basic issue: PostgreSQL doesn't use indexes unless a query criterion is of exactly the same type as the index type. This occurs even when a cast would enable the use of an index and greatly improve performance. I understand that casting is needed to use an index and will therefore affect performance -- the part I don't understand is why PostgreSQL doesn't automatically cast query arguments to the column type, thereby enabling indexes on that column.

I have a table that looks like this (extra cols, indexes, and fk constraints removed):

    [EMAIL PROTECTED] \d paprospect2
        Table "unison.paprospect2"
      Column   |  Type   | Modifiers
    -----------+---------+-----------
     pseq_id   | integer | not null
     run_id    | integer | not null
     pmodel_id | integer | not null
     svm       | real    |
    Indexes: paprospect2_search1 btree (pmodel_id, run_id, svm),

I often search for pseq_ids based on all of pmodel_id, run_id, and an svm threshold, as below; hence the multi-column index.

Without an explicit cast of the svm criterion:

    [EMAIL PROTECTED] explain select pseq_id from paprospect2 where pmodel_id=8210 and run_id=1 and svm>=11;
     Index Scan using paprospect2_search2 on paprospect2  (cost=0.00..43268.93 rows=2 width=4)
       Index Cond: ((pmodel_id = 8210) AND (run_id = 1))
       Filter: (svm >= 11::double precision)

And with an explicit cast to real (the same as the column type and indexed type):

    [EMAIL PROTECTED] explain select pseq_id from paprospect2 where pmodel_id=8210 and run_id=1 and svm>=11::real;
     Index Scan using paprospect2_search1 on paprospect2  (cost=0.00..6.34 rows=2 width=4)
       Index Cond: ((pmodel_id = 8210) AND (run_id = 1) AND (svm >= 11::real))

Note two things above: 1) the explicit cast greatly reduces the predicted (and actual) cost, and 2) the uncast query eventually casts svm to double precision, which seems odd since the column itself is real (that is, it eventually does cast, but to the "wrong" type).

For small queries (returning ~10 rows), this is worth 100x in speed (9 ms vs. 990 ms -- in absolute terms, no big deal). For larger result sets (~200 rows), I've seen more like 1000x speed increases from using an explicit cast. For the larger queries, this can mean seconds versus many minutes.

Having to explicitly cast criteria is very non-intuitive. Moreover, it seems quite straightforward that PostgreSQL might incorporate casts (and perhaps even function calls like upper() for functional indexes) into its query strategy optimization. (I suppose functional indexes would apply only to immutable functions, but that's fine.)

Thanks,
Reece

--
Reece Hart, Ph.D.  [EMAIL PROTECTED], http://www.gene.com/
Genentech, Inc.  650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93  http://www.in-machina.com/~reece/
South San Francisco, CA 94080-4990  [EMAIL PROTECTED], GPG: 0x25EC91A0
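[For reference, a sketch of the workarounds implied above, against the paprospect2 table from the post. The explicit cast is the form already shown to work; the quoted-literal variant is an assumption on my part about how the untyped literal resolves and should be verified with EXPLAIN:]

    -- cast the literal to the column's type so the index on svm qualifies
    SELECT pseq_id FROM paprospect2
    WHERE pmodel_id = 8210 AND run_id = 1 AND svm >= 11::real;

    -- assumption: a quoted literal stays untyped until it meets svm,
    -- so it should resolve to real rather than double precision
    SELECT pseq_id FROM paprospect2
    WHERE pmodel_id = 8210 AND run_id = 1 AND svm >= '11';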