[PERFORM] Dump performance problems following server crash
Hello Performance,

Yesterday, with help from the admin list, we took care of a problem we were having with creating tables/views/indexes following a server overheat & crash (an index on pg_attribute was corrupted, causing the create to hang and go through the roof on memory usage until it failed out - a reindex fixed it like a charm). Following the repair of creates (we also took the system into single-user mode to run reindex system), and with no other known problems with the db itself, we immediately began a dump of the database.

Typical dump time: ~12 hours; we dump overnight, but there is still decently heavy activity then. However, this dump has the box to itself, and after 10 hours we are only about 20% done just with pulling schema for the indexes - something that typically takes 4-6 hours to complete for all schema entirely. Load on the machine is minimal, along with memory usage by the dump process itself (864M, not large for this system). It is definitely moving, but just very slowly. At this point we are attempting to determine if this is a machine-level problem (which we haven't seen sign of yet) or still a potential problem in postgres. The dump is currently doing I/O at 10mbps, but in testing our sys admin reports he has no problem getting stronger I/O stats from other processes.

The current dump query running:

SELECT t.tableoid, t.oid, t.relname as indexname,
       pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
       t.relnatts as indnkeys, i.indkey, i.indisclustered,
       c.contype, c.conname, c.tableoid as contableoid, c.oid as conoid,
       (SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace
FROM pg_catalog.pg_index i
JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid)
LEFT JOIN pg_catalog.pg_depend d ON (d.classid = t.tableoid AND d.objid = t.oid AND d.deptype = 'i')
LEFT JOIN pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid = c.oid)
WHERE i.indrelid = '44240'::pg_catalog.oid
ORDER BY indexname

Amount of time it took me to run the query from console: ~5 secs (I'm counting in my head, sophisticated, eh?)

We tend to respond to slow queries with vacs and analyzes, but considering these are system tables that have recently been reindexed, how likely is it that we could improve things by doing a vac? At this point we plan to halt the dump and run a vac full on the db, but if you have any ideas as to why the dump is sluggish in getting this information, I'd appreciate them.

Spec info:
Postgres 8.1.4
db size: 200+ GB
101,745 tables
314,821 indexes
1,569 views
maintenance_work_mem = 262144

Server:
OS: Solaris 10
Sunfire X4100 XL
2x AMD Opteron Model 275 dual core procs
8GB of RAM
(this server overheated & crashed due to a cooling problem at the hosting service)

On top of a:
Sun StorEdge 6130
14x 146GB drives in a RAID 5
Brocade 200E switches
Emulex 4Gb HBAs
(this storage had no known problems)

Thanks in advance,
Kim Hatcher <http://www.myemma.com>
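[Editorial note: since the corruption here was on system-catalog indexes, the repair path is a catalog reindex. The following is only a hedged sketch of the commands involved, not commands from the post; "mydb" is a placeholder, and whether REINDEX SYSTEM also reaches the shared catalogs is exactly the question raised in the follow-up below.]

-- Sketch only: reindex the catalogs that pg_dump's index-schema query touches.
-- Run as a superuser; for suspected corruption the server is normally started
-- in single-user mode so the damaged indexes are not used during the repair.
REINDEX TABLE pg_catalog.pg_class;
REINDEX TABLE pg_catalog.pg_index;
REINDEX TABLE pg_catalog.pg_depend;
REINDEX TABLE pg_catalog.pg_constraint;
REINDEX TABLE pg_catalog.pg_attribute;
-- Or reindex every system catalog of the current database in one step:
REINDEX SYSTEM mydb;   -- "mydb" is a placeholder database name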
Re: [PERFORM] Dump performance problems following server crash
We dropped into single-user mode and ran REINDEX SYSTEM - it was my understanding this would reindex them all, including shared catalogs - but perhaps not?

Kim

Tom Lane wrote:

Kim <[EMAIL PROTECTED]> writes:

The current dump query running: SELECT t.tableoid, t.oid, t.relname as indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid = t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid = c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname

Amount of time it took me to run the query from console: ~5 secs (I'm counting in my head, sophisticated, eh?)

Even 5 seconds is way too long. You've apparently still got something corrupted somewhere. Did you reindex *all* the system catalogs?

regards, tom lane

--
*kim hatcher* senior developer, emma®
e: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
p: 800 595 4401
w: www.myemma.com <http://www.myemma.com>
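[Editorial note: for context, a hedged sketch (not from the thread) of how the shared catalogs can be listed and, if there is any doubt they were covered by REINDEX SYSTEM, reindexed individually.]

-- List the shared catalogs (the same set is visible from any database):
SELECT relname FROM pg_catalog.pg_class WHERE relisshared AND relkind = 'r';
-- Any of them can then be reindexed explicitly, e.g.:
REINDEX TABLE pg_catalog.pg_database;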
[PERFORM] unusual performance for vac following 8.2 upgrade
Hello all!

Running a vac from an 8.2 client connecting to an 8.2 database (the following example was for a 14-row static table) - vacuums will sit (for lack of a better word) for anywhere from ten to twenty minutes before taking a lock out on the table and doing work there. Once the vacuum does complete, I noticed that no timestamp is registered for that relation in the last_vacuum column of pg_stat_all_tables (however, analyze does seem to set its timestamp). I asked it to run a vacuum on an index (knowing it would fail out), and again, the vacuum sat for several minutes before finally erroring out saying that it couldn't vacuum an index. Out of curiosity I tried the vacuum from an 8.1 client connected to the 8.2 db: same delay.

In running a truss on the process while it is running, there are over five minutes where the process seems to be scanning pg_class (at least that's the only table listed in pg_locks for this process). Following this it drops into a cycle of doing the same send() command with several seconds of lag between each one, and every so often it catches the same interrupt (SIGUSR1) and then goes back into the same cycle of send() calls. Also, whatever it is doing during this stage, it isn't checking for process-cancel interrupts, as the process won't recognize it's been requested to cancel until it breaks out of this cycle of send()s and SIGUSR1s (which can go on for another several minutes). I'm happy to send along the gore of the truss call if you think it would be helpful...

Any ideas what the vac is prepping for that it could become bogged down in before finally taking the lock on the table? Is the lack of a timestamp set for last_vacuum in pg_stat_all_tables an indication that there may be something incomplete about our install?

Since the upgrade, we've also seen unusual lag time in simple inserts into tables (atomic inserts have been seen running for several seconds), and also extreme delays in running \d on tables (I got tired of counting past 2 minutes; connecting with an 8.1 client gives immediate response on this command). We plan to upgrade to 8.2.1 as soon as possible, and also to drop into single-user mode and run a reindex system, but any suggestions in the meantime as to a potential cause or a way to further debug the vacs would be greatly appreciated.

OS: Solaris 10
write transactions/hr: 1.5 million
size of pg_class: 535,226
number of relations: 108,694

Thanks to all,
Kim
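[Editorial note: a sketch of the kind of checks that go with this report, observed from a second session; 'some_static_table' is a placeholder, not a table from the post, and the column/view names are the 8.2-era ones.]

-- Was a vacuum (or analyze) timestamp recorded for the table? (new columns in 8.2)
SELECT relname, last_vacuum, last_autovacuum, last_analyze
  FROM pg_stat_all_tables
 WHERE relname = 'some_static_table';   -- placeholder name

-- What locks does the VACUUM backend actually hold while it "sits"?
SELECT l.pid, l.relation::regclass, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
 WHERE a.current_query LIKE 'VACUUM%';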
Re: [PERFORM] unusual performance for vac following 8.2 upgrade
Hey Tom,

We were running on 8.1.1 previous to upgrading to 8.2, and yes, we definitely have a hefty pg_class. The inheritance model is heavily used in our schema (the results of the group by you wanted to see are down below). However, no significant problems were seen with vacs while we were on 8.1. Execution time for the vac seemed more linked to large table size and how active the table was with updates, rather than being universally over 10 minutes regardless of the vac's object. We will be doing an audit of our 8.2 install to try and make sure that it looks like a complete install; any tests you can think of that may further narrow things down for us?

 relkind | count
---------+--------
 v       |   1740
 t       |  49986
 c       |      4
 S       |     57
 r       | 108689
 i       | 374723
(6 rows)

Tom Lane wrote:

Kim <[EMAIL PROTECTED]> writes:

size of pg_class: 535,226
number of relations: 108,694

Oh, I shoulda read all the way to the bottom of your email :-(. What version of PG were you running before? I would think that pretty much any version of pgstat_vacuum_tabstats would have had a performance issue with pg_class that large. Also, could we see

select relkind, count(*) from pg_class group by relkind;

regards, tom lane
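[Editorial note: since the bulk of a pg_class like this comes from inheritance children and their indexes, one hedged way to see where the entries concentrate (a sketch, not a query from the thread) is to count children per parent.]

-- Which inheritance parents contribute the most child tables?
SELECT inhparent::regclass AS parent, count(*) AS children
  FROM pg_inherits
 GROUP BY inhparent
 ORDER BY children DESC
 LIMIT 20;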
Re: [PERFORM] unusual performance for vac following 8.2 upgrade
For 8.1, we did have stats_block_level and stats_row_level on, so that's not it either :-/

However, I did go on to an alternate database of ours on the same machine, using the same install and same postmaster - one that holds primarily static relations, and not many of those (16 relations total). The response of running a vac for a 1.3k static table was quick (6 seconds - but it still did not set the last_vacuum field). Not sure why we weren't seeing more problems with this on 8.1 for the full db, but from the looks of things I think your theory on the primary problem with our vacs is solid. I'm hoping we can fire up our old 8.1 dataset and run some tests on there to confirm/reject the idea that it was doing any better, but that will require quieter times on the machine than we've got right now :)

We are going to try and upgrade to 8.2.1 as soon as we can, and if we continue to see some of the other problems I mentioned as side notes, we'll build some information on those and pass it along...

Thanks so much!
Kim

Tom Lane wrote:

Kim <[EMAIL PROTECTED]> writes:

We were running on 8.1.1 previous to upgrading to 8.2, and yes, we definitely have a hefty pg_class. The inheritance model is heavily used in our schema (the results of the group by you wanted to see are down below). However, no significant problems were seen with vacs while we were on 8.1.

Odd, because the 8.1 code looks about the same, and it is perfectly obvious in hindsight that its runtime is about O(N^2) in the number of relations :-(. At least that'd be the case if the stats collector output were fully populated. Did you have either stats_block_level or stats_row_level turned on in 8.1? If not, maybe the reason for the change is that in 8.2, that table *will* be pretty fully populated, because now it's got a last-vacuum-time entry that gets made even if the stats are otherwise turned off. Perhaps making that non-disablable wasn't such a hot idea :-(.

What I think we need to do about this is

(1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking of using a hash table for the OIDs instead of a linear list. Should be a pretty small change; I'll work on it today.

(2) Reconsider whether last-vacuum-time should be sent to the collector unconditionally.

Comments from hackers?

regards, tom lane
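[Editorial note: for reference, a sketch of how the collector settings under discussion can be checked; these GUCs exist in the 8.1/8.2 era and were removed in later releases.]

-- Current stats-collector settings (8.1/8.2-era GUCs):
SELECT name, setting
  FROM pg_settings
 WHERE name IN ('stats_start_collector', 'stats_block_level', 'stats_row_level');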
Re: [PERFORM] unusual performance for vac following 8.2 upgrade
Our pgstat.stat file is 40M for 8.2; on 8.1 it was 33M. Our schema size hasn't grown *that* much in the two weeks since we upgraded.

I'm not sure if this sheds any more light on the situation, but in scanning down through the process output from truss, it looks like the first section of output was a large chunk of reads on pgstat.stat, followed by a larger chunk of reads on the global directory and directories under base - this whole section probably went on for a good 6-7 minutes, though I would say the reads on pgstat likely finished within a couple of minutes or so. Following this there was a phase where it did a lot of seeks and reads on files under pg_clog, and it was while doing this (or perhaps it had finished whatever it wanted with the clogs) that it dropped into the send()/SIGUSR1 loop that goes on for another several minutes.

Kim

Tom Lane wrote:

I wrote:

(2) Reconsider whether last-vacuum-time should be sent to the collector unconditionally.

Actually, now that I look, the collector already contains this logic:

/*
 * Don't create either the database or table entry if it doesn't already
 * exist. This avoids bloating the stats with entries for stuff that is
 * only touched by vacuum and not by live operations.
 */

and ditto for analyze messages. So my idea that the addition of last-vac-time was causing an increase in the statistics file size compared to 8.1 seems wrong. How large is your $PGDATA/global/pgstat.stat file, anyway?

regards, tom lane
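[Editorial note: as an aside, and only as a hedged sketch rather than part of the exchange, the stats file size Tom asks about can also be read from inside an 8.1/8.2 database, since the file lives under $PGDATA in these releases.]

-- Superuser-only: size and timestamps of the stats collector file.
SELECT size, modification
  FROM pg_stat_file('global/pgstat.stat');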
Re: [HACKERS] [PERFORM] unusual performance for vac following 8.2
Hello again Tom,

We have our upgrade to 8.2.1 scheduled for this weekend, and we noticed your message regarding the vacuum patch being applied to 8.2 and back-patched. I expect I know the answer to this next question :) but I was wondering if the patch referenced below has also been bundled into the normal source download of 8.2.1 or if we would still need to manually apply it?

- Fix a performance problem in databases with large numbers of tables (or other types of pg_class entry): the function pgstat_vacuum_tabstat, invoked during VACUUM startup, had runtime proportional to the number of stats table entries times the number of pg_class rows; in other words O(N^2) if the stats collector's information is reasonably complete. Replace list searching with a hash table to bring it back to O(N) behavior. Per report from kim at myemma.com. Back-patch as far as 8.1; 8.0 and before use different coding here.

Thanks,
Kim

Tom Lane wrote:

I wrote:

What I think we need to do about this is (1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking of using a hash table for the OIDs instead of a linear list. Should be a pretty small change; I'll work on it today.

I've applied the attached patch to 8.2 to do the above. Please give it a try and see how much it helps for you. Some limited testing here confirms a noticeable improvement in VACUUM startup time at 1 tables, and of course it should be 100X worse with 10 tables. I am still confused why you didn't see the problem in 8.1, though. This code is just about exactly the same in 8.1. Maybe you changed your stats collector settings when moving to 8.2?

regards, tom lane

Index: pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140
diff -c -r1.140 pgstat.c
*** pgstat.c  21 Nov 2006 20:59:52 -  1.140
--- pgstat.c  11 Jan 2007 22:32:30 -
***************
*** 159,164 ****
--- 159,165 ----
  static void pgstat_read_statsfile(HTAB **dbhash, Oid onlydb);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
+ static HTAB *pgstat_collect_oids(Oid catalogid);
  static void pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype);
  static void pgstat_send(void *msg, int len);
***************
*** 657,666 ****
  void
  pgstat_vacuum_tabstat(void)
  {
!     List       *oidlist;
!     Relation    rel;
!     HeapScanDesc scan;
!     HeapTuple   tup;
      PgStat_MsgTabpurge msg;
      HASH_SEQ_STATUS hstat;
      PgStat_StatDBEntry *dbentry;
--- 658,664 ----
  void
  pgstat_vacuum_tabstat(void)
  {
!     HTAB       *htab;
      PgStat_MsgTabpurge msg;
      HASH_SEQ_STATUS hstat;
      PgStat_StatDBEntry *dbentry;
***************
*** 679,693 ****
      /*
       * Read pg_database and make a list of OIDs of all existing databases
       */
!     oidlist = NIL;
!     rel = heap_open(DatabaseRelationId, AccessShareLock);
!     scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
!     while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
!     {
!         oidlist = lappend_oid(oidlist, HeapTupleGetOid(tup));
!     }
!     heap_endscan(scan);
!     heap_close(rel, AccessShareLock);

      /*
       * Search the database hash table for dead databases and tell the
--- 677,683 ----
      /*
       * Read pg_database and make a list of OIDs of all existing databases
       */
!     htab = pgstat_collect_oids(DatabaseRelationId);

      /*
       * Search the database hash table for dead databases and tell the
***************
*** 698,709 ****
      {
          Oid         dbid = dbentry->databaseid;

!         if (!list_member_oid(oidlist, dbid))
              pgstat_drop_database(dbid);
      }

      /* Clean up */
!     list_free(oidlist);

      /*
       * Lookup our own database entry; if not found, nothing more to do.
--- 688,701 ----
      {
          Oid         dbid = dbentry->databaseid;

!         CHECK_FOR_INTERRUPTS();
!
!         if (hash_search(htab, (void *) &dbid, HASH_FIND, NULL) == NULL)
              pgstat_drop_database(dbid);
      }

      /* Clean up */
!     hash_destroy(htab);

      /*
       * Lookup our own database entry; if not found, nothing more to do.
***************
*** 717,731 ****
      /*
       * Similarly to above, make a list of all known relations in this DB.
       */
!     oidlist = NIL;
!     rel = heap_open(RelationRelationId, AccessShareLock);
!     scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
!     while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
!     {
!         oidlist = lappend_oid(oidlist, HeapTupleGetOid(tup));
[PERFORM] full outer performance problem
Hi, I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible? I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_, with indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN. When I specify the query as: select temp_max_60min,temp_dry_at_2m from station s natural join temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs) where s.wmo_id=6065 and timeobs='2004-1-1 0:0:0' and '2004-1-1 0:0:0' between s.startdate and s.enddate; I get the correct results, BUT LOUSY performance, and the following explain: Nested Loop Left Join (cost=5.84..163484.08 rows=1349 width=12) (actual time=66146.815..119005.381 rows=1 loops=1) Filter: (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone) -> Hash Join (cost=5.84..155420.24 rows=1349 width=16) (actual time=8644.449..110836.038 rows=109826 loops=1) Hash Cond: ("outer".station_id = "inner".station_id) -> Seq Scan on temp_dry_at_2m a (cost=0.00..120615.94 rows=6956994 width=16) (actual time=0.024..104548.515 rows=6956994 loops=1) -> Hash (cost=5.84..5.84 rows=1 width=4) (actual time=0.114..0.114 rows=0 loops=1) -> Index Scan using wmo_idx on station (cost=0.00..5.84 rows=1 width=4) (actual time=0.105..0.108 rows=1 loops=1) Index Cond: ((wmo_id = 6065) AND ('2004-01-01 00:00:00'::timestamp without time zone >= startdate) AND ('2004-01-01 00:00:00'::timestamp without time zone <= enddate)) -> Index Scan using temp_max_60min_idx on temp_max_60min b (cost=0.00..5.96 rows=1 width=20) (actual time=0.071..0.071 rows=0 loops=109826) Index Cond: (("outer".station_id = b.station_id) AND ("outer".timeobs = b.timeobs)) Total runtime: 119005.499 ms (11 rows) If I change the query to (and thus negates the full outer join): select temp_max_60min,temp_dry_at_2m from station s natural join temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs) where s.wmo_id=6065 and a.timeobs='2004-1-1 0:0:0' and b.timeobs='2004-1-1 0:0:0' and '2004-1-1 0:0:0' between s.startdate and s.enddate; I get wrong results (In the case where one of the records is missing in one of the tables), BUT GOOD performance, and this query plan: Nested Loop (cost=0.00..17.83 rows=1 width=12) (actual time=79.221..79.236 rows=1 loops=1) -> Nested Loop (cost=0.00..11.82 rows=1 width=24) (actual time=65.517..65.526 rows=1 loops=1) -> Index Scan using wmo_idx on station (cost=0.00..5.83 rows=1 width=4) (actual time=0.022..0.026 rows=1 loops=1) Index Cond: ((wmo_id = 6065) AND ('2004-01-01 00:00:00'::timestamp without time zone >= startdate) AND ('2004-01-01 00:00:00'::timestamp without time zone <= enddate)) -> Index Scan using temp_max_60min_idx on temp_max_60min b (cost=0.00..5.97 rows=1 width=20) (actual time=65.483..65.486 rows=1 loops=1) Index Cond: (("outer".station_id = b.station_id) AND (b.timeobs = '2004-01-01 00:00:00'::timestamp without time zone)) -> Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m a (cost=0.00..6.00 rows=1 width=16) (actual time=13.694..13.698 rows=1 loops=1) Index Cond: (("outer".station_id = a.station_id) AND (a.timeobs = '2004-01-01 00:00:00'::timestamp without time zone)) Total runtime: 79.340 ms (9 rows) If further info like EXPLAIN VERBOSE is useful please say so and I will provide it. Thanks in advance! Kim Bisgaard.
Re: [PERFORM] full outer performance problem
Hi, Look for my comments further down... John A Meinel wrote: Kim Bisgaard wrote: Hi, I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible? I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_, with indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN. When I specify the query as: select temp_max_60min,temp_dry_at_2m from station s natural join temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs) where s.wmo_id=6065 and timeobs='2004-1-1 0:0:0' and '2004-1-1 0:0:0' between s.startdate and s.enddate; I get the correct results, BUT LOUSY performance, and the following explain: Nested Loop Left Join (cost=5.84..163484.08 rows=1349 width=12) (actual time=66146.815..119005.381 rows=1 loops=1) Filter: (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone) -> Hash Join (cost=5.84..155420.24 rows=1349 width=16) (actual time=8644.449..110836.038 rows=109826 loops=1) Well, the estimate here is quite a bit off. It thinks you will be getting 1349 (which is probably why it picked a nested loop plan), but then it is getting 109826 rows. I'm guessing it is misunderstanding the selectivity of the timeobs column. I think you are right.. Hash Cond: ("outer".station_id = "inner".station_id) -> Seq Scan on temp_dry_at_2m a (cost=0.00..120615.94 rows=6956994 width=16) (actual time=0.024..104548.515 rows=6956994 loops=1) -> Hash (cost=5.84..5.84 rows=1 width=4) (actual time=0.114..0.114 rows=0 loops=1) -> Index Scan using wmo_idx on station (cost=0.00..5.84 rows=1 width=4) (actual time=0.105..0.108 rows=1 loops=1) Index Cond: ((wmo_id = 6065) AND ('2004-01-01 00:00:00'::timestamp without time zone >= startdate) AND ('2004-01-01 00:00:00'::timestamp without time zone <= enddate)) -> Index Scan using temp_max_60min_idx on temp_max_60min b (cost=0.00..5.96 rows=1 width=20) (actual time=0.071..0.071 rows=0 loops=109826) Index Cond: (("outer".station_id = b.station_id) AND ("outer".timeobs = b.timeobs)) Total runtime: 119005.499 ms (11 rows) I think the bigger problem is that a full outer join says grab all rows, even if they are null. What about this query: SELECT temp_max_60min,temp_dry_at_2m FROM (station s LEFT JOIN temp_dry_at_2m a USING (station_id, timeobs) LEFT JOIN temp_max_60min b USING (station_id, timeobs) where s.wmo_id=6065 and timeobs='2004-1-1 0:0:0' and '2004-1-1 0:0:0' between s.startdate and s.enddate; This works very well, and gives the correct result - thanks!! After that, you should probably have a multi-column index on (station_id, timeobs), which lets postgres use just that index for the lookup, rather than using an index and then a filter. (Looking at your next query you might already have that index). Yes I have. If I change the query to (and thus negates the full outer join): This is the same query, I think you messed up your copy and paste. Nope. 
Changed "and timeobs='2004-1-1 0:0:0' " to "and a.timeobs='2004-1-1 0:0:0' and b.timeobs='2004-1-1 0:0:0' " select temp_max_60min,temp_dry_at_2m from station s natural join temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs) where s.wmo_id=6065 and _a.timeobs='2004-1-1 0:0:0' and b._timeobs='2004-1-1 0:0:0' and '2004-1-1 0:0:0' between s.startdate and s.enddate; I get wrong results (In the case where one of the records is missing in one of the tables), BUT GOOD performance, and this query plan: Nested Loop (cost=0.00..17.83 rows=1 width=12) (actual time=79.221..79.236 rows=1 loops=1) -> Nested Loop (cost=0.00..11.82 rows=1 width=24) (actual time=65.517..65.526 rows=1 loops=1) -> Index Scan using wmo_idx on station (cost=0.00..5.83 rows=1 width=4) (actual time=0.022..0.026 rows=1 loops=1) Index Cond: ((wmo_id = 6065) AND ('2004-01-01 00:00:00'::timestamp without time zone >= startdate) AND ('2004-01-01 00:00:00'::timestamp without time zone <= enddate)) -> Index Scan using temp_max_60min_idx on temp_max_60min b (cost=0.00..5.97 rows=1 width=20) (actual time=65.483..65.486 rows=1 loops=1) Index Cond: (("outer".station_id = b.station_id) AND (b.timeobs = '2004-01-01 00:00:00'::timestamp without time zone)) -> Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m a (cost=0.00..6.00 rows=1 width=16) (actual time=13.6
Re: [PERFORM] full outer performance problem
Sorry for not listing the exact layout of temp_:

obsdb=> \d temp_dry_at_2m
              Table "public.temp_dry_at_2m"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 obshist_id     | integer                     | not null
 station_id     | integer                     | not null
 timeobs        | timestamp without time zone | not null
 temp_dry_at_2m | real                        | not null
Indexes:
    "temp_dry_at_2m_pkey" primary key, btree (obshist_id)
    "temp_dry_at_2m_idx" btree (station_id, timeobs)

The difference between the two queries is that if a (station_id, timeobs) row is missing in one table, then the first returns one record (null, 9.3) while the second returns no records.

Regards,
Kim Bisgaard.

Tom Lane wrote:

Kim Bisgaard <[EMAIL PROTECTED]> writes:

I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_, with indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN.

I'm confused. If the columns are NOT NULL, why isn't this a valid transformation of your original query?

select temp_max_60min,temp_dry_at_2m
from station s natural join temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs)
where s.wmo_id=6065
and a.timeobs='2004-1-1 0:0:0' and b.timeobs='2004-1-1 0:0:0'
and '2004-1-1 0:0:0' between s.startdate and s.enddate;

Seems like it's not eliminating any rows that would otherwise succeed.

regards, tom lane

--
Kim Bisgaard
Computer Department                 Phone: +45 3915 7562 (direct)
Danish Meteorological Institute     Fax: +45 3915 7460 (division)
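[Editorial note: to make the semantic difference concrete, here is a small self-contained illustration using hypothetical t_a/t_b tables (not the tables from the thread) of why filtering on one side's own columns collapses the FULL OUTER JOIN.]

-- A row present only in t_a:
CREATE TEMP TABLE t_a (id int NOT NULL, obs int NOT NULL, val_a real NOT NULL);
CREATE TEMP TABLE t_b (id int NOT NULL, obs int NOT NULL, val_b real NOT NULL);
INSERT INTO t_a VALUES (1, 1, 9.3);

-- Filtering on the merged USING columns keeps the outer row: returns (9.3, NULL).
SELECT val_a, val_b
  FROM t_a FULL OUTER JOIN t_b USING (id, obs)
 WHERE id = 1 AND obs = 1;

-- Filtering on t_b's columns discards rows where the t_b side is all-NULL:
-- returns no rows, which is why the planner can treat it as a non-full join.
SELECT val_a, val_b
  FROM t_a FULL OUTER JOIN t_b USING (id, obs)
 WHERE t_b.id = 1 AND t_b.obs = 1;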
[PERFORM] full outer performance problem
Hi,

I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible?

I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_, with unique indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN. When I specify the query as:

SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b USING (station_id, timeobs)
WHERE station_id = 52981
AND timeobs = '2004-1-1 0:0:0'

I get the correct results

 station_id |       timeobs       | temp_grass | temp_dry_at_2m
------------+---------------------+------------+----------------
      52944 | 2004-01-01 00:10:00 |            |           -1.1
(1 row)

BUT LOUSY performance, and the following EXPLAIN:

 Merge Full Join  (cost=1542369.83..1618958.58 rows=6956994 width=32) (actual time=187176.408..201436.264 rows=1 loops=1)
   Merge Cond: (("outer".station_id = "inner".station_id) AND ("outer".timeobs = "inner".timeobs))
   Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone))
   ->  Sort  (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual time=145748.253..153851.607 rows=6956994 loops=1)
         Sort Key: a.station_id, a.timeobs
         ->  Seq Scan on temp_dry_at_2m a  (cost=0.00..117549.94 rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 loops=1)
   ->  Sort  (cost=334456.38..340472.11 rows=2406292 width=16) (actual time=31668.876..34491.123 rows=2406292 loops=1)
         Sort Key: b.station_id, b.timeobs
         ->  Seq Scan on temp_grass b  (cost=0.00..40658.92 rows=2406292 width=16) (actual time=0.052..5484.489 rows=2406292 loops=1)
 Total runtime: 201795.989 ms
(10 rows)

If I change the query (note the "b."s)

explain analyse SELECT b.station_id, b.timeobs, temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b USING (station_id, timeobs)
WHERE b.station_id = 52981
AND b.timeobs = '2004-1-1 0:0:0'

I seem to destroy the FULL OUTER JOIN and get wrong results (nothing). If I had happened to use "a.", and not "b.", I would have gotten correct results (by accident). The "a." variant gives this EXPLAIN:

 Nested Loop Left Join  (cost=0.00..11.97 rows=1 width=20) (actual time=0.060..0.067 rows=1 loops=1)
   ->  Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m a  (cost=0.00..5.99 rows=1 width=16) (actual time=0.033..0.036 rows=1 loops=1)
         Index Cond: ((station_id = 52981) AND (timeobs = '2004-01-01 00:00:00'::timestamp without time zone))
   ->  Index Scan using temp_grass_idx on temp_grass b  (cost=0.00..5.96 rows=1 width=16) (actual time=0.018..0.021 rows=1 loops=1)
         Index Cond: (("outer".station_id = b.station_id) AND ("outer".timeobs = b.timeobs))
 Total runtime: 0.140 ms
(6 rows)

Why will PostgreSQL not use the same plan for both these queries - they are virtually identical??

I have tried to formulate the problem with left joins, but this demands that I know which table has all the values (and thus has to go first), and in practice no such table exists.

TIA,
Kim Bisgaard.
Re: [PERFORM] full outer performance problem
Hi Bruno, Thanks for the moral support! I feel so too - but I am confident it will show up soon. W.r.t. your rewrite of the query, I get this "ERROR: could not devise a query plan for the given query" but no further details - I will try google Regards, Kim. Bruno Wolff III wrote: On Wed, Jun 08, 2005 at 11:37:40 +0200, Kim Bisgaard <[EMAIL PROTECTED]> wrote: Hi, I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible? I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_, with unique indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN. When I specify the query as: SELECT station_id, timeobs,temp_grass, temp_dry_at_2m FROM temp_dry_at_2m a FULL OUTER JOIN temp_grass b USING (station_id, timeobs) WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0' I get the correct results station_id | timeobs | temp_grass | temp_dry_at_2m +-++ 52944 | 2004-01-01 00:10:00 || -1.1 (1 row) BUT LOUSY performance, and the following EXPLAIN: QUERY PLAN -- Merge Full Join (cost=1542369.83..1618958.58 rows=6956994 width=32) (actual time=187176.408..201436.264 rows=1 loops=1) Merge Cond: (("outer".station_id = "inner".station_id) AND ("outer".timeobs = "inner".timeobs)) Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone)) -> Sort (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual time=145748.253..153851.607 rows=6956994 loops=1) Sort Key: a.station_id, a.timeobs -> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 loops=1) -> Sort (cost=334456.38..340472.11 rows=2406292 width=16) (actual time=31668.876..34491.123 rows=2406292 loops=1) Sort Key: b.station_id, b.timeobs -> Seq Scan on temp_grass b (cost=0.00..40658.92 rows=2406292 width=16) (actual time=0.052..5484.489 rows=2406292 loops=1) Total runtime: 201795.989 ms (10 rows) Someone else will need to comment on why Postgres can't use a more efficient plan. What I think will work for you is to restrict the station_id and timeobs on each side and then do a full join. You can try something like the sample query below (which hasn't been tested): SELECT station_id, timeobs, temp_grass, temp_dry_at_2m FROM (SELECT station_id, timeobs, temp_dry_at_2m FROM temp_dry_at_2m WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a FULL OUTER JOIN (SELECT station_id, timeobs, temp_grass FROM temp_grass WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b USING (station_id, timeobs) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] full outer performance problem
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Kim Bisgaard <[EMAIL PROTECTED]> writes:
> > SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
> > FROM temp_dry_at_2m a
> > FULL OUTER JOIN temp_grass b
> > USING (station_id, timeobs)
> > WHERE station_id = 52981
> > AND timeobs = '2004-1-1 0:0:0'
>
> > explain analyse SELECT b.station_id, b.timeobs,temp_grass, temp_dry_at_2m
> > FROM temp_dry_at_2m a
> > FULL OUTER JOIN temp_grass b
> > USING (station_id, timeobs)
> > WHERE b.station_id = 52981
> > AND b.timeobs = '2004-1-1 0:0:0'
>
> > Why will PostgreSQL not use the same plan for both these queries - they
> > are virtually identical??
>
> Because they're semantically completely different. The second query is
> effectively a RIGHT JOIN, because join rows in which b is all-null will
> be thrown away by the WHERE. The optimizer sees this (note your second
> plan doesn't use a Full Join step anywhere) and is able to produce a
> much better plan. Full outer join is difficult to optimize, in part
> because we have no choice but to use a merge join for it --- the other
> join types don't support full join.
>
> regards, tom lane

Yes, I am aware that they are not "identical" - they also give different results - but the data necessary to compute the results is tiny (0-2 rows, 0-1 row from each table), and thus they ideally have the potential for similar performance - to my head anyway, but I may not have grasped the complete picture yet :-)

Regards,
Kim.
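[Editorial note: the workaround Bruno suggests elsewhere in the thread, restated here as a hedged sketch using the tables from the post, keeps the full-join semantics while letting each side use its (station_id, timeobs) index, because the constants are pushed into each input before the join. As noted further down, the poster's 7.4.1 rejected this with "could not devise a query plan"; 7.4.3 and later handle it.]

SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
  FROM (SELECT station_id, timeobs, temp_dry_at_2m
          FROM temp_dry_at_2m
         WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a
  FULL OUTER JOIN
       (SELECT station_id, timeobs, temp_grass
          FROM temp_grass
         WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b
  USING (station_id, timeobs);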
Re: [PERFORM] full outer performance problem
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Kim Bisgaard <[EMAIL PROTECTED]> writes:
> > W.r.t. your rewrite of the query, I get this "ERROR: could not devise a
> > query plan for the given query" but no further details - I will try google
>
> Which PG version are you using again? That should be fixed in 7.4.3
> and later.
>
> regards, tom lane

It's 7.4.1. I am in the process (it may take a while yet) of installing 8.0.3 on the same hardware in order to have a parallel system. Time is a finite measure :-)

I must admit I would rather have the first query perform than have this workaround function ;-)

Regards,
Kim.
[PERFORM] Any effect on other processes' performance after vacuumdb finishes?
Hi.

First, many thanks for your work on PostgreSQL. I am wondering why, after executing 'vacuumdb -z', some other processes cannot read their own message queues for 2-3 minutes. The vacuum is executed every hour, and the processes have no relation to PostgreSQL. Is that possible?
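[Editorial note: no PostgreSQL version is given in this report, but if the slowdown is simply vacuum saturating the disks, one common mitigation (a hedged suggestion, not from the original message) is cost-based vacuum delay, available since 8.0. Values below are illustrative only.]

-- Throttle vacuum's I/O so concurrent processes keep responsive disks.
SET vacuum_cost_delay = 20;    -- sleep 20 ms each time the cost budget is used up
SET vacuum_cost_limit = 200;   -- default cost budget per cycle
VACUUM ANALYZE;                -- equivalent of "vacuumdb -z" for the current database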
[PERFORM] Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
Hi All I have a query where the planner makes a wrong cost estimate, it looks like it underestimates the cost of a "Bitmap Heap Scan" compared to an "Index Scan". This it the two plans, I have also pasted them below: Slow (189ms): http://explain.depesz.com/s/2Wq Fast (21ms): http://explain.depesz.com/s/ThQ I have run "VACUUM FULL VERBOSE ANALYZE". I have configured shared_buffers and effective_cache_size, that didn't solve my problem, the estimates was kept the same and both queries got faster. What can I do to fix the cost estimate? Regards, Kim Hansen yield=> SELECT version(); version --- PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit (1 row) yield=> explain analyze select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN --- Sort (cost=38564.80..38564.80 rows=2 width=6) (actual time=188.987..189.003 rows=221 loops=1) Sort Key: pol Sort Method: quicksort Memory: 35kB -> HashAggregate (cost=38564.77..38564.79 rows=2 width=6) (actual time=188.796..188.835 rows=221 loops=1) -> Bitmap Heap Scan on filtered_demands (cost=566.23..38503.77 rows=24401 width=6) (actual time=6.501..182.634 rows=18588 loops=1) Recheck Cond: (pod = 'VELAG'::text) -> Bitmap Index Scan on filtered_demands_pod_pol_idx (cost=0.00..560.12 rows=24401 width=0) (actual time=4.917..4.917 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Total runtime: 189.065 ms (9 rows) yield=> set enable_bitmapscan = false; SET yield=> explain analyze select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN -- Group (cost=0.00..76534.33 rows=2 width=6) (actual time=0.028..20.823 rows=221 loops=1) -> Index Scan using filtered_demands_pod_pol_idx on filtered_demands (cost=0.00..76473.33 rows=24401 width=6) (actual time=0.027..17.174 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Total runtime: 20.877 ms (4 rows) yield=> -- Kim Rydhof Thor Hansen Vadgårdsvej 3, 2. tv. 2860 Søborg Phone: +45 3091 2437 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
On Thu, Apr 5, 2012 at 17:34, Kevin Grittner wrote: > Kim Hansen wrote: > >> I have a query where the planner makes a wrong cost estimate, it >> looks like it underestimates the cost of a "Bitmap Heap Scan" >> compared to an "Index Scan". > >> What can I do to fix the cost estimate? > > Could you try running the query with cpu_tuple_cost = 0.05 and let > us know how that goes? > It looks like it just increased the estimated cost of both queries by about 1000. Regards, Kim === yield=> explain analyze select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN --- Sort (cost=39540.92..39540.92 rows=2 width=6) (actual time=186.833..186.858 rows=221 loops=1) Sort Key: pol Sort Method: quicksort Memory: 35kB -> HashAggregate (cost=39540.81..39540.91 rows=2 width=6) (actual time=186.643..186.678 rows=221 loops=1) -> Bitmap Heap Scan on filtered_demands (cost=566.23..39479.81 rows=24401 width=6) (actual time=6.154..180.654 rows=18588 loops=1) Recheck Cond: (pod = 'VELAG'::text) -> Bitmap Index Scan on filtered_demands_pod_pol_idx (cost=0.00..560.12 rows=24401 width=0) (actual time=4.699..4.699 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Total runtime: 186.912 ms (9 rows) yield=> set enable_bitmapscan = false; SET yield=> explain analyze select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN ------ Group (cost=0.00..77510.37 rows=2 width=6) (actual time=0.029..20.361 rows=221 loops=1) -> Index Scan using filtered_demands_pod_pol_idx on filtered_demands (cost=0.00..77449.37 rows=24401 width=6) (actual time=0.027..16.859 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Total runtime: 20.410 ms (4 rows) yield=> -- Kim Rydhof Thor Hansen Vadgårdsvej 3, 2. tv. 2860 Søborg Phone: +45 3091 2437 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
Hi all On Fri, Apr 6, 2012 at 19:11, Jeff Janes wrote: > On Wed, Apr 4, 2012 at 6:47 AM, Kim Hansen wrote: >> Hi All >> >> I have a query where the planner makes a wrong cost estimate, it looks >> like it underestimates the cost of a "Bitmap Heap Scan" compared to an >> "Index Scan". >> >> This it the two plans, I have also pasted them below: >> Slow (189ms): http://explain.depesz.com/s/2Wq >> Fast (21ms): http://explain.depesz.com/s/ThQ > > Could you do explain (analyze, buffers)? I have done that now, the log is pasted in below. It looks like every buffer fetched is a hit, I would think that PostgreSQL should know that as almost nothing happens on the server and effective_cache_size is configured to 8GB. > Did you run these queries multiple times in both orders? If you just > ran them once each, in the order indicated, then the bitmap scan may > have done the hard work of reading all the needed buffers into cache, > and the index scan then got to enjoy that cache. I have run the queries a few times in order to warm up the caches, the queries stabilise on 20ms and 180ms. Regards, Kim yield=> explain (analyze,buffers) select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN --- Sort (cost=38564.80..38564.80 rows=2 width=6) (actual time=185.497..185.520 rows=221 loops=1) Sort Key: pol Sort Method: quicksort Memory: 35kB Buffers: shared hit=14969 -> HashAggregate (cost=38564.77..38564.79 rows=2 width=6) (actual time=185.303..185.343 rows=221 loops=1) Buffers: shared hit=14969 -> Bitmap Heap Scan on filtered_demands (cost=566.23..38503.77 rows=24401 width=6) (actual time=6.119..179.056 rows=18588 loops=1) Recheck Cond: (pod = 'VELAG'::text) Buffers: shared hit=14969 -> Bitmap Index Scan on filtered_demands_pod_pol_idx (cost=0.00..560.12 rows=24401 width=0) (actual time=4.661..4.661 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Buffers: shared hit=74 Total runtime: 185.577 ms (13 rows) yield=> set enable_bitmapscan = false; SET yield=> explain (analyze,buffers) select "filtered_demands"."pol" as "c0" from "demands"."filtered_demands" as "filtered_demands" where ("filtered_demands"."pod" = 'VELAG') group by "filtered_demands"."pol" order by "filtered_demands"."pol" ASC NULLS LAST; QUERY PLAN -- Group (cost=0.00..76534.33 rows=2 width=6) (actual time=0.029..20.202 rows=221 loops=1) Buffers: shared hit=18386 -> Index Scan using filtered_demands_pod_pol_idx on filtered_demands (cost=0.00..76473.33 rows=24401 width=6) (actual time=0.027..16.455 rows=18588 loops=1) Index Cond: (pod = 'VELAG'::text) Buffers: shared hit=18386 Total runtime: 20.246 ms (6 rows) -- Kim Rydhof Thor Hansen Vadgårdsvej 3, 2. tv. 2860 Søborg Phone: +45 3091 2437 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
On Tue, Apr 10, 2012 at 04:59, Jeff Janes wrote:
> On Fri, Apr 6, 2012 at 3:09 PM, Kim Hansen wrote:
>
>> I have run the queries a few times in order to warm up the caches, the queries stabilise on 20ms and 180ms.
>
> My first curiosity is not why the estimate is too good for Bitmap Index Scan, but rather why the actual execution is too poor. As far as I can see the only explanation for the poor execution is that the bitmap scan has gone lossy, so that every tuple in every touched block needs to be rechecked against the where clause. If that is the case, it suggests that your work_mem is quite small.
>
> In 9.2, explain analyze will report the number of tuples filtered out by rechecking, but that isn't reported in your version.
>
> It looks like the planner makes no attempt to predict when a bitmap scan will go lossy and then penalize it for the extra rechecks it will do. Since it doesn't know it will be carrying out those extra checks, you can't just increase the tuple or operator cost factors.

You are right: when I increase work_mem from 1MB to 2MB, the time decreases from 180ms to 30ms for the slow query. I have now configured the server with 10MB work_mem.

> So that may explain why the bitmap is not getting penalized for its extra CPU time. But that doesn't explain why the estimated cost is substantially lower than the index scan. That is probably because the bitmap scan expects it is doing more sequential IO and less random IO. You could cancel that advantage by setting random_page_cost to about the same as seq_page_cost (which, since you indicated most data will be cached, would be an appropriate thing to do regardless of this specific issue).

I have set seq_page_cost and random_page_cost to 0.1 in order to indicate that data is cached; the system now selects the faster index scan.

Thanks for your help,

--
Kim Rydhof Thor Hansen
Vadgårdsvej 3, 2. tv.
2860 Søborg
Phone: +45 3091 2437
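[Editorial note: for completeness, a hedged sketch of how the settings that resolved this thread could be made persistent. "yield" is the database name taken from the session prompts; the values are the ones reported above and should be re-validated on other hardware.]

-- Persist the tuning from this thread instead of setting it per session.
ALTER DATABASE yield SET work_mem = '10MB';
ALTER DATABASE yield SET seq_page_cost = 0.1;
ALTER DATABASE yield SET random_page_cost = 0.1;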
[PERFORM] LIMIT on partitioned-table!?
0:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_unitstat_y2010m12_nodeid_gps_ts (cost=0.00..127.55 rows=4684 width=0) (actual time=225.009..225.009 rows=2014 loops=1) Index Cond: ((nodeid)::text = 'abcd'::text) -> Bitmap Heap Scan on unitstat_y2010m10 unitstat (cost=101.74..9686.81 rows=4987 width=194) (actual time=488.130..35826.742 rows=25279 loops=1) Recheck Cond: ((nodeid)::text = 'abcd'::text) Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_unitstat_y2010m10_nodeid_gps_ts (cost=0.00..100.49 rows=4988 width=0) (actual time=472.796..472.796 rows=25279 loops=1) Index Cond: ((nodeid)::text = 'abcd'::text) -> Bitmap Heap Scan on unitstat_y2010m09 unitstat (cost=489.56..49567.74 rows=27466 width=194) (actual time=185.198..12753.315 rows=31099 loops=1) Recheck Cond: ((nodeid)::text = 'abcd'::text) Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_unitstat_y2010m09_nodeid_gps_ts (cost=0.00..482.69 rows=27472 width=0) (actual time=158.072..158.072 rows=31099 loops=1) Index Cond: ((nodeid)::text = 'abcd'::text) -> Index Scan using fki_unitstat_y2010m08_nodeid_ts_fkey on unitstat_y2010m08 unitstat (cost=0.00..9353.76 rows=4824 width=194) (actual time=31.351..10259.090 rows=17606 loops=1) Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Index Scan using fki_unitstat_y2010m07_nodeid_ts_fkey on unitstat_y2010m07 unitstat (cost=0.00..8686.72 rows=4492 width=194) (actual time=41.572..9636.335 rows=9511 loops=1) Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Heap Scan on unitstat_y2010m06 unitstat (cost=311.50..32142.18 rows=17406 width=194) (actual time=113.857..12136.570 rows=17041 loops=1) Recheck Cond: ((nodeid)::text = 'abcd'::text) Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_unitstat_y2010m06_nodeid_gps_ts (cost=0.00..307.15 rows=17410 width=0) (actual time=91.638..91.638 rows=17041 loops=1) Index Cond: ((nodeid)::text = 'abcd'::text) -> Index Scan using fki_unitstat_y2010m05_nodeid_ts_fkey on unitstat_y2010m05 unitstat (cost=0.00..11942.82 rows=6279 width=193) (actual time=62.264..19887.675 rows=19246 loops=1) Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Index Scan using fki_unitstat_y2010m04_nodeid_ts_fkey on unitstat_y2010m04 unitstat (cost=0.00..11840.93 rows=6194 width=193) (actual time=52.735..17302.361 rows=21936 loops=1) Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Index Scan using fki_unitstat_y2010m03_nodeid_ts_fkey on unitstat_y2010m03 unitstat (cost=0.00..11664.36 rows=6101 width=194) (actual time=66.613..17541.374 rows=15471 loops=1) Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 
15:00:00'::timestamp without time zone)) Total runtime: 205855.569 ms Regards, Kim -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] LIMIT on partitioned-table!?
Thank you Shaun, removing the ORDER BY worked. But I am afraid to ask this: how can I order by partition? It seems that the planner has picked a random(!?) order of partitions to select from. The returned records, from the selected partition, are correctly sorted by the index though.

On 2011-02-15 15:49, Shaun Thomas wrote:
> On 02/15/2011 08:23 AM, Kim A. Brandt wrote:
>> does `postgres (PostgreSQL) 8.4.5' use the LIMIT of a query when it is run on a partitioned table or am I doing something wrong? It looks as if postgres queries all partitions and then LIMITs the records afterwards!? This results in a long-running (>3 minutes) query. What can I do to optimise this?
>
> Make sure you have constraint_exclusion set to 'on' in your config. Also, what are your checks for your partitions? You've got a pretty wide range in your 'ts' checks, so if you're using them as your partition definition, you're not helping yourself.

The parameter `constraint_exclusion' is set to `partition'. Postgres is on FreeBSD. My checks (if I understand you right) are as follows:

CREATE TABLE flexserver.unitstat_y2011m02 (
    ts timestamp without time zone NOT NULL,
    nodeid character varying(10) NOT NULL,
    gps_ts timestamp without time zone NOT NULL,
    ...
    CONSTRAINT unitstat_y2011m02_ts_check CHECK (ts >= '2011-02-01 00:00:00'::timestamp without time zone AND ts < '2011-03-01 00:00:00'::timestamp without time zone)
) INHERITS (flexserver.unitstat);

Each partition is constrained to one month. About the wide range, I am aware of that. This probably has to change anyway!? So the current (and probably final) solution is to use a narrower search range. Thank you for the hint.

> The main issue might just be that you've used an order clause. LIMIT 1000 or not, even if it can restrict the result set based on your CHECK criteria, it'll still need to select every matching row from every matched partition, order the results, and chop off the first 1000.

That was it. Just how can one order by partition if one would do a wide-range search over multiple partitions?
The new query and EXPLAIN ANALYSE-output is: SELECT * FROM flexserver.unitstat WHERE nodeid = 'abcd' AND ts > '2010-01-01 00:00:00' AND ts < '2011-02-15 15:00:00' --ORDER BY nodeid, ts LIMIT 1000; Limit (cost=0.00..1862.46 rows=1000 width=194) (actual time=2.569..18.948 rows=1000 loops=1) -> Result (cost=0.00..225611.08 rows=121136 width=194) (actual time=2.566..15.412 rows=1000 loops=1) -> Append (cost=0.00..225611.08 rows=121136 width=194) (actual time=2.558..11.243 rows=1000 loops=1) -> Seq Scan on unitstat (cost=0.00..14.90 rows=1 width=258) (actual time=0.003..0.003 rows=0 loops=1) Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone) AND ((nodeid)::text = 'abcd'::text)) -> Bitmap Heap Scan on unitstat_y2011m01 unitstat (cost=116.47..8097.17 rows=4189 width=194) (actual time=2.550..7.701 rows=1000 loops=1) Recheck Cond: ((nodeid)::text = 'abcd'::text) Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_unitstat_y2011m01_nodeid_gps_ts (cost=0.00..115.42 rows=4190 width=0) (actual time=1.706..1.706 rows=5377 loops=1) Index Cond: ((nodeid)::text = 'abcd'::text) -> Bitmap Heap Scan on unitstat_y2011m02 unitstat (cost=52.92..3744.97 rows=1934 width=194) (never executed) Recheck Cond: ((nodeid)::text = 'abcd'::text) Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_unitstat_y2011m02_nodeid_gps_ts (cost=0.00..52.44 rows=1935 width=0) (never executed) Index Cond: ((nodeid)::text = 'abcd'::text) -> Index Scan using fki_unitstat_y2010m02_nodeid_ts_fkey on unitstat_y2010m02 unitstat (cost=0.00..10179.11 rows=5257 width=193) (never executed) Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) -> Index Scan using fki_unitstat_y2010m01_nodeid_ts_fkey on unitstat_y2010m01 unitstat (cost=0.00..1032
Re: [PERFORM] LIMIT on partitioned-table!?
Thank you Marti, I will go with the ``reduced number of matched rows'' approach and will be waiting for postgres 9.1 expectantly.

Kind regards,
Kim

On 2011-02-15 22:13, Marti Raudsepp wrote:
> On Tue, Feb 15, 2011 at 21:33, Kim A. Brandt wrote:
>> removing the ORDER BY worked. But I am afraid to ask this. How can I order by partition? It seems that the planner has picked a random(!?) order of partitions to select from. The returned records, from the selected partition, are correctly sorted by the index though.
>
> If a single query accesses more than one partition, PostgreSQL currently cannot read the values in index-sorted order. Hence with ORDER BY and LIMIT, PostgreSQL cannot return *any* results before it has read all matching rows and then sorted them. Adding a LIMIT doesn't help much. Your only bet is to reduce the number of matched rows, or make sure that you only access a single partition.
>
> Increasing work_mem may speed up the sort step if you're hitting the disk (EXPLAIN ANALYZE VERBOSE will tell you whether that's the case).
>
> This will change in PostgreSQL 9.1 which has a new Merge Append plan node.
>
> Regards,
> Marti
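[Editorial note: putting Marti's two suggestions together, a hedged sketch using the table from this thread; the date bounds are illustrative. Constraining ts to a single month lets constraint exclusion prune the scan to one partition, so far fewer rows have to be fetched and sorted before the LIMIT applies.]

-- Only the y2011m01 partition satisfies this ts range, so the sort feeding
-- ORDER BY ... LIMIT only sees that partition's matching rows.
SELECT *
  FROM flexserver.unitstat
 WHERE nodeid = 'abcd'
   AND ts >= '2011-01-01 00:00:00'
   AND ts <  '2011-02-01 00:00:00'
 ORDER BY nodeid, ts
 LIMIT 1000;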