[PERFORM] Forcing WAL flush
Is there a way to force a WAL flush so that async commits (from other connections) are flushed, short of actually updating a sacrificial row? It would be nice to do it without generating anything extra, even if it is something that causes IO in the checkpoint. Am I right to think that an empty transaction won't do it, and nor will a transaction that is just a NOTIFY?
Re: [PERFORM] Forcing WAL flush
On 2013-01-07 at 16:49, james wrote: Is there a way to force a WAL flush so that async commits (from other connections) are flushed, short of actually updating a sacrificial row? It would be nice to do it without generating anything extra, even if it is something that causes IO in the checkpoint. Am I right to think that an empty transaction won't do it, and nor will a transaction that is just a NOTIFY? Does pg_start_backup() trigger a full WAL flush? http://www.postgresql.org/docs/9.2/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE Bye, François That sounds rather heavyweight! I'm looking for something lightweight - I might call this rather often, as a sort of application-level group commit where I commit async but defer the ack to the requester (or other externally visible side effects) slightly until some other thread forces a flush.
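One possible lightweight approach - a sketch, not something verified in this thread: commit a transaction that has an assigned xid with synchronous_commit turned on for that transaction only. Since WAL is flushed sequentially up to the commit record, flushing that record also makes the earlier async commits durable.

BEGIN;
SET LOCAL synchronous_commit = on;  -- make this COMMIT wait for a WAL flush
SELECT txid_current();              -- forces xid assignment, so COMMIT writes a WAL record
COMMIT;                             -- flushing this record also flushes all earlier async commits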
Re: [PERFORM] Performance of complicated query
On 23/05/2013 22:57, Jonathan Morra wrote: I'm not sure I understand your proposed solution. There is also the case to consider where the same patient can be assigned the same device multiple times. In this case, the value may be reset at each assignment (hence the line value - issued_value AS value from the original query). Perhaps you could use triggers to help somewhat? At least for the lifetime part. For a given assignment of a device to a patient, only the last value is useful, so you can maintain that easily enough (a bit like a materialised view, but before 9.3 I guess). But that might fix 'lifetime' but not some arbitrary windowed view. I can see why an 'as at' end time is useful, but not why a start time is so useful: if a device has readings before the window but not in the window, is that 'no reading', or should the last reading prior to the window apply? It also seems to me that the solution you have is hard to reason about. It's like a Haskell program done in one big inline fold rather than a bunch of 'where' clauses, and I find these cause significant brain overload. Perhaps you could break it out into identifiable chunks that work out (both for lifetime if not using triggers, and for your date range otherwise) the readings that are not superseded (i.e. the last in the date bounds for a device assignment), and then work with those. Consider CTEs ('WITH' queries) for doing this? It seems to me that if you can do this, then the problem might be easier to express. Failing that, I'd be looking at using temporary tables, and forcing a series of reduce steps using them, but then I'm a nasty old Sybase hacker at heart. ;-)
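A minimal sketch of the CTE approach suggested above. The table and column names (reading, assignment, read_at, issued_value) are hypothetical stand-ins for the real schema, and DISTINCT ON is PostgreSQL-specific:

WITH last_reading AS (
    SELECT DISTINCT ON (r.assignment_id)
           r.assignment_id, r.read_at, r.value
    FROM   reading r
    WHERE  r.read_at <= '2013-05-23'::date       -- the 'as at' end of the window
    ORDER  BY r.assignment_id, r.read_at DESC    -- keep only the latest, non-superseded row
)
SELECT a.patient_id, a.device_id,
       lr.value - a.issued_value AS value
FROM   assignment a
JOIN   last_reading lr ON lr.assignment_id = a.assignment_id;

Breaking the "last reading per assignment" step out like this keeps the windowed variant to a one-line change in the WHERE clause.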
[PERFORM] Postgres recovery time
Does anyone know what factors affect the recovery time of Postgres if it does not shut down cleanly? With the same size database I've seen times from a few seconds to a few minutes. The longest time was 33 minutes. The 33 minutes was after a complete system crash and reboot, so there are a lot of other things going on as well. 125 seconds was the longest time I could reproduce by just doing a kill -9 on postmaster. Is it the size of the transaction log? The dead space in files? I'm running Postgres 7.3.4 on Red Hat 8.0. Yes, yes, I know it's crazy, but for a variety of reasons upgrading is not currently feasible. Jim
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Unless there was a way to guarantee consistency, it would be hard at best to make this work. Convergence on large data sets across boxes is non-trivial, and diffing databases is difficult at best. Unless there was some form of automated way to ensure consistency, going 8 ways into separate boxes is *very* hard. I do suppose that if you have fancy storage (EMC, Hitachi) you could do BCV or Shadow copies. But in terms of commodity stuff, I'd have to agree with Merlin. William Yu wrote: Merlin Moncure wrote: >>You could instead buy 8 machines that total 16 cores, 128GB RAM and > > It's hard to say what would be better. My gut says the 5u box would > be a lot better at handling high cpu/high concurrency problems...like > your typical business erp backend. This is pure speculation of > course...I'll defer to the experts here. In this specific case (data warehouse app), multiple machines is the better bet. Load data on 1 machine, copy to other servers and then use a middleman to spread out SQL statements to each machine. I was going to suggest pgpool as the middleman but I believe it's limited to 2 machines max at this time. I suppose you could daisy chain pgpools running on every machine.
[PERFORM] make bulk deletes faster?
I have the following table:

CREATE TABLE timeblock (
  timeblockid int8 NOT NULL,
  starttime timestamp,
  endtime timestamp,
  duration int4,
  blocktypeid int8,
  domain_id int8,
  create_date timestamp,
  revision_date timestamp,
  scheduleid int8,
  CONSTRAINT timeblock_pkey PRIMARY KEY (timeblockid),
  CONSTRAINT fk25629e03312570b FOREIGN KEY (blocktypeid) REFERENCES blocktype (blocktypeid) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk25629e09be84177 FOREIGN KEY (domain_id) REFERENCES wa_common_domain (domain_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
) WITH OIDS;

CREATE INDEX timeblock_blocktype_idx ON timeblock USING btree (blocktypeid);
CREATE INDEX timeblock_date_idx ON timeblock USING btree (starttime, endtime);
CREATE INDEX timeblockepoch_idx ON timeblock USING btree (date_trunc('minute'::text, starttime), (date_part('epoch'::text, date_trunc('minute'::text, starttime)) * 1000::double precision), date_trunc('minute'::text, endtime), (date_part('epoch'::text, date_trunc('minute'::text, endtime)) * 1000::double precision));
CREATE INDEX timeblockhourmin_idx ON timeblock USING btree (date_part('hour'::text, starttime), date_part('minute'::text, starttime), date_part('hour'::text, endtime), date_part('minute'::text, endtime));
CREATE INDEX timeblockid_idx ON timeblock USING btree (timeblockid);

There are also indexes on wa_common_domain and blocktype on pkeys.

explain analyze delete from timeblock where timeblockid = 66

 Index Scan using timeblockid_idx on timeblock  (cost=0.00..5.28 rows=1 width=6) (actual time=0.022..0.022 rows=0 loops=1)
   Index Cond: (timeblockid = 66)
 Total runtime: 0.069 ms

I need to routinely move data from the timeblock table to an archive table with the same schema named timeblock_archive. I really need this to happen as quickly as possible, as the archive operation appears to really tax the db server. I'd like some suggestions on how to get the deletes to happen faster, as while deleting individually appears to be extremely fast, when I go to delete lots of rows the operation takes an extremely long time to complete (5000 rows takes about 3 minutes, 100 rows takes almost close to 4 hours or more depending upon server load; wall time btw). I've tried several different approaches to doing the delete and I can't seem to make it much faster... anyone have any ideas? The approaches I've taken both use a temp table to define the set that needs to be deleted.
Here's what I've tried:

Attempt 1:
--
delete from timeblock where timeblockid in (select timeblockid from timeblock_tmp)

Attempt 2:
--
num_to_delete := (select count(1) from tmp_timeblock);
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]', timestart, num_to_delete;
cur_offset := 0;
while cur_offset < num_to_delete loop
  delete from timeblock where timeblockid in (select timeblockid from tmp_timeblock limit 100 offset cur_offset);
  get diagnostics num_affected = ROW_COUNT;
  RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% rows] cur_offset = %', timestart, num_affected, cur_offset;
  cur_offset := cur_offset + 100;
end loop;

Attempt 3:
--
num_to_delete := (select count(1) from tmp_timeblock);
cur_offset := num_to_delete;
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]', timestart, num_to_delete;
open del_cursor for select timeblockid from tmp_timeblock;
loop
  fetch del_cursor into del_pkey;
  if not found then
    exit;
  else
    delete from timeblock where timeblockid = del_pkey;
    get diagnostics num_affected = ROW_COUNT;
    cur_offset := cur_offset - num_affected;
    if cur_offset % 1000 = 0 then
      RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% left]', timestart, cur_offset;
    end if;
  end if;
end loop;
close del_cursor;

I've considered using min(starttime) and max(starttime) from the temp table and doing: delete from timeblock where starttime between min and max; however, I'm concerned about leaving orphan data, deleting too much data, running into foreign key conflicts, etc. Dropping the indexes on timeblock could be bad, as this table receives a high volume of reads, inserts and updates. Anyone have any suggestions? Thanks, Jim K
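For comparison, a purely set-based archive pass is sketched below. How the archive step actually looks is an assumption (the post doesn't show it), and the delete relies on DELETE ... USING, which is available in 8.1:

BEGIN;
-- copy the doomed rows to the archive in one statement
INSERT INTO timeblock_archive
    SELECT t.*
    FROM   timeblock t
    JOIN   tmp_timeblock d ON d.timeblockid = t.timeblockid;
-- then delete them in one statement instead of per-row or per-batch loops
DELETE FROM timeblock
    USING  tmp_timeblock d
    WHERE  timeblock.timeblockid = d.timeblockid;
COMMIT;
ANALYZE timeblock;   -- keep planner stats current after the mass delete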
Re: [PERFORM] make bulk deletes faster?
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Michael Fuhr) wrote: > On Sat, Dec 17, 2005 at 09:10:40PM -0800, James Klo wrote: > > I'd like some suggestions on how to get the deletes to happen faster, as > > while deleting individually appears to be extremely fast, when I go to > > delete lots of rows the operation takes an extremely long time to > > complete (5000 rows takes about 3 minutes, 100 rows takes almost > > close to 4 hours or more depending upon server load; wall time btw). > > Those times do seem excessive -- do any other tables have foreign > key references to the table you're deleting from? If so, do those > tables have indexes on the referring columns? Does this table or > any referring table have triggers? Also, are you regularly vacuuming > and analyzing your tables? Have you examined pg_locks to see if > an unacquired lock might be slowing things down? As the table was originally created using Hibernate, yes, there are several key references; however, I've already added indexes to those tables on the referring keys. There are no triggers. We were running pg_autovacuum, but found that it wasn't completing. I believe we disabled it, and are now running a cron every 4 hours. My archiving method is also running ANALYZE - as I figure that after mass deletes, it would probably keep query speeds from degrading. I've looked at pg_locks, but I'm not sure I understand quite how to use it to determine if there are unacquired locks. I do know that we occasionally get some warnings from C3P0 stating that it detects a deadlock and allocates emergency threads. BTW, in case I didn't mention it, we are using PG 8.1 on Red Hat Enterprise, 4GB RAM, 4 dual-core CPUs, and I think it's RAID5 (looks like what I would consider typical Linux partitioning: /, /tmp, /usr, /var, /boot, /home). After trolling the archives, and doing a bit of sleuthing on the DB, I'm led to believe that this is more or less a default install of PG 8.1. As I'm relatively new to PG, I'm not sure how it should be configured for our setup. I would suspect that this could probably affect the speed of deletes (and queries as well). Thanks for any help you can provide.
Re: [PERFORM] make bulk deletes faster?
Mitch Skinner wrote: Have you considered partitioning? http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html If you can partition your timeblock table so that you archive an entire partition at a time, then you can delete the archived rows by just dropping (or truncating) that partition. AFAIK there's no way to "re-parent" a partition (e.g., from the timeblock table to the timeblock_archive table). If your app is particularly cooperative you might be able to use partitioning to avoid moving data around entirely. If table accesses are always qualified by something you can use as a partitioning key, then partitioning can give you the speed benefits of a small table without the effort of keeping it cleared out. Yes, I've considered partitioning as a long-term change. I was thinking about this for other reasons - mainly performance. If I go the partitioning route, would I need to even perform archival? The larger problem that I need to solve is really twofold: 1. Need to keep reads on timeblocks that are from the current day through the following seven days very fast, especially current-day reads. 2. Need to be able to maintain the timeblocks for reporting purposes, for at least a year (potentially more). This could probably be better handled by performing aggregate analysis, but this isn't on my current radar. Another good read, if you haven't yet, is http://powerpostgresql.com/Downloads/annotated_conf_80.html especially the "Memory", "Checkpoints", and maybe "WAL options" sections. If you're doing large deletes then you may need to increase your free space map settings--if a VACUUM VERBOSE finishes by saying that you need more FSM pages, then the table may have gotten bloated over time (which can be fixed with a configuration change and a VACUUM FULL, though this will lock everything else out of the table while it's running). Thanks, I will look into this as well.
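A minimal sketch of what 8.1-style inheritance partitioning of timeblock could look like. The daily child-table naming and date bounds are hypothetical, inserts must be pointed at the right child (via a rule or by the application), and constraint_exclusion must be enabled for the planner to skip non-matching children:

-- one child table per day, carrying a CHECK constraint on the partition key
CREATE TABLE timeblock_20051218 (
    CHECK (starttime >= '2005-12-18' AND starttime < '2005-12-19')
) INHERITS (timeblock);

-- with constraint_exclusion = on, a query such as
--   SELECT * FROM timeblock
--   WHERE  starttime >= '2005-12-18' AND starttime < '2005-12-19';
-- scans only the matching child.  Ageing out a day is then
--   TRUNCATE timeblock_20051218;   -- or DROP TABLE, instead of DELETE + VACUUM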
[PERFORM] Optimizing a query
I've currently got this table:

n=# \d nanpa
        Table "public.nanpa"
   Column   |     Type     | Modifiers
------------+--------------+-----------
 state      | character(2) |
 npa        | character(3) | not null
 nxx        | character(3) | not null
 ocn        | character(4) |
 company    | text         |
 ratecenter | text         |
 switch     | text         |
 effective  | date         |
 use        | character(2) | not null
 assign     | date         |
 ig         | character(1) |
Indexes:
    "nanpa_pkey" PRIMARY KEY, btree (npa, nxx) CLUSTER

and was doing queries of the form:

  select * from nanpa where npa=775 and nxx=413;

which were quite slow. Explain showed that it was doing sequential scans even though the primary key contained the two terms I was selecting on. Today, looking at it again in prep for this post, I noticed that the numbers were being converted to ::text, and a quick test showed that queries of the form:

  select * from nanpa where npa=775::bpchar and nxx=413::bpchar;

used the index. I specified char(3) when I created the table simply because npa and nxx are defined as three-character strings. Tagging the queries is a pain, especially as I often do queries of that form in psql(1). (Incidentally, there are multiple similar tables, also keyed on (npa,nxx), which show the same problem. The nanpa table above is just a good example.) Should I convert the columns to text? Or create an additional index that expects ::text args? (If so, how?) Or is there some other way to ensure the indices get used w/o having to tag data in the queries? Thanks, -JimC -- James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6
Re: [PERFORM] Optimizing a query
>>>>> "Husam" == Tomeh, Husam <[EMAIL PROTECTED]> writes: Husam> Have you run vacuum/analyze on the table? Yes, back when I first noticed how slow it was. It did not make any difference. explain analyze says: , | n=# explain analyse select * from nanpa where npa=775 and nxx=473; |QUERY PLAN | | Seq Scan on nanpa (cost=0.00..5344.60 rows=4 width=105) (actual time=371.718..516.816 rows=1 loops=1) |Filter: (((npa)::text = '775'::text) AND ((nxx)::text = '473'::text)) | Total runtime: 516.909 ms | (3 rows) ` vs: , | n=# explain analyse select * from nanpa where npa=775::char and nxx=473::char; | QUERY PLAN | -- | Index Scan using nanpa_pkey on nanpa (cost=0.00..4.33 rows=1 width=105) (actual time=64.831..64.831 rows=0 loops=1) |Index Cond: ((npa = '7'::bpchar) AND (nxx = '4'::bpchar)) | Total runtime: 64.927 ms | (3 rows) ` BTW, I forgot to mention I'm at 8.1.4 on that box. -JimC -- James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] compact flash disks?
I see that one can now get compact flash to SATA connectors. If I were to use a filesystem with noatime etc. and little non-SQL traffic, does the physical update pattern tend to have hot sectors that will tend to wear out CF? I'm wondering about a RAID5 with data on CF drives and RAID1 for the WAL on a fast SATA or SAS drive pair. I'm thinking that this would tend to have good performance because the seek time for the data is very low, even if the actual write speed can be slower than state of the art. 2GB CF isn't so pricey any more. Just wondering. James
Re: [PERFORM] compact flash disks?
>WARNING: modern TOtL flash RAMs are only good for ~1.2M writes per >memory cell. and that's the =good= ones. >Using flash RAM for write heavy applications like OLTP, or for WAL, >etc can be very dangerous Well, that's why I suggested that the WAL would stream to a hard disk array, where the large sequential IO write speed will be helpful. Whether OLTP is a problem will presumably depend on the frequency of updates and vacuums to each physical cluster of rows in a disk block. Most rows in a trading application will have quite a long lifetime, and be updated relatively few times (even where we write fixings info into trades). >Flash write speeds also stink; being ~1/2 flash's already low read speed. Sure - but it may still be an effective tradeoff where the limiting factor would otherwise be seek time. >Much better to use flash RAM for read heavy applications. Why? I can get a 'PC' server with 128GB of RAM quite easily now, and that will mean I can cache most if not all hot data for any trading app I've worked on. Settled trades that matured in prior periods can be moved to tables on real disks - they are hardly ever accessed anyway. In the long run, we are going to have to seriously rethink pg's use of WAL as the way we implement MVCC as it becomes more and more of a performance bottleneck. We have WAL because Stonebraker made an assumption about the future dominance of optical media that has turned out to be false. ...and it's been one of pg's big issues ever since. >> 2GB CF isn't so >>pricey any more. >Heck =16= GB Flash only costs ~$300 US and 128GB SSDs based on flash >RAM are due out this year. Quite. Suppose I have a RAID with double redundancy; then I get enough capacity for quite a lot of raw data, and can swap a card out every weekend and let the RAID rebuild it in rotation to keep them within conservative wear limits. So long as the wear levelling works moderately well (and without needing FAT on the disk or whatever) then I should be fine. I think. Maybe. James
Re: [PERFORM] compact flash disks?
Isn't it likely that a single stream (or perhaps one that can be partitioned across spindles) will tend to be fastest, since it has a nice localised stream that a) allows for compression of reasonable blocks and b) fits with commit aggregation? RAM capacity on servers is going up and up, but the size of a customer address or row on an invoice isn't. I'd like to see an emphasis on speed of update with an assumption that most hot data is cached, most of the time. My understanding also is that storing data columnwise is handy when it's persisted because linear scans are much faster. Saw it once with a system modelled after APL; it blew me away even on a SPARC 10 once the data was organised and could be mapped. Still, for the moment anything that helps with the existing system would be good. Would it help to define triggers to be deferrable to commit as well as end of statement (and per row)? Seems to me it should be, at least for ones that raise 'something changed' events. And/or allow specification that events can fold and should be very cheap (I don't know if this is the case now? It's not as well documented how this works as I'd like.) James
[PERFORM] Dispatch-Merge pattern
Howdy - I am currently using PostgreSQL to store and process a high-bandwidth event stream. I do not need old events, but the delete and vacuum does not terminate due to the large number of events being inserted (it just pushes me over the tipping point of where the machine can keep up with the events). I ended up implementing a scheme where a trigger is used to redirect the events (round robin based on time) to a series of identically structured tables. I can then TRUNCATE older tables rather than DELETE and VACUUM (which is a significant speed up). It worked out pretty well, so I thought I'd post the idea to find out if - it is a stupid way of doing things and there is a correct database abstraction for doing this, or - it is a reasonable way of solving this problem and might be of use to other folks using RDBMSs for event processing. I then use a view to merge the tables. Obviously UPDATE would be a problem; but for my purposes, and I suppose a lot of event processing, it isn't an issue. Either way, details are at: http://unsyntax.net/james/blog/tools+and+programming/2007/03/08/Dispatch-Merge-Database-Pattern Cheers, James
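A stripped-down sketch of the scheme described above. The table names and the two-way round robin are hypothetical, and the trigger that routes inserts by time is omitted:

-- identically structured buckets, rotated by time
CREATE TABLE events_0 (ts timestamptz NOT NULL, payload text);
CREATE TABLE events_1 (ts timestamptz NOT NULL, payload text);

-- the merged view the application reads from
CREATE VIEW events AS
    SELECT * FROM events_0
    UNION ALL
    SELECT * FROM events_1;

-- discarding an old bucket avoids DELETE + VACUUM entirely
TRUNCATE events_0;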
Re: [PERFORM] compact flash disks?
>On sequential read speed HDs outperform flash disks... only on random >access the flash disks are better. So if your application is a DW one, >you're very likely better off using HDs. This looks likely to be a non-issue shortly, see here: http://www.reghardware.co.uk/2007/03/27/sams_doubles_ssd_capacity/ I still think this sort of device will become the OLTP device of choice before too long - even if we do have to watch the wear rate. >WARNING: modern TOtL flash RAMs are only good for ~1.2M writes per >memory cell. and that's the =good= ones. Well, my original question was whether the physical update pattern of the server does have hotspots that will tend to cause a problem in normal usage if the wear levelling (such as it is) doesn't entirely spread the load. The sorts of applications I'm interested in will not update individual data elements very often. There's a danger that index nodes might be rewritten frequently, but one might want to allow that indexes persist lazily and should be recovered from a scan after a crash that leaves them dirty, so that they can be cached and avoid such an access pattern. Out of interest with respect to WAL - has anyone tested to see whether one could tune the group commit to pick up slightly bigger blocks and write the WAL using compression, to up the *effective* write speed of the media? Once again, most data I'm interested in is far from a random pattern and tends to compress quite well. If the WAL write is committed to the disk platter, is it OK for arbitrary data blocks to have failed to commit to disk so we can recover the updates for committed transactions? Is there any documentation on the write barrier usage? James
Re: [PERFORM] SCSI vs SATA
>sure but for any serious usage one either wants to disable that >cache(and rely on tagged command queuing or how that is called in SATAII >world) or rely on the OS/raidcontroller implementing some sort of >FUA/write barrier feature(which linux for example only does in pretty >recent kernels) Does anyone know which other hosts have write barrier implementations? Solaris? FreeBSD? Windows? The buffers should help greatly in such a case, right? Particularly if you have quite a wide stripe.
Re: [PERFORM] SCSI vs SATA
>Right --- the point is not the interface, but whether the drive is built >for reliability or to hit a low price point. Personally I take the marketing mumblings about the enterprise drives with a pinch of salt. The low-price drives HAVE TO be reliable too, because a non-negligible failure rate will result in returns-processing costs that destroy a very thin margin. Granted, there was a move to very short warranties a while back, but the trend has been for more realistic warranties again recently. You can bet they don't do this unless the drives are generally pretty good. James
Re: [PERFORM] SCSI vs SATA
>Server drives are generally more tolerant of higher temperatures. I.e. >the failure rate for consumer and server class HDs may be about the same >at 40 degrees C, but by the time the internal case temps get up to 60-70 >degrees C, the consumer grade drives will likely be failing at a much >higher rate, whether they're working hard or not. Can you cite any statistical evidence for this? James
Re: [PERFORM] SCSI vs SATA
>Logic? Foul! That's NOT evidence. > >Mechanical devices have decreasing MTBF when run in hotter environments, >often at non-linear rates. I agree that this seems intuitive. But I think taking it as a cast-iron truth is dangerous. >Server class drives are designed with a longer lifespan in mind. Evidence? >Server class hard drives are rated at higher temperatures than desktop >drives. > >Google can supply any numbers to fill those facts in, but I found a >dozen or so data sheets for various enterprise versus desktop drives in >a matter of minutes. I know what the marketing info says; that's not the point. Bear in mind that these are somewhat designed to justify very much higher prices. I'm looking for statistical evidence that the difference is there, not marketing collateral. They may be designed to be more reliable. And the design targets *that the manufacturer admits to* may be more stringent, but I'm interested to know what the actual measured difference is. From the sound of it, you DON'T have such evidence. Which is not a surprise, because I don't have it either, and I do try to keep my eyes open for it. James
Re: [PERFORM] 121+ million record table perf problems
I've got a table with ~121 million records in it. Select count on it currently takes ~45 minutes, and an update to the table to set a value on one of the columns I finally killed after it ran 17 hours and had still not completed. Queries into the table are butt slow. The update query that started this all I had to kill after 17 hours. It should have updated all 121+ million records. That brought my select count down to 19 minutes, but still a far cry from acceptable. If you have a column that needs to be updated often for all rows, separate it into a different table, and create a view that joins it back to the main table so that your application still sees the old schema. This will greatly speed your update since (in Postgres) an update is the same as a delete+insert. By updating that one column, you're re-writing your entire 121 million rows. If you separate it, you're only rewriting that one column. Don't forget to vacuum/analyze and reindex when you're done. Better yet, if you can stand a short down time, you can drop indexes on that column, truncate, then do 121 million inserts, and finally reindex. That will be MUCH faster. Craig
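A minimal sketch of the split-plus-view arrangement suggested above; the table and column names (big_main, big_hot, status) are hypothetical:

-- narrow side table holding only the frequently updated column
CREATE TABLE big_hot (
    id     bigint PRIMARY KEY REFERENCES big_main (id),
    status integer
);

-- the application keeps querying the old shape through a view
CREATE VIEW big AS
    SELECT m.*, h.status
    FROM   big_main m
    JOIN   big_hot h ON h.id = m.id;

-- the mass update now rewrites ~121M narrow rows instead of wide ones
UPDATE big_hot SET status = 0;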
[PERFORM] Auto-ANALYZE?
Auto-vacuum has made Postgres a much more "friendly" system. Is there some reason the planner can't also auto-ANALYZE in some situations? Here's an example I ran into: create table my_tmp_table (...); insert into my_tmp_table (select some stuff from here and there); select ... from my_tmp_table join another_table on (...); The last statement generated a horrible plan, because the planner had no idea what was in the temporary table (which only had about 100 rows in it). Simply inserting an ANALYZE before the SELECT improved performance by a factor of 100 or so. There are several situations where you could automatically analyze the data. 1. Any time you have to do a full table scan, you might as well throw in an ANALYZE of the data you're scanning. If I understand things, ANALYZE takes a random sample anyway, so a full table scan should be able to produce even better statistics than a normal ANALYZE. 2. If you have a table with NO statistics, the chances of generating a sensible plan are pretty random. Since ANALYZE is quite fast, if the planner encounters no statistics, why not ANALYZE it on the spot? (This might need to be a configurable feature, though.) 3. A user-configurable update threshold, such as, "When 75% of the rows have changed since the last ANALYZE, trigger an auto-analyze." The user-configurable part would account for the fact that some tables' stats don't change much even after many updates, but others may need to be reanalyzed after a modest number of updates. Auto-vacuum, combined with auto-analyze, would eliminate many of the problems that plague neophyte (and sometimes experienced) users of Postgres. A substantial percentage of the questions to this list are answered with, "Have you ANALYZED?" Craig
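Spelling the workaround from the example above out as a runnable sketch (the source and join tables here are hypothetical):

CREATE TEMP TABLE my_tmp_table AS
    SELECT id, payload FROM source_a
    UNION ALL
    SELECT id, payload FROM source_b;

ANALYZE my_tmp_table;   -- without this, the planner has no statistics for the new table

SELECT t.id, a.name
FROM   my_tmp_table t
JOIN   another_table a ON a.id = t.id;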
Re: [PERFORM] LIKE search and performance
Alexander Staubo wrote: On 5/23/07, Andy <[EMAIL PROTECTED]> wrote: An example would be: SELECT * FROM table WHERE name like '%john%' or street like '%srt%' Anyway, the query planner always does seq scan on the whole table and that takes some time. How can this be optimized or made in another way to be faster? There's no algorithm in existence that can "index" arbitrary substrings the way you think. The only rational way to accomplish this is to first break the text into substrings using some algorithm (e.g., words delimited by whitespace and punctuation), and index the substrings individually. That seems rather harsh. If I'd put an index on each of these columns I'd certainly expect it to use the indices - and I'm pretty sure that Sybase would. I'd expect it to scan the index leaf pages instead of the table itself - they should be much more compact and also likely to be hot in cache. Why *wouldn't* the planner do this? James
Re: [PERFORM] LIKE search and performance
If Sybase is still like SQL Server (or the other way around), it *may* end up scanning the index *IFF* the index is a clustered index. If it's a normal index, it will do a sequential scan on the table. Are you sure it's not covered? I'll have to check at work - but I'm off next week so it'll have to wait. It's not a win on PostgreSQL, because of our MVCC implementation. We need to scan *both* index *and* data pages if we go down that route, in which case it's a lot faster to just scan the data pages alone. Why do you need to go to all the data pages - doesn't the index structure contain all the keys, so you prefilter and then check to see if the *matched* items are still in view? I'll be the first to admit I know zip about Postgres, but it seems odd - doesn't the index contain copies of the key values? I suspect that I mis-spoke with 'leaf'. I really just mean 'all index pages with data', since the scan does not even need to be in index order, just a good way to get at the data in a compact way.
Re: [PERFORM] LIKE search and performance
Mark Lewis wrote: PG could scan the index looking for matches first and only load the actual rows if it found a match, but that could only be a possible win if there were very few matches, because the difference in cost between a full index scan and a sequential scan would need to be greater than the cost of randomly fetching all of the matching data rows from the table to look up the visibility information. Just out of curiosity: Does Postgres store a duplicate of the data in the index, even for long strings? I thought indexes only had to store the string up to the point where there was no ambiguity, for example, if I have "missing", "mississippi" and "misty", the index only needs "missin", "missis" and "mist" in the actual index. This would make it impossible to use a full index scan for a LIKE query. Craig
Re: [PERFORM] LIKE search and performance
Alvaro Herrera wrote: >> Just out of curiosity: Does Postgres store a duplicate of the data in the index, even for long strings? I thought indexes only had to store the string up to the point where there was no ambiguity, for example, if I have "missing", "mississippi" and "misty", the index only needs "missin", "missis" and "mist" in the actual index. What would happen when you inserted a new tuple with just "miss"? You would need to expand all the other tuples in the index. That's right. This technique used by some index implementations is a tradeoff between size and update speed. Most words in most natural languages can be distinguished by the first few characters. The chances of having to modify more than a few surrounding nodes when you insert "miss" is small, so some implementations choose this method. Other implementations choose to store the full string. I was just curious which method Postgres uses. Craig
[PERFORM] ECC RAM really needed?
We're thinking of building some new servers. We bought some a while back that have ECC (error correcting) RAM, which is absurdly expensive compared to the same amount of non-ECC RAM. Does anyone have any real-life data about the error rate of non-ECC RAM, and whether it matters or not? In my long career, I've never once had a computer that corrupted memory, or at least I never knew if it did. ECC sounds like a good idea, but is it solving a non-problem? Thanks, Craig
Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?
On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote: The point I'm worried about is performance. Do you think the performance would be better executing exactly the same queries only adding an extra column to all the tables, e.g. customer_id, than opening a connection to the single customers DB and executing the query there? There is no simple answer to this question; it depends too much on your data. In many cases, adding a customer_id to every table, and perhaps also per-customer views (per Jeff's suggestion), can work really well. However, performance is not the only consideration, or even the main consideration. We operate with about 150 separate databases. In our case, administration issues and software design outweighed performance issues. For example, with separate databases, security is simpler, *and* it's easy to convince the customer that their data is protected. Creating views only helps for read-only access. When the customer wants to modify their data, how will you keep them from accessing and overwriting one another's data? Even with views, can you convince the customer you've done it right? With separate databases, you use the built-in security of Postgres, and don't have to duplicate it in your schema and apps. With separate databases, it's really easy to discard a customer. This can be particularly important for a big customer with millions of linked records. In a database-for-everyone design, you'll have lots of foreign keys, indexes, etc. that make deleting a whole customer a REALLY big job. Contrast that with just discarding a whole database, which typically takes a couple of seconds. But even more important (to us) is the simplicity of the applications and management. It's far more than just an extra "... and customer = xyz" added to every query. Throwing the customers together means every application has to understand security, and many operations that would be simple become horribly tangled. Want to back up a customer's data? You can't use pg_dump; you have to write your own dump app. Want to restore a customer's data? Same. Want to do a big update? Your whole database is affected and probably needs to be vacuumed/analyzed. On and on, at every turn, management and applications are more complex. If you have hundreds of separate databases, it's also easy to scale: just buy more servers, and move some of the databases. With a single monster database, as load increases, you may hit the wall sooner or later. Postgres is really good at maintaining many separate databases. Why do it yourself? There are indeed performance issues, but even that's not black and white. Depending on the specifics of your queries and the load on your servers, you may get better performance from a single monster database, or from hundreds of separate databases. So, your question has no simple answer. You should indeed evaluate the performance, but other issues may dominate your decision. Craig
[PERFORM] Autodetect of software RAID1+0 fails
Apologies for a somewhat off-topic question, but... The Linux kernel doesn't properly detect my software RAID1+0 when I boot up. It detects the two RAID1 arrays, the partitions of which are marked properly. But it can't find the RAID0 on top of that, because there's no corresponding device to auto-detect. The result is that it creates /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but /dev/md2 isn't created, so the RAID0 can't be assembled at boot time. Here's what it looks like:

$ cat /proc/mdstat
Personalities : [raid0] [raid1]
md2 : active raid0 md0[0] md1[1]
      234436224 blocks 64k chunks
md1 : active raid1 sde1[1] sdc1[2]
      117218176 blocks [2/2] [UU]
md0 : active raid1 sdd1[1] sdb1[0]
      117218176 blocks [2/2] [UU]

$ uname -r
2.6.12-1.1381_FC3

After a reboot, I always have to do this:

mknod /dev/md2 b 9 2
mdadm --assemble /dev/md2 /dev/md0 /dev/md1
mount /dev/md2

What am I missing here? Thanks, Craig
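A guess at a fix, not something confirmed in this thread: the kernel's RAID autodetection only assembles arrays whose members are partitions of type fd, and md0/md1 are not partitions, so the stacked RAID0 has to be assembled from userspace. Listing all three arrays in /etc/mdadm.conf usually lets the distribution's init scripts (mdadm -As) bring up md2 at boot. The device names below simply mirror the mdstat output above:

DEVICE /dev/sd[b-e]1 /dev/md0 /dev/md1
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdd1
ARRAY /dev/md1 devices=/dev/sdc1,/dev/sde1
ARRAY /dev/md2 devices=/dev/md0,/dev/md1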
Re: [PERFORM] Thousands of tables versus on table?
[EMAIL PROTECTED] wrote: various people (not database experts) are pushing to install Oracle cluster so that they can move all of these to one table with a customerID column. They're blowing smoke if they think Oracle can do this. One of my applications had this exact same problem -- table-per-customer versus big-table-for-everyone. Oracle fell over dead, even with the best indexing possible, tuned by the experts, and using partitions keyed to the customerID. We ended up breaking it up into table-per-customer because Oracle fell over dead when we had to do a big update on a customer's entire dataset. All other operations were slowed by the additional index on the customer-ID, especially complex joins. With a table-for-everyone, you're forced to create tricky partitioning or clustering, clever indexes, and even with that, big updates are problematic. And once you do this, then you become heavily tied to one RDBMS and your applications are no longer portable, because clustering, indexing, partitioning and other DB tuning tricks are very specific to each RDBMS. When we moved to Postgres, we never revisited this issue, because both Oracle and Postgres are able to handle thousands of tables well. As I wrote in a previous message on a different topic, often the design of your application is more important than the performance. In our case, the table-per-customer makes the applications simpler, and security is MUCH easier. Oracle is simply not better than Postgres in this regard. As far as I know, there is only one specific situation (discussed frequently here) where Oracle is faster: the count(), min() and max() functions, and I know significant progress has been made since I started using Postgres. I have not found any other query where Oracle is significantly better, and I've found several where Postgres is the clear winner. It's telling that Oracle's license contract prohibits you from publishing comparisons and benchmarks. You have to wonder why. Craig
Re: [PERFORM] Thousands of tables versus on table?
Scott Marlowe wrote: OTOH, there are some things, like importing data, which are MUCH faster in pgsql than in the big database. An excellent point, I forgot about this. The COPY command is the best thing since the invention of a shirt pocket. We have a database-per-customer design, and one of the monstrous advantages of Postgres is that we can easily do backups. A pg_dump, then scp to a backup server, and in just a minute or two we have a full backup. For recovery, pg_restore is equally fast and amazing. Last time I checked, Oracle didn't have anything close to this. Craig
Re: [PERFORM] Thousands of tables versus on table?
Jonah H. Harris wrote: On 6/6/07, Craig James <[EMAIL PROTECTED]> wrote: They're blowing smoke if they think Oracle can do this. Oracle could handle this fine. Oracle fell over dead, even with the best indexing possible, tuned by the experts, and using partitions keyed to the customerID. I don't think so, whoever tuned this likely didn't know what they were doing. Wrong on both counts. You didn't read my message. I said that *BOTH* Oracle and Postgres performed well with table-per-customer. I wasn't Oracle bashing. In fact, I was doing the opposite: someone's coworker claimed Oracle was the miracle cure for all problems, and I was simply pointing out that there are no miracle cures. (I prefer Postgres for many reasons, but Oracle is a fine RDBMS that I have used extensively.) The technical question is simple: table-per-customer or big-table-for-everyone. The answer is, "it depends." It depends on your application, your read-versus-write ratio, the table size, the design of your application software, and a dozen other factors. There is no simple answer, but there are important technical insights which, I'm happy to report, various people contributed to this discussion. Perhaps you have some technical insight too, because it really is an important question. The reason I assert (and stand by this) that "they're blowing smoke" when they claim Oracle has the magic cure is because Oracle and Postgres are both relational databases, they write their data to disks, and they both have indexes with O(log(N)) retrieval/update times. Oracle doesn't have a magical workaround to these facts, nor does Postgres. Craig
Re: [PERFORM] LIKE search and performance
[EMAIL PROTECTED] wrote: What is a real life example where an intelligent and researched database application would issue a like or ilike query as their primary condition in a situation where they expected very high selectivity? In my case the canonical example is to search against textual keys where the search is performed automatically if the user has typed enough data and paused. In almost all cases the '%' trails, and I'm looking for 'starts with' in effect. Usually the search will have a specified upper number of returned rows, if that's an available facility. I realise in this case that matching against the index does not allow the match count unless we check MVCC as we go, but I don't see why another thread can't be doing that. James
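For the trailing-'%' case described here, a btree index declared with the pattern operator class lets a left-anchored LIKE use an index range scan even in a non-C locale. This is a general sketch rather than something proposed in the thread, and the table and column names are hypothetical:

-- text_pattern_ops makes the index usable for LIKE 'prefix%' in any locale
CREATE INDEX item_name_prefix_idx ON item (name text_pattern_ops);

SELECT name
FROM   item
WHERE  name LIKE 'john%'
ORDER  BY name
LIMIT  20;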
Re: [PERFORM] Best way to delete unreferenced rows?
Tyrrill, Ed wrote: I have a table, let's call it A, whose primary key, a_id, is referenced in a second table, let's call it B. For each unique A.a_id there are generally many rows in B with the same a_id. My problem is that I want to delete a row in A when the last row in B that references it is deleted. Right now I just query for rows in A that aren't referenced by B, and that worked great when the tables were small, but it takes over an hour now that the tables have grown larger (over 200 million rows in B and 14 million in A). The delete has to do a sequential scan of both tables since I'm looking for what's not in the indexes. I was going to try creating a trigger after delete on B for each row to check for more rows in B with the same a_id, and delete the row in A if none found. In general I will be deleting 10's of millions of rows from B and 100's of thousands of rows from A on a daily basis. What do you think? Does anyone have any other suggestions on different ways to approach this? Essentially what you're doing is taking the one-hour job and spreading it out in little chunks over thousands of queries. If you have 10^7 rows in B and 10^5 rows in A, then on average you have 100 references from B to A. That means that 99% of the time, your trigger will scan B and find that there's nothing to do. This could add a lot of overhead to your ordinary transactions, costing a lot more in the long run than just doing the once-a-day big cleanout. You didn't send the specifics of the query you're using, along with an EXPLAIN ANALYZE of it in operation. It may also be that your SQL is not optimal, and that somebody could suggest a more efficient query. It's also possible that it's not the sequential scans that are the problem, but rather that it just takes a long time to delete 100,000 rows from table A because you have a lot of indexes. Or it could be a combination of performance problems. You haven't given us enough information to really analyze your problem. Send more details! Craig
Re: [PERFORM] Best way to delete unreferenced rows?
Tyrrill, Ed wrote:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8) (actual time=6503583.342..8220629.311 rows=93524 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..521525.10 rows=13799645 width=8) (actual time=15.955..357813.621 rows=13799645 loops=1)
   ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8) (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3311212.04 rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308 loops=1)
 Total runtime: 8229178.269 ms
(8 rows)

I ran vacuum analyze after the last time any inserts, deletes, or updates were done, and before I ran the query above. I've attached my postgresql.conf. The machine has 4 GB of RAM. I thought maybe someone with more expertise than me might answer this, but since they haven't I'll just make a comment. It looks to me like the sort of 214 million rows is what's killing you. I suppose you could try to increase the sort memory, but that's a lot of memory. It seems to me an index merge of a relation this large would be faster, but that's a topic for the experts. On a theoretical level, the problem is that it's sorting the largest table. Perhaps you could re-cast the query so that it only has to sort the smaller table, something like select a.id from a where a.id not in (select distinct b.id from b) where "b" is the smaller table. There's still no guarantee that it won't do a sort on "a", though. In fact one of the clever things about Postgres is that it can convert a query like the one above into a regular join, unless you do something like "select ... offset 0" which blocks the optimizer from doing the rearrangement. But I think the first approach is to try to tune for a better plan using your original query. Craig
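For reference, an anti-join phrasing of the same question is sketched below, using the table names from the plan above. It is only an alternative to try - whether it beats the merge-left-join plan depends on version and settings, so it would need its own EXPLAIN ANALYZE:

-- rows in backupobjects with no remaining reference in backup_location
SELECT b.record_id
FROM   backupobjects b
WHERE  NOT EXISTS (
         SELECT 1
         FROM   backup_location l
         WHERE  l.record_id = b.record_id
       );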
Re: [PERFORM] test / live environment, major performance difference
On 2007-06-11 Christo Du Preez wrote: I really hope someone can shed some light on my problem. I'm not sure if this is a Postgres or PostGIS issue. Anyway, we have 2 development laptops and one live server; somehow I managed to get the same query to perform very well on my laptop, but on both the server and the other laptop it's really performing badly. One simple possibility that bit me in the past: if you do pg_dump/pg_restore to create a copy of the database, you have to ANALYZE the newly-restored database. I mistakenly assumed that pg_restore would do this, but you have to run ANALYZE explicitly after a restore. Craig
[PERFORM] Replication
Looking for replication solutions, I find:

Slony-I: Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size).

Slony-II: Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But I can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality?

PGReplication: Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster: Seems pretty good, but the web site is not current, there are releases in use that are not on the web site, and it also seems to always be a couple of steps behind the current release of Postgres. Two single-point-of-failure spots: the load balancer and the data replicator.

Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? Thanks! Craig
Re: [PERFORM] Replication
Thanks to all who replied and filled in the blanks. The problem with the web is you never know if you've missed something. Joshua D. Drake wrote: Looking for replication solutions, I find... Slony-II: Dead. Wow, I'm surprised. Is it dead for lack of need, lack of resources, too complex, or all of the above? It sounded like such a promising theoretical foundation. Ben wrote: Which replication problem are you trying to solve? Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount of "global" information, such as user signups, system configuration, advertisements, and such, that goes into a single small (~5-10 MB) "global database" used by all servers. We need "nearly-real-time replication" and instant failover. That is, it's far more important for the system to keep working than it is to lose a little data. Transactional integrity is not important. Actual hardware failures are rare, and if a user just happens to sign up, or do "save preferences", at the instant the global-database server goes down, it's not a tragedy. But it's not OK for the entire web site to go down when the one global-database server fails. Slony-I can keep several slave databases up to date, which is nice. And I think I can combine it with a PGPool instance on each server, with the master as primary and a few Slony copies as secondary. That way, if the master goes down, the PGPool servers all switch to their secondary Slony slaves, and read-only access can continue. If the master crashes, users will be able to do most activities, but new users can't sign up, and existing users can't change their preferences, until either the master server comes back, or one of the slaves is promoted to master. The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this. I would consider PGCluster, but it seems to be a patch to Postgres itself. I'm reluctant to introduce such a major piece of technology into our entire system, when only one tiny part of it needs the replication service. Thanks, Craig
Re: [PERFORM] Replication
Andreas Kostyrka wrote: Slony provides near-instantaneous failovers (in the single-digit-seconds range). You can script an automatic failover if the master server becomes unreachable. But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down. That leaves you the problem of restarting your app (or making it reconnect) to the new master. Don't you have to run a Slony app to convert one of the slaves into the master? 5-10MB of data implies such a fast initial replication that making the server rejoin the cluster by setting it up from scratch is not an issue. The problem is to PREVENT it from rejoining the cluster. If you have some semi-automatic process that detects the dead server and converts a slave to the master, and in the meantime the dead server manages to reboot itself (or its network gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed. The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this. You don't need SOAP services, and you do not need to elect a new master. If dbX goes down, dbY takes over; you should be able to decide on a static takeover pattern easily enough. I can't see how that is true. Any self-healing distributed system needs something like the following:
- A distributed system of nodes that check each other's health
- A way to detect that a node is down and to transmit that information across the nodes
- An election mechanism that nominates a new master if the master fails
- A way for a node coming online to determine if it is a master or a slave
Any solution less than this can cause corruption, because you can have two nodes that both think they're master, or end up with no master and no process for electing a master. As far as I can tell, Slony doesn't do any of this. Is there a simpler solution? I've never heard of one. The point here is that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway. No, our requirement is no administrator interaction. We need instant, automatic recovery from failure so that the system stays online. Furthermore, you need to check out pgpool; I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but Slony makes the other nodes read-only. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs that by default wrap any access into a transaction.) I should have been more clear about this. I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2 features that allow replication). So it would only be acting as a failover mechanism. Slony would be used as the replication mechanism. I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server. I was thinking I'd put a PGPool server on every machine in failover mode only. It would have the Slony master as the primary connection, and a Slony slave as the failover connection. The applications would route all INSERT/UPDATE statements directly to the Slony master, and all SELECT statements to the PGPool on localhost.
When the master failed, all of the PGPool servers would automatically switch to one of the Slony slaves. This way, the system would keep running on the Slony slaves (so it would be read-only), until a sysadmin could get the master Slony back online. And when the master came online, the PGPool servers would automatically reconnect and write-access would be restored. Does this make sense? Craig ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Replication
Markus Schiltknecht wrote: Not quite... there's still Postgres-R, see www.postgres-r.org And I'm continuously working on it, despite not having updated the website for almost a year now... I planned on releasing the next development snapshot together with 8.3, as that seems to be delayed, that seems realistic ;-) Is Postgres-R the same thing as Slony-II? There's a lot of info and news around about Slony-II, but your web page doesn't seem to mention it. While researching replication solutions, I had a heck of a time sorting out the dead or outdated web pages (like the stuff on gborg) from the active projects. Either way, it's great to know you're working on it. Craig ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Volunteer to build a configuration tool
This is my idea: A JavaScript HTML page that would have some basic questions at the top: 1) How much memory do you have? 2) How many connections will be made to the database? 3) What operating system do you use? 4) Etc… Next the person would press a button, “generate”, found below the questions. The JavaScript HTML page would then generate content for two iframes at the bottom of the page. One iframe would contain the contents of the postgresql.conf file. The postgresql.conf settings would be tailored more to the individual's needs than the standard default file. The second iframe would contain the default settings one should consider using with their operating system. My web team would be very happy to develop this for the PostgreSQL project. It would have saved us a lot of time to have had a configuration tool in the beginning. I am willing to make this a very high priority for my team. Hi Lance, I agree that having a page that can assist in generating a base configuration file is an excellent way to start off; a good base configuration gets a system administrator halfway to a good final configuration. We've recently gone through the process of configuring a machine, and it is a time-consuming task of testing and benchmarking various configuration details. My thoughts: Using the browser is a great idea as a universal platform. I can foresee a problem in that some users won't have GUI access to the machine that they are setting up. I don't have much direct experience in this field, but I suspect that a great number of installations happen 'headless'? This can easily be circumvented by hosting the configuration builder on a public internet site, possibly postgresql.org? Also, JavaScript isn't the easiest language to use to get all the decisions that need to be made for various configuration options. Would it not be a better idea to host a configuration builder centrally, possibly on postgresql.org, and have the documentation reference it, including the docs that come packaged with postgresql (README, INSTALL documentation?). This would mean that you wouldn't be able to package the configuration builder, but you would be able to implement more application logic and more complex decision making in a hosted application. Of course, I have no idea of the skills that your team already have :) To add ideas: perhaps a more advanced tool would be able to add a comment indicating a suggested range for the particular setting. For example, with 2GB of RAM, it chooses a work_mem of, say, 768MB, with a comment indicating a suggested range of 512MB - 1024MB. Thanks for taking the time to put this together and for offering the services of your team. Kind regards, James ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
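To make that last idea concrete, the annotated output such a tool might emit could look something like the fragment below. The numbers are taken straight from the example in the message above and are purely illustrative, not tuning advice:

    # generated for a machine with 2GB of RAM (example only)
    work_mem = 768MB        # suggested range: 512MB - 1024MB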
Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3
Dolafi, Tom wrote: min(fmin) | max(fmin)|avg(fmin) 1 | 55296469 |11423945 min(fmax) | max(fmax)|avg(fmax) 18 | 3288 |11424491 There are 5,704,211 rows in the table. When you're looking for weird index problems, it's more interesting to know if there are certain numbers that occur a LOT. From your statistics above, each number occurs about 10 times in the table. But do some particular numbers occur thousands, or even millions, of times? Here is a query that will print a list of the highest-occurring values. You might expect a few occurrences of 20, and maybe 30, but if you have thousands or millions of occurrences of certain numbers, then that can screw up an index. select fmax, c from (select fmax, count(fmax) as c from your_table group by fmax) as foo where c > 3 order by c desc; Craig ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
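A lighter-weight way to check the same thing, assuming the table has been ANALYZEd recently: the planner's statistics already record the most common values per column, so a quick look at pg_stats may answer the question without scanning all 5.7 million rows. The column names below are the real view columns; 'your_table' is the placeholder from the query above:

    SELECT attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'your_table'
      AND attname IN ('fmin', 'fmax');

A most_common_freqs entry close to 1.0, or an n_distinct that is tiny relative to the row count, would point at exactly the kind of skew described above.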
[PERFORM] Join with lower/upper limits doesn't scale well
I have the same schema in two different databases. In "smalldb", the two tables of interest have about 430,000 rows, in "bigdb", the two tables each contain about 5.5 million rows. I'm processing the data, and for various reasons it works out well to process it in 100,000 row chunks. However, it turns out for the big schema, selecting 100,000 rows is the longest single step of the processing. Below is the explain/analyze output of the query from each database. Since both tables are indexed on the joined columns, I don't understand why the big table should be so much slower -- I hoped this would scale well, or at least O(log(N)), not O(N). What's going on here? I don't know if I'm reading this right, but it looks like the sort is taking all the time, but that doesn't make sense because in both cases it's sorting 100,000 rows. Thanks, Craig bigdb=> explain analyze bigdb-> select r.row_num, m.molkeys from my_rownum r bigdb-> join my_molkeys m on (r.version_id = m.version_id) bigdb-> where r.row_num >= 10 AND r.row_num < 20 bigdb-> order by r.row_num; Sort (cost=431000.85..431248.23 rows=98951 width=363) (actual time=46306.748..46417.448 rows=10 loops=1) Sort Key: r.row_num -> Hash Join (cost=2583.59..422790.68 rows=98951 width=363) (actual time=469.010..45752.131 rows=10 loops=1) Hash Cond: ("outer".version_id = "inner".version_id) -> Seq Scan on my_molkeys m (cost=0.00..323448.30 rows=5472530 width=363) (actual time=11.243..33299.933 rows=5472532 loops=1) -> Hash (cost=2336.21..2336.21 rows=98951 width=8) (actual time=442.260..442.260 rows=10 loops=1) -> Index Scan using i_chm_rownum_row_num on my_rownum r (cost=0.00..2336.21 rows=98951 width=8) (actual time=47.551..278.736 rows=10 loops=1) Index Cond: ((row_num >= 10) AND (row_num < 20)) Total runtime: 46543.163 ms smalldb=> explain analyze smalldb-> select r.row_num, m.molkeys from my_rownum r smalldb-> join my_molkeys m on (r.version_id = m.version_id) smalldb-> where r.row_num >= 10 AND r.row_num < 20 smalldb-> order by r.row_num; Sort (cost=43598.23..43853.38 rows=102059 width=295) (actual time=4097.180..4207.733 rows=10 loops=1) Sort Key: r.row_num -> Hash Join (cost=2665.09..35107.41 rows=102059 width=295) (actual time=411.635..3629.756 rows=10 loops=1) Hash Cond: ("outer".version_id = "inner".version_id) -> Seq Scan on my_molkeys m (cost=0.00..23378.90 rows=459590 width=295) (actual time=8.563..2011.455 rows=459590 loops=1) -> Hash (cost=2409.95..2409.95 rows=102059 width=8) (actual time=402.867..402.867 rows=10 loops=1) -> Index Scan using i_chm_rownum_row_num_8525 on my_rownum r (cost=0.00..2409.95 rows=102059 width=8) (actual time=37.122..242.528 rows=10 loops=1) Index Cond: ((row_num >= 10) AND (row_num < 20)) Total runtime: 4333.501 ms Table "bigdb.my_rownum" Column | Type | Modifiers +-+--- version_id | integer | parent_id | integer | row_num| integer | Indexes: "i_chm_rownum_row_num" UNIQUE, btree (row_num) "i_chm_rownum_version_id" UNIQUE, btree (version_id) "i_chm_rownum_parent_id" btree (parent_id) Table "bigdb.my_molkeys" Column | Type | Modifiers +-+--- version_id | integer | molkeys| text| Indexes: "i_chm_molkeys_version_id" UNIQUE, btree (version_id) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Equivalent queries produce different plans
The two queries below produce different plans. select r.version_id, r.row_num, m.molkeys from my_rownum r join my_molkeys m on (r.version_id = m.version_id) where r.version_id >= 320 and r.version_id < 330 order by r.version_id; select r.version_id, r.row_num, m.molkeys from my_rownum r join my_molkeys m on (r.version_id = m.version_id) where r.version_id >= 320 and r.version_id < 330 and m.version_id >= 320 and m.version_id < 330 order by r.version_id; I discovered this while looking at the plans for the first query. It seemed to be ignoring the fact that it could push the "between" condition along to the second table, since the condition and the join are on the same indexed columns. So, I added a redundant condition, and bingo, it was a lot faster. In the analysis shown below, the timing (about 1.0 and 1.5 seconds respectively) are for a "hot" database that's been queried a couple of times. In real life on a "cold" database, the times are more like 10 seconds and 21 seconds, so it's quite significant. Thanks, Craig db=> explain analyze db-> select r.version_id, r.row_num, m.molkeys from my_rownum r db-> join my_molkeys m on (r.version_id = m.version_id) db-> where r.version_id >= 320 db-> and r.version_id < 330 db-> order by r.version_id; Sort (cost=264979.51..265091.06 rows=44620 width=366) (actual time=1424.126..1476.048 rows=46947 loops=1) Sort Key: r.version_id -> Nested Loop (cost=366.72..261533.64 rows=44620 width=366) (actual time=41.649..1186.331 rows=46947 loops=1) -> Bitmap Heap Scan on my_rownum r (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.616..431.783 rows=46947 loops=1) Recheck Cond: ((version_id >= 320) AND (version_id < 330)) -> Bitmap Index Scan on i_chm_rownum_version_id_4998 (cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 rows=46947 loops=1) Index Cond: ((version_id >= 320) AND (version_id < 330)) -> Index Scan using i_chm_molkeys_version_id on my_molkeys m (cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 loops=46947) Index Cond: ("outer".version_id = m.version_id) Total runtime: 1534.638 ms (10 rows) db=> explain analyze db-> select r.version_id, r.row_num, m.molkeys from my_rownum r db-> join my_molkeys m on (r.version_id = m.version_id) db-> where r.version_id >= 320 db-> and r.version_id <330 db-> and m.version_id >= 320 db-> and m.version_id <330 db-> order by r.version_id; Sort (cost=157732.20..157732.95 rows=298 width=366) (actual time=985.383..1037.423 rows=46947 loops=1) Sort Key: r.version_id -> Hash Join (cost=41279.92..157719.95 rows=298 width=366) (actual time=502.875..805.402 rows=46947 loops=1) Hash Cond: ("outer".version_id = "inner".version_id) -> Index Scan using i_chm_molkeys_version_id on my_molkeys m (cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 rows=46947 loops=1) Index Cond: ((version_id >= 320) AND (version_id < 330)) -> Hash (cost=41168.37..41168.37 rows=44620 width=8) (actual time=502.813..502.813 rows=46947 loops=1) -> Bitmap Heap Scan on my_rownum r (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.621..417.508 rows=46947 loops=1) Recheck Cond: ((version_id >= 320) AND (version_id < 330)) -> Bitmap Index Scan on i_chm_rownum_version_id_4998 (cost=0.00..366.72 rows=44620 width=0) (actual time=21.174..21.174 rows=46947 loops=1) Index Cond: ((version_id >= 320) AND (version_id < 330)) Total runtime: 1096.031 ms (12 rows) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Equivalent queries produce different plans
Sorry, I forgot to mention: This is 8.1.4, with a fairly ordinary configuration on a 4 GB system. Craig Craig James wrote: The two queries below produce different plans. select r.version_id, r.row_num, m.molkeys from my_rownum r join my_molkeys m on (r.version_id = m.version_id) where r.version_id >= 320 and r.version_id < 330 order by r.version_id; select r.version_id, r.row_num, m.molkeys from my_rownum r join my_molkeys m on (r.version_id = m.version_id) where r.version_id >= 320 and r.version_id < 330 and m.version_id >= 320 and m.version_id < 330 order by r.version_id; I discovered this while looking at the plans for the first query. It seemed to be ignoring the fact that it could push the "between" condition along to the second table, since the condition and the join are on the same indexed columns. So, I added a redundant condition, and bingo, it was a lot faster. In the analysis shown below, the timing (about 1.0 and 1.5 seconds respectively) are for a "hot" database that's been queried a couple of times. In real life on a "cold" database, the times are more like 10 seconds and 21 seconds, so it's quite significant. Thanks, Craig db=> explain analyze db-> select r.version_id, r.row_num, m.molkeys from my_rownum r db-> join my_molkeys m on (r.version_id = m.version_id) db-> where r.version_id >= 320 db-> and r.version_id < 330 db-> order by r.version_id; Sort (cost=264979.51..265091.06 rows=44620 width=366) (actual time=1424.126..1476.048 rows=46947 loops=1) Sort Key: r.version_id -> Nested Loop (cost=366.72..261533.64 rows=44620 width=366) (actual time=41.649..1186.331 rows=46947 loops=1) -> Bitmap Heap Scan on my_rownum r (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.616..431.783 rows=46947 loops=1) Recheck Cond: ((version_id >= 320) AND (version_id < 330)) -> Bitmap Index Scan on i_chm_rownum_version_id_4998 (cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 rows=46947 loops=1) Index Cond: ((version_id >= 320) AND (version_id < 330)) -> Index Scan using i_chm_molkeys_version_id on my_molkeys m (cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 loops=46947) Index Cond: ("outer".version_id = m.version_id) Total runtime: 1534.638 ms (10 rows) db=> explain analyze db-> select r.version_id, r.row_num, m.molkeys from my_rownum r db-> join my_molkeys m on (r.version_id = m.version_id) db-> where r.version_id >= 320 db-> and r.version_id <330 db-> and m.version_id >= 320 db-> and m.version_id <330 db-> order by r.version_id; Sort (cost=157732.20..157732.95 rows=298 width=366) (actual time=985.383..1037.423 rows=46947 loops=1) Sort Key: r.version_id -> Hash Join (cost=41279.92..157719.95 rows=298 width=366) (actual time=502.875..805.402 rows=46947 loops=1) Hash Cond: ("outer".version_id = "inner".version_id) -> Index Scan using i_chm_molkeys_version_id on my_molkeys m (cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 rows=46947 loops=1) Index Cond: ((version_id >= 320) AND (version_id < 330)) -> Hash (cost=41168.37..41168.37 rows=44620 width=8) (actual time=502.813..502.813 rows=46947 loops=1) -> Bitmap Heap Scan on my_rownum r (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.621..417.508 rows=46947 loops=1) Recheck Cond: ((version_id >= 320) AND (version_id < 330)) -> Bitmap Index Scan on i_chm_rownum_version_id_4998 (cost=0.00..366.72 rows=44620 width=0) (actual time=21.174..21.174 rows=46947 loops=1) Index Cond: ((version_id >= 320) AND (version_id < 330)) Total runtime: 1096.031 ms (12 rows) ---(end of 
broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] pg_restore causes 100% CPU
Here's an oddity. I have 10 databases, each with about a dozen connections to Postgres (about 120 connections total), and at midnight they're all idle. These are mod_perl programs (like a FastCGI -- they stay connected so they're ready for instant service). So using "ps -ef" and grep, we find one of the databases looks like this: postgres 22708 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(46915) idle postgres 22709 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(46916) idle postgres 22710 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(46917) idle postgres 22711 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(46918) idle postgres 22712 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(46919) idle postgres 22724 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(42440) idle postgres 22725 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(42441) idle postgres 22726 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(42442) idle postgres 22727 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(42443) idle postgres 22728 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(42444) idle postgres 22731 7619 0 Jul11 ?00:00:06 postgres: chemmega chemmega 192.168.10.192(42447) idle Now here's the weird thing. I'm running a pg_restore of a database (on the order of 4GB compressed, maybe 34M rows of ordinary data, and 15M rows in one BLOB table that's typically 2K per blob). When I do this, ALL of the postgress backends start working at about 1% CPU apiece. This means that the 120 "idle" postgres backends are together using almost 100% of one CPU on top of the 100% CPU being used by pg_restore. See the output of top(1) below. Is this normal? All I can guess at is that something's going on in shared memory that every Postgres backend has to respond to. Thanks, Craig Tasks: 305 total, 1 running, 304 sleeping, 0 stopped, 0 zombie Cpu(s): 33.5% us, 1.5% sy, 0.0% ni, 57.8% id, 6.6% wa, 0.2% hi, 0.4% si Mem: 4151456k total, 4011020k used, 140436k free,10096k buffers Swap: 2104504k total,94136k used, 2010368k free, 3168596k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 6681 postgres 16 0 217m 188m 161m D 50.4 4.6 4:29.30 postmaster 1577 root 10 -5 000 S 1.0 0.0 108:01.97 md0_raid1 8487 postgres 15 0 187m 8704 4996 S 1.0 0.2 0:06.56 postmaster 8506 postgres 15 0 187m 8604 4892 S 1.0 0.2 0:06.37 postmaster 8507 postgres 15 0 187m 8708 5004 S 1.0 0.2 0:06.42 postmaster 8512 postgres 15 0 187m 8612 4904 S 1.0 0.2 0:06.65 postmaster 8751 postgres 15 0 187m 10m 7520 S 1.0 0.3 0:07.95 postmaster 8752 postgres 15 0 187m 10m 7492 S 1.0 0.3 0:07.84 postmaster 14053 postgres 15 0 187m 8752 5044 S 1.0 0.2 0:06.53 postmaster 16515 postgres 15 0 187m 8156 4452 S 1.0 0.2 0:06.33 postmaster 25351 postgres 15 0 187m 9772 6064 S 1.0 0.2 0:06.75 postmaster 25387 postgres 15 0 187m 8444 4752 S 1.0 0.2 0:06.45 postmaster
Re: [PERFORM] select on 1milion register = 6s
Bruno Rodrigues Siqueira wrote: Who can help me? My SELECT on a database with 1 million records, using an expression index, takes 6 seconds… Run your query using EXPLAIN ANALYZE SELECT ... your query ... and then post the results to this newsgroup. Nobody can help until they see the results of EXPLAIN ANALYZE. Also, include all other relevant information, such as Postgres version, operating system, amount of memory, and any changes you have made to the Postgres configuration file. Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer wrote: * [EMAIL PROTECTED] <[EMAIL PROTECTED]> [20070728 21:05]: Let's try putting the sort/limit in each piece of the UNION to speed them up separately. SELECT * FROM ( (SELECT * FROM large_table lt WHERE lt.user_id = 12345 ORDER BY created_at DESC LIMIT 10) AS q1 UNION (SELECT * FROM large_table lt WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345) ORDER BY created_at DESC LIMIT 10) AS q2 ORDER BY created_at DESC LIMIT 10; It's not possible to use ORDER BY or LIMIT within unioned queries. http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION If I'm reading this documentation correctly, it *is* possible, as long as they're inside of a sub-select, as in this case. Craig ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
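For reference, a minimal sketch of the form being described, reusing the table and column names from the quoted query (untested against the original schema): each branch is a parenthesized sub-select carrying its own ORDER BY and LIMIT, and the outer query gets its own alias, ORDER BY and LIMIT:

    SELECT * FROM (
        (SELECT * FROM large_table lt
         WHERE lt.user_id = 12345
         ORDER BY created_at DESC LIMIT 10)
        UNION
        (SELECT * FROM large_table lt
         WHERE lt.user_id IN (SELECT contact_id FROM relationships WHERE user_id = 12345)
         ORDER BY created_at DESC LIMIT 10)
    ) AS q
    ORDER BY created_at DESC
    LIMIT 10;

Whether the planner then produces fast per-branch plans is a separate question, but this is the syntactically valid shape of the idea.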
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Scott Marlowe wrote: And there's the issue that with windows / NTFS that when one process opens a file for read, it locks it for all other users. This means that things like virus scanners can cause odd, unpredictable failures of your database. Can you provide some justification for this? James ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Scott Marlowe wrote: Where unixes generally outperform windows is in starting up new backends, better file systems, and handling very large shared_buffer settings. Why do you think that UNIX systems are better at handling large shared buffers than Windows? 32-bit Windows systems can suffer from fragmented address space, to be sure, but if the performance of the operating-system-supplied mutex or semaphore isn't good enough, you can just use the raw atomic ops. If what you mean is that pg has a design that's heavily oriented towards things that tend to be cheap on POSIX and doesn't use the core Win32 features effectively, then let's track that as an optimisation opportunity for the Win32 port. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Carlo Stonebanks wrote: Isn't it just easier to assume that Windows Server can't do anything right? ;-) Well, avoiding the ;-) - people do, and it's remarkably foolish of them. It's a long-standing whinge that many people with a UNIX background seem to just assume that Windows sucks, but you have been able to run 40,000 sockets from a single Win32 process for a while now, and some well-known UNIX systems would still struggle to do this, libevent or no. Admittedly, the way a Win32 app is architected would be rather different from a typical POSIX one. Windows has been a cheap target, but it's remarkably adequate, and the TPC results speak for themselves. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] What to vacuum after deleting lots of tables
If I delete a whole bunch of tables (like 10,000 tables), should I vacuum system tables, and if so, which ones? (This system is still on 8.1.4 and isn't running autovacuum). Thanks, Craig ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
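For what it's worth, a sketch of the system catalogs that typically bloat the most after dropping thousands of tables on a system that isn't running autovacuum; this is just the usual list of suspects rather than anything specific to 8.1.4, and it needs to be run as a superuser or the catalog owner:

    VACUUM ANALYZE pg_catalog.pg_class;
    VACUUM ANALYZE pg_catalog.pg_attribute;
    VACUUM ANALYZE pg_catalog.pg_type;
    VACUUM ANALYZE pg_catalog.pg_index;
    VACUUM ANALYZE pg_catalog.pg_depend;
    VACUUM ANALYZE pg_catalog.pg_statistic;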
Re: [PERFORM] Tablespaces and NFS
Carlos Moreno wrote: Anyone has tried a setup combining tablespaces with NFS-mounted partitions? There has been some discussion of this recently, you can find it in the archives (http://archives.postgresql.org/). The word seems to be that NFS can lead to data corruption. Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Low CPU Usage
Luiz K. Matsumura wrote: It is connected at full 100Mb and it transfers things quickly to clients. It is running Apache and JBoss, the transfer rate is good, and when I did scp to copy many files it was as quick as the old server. I have no idea how to continue researching this problem. Now I'm going to do some network tests. Any chance this is your desktop machine, and you're also using it for audio? Microsoft built in a feature (!) that reduces network speed by 90% when music is playing: http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Block at a time ...
On 3/17/10 2:52 AM, Greg Stark wrote: On Wed, Mar 17, 2010 at 7:32 AM, Pierre C wrote: I was thinking in something like that, except that the factor I'd use would be something like 50% or 100% of current size, capped at (say) 1 GB. This turns out to be a bad idea. One of the first thing Oracle DBAs are told to do is change this default setting to allocate some reasonably large fixed size rather than scaling upwards. This might be mostly due to Oracle's extent-based space management but I'm not so sure. Recall that the filesystem is probably doing some rounding itself. If you allocate 120kB it's probably allocating 128kB itself anyways. Having two layers rounding up will result in odd behaviour. In any case I was planning on doing this a while back. Then I ran some experiments and couldn't actually demonstrate any problem. ext2 seems to do a perfectly reasonable job of avoiding this problem. All the files were mostly large contiguous blocks after running some tests -- IIRC running pgbench. This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems have a "reserve" usually of 10% of the disk space that nobody except root can use. It's not for root, it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Block at a time ...
On 3/22/10 11:47 AM, Scott Carey wrote: On Mar 17, 2010, at 9:41 AM, Craig James wrote: On 3/17/10 2:52 AM, Greg Stark wrote: On Wed, Mar 17, 2010 at 7:32 AM, Pierre C wrote: I was thinking in something like that, except that the factor I'd use would be something like 50% or 100% of current size, capped at (say) 1 GB. This turns out to be a bad idea. One of the first thing Oracle DBAs are told to do is change this default setting to allocate some reasonably large fixed size rather than scaling upwards. This might be mostly due to Oracle's extent-based space management but I'm not so sure. Recall that the filesystem is probably doing some rounding itself. If you allocate 120kB it's probably allocating 128kB itself anyways. Having two layers rounding up will result in odd behaviour. In any case I was planning on doing this a while back. Then I ran some experiments and couldn't actually demonstrate any problem. ext2 seems to do a perfectly reasonable job of avoiding this problem. All the files were mostly large contiguous blocks after running some tests -- IIRC running pgbench. This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems have a "reserve" usually of 10% of the disk space that nobody except root can use. It's not for root, it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies. Craig Its fairly easy to break. Just do a parallel import with say, 16 concurrent tables being written to at once. Result? Fragmented tables. Is this from real-life experience? With fragmentation, there's a point of diminishing return. A couple head-seeks now and then hardly matter. My recollection is that even when there are lots of concurrent processes running that are all making files larger and larger, the Linux file system still can do a pretty good job of allocating mostly-contiguous space. It doesn't just dumbly allocate from some list, but rather tries to allocate in a way that results in pretty good "contiguousness" (if that's a word). On the other hand, this is just from reading discussion groups like this one over the last few decades, I haven't tried it... Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
Hannu Krosing wrote: Pulling the plug should not corrupt a PostgreSQL database, unless it was using disks which lie about write caching. Didn't we recently put the old wives' tale of 'the disks lied' to bed in favour of actually admitting that some well-known filesystems and software RAID systems have had trouble with their write barriers? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] why does swap not recover?
On 3/26/10 4:57 PM, Richard Yen wrote: Hi everyone, We've recently encountered some swapping issues on our CentOS 64GB Nehalem machine, running postgres 8.4.2. Unfortunately, I was foolish enough to set shared_buffers to 40GB. I was wondering if anyone would have any insight into why the swapping suddenly starts, but never recovers? http://richyen.com/i/swap.png Note, the machine has been up and running since mid-December 2009. It was only on March 8 that this swapping began, and it's never recovered. If we look at dstat, we find the following: http://richyen.com/i/dstat.png Note that it is constantly paging in, but never paging out. This happens when you have too many processes using too much space to fit in real memory, but none of them are changing their memory image. If the system swaps a process in, but that process doesn't change anything in memory, then there are no dirty pages and the kernel can just kick the process out of memory without writing anything back to the swap disk -- the data in the swap are still valid. It's a classic problem when processes are running round-robin. Say you have space for 100 processes, but you're running 101 processes. When you get to #101, #1 is the oldest so it swaps out. Then #1 runs, and #2 is the oldest, so it gets kicked out. Then #2 runs and kicks out #3 ... and so forth. Going from 100 to 101 processes brings the system nearly to a halt. Some operating systems try to use tricks to keep this from happening, but it's a hard problem to solve. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Occasional giant spikes in CPU load
Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh). I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgress processes, but I think it missed the bulk of the spike. For some reason, every postgres backend suddenly decides (is told?) to do something. When this happens, the system become unusable for anywhere from ten seconds to a minute or so, depending on how much web traffic stacks up behind this event. We have two servers, one offline and one public, and they both do this, so it's not caused by actual web traffic (and the Apache logs don't show any HTTP activity correlated with the spikes). I thought based on other posts that this might be a background-writer problem, but it's not I/O, it's all CPU as far as I can tell. Any ideas where I can look to find what's triggering this? 8 CPUs, 8 GB memory 8-disk RAID10 (10k SATA) Postgres 8.3.0 Fedora 8, kernel is 2.6.24.4-64.fc8 Diffs from original postgres.conf: max_connections = 1000 shared_buffers = 2000MB work_mem = 256MB max_fsm_pages = 1600 max_fsm_relations = 625000 synchronous_commit = off wal_buffers = 256kB checkpoint_segments = 30 effective_cache_size = 4GB escape_string_warning = off Thanks, Craig top - 11:24:59 up 81 days, 20:27, 4 users, load average: 0.98, 0.83, 0.92 Tasks: 366 total, 20 running, 346 sleeping, 0 stopped, 0 zombie Cpu(s): 30.6%us, 1.5%sy, 0.0%ni, 66.3%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8194800k total, 8118688k used,76112k free, 36k buffers Swap: 2031608k total, 169348k used, 1862260k free, 7313232k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 18972 postgres 20 0 2514m 11m 8752 R 11 0.1 0:00.35 postmaster 10618 postgres 20 0 2514m 12m 9456 R9 0.2 0:00.54 postmaster 10636 postgres 20 0 2514m 11m 9192 R9 0.1 0:00.45 postmaster 25903 postgres 20 0 2514m 11m 8784 R9 0.1 0:00.21 postmaster 10626 postgres 20 0 2514m 11m 8716 R6 0.1 0:00.45 postmaster 10645 postgres 20 0 2514m 12m 9352 R6 0.2 0:00.42 postmaster 10647 postgres 20 0 2514m 11m 9172 R6 0.1 0:00.51 postmaster 18502 postgres 20 0 2514m 11m 9016 R6 0.1 0:00.23 postmaster 10641 postgres 20 0 2514m 12m 9296 R5 0.2 0:00.36 postmaster 10051 postgres 20 0 2514m 13m 10m R4 0.2 0:00.70 postmaster 10622 postgres 20 0 2514m 12m 9216 R4 0.2 0:00.39 postmaster 10640 postgres 20 0 2514m 11m 8592 R4 0.1 0:00.52 postmaster 18497 postgres 20 0 2514m 11m 8804 R4 0.1 0:00.25 postmaster 18498 postgres 20 0 2514m 11m 8804 R4 0.1 0:00.22 postmaster 10341 postgres 20 0 2514m 13m 9m R2 0.2 0:00.57 postmaster 10619 postgres 20 0 2514m 12m 9336 R1 0.2 0:00.38 postmaster 15687 postgres 20 0 2321m 35m 35m R0 0.4 8:36.12 postmaster -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Occasional giant spikes in CPU load
On 4/7/10 2:40 PM, Joshua D. Drake wrote: On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote: Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh). I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgress processes, but I think it missed the bulk of the spike. What does iostat 5 say during the jump? It's very hard to say ... I'll have to start a background job to watch for a day or so. While it's happening, you can't login, and any open windows become unresponsive. I'll probably have to run it at high priority using nice(1) to get any data at all during the event. Would vmstat be informative? Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Occasional giant spikes in CPU load
On 4/7/10 3:36 PM, Joshua D. Drake wrote: On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote: On 4/7/10 2:40 PM, Joshua D. Drake wrote: On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote: Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh). I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgress processes, but I think it missed the bulk of the spike. What does iostat 5 say during the jump? It's very hard to say ... I'll have to start a background job to watch for a day or so. While it's happening, you can't login, and any open windows become unresponsive. I'll probably have to run it at high priority using nice(1) to get any data at all during the event. Do you have sar runing? Say a sar -A ? No, I don't have it installed. I'll have a look. At first glance it looks like a combination of what I can get with "top -b" and vmstat, but with a single program. My guess is that it is not CPU, it is IO and your CPU usage is all WAIT on IO. To have your CPUs so flooded that they are the cause of an inability to log in is pretty suspect. I thought so too, except that I can't login during the flood. If the CPUs were all doing iowaits, logging in should be easy. Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB system fits these symptoms -- if it's having a swap storm, login is effectively impossible. Craig Joshua D. Drake Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Occasional giant spikes in CPU load
On 4/7/10 2:59 PM, Tom Lane wrote: Craig James writes: Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh). I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgress processes, but I think it missed the bulk of the spike. Any ideas where I can look to find what's triggering this? Postgres 8.3.0 ^ If it's really 8.3.0, try updating to 8.3.something-recent. We've fixed a whole lot of bugs since then. Good advice, I've been meaning to do this, maybe this will be a kick in the pants to motivate me. I have a suspicion that this might be an sinval overrun scenario, in which case you'd need to update to 8.4 to get a real fix. But updating in the 8.3 branch would be cheap and easy. If it is sinval overrun, it would presumably be triggered by a whole lot of catalog changes being made at approximately the same time. Can you correlate the spikes with anything like that? Not that I know of. Just regular web traffic. On the backup server these events happen occasionally even when there is little or no web traffic, and nobody logged in doing maintenance. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Occasional giant spikes in CPU load
On 4/7/10 5:47 PM, Robert Haas wrote: On Wed, Apr 7, 2010 at 6:56 PM, David Rees wrote: max_fsm_pages = 1600 max_fsm_relations = 625000 synchronous_commit = off You are playing with fire here. You should never turn this off unless you do not care if your data becomes irrecoverably corrupted. That is not correct. Turning off synchronous_commit is sensible if you don't mind losing the last few transactions on a crash. What will corrupt your database is if you turn off fsync. A bit off the original topic, but ... I set it this way because I was advised that with a battery-backed RAID controller, this was a safe setting. Is that not the case? Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
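One nuance worth spelling out here: as noted above, synchronous_commit = off can only lose the last few commits after a crash; it cannot corrupt the database (that risk belongs to fsync = off). It also doesn't have to be a cluster-wide choice; since 8.3 it can be set per transaction, which keeps the durability trade-off limited to the transactions that can tolerate it. A minimal sketch (activity_log is a hypothetical table):

    BEGIN;
    SET LOCAL synchronous_commit TO off;  -- asynchronous commit for this transaction only
    INSERT INTO activity_log (msg) VALUES ('low-value event that can be lost on a crash');
    COMMIT;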
[PERFORM] Dell Perc HX00 RAID controllers: What's inside?
Now that it's time to buy a new computer, Dell has changed their RAID models from the Perc6 to the Perc H200 and such. Does anyone know what's inside these? I would hope they've stuck with the Megaraid controller... Also, I can't find any info on Dell's site about how these devices can be configured. I was thinking of ten disks, as: OS: RAID1; WAL: RAID1; Database: RAID10 using 6 disks. But it's not clear to me if these RAID controllers can handle multiple arrays, or if you need a separate controller for each array. We're a small shop and I only get to do this every year or so, and everything changes in between purchases! Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues when the number of records are around 10 Million
On 5/12/10 4:55 AM, Kevin Grittner wrote: venu madhav wrote: we display in sets of 20/30 etc. The user also has the option to browse through any of those records, hence the limit and offset. Have you considered alternative techniques for paging? You might use values at the edges of the page to run a small query (limit, no offset) when they page. You might generate all the pages on the first pass and cache them for a while. Kevin is right. You need to use "hitlists" - a semi-temporary table that holds the results of your initial query. You're repeating a complex, expensive query over and over, once for each page of data that the user wants to see. Instead, using a hitlist, your initial query looks something like this: create table hitlist_xxx( objectid integer, sortorder integer default nextval('hitlist_seq') ); insert into hitlist_xxx (objectid) (select ... your original query ... order by ...) You store some object ID or primary key in the "hitlist" table, and the sequence records your original order. Then when your user asks for page 1, 2, 3 ... N, all you have to do is join your hitlist to your original data: select ... from mytables join hitlist_xxx on (...) where sortorder >= 100 and sortorder < 120; which would instantly return one 20-row page of your data. To do this, you need a way to know when a user is finished so that you can discard the hitlist. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
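A slightly more complete sketch of the same idea, keeping the hitlist_xxx / hitlist_seq names from the description above; the search query, the mytables join and the id column are placeholders for the real ones:

    CREATE SEQUENCE hitlist_seq;

    CREATE TABLE hitlist_xxx (
        objectid  integer,
        sortorder integer DEFAULT nextval('hitlist_seq')
    );

    -- run the expensive search exactly once, in the desired order
    INSERT INTO hitlist_xxx (objectid)
        SELECT id FROM mytables WHERE /* original search conditions */ true
        ORDER BY /* original sort */ id;

    -- make paging an indexed range scan on the hitlist
    CREATE INDEX hitlist_xxx_sortorder ON hitlist_xxx (sortorder);

    -- fetching one 20-row page is then just:
    SELECT t.*
    FROM hitlist_xxx h
    JOIN mytables t ON (t.id = h.objectid)
    WHERE h.sortorder >= 100 AND h.sortorder < 120
    ORDER BY h.sortorder;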
Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL
On 5/26/10 9:47 AM, Stephen Frost wrote: * Eliot Gable (egable+pgsql-performa...@gmail.com) wrote: Since PostgreSQL is written in C, I assume there is no such additional overhead. I assume that the PL/PGSQL implementation at its heart also uses SPI to perform those executions. Is that a fair statement? Right, but I also wouldn't expect a huge improvement either, unless you're calling these queries a ton, or the queries that you're calling from the pl/pgsql are pretty short-lived. Don't get me wrong, C is going to be faster, but it depends on exactly what's going on as to whether it's going to be an overall improvement of, say, 10%, or a 10-fold improvement. :) Or a 0.1% improvement, which is more likely. Or the PL/PGSQL version may even be faster than the C version, because if you do any string regexp in your function, Perl has extremely efficient algorithms, probably better than you have time to write in C. We use Perl extensively and have never had any complaints. The database activity completely dominates all queries, and the performance of Perl has never even been noticeable. We use C functions for a few things, and it is a big nuisance. Every time you upgrade Postgres or your OS, there's a chance the recompile will fail because of changed header files. Any bugs in your code crash Postgres itself. We avoid C as much as possible (and I love C, been doing it since 1984). Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
On 5/18/10 3:28 PM, Carlo Stonebanks wrote: Sample code: SELECT * FROM MyTable WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar' Let's say this required a SEQSCAN because there were no indexes to support column foo. For every row where foo <> 'bar' would the filter on the SEQSCAN short-circuit the AND return false right away, or would it still execute MySlowFunc('foo') ? I asked a similar question a few years back, and the answer is that the planner just makes a guess and applies it to all functions. It has no idea whether your function is super fast or incredibly slow, they're all assigned the same cost. In this fairly simple case, the planner might reasonably guess that "foo = 'bar'" will always be faster than "AnyFunc(foo) = 'bar'". But for real queries, that might not be the case. In my case, I have a function that is so slow that it ALWAYS is good to avoid it. Unfortunately, there's no way to explain that to Postgres, so I have to use other tricks to force the planner not to use it. select * from (select * from MyTable where foo = 'bar' offset 0) where MySlowFunc(foo) = 'bar'; The "offset 0" prevents the planner from collapsing this query back into your original syntax. It will only apply MySlowFunc() to rows where you already know that foo = 'bar'. It would be nice if Postgres had a way to assign a cost to every function. Until then, you have to use convoluted SQL if you have a really slow function. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
On 5/27/10 2:28 PM, Kevin Grittner wrote: Craig James wrote: It would be nice if Postgres had a way to assign a cost to every function. The COST clause of CREATE FUNCTION doesn't do what you want? http://www.postgresql.org/docs/8.4/interactive/sql-createfunction.html Cool ... I must have missed it when this feature was added. Nice! Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
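For anyone reading this in the archives, a minimal sketch of what that looks like. MySlowFunc is the hypothetical function from the earlier example (assumed here to take a text argument); the default cost is 100 for a PL/pgSQL or SQL function and 1 for a C function, so a large value tells the planner the function is expensive:

    ALTER FUNCTION myslowfunc(text) COST 10000;

With the cost declared, the planner should prefer to evaluate the cheap conditions first, which is what the "offset 0" trick was forcing by hand.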
[PERFORM] Weird XFS WAL problem
I'm testing/tuning a new midsize server and ran into an inexplicable problem. With an RAID10 drive, when I move the WAL to a separate RAID1 drive, TPS drops from over 1200 to less than 90! I've checked everything and can't find a reason. Here are the details. 8 cores (2x4 Intel Nehalem 2 GHz) 12 GB memory 12 x 7200 SATA 500 GB disks 3WARE 9650SE-12ML RAID controller with bbu 2 disks: RAID1 500GB ext4 blocksize=4096 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096 (ext4 or xfs - see below) 2 disks: hot swap Ubuntu 10.04 LTS (Lucid) With xfs or ext4 on the RAID10 I got decent bonnie++ and pgbench results (this one is for xfs): Version 1.03e --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP argon24064M 70491 99 288158 25 129918 16 65296 97 428210 23 558.9 1 --Sequential Create-- Random Create -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 23283 81 + +++ 13775 56 20143 74 + +++ 15152 54 argon,24064M,70491,99,288158,25,129918,16,65296,97,428210,23,558.9,1,16,23283,81,+,+++,13775,56,20143\ ,74,+,+++,15152,54 pgbench -i -s 100 -U test pgbench -c 10 -t 1 -U test scaling factor: 100 query mode: simple number of clients: 10 number of transactions per client: 1 number of transactions actually processed: 10/10 tps = 1046.104635 (including connections establishing) tps = 1046.337276 (excluding connections establishing) Now the mystery: I moved the pg_xlog directory to a RAID1 array (same 3WARE controller, two more SATA 7200 disks). Run the same tests and ... tps = 82.325446 (including connections establishing) tps = 82.326874 (excluding connections establishing) I thought I'd made a mistake, like maybe I moved the whole database to the RAID1 array, but I checked and double checked. I even watched the lights blink - the WAL was definitely on the RAID1 and the rest of Postgres on the RAID10. So I moved the WAL back to the RAID10 array, and performance jumped right back up to the >1200 TPS range. Next I check the RAID1 itself: dd if=/dev/zero of=./bigfile bs=8192 count=200 which yielded 98.8 MB/sec - not bad. bonnie++ on the RAID1 pair showed good performance too: Version 1.03e --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP argon24064M 68601 99 110057 18 46534 6 59883 90 123053 7 471.3 1 --Sequential Create-- Random Create -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 + +++ + +++ + +++ + +++ + +++ + +++ argon,24064M,68601,99,110057,18,46534,6,59883,90,123053,7,471.3,1,16,+,+++,+,+++,+,+++,+,\ +++,+,+++,+,+++ So ... anyone have any idea at all how TPS drops to below 90 when I move the WAL to a separate RAID1 disk? Does this make any sense at all? It's repeatable. It happens for both ext4 and xfs. It's weird. You can even watch the disk lights and see it: the RAID10 disks are on almost constantly when the WAL is on the RAID10, but when you move the WAL over to the RAID1, its lights are dim and flicker a lot, like it's barely getting any data, and the RAID10 disk's lights barely go on at all. Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Weird XFS WAL problem
On 6/2/10 4:40 PM, Mark Kirkwood wrote: On 03/06/10 11:30, Craig James wrote: I'm testing/tuning a new midsize server and ran into an inexplicable problem. With an RAID10 drive, when I move the WAL to a separate RAID1 drive, TPS drops from over 1200 to less than 90! I've checked everything and can't find a reason. Are the 2 new RAID1 disks the same make and model as the 12 RAID10 ones? Yes. Also, are barriers *on* on the RAID1 mount and off on the RAID10 one? It was the barriers. "barrier=1" isn't just a bad idea on ext4, it's a disaster. pgbench -i -s 100 -U test pgbench -c 10 -t 1 -U test Change WAL to barrier=0 tps = 1463.264981 (including connections establishing) tps = 1463.725687 (excluding connections establishing) Change WAL to noatime, nodiratime, barrier=0 tps = 1479.331476 (including connections establishing) tps = 1479.810545 (excluding connections establishing) Change WAL to barrier=1 tps = 82.325446 (including connections establishing) tps = 82.326874 (excluding connections establishing) This is really hard to believe, because the bonnie++ numbers and dd(1) numbers look good (see my original post). But it's totally repeatable. It must be some really unfortunate "just missed the next sector going by the write head" problem. So with ext4, bonnie++ and dd aren't the whole story. BTW, I also learned that if you edit /etc/fstab and use "mount -oremount" it WON'T change "barrier=0/1" unless it is explicit in the fstab file. That is, if you put "barrier=0" into /etc/fstab and use the remount, it will change it to no barriers. But if you then remove it from /etc/fstab, it won't change it back to the default. You have to actually put "barrier=1" if you want to get it back to the default. This seems like a bug to me, and it made it really hard to track this down. "mount -oremount" is not the same as umount/mount! Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
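For the record, a sketch of what the resulting fstab entry looks like; the device name and mount point are examples only, and barrier=0 is only a reasonable choice because the volume sits behind a battery-backed RAID write cache, as it does here:

    # WAL volume on the BBU-backed RAID1 -- example device and mount point
    /dev/sdb1   /postgres/pg_xlog   ext4   noatime,nodiratime,barrier=0   0   2

And per the note above, a real umount/mount is needed for a barrier change to take effect reliably; "mount -o remount" only picks it up when the option is spelled out explicitly in /etc/fstab.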
Re: [PERFORM] Need to increase performance of a query
On 6/10/10 12:34 PM, Anne Rosset wrote: Jochen Erwied wrote: Thursday, June 10, 2010, 8:36:08 PM you wrote: psrdb=# (SELECT psrdb(# MAX(item_rank.rank) AS maxRank psrdb(# FROM psrdb(# item_rank item_rank psrdb(# WHERE psrdb(# item_rank.project_id='proj2783' psrdb(# AND item_rank.pf_id IS NULL psrdb(# psrdb(# ) psrdb-# ORDER BY psrdb-# maxRank DESC; Don't think it does really matter, but why do you sort a resultset consisting of only one row? Sorry, I should have removed the ORDER by (the full query has a union). So without the ORDER by, here are the results: psrdb=# SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.pf_id='plan1408'; maxrank - 2050400 (1 row) Time: 1.516 ms psrdb=# SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.project_id='proj2783' psrdb-# AND item_rank.pf_id IS NULL; maxrank - 202 (1 row) Time: 13.177 ms Is there anything that can be done for the second one? Postgres normally doesn't index NULL values even if the column is indexed, so it has to do a table scan when your query includes an IS NULL condition. You need to create an index that includes the "IS NULL" condition. create index item_rank_null_idx on item_rank(pf_id) where item_rank.pf_id is null; Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL as a local in-memory cache
[oops, didn't hit "reply to list" first time, resending...] On 6/15/10 9:02 AM, Steve Wampler wrote: Chris Browne wrote: "jgard...@jonathangardner.net" writes: My question is how can I configure the database to run as quickly as possible if I don't care about data consistency or durability? That is, the data is updated so often and it can be reproduced fairly rapidly so that if there is a server crash or random particles from space mess up memory we'd just restart the machine and move on. For such a scenario, I'd suggest you: - Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS are reasonable options for this. - The complication would be that your "restart the machine and move on" needs to consist of quite a few steps: - recreating the filesystem - fixing permissions as needed - running initdb to set up new PG instance - automating any needful fiddling with postgresql.conf, pg_hba.conf - starting up that PG instance - creating users, databases, schemas, ... How about this: Set up a database entirely on a RAM disk, then install a WAL-logging warm standby. If the production computer goes down, you bring the warm standby online, shut it down, and use tar(1) to recreate the database on the production server when you bring it back online. You have speed and you have near-100% backup. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL as a local in-memory cache
On 6/16/10 12:00 PM, Josh Berkus wrote: * fsync=off => 5,100 * fsync=off and synchronous_commit=off => 5,500 Now, this *is* interesting ... why should synch_commit make a difference if fsync is off? Anyone have any ideas? I found that pgbench has "noise" of about 20% (I posted about this a couple days ago using data from 1000 identical pgbench runs). Unless you make a bunch of runs and average them, a difference of 5,100 to 5,500 appears to be meaningless. Craig tmpfs, WAL on same tmpfs: * Default config: 5,200 * full_page_writes=off => 5,200 * fsync=off => 5,250 * synchronous_commit=off => 5,200 * fsync=off and synchronous_commit=off => 5,450 * fsync=off and full_page_writes=off => 5,250 * fsync=off, synchronous_commit=off and full_page_writes=off => 5,500 So, in this test, it seems like having WAL on tmpfs doesn't make a significant difference for everything == off. I'll try running some tests on Amazon when I have a chance. It would be worthwhile to get figures without Python's "ceiling". -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] requested shared memory size overflows size_t
Can anyone tell me what's going on here? I hope this doesn't mean my system tables are corrupt... Thanks, Craig select relname, pg_relation_size(relname) from pg_class where pg_get_userbyid(relowner) = 'emol_warehouse_1' and relname not like 'pg_%' order by pg_relation_size(relname) desc; ERROR: relation "rownum_temp" does not exist emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp'; relname -- rownum_temp (1 row) emol_warehouse_1=> \d rownum_temp Did not find any relation named "rownum_temp". emol_warehouse_1=> create table rownum_temp(i int); CREATE TABLE emol_warehouse_1=> drop table rownum_temp; DROP TABLE emol_warehouse_1=> select relname, pg_relation_size(relname) from pg_class where pg_get_userbyid(relowner) = 'emol_warehouse_1' and relname not like 'pg_%' order by pg_relation_size(relname) desc; ERROR: relation "rownum_temp" does not exist emol_warehouse_1=> select relname, pg_relation_size(relname) from pg_class; ERROR: relation "tables" does not exist -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] System tables screwed up? (WAS requested shared memory size overflows size_t)
On 6/24/10 4:19 PM, Alvaro Herrera wrote: Excerpts from Craig James's message of jue jun 24 19:03:00 -0400 2010: select relname, pg_relation_size(relname) from pg_class where pg_get_userbyid(relowner) = 'emol_warehouse_1' and relname not like 'pg_%' order by pg_relation_size(relname) desc; ERROR: relation "rownum_temp" does not exist emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp'; relname -- rownum_temp (1 row) What's the full row? I'd just add a "WHERE relkind = 'r'" to the above query anyway. Thanks, in fact that works. But my concern is that these are system tables and system functions and yet they seem to be confused. I've used this query dozens of times and never seen this behavior before. It makes me really nervous... Craig P.S. Sorry I got the Subject wrong the first time by hitting the REPLY key mindlessly, I've changed it now. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
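For what it's worth, the failure above most likely comes from feeding every relname in pg_class back through pg_relation_size(), including rows (another session's temp tables, TOAST tables, views) whose names don't resolve in the current search_path. Passing the oid instead of the name, plus the relkind filter Alvaro suggests, sidesteps the whole problem. A sketch:

    SELECT c.relname, pg_relation_size(c.oid) AS bytes
    FROM pg_class c
    WHERE pg_get_userbyid(c.relowner) = 'emol_warehouse_1'
      AND c.relkind = 'r'
      AND c.relname NOT LIKE 'pg_%'
    ORDER BY pg_relation_size(c.oid) DESC;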
Re: [PERFORM] Occasional giant spikes in CPU load
I'm reviving this question because I never figured it out. To summarize: At random intervals anywhere from a few times per hour to once or twice a day, we see a huge spike in CPU load that essentially brings the system to a halt for up to a minute or two. Previous answers focused on "what is it doing", i.e. is it really Postgres or something else? Now the question has narrowed down to this: what could trigger EVERY postgres backend to do something at the same time? See the attached output from "top -b", which shows what is happening during one of the CPU spikes. A little background about our system. We have roughly 100 FastCGI clients connected at all times that are called on to generate images from data in the database. Even though there are a lot of these, they don't do much. They sit there most of the time, then they spew out a couple dozen GIF images in about one second as a user gets a new page of data. Each GIF image requires fetching a single row using a single indexed column, so it's a trival amount of work for Postgres. We also have the "heavy lift" application that does the search. Typically one or two of these is running at a time, and takes from a fraction of a second to a few minutes to complete. In this particular instance, immediately before this spike, the CPU load was only at about 10% -- a couple users poking around with easy queries. So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real load on the server? Maybe if a backend crashes? Some other problem? There's nothing in the serverlog. Thanks, Craig top - 12:15:09 up 81 days, 21:18, 4 users, load average: 0.38, 0.38, 0.73 Tasks: 374 total, 95 running, 279 sleeping, 0 stopped, 0 zombie Cpu(s): 62.5%us, 2.2%sy, 0.0%ni, 34.9%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 8194800k total, 7948928k used, 245872k free, 36k buffers Swap: 2031608k total, 161136k used, 1870472k free, 7129744k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 22120 postgres 20 0 2514m 17m 13m R 11 0.2 0:01.02 postmaster 18497 postgres 20 0 2514m 11m 8832 R6 0.1 0:00.62 postmaster 22962 postgres 20 0 2514m 12m 9548 R6 0.2 0:00.22 postmaster 24002 postgres 20 0 2514m 11m 8804 R6 0.1 0:00.15 postmaster 25900 postgres 20 0 2514m 11m 8824 R6 0.1 0:00.55 postmaster 8941 postgres 20 0 2324m 6172 4676 R5 0.1 0:00.32 postmaster 10622 postgres 20 0 2514m 12m 9444 R5 0.2 0:00.79 postmaster 14021 postgres 20 0 2514m 11m 8548 R5 0.1 0:00.28 postmaster 14075 postgres 20 0 2514m 11m 8672 R5 0.1 0:00.27 postmaster 14423 postgres 20 0 2514m 11m 8572 R5 0.1 0:00.29 postmaster 18896 postgres 20 0 2324m 5644 4204 R5 0.1 0:00.11 postmaster 18897 postgres 20 0 2514m 12m 9800 R5 0.2 0:00.27 postmaster 18928 postgres 20 0 2514m 11m 8792 R5 0.1 0:00.18 postmaster 18973 postgres 20 0 2514m 11m 8792 R5 0.1 0:00.70 postmaster 22049 postgres 20 0 2514m 17m 14m R5 0.2 0:01.11 postmaster 22050 postgres 20 0 2514m 16m 13m R5 0.2 0:01.06 postmaster 22843 postgres 20 0 2514m 12m 9328 R5 0.2 0:00.20 postmaster 24202 postgres 20 0 2324m 5560 4120 R5 0.1 0:00.07 postmaster 24388 postgres 20 0 2514m 12m 9380 R5 0.2 0:00.16 postmaster 25903 postgres 20 0 2514m 11m 8828 R5 0.1 0:00.55 postmaster 28362 postgres 20 0 2514m 11m 8952 R5 0.1 0:00.48 postmaster 5667 postgres 20 0 2324m 6752 5588 R4 0.1 0:08.93 postmaster 7531 postgres 20 0 2324m 5452 4008 R4 0.1 0:03.21 postmaster 9219 postgres 20 0 2514m 11m 8476 R4 0.1 0:00.89 postmaster 9820 postgres 20 0 2514m 12m 9.9m R4 0.2 0:00.92 postmaster 10050 postgres 20 0 2324m 
6172 4676 R4 0.1 0:00.31 postmaster 10645 postgres 20 0 2514m 12m 9512 R4 0.2 0:00.72 postmaster 14582 postgres 20 0 2514m 25m 21m R4 0.3 0:02.10 postmaster 18502 postgres 20 0 2514m 11m 9040 R4 0.1 0:00.64 postmaster 18972 postgres 20 0 2514m 11m 8792 R4 0.1 0:00.76 postmaster 18975 postgres 20 0 2514m 11m 8904 R4 0.1 0:00.63 postmaster 19496 postgres 20 0 2514m 14m 11m R4 0.2 0:00.44 postmaster 22121 postgres 20 0 2514m 16m 13m R4 0.2 0:00.81 postmaster 24340 postgres 20 0 2514m 12m 9424 R4 0.2 0:00.15 postmaster 24483 postgres 20 0 2324m 6008 4536 R4 0.1 0:00.21 postmaster 25668 postgres 20 0 2514m 16m 13m R4 0.2 0:00.91 postmaster 26382 postgres 20 0 2514m 11m 8996 R4 0.1 0:00.50 postmaster 28363 postgres 20 0 2514m 11m 8908 R4 0.1 0:00.34 postmaster 9754 postgres 20 0 2514m 11m 8752 R3 0.1 0:00.29 postmaster 16113 postgres 20 0 2514m 17m 14m R3 0.2 0:01.10 postmaster 18498 postgres 20 0 2514m 11m 8
Re: [PERFORM] Occasional giant spikes in CPU load
On 6/24/10 9:04 PM, Tom Lane wrote: Craig James writes: So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real load on the server? Maybe if a backend crashes? Some other problem? sinval queue overflow comes to mind ... although that really shouldn't happen if there's "no real load" on the server. What PG version is this? 8.3.10. Upgraded based on your advice when I first asked this question. Also, the pg_stat_activity view contents when this happens would probably be more useful to look at than "top" output. I'll try. It's hard to discover anything because the whole machine is overwhelmed when this happens. The only way I got the top(1) output was by running it high priority as root using nice(1). I can't do that with a Postgres backend, but I'll see what I can do. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
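One way to capture the pg_stat_activity contents Tom asks for, even when the machine is unresponsive, is to keep a connection open beforehand and sample the view into a table in a loop, then keep only the samples whose timestamps overlap a spike. A rough sketch -- the snapshot table name is invented, and the column list is simply copied from the view:

    CREATE TABLE activity_snapshot AS
        SELECT now() AS sample_time, * FROM pg_stat_activity LIMIT 0;

    -- run this every few seconds from a pre-opened psql session or a cron job
    INSERT INTO activity_snapshot
        SELECT now(), * FROM pg_stat_activity;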
Re: [PERFORM] Occasional giant spikes in CPU load
On 6/25/10 7:47 AM, Tom Lane wrote: Craig James writes: On 6/24/10 9:04 PM, Tom Lane wrote: sinval queue overflow comes to mind ... although that really shouldn't happen if there's "no real load" on the server. What PG version is this? 8.3.10. Upgraded based on your advice when I first asked this question. Any chance of going to 8.4? If this is what I suspect, you really need this 8.4 fix: http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php which eliminated the thundering-herd behavior that previous releases exhibit when the sinval queue overflows. Yes, there is a chance of upgrading to 8.4.4. I just bought a new server and it has 8.4.4 on it, but it won't be online for a while so I can't compare yet. This may motivate me to upgrade the current servers to 8.4.4 too. I was pleased to see that 8.4 has a new upgrade-in-place feature that means we don't have to dump/restore. That really helps a lot. A question about 8.4.4: I've been having problems with bloat. I thought I'd adjusted the FSM parameters correctly based on advice I got here, but apparently not. 8.4.4 has removed the configurable FSM parameters completely, which is very cool. But ... if I upgrade a bloated database using the upgrade-in-place feature, will 8.4.4 recover the bloat and return it to the OS, or do I still have to recover the space manually (like vacuum-full/reindex, or cluster, or copy/drop a table)? Or you could look at using connection pooling so you don't have quite so many backends ... I always just assumed that lots of backends that would be harmless if each one was doing very little. If I understand your explanation, it sounds like that's not entirely true in pre-8.4.4 releases due to the sinval queue problems. Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
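On the bloat question: since an in-place upgrade keeps the existing data files, it's safest to assume the bloat comes along with them and still has to be reclaimed manually. A sketch of the usual options (table and index names are made up):

    -- rewrites the table and all its indexes compactly, in index order;
    -- takes an exclusive lock and needs disk space for the new copy
    CLUSTER bigtable USING bigtable_pkey;

    -- or, without picking an ordering index:
    VACUUM FULL bigtable;
    REINDEX TABLE bigtable;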
Re: [PERFORM] Occasional giant spikes in CPU load
On 6/25/10 9:41 AM, Kevin Grittner wrote: Craig James wrote: I always just assumed that lots of backends that would be harmless if each one was doing very little. Even if each is doing very little, if a large number of them happen to make a request at the same time, you can have problems. This is exactly where a connection pool can massively improve both throughput and response time. If you can arrange it, you want a connection pool which will put a limit on active database transactions and queue requests to start a new transaction until one of the pending ones finishes. No, that's doesn't seem to be the case. There is no external activity that triggers this huge spike in usage. It even happens to our backup server when only one of us is using it to do a single query. This problem seems to be triggered by Postgres itself, not by anything external. Per Tom's suggestion, I think upgrading to 8.4.4 is the answer. I'll learn more when our new hardware comes into use with a shiny new 8.4.4 installation. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] pgbench results on a new server
I've got a new server and want to make sure it's running well. Are these pretty decent numbers? 8 cores (2x4 Intel Nehalem 2 GHz) 12 GB memory 12 x 7200 SATA 500 GB disks 3WARE 9650SE-12ML RAID controller with BBU WAL on ext2, 2 disks: RAID1 500GB, blocksize=4096 Database on ext4, 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096 Ubuntu 10.04 LTS (Lucid) Postgres 8.4.4 pgbench -i -s 100 -U test pgbench -c 5 -t 2 -U test tps = 4903 pgbench -c 10 -t 1 -U test tps = 4070 pgbench -c20 -t 5000 -U test tps = 5789 pgbench -c30 -t -U test tps = 6961 pgbench -c40 -t 2500 -U test tps = 2945 Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Architecting a database
On 6/25/10 3:28 PM, Kevin Grittner wrote: wrote: With the PostgreSQL type tables I am not so certain how the data is arranged within the one file. Does having the data all in one database allow PostgreSQL to better utilize indexes and caches or does having a number of smaller databases provide performance increases? In case it is important, there are 2000 clients involved, so that would be 2000 databases if I followed my current FoxPro related structure. The implications of putting multiple clients in a table, with a client's rows identified by a client_id column, are probably fairly obvious. If many of those 2,000 clients have tables with millions of rows, performance could suffer without very careful indexing, managing tables with billions of rows can become challenging, and there could be concerns about how to ensure that data from one client isn't accidentally shown to another. You should also ask whether there are social (that is, nontechncal) reasons to avoid multiple clients per table. When a customer asks about security and you tell them, "You get your own database, nobody else can log in," they tend to like that. If you tell them that their data is mixed with everyone else's, but "we've done a really good job with our app software and we're pretty sure there are no bugs that would let anyone see your data," that may not fly. People will trust Postgres security (assuming you actually do it right) because it's an open source, trusted product used by some really big companies. But your own app? Do you even trust it? Even if your application IS secure, it may not matter. It's what the customer believes or worries about that can sell your product. We've also found another really good reason for separate databases. It lets you experiment without any impact on anything else. We have scripts that can create a database in just a few minutes, load it up, and have it ready to demo in just a few minutes. If we don't end up using it, we just blow it off and its gone. No other database is impacted at all. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
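A minimal sketch of the shared-table layout under discussion, with the kind of leading client_id index that the "very careful indexing" comment implies -- the table and column names here are invented for illustration:

    CREATE TABLE readings (
        client_id  integer     NOT NULL,
        reading_id bigint      NOT NULL,
        taken_at   timestamptz NOT NULL,
        value      numeric,
        PRIMARY KEY (client_id, reading_id)
    );

    -- every query should lead with client_id, both for performance and so
    -- that application bugs can't quietly mix clients' rows together
    CREATE INDEX readings_client_time ON readings (client_id, taken_at);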
Re: [PERFORM] pgbench results on a new server
On 6/25/10 12:03 PM, Greg Smith wrote: Craig James wrote: I've got a new server and want to make sure it's running well. Any changes to the postgresql.conf file? Generally you need at least a moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments (32 or higher) in order for the standard pgbench test to give good results. max_connections = 500 shared_buffers = 1000MB work_mem = 128MB synchronous_commit = off full_page_writes = off wal_buffers = 256kB checkpoint_segments = 30 effective_cache_size = 4GB For fun I ran it with the installation defaults, and it never got above 1475 TPS. pgbench -c20 -t 5000 -U test tps = 5789 pgbench -c30 -t -U test tps = 6961 pgbench -c40 -t 2500 -U test tps = 2945 General numbers are OK, the major drop going from 30 to 40 clients is larger than it should be. I'd suggest running the 40 client count one again to see if that's consistent. It is consistent. When I run pgbench from a different server, I get this: pgbench -c40 -t 2500 -U test tps = 7999 pgbench -c100 -t 1000 -U test tps = 6693 Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL as a local in-memory cache
On 6/30/10 9:42 AM, Dave Crooke wrote: I haven't jumped in yet on this thread, but here goes If you're really looking for query performance, then any database which is designed with reliability and ACID consistency in mind is going to inherently have some mis-fit features. Some other ideas to consider, depending on your query mix: 1. MySQL with the MyISAM database (non-ACID) 2. Put an in-application generic query cache in front of the DB, that runs in the app address space, e.g. Cache' if using Java 3. Using a DB is a good way to get generic querying capability, but if the "where" clause in the querying is over a small set of meta-data, and SQL syntax is not a big requirement, consider non-RDBMS alternatives, e.g. use XPath over a W3C DOM object tree to get primary keys to in-memory hash tables (possibly distributed with something like memcached) These would be good suggestions if the "throwaway" database was the only one. But in real life, these throwaway databases are built from other databases that are NOT throwaway, where the data matters and ACID is critical. In other words, they'll probably need Postgres anyway. Sure, you could use both Postgres and MySQL/ISAM, but that means installing and maintaining both, plus building all of the other application layers to work on both systems. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Highly Efficient Custom Sorting
On 7/2/10 6:59 AM, Eliot Gable wrote: Yes, I have two pl/pgsql functions. They take a prepared set of data (just the row id of the original results, plus the particular priority and weight fields) and they return the same set of data with an extra field called "order" which contains a numerical order to apply when sorting the rows. One function uses the priority information to break everything into priority groups, then calls the other function for each priority group. Each time it gets results back from the inner function, it returns that set of results. When it has looped through all priority groups, then it returns the full built-up set of results back to the calling function. The pl/pgsql functions implementing the sort are as optimized as they are likely to get. I don't want to waste my time trying to further optimize pl/pgsql functions that are never going to be as fast and efficient as I need. I would rather spend that time re-writing it in C and get sorting back to <1ms. I guess the real question is, is a generic C sorting function my only real alternative? Is there anything else that would allow me to sort things faster than pl/pgsql functions? For example, if I used pl/perl, would I be able to expect considerably better performance for sorting than using pl/pgsql? What about other supported languages? If I can get close to 1ms sorting performance without resorting to C, it would save me much time and frustration. Try coding it in perl on the server. It is MUCH easier to code, and you don't have to link anything or learn the esoteric details of the Postgres/C API. Perl itself is written in C, and some of it's operations are extremely fast. Depending on the size and complexity of your data structures, Perl code may be just as fast as code you could write in C. Even if it turns out to be slower than you like, it will give you a way to package up your sort functionality into a function call, so if you later find you need to replace the Perl function with a C function, the rest of your application won't change. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
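A toy illustration of the PL/Perl suggestion, assuming the plperl language is installed. The "id:weight" string encoding is invented just to keep the example short; it is not taken from the poster's schema:

    CREATE OR REPLACE FUNCTION sort_by_weight(items text)
    RETURNS text AS $$
        # input is a comma-separated list of "id:weight" pairs;
        # return the ids ordered by descending weight
        my @pairs  = split /,/, $_[0];
        my @sorted = sort { (split /:/, $b)[1] <=> (split /:/, $a)[1] } @pairs;
        return join ',', map { (split /:/, $_)[0] } @sorted;
    $$ LANGUAGE plperl;

    SELECT sort_by_weight('7:0.2,3:0.9,5:0.5');   -- returns '3,5,7'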
Re: [PERFORM] performance on new linux box
On 7/8/10 9:31 AM, Ryan Wexler wrote: Thanks a lot for all the comments. The fact that both my windows box and the old linux box both show a massive performance improvement over the new linux box seems to point to hardware to me. I am not sure how to test the fsync issue, but i don't see how that could be it. The raid card the server has in it is: 3Ware 4 Port 9650SE-4LPML RAID Card Looking it up, it seems to indicate that it has BBU Make sure the battery isn't dead. Most RAID controllers drop to non-BBU speeds if they detect that the battery is faulty. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] performance on new linux box
On 7/8/10 12:47 PM, Ryan Wexler wrote: On Thu, Jul 8, 2010 at 12:46 PM, Kevin Grittner mailto:kevin.gritt...@wicourts.gov>> wrote: Ryan Wexler mailto:r...@iridiumsuite.com>> wrote: > One thing I don't understand is why BBU will result in a huge > performance gain. I thought BBU was all about power failures? Well, it makes it safe for the controller to consider the write complete as soon as it hits the RAM cache, rather than waiting for persistence to the disk itself. It can then schedule the writes in a manner which is efficient based on the physical medium. Something like this was probably happening on your non-server machines, but without BBU it was not actually safe. Server class machines tend to be more conservative about not losing your data, but without a RAID controller with BBU cache, that slows writes down to the speed of the rotating disks. -Kevin Thanks for the explanations that makes things clearer. It still amazes me that it would account for a 5x change in IO. It's not exactly a 5x change in I/O, rather it's a 5x change in *transactions*. Without a BBU Postgres has to wait for each transaction to by physically written to the disk, which at 7200 RPM (or 10K or 15K) means a few hundred per second. Most of the time Postgres is just sitting there waiting for the disk to say, "OK, I did it." With BBU, once the RAID card has the data, it's virtually guaranteed it will get to the disk even if the power fails, so the RAID controller says, "OK, I did it" even though the data is still in the controller's cache and not actually on the disk. It means there's no tight relationship between the disk's rotational speed and your transaction rate. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
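A quick, rough way to see whether commit latency is the bottleneck on a given box is to run the same pgbench load with and without synchronous commits; if the second number is several times the first, each commit is waiting on a physical flush. The client/transaction counts and database name are placeholders, and PGOPTIONS is just the libpq way of setting a per-session GUC:

    pgbench -c 10 -t 5000 test                                         # commits wait for the disk
    PGOPTIONS='-c synchronous_commit=off' pgbench -c 10 -t 5000 test   # commits don't wait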
Re: [PERFORM] performance on new linux box
On 7/8/10 2:18 PM, timothy.noo...@emc.com wrote: How does the linux machine know that there is a BBU installed and to change its behavior or change the behavior of Postgres? I am experiencing performance issues, not with searching but more with IO. It doesn't. It trusts the disk controller. Linux says, "Flush your cache" and the controller says, "OK, it's flushed." In the case of a BBU controller, the controller can say that almost instantly because it's got the data in a battery-backed memory that will survive even if the power goes out. In the case of a non-BBU controller (RAID or non-RAID), the controller has to actually wait for the head to move to the right spot, then wait for the disk to spin around to the right sector, then write the data. Only then can it say, "OK, it's flushed." So to Linux, it just appears to be a disk that's exceptionally fast at flushing its buffers. Craig -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Craig James Sent: Thursday, July 08, 2010 4:02 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] performance on new linux box On 7/8/10 12:47 PM, Ryan Wexler wrote: On Thu, Jul 8, 2010 at 12:46 PM, Kevin Grittner mailto:kevin.gritt...@wicourts.gov>> wrote: Ryan Wexlermailto:r...@iridiumsuite.com>> wrote: > One thing I don't understand is why BBU will result in a huge > performance gain. I thought BBU was all about power failures? Well, it makes it safe for the controller to consider the write complete as soon as it hits the RAM cache, rather than waiting for persistence to the disk itself. It can then schedule the writes in a manner which is efficient based on the physical medium. Something like this was probably happening on your non-server machines, but without BBU it was not actually safe. Server class machines tend to be more conservative about not losing your data, but without a RAID controller with BBU cache, that slows writes down to the speed of the rotating disks. -Kevin Thanks for the explanations that makes things clearer. It still amazes me that it would account for a 5x change in IO. It's not exactly a 5x change in I/O, rather it's a 5x change in *transactions*. Without a BBU Postgres has to wait for each transaction to by physically written to the disk, which at 7200 RPM (or 10K or 15K) means a few hundred per second. Most of the time Postgres is just sitting there waiting for the disk to say, "OK, I did it." With BBU, once the RAID card has the data, it's virtually guaranteed it will get to the disk even if the power fails, so the RAID controller says, "OK, I did it" even though the data is still in the controller's cache and not actually on the disk. It means there's no tight relationship between the disk's rotational speed and your transaction rate. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using more than one index per table
On 7/21/10 5:47 PM, Craig Ringer wrote: On 21/07/10 22:59, Greg Smith wrote: A useful trick to know is that if you replace the version number with "current", you'll get to the latest version most of the time (sometimes the name of the page is changed between versions, too, but this isn't that frequent). The docs pages could perhaps benefit from an auto-generated note saying: "The current version of Pg is 8.4. This documentation is for version 8.2. Click [here] for documentation on the current version." ... or something to that effect. It'd be a nice (and more user-friendly) alternative to url twiddling when searches reveal docs for an old version, and might help push the /current/ pages up in search rank too. In addition, why not use symlinks so that the current version is simply called "current", as in http://www.postgresql.org/docs/current/static/sql-insert.html If you google for "postgres insert", you get this: http://www.postgresql.org/docs/8.1/static/sql-insert.html The problem is that Google ranks pages based on inbound links, so older versions of Postgres *always* come up before the latest version in page ranking. By using "current" and encouraging people to link to that, we could quickly change the Google pagerank so that a search for Postgres would turn up the most-recent version of documentation. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using more than one index per table
On 7/21/10 6:47 PM, Greg Smith wrote: Craig James wrote: By using "current" and encouraging people to link to that, we could quickly change the Google pagerank so that a search for Postgres would turn up the most-recent version of documentation. How do you propose to encourage people to do that? If I had a good answer to that question, I'd already be executing on it. I've made a habit of doing that when writing articles on the wiki, which hopefully themselves become popular and then elevate those links (all of the ones http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for example point to current). I don't know how to target "people who link to the PostgreSQL manual" beyond raising awareness of the issue periodically on these lists, like I did on this thread. You don't have to get everyone to do it. Just get more people to link to "current" than to other versions and you win in the Google ranking. Start by sending an announcement to every PG mailing list. You'd probably get a couple thousand right away, which by itself might do the trick. Once "current" reaches the top of the Google ranking, it will cascade: People searching for Postgres documentation will find "current" first, and will post links to it, which will further reinforce its popularity. There will always be people who link to older versions, but since the versions change frequently and "current" lasts forever, its ranking will constantly build until it ultimately wins. There's no downside to it. It's easy to do. The other ideas (like putting "out of date" disclaimers and such into older versions) might also be useful, but might be a lot of work for just a little more gain. Creating a "current" link is simple and in the long run will be very effective. The sooner it starts, the sooner it will gain traction. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using more than one index per table
On 7/23/10 2:22 AM, Torsten Zühlsdorff wrote: Craig James schrieb: A useful trick to know is that if you replace the version number with "current", you'll get to the latest version most of the time (sometimes the name of the page is changed between versions, too, but this isn't that frequent). The docs pages could perhaps benefit from an auto-generated note saying: "The current version of Pg is 8.4. This documentation is for version 8.2. Click [here] for documentation on the current version." ... or something to that effect. It'd be a nice (and more user-friendly) alternative to url twiddling when searches reveal docs for an old version, and might help push the /current/ pages up in search rank too. In addition, why not use symlinks so that the current version is simply called "current", as in http://www.postgresql.org/docs/current/static/sql-insert.html If you google for "postgres insert", you get this: http://www.postgresql.org/docs/8.1/static/sql-insert.html The problem is that Google ranks pages based on inbound links, so older versions of Postgres *always* come up before the latest version in page ranking. Since 2009 you can deal with this by defining the canonical-version. (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html) This is a really cool feature, but it's not what we need. The "canonical" refers to the URL, not the web page. It's only supposed to be used if you have multiple URLs that are actually the *same* page; the "canonical" URL tells Google "use only this URL for this page." But in our case, the Postgres manuals for each release have different URLs *and* different content, so the "canonical URL" isn't the right solution. Craig Greetings from Germany, Torsten -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using more than one index per table
On 7/24/10 5:57 AM, Torsten Zühlsdorff wrote: Craig James schrieb: The problem is that Google ranks pages based on inbound links, so older versions of Postgres *always* come up before the latest version in page ranking. Since 2009 you can deal with this by defining the canonical-version. (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html) This is a really cool feature, but it's not what we need. The "canonical" refers to the URL, not the web page. It's only supposed to be used if you have multiple URLs that are actually the *same* page; the "canonical" URL tells Google "use only this URL for this page." But in our case, the Postgres manuals for each release have different URLs *and* different content, so the "canonical URL" isn't the right solution. This is true, but the content is allowed to change "a little". Of course their is no percentage of allowed changes. But it can be quite much. I've used this feature for some clients, which push their content into very different websites and it does work. Most of the content of the documentation doesn't change much between the releases. In most cases the canonical will work the way i suggest. In case of big changes even the recommandation of using a "current" version won't work. Its true that Google ranks pages based on inbound links. But there are more than 200 other factores, which influence the rankings. Most people do not know, that changing most of a sites content makes the inbound links for a long time useless. After big changes in the documentation the "current" entry will be droped for some monthes and the old entries will appear. But note, that every single site of the documentation is ranked for itself. From my experience i would expect the canonical-version with better results, than the current-version. But the canonical is not the best solution in my opinion. I often edit the urls of some documentations, because i need it for a special postgresql version. The documentation clearly misses a version-switch. Combined with an big note, that the current displayed documentation is not the one of the current postgresql-version, this will be the best compromiss in my opinion. Here's an idea: Use a "current" URL, plus a JavaScript embedded in every page that compares its own URL to the "current" URL and, if it doesn't match, does a "document.write()" indicating how to find the most-current version. That would solve three problems: 1. There would be a "current" version that people could link to. 2. If someone found an old version, they would know it and could instantly be directed to the current version. 3. It wouldn't be any burden on the web site maintainers, because the JavaScript wouldn't have to be changed. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Two fast searches turn slow when used with OR clause
I can query either my PARENT table joined to PRICES, or my VERSION table joined to PRICES, and get an answer in 30-40 msec. But put the two together, it jumps to 4 seconds. What am I missing here? I figured this query would be nearly instantaneous. The VERSION.ISOSMILES and PARENT.ISOSMILES columns both have unique indexes. Instead of using these indexes, it's doing a full-table scan of both tables, even though there can't possibly be more than one match in each table. I guess I could rewrite this as a UNION of the two subqueries, but that seems contrived. This is PG 8.3.10 on Linux. Thanks, Craig => explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id -> from plus p join sample s -> on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id) -> join version vn on (s.version_id = vn.version_id) join parent pn -> on (s.parent_id = pn.parent_id) -> where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' -> or pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' -> order by price; Sort (cost=71922.00..71922.00 rows=1 width=19) (actual time=4337.114..4337.122 rows=10 loops=1) Sort Key: p.price Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=18407.53..71921.99 rows=1 width=19) (actual time=1122.685..4337.028 rows=10 loops=1) -> Hash Join (cost=18407.53..71903.71 rows=4 width=20) (actual time=1122.624..4336.682 rows=7 loops=1) Hash Cond: (s.version_id = vn.version_id) Join Filter: ((vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'::text) OR (pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'::text)) -> Hash Join (cost=8807.15..44470.73 rows=620264 width=54) (actual time=431.501..2541.329 rows=620264 loops=1) Hash Cond: (s.parent_id = pn.parent_id) -> Seq Scan on sample s (cost=0.00..21707.64 rows=620264 width=24) (actual time=0.008..471.340 rows=620264 loops=1) -> Hash (cost=5335.40..5335.40 rows=277740 width=38) (actual time=431.166..431.166 rows=277740 loops=1) -> Seq Scan on parent pn (cost=0.00..5335.40 rows=277740 width=38) (actual time=0.012..195.822 rows=277740 loops=1) -> Hash (cost=5884.06..5884.06 rows=297306 width=38) (actual time=467.267..467.267 rows=297306 loops=1) -> Seq Scan on version vn (cost=0.00..5884.06 rows=297306 width=38) (actual time=0.017..215.285 rows=297306 loops=1) -> Index Scan using i_plus_compound_id on plus p (cost=0.00..4.51 rows=4 width=26) (actual time=0.039..0.041 rows=1 loops=7) Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id)) Total runtime: 4344.222 ms (17 rows) If I only query the VERSION table, it's very fast: x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id -> from plus p -> join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id) -> join version vn on (s.version_id = vn.version_id) -> where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' order by price; Sort (cost=45.73..45.74 rows=1 width=19) (actual time=32.438..32.448 rows=10 loops=1) Sort Key: p.price Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..45.72 rows=1 width=19) (actual time=32.309..32.411 rows=10 loops=1) -> Nested Loop (cost=0.00..36.58 rows=2 width=20) (actual time=32.295..32.319 rows=7 loops=1) -> Index Scan using i_version_isosmiles on version vn (cost=0.00..8.39 rows=1 width=4) (actual time=32.280..32.281 rows=1 loops=1) Index Cond: (isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'::text) -> Index Scan using i_sample_version_id on sample s (cost=0.00..28.12 rows=6 width=20) (actual time=0.011..0.024 rows=7 loops=1) Index Cond: (s.version_id = vn.version_id) -> Index Scan using 
i_plus_compound_id on plus p (cost=0.00..4.51 rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7) Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id)) Total runtime: 32.528 ms (12 rows) Same good performance if I only query the PARENT table: x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, pn.parent_id from plus p join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id) join parent pn on (s.parent_id = pn.parent_id) where pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' order by price; QUERY PLAN --- Sort (cost=57.73..57.74 rows=1 width=19) (actual time=43.564..43.564 rows=10 loops=1) Sort Key: p.price Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..57.72 rows=1 width=19) (a
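For reference, the UNION rewrite mentioned above would look roughly like this; each arm can then use its own unique isosmiles index instead of the hash joins in the slow plan. An untested sketch, with the column list matching the original select list:

    SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
    FROM plus p
    JOIN sample s   ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
    JOIN version vn ON (s.version_id = vn.version_id)
    WHERE vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
    UNION
    SELECT p.price, p.amount, p.units, s.catalogue_id, s.version_id
    FROM plus p
    JOIN sample s   ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
    JOIN parent pn  ON (s.parent_id = pn.parent_id)
    WHERE pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
    ORDER BY price;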
Re: [PERFORM] Advice configuring ServeRAID 8k for performance
On 8/5/10 11:28 AM, Kenneth Cox wrote: I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM running CentOS 5.4 x86_64. I have a ServeRAID 8k controller with 6 SATA 7500RPM disks in RAID 6, and for the OLAP workload it feels slow. My current performance is 85MB/s write, 151 MB/s read. I get 193MB/sec write and 450MB/sec read on a RAID10 on 8 SATA 7200 RPM disks. RAID10 seems to scale linearly -- add disks, get more speed, to the limit of your controller. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] write barrier question
On 8/18/10 12:24 PM, Samuel Gendler wrote: With barriers off, I saw a transaction rate of about 1200. With barriers on, it was closer to 1050. The test had a concurrency of 40 in both cases. I discovered there is roughly 10-20% "noise" in pgbench results after running the exact same test over a 24-hour period on a machine with no other activity. Be sure you run your tests enough times to get good statistics unless you're looking at much larger differences. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using Between
On 8/27/10 5:21 PM, Ozer, Pam wrote: I have a query that Select Distinct VehicleId From Vehicle Where VehicleMileage between 0 and 15000. I have an index on VehicleMileage. Is there another way to put an index on a between? The index is not being picked up. It does get picked up when I run Select Distinct VehicleId From Vehicle Where VehicleMileage = 15000. I just want to make sure that there is not a special index I should be using. You need to post EXPLAIN ANALYZE of your query. It could be that an index scan is actually not a good plan (for example, a sequential scan might be faster if most of your vehicles have low mileage). Without the EXPLAIN ANALYZE, there's no way to say what's going on. Did you ANALYZE your database after you loaded the data? Craig Thanks *Pam Ozer* -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
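A minimal sketch of what to run and post back, using the table and column names from the question:

    ANALYZE Vehicle;

    EXPLAIN ANALYZE
    SELECT DISTINCT VehicleId
    FROM Vehicle
    WHERE VehicleMileage BETWEEN 0 AND 15000;

    -- a plain btree index on VehicleMileage is the right kind for a BETWEEN
    -- range; if most vehicles fall inside 0..15000, the planner is correct
    -- to prefer a sequential scan even though the index exists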
Re: [PERFORM] Held idle connections vs use of a Pooler
On 9/14/10 9:10 AM, mark wrote: Hello, I am relatively new to postgres (just a few months) so apologies if any of you are bearing with me. I am trying to get a rough idea of the amount of bang for the buck I might see if I put in a connection pooling service into the enviroment vs our current methodology of using persistent open connections. We have a number of in house applications that connect to a central Postgres instance. (8.3.7). The box is admitting underpowered with only 8 cores, and 8gb or ram and not great disk IO out of an MSA-70. the database is about 35GB on disk and does mainly (~95%) OTLP type queries. I am currently begging for more ram. Most of the connections from the various apps hold idle connections until they need to execute a query once done go back to holding an open idle connection. (there are ~600 open connections at any given time, and most of the time most are idle) this is typically fine while the number of active queries is low, but some other application (that doesn't use connection pooling or holding open connections when not in use) is hitting the db from time to time with 50-100 small queries (2ms queries from my testing) nearly all at once. when this happens the whole response time goes out the door however). While connection pooling may be a good answer for you, there also appears to be a problem/bug in 8.3.x that may be biting you. My installation is very similar to yours (hundreds of idle "lightweight" connections, occasional heavy use by certain apps). Look at this thread: http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php On the server that's been upgraded to 8.4.4, we're not seeing this problem. But it's not in full production yet, so I can't say for sure that the CPU spikes are gone. (Unfortunately, the archives.postgresql.org HTML formatting is horrible -- why on Earth can't it wrap lines?) Craig I think from reading this list for a few weeks the answer is move to using connection pooling package elsewhere to better manage incoming connections, with a lower number to the db. I am told this will require some re-working of some app code as I understand pg-pool was tried a while back in our QA environment and server parts of various in-house apps/scripts/..etc started to experience show stopping problems. to help make my case to the devs and various managers I was wondering if someone could expand on what extra work is having to be done while queries run and there is a high (500-600) number of open yet idle connections to db. lots of the queries executed use sub-transactions if that makes a difference. basically what I am paying extra for with that many persistent connections, that I might save if I go to the effort of getting the in-house stuff to make use of a connection pooler ? thank you for your time. ..: mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
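One cheap data point worth collecting during a bad spell is how many of those ~600 connections are actually active. On 8.3, idle backends report '<IDLE>' as their current query; a sketch:

    SELECT (current_query = '<IDLE>') AS idle, count(*)
    FROM pg_stat_activity
    GROUP BY 1;
    -- note: '<IDLE> in transaction' backends will show up as not idle here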
Re: [PERFORM] Slow count(*) again...
On 10/9/10 6:47 PM, Scott Marlowe wrote: On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel wrote: I know that there haven been many discussions on the slowness of count(*) even when an index is involved because the visibility of the rows has to be checked. In the past I have seen many suggestions about using triggers and tables to keep track of counts and while this works fine in a situation where you know what the report is going to be ahead of time, this is simply not an option when an unknown WHERE clause is to be used (dynamically generated). I ran into a fine example of this when I was searching this mailing list, "Searching in 856,646 pages took 13.48202 seconds. Site search powered by PostgreSQL 8.3." Obviously at some point count(*) came into play here because the site made a list of pages (1 2 3 4 5 6> next). I very commonly make a list of pages from search results, and the biggest time killer here is the count(*) portion, even worse yet, I sometimes have to hit the database with two SELECT statements, one with OFFSET and LIMIT to get the page of results I need and another to get the amount of total rows so I can estimate how many pages of results are available. The point I am driving at here is that since building a list of pages of results is such a common thing to do, there need to be some specific high speed ways to do this in one query. Maybe an estimate(*) that works like count but gives an answer from the index without checking visibility? I am sure that this would be good enough to make a page list, it is really no big deal if it errors on the positive side, maybe the list of pages has an extra page off the end. I can live with that. What I can't live with is taking 13 seconds to get a page of results from 850,000 rows in a table. 99% of the time in the situations you don't need an exact measure, and assuming analyze has run recently, select rel_tuples from pg_class for a given table is more than close enough. I'm sure wrapping that in a simple estimated_rows() function would be easy enough to do. First of all, it's not true. There are plenty of applications that need an exact answer. Second, even if it is only 1%, that means it's 1% of the queries, not 1% of people. Sooner or later a large fraction of developers will run into this. It's probably been the most-asked question I've seen on this forum in the four years I've been here. It's a real problem, and it needs a real solution. I know it's a hard problem to solve, but can we stop hinting that those of us who have this problem are somehow being dense? Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
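For the cases where an estimate really is good enough, the planner's own statistics are already there. A sketch of the estimated_rows() function Scott alludes to -- note the pg_class column is actually spelled reltuples, and the number is only as fresh as the last ANALYZE or VACUUM:

    CREATE OR REPLACE FUNCTION estimated_rows(tablename text)
    RETURNS bigint AS $$
        SELECT reltuples::bigint FROM pg_class WHERE relname = $1;
    $$ LANGUAGE sql STABLE;

    SELECT estimated_rows('my_big_table');

It only answers "how many rows in the whole table", though, not "how many rows match an arbitrary WHERE clause", which is exactly the pagination case being complained about.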
Re: [PERFORM] UUID performance as primary key
On 10/15/10 6:58 PM, Navkirat Singh wrote: I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID would be perfect in my case as I will be having many small databases which will link up to a global database using the UUID. Hence, the need for a unique key across all databases. You left out one piece of information: How many keys per second do you need? We put a sequence in the global database that all secondary databases use to get their IDs. It means an extra connect/disconnect (a pooler can minimize this), so if you're issuing thousands of IDs per second, this isn't a good idea. But for a small-ish number of IDs per second, it gets you the benefit of a universal ID without the size of the UUID field. Craig (the other one) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
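A minimal sketch of the central-sequence approach described above (object names invented); making the sequence hand out blocks cuts the number of round trips to the global database:

    -- in the global database: each nextval() reserves a block of 1000 ids
    CREATE SEQUENCE global_id_seq INCREMENT BY 1000;

    -- a secondary database fetches a block start and then assigns
    -- block_start .. block_start + 999 locally before asking again
    SELECT nextval('global_id_seq') AS block_start;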
Re: [PERFORM] BBU Cache vs. spindles
Kevin Grittner wrote: On what do you base that assumption? I assume that we send a full 8K to the OS cache, and the file system writes disk sectors according to its own algorithm. With either platters or BBU cache, the data is persisted on fsync; why do you see a risk with one but not the other? Surely 'the data is persisted sometime after our write and before the fsync returns, but may be written: - in small chunks - out of order - in an unpredictable way' When I looked at the internals of TokyoCabinet for example, the design was flawed but would be 'fairly robust' so long as mmap'd pages that were dirtied did not get persisted until msync, and were then persisted atomically. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] CPUs for new databases
>>>>> "JB" == Josh Berkus writes: JB> In a general workload, fewer faster cores are better. We do not scale JB> perfectly across cores. The only case where that's not true is JB> maintaining lots of idle connections, and that's really better dealt JB> with in software. I've found that RAM speed is the most limiting factor I've run into for those cases where the db fits in RAM. The less efficient lookups run just as fast when the CPU is in power-saving mode as in performance, which implies that the cores are mostly waiting on RAM (cache or main). I suspect cache size and RAM speed will be the most important factors until the point where disk I/O speed and capacity take over. I'm sure some db applications run computationally expensive queries on the server, but most queries seem light on computation and heavy on gathering and comparing. It can help to use recent versions of gcc with -march=native. And recent versions of glibc offer improved string ops on recent hardware. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BBU Cache vs. spindles
Tom Lane wrote: The other and probably worse problem is that there's no application control over how soon changes to mmap'd pages get to disk. An msync will flush them out, but the kernel is free to write dirty pages sooner. So if they're depending for consistency on writes not happening until msync, it's broken by design. (This is one of the big reasons we don't use mmap'd space for Postgres disk buffers.) Well, I agree that it sucks for the reason you give - but you use write and that's *exactly* the same in terms of when it gets written, as when you update a byte on an mmap'd page. And you're quite happy to use write. The only difference is that its a lot more explicit where the point of 'maybe its written and maybe it isn't' occurs. There need be no real difference in the architecture for one over the other: there does seem to be evidence that write and read can have better forward-read and write-behind behaviour, because read/write does allow you to initiate an IO with a hint to a size that exceeds a hardware page. And yes, after getting into the details while starting to port TC to Windows, I decided to bin it. Especially handy that SQLite3 has WAL now. (And one last dig - TC didn't even have a checksum that would let you tell when it had been broken: but it might all be fixed now of course, I don't have time to check.) James -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance