Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?
On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos" <[EMAIL PROTECTED]> wrote:

>I'm looking for ideas that might improve the interactive performance of
>the system, without slowing down the updates too much.

IOW, you could accept slower updates. Did you actually try to throttle down the insert rate?

> Here are the characteristics of the table and its use:
>
>- approx. 2 million rows

Doesn't sound worrying. What are the min/max/average sizes of these rows? How large is this table?

    SELECT relpages FROM pg_class WHERE relname='...';

What else is in this database? How many tables are there, and how large is the database as a whole (du $PGDATA)?

>- approx. 4-5 million rows per day are replaced in short bursts of
>1-200k rows (average ~3000 rows per update)

How often do you VACUUM [ANALYSE]?

>- the table needs 6 indexes (not all indexes are used all the time, but
>keeping them all the time slows the system down less than re-creating
>some of them just before they're needed and dropping them afterwards)

I agree.

>- an "update" means that 1-200k rows with a common value in a particular
>field are replaced with an arbitrary number of new rows (with the same
>value in that field), i.e.:
>
>begin transaction;
>  delete from t where id=5;
>  insert into t (id,...) values (5,...);
>  ... [1-200k rows]
>end;

This is a wide variation in the number of rows. You told us the average batch size is 3000; is that also a *typical* batch size? And at what number of rows do you start to get the feeling that it slows down other sessions?

Where do the new values come from? I don't think they are typed in :-) Do they come from external sources or from the same database? If the latter, INSERT INTO ... SELECT ... might help.

>The problem is, that a large update of this kind can delay SELECT
>queries running in parallel for several seconds, so the web interface
>used by several people will be unusable for a short while.

Silly question: by SELECT you mean pure SELECT transactions and not some transaction that *mostly* reads from the database? I mean, are you sure your SELECT queries are slowed down and not blocked by the "updates"?

Show us the EXPLAIN ANALYSE output for the same SELECT, once when it is fast and once when it is slow. BTW, what counts as fast and what counts as slow?

>Currently, I'm using temporary tables:
> [...]
>This is slightly faster than inserting directly into t (and probably
>faster than using COPY, even though using that might reduce the overall
>load on the database).

You might try using a prepared INSERT statement or COPY.

>shared_buffers=10
>(I tried many values, this seems to work well for us - 12GB RAM)
>wal_buffers=500
>sort_mem=80
>checkpoint_segments=16
>effective_cache_size=100

See Josh's comments.

>Any help/suggestions would be greatly appreciated... Even if it's
>something like "you need a faster db box, there's no other way" ;-)

We have to find out what the bottleneck is. Tell us about your environment (hardware, OS, ...). Run top and/or vmstat and look for significant differences between times of normal processing and slow phases. Post top/vmstat output here if you need help.

Servus
 Manfred
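To make that last suggestion concrete, here is a minimal sketch of both variants. The table and column names (t, id, val) and the file name are placeholders for illustration only, not Marinos' actual schema:

    -- Prepared INSERT, executed once per row, all inside one transaction:
    BEGIN;
    PREPARE ins (int, text) AS INSERT INTO t (id, val) VALUES ($1, $2);
    EXECUTE ins (5, 'first row');
    EXECUTE ins (5, 'second row');
    -- ... repeat for the remaining rows of the batch ...
    DEALLOCATE ins;
    COMMIT;

    -- Or load the whole batch in one shot with COPY via psql:
    BEGIN;
    DELETE FROM t WHERE id = 5;
    \copy t (id, val) FROM 'new_rows_for_id_5.txt'
    COMMIT;

Either way the point is the same: one transaction per batch, and as little per-row parse/plan overhead as possible.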
Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server
On Thu, Feb 12, 2004 at 05:19:27PM -0500, Chris Ruprecht wrote:

> what he's doing, this might be a bottleneck. I don't like the virtual memory
> strategy of Linux too much and would rather increase this to 1 - 2 GB for the
> Postgres DB - Specially since he's not running anything else on the machine
> and he has 4 GB to play with.

Have you ever had luck with 2G of shared memory? When I have tried that, the system is very fast initially, and gradually slows to molasses-like speed. My hypothesis is that the cache-lookup logic isn't that smart, and so is inefficient either when using the cache or when doing cache maintenance.

A

--
Andrew Sullivan
Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?
Marinos,

while you are busy answering my first set of questions :-), here is an idea that might help even out resource consumption.

On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos" <[EMAIL PROTECTED]> wrote:

>begin transaction;
>  delete from t where id=5;
>  insert into t (id,...) values (5,...);
>  ... [1-200k rows]
>end;
>
>The problem is, that a large update of this kind can delay SELECT
>queries running in parallel for several seconds, so the web interface
>used by several people will be unusable for a short while.

    CREATE TABLE idmap (
        internalid int NOT NULL PRIMARY KEY,
        visibleid  int NOT NULL,
        active     bool NOT NULL
    );
    CREATE INDEX idmap_visible ON idmap(visibleid);

Populate this table with

    INSERT INTO idmap
    SELECT id, id, true
    FROM t;

Change

    SELECT ...
    FROM t
    WHERE t.id = 5;

to

    SELECT ...
    FROM t INNER JOIN idmap ON (idmap.internalid = t.id AND idmap.active)
    WHERE idmap.visibleid = 5;

When you have to replace the rows in t for id=5, start by

    INSERT INTO idmap VALUES (12345, 5, false);

Then repeatedly

    INSERT INTO t (id, ...) VALUES (12345, ...);

at a rate as slow as you can accept. You don't have to wrap all INSERTs into a single transaction, but batching together a few hundred to a few thousand INSERTs will improve performance.

When all the new values are in the database, you switch to the new id in one short transaction:

    BEGIN;
    UPDATE idmap SET active = false WHERE visibleid = 5 AND active;
    UPDATE idmap SET active = true WHERE internalid = 12345;
    COMMIT;

Do the cleanup in off-peak hours (pseudocode):

    FOR delid IN (SELECT internalid FROM idmap WHERE NOT active)
    BEGIN
        DELETE FROM t WHERE id = delid;
        DELETE FROM idmap WHERE internalid = delid;
    END;

    VACUUM ANALYSE t;
    VACUUM ANALYSE idmap;

To prevent this cleanup from interfering with INSERTs in progress, you might want to add a "beinginserted" flag to idmap.

HTH.
Servus
 Manfred
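For what it's worth, the cleanup pseudocode above could be expressed as a runnable PL/pgSQL function roughly like the following. This is only a sketch: the function name cleanup_inactive_ids is made up, it assumes the t and idmap definitions above, and it does all deletes for one run in a single transaction rather than one per id:

    CREATE OR REPLACE FUNCTION cleanup_inactive_ids() RETURNS integer AS '
    DECLARE
        rec RECORD;
        n   integer := 0;
    BEGIN
        -- remove rows belonging to superseded internal ids, then the map entries
        FOR rec IN SELECT internalid FROM idmap WHERE NOT active LOOP
            DELETE FROM t WHERE id = rec.internalid;
            DELETE FROM idmap WHERE internalid = rec.internalid;
            n := n + 1;
        END LOOP;
        RETURN n;
    END;
    ' LANGUAGE plpgsql;

    -- run it off-peak, then reclaim the dead tuples (VACUUM must run outside the function):
    SELECT cleanup_inactive_ids();
    VACUUM ANALYSE t;
    VACUUM ANALYSE idmap;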
Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server
All,

thanks for your suggestions. I've tweaked my configuration, and I think I've squeezed a little more performance out of the setup. I also tried running several tests simultaneously against postgres and SQL Server, and postgres did much better with the heavy load.

My new settings are:

    max_connections = 50
    shared_buffers = 12             # min 16, at least max_connections*2, 8KB each - default is 1000
    sort_mem = 8000                 # min 64, size in KB - default is 1024 (commented out)
    effective_cache_size = 375000   # typically 8KB each - default is 1000 (commented out)
    random_page_cost = 1            # units are one sequential page fetch cost - default is 4 (commented out)
    geqo = true

Josh, the disks in the new system should be substantially faster than the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has 15k RPM disks, as opposed to the 10k RPM disks in the old system.

On Feb 12, 2004, at 3:26 PM, Josh Berkus wrote:

> Leon,
>
>> Hello all. I am in the midst of porting a large web application from a
>> MS SQL Server backend to PostgreSQL. The migration work is basically
>> complete, and we're at the testing and optimization phase of the project.
>> The results so far have been disappointing, with Postgres performing
>> queries in about the same time as SQL Server even though Postgres is
>> running on a dedicated box with about 4 times the clock speed of the SQL
>> Server box. For a chart of my results, please see
>> http://leonout.com/pggraph.pdf for a graph of some test results.
>
> Your settings look ok to start, but we'll probably want to tune them
> further. Can you post some details of the tests? Include:
>
> 1) the query
> 2) the EXPLAIN ANALYZE results of the query
> 3) whether you ran the test as the only connection, or whether you tested
>    multi-user load.
>
> The last is fairly important for a SQL Server vs. PostgreSQL test; SQL
> Server is basically a single-user database, so like MySQL it appears very
> fast until you get a bunch o' users on it.
>
> Finally, for most queries the disk I/O and the RAM are more important than
> the CPU clock speed. From the looks of it, you upgraded the CPU + RAM but
> downgraded the disk array as far as database writes are concerned; not a
> terribly effective way to gain performance on your hardware.
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
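In case it helps with item 2 above: the EXPLAIN ANALYZE output Josh is asking for is simply what you get by prefixing the test query, for example (the query and table names below are made-up placeholders, not one of Leon's actual tests):

    EXPLAIN ANALYZE
    SELECT o.order_id, c.name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2004-01-01';

Running it once as the only connection and once under concurrent load, and posting both plans, would cover item 3 as well.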
[PERFORM] Lengthy pg_restore and high iowait?
Hello again. I'm setting up a backup routine for my new db server. As part of my testing, I'm attempting to pg_restore a pg_dump'ed backup of my database. The database is about 4.3 GB, and the dump file is about 100 MB.

I first did a schema-only restore, then started a data-only restore with --disable-triggers to get around the referential integrity issues of reloading the data. The data-only restore has been running for a couple of hours now, and I'm seeing high iowait numbers in top:

    15:57:58  up 23:55,  2 users,  load average: 2.04, 2.07, 2.01
    60 processes: 57 sleeping, 3 running, 0 zombie, 0 stopped
    CPU states:   user    nice  system    irq  softirq  iowait    idle
       total      4.0%    0.0%    0.7%   0.0%     0.0%   43.5%   51.6%
       cpu00      0.0%    0.0%    0.3%   0.0%     0.0%   84.8%   14.7%
       cpu01     15.7%    0.0%    1.7%   0.0%     0.0%    2.7%   79.6%
       cpu02      0.1%    0.0%    0.7%   0.0%     0.0%   84.2%   14.7%
       cpu03      0.2%    0.0%    0.0%   0.0%     0.0%    2.4%   97.4%
    Mem:  3869544k av, 3849280k used,   20264k free,       0k shrd,  110544k buff
                       1297452k actv, 2298928k in_d,   57732k in_c
    Swap: 2040244k av,       0k used, 2040244k free                 3576684k cached

      PID USER      PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
     8366 postgres   16   0  954M 954M  952M R     4.3 25.2  21:58   1 postmaster
     9282 postgres   15   0     0    0     0 RW    0.2  0.0   0:00   2 postmaster
        1 root       15   0   496  496   444 S     0.0  0.0   0:05   3 init

Questions:

* Do these iowait numbers indicate a problem with my setup?
* Does anyone have a good method for disabling indexes before a restore and restoring them afterwards? I've spent some time writing scripts to do this, but I have yet to come up with a drop/recreate solution that returns my database to the same pre-drop state.

Thanks in advance!

Leon
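One possible approach to the second question, sketched here only as an illustration (the table and index names are hypothetical): save the CREATE INDEX statements from the pg_indexes view before dropping the indexes, then replay the saved DDL after the load.

    -- in psql: capture the index definitions for the table being reloaded
    \t
    \o recreate_indexes.sql
    SELECT indexdef || ';' FROM pg_indexes WHERE tablename = 'orders';
    \o
    \t

    -- drop the indexes before the data-only restore
    DROP INDEX orders_customer_idx;
    DROP INDEX orders_date_idx;

    -- ... run pg_restore ...

    -- recreate them afterwards
    \i recreate_indexes.sql

Note that indexes backing PRIMARY KEY or UNIQUE constraints have to be dropped and re-added as constraints (ALTER TABLE ... DROP/ADD CONSTRAINT), which is part of why a clean round trip back to the pre-drop state is awkward to script.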
Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server
> "LO" == Leon Out <[EMAIL PROTECTED]> writes: LO> Josh, the disks in the new system should be substantially faster than LO> the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has LO> 15k RPM disks, as opposed to the 10k RPM disks in the old system. If you've got the time, try making your 5 disk array into a RAID10 plus one spare. I found that with that few disks, RAID10 was a better performer for an even mix of read/write to the DB. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server
> Josh, the disks in the new system should be substantially faster than
> the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has
> 15k RPM disks, as opposed to the 10k RPM disks in the old system.

Spindle speed does not correlate with 'throughput' in any easy way. What controllers are you using for these disks?
Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server
> "LO" == Leon Out <[EMAIL PROTECTED]> writes: LO> project. The results so far have been disappointing, with Postgres LO> performing queries in about the same time as SQL Server even though LO> Postgres is running on a dedicated box with about 4 times the clock LO> speed of the SQL Server box. For a chart of my results, please see LO> http://leonout.com/pggraph.pdf for a graph of some test results. Are you using transactions liberally? If you have large groups of inserts/updates, putting them inside transactions buys you a lot of improvement by batching the writes to the WAL. Also, increase your checkpoint_segments if you do a lot of writes. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] slow GIST index creation
Greetings!

Why does creation of GiST indexes take significantly more time than a normal btree index? Can any configuration changes lead to faster index creation?

The query:

    CREATE INDEX co_name_index_idx ON profiles USING gist (co_name_index public.gist_txtidx_ops);

regds
mallah.