Re: [PERFORM] db server load
Hi Scott, as you know from the other thread, I performed some tests:

-bash-3.1$ pgbench -c 50 -t 1000
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 1000
number of transactions actually processed: 50000/50000
tps = 377.351354 (including connections establishing)
tps = 377.788377 (excluding connections establishing)

Some vmstat samplings in the meantime:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi    bo    in    cs us sy id wa st
 0  4     92 127880   8252 3294512   0   0   458 12399  2441 14903 22  9 34 35  0
11 49     92 125336   8288 3297016   0   0   392 11071  2390 11568 17  7 51 24  0
 0  2     92 124548   8304 3297764   0   0   126  8249  2291  3829  5  3 64 28  0
 0  1     92 127268   7796 3295672   0   0   493 11387  2323 14221 23  9 47 21  0
 0  2     92 127256   7848 3295492   0   0   501 10654  2215 14599 24  9 42 24  0
 0  2     92 125772   7892 3295656   0   0    34  7541  2311   327  0  1 59 40  0
 0  1     92 127188   7952 3294084   0   0   537 11039  2274 15460 23 10 43 24  0
 7  4     92 123816   7996 3298620   0   0   253  8946  2284  7310 11  5 52 32  0
 0  2     92 126652   8536 3294220   0   0   440  9563  2307  9036 13  6 56 25  0
 0 10     92 125268   8584 3296116   0   0   426 10696  2285 11034 20  9 39 32  0
 0  2     92 124168   8604 3297252   0   0   104  8385  2319  4162  3  3 40 54  0
 0  8     92 123780   8648 3296456   0   0   542 11498  2298 16613 25 10 16 48  0

-bash-3.1$ pgbench -t 1 -c 50 -S
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 50
number of transactions per client: 1
number of transactions actually processed: 50/50
tps = 8571.573651 (including connections establishing)
tps = 8594.357138 (excluding connections establishing)

(next test is with scaling factor 1)

-bash-3.1$ pgbench -t 2 -c 8 -S pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 1
number of clients: 8
number of transactions per client: 2
number of transactions actually processed: 16/16
tps = 11695.895318 (including connections establishing)
tps = 11715.603720 (excluding connections establishing)

Any comments? I can also give you some details about how my application uses the database:
- number of active connections: about 60
- number of idle connections: about 60

Here are some numbers from an old pgfouine report of mine:
- query peak: 378 queries/s
- select: 53.1%, insert: 3.8%, update: 2.2%, delete: 2.8%

The application is basically a web application and the db size is 37 GB. How would you classify the load? small/medium/high?

Cheers,
ste

Scott Marlowe wrote:

On Fri, Dec 12, 2008 at 3:07 AM, Stefano Nichele wrote:

Hi All, I would like to ask you how many connections a db server can handle. I know the question is not so easy, and actually I don't want to know a "number" but something like:
- up to 100 connections: small load, low entry server is enough
- up to 200 connections: the db server starts to sweat, you need a dedicated medium server
- up to 300 connections: hard work, dedicated server
- up to 500 connections: hard work, dedicated high level server
I would just like to understand when we can talk about small/medium/high load.

Well, it's of course more than just how many connections you have. What percentage of the connections are idle? Are you running small tight multi-statement transactions, or huge reporting queries?
The db server we have at work routinely has 100+ connections, but of those, there are only a dozen or so actively running, and they are small and transactional in nature. The machine handling this is very overpowered, with 8 Opteron cores and 12 disks in a RAID-10 for data and 2 in another RAID-10 for everything else (pg_xlog, logging, etc.) under a very nice hardware RAID card with battery-backed cache. We've tested it to much higher loads and it's held up quite well. With the current load, and handling a hundred or so transactions per second, the top of top looks like this:

top - 14:40:55 up 123 days, 2:24, 1 user, load average: 1.08, 0.97, 1.04
Tasks: 284 total, 1 running, 283 sleeping, 0 stopped, 0 zombie
Cpu0 :  2.8%us, 0.4%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 :  2.5%us, 0.3%sy, 0.0%ni, 97.2%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 :  2.5%us, 0.2%sy, 0.0%ni, 97.1%id, 0.1%wa, 0.1%hi, 0.0%si, 0.0%st
Cpu3 : 10.0%us, 0.7%sy, 0.0%ni, 89.0%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st
Cpu4 :
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sun, 11 Jan 2009, M. Edward (Ed) Borasky wrote: Where you *will* have some major OS risk is with testing-level software or "bleeding edge" Linux distros like Fedora. I just ran "uptime" on my home machine, and it said 144 days. Debian unstable, on no-name hardware. I guess the last time I rebooted was when I changed the graphics drivers. I can't remember the last time it actually crashed. I guess the moral is, you find a combination of hardware and software that works well, and you may as well stick with it. The most likely things to destabilise the system are drivers for "interesting" pieces of hardware, like graphics cards and (unfortunately) some high-performance RAID cards. Matthew -- A good programmer is one who looks both ways before crossing a one-way street. Considering the quality and quantity of one-way streets in Cambridge, it should be no surprise that there are so many good programmers there. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] multicolumn indexes still efficient if not fully stressed?
Hello, I created a multicolumn index on the columns c_1,..,c_n. If I use only a true subset of these columns in a SQL query, is the index still efficient? Or is it better to create another multicolumn index defined on this subset? Thanks for any comments!
Re: [PERFORM] multicolumn indexes still efficient if not fully stressed?
On Mon, 2009-01-12 at 18:49 +0100, Jörg Kiegeland wrote:
> Hello,
>
> I created a multicolumn index on the columns c_1,..,c_n .
> If I do use only a true subset of these columns in a SQL query, is the
> index still efficient?
> Or is it better to create another multicolumn index defined on this subset?
>
> Thanks for any comments!

Why would you create a multicolumn index for all columns if that's not what you actually query on? The order of columns matters for multicolumn indexes. Multicolumn indexes work best for queries that use all of the columns in the index, but can also be helpful if at least the leftmost columns in the index are specified in the query.

So it depends on the order. If the index is defined on (c_1, c_2, c_3, c_4) and your query includes "WHERE c_2=val AND c_3=val AND c_4=val", then the index is almost certainly useless. On the other hand, if you were to query "WHERE c_1=val", then if c_1 is highly selective the index would still help.

See here: http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html

-- Mark Lewis
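To make the leftmost-prefix rule concrete, here is a minimal sketch (table and index names are hypothetical, not from the original poster's schema):

```sql
-- Hypothetical table and multicolumn index illustrating the prefix rule.
CREATE TABLE t (c_1 int, c_2 int, c_3 int, c_4 int);
CREATE INDEX t_multi_idx ON t (c_1, c_2, c_3, c_4);

-- These can use the index efficiently: they constrain the leading column(s).
EXPLAIN SELECT * FROM t WHERE c_1 = 1;
EXPLAIN SELECT * FROM t WHERE c_1 = 1 AND c_2 = 2;

-- This almost certainly cannot: it skips the leading column entirely,
-- so the planner has no contiguous index range to scan.
EXPLAIN SELECT * FROM t WHERE c_2 = 2 AND c_3 = 3 AND c_4 = 4;
```

Whether the first two actually use the index also depends on selectivity and table size; on a tiny table the planner will prefer a seq scan regardless.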
Re: [PERFORM] multicolumn indexes still efficient if not fully stressed?
Jörg Kiegeland wrote:
> Hello,
>
> I created a multicolumn index on the columns c_1,..,c_n .
> If I do use only a true subset of these columns in a SQL query, is the
> index still efficient?
> Or is it better to create another multicolumn index defined on this subset?

Create a separate index on each column; since 8.1, PG can use a so-called Bitmap Index Scan to combine them. Read more about that:
- http://www.postgresql-support.de/pgbook/node492.html (in German; I think you can read that)
- http://en.wikipedia.org/wiki/Bitmap_index

Andreas

-- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
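A self-contained sketch of what Andreas is suggesting (hypothetical names again): with one single-column index per column, the planner can combine the relevant ones with BitmapAnd for a query on any subset of columns.

```sql
-- Hypothetical table with one index per column; since 8.1 the planner
-- can combine these via a Bitmap Index Scan (BitmapAnd / BitmapOr).
CREATE TABLE t (c_1 int, c_2 int, c_3 int, c_4 int);
CREATE INDEX t_c1_idx ON t (c_1);
CREATE INDEX t_c2_idx ON t (c_2);
CREATE INDEX t_c3_idx ON t (c_3);

-- Any subset of columns can now be served by ANDing bitmaps from the
-- individual indexes, at the cost of building the bitmaps at query time:
EXPLAIN SELECT * FROM t WHERE c_2 = 2 AND c_3 = 3;
```

As the follow-ups in this thread note, a dedicated multicolumn index on the exact columns you query will usually beat the bitmap combination when the queried combination is frequent.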
Re: [PERFORM] multicolumn indexes still efficient if not fully stressed?
On Mon, Jan 12, 2009 at 12:23 PM, Andreas Kretschmer wrote:
> Jörg Kiegeland wrote:
>
>> Hello,
>>
>> I created a multicolumn index on the columns c_1,..,c_n .
>> If I do use only a true subset of these columns in a SQL query, is the
>> index still efficient?
>> Or is it better to create another multicolumn index defined on this subset?
>
> Create a separate index on each column; since 8.1, PG can use a so-called
> Bitmap Index Scan. Read more about that:

I've found that when you frequently query on two or more columns, a multi-column index is faster than bitmap scans, especially for larger data sets.
Re: [PERFORM] multicolumn indexes still efficient if not fully stressed?
On Monday 12 January 2009, "Scott Marlowe" wrote:
> I've found that when you frequently query on two or more columns, a
> multi-column index is faster than bitmap scans, especially for larger
> data sets.

Very much faster, especially if you're only looking for a few dozen or hundred rows out of multi-million row tables.

-- Current Peeve: The mindset that the Internet is some sort of school for novice sysadmins and that everyone -not- doing stupid dangerous things should act like patient teachers with the ones who are. -- Bill Cole, NANAE
[PERFORM] slow query
OK, I've got a query that's running slow the first time, then fast. But I can't see where the time is really being spent on the first run. Query and plan attached to preserve formatting.

The index scan and nested loop that feed the next layer up nested loop both show very short run times. Yet the nested loop they feed to takes 30 seconds to run. If I run the query a second time, everything looks the same but the second nested loop now runs in well under a second.

I can't figure out where my time's going.

=# explain analyze
SELECT u.*, to_char(u.registration_date, 'Month FMDD, ') AS f_registration_date
FROM users u
INNER JOIN (org_relationships_join j1
            INNER JOIN org_relationships_join j2 ON j1.relid = j2.relid)
        ON u.orgid = j2.orgid
WHERE j1.orgid = 142550
  AND u.deleted = 0
  AND u.type_id < 10
  AND ( lower(u.lname) LIKE lower('boat%')
     OR lower(u.fname) LIKE lower('boat%')
     OR lower(u.username) LIKE lower('boat%')
     OR lower(u.option1) LIKE lower('boat%')
     OR lower(u.email) LIKE lower('%boat%')
     OR lower(u.external_id) = lower('boat') )
ORDER BY lower(u.lname), lower(u.fname);

QUERY PLAN
----------
Sort (cost=2625.26..2625.51 rows=99 width=271) (actual time=30123.181..30123.183 rows=4 loops=1)
  Sort Key: (lower((u.lname)::text)), (lower((u.fname)::text))
  Sort Method: quicksort  Memory: 26kB
  -> Nested Loop (cost=0.00..2621.98 rows=99 width=271) (actual time=26919.298..30123.065 rows=4 loops=1)
     -> Nested Loop (cost=0.00..29.36 rows=20 width=4) (actual time=0.099..0.383 rows=35 loops=1)
        -> Index Scan using org_relationships_orgid_idx on org_relationships_join j1 (cost=0.00..8.27 rows=1 width=4) (actual time=0.048..0.051 rows=1 loops=1)
           Index Cond: (orgid = 142550)
        -> Index Scan using org_relationships_relid_idx on org_relationships_join j2 (cost=0.00..21.00 rows=7 width=8) (actual time=0.046..0.222 rows=35 loops=1)
           Index Cond: (j2.relid = j1.relid)
     -> Index Scan using users_orgid_idx on users u (cost=0.00..129.52 rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35)
        Index Cond: (u.orgid = j2.orgid)
        Filter: ((u.type_id < 10) AND (u.deleted = 0) AND ((lower((u.lname)::text) ~~ 'boat%'::text) OR (lower((u.fname)::text) ~~ 'boat%'::text) OR (lower((u.username)::text) ~~ 'boat%'::text) OR (lower(u.option1) ~~ 'boat%'::text) OR (lower((u.email)::text) ~~ '%boat%'::text) OR (lower(u.external_id) = 'boat'::text)))
Total runtime: 30123.405 ms

Second run:

Sort (cost=2625.30..2625.55 rows=99 width=271) (actual time=116.011..116.012 rows=4 loops=1)
  Sort Key: (lower((u.lname)::text)), (lower((u.fname)::text))
  Sort Method: quicksort  Memory: 26kB
  -> Nested Loop (cost=0.00..2622.02 rows=99 width=271) (actual time=79.531..115.958 rows=4 loops=1)
     -> Nested Loop (cost=0.00..29.36 rows=20 width=4) (actual time=0.036..0.126 rows=35 loops=1)
        -> Index Scan using org_relationships_orgid_idx on org_relationships_join j1 (cost=0.00..8.27 rows=1 width=4) (actual time=0.016..0.018 rows=1 loops=1)
           Index Cond: (orgid = 142550)
        -> Index Scan using org_relationships_relid_idx on org_relationships_join j2 (cost=0.00..21.00 rows=7 width=8) (actual time=0.014..0.068 rows=35 loops=1)
           Index Cond: (j2.relid = j1.relid)
     -> Index Scan using users_orgid_idx on users u (cost=0.00..129.52 rows=5 width=271) (actual time=3.126..3.305 rows=0 loops=35)
        Index Cond: (u.orgid = j2.orgid)
        Filter: ((u.type_id < 10) AND (u.deleted = 0) AND ((lower((u.lname)::text) ~~ 'boat%'::text) OR (lower((u.fname)::text) ~~ 'boat%'::text) OR (lower((u.username)::text) ~~ 'boat%'::text) OR (lower(u.option1) ~~ 'boat%'::text) OR (lower((u.email)::text) ~~ '%boat%'::text) OR (lower(u.external_id) = 'boat'::text)))
Total runtime: 116.214 ms
Re: [PERFORM] slow query
On Mon, Jan 12, 2009 at 2:59 PM, Scott Marlowe wrote:
> OK, I've got a query that's running slow the first time, then fast.
> But I can't see where the time is really being spent on the first run.
> Query and plan attached to preserve formatting.
>
> The index scan and nested loop that feed the next layer up nested loop
> both show very short run times. Yet the nested loop they feed to
> takes 30 seconds to run. If I run the query a second time, everything
> looks the same but the second nested loop now runs in well under a
> second.
>
> I can't figure out where my time's going.

If it is any help, there is a nice tool to format your explain plan at http://explain.depesz.com
Re: [PERFORM] slow query
Scott Marlowe wrote:
> OK, I've got a query that's running slow the first time, then fast.
> But I can't see where the time is really being spent on the first run.
> Query and plan attached to preserve formatting.

Often this is from caching -- the first time, the system has to go to disk to get the values; on subsequent runs the data (and indexes, presumably) are all in RAM and so much faster. Is this plausible?

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)
Re: [PERFORM] slow query
"Scott Marlowe" writes: > -> Index Scan using users_orgid_idx on users u (cost=0.00..129.52 > rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35) >Index Cond: (u.orgid = j2.orgid) >Filter: ((u.type_id < 10) AND (u.deleted = 0) AND > ((lower((u.lname)::text) ~~ 'boat%'::text) OR (lower((u.fname)::text) ~~ > 'boat%'::text) OR (lower((u.username)::text) ~~ 'boat%'::text) OR > (lower(u.option1) ~~ 'boat%'::text) OR (lower((u.email)::text) ~~ > '%boat%'::text) OR (lower(u.external_id) = 'boat'::text))) Not sure if this is what's going on but I find the high startup time for this index scan suspicious. Either there are a lot of dead tuples (which would explain the second run being fast if it marks them all as lp_dead) or there are a lot of matching index pointers which fail those other constraints. Assuming it's the latter perhaps some other index definition would let it zero in on the right tuples more quickly instead of having to grovel through a lot of irrelevant rows? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow query
On Mon, Jan 12, 2009 at 5:59 PM, Scott Marlowe wrote:
> I can't figure out where my time's going.

Looks like it's going to:

-> Index Scan using users_orgid_idx on users u (cost=0.00..129.52 rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35)

I'd guess the index/pages for users isn't in memory the first time around. Next time is:

-> Index Scan using users_orgid_idx on users u (cost=0.00..129.52 rows=5 width=271) (actual time=3.126..3.305 rows=0 loops=35)

--
- David T. Wilson
david.t.wil...@gmail.com
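The per-loop numbers account for the totals exactly: EXPLAIN ANALYZE reports actual time per loop, so it has to be multiplied by loops to get the node's contribution. A quick sanity check, pure psql arithmetic with no tables involved:

```sql
-- "actual time" is per loop; 35 loops of ~860 ms is the whole 30 s.
SELECT 860.638 * 35 AS slow_run_ms,  -- ~30122 ms, matching the 30123 ms total runtime
       3.305 * 35   AS fast_run_ms;  -- ~116 ms, matching the cached second run
```

This is exactly the trap in reading these plans: the inner index scan's "860 ms" looks moderate until you notice it executed 35 times.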
Re: [PERFORM] slow query
On Mon, Jan 12, 2009 at 4:55 PM, David Wilson wrote:
> On Mon, Jan 12, 2009 at 5:59 PM, Scott Marlowe wrote:
>> I can't figure out where my time's going.
>
> Looks like it's going to:
>
> -> Index Scan using users_orgid_idx on users u (cost=0.00..129.52
>    rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35)
>
> I'd guess the index/pages for users isn't in memory the first time around.

Exactly. I keep forgetting to look at loops... sigh. Thanks!
[PERFORM] Slow insert performance, 8.3 WAL related?
Hi Group. Recently upgraded from 8.1 to 8.3 on RHEL 5 64-bit. I've noticed some performance problems that I am guessing are WAL related, based on my browsing around, and wondered if someone had some suggestions for tuning the WAL settings. It could also help if someone just laughed at me and told me it wasn't WAL. I have narrowed the problem down to two pretty simple descriptions.

I had a data load that I was doing with 8.1. It involved about 250k sql statements that were inserts into a table with just one index. The index has two fields. With the upgrade to 8.3 that process started taking all night and half a day. It inserted at the rate of 349 records a minute. When I started working on the problem I decided to test by putting all statements within a single transaction. Just a simple BEGIN at the start and COMMIT at the end. Magically it only took 7 minutes to do the whole set, or 40k per minute. That seemed very odd to me, but at least I solved the problem.

The most recently noticed simple problem: I had a table with about 20k records. We issued the statement DELETE FROM table WHERE this=that. This was part of a combined index, and about 8k records should have been deleted. This statement caused all other queries to grind to a halt. It was only when I killed it that normal operation resumed. It was acting like a lock, but that table was not being used by any other process.

So that describes what I am seeing; let me relay what we are doing with what I think to be the relevant settings. For the log shipping, I am using scp to send the logs to a separate server. And yes, they are getting sent. I have it set now to ship a log about every two minutes, since I am comfortable with that amount of data loss. Here are the settings from the config file that are not commented out, relating to WAL.
(everything below the WRITE AHEAD LOG section in the default config file)

synchronous_commit = off
checkpoint_segments = 3             # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min           # range 30s-1h
checkpoint_completion_target = 0.5  # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s            # 0 is off
archive_mode = on                   # allows archiving to be done
archive_command = '/var/lib/pgsql/data/logship.sh %f %p'
archive_timeout = 120               # force a logfile segment switch after this

Thanks for any help (or laughter)
Rusty
Re: [PERFORM] Slow insert performance, 8.3 WAL related?
Bill Preston writes:
> I've noticed some performance problems that I am guessing are WAL
> related based on my browsing around and wondered if someone had some
> suggestions for tuning the WAL settings. It could also help if someone
> just laughed at me and told me it wasn't WAL.

Consider it done ;-). I'm not sure what your problem is, but it's unlikely to be WAL, especially not if you're using the same WAL-related settings in 8.1 and 8.3. Which you might not be.

The large speedup from wrapping many small inserts into one transaction is entirely expected and should have occurred on 8.1 as well. I am suspicious that you were running 8.1 with fsync off and 8.3 has it on. Do you still have your 8.1 postgresql.conf? Comparing all the non-defaulted settings would be the first thing to do.

If it's not that, I'm not sure. One cross-version difference that comes to mind is that 8.3 is a bit stickier about implicit casting, and so it seems conceivable that something about these queries was considered indexable in 8.1 and is not in 8.3. But you've not provided enough detail to do more than speculate.

regards, tom lane
Re: [PERFORM] Slow insert performance, 8.3 WAL related?
On Monday 12 January 2009, Bill Preston wrote:
> I had a data load that I was doing with 8.1. It involved about 250k sql
> statements that were inserts into a table with just one index. The index
> has two fields.
> With the upgrade to 8.3 that process started taking all night and 1/2 a
> day. It inserted at the rate of 349 records a minute.
> When I started working on the problem I decided to test by putting all
> statements within a single transaction. Just a simple BEGIN at the
> start and COMMIT at the end. Magically it only took 7 minutes to do the
> whole set, or 40k per minute. That seemed very odd to me, but at least I
> solved the problem.

That's well-known behaviour. If you don't do them in one big transaction, PostgreSQL has to fsync after every insert, which effectively limits your insert rate to the rotational speed of your WAL drive (roughly speaking). If you don't explicitly start and end transactions, PostgreSQL does it for you. For every statement.

> The most recently noticed simple problem.
> I had a table with about 20k records. We issued the statement DELETE
> FROM table where this=that.
> This was part of a combined index and about 8k records should have been
> deleted.
> This statement caused all other queries to grind to a halt. It was only
> when I killed it that normal operation resumed. It was acting like a
> lock, but that table was not being used by any other process.

Are there foreign keys on any other table(s) that point to this one? Are the relevant columns in those tables indexed?

-- Current Peeve: The mindset that the Internet is some sort of school for novice sysadmins and that everyone -not- doing stupid dangerous things should act like patient teachers with the ones who are. -- Bill Cole, NANAE
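A minimal sketch of the difference being described (the table and values here are made up for illustration):

```sql
-- Autocommit: each statement is its own transaction, so each INSERT
-- waits for its own WAL fsync before returning -- hundreds per minute.
INSERT INTO load_test VALUES (1, 'a');
INSERT INTO load_test VALUES (2, 'b');

-- One explicit transaction: a single fsync at COMMIT covers all the
-- rows, so throughput is limited by CPU/WAL bandwidth, not disk rotation.
BEGIN;
INSERT INTO load_test VALUES (3, 'c');
INSERT INTO load_test VALUES (4, 'd');
-- ... thousands more inserts ...
COMMIT;
```

Batches of a few thousand rows per transaction usually capture most of the benefit without holding a transaction open for the entire load.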
Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5
On Mon, Dec 22, 2008 at 7:27 AM, Kevin Grittner wrote:
> "Mark Wong" wrote:
>> The DL380 G5 is an 8 core Xeon E5405 with 32GB of memory. The MSA70 is
>> a 25-disk 15,000 RPM SAS array, currently configured as a 25-disk
>> RAID-0 array.
>>
>> number of connections (250):
>>
>> Moving forward, what other parameters (or combinations of) do people
>> feel would be valuable to illustrate with this workload?
>
> To configure PostgreSQL for OLTP on that hardware, I would strongly
> recommend the use of a connection pool which queues requests above
> some limit on concurrent queries. My guess is that you'll see best
> results with a limit somewhere around 40, based on my tests indicating
> that performance drops off above (cpucount * 2) + spindlecount.

It appears to peak around 220 database connections:

http://pugs.postgresql.org/node/514

Of course the system still isn't really tuned all that much... I wouldn't be surprised if the workload peaked at a different number of connections as it is tuned more.

> I wouldn't consider tests of the other parameters as being very useful
> before tuning this. This is more or less equivalent to the "engines"
> configuration in Sybase, for example.

Regards,
Mark
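Kevin's rule of thumb, applied to the box described above (8 cores, 25 spindles), can be checked with a bit of psql arithmetic:

```sql
-- (cpucount * 2) + spindlecount, for the DL380 G5 + MSA70 above.
SELECT (8 * 2) + 25 AS suggested_pool_limit;  -- 41, far below the observed 220-connection peak
```

The gap between the rule's 41 and the measured 220 is part of what the thread is debating: the rule is a starting point for a pooler limit, not a prediction of where an untuned benchmark will peak.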
Re: [PERFORM] Slow insert performance, 8.3 WAL related?
Wow, thanks for all the help Tom and Alan. Sadly I was unaware of the well-known behavior. Consider it more well known now. Fsync is off in 8.3. I am not too worried about what was before in 8.1, since we are not going back.

For the first example (bad behavior when I am not using transactions): is there any way to tell that it is going on at a given point in time? Is there a buffer that fills up, a stat somewhere that I can read? A lot of our code isn't using transactions yet, so I would like a heads up when this problem is happening, or if possible increase some parameter so it happens less.

As to the second example with the delete: there are no foreign keys. For the index, if the table has fields a, b, c and d, we have a btree index (a,b,c,d) and we are saying DELETE FROM table_messed_up WHERE a=x. So the WHERE statement is on the first field in the index.

Now that you have given me more knowledge, let me ask a question that might lead to the answer. Example 1 happens in isolation. Example 2 happened on a live system with the parameters that I specified, and a whole lot of sql statements without transactions being run at the same time. In fact there probably was a whole lot of inserts on this very table before the delete statement was hit. Is it possible that a problem like Example 1 caused the behavior that I witnessed in Example 2? It was waiting for the WALs to catch up or something?

Thanks
Rusty

Alan Hodgson wrote:
> On Monday 12 January 2009, Bill Preston wrote:
>> I had a data load that I was doing with 8.1. It involved about 250k sql
>> statements that were inserts into a table with just one index. The index
>> has two fields. With the upgrade to 8.3 that process started taking all
>> night and 1/2 a day. It inserted at the rate of 349 records a minute.
>> When I started working on the problem I decided to test by putting all
>> statements within a single transaction. Just a simple BEGIN at the start
>> and COMMIT at the end. Magically it only took 7 minutes to do the whole
>> set, or 40k per minute. That seemed very odd to me, but at least I
>> solved the problem.
>
> That's well-known behaviour. If you don't do them in one big transaction,
> PostgreSQL has to fsync after every insert, which effectively limits your
> insert rate to the rotational speed of your WAL drive (roughly speaking).
> If you don't explicitly start and end transactions, PostgreSQL does it
> for you. For every statement.
>
>> The most recently noticed simple problem. I had a table with about 20k
>> records. We issued the statement DELETE FROM table where this=that. This
>> was part of a combined index and about 8k records should have been
>> deleted. This statement caused all other queries to grind to a halt. It
>> was only when I killed it that normal operation resumed. It was acting
>> like a lock, but that table was not being used by any other process.
>
> Are there foreign keys on any other table(s) that point to this one? Are
> the relevant columns in those tables indexed?