Re: [PERFORM] two queries and dual cpu (perplexed)
On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote: Now I have not touched the $PGDATA/postgresql.conf (as I know very little about memory tuning). Have run VACUUM & ANALYZE. You should really, really bump up shared_buffers, and given you have 8GB of RAM this query would likely benefit from more work_mem. and the time taken is *twice* that for the original. The modification was minor. The queries do make use of both CPUs: Is this an IO intensive query? If running both in parallel results in 2x the run time and you have sufficient CPUs, it would (to me) indicate you don't have enough IO bandwidth to satisfy the query. Can we see an explain analyze of the query? Could be a bad plan, and a bad plan will never give good performance.
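A minimal sketch of the kind of change being suggested (the numbers are placeholders, not tuned recommendations, and the view name is made up): work_mem can be raised just for the session running the big query, while shared_buffers has to be changed in postgresql.conf and needs a server restart.

    -- Raise per-sort/hash memory for this session only (value is in kB on this era of PostgreSQL).
    SET work_mem = 65536;
    -- Check what the server is actually running with; changing this requires editing
    -- postgresql.conf and restarting the server.
    SHOW shared_buffers;
    -- Re-test the problem query with the new setting in place.
    EXPLAIN ANALYZE SELECT count(*) FROM some_big_view;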
Re: [PERFORM] two queries and dual cpu (perplexed)
On Apr 21, 2005, at 11:33 PM, Shoaib Burq (VPAC) wrote: BTW I guess I should mention that I am doing the select count(*) on a View. A bit of a silly question... but are you actually selecting all the rows from this query in production, or would it be more selective? ie select * from bigslowview where bah = 'snort'? Ran the Explain analyse with the nestedloop disabled but it was taking forever... and killed it after 30mins. If it takes too long you can run just plain explain (no analyze) and it will show you the plan. This is nearly always instant... it'll give you a clue as to whether your setting changes did anything. You may need to end up breaking some parts of this up into subqueries. I've had to do this before. I had one query that just ran too dang slow as a join, so I modified it into a subquery type deal. Worked great. However, since you are selecting ALL rows I doubt that will help much. Another option may be to use materialized views. Not sure how "dynamic" your data model is. It could help.
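To illustrate the two suggestions above (the view and column names are the hypothetical ones from the thread): a plain EXPLAIN returns the plan without executing the query, so it comes back almost instantly, and a filtered query shows whether the plan changes when the query is actually selective.

    -- Plan only, no execution: a cheap way to see whether enable_nestloop etc. changed anything.
    EXPLAIN SELECT count(*) FROM bigslowview;

    -- If production queries actually filter the view, test with the filter in place.
    EXPLAIN ANALYZE SELECT * FROM bigslowview WHERE bah = 'snort';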
Re: [PERFORM] Why is this system swapping?
On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote: As you can see the system starts utilizing swap at some point, with so many processes. Some time ago we had decided to keep the connections from the pool open for longer You've shown that the system has used swap, but not that it is swapping. Having swap in use is fine - there is likely plenty of code and whatnot that is not being used, so the OS dumped it out to swap. However, if you are actively moving data to/from swap, that is bad. Very bad. Especially on Linux. To tell if you are swapping you need to watch the output of, say, vmstat 1 and look at the si and so columns. Linux is very swap happy and likes to swap things for fun and profit.
Re: [PERFORM] Why is this system swapping?
On Apr 27, 2005, at 2:29 PM, Greg Stark wrote: "I would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time I would at least cut it by a factor of 2. Are you (Anjan) using real or fake connection pooling - ie pgpool versus PHP's persistent connections? I'd strongly recommend looking at pgpool. It does connection pooling correctly (a set of X connections shared among the entire box rather than 1 per web server).
Re: [PERFORM] Why is this system swapping?
On Apr 27, 2005, at 7:46 PM, Greg Stark wrote: In fact I think it's generally superior to having a layer like pgpool having to hand off all your database communication. Having to do an extra context switch to handle every database communication is crazy. I suppose this depends on how many machines / how much traffic you have. In one setup I run here I get away with 32 * 4 db connections instead of 500 * 4. Pretty simple to see the savings on the db machine. (Yes, it is a "bad design" as you said where static & dynamic content are served from the same box. However it also saves money since I don't need machines sitting around serving up pixel.gif vs myBigApplication.cgi)
[PERFORM] Xeon twice the performance of opteron
hey folks, Running into some odd performance issues between a few of our db boxes. While trying to speed up a query I ran it on another box and it was twice as fast. The plans are identical and various portions of the query run in the same amount of time - it all boils down to most of the time being spent in a join filter. The plan is as good as it is going to get, but the thing that is concerning me, which hopefully some folks here may have some insight on, is the very large difference in runtime. Three boxes: A: Intel(R) Xeon(R) CPU E5345 @ 2.33GHz (runs the query fastest), 4MB cache; B: Quad-Core AMD Opteron(tm) Processor 2352 (2.1GHz) (main production box, currently middle speed), 512k cache; C: Quad-Core AMD Opteron(tm) Processor 2378 (2.4GHz), 512k cache. A & B are running PG 8.4.2 (yes, I know it desperately needs to be upgraded). C was also on 8.4.2, and since it was not in production I upgraded it to 8.4.7 and got the same performance as 8.4.2. The dataset on A & B is the same; C is mostly the same, but is missing a couple weeks of data (but since this query runs over 3 years of data, it is negligible - plus C runs the slowest!). All three run FC10 with kernel Linux db06 2.6.27.19-170.2.35.fc10.x86_64 #1 SMP Mon Feb 23 13:00:23 EST 2009 x86_64 x86_64 x86_64 GNU/Linux. Load is very low on each box. The query is running from shared_buffers - no real IO is occurring. The average timing for the query in question is 90ms on A, 180ms on B and 190ms on C. Now here's where some odd stuff starts piling up: explain analyze overhead on said queries: 20ms on A, 50ms on B and 85ms on C(!!). We had one thought about potential NUMA issues, but doing a series (100) of connect, query, disconnect and looking at the timings reveals them all to be solid... but even still we wouldn't expect it to be that awful. The smaller cache of the Opterons is also a valid argument. I know we're running an old kernel; I'm tempted to upgrade to see what will happen, but at the same time I'm afraid it'll upgrade to a kernel with a broken [insert major subsystem here], which has happened before. Anybody have some insight into this or run into this before? btw, a little more background on the query:
-> Nested Loop (cost=5.87..2763.69 rows=9943 width=0) (actual time=0.571..274.750 rows=766 loops=1)
   Join Filter: (ce.eventdate >= (md.date - '6 days'::interval))
   -> Nested Loop (cost=5.87..1717.98 rows=27 width=8) (actual time=0.533..8.301 rows=159 loops=1)
      [stuff removed here]
   -> Index Scan using xxx_date_idx on xx md (cost=0.00..19.50 rows=1099 width=8) (actual time=0.023..0.729 rows=951 loops=159)
      Index Cond: (ce.eventdate <= md.date)
On all three boxes that inner nestloop completes in about the same amount of time - it is that join filter that is causing the pain and agony. (If you are noticing the timing differences, that is because the numbers above are the actual numbers, not explain analyze.) The query is pulling up a rolling window of events that occurred on a specific date. This query pulls up all the data for a period of time. ce.eventdate is indexed, and is used in the outer nestloop. Thinking more about what is going on, cache thrashing is certainly a possibility. The amazing explain analyze overhead is also very curious - we all know it adds overhead, but 85ms? Yow.
Re: [PERFORM] Xeon twice the performance of opteron
On Mar 17, 2011, at 9:39 PM, Scott Marlowe wrote: My experience puts the 23xx series Opterons in the same general neighborhood as the E5300 and a little behind the E5400 series Xeons. OTOH, the newer Magny Cours Opterons stomp both of those into the ground. Do any of those machines have zone.reclaim.mode = 1 ??? i.e.: sysctl -a | grep zone.reclaim vm.zone_reclaim_mode = 0 I had a machine that had just high enough interzone communications cost to get it turned on by default, and it slowed it right to a crawl under pgsql. It is set to zero on this machine. I've tried PG compiled on the box itself, same result. As for power savings, according to cpuinfo all the cores are running at 2.1GHz. We had another machine, which typically runs as a web server, on an AMD Opteron(tm) Processor 6128; after diddling the speed governor to performance (thus bumping cpu speed to 2GHz from 800MHz) query speed increased to 100ms - still not as fast as the Xeon, but close enough. I think I'm just hitting some wall of the architecture. I tried getting some oprofile love from it but oprofile seems to not work on that box. However, it worked on the Xeon box:
33995  9.6859  postgres  j2date
21925  6.2469  postgres  ExecMakeFunctionResultNoSets
20500  5.8409  postgres  slot_deform_tuple
17623  5.0212  postgres  BitmapHeapNext
13059  3.7208  postgres  dt2time
12271  3.4963  postgres  slot_getattr
11509
Aside from j2date (probably coming up due to that join filter, I'd wager) nothing unexpected.
Re: [PERFORM] Intel SSDs that may not suck
On Mar 29, 2011, at 12:13 AM, Merlin Moncure wrote: My own experience with MLC drives is that write cycle expectations are more or less as advertised. They do go down (hard), and have to be monitored. If you are writing a lot of data this can get pretty expensive, although the cost dynamics are getting better and better for flash. I have no idea what would be precisely prudent, but maybe some good monitoring tools and phased obsolescence at around 80% duty cycle might not be a bad starting point. With hard drives, you can kinda wait for em to pop and swap em in -- this is NOT a good idea for flash raid volumes. We've been running some of our DBs on SSDs (X25-Ms; we also have a pair of X25-Es in another box we use for some super hot tables). They have been in production for well over a year (in some cases, nearly a couple years) under heavy load. We're currently being bit in the ass by performance degradation and we're working out plans to remedy the situation. One box has 8 X25-Ms in a RAID 10 behind a P400 controller. First, the P400 is not that powerful, and we've run experiments with newer (P812) controllers that have been generally positive. The main symptom we've been seeing is write stalls: writing will go, then come to a complete halt for 0.5-2 seconds, then resume. The fix we're going to do is replace each drive in order, with the rebuild occurring between each. Then we do a security erase to reset the drive back to completely empty (including the "spare" blocks kept around for writes). Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a notch, but you're still looking at 10k+ iops. That is the current trade off. In general, I wouldn't recommend the cciss stuff with SSDs at this time because it makes some things such as security erase, SMART and other things near impossible. (Performance seems OK though.) We've got some tests planned seeing what we can do with an Areca controller and some SSDs to see how it goes. Also note that there is a funky interaction between an MSA70 and SSDs: they do not work together. (I'm not sure if HP's official branded SSDs have the same issue.) The write degradation could probably be monitored by looking at svctime from sar. We may be implementing that in the near future to detect when this creeps up again.
Re: [PERFORM] Intel SSDs that may not suck
On Mar 29, 2011, at 10:16 AM, Jeff wrote: Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a notch, but you're still looking at 10k+ iops. That is the current trade off. We've been doing a burn-in for about 4 days now on an array of 8 X25-Ms behind a P812 controller. Here's a sample of what it is currently doing (I have 10 threads randomly seeking, reading, and 10% of the time writing (then fsync'ing) out, using my pgiosim tool, which I need to update on pgfoundry):
10:25:24 AM dev104-2  7652.21 109734.51 12375.22 15.96  8.22 1.07 0.12  88.32
10:25:25 AM dev104-2  7318.52 104948.15 11696.30 15.94  8.62 1.17 0.13  92.50
10:25:26 AM dev104-2  7871.56 112572.48 13034.86 15.96  8.60 1.09 0.12  91.38
10:25:27 AM dev104-2  7869.72 111955.96 13592.66 15.95  8.65 1.10 0.12  91.65
10:25:28 AM dev104-2  7859.41 111920.79 13560.40 15.97  9.32 1.19 0.13  98.91
10:25:29 AM dev104-2  7285.19 104133.33 12000.00 15.94  8.08 1.11 0.13  92.59
10:25:30 AM dev104-2  8017.27 114581.82 13250.91 15.94  8.48 1.06 0.11  90.36
10:25:31 AM dev104-2  8392.45 120030.19 13924.53 15.96  8.90 1.06 0.11  94.34
10:25:32 AM dev104-2 10173.86 145836.36 16409.09 15.95 10.72 1.05 0.11 113.52
10:25:33 AM dev104-2  7007.14 100107.94 11688.89 15.95  7.39 1.06 0.11  79.29
10:25:34 AM dev104-2  8043.27 115076.92 13192.31 15.95  9.09 1.13 0.12  96.15
10:25:35 AM dev104-2  7409.09 104290.91 13774.55 15.94  8.62 1.16 0.12  90.55
The 2nd to last column is svctime; the first column after dev104-2 is TPS. If I kill the writes off, tps rises quite a bit:
10:26:34 AM dev104-2 22659.41 361528.71   0.00 15.95 10.57 0.42 0.04 99.01
10:26:35 AM dev104-2 22479.41 359184.31   7.84 15.98  9.61 0.52 0.04 98.04
10:26:36 AM dev104-2 21734.29 347230.48   0.00 15.98  9.30 0.43 0.04 95.33
10:26:37 AM dev104-2 21551.46 344023.30 116.50 15.97  9.56 0.44 0.05 97.09
10:26:38 AM dev104-2 21964.42 350592.31   0.00 15.96 10.25 0.42 0.04 96.15
10:26:39 AM dev104-2 22512.75 359294.12   7.84 15.96 10.23 0.50 0.04 98.04
10:26:40 AM dev104-2 22373.53 357725.49   0.00 15.99  9.52 0.43 0.04 98.04
10:26:41 AM dev104-2 21436.79 342596.23   0.00 15.98  9.17 0.43 0.04 94.34
10:26:42 AM dev104-2 22525.49 359749.02  39.22 15.97 10.18 0.45 0.04 98.04
Now to demonstrate "write stalls" on the problematic box:
10:30:49 AM dev104-3   0.00 0.00    0.00  0.00  0.38   0.00   0.00  35.85
10:30:50 AM dev104-3   3.03 8.08  258.59 88.00  2.43 635.00 333.33 101.01
10:30:51 AM dev104-3   4.00 0.00  128.00 32.00  0.67 391.75  92.75  37.10
10:30:52 AM dev104-3  10.89 0.00   95.05  8.73  1.45 133.55  12.27  13.37
10:30:53 AM dev104-3   0.00 0.00    0.00  0.00  0.00   0.00   0.00   0.00
10:30:54 AM dev104-3 155.00 0.00 1488.00  9.60 10.88  70.23   2.92  45.20
10:30:55 AM dev104-3  10.00 0.00  536.00 53.60  1.66 100.20  45.80  45.80
10:30:56 AM dev104-3  46.53 0.00  411.88  8.85  3.01  78.51   4.30  20.00
10:30:57 AM dev104-3  11.00 0.00   96.00  8.73  0.79  72.91  27.00  29.70
10:30:58 AM dev104-3  12.00 0.00   96.00  8.00  0.79  65.42  11.17  13.40
10:30:59 AM dev104-3   7.84 7.84   62.75  9.00  0.67  85.38  32.00  25.10
10:31:00 AM dev104-3   8.00 0.00  224.00 28.00  0.82 102.00  47.12  37.70
10:31:01 AM dev104-3  20.00 0.00  184.00  9.20  0.24  11.80   1.10   2.20
10:31:02 AM dev104-3   4.95 0.00   39.60  8.00  0.23  46.00  13.00   6.44
10:31:03 AM dev104-3   0.00 0.00    0.00  0.00  0.00   0.00   0.00   0.00
That was from a simple dd, not random writes. (Since it is in production, I can't really do the random write test as easily.) Theoretically, a nice rotation of disks would remove that problem. Annoying, but it is the price you need to pay.
Re: [PERFORM] Intel SSDs that may not suck
On Mar 29, 2011, at 12:12 PM, Jesper Krogh wrote: Are you replacing the drives with new ones, or just secure-erasing them and putting them back in? What kind of numbers are you drawing out of smartmontools in usage figures? (Also seeing some write-stalls here, on 24 RAID 50 volumes of X25-Ms, and have been planning to cycle drives for quite some time, without actually getting to it.) We have some new drives that we are going to use initially, but eventually it'll be a secure-erased one we replace it with (which should perform identically to a new one). What enclosure & controller are you using on the 24 disk beast?
Re: [PERFORM] is it possible to make this faster?
Also, are you sure your numbers are not coming out of the MySQL query cache? That might explain some of it - also with Tom seeing comparable numbers in his test.
[PERFORM] SETOF performance
I think it was on this list - someone posted a message about SETOF being slower. Tom replied saying it was because it needed to create an on-disk tuplestore. I was just looking for some clarification - will a SETOF function always write the resulting tuples to disk (not buffering them in, say, a sort_mem sized buffer)? I think if that is the case I may need to go back and change some stuff around. I have a procedure that I broke out a bit to make life easier. Basically it goes: for v_row in select blah from function_that_gets_data_from_some_cache() loop rowcount := rowcount + 1; return next v_row; end loop; and if rowcount = 0 then [same thing, but we call some_function_that_creates_data_for_cache]. Doing it this way means I avoid having to deal with it in the client and I also avoid having a giant stored procedure. (I like short & sweet things.) What I've found for timings is this: select * from function_that_gets_data_from_some_cache() runs around 1.8 ms, but select * from the_top_level_function() runs around 4.2 ms. (Yes, I know 4.2 ms is fast, but that is not the point.) Could this overhead be related to the SETOF tuplestores? Might it be better to use a refcursor or something, or bite the bullet and live with a giant procedure?
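A minimal PL/pgSQL sketch of the wrapper described above (the function names, the row type, and the dollar-quoted syntax are illustrative assumptions, not the actual code): every RETURN NEXT appends a row to the function's result tuplestore, which the outer query then reads back.

    CREATE OR REPLACE FUNCTION the_top_level_function()
    RETURNS SETOF cache_row AS $$            -- "cache_row" is a hypothetical row type
    DECLARE
        v_row     cache_row;
        rowcount  integer := 0;
    BEGIN
        FOR v_row IN SELECT * FROM function_that_gets_data_from_some_cache() LOOP
            rowcount := rowcount + 1;
            RETURN NEXT v_row;               -- each row lands in the SETOF tuplestore
        END LOOP;

        IF rowcount = 0 THEN
            FOR v_row IN SELECT * FROM some_function_that_creates_data_for_cache() LOOP
                RETURN NEXT v_row;
            END LOOP;
        END IF;

        RETURN;
    END;
    $$ LANGUAGE plpgsql;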
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
On Apr 16, 2004, at 4:23 AM, Rajesh Kumar Mallah wrote: I am running an update on the same table update rfis set inquiry_status='APPROVED' where inquiry_status='a'; It's been running for the past 20 mins, and top output is below. The PID which is executing the query above is 6712. Can anyone tell me why it is in an uninterruptable sleep, and does it relate to the apparent poor performance? Is it a problem with the disk hardware? I know at night this query will run reasonably fast. I've had this problem recently. The problem is simply that the disk cannot keep up. Most likely you don't see it at night because traffic is lower. There are only 2 solutions: 1. get more disks 2. write to the db less The machine I was running on had a single(!) disk. It was a quad Xeon so there was plenty of CPU. I'd see 8-9 processes stuck in the "D" state. Doing a simple ls -l somefile would take 10-15 seconds and, of course, db performance was abysmal. I had a lowly P2 with a few disks in it that was able to run circles around it, for the simple fact that the machine was not waiting for disk. Again, proof that disk is far more important than CPU in a db. good luck.
Re: [PERFORM] Wierd context-switching issue on Xeon
On Apr 19, 2004, at 8:01 PM, Tom Lane wrote: [test case] Quad P3-700MHz, ServerWorks, pg 7.4.2 - 1 process: 10-30 cs/second; 2 processes: 100k cs/sec; 3 processes: 140k cs/sec; 8 processes: 115k cs/sec. Dual P2-450MHz, non-ServerWorks (PIIX) - 1 process: 15-20/sec; 2 processes: 30k/sec; 3 (up to 7) processes: 15k/sec (yes, I verified that with more processes the cs's drop). And finally, a 6-cpu Sun E4500, Solaris 2.6, pg 7.4.2: 1-10 processes hovered between 2-3k cs/second (there was other stuff running on the machine as well). Verrry interesting. I've got a dual G4 at home, but, conveniently, Apple doesn't ship a vmstat that reports context switches.
Re: [PERFORM] planner/optimizer question
On Apr 30, 2004, at 3:01 AM, Gary Doades wrote: [ pg query plan, etc ] I wonder if other parts of the plan are affecting the speed. I've recently run into a case where a merge join plan was chosen for this query, which took 11 seconds to execute. Forcing it to pick a nested loop join dropped it to 3. (Updating my default_statistics_target to 500 caused the planner to choose the nested loop join.) So, is the plan really the same? A better comparison query may be a simple "select a from mytable where a between foo and bar" to get an index scan. In that case it's a straight-up, vanilla index scan. Nothing else getting in the way.
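A hedged sketch of the statistics change mentioned above (table name, column name, and the range bounds are placeholders): raise the statistics target, re-analyze, and then compare the plan for the simple range query.

    -- Collect more detailed statistics for tables analyzed in this session;
    -- per-column control is also possible with ALTER TABLE ... SET STATISTICS.
    SET default_statistics_target = 500;
    ANALYZE mytable;

    -- The "vanilla" comparison case: a plain index-scan candidate with nothing else in the way.
    EXPLAIN ANALYZE SELECT a FROM mytable WHERE a BETWEEN 100 AND 200;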
[PERFORM] Visual Explain
I've known about this tool for a while, but it seems many people do not know of its existence, and I think it would be useful to a lot of people who have a hard time reading explain analyze output. (And even those who can read it without blinking... when you get deep in join hell it gets tricky!) Red Hat Visual Explain - part of Red Hat Database. It is what the name implies - a graphical (Java) program to draw a picture of your query plan (along with all the juicy information explain analyze provides). I just tried it out today and after upgrading my JDBC driver to 7.4 it worked fine. (If you get a message about SET AUTOCOMMIT then you need to upgrade your JDBC jar.) Quite handy for getting a grasp on stupidly large query plans. http://sources.redhat.com/rhdb/visualexplain.html I used the CVS version; I have no idea how well the "official" releases work. Anyone else using it?
Re: [PERFORM] Visual Explain
On Jun 17, 2004, at 7:10 AM, Adam Witney wrote: Will this run on other platforms? OSX maybe? I've run it on both Linux (RH8) and OS X (Panther). It's Java, so it *should* run anywhere. It isn't the fastest beast in the world, though; it takes a bit of time to render the plan.
Re: [PERFORM] Visual Explain
On Jun 17, 2004, at 12:54 PM, Vitaly Belman wrote: Is it possible to download the Visual Explain only (link)? I only see that you can download the whole ISO (which I hardly need). You'll need to snag it out of CVS: http://sources.redhat.com/rhdb/cvs.html You can check out just Visual Explain so you won't need to grab everything.
Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?
On Jun 18, 2004, at 7:31 AM, Gary Cowell wrote: The explain output on postgres shows the same execution with a scan on vers and a sort, but the query time is 78.6 seconds. Does it run just as slow if you run it again? It could be a case of the caches being empty. ... Oracle, but I think I've configured comparable buffering and sort area sizes; certainly there isn't much physical IO going on in either case. Configuring PG like Oracle isn't the best thing in the world. The general PG philosophy is to let the OS do all the caching & buffering - this is reversed in the Oracle world. In 7.4 the rule of thumb is no more than 10k shared_buffers... beyond that the overhead of maintaining it becomes excessive. (This isn't really the case in 7.5.) Curiously, what are your sort_mem and shared_buffers settings?
Re: [PERFORM] SQL stupid query plan... terrible performance !
On Jun 27, 2004, at 8:37 PM, Jim wrote: Hi, I have one performance issue... and really have no idea what's going on... When I set enable_seqscan to 0, query2 runs the same way... upload => 60667 entities, uploadfield => 506316 entities. Have you run VACUUM ANALYZE recently?
Re: [PERFORM] Mysterious performance of query because of plsql function in where condition
On Jul 2, 2004, at 3:48 AM, Peter Alberer wrote: Postgres seems to execute the function "submission_status" for every row of the submissions table (~1500 rows). The query therefore takes quite a lot of time, although in fact no row is returned from the assignments table when the condition package_id=949589 is used. Well, you need to think of it this way - PG has no idea what the function does, so it treats it as a "black box" - thus it has to run it for each row to see what it evaluates to, especially since it is in a where clause. If you really want a function there you can use a SQL function instead of plpgsql - PG is smart enough to push that function up into your query and let the optimizer look at the whole thing. You can also take a look at the various flags you can use while creating functions, such as IMMUTABLE, STRICT, etc.; they can help.
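A hedged sketch of the rewrite being suggested (the function body, table, and column names here are invented, since the real submission_status isn't shown): a SQL-language function with an appropriate volatility marking gives the planner something it can inline and reason about instead of a black box.

    -- Placeholder body: whatever lookup the real function performs goes here.
    CREATE OR REPLACE FUNCTION submission_status(integer) RETURNS text AS $$
        SELECT state FROM submission_states WHERE submission_id = $1;
    $$ LANGUAGE sql STABLE;   -- STABLE (or IMMUTABLE, if the result truly never changes) helps the planner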
Re: [PERFORM] Performance over a LAN
On Jul 23, 2004, at 3:57 AM, William Carney wrote: I tested the LAN connection by transferring around some large (150 MByte) files, and got consistent transfer rates of about 10 MBytes/second in both directions without any problems, which is what I would expect. Netstat says It would be interesting to run something like ntop that can show you current network usage... Unless you are doing a large COPY, the PG protocol has a lot of back-and-forth messages...
Re: [PERFORM] my boss want to migrate to ORACLE
On Jul 28, 2004, at 1:08 PM, Stephane Tessier wrote: we have a BIG problem of performance, it's slow Can you isolate which part is slow? (log_min_duration is useful for finding your slow-running queries) we use postgres 7.3 for a php security application with approximately 4 million insertions per day and 4 million deletes and updates That is pretty heavy write volume. Are these updates done in batches or "now and then"? If they are done in batches you could speed them up by wrapping them inside a transaction. #shared_buffers = 256 # min max_connections*2 or 16, 8KB each #shared_buffers = 196000 # min max_connections*2 or 16, 8KB each shared_buffers = 128000 # min max_connections*2 or 16, 8KB each Too much. Generally going over 10k will stop benefiting you. #wal_buffers = 8 # min 4, typically 8KB each Might want to bump this up. #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each Given your write volume, increase this a bit... oh, 20 or 30 of them will help a lot. But it will use 16*30 MB of disk space. Oracle is *NOT* a silver bullet. It will not instantly make your problems go away. I'm working on a project porting some things to Oracle, and as a test I also ported it to Postgres. And you know what? Postgres is running about 30% faster than Oracle. The Oracle lovers here are not too happy with that one :) Just so you know..
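As a sketch of the transaction-batching advice (table and column names are hypothetical): grouping a batch of writes between BEGIN and COMMIT means the WAL is flushed once per batch instead of once per statement, which matters a great deal at millions of writes per day.

    BEGIN;
    INSERT INTO security_events (event_time, msg) VALUES (now(), 'event 1');
    INSERT INTO security_events (event_time, msg) VALUES (now(), 'event 2');
    -- ... the rest of the batch ...
    DELETE FROM security_events WHERE event_time < now() - interval '30 days';
    COMMIT;   -- one WAL flush for the whole batch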
Re: [PERFORM] The black art of postgresql.conf tweaking
On Aug 4, 2004, at 8:45 AM, Paul Serby wrote: Apache on the Web server can take up to 300 connections and PHP is using pg_pconnect Postgres is set with the following. max_connections = 300 shared_buffers = 38400 sort_mem = 12000 But Apache is still maxing out the non-super-user connection limit. Did you restart PG after making that change? (You need to restart; a reload won't change max_connections.) Also, your sort_mem is likely too high (that is the amount of memory that can be used PER SORT) and you should back down on shared_buffers. (General consensus is don't go over 10k shared buffers.) Another thing you may want to try is using pgpool and regular pg_connect - this way you only have a pool of, say, 32 connections to the DB that are shared among all Apache instances. This gets rid of the need to have hundreds of idle postgres processes sitting around. Connecting to pgpool is very fast. We use it in production here and it works wonderfully. And it is 100% transparent to your application.
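A few quick checks related to the connection ceiling described above (run as a superuser; treat this as a sketch, since whether superuser_reserved_connections is available depends on the PostgreSQL version in use):

    SHOW max_connections;                    -- the hard ceiling
    SHOW superuser_reserved_connections;     -- slots ordinary users can never take
    SELECT count(*) FROM pg_stat_activity;   -- how many backends are open right now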
Re: [PERFORM] Performance Bottleneck
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote: I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. Since it offloads the pooling to the webserver, it seems more advantageous than pgpool, which while being able to run on an external system is not adding another layer of complexity. Apache::DBI is not the same sort of a pool as pgpool. DB connections are not shared among all your Apache children (a common misconception). So if you have 300 Apache kids you can have 300 db connections. With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 db connections. Anyone had any experience with both Apache::DBI and pgpool? For my needs they seem to do essentially the same thing, simply that one is invisible to the code while the other requires adding the complexity of a proxy. Both are invisible to the app. (With pgpool it thinks it is connecting to a regular old PG server.) And I've been running pgpool in production for months. It just sits there. Doesn't take much to set it up or configure it. Works like a champ.
Re: [PERFORM] high load caused by I/O - a hint
On Aug 18, 2004, at 4:18 AM, eleven wrote: Hello, This is not strictly a PostgreSQL performance hint, but it may be helpful to someone with problems like mine. As I posted earlier, I was experiencing very high load average on one of my Linux database servers (IBM eServer 345, SCSI disks on an LSI Logic controller) caused by an I/O bottleneck. We have some 335's (I think they are 335s), and until April or so there was a bug in the Fusion MPT driver that would cause it to revert to async narrow mode if hardware RAID was enabled on it. (Performance was horrible - NFS on a 100meg network was 10x faster than local disk!) And on the upside, when I originally researched the problem they hadn't found the bug yet, so there were no others around having issues like mine, and trying to figure it out was quite difficult. I may see if using that acpi=ht makes any difference as well.
Re: [PERFORM] What is the best way to do attribute/values?
On Aug 25, 2004, at 4:22 AM, Mark Kirkwood wrote: select pav1.person_id from person_attributes_vertical pav1 where (pav1.attribute_id = 1 and pav1.value_id in (2,3)) or (pav1.attribute_id = 2 and pav1.value_id in (2,3)) You know... it may help if you toss in a group by, ie select pav1.person_id, count(*) from person_attributes_vertical pav1 where (pav1.attribute_id = 1 and pav1.value_id in (2,3)) or ( ... ) or (...) group by pav1.person_id order by count(*) desc. That should give you the person_ids that matched the most criteria. I've used similar things before, now that I think about it. If you want an exact match you could put "having count(*) = $myNumAttributes" in there too - by definition an exact match would satisfy that. It has an added side effect of producing "closest matches" when an exact match cannot be found... granted you may not want that for a dating site : ) "You asked for a blond female, blue eyes.. but I couldn't find any... but I *DID* find a brown haired male with brown eyes! Is that good enough?"
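A fuller sketch of the idea above (the table and column names are from the thread; the specific attribute ids, value ids, and the target count of 2 are made-up examples): count how many of the requested attribute/value pairs each person matches, then either rank by that count or insist on an exact match.

    SELECT pav1.person_id, count(*) AS matches
      FROM person_attributes_vertical pav1
     WHERE (pav1.attribute_id = 1 AND pav1.value_id IN (2, 3))
        OR (pav1.attribute_id = 2 AND pav1.value_id IN (2, 3))
     GROUP BY pav1.person_id
    -- HAVING count(*) = 2        -- uncomment to require every criterion to match
     ORDER BY count(*) DESC;      -- otherwise: closest matches first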
Re: [PERFORM] Caching of Queries
[ discussion of server side result caching ] And let's not forget the major fork PG will throw into things: MVCC. The results of query A may hold true for txn 1, but not txn 2, and so on. That would have to be taken into account as well and would greatly complicate things. It is always possible to do a "poor man's" query cache with triggers... which would just leave you with basically a materialized view.
Re: [PERFORM] Need advice on postgresql.conf settings
On Nov 9, 2004, at 2:01 PM, Shane | SkinnyCorp wrote: Thanks in advance for anything you can do to help. The real issue is this: we have THE SAME queries taking anywhere from .001 - 90.0 seconds... the server is using 98% of the available RAM at all times (because of the persistent connections via php), and I don't know what to do. Every time I change a ... I'd strongly recommend ditching the use of pconnect and using pgpool + regular connect. It is a terrific combination that provides pooled connections the way you'd think they should work (a pool of N connections to PG shared by Y processes instead of a 1:1 mapping). Curiously, have you noticed any pattern to the slowdown? It could be induced by a checkpoint or vacuum. Are you swapping at all? Are your PHP scripts leaking at all, etc.? Your load average is high; how does your CPU idle look? (If load is high and the cpus are pretty idle, that is an indicator of being IO bound.) good luck.
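One way to spot the pattern, as a sketch (the threshold is an arbitrary example, and the setting can also go in postgresql.conf): log every statement slower than some cutoff and see whether the 90-second outliers cluster around checkpoints or vacuums.

    -- Requires superuser when set per session; the value is in milliseconds, -1 disables it.
    SET log_min_duration_statement = 250;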
Re: [PERFORM] How to speed-up inserts with jdbc
On Nov 10, 2004, at 8:51 AM, Michael Kleiser wrote: It is running in 10 threads. Each thread makes 100 inserts: For the 1000 inserts (10 threads x 100 inserts) we need 8 seconds. That's 125 inserts/second. How could we make it faster? Batch the inserts up into a transaction. So you'd have BEGIN insert insert insert ... COMMIT Your numbers will suddenly skyrocket.
Re: [PERFORM] query plan question
On Nov 17, 2004, at 7:32 AM, David Parker wrote: Oh, I didn't realize that analyze gave that much more info. I've got a lot to learn about this tuning stuff ;-) I've attached the output. I see from the new output where the slow query is taking its time (the nested loop at line 10), but I still have no idea why this plan is getting chosen. Looks like your stats are incorrect on the Sparc. Did you forget to run vacuum analyze on it? Also, do both db's have the same data loaded? There are some very different numbers in terms of actual rows floating around there...
Re: [PERFORM] scaling beyond 4 processors
On Dec 6, 2004, at 5:18 PM, [EMAIL PROTECTED] wrote: Hello everyone! Since our current Postgres server, a quad Xeon system, finally can't keep up with our load anymore, we're ready to take the next step. I'm assuming you've already done as much query tweaking as possible. And are you sure you are CPU bound and not IO bound? (Symptoms of being IO bound are low cpu usage, high load average, poor performance, and many processes in "D" state.) So the question is: Has anyone experiences with running Postgres on systems with more than 4 processors in a production environment? Which systems and ... Have you also considered a replicated approach?
Re: [PERFORM] Some Performance Advice Needed
On Dec 23, 2004, at 9:27 AM, Alex wrote: Running hdparm reported A) 920mb/s (SCSI 10k) B) 270mb/s (SCSI 10k) C) 1750mb/s (IDE 7.2k) IDE disks lie about write completion (this can be disabled on some drives) whereas SCSI drives wait for the data to actually be written before they report success. It is quite easy to corrupt a PG database (or most any db, really) on an IDE drive. Check the archives for more info.
Re: [PERFORM] Some Performance Advice Needed
On Dec 23, 2004, at 4:27 PM, Joshua D. Drake wrote: IDE disks lie about write completion (This can be disabled on some drives) whereas SCSI drives wait for the data to actually be written before they report success. It is quite easy to corrupt a PG (Or most any db really) on an IDE drive. Check the archives for more info. Do we have any real info on this? Specifically which drives? Is SATA the same way? What about SATA-II? I am not saying it isn't true (I know it is) but this is a blanket statement that may or may not be true with newer tech. Scott Marlowe did some tests a while ago on it. They are likely in the archives. Maybe we can get him to pipe up :)
Re: [PERFORM] Swapping on Solaris
On Jan 19, 2005, at 10:42 AM, Alan Stange wrote: Kevin Schroeder wrote: I take that back. There actually is some paging going on. I ran sar -g 5 10 and when a request was made (totally about 10 DB queries) my pgout/s jumped to 5.8 and my ppgout/s jumped to 121.8. pgfree/s also jumped to 121.80. I'm fairly sure that the pi and po numbers include file IO in Solaris, because of the unified VM and file systems. Curiously, what are your shared_buffers and sort_mem set to? Perhaps they are too high?
Re: [PERFORM] Swapping on Solaris
On Jan 19, 2005, at 10:40 AM, Kevin Schroeder wrote: I may be asking the question the wrong way, but when I start up PostgreSQL, swap is what gets used the most. I've got 1282MB free RAM right now and 515MB of swap in use. Granted, swap file usage probably wouldn't be zero, but I would guess that it should be a lot lower, so something must be keeping PostgreSQL from using the free RAM that my system is reporting. For example, one of my postgres processes is 201M in size but only 72M is resident in RAM. That extra 130M is available in RAM, according to top, but postgres isn't using it. Can you please give us your exact shared_buffers and sort_mem settings? This will help greatly. As a general thing, we say don't use more than 10k shared buffers unless you have done testing and see a benefit. Managing all those buffers isn't free. I'm also not sure how Solaris reports shared memory usage for apps... a lot of that could be shared mem. Can you watch, say, vmstat 1 for a minute or two while PG is running and see if you're actually swapping?
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Jan 20, 2005, at 9:36 AM, Hervé Piedvache wrote: Sorry but I don't agree with this ... Slony is a replication solution ... I don't need replication ... what will I do when my database grows up to 50 Gb ... I'll need more than 50 Gb of RAM on each server ??? Slony doesn't use much RAM. The MySQL clustering product, NDB I believe it is called, requires all data to fit in RAM. (At least, it used to.) What you'll need is disk space. As for a cluster, I think you are thinking of multi-master replication. You should look into what others have said about trying to partition data among several boxes and then join the results together. Or you could fork over hundreds of thousands of dollars for Oracle's RAC.
Re: [PERFORM] Benchmark
On Feb 10, 2005, at 12:49 AM, Jaime Casanova wrote: Hi guys, I'm planning to do a comparison between PostgreSQL and some other DBMSs (Informix, Oracle, MS SQL Server, Firebird and even MySQL); I'm coordinating with people in the Spanish PostgreSQL IRC channel. 2) point me to a good benchmark test or script that can be used? The TPC tests are fairly widely accepted. The thing with a benchmark is that it is unlikely to simulate your real traffic. But it is always fun to look at numbers. 3) any comments? If you plan on making your results public, be very careful with the license agreements on the other dbs. I know Oracle forbids the release of benchmark numbers without their approval.
Re: [PERFORM] Benchmark
On Feb 11, 2005, at 2:04 AM, Mitch Pirtle wrote: I did do the research, but couldn't find one instance where someone was actually taken to task over it. So far it appears to be bluster. Horrifying to some, but still bluster. They may not have done that yet, but they _COULD_. And if they decide to, they have more money and power than you likely have and would drive you into financial ruin for the rest of your life (even if you are correct). It is a big risk. I think that clause is in there so MS, etc. can't say "Use FooSQL, it's 428% faster than that Oracle POS. Just look!" After using Oracle in the last few months... I can see why they'd want to prevent those numbers. Oracle really isn't that good. I had been under the impression that it was holy-smokes amazingly fast. It just isn't. At least, in my experience it isn't. But that is another story.
[PERFORM] Possible interesting extra information for explain analyze?
Given some recent posts / IRC issues with dead tuple bloat, and given that a lot of these people have at least been smart enough to run explain analyze, would it be (a) possible, (b) useful, or (c) none of the above to have the various "scan" nodes of explain analyze also report how many invisible / dead tuples they had to disqualify? (Just to clarify: they matched the search criteria, but were invisible due to MVCC rules.) Something like: Seq Scan on boards (cost=0.00..686.30 rows=25430 width=0) (actual time=8.866..5407.693 rows=18636 loops=1 invisiblerows=8934983098294) This may help us to point out tuple bloat issues quicker... or it may give the developer enough of a clue to search around and find out he needs to vacuum... hmm, but once we have an integrated autovacuum it will be a moot point. Also, another thing I started working on back in the day and hope to finish when I get time (that is a funny idea) is having explain analyze report when a step required the use of temp files.
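Until something like the proposed invisiblerows output exists, a rough way to see how much dead-row baggage a table is carrying (the table name here is simply the one from the example plan above) is to let vacuum report its counts:

    -- Look for the "removable" vs "nonremovable" row version counts in the output;
    -- a large removable count on a heavily scanned table is the bloat in question.
    VACUUM VERBOSE boards;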
Re: [PERFORM] multi billion row tables: possible or insane?
On Mar 1, 2005, at 4:34 AM, Ramon Bastiaans wrote: What would be important issues when setting up a database this big, and is it at all doable? Or would it be insane to think about storing up to 5-10 billion rows in a postgres database? Buy a bunch of disks. And then go out and buy more disks. When you are done with that, go buy some more disks. Then buy some RAM. Then buy more disks. You want the fastest IO possible. I'd also recommend the Opteron route since you can put heaping gobs of RAM in there as well. The database's performance is important. There would be no use in storing the data if a query will take ages. Queries should be quite fast if possible. And make sure you tune your queries.
Re: [PERFORM] 7 hrs for a pg_restore?
On Feb 19, 2008, at 1:22 PM, Tom Lane wrote: maintenance_work_mem, to be more specific. If that's too small it will definitely cripple restore speed. I'm not sure fsync would make much difference, but checkpoint_segments would. See http://www.postgresql.org/docs/8.3/static/populate.html#POPULATE-PG-DUMP I wonder if it would be worthwhile if pg_restore could emit a warning if maintenance_work_mem is "low" (start flamewar on what "low" is). And as an addition to that - allow a command line arg to have pg_restore bump it before doing its work? On several occasions I was moving a largish table and the COPY part went plenty fast, but when it hit index creation it slowed down to a crawl due to low maintenance_work_mem.
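The manual workaround, sketched (the value and the object names are placeholders; this applies directly when restoring a plain SQL dump through psql or rebuilding indexes by hand, since pg_restore opens its own sessions): raise maintenance_work_mem before the index builds run.

    SET maintenance_work_mem = '1GB';   -- placeholder value; size it to the machine's RAM
    -- subsequent index builds in this session get the larger sort memory, e.g.:
    CREATE INDEX my_big_index ON my_big_table (some_column);   -- hypothetical names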
Re: [PERFORM] Performance increase with elevator=deadline
On Apr 11, 2008, at 7:22 AM, Albe Laurenz wrote: After some time of trial and error we found that changing the I/O scheduling algorithm to "deadline" improved I/O performance by a factor of 4 (!) for this specific load test. I was inspired once again to look into this, as I'm recently hitting some glass ceilings with my machines. I have a little app I wrote called pgiosim (it's on pgfoundry - http://pgfoundry.org/projects/pgiosim) that basically just opens some files and does random seeks and 8kB reads, much like what our beloved PG does. Using 4 of these with a dataset of about 30GB across a few files (the machine has 8GB mem), I went from around 100 io/sec to 330 changing to noop. Quite an improvement. If you have a decent controller, CFQ is not what you want. I tried deadline as well and it was a touch slower. The controller is a 3ware 9550SX with 4 disks in a RAID 10. I'll be trying this out on the big array later today. I found it surprising this info wasn't more widespread (the use of CFQ on a good controller). It also seems changing elevators on the fly works fine (echo schedulername > /sys/block/.../queue/scheduler). I admit I sat there flipping back and forth going "disk go fast.. disk go slow.. disk go fast... " :)
Re: [PERFORM] RAID 10 Benchmark with different I/O schedulers (was: Performance increase with elevator=deadline)
On May 5, 2008, at 7:33 PM, Craig James wrote: I had the opportunity to do more testing on another new server to see whether the kernel's I/O scheduling makes any difference. Conclusion: On a battery-backed RAID 10 system, the kernel's I/O scheduling algorithm has no effect. This makes sense, since a battery-backed cache will supersede any I/O rescheduling that the kernel tries to do. This goes against my real-world experience here. pgbench -i -s 20 -U test pgbench -c 10 -t 5 -v -U test You should use a sample size of 2x RAM to get a more realistic number, or try out my pgiosim tool on pgfoundry, which "sort of" simulates an index scan. I posted numbers from that a month or two ago here.
Re: [PERFORM] syslog performance when logging big statements
On Jul 8, 2008, at 8:24 AM, Achilleas Mantzios wrote: File sizes of about 3M result in actual logging output of ~10MB. In this case, the INSERT *needs* 20 minutes to return. This is because the logging through syslog seems to severely slow the system. If instead I use stderr, even with logging_collector=on, the same statement needs 15 seconds to return. In syslog.conf, is the destination for PG marked with a "-"? (ie -/var/log/pg.log) That tells syslog not to sync after each line logged. It could explain a large chunk of the difference in time.
[PERFORM] 3ware vs Areca
I've got a couple boxes with some 3ware 9550 controllers, and I'm less than pleased with performance on them. Sequential access is nice, but start seeking around and you kick it in the gut. (I've found posts on the internets about others having similar issues.) On my last box with a 3ware I simply had it in JBOD mode and used sw raid, and it smoked the hw. Anyway, anybody have experience with 3ware vs Areca? I've heard plenty of good anecdotal things that Areca is much better; just wondering if anybody here has firsthand experience. It'll be plugged into about 8 10k rpm SATA disks. thanks
Re: [PERFORM] 3ware vs Areca
On Jul 11, 2008, at 3:39 PM, Jeffrey Baker wrote: From my experience, the Areca controllers are difficult to operate. Their firmware is, frankly, garbage. In more than one instance we have had the card panic when a disk fails, which is obviously counter to the entire purpose of a RAID. We finally removed the Areca controllers from our database server and replaced them with HP P800s. My main db has a P600 plugged into an MSA70, which works well - does the HP junk work in non-HP boxes?
Re: [PERFORM] 3ware vs Areca
On Jul 11, 2008, at 3:21 PM, Greg Smith wrote: My last box with a 3ware I simply had it in jbod mode and used sw raid and it smoked the hw. That is often the case no matter which hardware controller you've got, particularly in more complicated RAID setups. You might want to consider that a larger lesson rather than just a single data point. Yeah, it'd be fun to run more benchmarks, but the beefy box, for some reason, is a prod box busy 24/7. No time to nuke it and fidget :) Check out the pages starting at http://www.tomshardware.com/reviews/SERIAL-RAID-CONTROLLERS-AMCC,1738-12.html for example, where the newer Areca 1680ML card just gets crushed at all kinds of workloads by the AMCC 3ware 9690SA. I think the 3ware 9600 series cards have achieved or exceeded what Areca's 1200 series was capable of, while Areca's latest generation has slipped a bit from the previous one. It does look like the 9600 series fixed a lot of the 9550 issues. (And for the record, yes, either card I get will have a BBU. 'Tis silly to get a controller without one.)
Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server
On Aug 11, 2008, at 5:17 AM, Henrik wrote: OK, changed the SAS RAID 10 to RAID 5 and now my random writes are handling 112 MB/sec. So it is almost twice as fast as the RAID 10 with the same disks. Any ideas why? Are the iozone tests faulty? Does IOzone disable the OS caches? If not, you need to use a size of 2x RAM for true results. Regardless, the test only took 10 seconds of wall time, which isn't very long at all. You'd probably want to run it longer anyway. iozone -e -i0 -i1 -i2 -i8 -t1 -s 1000m -r 8k -+u -F /database/iotest
Children see throughput for 1 random writers = 112074.58 KB/sec
Parent sees throughput for 1 random writers  = 111962.80 KB/sec
Min throughput per process                   = 112074.58 KB/sec
Max throughput per process                   = 112074.58 KB/sec
Avg throughput per process                   = 112074.58 KB/sec
Min xfer                                     = 1024000.00 KB
CPU utilization: Wall time 9.137, CPU time 0.510, CPU utilization 5.58%
Re: [PERFORM] understanding postgres issues/bottlenecks
On Jan 11, 2009, at 9:43 PM, M. Edward (Ed) Borasky wrote: Luke Lonergan wrote: Not to mention the #1 cause of server faults in my experience: an OS kernel bug causes a crash. Battery backup doesn't help you much there. Not that long ago (a month or so) we ran into a problem where hpacucli (the utility for configuring/inspecting/etc. HP SmartArray controllers) would tickle the cciss driver in such a way that it would cause a kernel panic. KABLAMMO. (No data loss! Whew!) The box had run for a long time without crashes, but it seems that when we added more disks and the array started building the new logical drive, some magical things happened. Bugs happen. The [bad word] of it is catching the culprit with its fingers in the cookie jar.
Re: [PERFORM] [PERFORMANCE] Buying hardware
On Jan 26, 2009, at 2:42 PM, David Rees wrote: Lots of people have databases much, much, bigger - I'd hate to imagine having to restore from backup from one of those monsters. If you use PITR + rsync you can create a binary snapshot of the db, so restore time is simply how long it takes to untar / whatever it into place. Our backup script basically does:

archive backup directory
pg_start_backup
rsync
pg_stop_backup

voila. I have 2 full copies of the db. You could even expand it a bit and after the rsync & friends have it fire up the instance and run pg_dump against it for a pg_restore compatible dump "just in case". It takes a long time to restore a 300GB db, even if you cheat and parallelify some of it. 8.4 may get a pg_restore that can load in parallel - which will help somewhat. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
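The database side of that dance is just two calls from psql, roughly (the label is arbitrary):

select pg_start_backup('nightly');
-- rsync $PGDATA and the WAL archive off to the backup area while the backup is "open"
select pg_stop_backup();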
Re: [PERFORM] [PERFORMANCE] Buying hardware
On Jan 26, 2009, at 3:00 PM, Joshua D. Drake wrote: On Mon, 2009-01-26 at 14:58 -0500, Jeff wrote: voila. I have 2 full copies of the db. You could even expand it a bit and after the rsync & friends have it fire up the instance and run pg_dump against it for a pg_restore compatible dump "just in case". It takes a long time to restore a 300GB db, even if you cheat and parallelify some of it. 8.4 may get a pg_restore that can load in parallel - which will help somewhat. Somewhat? Just to be clear, if you have the hardware for it, parallel restore can take a 500GB restore in 2.5 hours (versus 15). IMO, that is a *little* more than somewhat. Maybe, a bit? ;) I'd say that qualifies more towards just a "smidge" faster ;) I'm quite excited about the feature. I'm still on 8.2 mostly because of the downtime of the dump & restore. I wrote up some plans a while back on doing the poor-mans parallel restore, but I haven't had the time to actually do it. Theoretically, wouldn't the parallel pg_restore be able to run against an 8.3 instance with a dump from 8.2? I don't see why it wouldn't be able to (unless it uses some handy dandy new 8.4-only catalog). Maybe if I get time (HAHAHA) I'll test that out.. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD performance
I somehow managed to convince the powers that be to let me get a couple X25-E's. I tossed them in my macpro (8 cores), fired up Ubuntu 8.10 and did some testing. Raw numbers are very impressive. I was able to get 3700 random seek+read's a second. In an R1 config it stayed at 3700, but if I added another process it went up to 7000, and eventually settled into the 4000s. If I added in some random writing with fsyncs to it, it settled at 2200 (to be specific, I had 3 instances going - 2 read-only and 1 read-20% write to get that). These numbers were obtained running a slightly modified version of pgiosim (which is on pgfoundry) - it randomly seeks to a "block" in a file and reads 8kB of data, optionally writing the block back out. Now, moving into reality I compiled 8.3.latest and gave it a whirl. Running against a software R1 of the 2 x25-e's I got the following pgbench results: (note config tweaks: work_mem=>4mb, shared_buffers=>1gb, should probably have tweaked checkpoint_segs, as it was emitting lots of notices about that, but I didn't). (multiple runs, avg tps)

Scalefactor 50, 10 clients: 1700tps

At that point I realized write caching on the drives was ON. So I turned it off at this point:

Scalefactor 50, 10 clients: 900tps

At scalefactor 50 the dataset fits well within memory, so I scaled it up.

Scalefactor 1500, 10 clients: 420tps

While some of us have arrays that can smash those numbers, that is crazy impressive for a plain old mirror pair. I also did not do much tweaking of PG itself. While I'm in the testing mood, are there some other tests folks would like me to try out? -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD performance
On Feb 3, 2009, at 1:43 PM, Scott Carey wrote: I don't think write caching on the disks is a risk to data integrity if you are configured correctly. Furthermore, these drives don't use the RAM for write cache, they only use a bit of SRAM on the controller chip for that (and respect fsync), so write caching should be fine. Confirm that NCQ is on (a quick check in dmesg), I have seen degraded performance when the wrong SATA driver is in use on some linux configs, but your results indicate it's probably fine.

As it turns out, there's a bug/problem/something with the controller in the macpro vs the ubuntu drives where the controller goes into "works, but not as super as it could" mode, so NCQ is effectively disabled, haven't seen a workaround yet. Not sure if this problem exists on other distros (used ubuntu because I just wanted to try a live). I read some stuff from Intel on the NCQ and in a lot of cases it won't make that much difference because the thing can respond so fast.

How much RAM is in that machine? 8GB

Some suggested tests if you are looking for more things to try :D -- What effect does the following tuning have: Turn the I/O scheduler to 'noop' (echo noop > /sys/block/<device>/queue/scheduler). I'm assuming the current was cfq, deadline may also be interesting, anticipatory would have comically horrible results.

I only tested noop. If you think about it, it is the most logical one as an SSD really does not need an elevator at all. There is no rotational latency or moving of the arm that the elevator was designed to cope with. but, here are the results:

scale 50, 100 clients, 10x txns: 1600tps (a noticeable improvement!)
scale 1500, 100 clients, 10x txns: 434tps

I'm going to try to get some results for raptors, but there was another post earlier today that got higher (though not ridiculously higher) tps - it required 14 15k disks instead of 2.

Tune upward the readahead value (blockdev --setra <value> /dev/<device>) -- try 16384 (8MB). This probably won't help that much for a pgbench tune, it's more for large sequential scans in other workload types, and more important for rotating media. Generally speaking with SSD's, tuning the above values does less than with hard drives.

Yeah, I don't think RA will help pgbench, and for my workloads it is rather useless as they tend to be tons of random IO. I've got some Raptors here too I'll post numbers wed or thu. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] random_page_cost vs ssd?
I've got a couple x25-e's in production now and they are working like a champ. (In fact, I've got another box being built with all x25s in it. It's going to smoke!) Anyway, I was just reading another thread on here and that made me wonder about random_page_cost in the world of an ssd where a seek is basically free. I haven't tested this yet (I can do that next week), but logically, in this scenario wouldn't lowering random_page_cost be ideal or would it not really matter in the grand scheme of things? -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
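For anyone else wanting to poke at it, a quick per-session experiment would be something like this (the table and query here are made up, not from my schema):

set random_page_cost = 4;   -- the default
explain analyze select * from orders where customer_id = 1234;
set random_page_cost = 1.5; -- closer to "seeks are basically free"
explain analyze select * from orders where customer_id = 1234;
-- compare whether the plan flips to an index scan and what the actual times do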
Re: [PERFORM] current transaction in productive database
On Mar 20, 2009, at 5:26 AM, m...@bortal.de wrote: Hello List, is there a way to find out how many transactions my current productive database is doing? I now know how much I can offer with my new database and hardware, but I would also like to know what I actually _need_ on my current productive system. Is there a way to find this out?

Are you looking to see how many transactions per second or more how many transactions concurrently at a given time? For the former you can use pgspy (it's on pgfoundry) to get an idea of queries per second coming in. For the latter, just select * from pg_stat_activity where current_query <> ''; -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
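If you just want a rough transactions-per-second number without extra tools, the stats collector already counts commits and rollbacks; sample this twice, N seconds apart, and divide the difference by N:

select sum(xact_commit + xact_rollback) as total_xacts from pg_stat_database;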
Re: [PERFORM] I have a fusion IO drive available for testing
On Mar 26, 2009, at 8:47 AM, Dave Cramer wrote: So far using dd I am seeing around 264MB/s on ext3, 335MB/s on ext2 write speed. So the question becomes what is the best filesystem for this drive? Anyone want me to run anything on it ? Dave I'd be more interested in the random io numbers. You can do some tests with pgiosim (avail on pgfoundry) to sort-of simulate an index scan. It just seeks and reads. It can also randomly write and or fsync. I'd be interested in seeing numbers for 1 proc and 10 on the fusionIO. You have to make some file(s) for it to use first (I usually use dd to do that, and make sure it is at least 2xRAM in size) -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I have a fusion IO drive available for testing
On Mar 27, 2009, at 1:30 PM, da...@lang.hm wrote: for the WAL you definitely don't need the journal, for the data I'm not sure. I believe that postgres does appropriate fsync calls so is safe on a non-journaling filesystem. the fusionIO devices are small enough that a fsync on them does not take that long, so it may not be worth the overhead of the journaling.

The win for the journal on the heap is simply so you don't need to spend $longtime fsck'ing if you crash, etc. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best replication solution?
On Apr 7, 2009, at 1:18 PM, Andrew Sullivan wrote: I should have stated that differently. First, you're right that if you don't know where to look or what to look for, you can easily be unaware of nodes being out of sync. What's not a problem with Slony

_$cluster.sl_status on the origin is a very handy tool to see your slaves, how many sync's behind they are and whatnot. Maybe I'm lucky, but I haven't got into a funky state that didn't cause my alarms that watch sl_status to go nuts.

Complexity seems to be the major evil here. Yes. Slony is massively complex. Configuring slony by hand using slonik commands does suck horribly. But the included altperl tools that come with it, along with slon_tools.conf removes a HUGE amount of that suck. To add a table with a pk you edit slon_tools.conf and add something along the lines of:

"someset" => {
    "set_id" => 5,
    "table_id" => 5,
    "pkeyedtables" => [ "tacos", "burritos", "gorditas" ]
}

then you just run [create tables on slave(s)]:

slonik_create_set someset;
slonik_subscribe_set 1 2;

there are other handy scripts in there as well for failing over, adding tables, merging, etc. that hide a lot of the suck. Especially the suck of adding a node and creating the store paths. I'm running slony on a rather write intensive system, works fine, just make sure you've got beefy IO. One sucky thing though is if a slave is down sl_log can grow very large (I've had it get over 30M rows, the slave was only down for hours) and this causes major cpu churn while the queries slon issues sift through tons of data. But, to be fair, that'll hurt any replication system. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
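A lag check against sl_status boils down to roughly this (cluster name made up; alarm when the lag columns get out of hand):

select st_origin, st_received, st_lag_num_events, st_lag_time
from _mycluster.sl_status;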
Re: [PERFORM] Best replication solution?
On Apr 8, 2009, at 4:46 PM, Dimitri Fontaine wrote:

$ londiste.py setup.ini provider add schema.table
$ londiste.py setup.ini subscriber add schema.table

That is nice. One could probably do that for slony too. I may try some tests out with londiste.. I'm always open to new (ideally, better) things.

This could happen in Londiste too, just set pgq_lazy_fetch to a reasonable value and Londiste will use a cursor to fetch the events, lowering the load. Events are just tuples in an INSERT only table, which when not used anymore is TRUNCATEd away. PGQ will use 3 tables to store events and will rotate its choice of where to insert new events, allowing it to use TRUNCATE rather than DELETE. And PostgreSQL is quite efficient to manage this :) http://wiki.postgresql.org/wiki/Londiste_Tutorial#Londiste_is_eating_all_my_CPU_and_lag_is_raising

Well, Slony always uses a cursor to fetch, the problem is it may have to slog through millions of rows to find the new data - I've analyzed the queries and there isn't much it can do - lots of calls to the xxid_ functions to determine whats to be used, whats not to be used. When all slaves have a sync event ack'd it is free to be removed by the cleanup routine which is run every few minutes.

Oh and some people asked what Londiste with failover and DDL would look like. Here's what the API being cooked looks like at the moment:

$ londiste setup.ini execute myddl.script.sql
$ londiste conf/londiste_db3.ini change-provider --provider=rnode1
$ londiste conf/londiste_db1.ini switchover --target=rnode2

ok, so londiste can't do failover yet, or is it just somewhat convoluted at this point? -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] linux deadline i/o elevator tuning
On Apr 10, 2009, at 2:47 AM, Albe Laurenz *EXTERN* wrote: Grzegorz Jaskiewicz wrote: according to kernel folks, anticipatory scheduler is even better for dbs. Oh well, it probably means everyone has to test it on their own at the end of day. In my test case, noop and deadline performed well, deadline being a little better than noop. Both anticipatory and CFQ sucked big time.

This is my experience as well. I posted about playing with the scheduler a while ago on -performance, but I can't seem to find it. If you have a halfway OK raid controller, CFQ is useless. You can fire up something such as pgbench or pgiosim, fire up an iostat and then watch your iops jump high when you flip to noop or deadline and plummet on cfq. Try it. It's neat! -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] limiting performance impact of wal archiving.
On Nov 10, 2009, at 10:53 AM, Laurent Laborde wrote: On Tue, Nov 10, 2009 at 4:48 PM, Kevin Grittner wrote: Laurent Laborde wrote: BTW, if you have any idea to improve IO performance, I'll happily read it. We're 100% IO bound. At the risk of stating the obvious, you want to make sure you have high quality RAID adapters with large battery backed cache configured to write-back. Not sure how "high quality" the 3ware is.

/c0 Driver Version = 2.26.08.004-2.6.18
/c0 Model = 9690SA-8I
/c0 Available Memory = 448MB

I'll note that I've had terrible experience with 3ware controllers and getting a high number of iops using hardware raid mode. If you switch it to jbod and do softraid you'll get a large increase in iops - which is the key metric for a db. I've posted previously about my problems with 3ware. as for the ssd comment - I disagree. I've been running ssd's for a while now (probably closing in on a year by now) with great success. A pair of intel x25-e's can get thousands of iops. That being said the key is I'm running the intel ssds - there are plenty of absolutely miserable ssds floating around (I'm looking at you, jmicron-based disks!)

Have you gone through the normal process of checking your query plans to ensure they are sane? There is always a possibility a new index can vastly reduce IO. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
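The usual drill for that last bit looks something like this (table and column names made up):

explain analyze select * from orders where customer_id = 1234;
-- if that shows a big seq scan, an index may turn it into a handful of reads
create index orders_customer_id_idx on orders (customer_id);
analyze orders;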
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
On Feb 8, 2010, at 11:35 PM, da...@lang.hm wrote: And, yes, the whole I/O scheduling approach in Linux was just completely redesigned for a very recent kernel update. So even what we think we know is already obsolete in some respects. I'd done some testing a while ago on the schedulers and at the time deadline or noop smashed cfq. Now, it is 100% possible since then that they've made vast improvements to cfq and or the VM to get better or similar performance. I recall a vintage of 2.6 where they severely messed up the VM. Glad I didn't upgrade to that one :) Here's the old post: http://archives.postgresql.org/pgsql-performance/2008-04/msg00155.php -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
On Feb 10, 2010, at 1:37 AM, Greg Smith wrote: Jeff wrote: I'd done some testing a while ago on the schedulers and at the time deadline or noop smashed cfq. Now, it is 100% possible since then that they've made vast improvements to cfq and or the VM to get better or similar performance. I recall a vintage of 2.6 where they severely messed up the VM. Glad I didn't upgrade to that one :) Here's the old post: http://archives.postgresql.org/pgsql-performance/2008-04/msg00155.php

pgiosim doesn't really mix writes into there though, does it? The mixed read/write situations are the ones where the scheduler stuff gets messy.

It has the ability to rewrite blocks randomly as well - but I honestly don't remember if I did that during my cfq/deadline test. I'd wager I didn't. Maybe I'll get some time to run some more tests on it in the next couple days

-- Greg Smith, 2ndQuadrant, Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] bgwriter tunables vs pg_stat_bgwriter
Since getting on 8.4 I've been monitoring things fairly closely. I whipped up a quick script to monitor pg_stat_bgwriter and save deltas every minute so I can ensure my bgwriter is beating out the backends for writes (as it is supposed to do). Now, the odd thing I'm running into is this:

bgwriter_delay is 100ms (ie 10 times a second, give or take)
bgwriter_lru_maxpages is 500 (~5000 pages / second)
bgwriter_lru_multiplier is 4

Now, assuming I understand these values right the following is what should typically happen:

while (true)
{
    if buffers_written > bgwriter_lru_maxpages
       or buffers_written > anticipated_pages_needed * bgwriter_lru_multiplier
    {
        sleep(bgwriter_delay ms)
        continue;
    }
    ...
}

so I should not be able to have more than ~5000 bgwriter_clean pages per minute. (this assumes writing takes 0ms, which of course is inaccurate) However, I see this in my stats (they are deltas), and I'm reasonably sure it is not a bug in the code:

(timestamp | buffers_clean | buffers_checkpoint | buffers_backend)
2010-02-17 08:23:51.184018 |     1 |  1686 |    5
2010-02-17 08:22:51.170863 | 15289 | 12676 |  207
2010-02-17 08:21:51.155793 | 38467 |  8993 | 4277
2010-02-17 08:20:51.139199 | 35582 |     0 | 9437
2010-02-17 08:19:51.125025 |     8 |     0 |    3
2010-02-17 08:18:51.84     |  1140 |  1464 |    6
2010-02-17 08:17:51.098422 |     0 |  1682 |  228
2010-02-17 08:16:51.082804 |    50 |     0 |    6
2010-02-17 08:15:51.067886 |   789 |     0 |    1

perhaps some stats buffering occurring or something or some general misunderstanding of some of these tunables? -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
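The sampling query in the script is, give or take, this one (run once a minute, diffed against the previous sample; columns as of 8.4):

select now(), buffers_clean, buffers_checkpoint, buffers_backend,
       maxwritten_clean, buffers_alloc
from pg_stat_bgwriter;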
Re: [PERFORM] bgwriter tunables vs pg_stat_bgwriter
On Feb 17, 2010, at 6:23 PM, Greg Smith wrote: With bgwriter_lru_maxpages=500 and bgwriter_delay=100ms, you can get up to 5000 pages/second which makes for 300,000 pages/minute. So none of your numbers look funny just via their scale. This is why the defaults are so low--the maximum output of the background writer is quite big even before you adjust it upwards.

D'oh! That would be the reason. Sorry folks, nothing to see here :)

There are however two bits of stats buffering involved. Stats updates don't become visible instantly, they're buffered and only get their updates pushed out periodically to where clients can see them to reduce overhead. Also, the checkpoint write update happens in one update at the end--not incrementally as the checkpoint progresses. The idea is that you should be able to tell if a checkpoint happened or not during a period of monitoring time. You look to be having checkpoints as often as once per minute right now, so something isn't right--probably checkpoint_segments is too low for your workload.

checkpoint_segments is currently 32. Maybe I'll bump it up - this db does a LOT of writes.

By the way, your monitoring code should be saving maxwritten_clean and buffers_allocated, too. While you may not be doing something with them yet, the former will shed some light on what you're running into now, and the latter is useful later down the road you're walking.

It is, I just didn't include them in the mail. -- Jeff Trout http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Some vacuum & tuning help
On Tue, 5 Aug 2003, Shridhar Daithankar wrote: > > I would suggest autovacuum daemon which is in CVS contrib works for 7.3.x as > well.. Or schedule a vacuum analyze every 15 minutes or so.. > > Good Call. I'll give that a whirl and let you know. > I think vacuum full is required. > D'oh. Would this be a regular thing? I suppose we could do it weekly. As for the pg_dumping of it. I suppose it would work on this table as it is only a couple million rows and not terribly big data-wise. The other tables in this db are rather big and a load is not fast. (It is about 8GB). thanks > You mean linux? I guess you need a kernel revision for a long time. How about > 2.4.21? > Yeah, linux. We're planning on upgrading when we relocate datacenters at the end of August. This machine has actually been up for 486 days (We're hoping to reach linux's uptime wraparound of 496 days :) and the only reason it went down then was because the power supply failed. (That can be read: pg7.0.2 had over a year of uptime. lets hope 7.3 works as good :) -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[PERFORM] Some vacuum & tuning help
I've been trying to search through the archives, but it hasn't been successful. We recently upgraded from pg7.0.2 to 7.3.4 and things were happy. I'm trying to fine tune things to get it running a bit better and I'm trying to figure out how vacuum output correlates to tuning parameters. Here's the most recent vacuum for the "active" table. It gets a few hundred updates/inserts a minute constantly throughout the day.

INFO: Pages 27781: Changed 0, Empty 0; Tup 2451648: Vac 0, Keep 0, UnUsed 1003361. Total CPU 2.18s/0.61u sec elapsed 2.78 sec.

I see unused is quite high. This morning I bumped max_fsm_pages to 50. If I'm thinking right you want unused and max_fsm to be closish, right? (Yesterday it was down around.. oh.. 600k?) I'm thinking vacuum full's may be in order. Which stinks because I was hoping to do away with the db being essentially down for 10 minutes (includes all the db's on that machine) while it vacuum'd. The upside is: it is performing great. During the vacuum analyze I do get a few multi-second pauses while something occurs. I figured it was a checkpoint, so I bumped checkpoint_timeout to 30 seconds and wal_buffers to 128. (I'm just guessing on wal_buffers). Machine is weenucks 2.2.17 on a dual p3 800, 2gb ram, 18gb drive (mirrored). If you guys need other info (shared_buffers, etc) I'll be happy to furnish them (but the issue isn't query slowness.. just want to get this thing oiled). thanks -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Perfomance Tuning
On Tue, 12 Aug 2003, Christopher Browne wrote: > Are you _certain_ that's still true? Have you a metric that shows > Informix being 10x faster on a modern system? That would be quite > surprising... > We were forced (for budget reasons) to switch from raw disk to cooked files on our informix db. We took a huge hit - about 5-6x slower. Granted part of that was because informix takes number of spindles, etc into account when generating query plans and the fact running UPDATE STATISTICS (think Vacuum analyze) on the version we run locks the table exclusively. And it is unacceptable to have our "main table" unavailable for hours and hours while the update runs. (For the record: it's an 8cpu sun e4500 running sol2.6. The raw disks were on a hitachi fibre array and the cooked files were on a raid5 (scsi). Forget how many spindles in the raid. There were 20 raw disks) Informix, etc. have spent a lot of time and money working on it. They also have the advantage of having many paid fulltime developers who are doing this for a job, not as a weekend hobby (Compared to the what? 2-3 full time PG developers). The other advantage (which I hinted to above) with raw disks is being able to optimize queries to take advantage of it. Informix is multithreaded and it will spawn off multiple "readers" to do say, a seq scan (and merge the results at the end). So if you have a table across say, 3 disks and you need to do a seq scan it will spawn three readers to do the read. Result: nice and fast (Yes, it may not always spawn the three readers, only when it thinks it will be a good thing to do) I think for PG the effort would be much better spent on other features... like replication and whatnot. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Benchmark
On Fri, 15 Aug 2003, Richard Huxton wrote: > If you do find some flexible, scriptable web testing system that can read/fill > out forms etc please post to the list - I've had no luck finding anything I > like. > There was a tool created by altavista R&D called "WebL" that I used on a project. I think if you search around you'll be able to find it. However, I think it has developed "bit rot" from not being touched in so long. My old webl stuff will no longer work. But what is it? It is a web scraping language written in java. Fast it is not. Easy to scrape and interact with pages: YES. It has all sorts of things for locating fields, locating table cells, etc. (I used it for writing a prototype that would scrape a competitor's site and import the data into our application :) but if it doesn't work LWP isn't _that_ bad. You will end up in regex hell though. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] What is the fastest way to get a resultset
On Mon, 25 Aug 2003, Bupp Phillips wrote: > > I have a table that has 103,000 records in it (record size is about 953 > bytes) and when I do a select all (select * from <table>) it takes a > whopping 30 secs for the data to return!! > > SQLServer on the other hand takes 6 secs, but you can also use what is > called a firehose cursor, which will return the data in < 1 sec. > You probably want a cursor. Typically what happens is postgres sends _all_ the data to the client - which can be rather substantial. A cursor allows you to say "get me the first 1000 records. now the next 1000" - it should get you the speed you want. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
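Spelled out, the cursor approach looks roughly like this (table name made up):

begin;
declare bigcur cursor for select * from mytable;
fetch 1000 from bigcur;
-- keep issuing "fetch 1000 from bigcur" until it comes back empty
close bigcur;
commit;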
[PERFORM] Sun vs a P2. Interesting results.
Total runtime: 378.21 msec
(6 rows)

Sun:

Limit (cost=2521.19..2521.24 rows=20 width=67) (actual time=1041.14..1041.20 rows=20 loops=1)
  -> Sort (cost=2520.94..2521.39 rows=178 width=67) (actual time=1040.96..1041.08 rows=121 loops=1)
        Sort Key: dob
        -> Seq Scan on userprofile (cost=0.00..2514.28 rows=178 width=67) (actual time=0.37..1014.50 rows=1783 loops=1)
              Filter: (gender_v AND (gender = 'm'::character varying) AND country_v AND (country = 'br'::character varying) AND (dob_v = true) AND (dob >= '1974-08-26 08:21:52.158181-04'::timestamp with time zone) AND (dob <= '1985-08-26 08:21:52.158181-04'::timestamp with time zone))
Total runtime: 1042.54 msec
(6 rows)

They are loaded with the exact same dataset - 53k rows, ~10MB. Notice the estimates are roughly the same, but the execution time is different. I don't think it is the IO system, since 10MB will be cached by the OS and iostat reports no activity on the disks (when running the query many times over and over and in parallel). it is a simple query.. Could it just be that the sun sucks? (And for the record - same schema, nearly same query (modified for datetime syntax) on informix runs in 3 seconds). -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Sun vs a P2. Interesting results.
On Tue, 26 Aug 2003, Darcy Buskermolen wrote: > Also, after having taken another look at this, you aren't performing the same > query on both datasets, so you can't expect them to generate the same > results, or the same query plans, or even comparable times. Please retry your > tests with identical queries, specify the dates, don't use a function like > now() to retrieve them. > Given what you said in the previous email and this one here's some new information. I redid the query to use a static starting time and I ran 19 beaters in parallel. After I send this mail out I'll try it with 40.

New Query:

select userkey, dob, email, gender, country from userprofile where gender_v and gender='m' and country_v and country = 'br' and dob_v and dob >= '2003-08-26'::timestamptz - '29 years'::interval and dob <= '2003-08-26'::timestamptz - '18 years'::interval order by dob asc limit 20 offset 100

Explain Analyze's: (basically the same)

Sun:
Limit (cost=2390.05..2390.10 rows=20 width=67) (actual time=1098.34..1098.39 rows=20 loops=1)
  -> Sort (cost=2389.80..2390.24 rows=178 width=67) (actual time=1098.16..1098.28 rows=121 loops=1)
        Sort Key: dob
        -> Seq Scan on imuserprofile (cost=0.00..2383.14 rows=178 width=67) (actual time=0.38..1068.94 rows=1783 loops=1)
              Filter: (gender_v AND (gender = 'm'::character varying) AND country_v AND (country = 'br'::character varying) AND dob_v AND (dob >= '1974-08-26 00:00:00-04'::timestamp with time zone) AND (dob <= '1985-08-26 00:00:00-04'::timestamp with time zone))
Total runtime: 1099.93 msec
(6 rows)

p2:
Limit (cost=2353.38..2353.43 rows=20 width=67) (actual time=371.75..371.83 rows=20 loops=1)
  -> Sort (cost=2353.13..2353.60 rows=186 width=67) (actual time=371.46..371.63 rows=121 loops=1)
        Sort Key: dob
        -> Seq Scan on imuserprofile (cost=0.00..2346.14 rows=186 width=67) (actual time=0.17..345.53 rows=1783 loops=1)
              Filter: (gender_v AND (gender = 'm'::character varying) AND country_v AND (country = 'br'::character varying) AND dob_v AND (dob >= '1974-08-26 00:00:00-04'::timestamp with time zone) AND (dob <= '1985-08-26 00:00:00-04'::timestamp with time zone))
Total runtime: 372.63 msec
(6 rows)

I ran this query 100 times per beater (no prepared queries) and ran 19 beaters in parallel.

P2 Machine: 345sec avg
Sun: 565sec avg

I know solaris/sun isn't the preferred pg platform, and we have plenty of capacity even with these numbers, I just find it a little surprising the speed difference. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Sun vs a P2. Interesting results.
On Tue, 26 Aug 2003, Darcy Buskermolen wrote: > I'm still seeing differences in the planner estimates, have you run a VACUUM > ANALYZE prior to running these tests? > I did. I shall retry that.. but the numbers (the cost estimates) are pretty close on both. The actual times are very different. > Also, are the disk subsystems in these 2 systems the same? You may be seeing > some discrepancies in things spindle speed, U160 vs U320, throughput on > specific RAID controllers, different blocksize, etc. > As I said in my first email IO isn't the problem here - the data set is small enough that it is all cached (~10MB). iostat reports 0 activity on the disks on both the sun and p2. and I just ran the test again with 40 clients: 730s for the p2, 1100 for the sun. (0% idle on both of them, no IO). I think the next thing I may try is recompiling with a newer gcc. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Sun vs a P2. Interesting results.
Well, installing gcc 3.3.1 and using -mcpu=v9 didn't help. in fact it made things worse. Unless someone has something clever I'm just gonna stop tinkering with it - my goal was met (it is several orders of magnitude faster than informix ) and the hardware is being replaced in a month or two. thanks for the ideas / comments. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Sun vs a P2. Interesting results.
On Tue, 26 Aug 2003, Neil Conway wrote: > > Would it be possible to get a profile (e.g. gprof output) for a postgres > backend executing the query on the Sun machine? > Heh. Never thought of doing a profile! I attached the entire gprof output, but here's the top few functions. I did the test, 1 beater, 100 searches: 148 seconds total.

30.9  45.55  45.55                         nocachegetattr [16]
16.0  69.20  23.65                         internal_mcount [22]
 6.9  79.37  10.17   5245902  0.00  0.00   heapgettup [21]
 6.0  88.28   8.91   3663201  0.00  0.00   ExecMakeFunctionResult [23]
 5.4  96.27   7.99  11431400  0.00  0.00   ExecEvalVar [25]
 3.0 100.73   4.46  18758201  0.00  0.00   ExecEvalExpr [24]
 3.0 105.17   4.44   5246005  0.00  0.00   AllocSetReset [29]
 2.5 108.89   3.72   5245700  0.00  0.00   HeapTupleSatisfiesSnapshot [30]
 2.0 111.78   2.89   5650632  0.00  0.00   LWLockRelease [32]
 1.6 114.10   2.32   5650632  0.00  0.00   LWLockAcquire [34]
 1.6 116.40   2.30   5245800  0.00  0.01   SeqNext [17]
 1.4 118.54   2.14   5438301  0.00  0.00   ExecStoreTuple [27]
 1.4 120.62   2.08   5245700  0.00  0.01   ExecQual [18]
 1.3 122.50   1.88   5379202  0.00  0.00   ReleaseAndReadBuffer [35]
 1.1 124.16   1.66    178400  0.01  0.40   ExecScan [15]
 1.1 125.80   1.64                         _mcount (6247)
 1.1 127.41   1.61   5245902  0.00  0.01   heap_getnext [20]

.. as it turns out the profile gzipped is still huge (100kb) so I put it on my web server - snag it at http://www.jefftrout.com/~threshar/postgres/postgres-7.3.4-sol8-gprof.txt.gz I'll do a profile for the p2 and post that in an hour or two -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] The results of my PostgreSQL/filesystem performance
On Tue, 26 Aug 2003, Bill Moran wrote: > > Intelligent feedback is welcome. > That's some good work there, Lou. You'll make sgt for that someday. But I think the next step, before trying out other filesystems and options would be concurrency. Run a bunch of these beasts together and see what happens (I don't think too many of us have a single session running). Perhaps even make them "interfere" with each other to create as much "pain" as possible? on a side note - I might be blind here - I didn't see what version of pg you were using or any postgresql.conf tweaks - or did you just use whatever came with each distro? -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] bad estimates
On Fri, 29 Aug 2003, Ken Geis wrote: > Some good news here. Doing the same as above on 7.4beta2 took 29 > minutes. Now, the 7.3.3 was on reiser and 7.4 on ext2, so take that as > you will. 7.4's index selectivity estimate seems much better; 7.3.3's > anticipated rows was ten times the actual; 7.4's is one half of the actual. > Min() & Max() unfortunately suck on PG. It will be that way for a while - perhaps at some point someone will make a "special" case and convince -HACKERS it is a Good Thing(tm) (Like select count(*) from table being 'cached' - a lot of people probably get bad first impressions because of that) Would it be possible to rewrite your queries replacing min/max with a select stock_id from bigtable where blah = blorch order by stock_id (desc|asc) limit 1? because that would enable PG to use an index and magically "go fast". You may need a subselect.. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
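Spelled out, the max() case from above becomes something like:

-- instead of: select max(stock_id) from bigtable where blah = 'blorch';
select stock_id from bigtable where blah = 'blorch' order by stock_id desc limit 1;
-- with a suitable index this can stop after reading a handful of rows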
Re: [PERFORM] Selecting random rows efficiently
On Sat, 30 Aug 2003, Richard Jones wrote: > Hi, > i have a table of around 3 million rows from which i regularly (twice a second > at the moment) need to select a random row from > > currently i'm doing "order by rand() limit 1" - but i suspect this is > responsible for the large load on my db server - i guess that PG is doing far > too much work just to pick one row. > If you have an int id (aka serial) column then it is simple - just pick a random number between 1 and currval('id_seq')... or offset rand() limit 1 perhaps? since you want random there is no need to bother with an order and that'll save a sort. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
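A sketch of the serial-column trick (table/column names made up; it assumes the id sequence has no huge gaps):

select * from items
where id >= (select (random() * max(id))::integer from items)
order by id
limit 1;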
Re: [PERFORM] Upgrade Woes
On Thu, 11 Sep 2003, [EMAIL PROTECTED] wrote: > > The Vacuum full is performed once at the end of the whole job. > have you also tried vacuum analyze periodically - it does not lock the table and can help quite a bit? still odd why it would be that much slower between those versions. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] software vs hw raid on linux
Due to various third party issues, and the fact PG rules, we're planning on migrating our deplorable informix db to PG. It is a rather large DB with a rather high amount of activity (mostly updates). So I'm going to be acquiring a dual (or quad if they'll give me money) box. (In my testing my glorious P2 with a 2 spindle raid0 is able to handle it fairly well) What I'm wondering about is what folks experience with software raid vs hardware raid on linux is. A friend of mine ran a set of benchmarks at work and found sw raid was running obscenely faster than the mylex and (some other brand that isn't 3ware) raids.. On the pro-hw side you have ones with battery backed cache, chances are they are less likely to fail.. On the pro-sw side you have lots of speed and less cost (unfortunately, there is a pathetic budget so spending $15k on a raid card is out of the question really). any thoughts? -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Inconsistent performance
On Tue, 16 Sep 2003, Joseph Bove wrote: > I still think that 3 seconds is not acceptable. However, I reserve the > right to be wrong. Does it sound unrealistic to expect PostgreSQL to be > able to read 90,000 rows with 300 bytes per row in under a second? > first, check to see what your max throughput on your disk is using a benchmark such as Bonnie (making sure to use a size LARGER than physical memory; 2x physical is pretty much optimal). next, run your query again with a vmstat 1 running in another term. See how close the vmstat "bi" numbers correspond to your max according to bonnie. You could have an IO bottleneck. (I once went running around trying to figure it out and then discovered the issue was IO). -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] LIKE query running slow
On Tue, 23 Sep 2003, Garrett Bladow wrote: > Recently we upgraded the RAM in our server. After the install a LIKE query that used > to take 5 seconds now takes 5 minutes. We have tried the usual suspects, VACUUM, > ANALYZE and Re-indexing. > > Any thoughts on what might have happened? > Did you reload the db? If you did perhaps you didn't use the "C" locale? That can cause a huge slowdown. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Tuning/performance issue...
On Tue, 30 Sep 2003, David Griffiths wrote: > > This is all part of a "migrate away from Oracle" project. We are looking at > 3 databases - > MySQL (InnoDB), Postgres and Matisse (object oriented). We have alot of > queries like this > or worse, and I'm worried that many of them would need to be re-written. The > developers > know SQL, but nothing about tuning, etc. > There's a movement at my company to ditch several commercial db's in favor of a free one. I'm currently the big pg fan around here and I've actually written a rather lengthy presentation about pg features, why, tuning, etc. but another part was some comparisons to other db's.. I decided, so I wouldn't be blindly flaming mysql, to give it a whirl and loaded it up with the same dataset as pg. First thing I hit was lack of stored procedures. But I decided to code around that, giving mysql the benefit of the doubt. What I found was interesting. For 1-2 concurrent 'beaters' it screamed. ultra-fast. But.. If you increase the concurrent beaters up to say, 20 Mysql comes to a grinding halt.. Mysql and the machine itself become fairly unresponsive. And if you do cache unfriendly queries it becomes even worse. On PG - no problems at all. Scaled fine and dandy up. And with 40 concurrent beaters the machine was still responsive. (The numbers for 20 clients were 220 seconds (pg) and 650 seconds (mysql)) So that is another test to try out - Given your configuration I expect you have lots of concurrent activity. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Tuning/performance issue...
On Wed, 1 Oct 2003, Oleg Lebedev wrote: > Jeff, > I would really appreciate if you could send me that lengthy presentation > that you've written on pg/other dbs comparison. > Thanks. > After I give the presentation at work and collect comments from my coworkers (and remove some information you folks don't need to know :) I will be very willing to post it for people to see. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] count(*) slow on large tables
On Thu, 2 Oct 2003, Christopher Browne wrote: > I can't imagine why the raw number of tuples in a relation would be > expected to necessarily be terribly useful. > We use stuff like that for reporting queries. example: On our message boards each post is a row. The powers that be like to know how many posts there are total (in addition to 'today') - select count(*) from posts is how it has been done on our informix db. With our port to PG I instead select reltuples from pg_class. I know when I login to a new db (or unknown to me db) the first thing I do is look at tables and see what sort of data there is.. but in code I'd rarely do that. I know some monitoring things around here also do a select count(*) on sometable to ensure it is growing, but like you said, this is easily done with the number of pages as well. yes. Informix caches this data. I believe Oracle does too. Mysql with InnoDB does the same thing PG does. (MyISAM caches it) -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
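i.e. something like:

select reltuples from pg_class where relname = 'posts';
-- only as fresh as the last vacuum/analyze, but plenty close enough for a report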
Re: [PERFORM] Tuning/performance issue...
On Fri, 3 Oct 2003, Bruce Momjian wrote: > > I have updated the FAQ to be: > > In comparison to MySQL or leaner database systems, we are > faster for multiple users, complex queries, and a read/write query > load. MySQL is faster for SELECT queries done by a few users. > > Is this accurate? It seems so. > > Another thing I noticed - If you use a dataset that can live in mysql's query cache / os cache it screams, until it has to hit the disk. then GRINDING HALT. It would be nice if someone (I don't have the time now) did a comparison of say: select value where name = XXX; [where xxx varies] with 1,10,20,50 connections then make progressively more complex queries. And occasionally point out mysql's silly omissions: select * from myview where id = 1234 [Oh wait! mysql doesn't have views. Ooopsy!] Wrapping up - PG is not that slow for simple queries either. It can be rather zippy - and PREPARE can give HUGE gains - even for simple statements. I've often wondered if YACC, etc is a bottleneck (You can only go as fast as your parser can go). Hurray for PG! And I'm giving my PG presentation monday. I hope to post it tuesday after I update it with comments I receive and remove confidential information. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
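The PREPARE bit, for anyone who hasn't tried it, is just this sort of thing (names made up):

prepare get_val (varchar) as
    select val from lookup where name = $1;
execute get_val('whatever');
-- skips parse/plan on every call, which is a big chunk of the cost of a trivial query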
Re: [PERFORM] reindex/vacuum locking/performance?
On Sun, 5 Oct 2003, Neil Conway wrote: > > > I don't know any portable way to do that :-( > > For the non-portable way of doing this, are you referring to O_DIRECT? > > Even if it isn't available everywhere, it might be worth considering > this at least for the platforms on which it is supported. > I strongly agree here, only if we can prove there is a benefit. I think it would be silly of us to pass it up if some OS supported SnazzyFeatureC that was able to speed up PG by a large percentage (hopefully, in a rather non-invasive way in the code). But, I do see the problem here with bloat and PG being radically different platform to platform. I suppose we could dictate that at least N os's had to have it.. or perhaps supply it as contrib/ patches Something to think about. I'd be interested in tinkering with this, but I'm more interested at the moment in why (with proof, not anecdotes) Solaris is so much slower than Linux and what we can do about this. We're looking to move a rather large Informix db to PG and ops has reservations about ditching Sun hardware. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] locking/performance, Solaris performance discovery
On Mon, 6 Oct 2003, Andrew Sullivan wrote: > There's plenty of academic work which purports to show that LRU is > far from the best choice. Just in principle, it seems obvious that a > single-case seqscan-type operation (such as vacuum does) is a good > way to lose your cache for no real gain. > Logically bypassing caches for a seq scan also makes sense. > Interestingly, we're contemplating ditching Solaris because of the > terrible reliability we're getting from the hardware. > The reason ops likes solaris / sun is twofold. 1. we have a pile of big sun machines around. 2. Solaris / Sun is quite a bit more graceful in the event of a hardware failure. We've burned out our fair share of cpu's etc and solaris has been rather graceful about it. I've started profiling and running tests... currently it is leaning towards the sysv semaphores. I see in src/backend/port/ that pg_sema.c is linked to the sysv implementation. So what I did was create a semaphore set, and then fired off 5 copies of a program that attaches to that semaphore and then locks/unlocks it 1M times.

2xP2-450, Linux 2.4.18: 1 process: 221680 / sec, 5 processes: 98039 / sec
4xUltraSparc II-400Mhz, Solaris 2.6: 1 process: 142857 / sec, 5 processes: 23809 / sec

So I'm guessing that is where a LOT of the suck is coming from. What I plan to do next is looking to see if there are other interprocess locking mechanisms on solaris (perhaps pthread_mutex with that inter-process flag or something) to see if I can get those numbers a little closer. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] locking/performance, Solaris performance discovery
On Mon, 6 Oct 2003, Tom Lane wrote: > > Does Solaris have Posix semaphores? You could try using those instead. > > regards, tom lane Yep. It does. I learned them quick enough (using posix_sema.c as a guide) and found out that at least on Sol 2.6 they are slower than sysv - with 5 processes it went to about 16k lock/unlock a second. I'm going to try to find a box around here I can get sol(8|9) on that has sufficient disk space and see. I'm guessing sun has likely made improvements... Another odd thing I'm trying to work out is why my profiles come out so radically different on the linux box and the sun box.

Sun:
31.17  18.90  18.90                        internal_mcount
19.10  30.48  11.58   8075381  0.00  0.00  _bt_checkkeys
 5.66  33.91   3.43  24375253  0.00  0.00  FunctionCall2
 4.82  36.83   2.92   8073010  0.00  0.00  _bt_step
 3.51  38.96   2.13     14198  0.15  0.15  _read
 2.77  40.64   1.68   8069040  0.00  0.00  varchareq
 2.59  42.21   1.57     28454  0.06  0.23  _bt_next
 2.29  43.60   1.39      1003  1.39  1.40  AtEOXact_Buffers
 1.86  44.73   1.13  16281197  0.00  0.00  pg_detoast_datum
 1.81  45.83   1.10                        _mcount
 1.68  46.85   1.02      2181  0.47  0.47  pglz_decompress

Linux:
11.14  0.62  0.62     1879  0.00  0.00  pglz_decompress
 6.71  0.99  0.37     1004  0.00  0.00  AtEOXact_Buffers
 3.80  1.20  0.21  1103045  0.00  0.00  AllocSetAlloc
 3.23  1.38  0.18   174871  0.00  0.00  nocachegetattr
 2.92  1.54  0.16  1634957  0.00  0.00  AllocSetFreeIndex
 2.50  1.68  0.14    20303  0.00  0.00  heapgettup
 1.93  1.79  0.11     1003  0.00  0.00  AtEOXact_CatCache
 1.76  1.89  0.10   128442  0.00  0.00  hash_any
 1.72  1.98  0.10    90312  0.00  0.00  FunctionCall3
 1.69  2.08  0.09    50632  0.00  0.00  ExecTargetList
 1.60  2.17  0.09    51647  0.00  0.00  heap_formtuple
 1.55  2.25  0.09   406162  0.00  0.00  newNode
 1.46  2.33  0.08   133044  0.00  0.00  hash_search

It is the same query with slightly different data (The Sun has probably.. 20-40k more rows in the table the query hits). I'll be digging up more info later today. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] SOlaris updates
Ran the test on another linux box - the one that generated the dump the sun loaded (which should have similar data...) and I got a profile similar to the Sun's. Which makes me feel more comfortable. Still interesting why that other box gave me the different profile. Now off to the fun and exciting world of seeing what I can do about it. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Postgres low end processing.
On Tue, 7 Oct 2003, Stef wrote: > The initial instance took up 8372K and this fluctuated > between +- 8372K and 10372K, plus +- 3500K for > every connection. > Does that include/exclude the size of say, shared code & libraries? I know linux does copy-on-write forking.. so it may be less in reality... -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Sun performance - Major discovery!
Well, as you guys know I've been tinkering with sun-vs-linux postgres for a while trying to come up with reasons for the HUGE performance differences. We've all had our anecdotal thoughts (fork sucks, ipc sucks, ufs sucks, etc) and I've had a breakthrough. Knowing that GCC only produces good code on x86 (and powerpc with apple's mods, but it is doubtful that is as good as ibm's power compiler) I decided to try out Sunsoft CC. I'd heard from more than one person/place that gcc makes abysmal sparc code. Given that the performance profiles for both the linux and sun boxes showed the same functions taking up most of the time I thought I'd see what a difference sunsoft could give me.

So - hardware - Sun E450 4x400mhz ultrasparc IIi, 4GB ram, scsi something disk. (not raid) solaris 2.6 Linux - 2xP3 500mhz, 2GB, scsi disk of some flavor (not raid) linux 2.2.17 (old I know!)

So here's the results using my load tester (single connection per beater, repeats the query 1000 times with different input each time (we'll get ~20k rows back), the query is a common query around here. I discounted the first run of the test as caches populated.

Linux - 1x - 35 seconds, 20x - 180 seconds
Sun - gcc - 1x 60 seconds 20x 245 seconds
Sun - sunsoft defaults - 1x 52 seconds 20x [similar to gcc most likely]
Sun - sunsoft -fast - 1x 28 seconds 20x 164 seconds

As you math gurus can probably deduce - that is a rather large improvement. And by rather large I mean hugely significant. With results like this, I think it warrants mentioning in the FAQ_Solaris, and probably the performance guide. Connecting will always be a bit slower. But I think most people realize that connecting to a db is not cheap. I think update/etc will cause more locking, but I think IO will become the bottle neck much sooner than lock/unlock will. (This is mostly anecdotal given how fast solaris can lock/unlock a semaphore and how much IO I know I have) Oh yes, this was with 7.3.4 and sunsoft cc Sun WorkShop 6 update 1 C 5.2 2000/09/11 (which is old, perhaps newer ones make even better code?) I'm not sure of PG's policy on non-gcc things in configure, but perhaps if we detect sunsoft we toss in the -fast flag and maybe make it the preferred one on sun? [btw, it compiled with no changes but it did spew out tons of warnings] comments? -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Sun performance - Major discovery!
On Wed, 8 Oct 2003, Andrew Sullivan wrote: > My worry about this test is that it gives us precious little > knowledge about concurrent connection slowness, which is where I find > the most significant problems. When we tried a Sunsoft cc vs gcc 2.95 > on Sol 7 about 1 1/2 years ago, we found more or less no difference > once we added more than 5 connections (and we always have more than 5 > connections). It might be worth trying again, though, since we moved > to Sol 8. > The 20x numbers are the results when I fired up 20 beaters concurrently. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Sun performance - Major discovery!
On Wed, 8 Oct 2003, Neil Conway wrote: > What is the query? > It retrieves an index listing for our boards. The boards are flat (not threaded) and messages are numbered starting at 1 for each board. If you pass in 0 for the start_from it assumes the latest 60. And it should be noted - in some cases some boards have nearly 2M posts. Index on board_name, number (a usage sketch follows after this message). I cannot give out too too much stuff ;)

create or replace function get_index2(integer, varchar, varchar)
returns setof snippet as '
DECLARE
    p_start  alias for $1;
    p_board  alias for $2;
    v_start  integer;
    v_num    integer;
    v_body   text;
    v_sender varchar(35);
    v_time   timestamptz;
    v_finish integer;
    v_row    record;
    v_ret    snippet;
BEGIN
    v_start := p_start;
    if v_start = 0 then
        select * into v_start from get_high_msg(p_board);
        v_start := v_start - 59;
    end if;
    v_finish := v_start + 60;

    for v_row in
        select number, substr(body, 0, 50) as snip, member_handle, timestamp
        from posts
        where board_name = p_board
          and number >= v_start
          and number < v_finish
        order by number desc
    LOOP
        return next v_row;
    END LOOP;
    return;
END;
' language 'plpgsql';

> Interesting (and surprising that the performance differential is that > large, to me at least). Can you tell if the performance gain comes from > an improvement in a particular subsystem? (i.e. could you get a profile > of Sun/gcc and compare it with Sun/sunsoft). > I'll get these later today. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
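For anyone wanting to poke at this themselves, here is a minimal sketch of the supporting index and a couple of calls. The index name and the board name 'general' are made up for illustration; get_index2, get_high_msg, and the snippet type are the objects from the post above.

-- Composite index backing the board_name/number range scan described above
-- (the actual index name is not given in the post; this one is hypothetical).
create index posts_board_name_number_idx on posts (board_name, number);

-- Pass 0 as the first argument to get the latest 60 posts for a board
-- ('general' is a made-up board name; the unused third parameter is left empty).
select * from get_index2(0, 'general', '');

-- Or fetch the 60-message window starting at message 100000.
select * from get_index2(100000, 'general', '');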
[PERFORM] Presentation
The boss cleared my de-company-info-ified pg presentation. It deals with PG features, a crude comparison to other dbs, install, admin, and most importantly - optimization & quirks. It's available in PowerPoint and (ugg) PowerPoint-exported HTML. Let me know if there are blatant errors, etc. in there. Maybe even slightly more subtle blatant errors :) The people here thought it was good. http://postgres.jefftrout.com/ -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Sun performance - Major discovery!
On Wed, 8 Oct 2003, Neil Conway wrote: > Interesting (and surprising that the performance differential is that > large, to me at least). Can you tell if the performance gain comes from > an improvement in a particular subsystem? (i.e. could you get a profile > of Sun/gcc and compare it with Sun/sunsoft). > Yeah - as I expected, it was able to generate much better code for _bt_checkkeys, which was the #1 function for gcc on both Sun & Linux; as you can see from the profiles below, suncc was just able to generate much nicer code. I'd look at the assembler output, but that won't be useful since I am very unfamiliar with the [ultra]sparc instruction set. Here's the gprof and prof output for the latest run:

GCC:

  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
31.52     19.44     19.44                              internal_mcount
20.28     31.95     12.51   8199466     0.00     0.00  _bt_checkkeys
 5.61     35.41      3.46   8197422     0.00     0.00  _bt_step
 5.01     38.50      3.09  24738620     0.00     0.00  FunctionCall2
 3.00     40.35      1.85   8194186     0.00     0.00  varchareq
 2.61     41.96      1.61     24309     0.07     0.28  _bt_next
 2.42     43.45      1.49      1003     1.49     1.51  AtEOXact_Buffers
 2.37     44.91      1.46     12642     0.12     0.12  _read
 2.33     46.35      1.44  16517771     0.00     0.00  pg_detoast_datum
 2.08     47.63      1.28   8193186     0.00     0.00  int4lt
 1.35     48.46      0.83   8237204     0.00     0.00  BufferGetBlockNumber
 1.35     49.29      0.83   8193888     0.00     0.00  int4ge
 1.35     50.12      0.83                              _mcount

SunCC -pg -fast:

%Time  Seconds  Cumsecs     #Calls  msec/call  Name
 23.2     4.27     4.27  108922056     0.       _mcount
 20.7     3.82     8.09    8304052     0.0005   _bt_checkkeys
 13.7     2.53    10.62   25054788     0.0001   FunctionCall2
  5.1     0.94    11.56      24002     0.0392   _bt_next
  4.4     0.81    12.37    8301867     0.0001   _bt_step
  3.4     0.63    13.00    8298219     0.0001   varchareq
  2.7     0.50    13.50   16726855     0.       pg_detoast_datum
  2.4     0.45    13.95    8342464     0.0001   BufferGetBlockNumber
  2.4     0.44    14.39    8297941     0.0001   int4ge
  2.2     0.41    14.80       1003     0.409    AtEOXact_Buffers
  2.0     0.37    15.17    4220349     0.0001   lc_collate_is_c
  2.0     0.37    15.54    8297219     0.       int4lt
  1.6     0.29    15.83      26537     0.0109   AllocSetContextCreate
  0.9     0.16    15.99       1887     0.085    pglz_decompress
  0.7     0.13    16.12     159966     0.0008   nocachegetattr
  0.7     0.13    16.25    4220349     0.       varstr_cmp
  0.6     0.11    16.36     937576     0.0001   MemoryContextAlloc
  0.5     0.09    16.45     150453     0.0006   hash_search

> -Neil -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Presentation
On Wed, 8 Oct 2003, Shridhar Daithankar wrote: Thanks for the nitpicks :) I've taken some into consideration. I also signed onto the advocacy list so I can be in on discussions there. Feel free to convert to whatever format you'd like. I originally started working on it in OpenOffice, but I got mad at it. So I switched to powerpoint and got mad at that too :) -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Presentation
On Wed, 8 Oct 2003, Shridhar Daithankar wrote: > * Same slide. IIRC postgresql always compresses bytea/varchar. Not too much sure > about which but there is something that is compressed by default..:-) > * Tablespaces has a patch floating somewhere. IIRC Gavin Sherry is the one who > is most ahead of it. For all goodness, they will feature in 7.5 and design is For the sake of things, I didn't include any features a patch provides. I did include things that may appear in contrib/. > * Mysql transaction breaks down if tables from different table types are involved. > * Mysql transactions do not feature constant time commit/rollback like > postgresql. The time to rollback depends upon size of transaction > * Mysql does not split large files in segments the way postgresql do. Try > storing 60GB of data in single mysql table. I didn't add these ones - the user can figure them out. Perhaps when we/me expands this into multiple documents we can expand on this. > * Slide on caching. Postgresql can use 7000MB of caching. Important part is it > does not lock that memory in it's own process space. OS can move around buffer > cache but not memory space of an application. I'm guilty of this myself - when I first started pg I was looking for a way to make it use a zillion megs of memory like we have Informix do. Perhaps I'll reword that segment.. the point was to show PG relies on the OS to do a lot of caching and that it doesn't do it itself. > * Using trigger for maintening a row count would generate as much dead rows as > you wanted to avoid in first place..:-) We all know this.. but it is a way to get a fast select count(*) from a table (a rough sketch follows after this message). > All of them are really minor. It's a very well done presentation but 45 slides > could be bit too much at a time. I suggest splitting the presentation in 3. > Intro and comparison, features, administration, programming and tuning. Wow.. > they are 5..:-) Yeah. What I'd really love to do is de-powerpointify it and make it a nice set of "real" web pages. > Can you rip out informix migration? It could be a good guide by itself. > I agree, it would be good to rip out. I think we have the oracle guide somewhere.. I've put the updated version up on the postgres.jefftrout.com site along with an OpenOffice version. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
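Since the trigger-maintained row count came up, here is a minimal sketch of the idea; all table, function, and trigger names are made up for illustration. Note that the single counter row becomes a hot spot and accumulates dead tuples on every insert/delete, which is exactly the drawback pointed out above.

-- One-row table holding the current count (hypothetical names throughout).
create table posts_rowcount (n bigint not null);
insert into posts_rowcount values (0);

create or replace function posts_rowcount_trig() returns trigger as '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE posts_rowcount SET n = n + 1;
    ELSIF TG_OP = ''DELETE'' THEN
        UPDATE posts_rowcount SET n = n - 1;
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
' language 'plpgsql';

create trigger posts_rowcount_maint
    after insert or delete on posts
    for each row execute procedure posts_rowcount_trig();

-- Fast substitute for select count(*) from posts:
select n from posts_rowcount;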
Re: [PERFORM] Sun performance - Major discovery!
On Wed, 8 Oct 2003, Neil Conway wrote: > > What CFLAGS does configure pick for gcc? From > src/backend/template/solaris, I'd guess it's not enabling any > optimization. Is that the case? If so, some gcc numbers with -O and -O2 > would be useful. > I can't believe I didn't think of this before! heh. Turns out gcc was getting nothing for flags. I added -O2 to CFLAGS and my 60 seconds went down to 21. A rather mild improvement, huh? I did a few more tests and suncc still beats it out - but not by too much now (not enough to justify buying a license just for compiling pg). I'll go run the regression test suite with my gcc -O2 pg and the suncc pg and see if they pass. If they do, we should consider adding -O2 and -fast to the CFLAGS. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 8: explain analyze is your friend