Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> Personally, I've been unimpressed by Dell/Xeon; I think the Sun might do
> better than you think, comparatively. On all the Dell servers I've used so
> far, I've not seen performance that comes even close to the hardware specs.

It's true that any difference will be far less than the GHz ratio, and I can't really speak for Dell servers in general, but a pair of 2.4GHz Xeons in a Dell workstation gets about 23 SPECint_rate2000, and a pair of 1GHz UltraSparc IIIs in a SunFire V210 gets 10. The ratios are the same for other non-FP benchmarks.

Now the Suns do have some architectural advantages, and they used to have far superior memory bandwidth to Intel boxes; they often still do for more than 2 CPUs, and definitely do for more than four. But my personal experience is that for 4 CPUs or less the entry-level UNIX offerings from Sun/IBM/HP fell behind in raw performance (FP excepted) two or three years ago. The posh hardware's an entirely different matter, of course.

On the other hand, I can think of innumerable non-performance-related reasons to buy a 'real UNIX box' as a low-end DB server. CPU performance is way down the priority list compared with IO throughput, stability, manageability, support, etc etc.

Given that the original question was about a very heavily write-oriented environment, I'd take the Sun every day of the week, assuming that those compile option changes have sorted out the oddly slow PG performance at last.

M

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> Indeed, if our Suns actually disabled the broken hardware when they
> died, fell over, and rebooted themselves, I'd certainly praise them
> to heaven. But I have to say that the really very good reporting of
> failing memory has saved me some headaches.

Ha! Yes, it would seem the obvious thing to do - but as you say, at least you get told what borked and may even be able to remove it without stopping the machine. Sometimes. Or at least you get a nice lunch from your Sun reseller.

> I should say, also, that my initial experience of AIX has been
> extremely good. I can't comment on the fun it might involve in the
> long haul, of course.

The current crop of POWER4+ boxen is reputed to even be able to recover from a failed CPU without a restart. Not *always*, one imagines, but usefully often enough for the banking mob to get sweaty over the feature. More importantly though, IBM seems committed to supporting all this goodness under Linux too (though not BSD I fear - sorry Bruce).

Now if these vendors could somehow eliminate downtime due to human error we'd be talking *serious* reliability.

M

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] RAID or manual split?
> It seems, that if I know the type and frequency of the queries a > database will be seeing, I could split the database by hand over > multiple disks and get better performance that I would with a RAID array > with similar hardware. Unlikely, but possible if you had radically different hardware for different tables. > Six large (3-7 Mrow) 'summary' tables, each being updated continuously > by 5-20 processes with about 0.5 transactions/second/process. Well you should get close to an order of magnitude better performance from a RAID controller with write-back cache on those queries. > Periodically (currently every two weeks), join queries are > performed between one of the 'summary' tables(same one each time) and > each of the other five. Each join touches most rows of both tables, > indexes aren't used. Results are written into a separate group of > 'inventory' tables (about 500 Krow each), one for each join. The more disks the data is spread over the better (the RAID controller will help here with striping). > There are frequent (100-1000/day) queries of both the > inventory and summary tables using the primary key -- always using the > index and returning < 10 rows. RAM is what you need, to cache the data and indexes, and then as much CPU power as you can get. > We're currently getting (barely) acceptable performance from a single > 15k U160 SCSI disk, but db size and activity are growing quickly. > I've got more disks and a battery-backed LSI card on order. 3 or more disks in a stripe set, with write back caching, will almost certainly give a huge performance boost. Try that first, and only if you have issues should you think about futzing with symlinks etc. M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance Bottleneck
> Squid also takes away the work of doing SSL (presuming you're running it > on a different machine). Unfortunately it doesn't support HTTP/1.1 which > means that most generated pages (those that don't set Content-length) end > up forcing squid to close and then reopen the connection to the web > server. It is true that it doesn't support http/1.1, but 'most generated pages'? Unless they are actually emitted progressively they should have a perfectly good content-length header. > I've also had some problems when Squid had a large number of connections > open (several thousand); though that may have been because of my > half_closed_clients setting. Squid 3 coped a lot better when I tried it > (quite a few months ago now - and using FreeBSD and the special kqueue > system call) but crashed under some (admittedly synthetic) conditions. It runs out of the box with a very conservative setting for max open file descriptors - this may or may not be the cause of the problems you have seen. Certainly I ran squid with >16,000 connections back in 1999... > You still have periods of time when the web servers are busy using their > CPUs to generate HTML rather than waiting for database queries. This is > especially true if you cache a lot of data somewhere on the web servers > themselves (which, in my experience, reduces the database load a great > deal). If you REALLY need to reduce the number of connections (because you > have a large number of web servers doing a lot of computation, say) then > it might still be useful. Aha, a postgres related topic in this thread! What you say is very true, but then given that the connection overhead is so vanishingly small, why not simply run without a persistent DB connection in this case? I would maintain that if your webservers are holding open idle DB connections for so long that it's a problem, then simply close the connections! M ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL performance problem -> tuning
> Mem: 71M Active, 23M Inact, 72M Wired, 436K Cache, 48M Buf, 208M Free > Swap: 368M Total, 2852K Used, 366M Free > > It's right that I can figure that I can use 384M (total RAM) - 72M > (wired) - 48M (buf) = 264M for PostgreSQL. > Hence, if I set effective_cache_size to 24M (3072 8K blocks), > reasonable value (less than 240M, say 48M) for sort_mem, some value for > shared_buffers (i.e. 24M, or 6144 4K blocks (FreeBSD), or 3072 8K blocks > (PostgreSQL)), and rest of RAM 264M (total free with OS cache) - 24M > (reserved for OS cache) - 48M (sort) - 24M (shared) = 168M PostgreSQL > allocate dynamically by himself? Totally, utterly the wrong way around. Start with 384M, subtract whatever is in use by other processes, excepting kernel disk cache, subtract your PG shared buffers, subtract (PG proc size + PG sort mem)*(max number of PG processes you need to run - should be same as max_connections if thinking conservatively), leave some spare room so you can ssh in without swapping, and *the remainder* is what you should set effective_cache_size to. This is all in the docs. The key thing is: set effective_cache_size *last*. Note that Postgres assumes your OS is effective at caching disk blocks, so if that assumption is wrong you lose performance. Also, why on _earth_ would you need 48MB for sort memory? Are you seriously going to run a query that returns 48M of data and then sort it, on a machine with 384M of RAM? M ---(end of broadcast)--- TIP 8: explain analyze is your friend
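To make that ordering concrete, here is the same arithmetic done as a throwaway query (psql makes a fine calculator). Apart from the 384M of RAM and the 24M of shared buffers from the message above, the figures are assumptions pulled out of the air purely for illustration: roughly 40M for other processes, 32 backends, and 4M each for per-backend overhead and sort_mem.

-- Hypothetical numbers, for illustration only:
-- 384M RAM - 40M other processes - 24M shared_buffers
--   - 32 backends * (4M process + 4M sort_mem)
SELECT 384 - 40 - 24 - 32 * (4 + 4)               AS remainder_mb,
       (384 - 40 - 24 - 32 * (4 + 4)) * 1024 / 8  AS effective_cache_size_in_8k_blocks;
-- => 64MB left over, i.e. effective_cache_size = 8192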
[PERFORM] Hardware recommendations to scale to silly load
I'm wondering if the good people out there could perhaps give me some pointers on suitable hardware to solve an upcoming performance issue. I've never really dealt with these kinds of loads before, so any experience you guys have would be invaluable. Apologies in advance for the amount of info below... My app is likely to come under some serious load in the next 6 months, but the increase will be broadly predictable, so there is time to throw hardware at the problem. Currently I have a ~1GB DB, with the largest (and most commonly accessed and updated) two tables having 150,000 and 50,000 rows. A typical user interaction with the system involves about 15 single-table selects, 5 selects with joins or subqueries, 3 inserts, and 3 updates. The current hardware probably (based on benchmarking and profiling) tops out at about 300 inserts/updates *or* 2500 selects per second. There are multiple indexes on each table that updates & inserts happen on. These indexes are necessary to provide adequate select performance. Current hardware/software: Quad 700MHz PIII Xeon/1MB cache 3GB RAM RAID 10 over 4 18GB/10,000rpm drives 128MB battery backed controller cache with write-back enabled Redhat 7.3, kernel 2.4.20 Postgres 7.2.3 (stock redhat issue) I need to increase the overall performance by a factor of 10, while at the same time the DB size increases by a factor of 50. e.g. 3000 inserts/updates or 25,000 selects per second, over a 25GB database with most used tables of 5,000,000 and 1,000,000 rows. Notably, the data is very time-sensitive, so the active dataset at any hour is almost certainly going to be more on the order of 5GB than 25GB (plus I'll want all the indexes in RAM of course). Also, and importantly, the load comes but one hour per week, so buying a Starfire isn't a real option, as it'd just sit idle the rest of the time. I'm particularly interested in keeping the cost down, as I'm a shareholder in the company! So what do I need? Can anyone who has (or has ever had) that kind of load in production offer any pointers, anecdotes, etc? Any theoretical musings also more than welcome. Comments upon my sanity will be referred to my doctor. If the best price/performance option is a second hand 32-cpu Alpha running VMS I'd be happy to go that way ;-) Many thanks for reading this far. Matt ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Hardware recommendations to scale to silly load
> You probably, more than anything, should look at some kind of > superfast, external storage array Yeah, I think that's going to be a given. Low end EMC FibreChannel boxes can do around 20,000 IOs/sec, which is probably close to good enough. You mentioned using multiple RAID controllers as a boost - presumably the trick here is to split the various elements (WAL, tables, indexes) across different controllers using symlinks or suchlike? Can I feasibly split the DB tables across 5 or more controllers? > > Also, and importantly, the load comes but one hour per week, so buying a > > Starfire isn't a real option, as it'd just sit idle the rest of the > > time. I'm particularly interested in keeping the cost down, as I'm a > > shareholder in the company! > > Interesting. If you can't spread the load out, can you batch some parts > of it? Or is the whole thing interactive therefore needing to all be > done in real time at once? All interactive I'm afraid. It's a micropayment system that's going to be used here in the UK to do online voting for a popular TV programme. The phone voting system has a hard limit of [redacted] million votes per hour, and the producers would like to be able to tell people to vote online if the phone lines are busy. They can vote online anyway, but we expect the average viewer to have to make 10 calls just to get through during peak times, so the attraction is obvious. > whether you like it or not, you're gonna need heavy iron if you need to do > this all in one hour once a week. Yeah, I need to rent a Starfire for a month later this year, anybody got one lying around? Near London? > Actually, I've seen stuff like that going on Ebay pretty cheap lately. I > saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going > for $24,000 a month ago. Put Linux or BSD on it and Postgresql should > fly. Jeez, and I thought I was joking about the Starfire. Even Slowaris would be OK on one of them. The financial issue is that there's just not that much money in the micropayments game for bursty sales. If I was doing these loads *continuously* then I wouldn't be working, I'd be in the Maldives :-) I'm also looking at renting equipment, or even trying out IBM/HP's 'on-demand' offerings. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Hardware recommendations to scale to silly load
> Don't know how "cheap" they are.
>
> I have an app that does large batch updates. I found that if I dropped
> the indexes, did the updates and recreated the indexes, it was faster
> than doing the updates while the indexes were intact.

Yeah, unfortunately it's not batch work, but real-time financial work. If I drop all the indexes my select performance goes through the floor, as you'd expect.

> Does noatime make much difference on a PostgreSQL database? I haven't
> tested that yet.

Yup, it does. In fact it should probably be in the standard install documentation (unless someone has a reason why it shouldn't). Who *cares* when PG last looked at the tables? If 'nomtime' was available that would probably be a good thing too.

> Can you split it onto multiple boxes? Some database layouts lend themselves
> to this, others don't. Obviously you can't do joins from one server to
> another, so you may lose more in multiple queries than you gain by having
> multiple servers. It's worth looking into though.

I'm considering that. There are some tables which I might be able to split out. There may even be some things I can pull from the DB altogether (session info in particular, so long as I can reliably send a given user's requests to the same app server each time, bearing in mind I can't see the cookies too easily because 50% of the requests are over SSL).

> I know my answers aren't quite the ones you were looking for, but my
> experience is that many people try to solve poor application design
> by simply throwing bigger hardware at the problem. It appears as though
> you've already done your homework, though.

Well, I *hope* that's the case! The core issue is simply that we have to deal with an insane load for 1 hour a week, and there's just no avoiding it. Maybe I can get Sun/HP/IBM to lend some gear (it's a pretty high-profile site).

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware recommendations to scale to silly load
> Are you sure? Have you tested the overall application to see if possibly > you gain more on insert performance than you lose on select performanc? Unfortunately dropping any of the indexes results in much worse select performance that is not remotely clawed back by the improvement in insert performance. Actually there doesn't really seem to *be* that much improvement in insert performance when going from 3 indexes to 2. I guess indexes must be fairly cheap for PG to maintain? > It's possible that compiling Postgres manually with proper optimizations > could yield some improvements, as well as building a custom kernel in > Redhat. > > Also, you don't mention which filesystem you're using: > http://www.potentialtech.com/wmoran/postgresql.php Yeah, I can imagine getting 5% extra from a slim kernel and super-optimised PG. The FS is ext3, metadata journaling (the default), mounted noatime. > But if you're in the situation where you have more time than money, > you may find that an overall audit of your app is worthwhile. Consider > taking parts that are in perl (for example) and recoding them into C > (that is, unless you've already identified that all the bottlenecks are > at the PostgreSQL server) I can pretty cheaply add more CPU horsepower for the app servers, as they scale horizontally, so I can chuck in a couple (or 3, or 4, or ...) more dual-cpu boxen with a gig of ram and tell the load balancer about them. The problem with the DB is that that approach simply won't work - the box just has to get bigger! > I doubt if the suggestions I've made are going to get you 10x, but they > may get you 2x, and then you only need the hardware to do 5x. It all helps :-) A few percent here, a few percent there, pretty soon you're talking serious improvements... Thanks Matt ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware recommendations to scale to silly load
> Are you *sure* about that 3K updates/inserts per second xlates > to 10,800,000 per hour. That, my friend, is a WHOLE HECK OF A LOT! Yup, I know! > During the 1 hour surge, will SELECTs at 10 minutes after the > hour depend on INSERTs at 5 minutes after the hour? Yes, they do. It's a payments system, so things like account balances and purchase histories have to be updated in real time. > Only one hour out of 168? May I ask what kind of app it is? Online voting for an unnamed TV show... > > If the best price/performance option is a second hand 32-cpu Alpha > > running VMS I'd be happy to go that way ;-) > > I'd love to work on a GS320! You may even pick one up for a million > or 2. The license costs for VMS & Rdb would eat you, though. You'd be amazed how little they do go for actually :-) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Disappointing performance in db migrated from MS SQL
> Josh, the disks in the new system should be substantially faster than > the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has > 15k RPM disks, as opposed to the 10k RPM disks in the old system. Spindle speed does not correlate with 'throughput' in any easy way. What controllers are you using for these disks? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Prefetch
I wanted to get some opinions about row prefetching. AFAIK, there is no prefetching done by PostgreSQL; all prefetching is delegated to the operating system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data. Each day has around 30,000 records (tickers). A typical operation is to get the 200 day simple moving average (of price) for each ticker and write the result to a summary table. In running this process (Perl/DBI), it is typical to see 70-80% I/O wait time with postgres running at about 8-9%. If I run the next day's date, the postgres cache and file cache is now populated with 199 days of the needed data, postgres runs 80-90% of CPU and total run time is greatly reduced. My conclusion is that this is a high cache hit rate in action.

I've done other things that make sense, like using indexes, playing with the planner constants and turning up the postgres cache buffers. Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no apparent difference in database performance. The random nature of the I/O drops disk reads down to about 1MB/sec for the array. A linear table scan can easily yield 70-80MB/sec on this system. Total table size is usually around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch. Has there been serious consideration of implementing something like a prefetch subsystem? Does anyone have any opinions as to why this would be a bad idea for postgres?

Postgres is great for a multiuser environment and OLTP applications. However, in this set up, a data warehouse, the observed performance is not what I would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Prefetch
My postgres binaries and WAL are on a separate disk from the raid array. The table I'm doing the selects from is probably about 4GB in size and 18-20 million records. No concurrent or dependent inserts or deletes are going on. Tom's point and your points about optimizing the application are well taken. I know my approach is sub optimal and prone to getting caught by latency issues (seek times, cache hit rates, etc.). However, the question of prefetch in my mind is all about eliminating latencies, so, I thought my problem would be good for the sake of discussing prefetching. The two approaches I'm in the process of testing are Rod and Greg's suggestion of using 'CLUSTER'. And for the sake of not letting a good idea get away, I'll probably spend time on doing a parallel query approach which Tom suggested. I'll report back to the list what I find and maybe do some _rough_ benchmarking. This is a production app, so I can't get too much in the way of the daily batches. -- Matt Olson Ocean Consulting http://www.oceanconsulting.com/ On Tuesday 10 May 2005 11:13 am, Greg Stark wrote: > Matt Olson writes: > > I've done other things that make sense, like using indexes, playing with > > the planner constants and turning up the postgres cache buffers. > > > > Even playing with extream hdparm read-ahead numbers (i.e. 64738) yields > > no apparent difference in database performance. The random nature of the > > I/O drops disk reads down to about 1MB/sec for the array. A linear table > > scan can easily yield 70-80MB/sec on this system. Total table size is > > usually around 1GB and with indexes should be able to fit completely in > > main memory. > > Actually forcing things to use indexes is the wrong direction to go if > you're trying to process lots of data and want to stream it off disk as > rapidly as possible. I would think about whether you can structure your > data such that you can use sequential scans. That might mean partitioning > your raw data into separate tables and then accessing only the partitions > that are relevant to the query. > > In your application that might be hard. It sounds like you would need more > or less one table per stock ticker which would really be hard to manage. > > One thing you might look into is using the CLUSTER command. But postgres > doesn't maintain the cluster ordering so it would require periodically > rerunning it. > > I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s > even for completely random reads. Is it possible you're seeing something > else interfering? Do you have INSERT/UPDATE/DELETE transactions happening > concurrently with this select scan? If so you should strongly look into > separating the transaction log from the data files. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
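For reference, a minimal sketch of the CLUSTER approach Greg suggests, with hypothetical table and index names (a table of daily prices with an index on ticker and day), using the 7.4/8.x-era syntax. Since the physical ordering is not maintained, it would have to be re-run after each daily batch.

CLUSTER daily_prices_ticker_day_idx ON daily_prices;  -- reorder the table by the index
ANALYZE daily_prices;                                 -- refresh planner statistics afterwards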
[PERFORM] pgsql upgrade
Hi all,

I've run into an issue with a Pg 7.4.6 to 8.1.5 upgrade along with a hardware upgrade. I moved the database from a 2x 3.0ghz Xeon (512kb w/HT) to a 2x Opteron 250. The database is in memory on a tmpfs partition. (the application can rebuild the db during total system failure)

When I first switched it over, the results were exactly what I expected. I was sustaining about a 2.2 on the Xeon and with the Opteron, about a 1.5 with the same traffic. The box is under high load; it outputs about 15mbps.

After a couple hours working perfectly, the 'system' (vs user) load jumped from 3% of total CPU usage to 30%, a 10x increase, and postgres started to write out to data/base at a fairly fast rate. The CPU context switch rate doubled at the same time. Iowait, which was historically 0 on the 7.4 box, went to 0.08. Strangely enough, a vacuum (not full or analyze) stopped postgres from writing to data/base, but the strange load pattern remains. (system is ~30% of the overall load, vs 3% before)

So, my question is, what happened, and how can I get it back to the same load pattern 7.4.6 had, and the same pattern I had for 4 hours before it went crazy?

Matt

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Sunfire X4500 recommendations
My company is purchasing a Sunfire x4500 to run our most I/O-bound databases, and I'd like to get some advice on configuration and tuning. We're currently looking at:

- Solaris 10 + zfs + RAID Z
- CentOS 4 + xfs + RAID 10
- CentOS 4 + ext3 + RAID 10

but we're open to other suggestions.

From previous message threads, it looks like some of you have achieved stellar performance under both Solaris 10 U2/U3 with zfs and CentOS 4.4 with xfs. Would those of you who posted such results please describe how you tuned the OS/fs to yield those figures (e.g. patches, special drivers, read-ahead, checksumming, write-through cache settings, etc.)?

Most of our servers currently run CentOS/RedHat, and we have little experience with Solaris, but we're not opposed to Solaris if there's a compelling reason to switch. For example, it sounds like zfs snapshots may have a lighter performance penalty than LVM snapshots. We've heard that just using LVM (even without active snapshots) imposes a maximum sequential I/O rate of around 600 MB/s (although we haven't yet reached this limit experimentally).

By the way, we've also heard that Solaris is "more stable" under heavy I/O load than Linux. Have any of you experienced this? It's hard to put much stock in such a blanket statement, but naturally we don't want to introduce instabilities.

Thanks in advance for your thoughts!

For reference: Our database cluster will be 3-6 TB in size. The Postgres installation will be 8.1 (at least initially), compiled to use 32 KB blocks (rather than 8 KB). The workload will be predominantly OLAP. The Sunfire X4500 has 2 dual-core Opterons, 16 GB RAM, 48 SATA disks (500 GB/disk * 48 = 24 TB raw -> 12 TB usable under RAID 10).

So far, we've seen the X4500 deliver impressive but suboptimal results using the out-of-the-box installation of Solaris + zfs. The Linux testing is in the early stages (no xfs, yet), but so far it yields comparatively modest write rates and very poor read and rewrite rates.
=== Results under Solaris with zfs: ===

Four concurrent writers:
% time dd if=/dev/zero of=/zpool1/test/50GB-zero1 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero2 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero3 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero4 bs=1024k count=51200 ; time sync
Seq Write (bs = 1 MB): 128 + 122 + 131 + 124 = 505 MB/s

Four concurrent readers:
% time dd if=/zpool1/test/50GB-zero1 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero2 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero3 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero4 of=/dev/null bs=1024k
Seq Read (bs = 1 MB): 181 + 177 + 180 + 178 = 716 MB/s

One bonnie++ process:
% bonnie++ -r 16384 -s 32g:32k -f -n0 -d /zpool1/test/bonnie_scratch
Version 1.03, machine thumper1, size 32G:32k:
  Sequential Output (Block): 604173 K/sec, 98 %CP
  Rewrite:                   268893 K/sec, 43 %CP
  Sequential Input (Block):  543389 K/sec, 59 %CP
  Random Seeks:               519.2 /sec,   3 %CP
thumper1,32G:32k,,,604173,98,268893,43,,,543389,59,519.2,3,

4 concurrent synchronized bonnie++ processes:
% bonnie++ -p4
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -p-1
Combined results of 4 sessions:
  Seq Output:  124 + 124 + 124 + 140 = 512 MB/s
  Rewrite:      93 +  94 +  93 +  96 = 376 MB/s
  Seq Input:   192 + 194 + 193 + 197 = 776 MB/s
  Random Seek: 327 + 327 + 335 + 332 = 1321 seeks/s

=== Results under CentOS 4 with ext3 and LVM: ===

% bonnie++ -s 32g:32k -f -n0 -d /large_lvm_stripe/test/bonnie_scratch
Version 1.03, machine thumper1.rtkinternal, size 32G:32k:
  Sequential Output (Block): 346595 K/sec, 94 %CP
  Rewrite:                    59448 K/sec, 11 %CP
  Sequential Input (Block):  132471 K/sec, 12 %CP
  Random Seeks:               479.4 /sec,   2 %CP
thumper1.rtkinternal,32G:32k,,,346595,94,59448,11,,,132471,12,479.4,2,

Summary of bonnie++ results:

                              sequential   sequential   sequential   scattered
Test case                     write MB/s   rewrite MB/s  read MB/s   seeks/s
---------------------------   ----------   ------------  ---------   ---------
Sol10+zfs, 1 process                 604            269        543         519
Sol10+zfs, 4 processes               512            376        776        1321
CentOS4+ext3+LVM, 1 process          347             59        132         479
Re: [PERFORM] Sunfire X4500 recommendations
Thanks Dimitri! That was very educational material! I'm going to think out loud here, so please correct me if you see any errors.

The section on tuning for OLTP transactions was interesting, although my OLAP workload will be predominantly bulk I/O over large datasets of mostly-sequential blocks.

The NFS+ZFS section talked about the zil_disable control for making zfs ignore commits/fsyncs. Given that Postgres' executor does single-threaded synchronous I/O like the tar example, it seems like it might benefit significantly from setting zil_disable=1, at least in the case of frequently flushed/committed writes. However, zil_disable=1 sounds unsafe for the datafiles' filesystem, and would probably only be acceptable for the xlogs if they're stored on a separate filesystem and you're willing to lose recently committed transactions. This sounds pretty similar to just setting fsync=off in postgresql.conf, which is easier to change later, so I'll skip the zil_disable control.

The RAID-Z section was a little surprising. It made RAID-Z sound just like RAID 50, in that you can customize the trade-off between iops versus usable diskspace and fault-tolerance by adjusting the number/size of parity-protected disk groups. The only difference I noticed was that RAID-Z will apparently set the stripe size across vdevs (RAID-5s) to be as close as possible to the filesystem's block size, to maximize the number of disks involved in concurrently fetching each block. Does that sound about right?

So now I'm wondering what RAID-Z offers that RAID-50 doesn't. I came up with 2 things: an alleged affinity for full-stripe writes and (under RAID-Z2) the added fault-tolerance of RAID-6's 2nd parity bit (allowing 2 disks to fail per zpool). It wasn't mentioned in this blog, but I've heard that under certain circumstances, RAID-Z will magically decide to mirror a block instead of calculating parity on it. I'm not sure how this would happen, and I don't know the circumstances that would trigger this behavior, but I think the goal (if it really happens) is to avoid the performance penalty of having to read the rest of the stripe required to calculate parity. As far as I know, this is only an issue affecting small writes (e.g. single-row updates in an OLTP workload), but not large writes (compared to the RAID's stripe size). Anyway, when I saw the filesystem's intent log mentioned, I thought maybe the small writes are converted to full-stripe writes by deferring their commit until a full stripe's worth of data had been accumulated. Does that sound plausible?

Are there any other noteworthy perks to RAID-Z, rather than RAID-50? If not, I'm inclined to go with your suggestion, Dimitri, and use zfs like RAID-10 to stripe a zpool over a bunch of RAID-1 vdevs. Even though many of our queries do mostly sequential I/O, getting higher seeks/second is more important to us than the sacrificed diskspace.

For the record, those blogs also included a link to a very helpful ZFS Best Practices Guide: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

To sum up, so far the short list of tuning suggestions for ZFS includes:

- Use a separate zpool and filesystem for xlogs if your apps write often.
- Consider setting zil_disable=1 on the xlogs' dedicated filesystem. ZIL is the intent log, and it sounds like disabling it may be like disabling journaling. Previous message threads in the Postgres archives debate whether this is safe for the xlogs, but it didn't seem like a conclusive answer was reached.
- Make filesystem block size (zfs record size) match the Postgres block size.
- Manually adjust vdev_cache. I think this sets the read-ahead size. It defaults to 64 KB. For OLTP workload, reduce it; for DW/OLAP maybe increase it.
- Test various settings for vq_max_pending (until zfs can auto-tune it). See http://blogs.sun.com/erickustarz/entry/vq_max_pending
- A zpool of mirrored disks should support more seeks/second than RAID-Z, just like RAID 10 vs. RAID 50. However, no single Postgres backend will see better than a single disk's seek rate, because the executor currently dispatches only 1 logical I/O request at a time.

>>> Dimitri <[EMAIL PROTECTED]> 03/23/07 2:28 AM >>>
On Friday 23 March 2007 03:20, Matt Smiley wrote:
> My company is purchasing a Sunfire x4500 to run our most I/O-bound
> databases, and I'd like to get some advice on configuration and tuning.
> We're currently looking at:
> - Solaris 10 + zfs + RAID Z
> - CentOS 4 + xfs + RAID 10
> - CentOS 4 + ext3 + RAID 10
> but we're open to other suggestions.

Matt, for Solaris + ZFS you may find answers to all your questions here:

http://blogs.sun.com/roch/category/ZFS
http://blogs.sun.com/realneel/entry/zfs_and_databases

Think to
Re: [PERFORM] Sunfire X4500 recommendations
Hi Dimitri,

First of all, thanks again for the great feedback!

Yes, my I/O load is mostly read operations. There are some bulk writes done in the background periodically throughout the day, but these are not as time-sensitive. I'll have to do some testing to find the best balance of read vs. write speed and tolerance of disk failure vs. usable diskspace.

I'm looking forward to seeing the results of your OLTP tests! Good luck! Since I won't be doing that myself, it'll be all new to me.

About disk failure, I certainly agree that increasing the number of disks will decrease the average time between disk failures. Apart from any performance considerations, I wanted to get a clear idea of the risk of data loss under various RAID configurations. It's a handy reference, so I thought I'd share it:

The goal is to calculate the probability of data loss when we lose a certain number of disks within a short timespan (e.g. losing a 2nd disk before replacing+rebuilding the 1st one). For RAID 10, 50, and Z, we will lose data if any disk group (i.e. mirror or parity-group) loses 2 disks. For RAID 60 and Z2, we will lose data if 3 disks die in the same parity group. The parity groups can include arbitrarily many disks. Having larger groups gives us more usable diskspace but less protection. (Naturally we're more likely to lose 2 disks in a group of 50 than in a group of 5.)

g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or more; dual-parity = 4 or more)
n = total number of disks

risk of losing any 1 disk = 1/n
risk of losing 1 disk from a particular group = g/n
risk of losing 2 disks in the same group = g/n * (g-1)/(n-1)
risk of losing 3 disks in the same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)

For the x4500, we have 48 disks. If we stripe our data across all those disks, then these are our configuration options:

RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same group to lose data:

disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48            24              0.09%
              3          16           48            32              0.27%
              4          12           48            36              0.53%
              6           8           48            40              1.33%
              8           6           48            42              2.48%
             12           4           48            44              5.85%
             24           2           48            46             24.47%
             48           1           48            47            100.00%

RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose data:

disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48           n/a                n/a
              3          16           48            16              0.01%
              4          12           48            24              0.02%
              6           8           48            32              0.12%
              8           6           48            36              0.32%
             12           4           48            40              1.27%
             24           2           48            44             11.70%
             48           1           48            46            100.00%

So, in terms of fault tolerance:
- RAID 60 and Z2 always beat RAID 10, since they never risk data loss when only 2 disks fail.
- RAID 10 always beats RAID 50 and Z, since it has the largest number of disk groups across which to spread the risk.
- Having more parity groups increases fault tolerance but decreases usable diskspace.

That's all assuming each disk has an equal chance of failure, which is probably true since striping should distribute the workload evenly. And again, these probabilities are only describing the case where we don't have enough time between disk failures to recover the array.

In terms of performance, I think RAID 10 should always be best for write speed. (Since it doesn't calculate parity, writing a new block doesn't require reading the rest of the RAID stripe just to recalculate the parity bits.) I think it's also normally just as fast for reading, since the controller can load-balance the pending read requests to both sides of each mirror.
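The tables above fall straight out of the stated formulas, so here is a quick query to regenerate them for the 48-disk case. It is purely a sanity check of the arithmetic, not a recommendation of any particular layout (for g = 2 the three-disk column is simply 0, i.e. the n/a rows above).

-- Risk that a 2nd (or 3rd) failed disk lands in an already-degraded group,
-- per the formulas above, for n = 48 disks and the group sizes in the tables.
SELECT g AS disks_per_group,
       48 / g AS num_groups,
       round(100.0 * g * (g - 1) / (48 * 47), 2)                AS pct_risk_2_disks_one_group,
       round(100.0 * g * (g - 1) * (g - 2) / (48 * 47 * 46), 2) AS pct_risk_3_disks_one_group
  FROM (SELECT 2 AS g UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 6
        UNION ALL SELECT 8 UNION ALL SELECT 12 UNION ALL SELECT 24 UNION ALL SELECT 48) AS sizes
 ORDER BY g;
-- e.g. g = 2 gives 0.09% / 0.00, g = 24 gives 24.47% / 11.70, matching the tables.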
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Sunfire X4500 recommendations
Hi David,

Thanks for your feedback! I'm rather a newbie at this, and I do appreciate the critique.

First, let me correct myself: the formulas for the risk of losing data when you lose 2 and 3 disks shouldn't have included the first term (g/n). I'll give the corrected formulas and tables at the end of the email.

> please explain why you are saying that the risk of losing any 1 disk is
> 1/n. shouldn't it be probability of failure * n instead?

1/n represents the assumption that all disks have an equal probability of being the next one to fail. This seems like a fair assumption in general for the active members of a stripe (not including hot spares). A possible exception would be the parity disks (because reads always skip them and writes always hit them), but that's only a consideration if the RAID configuration used dedicated disks for parity instead of distributing it across the RAID 5/6 group members. Apart from that, whether the workload is write-heavy or read-heavy, sequential or scattered, the disks in the stripe ought to handle a roughly equivalent number of iops over their lifetime.

> following this logic the risk of losing all 48 disks in a single group of
> 48 would be 100%

Exactly. Putting all disks in one group is RAID 0 -- no data protection. If you lose even 1 active member of the stripe, the probability of losing your data is 100%.

> also what you are looking for is the probability of the second (and third)
> disks failing in time X (where X is the time necessary to notice the
> failure, get a replacement, and rebuild the disk)

Yep, that's exactly what I'm looking for. That's why I said, "these probabilities are only describing the case where we don't have enough time between disk failures to recover the array." My goal wasn't to estimate how long time X is. (It doesn't seem like a generalizable quantity; due partly to logistical and human factors, it's unique to each operating environment.) Instead, I start with the assumption that time X has been exceeded, and we've lost a 2nd (or 3rd) disk in the array. Given that assumption, I wanted to show the probability that the loss of the 2nd disk has caused the stripe to become unrecoverable. We know that RAID 10 and 50 can tolerate the loss of anywhere between 1 and n/g disks, depending on how lucky you are. I wanted to quantify the amount of luck required, as a risk management tool. The duration of time X can be minimized with hot spares and attentive administrators, but the risk after exceeding time X can only be minimized (as far as I know) by configuring the RAID stripe with small enough underlying failure groups.

> the killer is the time needed to rebuild the disk, with multi-TB arrays
> it's sometimes faster to re-initialize the array and reload from backup
> than it is to do a live rebuild (the kernel.org servers had a raid failure
> recently and HPA mentioned that it took a week to rebuild the array, but
> it would have only taken a couple days to do a restore from backup)

That's very interesting. I guess the rebuild time also would depend on how large the damaged failure group was. Under RAID 10, for example, I think you'd still only have to rebuild 1 disk from its mirror, regardless of how many other disks were in the stripe, right? So shortening the rebuild time may be another good motivation to keep the failure groups small.
> add to this the fact that disk failures do not appear to be truely > independant from each other statisticly (see the recent studies released > by google and cmu), and I wouldn't bother with single-parity for a I don't think I've seen the studies you mentioned. Would you cite them please? This may not be typical of everyone's experience, but what I've seen during in-house load tests is an equal I/O rate for each disk in my stripe, using short-duration sampling intervals to avoid long-term averaging effects. This is what I expected to find, so I didn't delve deeper. Certainly it's true that some disks may be more heavily burdened than others for hours or days, but I wouldn't expect any bias from an application-driven access pattern to persist for a significant fraction of a disk's lifespan. The only influence I'd expect to bias the cumulative I/O handled by a disk over its entire life would be its role in the RAID configuration. Hot spares will have minimal wear-and-tear until they're activated. Dedicated parity disks will probably live longer than data disks, unless the workload is very heavily oriented towards small writes (e.g. logging). > multi-TB array. If the data is easy to recreate (including from backup) or > short lived (say a database of log data that cycles every month or so) I > would just do RAID-0 and plan on loosing the data on drive failure (this > assumes that you can afford the loss of service when this happens). if the > data is more important then I'd do dual-parity or more, along with a ho
[PERFORM] db performance/design question
I'm designing a system that will be doing over a million inserts/deletes on a single table every hour. Rather than using a single table, it is possible for me to partition the data into multiple tables if I wanted to, which would be nice because I can just truncate them when I don't need them. I could even use table spaces to split the IO load over multiple filers. The application does not require all this data be in the same table. The data is fairly temporary, it might last 5 seconds, it might last 2 days, but it will all be deleted eventually and different data will be created. Considering a single table would grow to 10mil+ rows at max, and this machine will sustain about 25mbps of insert/update/delete traffic 24/7 - 365, will I be saving much by partitioning data like that? -- -Matt <http://twiki.spimageworks.com/twiki/bin/view/Software/CueDevelopment>
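A minimal sketch of the manual partitioning described above; the table, column and tablespace names are invented for illustration, and the application itself would have to route each insert to the appropriate table.

-- One table per partition, so stale data can be dropped cheaply with TRUNCATE
CREATE TABLE work_items_p0 (id bigint PRIMARY KEY, payload text, created timestamptz);
CREATE TABLE work_items_p1 (id bigint PRIMARY KEY, payload text, created timestamptz);

-- Optionally spread the I/O by giving partitions their own tablespaces (names hypothetical):
-- CREATE TABLE work_items_p2 (id bigint PRIMARY KEY, payload text, created timestamptz) TABLESPACE filer2;

-- TRUNCATE reclaims the space immediately and leaves no dead rows for VACUUM,
-- unlike a mass DELETE:
TRUNCATE TABLE work_items_p0;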
[PERFORM] Slowing UPDATEs inside a transaction
Hi. I've only been using PostgreSQL properly for a week or so, so I apologise if this has been covered numerous times, however Google is producing nothing of use.

I'm trying to import a large amount of legacy data (billions of denormalised rows) into a pg database with a completely different schema, de-duplicating bits of it on-the-fly while maintaining a reference count. The procedures to do this have proven easy to write, however the speed is not pretty. I've spent some time breaking down the cause and it's come down to a simple UPDATE as evidenced below:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1);

CREATE OR REPLACE FUNCTION test() RETURNS int AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..1 LOOP
        UPDATE foo SET b=b+1 WHERE a=1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

When run individually, this function produces the following timing:

Time: 1912.593 ms
Time: 1937.183 ms
Time: 1941.607 ms
Time: 1943.673 ms
Time: 1944.738 ms

However, when run inside a transaction (to simulate a function doing the same work) I get this:

START TRANSACTION
Time: 0.836 ms
Time: 1908.931 ms
Time: 5370.110 ms
Time: 8809.378 ms
Time: 12274.294 ms
Time: 15698.745 ms
Time: 19218.389 ms

There is no disk i/o and the postgresql process runs 100% cpu.

Server is amd64 FreeBSD 8-STABLE w/16GB RAM running postgresql 9.0.3 from packages.

Looking at the timing of real data (heavily grouped), it seems the speed of UPDATEs can vary dependent on how heavily updated a row is, so I set out to produce a test case:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1),(2,1),(3,1),(4,1);

CREATE OR REPLACE FUNCTION test(int) RETURNS int AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..1 LOOP
        UPDATE foo SET b=1 WHERE a=$1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

START TRANSACTION;
SELECT test(1);  Time: 1917.305 ms
SELECT test(2);  Time: 1926.758 ms
SELECT test(3);  Time: 1926.498 ms
SELECT test(1);  Time: 5376.691 ms
SELECT test(2);  Time: 5408.231 ms
SELECT test(3);  Time: 5403.654 ms
SELECT test(1);  Time: 8842.425 ms
SELECT test(4);  Time: 1925.954 ms
COMMIT;
START TRANSACTION;
SELECT test(1);  Time: 1911.153 ms

As you can see, the more an individual row is updated /within a transaction/, the slower it becomes for some reason. Unfortunately in my real-world case, I need to do many billions of these UPDATEs. Is there any way I can get around this without pulling my huge source table out of the database and feeding everything in line-at-a-time from outside the database?

Thanks.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slowing UPDATEs inside a transaction
Robert Haas wrote:
> Old row versions have to be kept around until they're no longer of
> interest to any still-running transaction.

Thanks for the explanation. Regarding the snippet above, why would the intermediate history of multiply-modified uncommitted rows be of interest to anything, or is the current behaviour simply "cheaper" overall in terms of cpu/developer time?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Scaling further up
Quoting Andrew Sullivan <[EMAIL PROTECTED]>:

> On Fri, Mar 12, 2004 at 06:25:48PM -0500, Anjan Dave wrote:
> > We upgraded from 8GB to 12GB RAM a month or so ago, but even in the
> > past, I've never seen the system exhaust on it's system cache (~6GB, in
> > 'top'), while it's swapping.
> >
> > Some one had mentioned why not have the entire DB in memory? How do I
> > configure that, for knowledge?
>
> You don't. It'll automatically be in memory if (a) you have enough
> memory, (b) you don't have anything else on the machine using the
> memory, and (c) it's been read at least one time.

This is the preferred method, but you could create a memory disk if running linux. This has several caveats, though.

1. You may have to recompile the kernel for support.
2. You must store the database on a hard drive partition during reboots.
3. Because of #2 this option is generally useful if you have static content that is loaded to the MD upon startup of the system.

You could have some fancy methodology of shutting down the system and then copying the data to a disk-based filesystem, but this is inherently bad since at any moment a power outage would erase any updates or changes. The option is there to start with all data in memory, but in general, this is probably not what you want. Just an FYI.

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
If it's going to be write intensive then the RAID controller will be the most important thing. A dual p3/500 with a write-back cache will smoke either of the boxes you mention using software RAID on write performance. As for the compute intensive side (complex joins & sorts etc), the Dell will most likely beat the Sun by some distance, although what the Sun lacks in CPU power it may make up a bit in memory bandwidth/latency. Matt > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Subbiah, > Stalin > Sent: 23 March 2004 18:41 > To: 'Andrew Sullivan'; '[EMAIL PROTECTED]' > Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux > > > We are looking into Sun V210 (2 x 1 GHz cpu, 2 gig ram, 5.8Os) vs. Dell 1750 > (2 x 2.4 GHz xeon, 2 gig ram, RH3.0). database will mostly be > write intensive and disks will be on raid 10. Wondering if 64bit 1 GHz to > 32bit 2.4 GHz make a big difference here. > > Thanks! > > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Andrew > Sullivan > Sent: Tuesday, March 23, 2004 9:37 AM > To: '[EMAIL PROTECTED]' > Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux > > > On Mon, Mar 22, 2004 at 04:05:45PM -0800, Subbiah, Stalin wrote: > > being the key performance booster for postgres. what is the preferred OS > > for postgres deployment if given an option between linux and solaris. As > > One thing this very much depends on is what you're trying to do. > Suns have a reputation for greater reliability. While my own > experience with Sun hardware has been rather shy of sterling, I _can_ > say that it stands head and shoulders above a lot of the x86 gear you > can get. > > If you're planning to use Solaris on x86, don't bother. Solaris is a > slow, bloated pig compared to Linux, at least when it comes to > managing the largish number of processes that Postgres requires. > > If pure speed is what you're after, I have found that 2-way, 32 bit > Linux on P-IIIs compares very favourably to 4 way 64 bit Ultra SPARC > IIs. > > A > > -- > Andrew Sullivan | [EMAIL PROTECTED] > The fact that technology doesn't work is no bar to success in the > marketplace. > --Philip Greenspun > > ---(end of broadcast)--- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> > Now if these vendors could somehow eliminate downtime due to human error > > we'd be talking *serious* reliablity. > > You mean making the OS smart enough to know when clearing the arp > cache is a bonehead operation, or just making the hardware smart > enough to realise that the keyswitch really shouldn't be turned > while 40 people are logged in? (Either way, I agree this'd be an > improvement. It'd sure make colocation a lot less painful.) Well I was joking really, but those are two very good examples! Yes, machines should require extra confirmation for operations like those. Hell, even a simple 'init 0' would be well served by a prompt that says "There are currently 400 network sockets open, 50 remote users logged in, and 25 disk IOs per second. What's more, there's nobody logged in at the console to boot me up again afterwards - are you _sure_ you want to shut the machine down?". It's also crazy that there's no prompt after an 'rm -rf' (we could have 'rm -rf --iacceptfullresponsibility' for an unprompted version). Stuff like that would have saved me from a few embarrassments in the past for sure ;-) It drives me absolutely nuts every time I see a $staggeringly_expensive clustered server whose sysadmins are scared to do a failover test in case something goes wrong! Or which has worse uptime than my desktop PC because the cluster software's poorly set up or administered. Or which has both machines on the same circuit breaker. I could go on but it's depressing me. Favourite anecdote: A project manager friend of mine had a new 'lights out' datacenter to set up. The engineers, admins and operators swore blind that everything had been tested in every possible way, and that incredible uptime was guaranteed. 'So if I just pull this disk out everything will keep working?' he asked, and then pulled the disk out without waiting for an answer... Ever since he told me that story I've done exactly that with every piece of so-called 'redundant' hardware a vendor tries to flog me. Ask them to set it up, then just do nasty things to it without asking for permission. Less than half the gear makes it through that filter, and actually you can almost tell from the look on the technical sales rep's face as you reach for the drive/cable/card/whatever whether it will or won't. M ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Wierd context-switching issue on Xeon
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), Quad Xeon 700MHz/1MB L2 cache, 3GB RAM. Idle-ish (it's a production server) cs/sec ~5000. 3 test queries running:
   procs                        memory      swap        io     system          cpu
 r  b  w   swpd    free    buff    cache   si  so  bi  bo   in     cs   us sy id
 3  0  0  23380  577680  105912  2145140    0   0   0   0  107 116890   50 14 35
 2  0  0  23380  577680  105912  2145140    0   0   0   0  114 118583   50 15 34
 2  0  0  23380  577680  105912  2145140    0   0   0   0  107 115842   54 14 32
 2  1  0  23380  577680  105920  2145140    0   0   0  32  156 117549   50 16 35
HTH Matt > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane > Sent: 20 April 2004 01:02 > To: [EMAIL PROTECTED] > Cc: Joe Conway; scott.marlowe; Bruce Momjian; [EMAIL PROTECTED]; > [EMAIL PROTECTED]; Neil Conway > Subject: Re: [PERFORM] Wierd context-switching issue on Xeon > > > Here is a test case. To set up, run the "test_setup.sql" script once; > then launch two copies of the "test_run.sql" script. (For those of > you with more than two CPUs, see whether you need one per CPU to make > trouble, or whether two test_runs are enough.) Check that you get a > nestloops-with-index-scans plan shown by the EXPLAIN in test_run. > > In isolation, test_run.sql should do essentially no syscalls at all once > it's past the initial ramp-up. On a machine that's functioning per > expectations, multiple copies of test_run show a relatively low rate of > semop() calls --- a few per second, at most --- and maybe a delaying > select() here and there. > > What I actually see on Josh's client's machine is a context swap storm: > "vmstat 1" shows CS rates around 170K/sec. strace'ing the backends > shows a corresponding rate of semop() syscalls, with a few delaying > select()s sprinkled in. top(1) shows system CPU percent of 25-30 > and idle CPU percent of 16-20. > > I haven't bothered to check how long the test_run query takes, but if it > ends while you're still examining the behavior, just start it again. > > Note the test case assumes you've got shared_buffers set to at least > 1000; with smaller values, you may get some I/O syscalls, which will > probably skew the results. > > regards, tom lane > > ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Postgres over Linux NBD or NFS
How about iSCSI? This is exactly what it's for - presenting a bunch of remote SCSI hardware as if it were local. There are several reference implementations on SourceForge from Intel, Cisco & others. I've never tried it myself, but I would if I had the need. And let's face it there are some very big players selling very pricey kit that uses it, so you should have pretty high confidence that the fundamentals are strong. M > The other goal is to be able to stick LOTS of disk into one > box, and dole it out to multiple servers. It's more > expensive to set up and manage 3 RAID arrays than it is to > set up and manage just 1, because you have to manage 3 sets > of disk hardware rather than 1. [snip] > The "poor man's approach" involves trying to fake this by > building a "disk box" running Linux that exports the storage > either as a filesystem (using NFS) or as disk blocks (NBD). > NFS clearly doesn't provide the filesystem semantics needed > to get decent reliability; with NBD, it's not clear what happens :-(. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Swapping in 7.4.3
> This is normal. My personal workstation has been up for 16 > days, and it shows 65 megs used for swap. The linux kernel > looks for things that haven't been accessed in quite a while > and tosses them into swap to free up the memory for other uses. > > This isn't PostgreSQL's fault, or anything elses. It's how a > typical Unix kernel works. I.e. you're seeing a problem that > simply isn't there. Actually it (and other OSes) does slightly better than that. It _copies_ the least recently used pages into swap, but leaves them in memory. Then when there really is a need to swap stuff out there is no need to actually write to swap because it's already been done, and conversely if those pages are wanted then they don't have to be read from disk because they were never removed from memory. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Performance Bottleneck
> And this is exactly where the pgpool advantage lies. > Especially with the > TPC-W, the Apache is serving a mix of PHP (or whatever CGI > technique is > used) and static content like images. Since the 200+ Apache > kids serve > any of that content by random and the emulated browsers very much > encourage it to ramp up MaxClients children by using up to 4 > concurrent > image connections, one does end up with MaxClients DB > connections that > are all relatively low frequently used. In contrast to that the real > pgpool causes lesser, more active DB connections, which is better for > performance. There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per apache process, both of which work extremely well and attack the issue at its source.
1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer apache processes needed.
2) Serve static content off an entirely separate apache server from the dynamic content, using separate domains (e.g. 'static.foo.com').
Personally I favour number 1. Our last biggish peak saw 6000 open HTTP and HTTPS connections and only 200 apache children, all of them nice and busy, not hanging around on street corners looking bored. During quiet times Apache drops back to its configured minimum of 40 kids. Option 2 has the advantage that you can use a leaner build for the 'dynamic' apache server, but with RAM so plentiful these days that's a less useful property. Basically this puts the 'pooling' back in the stateless HTTP area where it truly belongs and can be proven not to have any peculiar side effects (especially when it comes to transaction safety). Even better, so long as you use URL parameters for searches and the like, you can have the accelerator cache those pages for a certain time too so long as slightly stale results are OK. I'm sure pgpool and the like have their place, but being band-aids for poorly configured websites probably isn't the best use for them. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] insert
> > It is likely that you are missing an index on one of those foreign > > key'd items. > > I don't think that is too likely as a foreign key reference > must be a unique key which would have an index. I think you must be thinking of primary keys, not foreign keys. All one-to-many relationships have non-unique foreign keys. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
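To make that concrete, a minimal sketch (table and column names are invented here): the referencing column on the "many" side is neither unique nor automatically indexed, so when inserts, deletes or joins involving FKs are slow it is usually this index that is missing.
CREATE TABLE orders (
    order_id serial PRIMARY KEY
);
CREATE TABLE order_lines (
    line_id  serial PRIMARY KEY,
    order_id integer NOT NULL REFERENCES orders (order_id)   -- non-unique foreign key
);
-- Not created automatically: without it, FK checks fired by UPDATE/DELETE
-- on orders, and joins against order_lines, have to scan the whole table.
CREATE INDEX order_lines_order_id ON order_lines (order_id);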
Re: [PERFORM] Disabling transaction/outdated-tuple behaviour
> Immediately after performing a vaccum, updates take upto 50 > milliseconds to occur, however the update performance > degrades over time, such that after a few hours of continuous > updates, each update takes about half a second. Regular > vacuuming improves the performance temporarily, but during > the vacuum operation (which takes upto 2 minutes), > performance of concurrent updates falls below an acceptable > level (sometimes > 2 seconds per update). You must be doing an enormous number of updates! You can vacuum as often as you like, and should usually do so at least as often as the time it takes for 'all' tuples to be updated. So, in your case, every 10 updates. OK, that seems unnecessary, how about every 100 updates? > According to the documentation, PostgreSQL keeps the old > versions of the tuples in case of use by other transactions > (i.e. each update is actually extending the table). I believe > this behaviour is what is causing my performance problem. Yes, it probably is. > Is there a way to disable this behaviour such that an update > operation would overwrite the current record and does not > generate an outdated tuple each time? (My application does > not need transactional support). No, I don't believe there is. If you really don't need transaction support then you might want to reconsider whether postgres is really the right tool. M ---(end of broadcast)--- TIP 8: explain analyze is your friend
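For illustration, a rough sketch of what "vacuum the hot table often" looks like in practice (the table name is invented; pick a frequency to match your update rate):
-- Run frequently (e.g. from cron) against the heavily-updated table only;
-- it marks dead tuples as reusable without locking out concurrent updates.
VACUUM ANALYZE hot_table;
-- During a quiet window, reclaim the accumulated space and shrink the file
-- (this one takes an exclusive lock):
VACUUM FULL ANALYZE hot_table;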
Re: [PERFORM] Table UPDATE is too slow
> >That looks like poor database normalization, really. Are you > sure you > >don't want to split this into multiple tables instead of having 62 > >columns? > > > No, it is properly normalized. The data in this table is stock > fundamentals, stuff like 52 week high, ex-dividend date, etc, etc. Hmm, the two examples you gave there are actually ripe for breaking out into another table. It's not quite 'normalisation', but if you have data that changes very rarely, why not group it into a separate table? You could have the highly volatile data in one table, the semi-volatile stuff in another, and the pretty static stuff in a third. Looked at another way, if you have sets of fields that tend to change together, group them into tables together. That way you will radically reduce the number of indexes that are affected by each update. But as someone else pointed out, you should at the very least wrap your updates in a big transaction. M ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
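A sketch of the sort of split I mean, with invented names - the static fundamentals in one table, the frequently-updated values in another, joined on the ticker symbol, and the updates batched into one transaction:
CREATE TABLE stock_static (
    symbol      varchar(10) PRIMARY KEY,
    ex_dividend date,
    high_52wk   numeric        -- rarely changes, so its indexes are rarely touched
);
CREATE TABLE stock_volatile (
    symbol       varchar(10) PRIMARY KEY REFERENCES stock_static (symbol),
    last_price   numeric,
    last_updated timestamptz   -- the columns that churn all day live here
);
BEGIN;
UPDATE stock_volatile SET last_price = 12.34, last_updated = now() WHERE symbol = 'ABC';
-- ...the rest of the batch...
COMMIT;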
Re: [PERFORM] Caching of Queries
> I've looked at PREPARE, but apparently it only lasts > per-session - that's worthless in our case (web based > service, one connection per data-requiring connection). That's a non-sequitur. Most 'normal' high volume web apps have persistent DB connections, one per http server process. Are you really dropping DB connections and reconnecting each time a new HTTP request comes in? M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
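For reference, this is all PREPARE does, which is why it only pays off on a connection you keep around (statement, table and column names invented):
PREPARE get_user (integer) AS
    SELECT * FROM users WHERE user_id = $1;   -- planned once, here
EXECUTE get_user(42);                          -- reused for the life of this session only
EXECUTE get_user(43);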
Re: [PERFORM] IBM P-series machines
As for "vendor support" for Opteron, that sure looks like a trainwreck... If you're going through IBM, then they won't want to respond to any issues if you're not running a "bog-standard" RHAS/RHES release from Red Hat. And that, on Opteron, is preposterous, because there's plenty of the bits of Opteron support that only ever got put in Linux 2.6, whilst RHAT is still back in the 2.4 days. To be fair, they have backported a boatload of 2.6 features to their kernel: http://www.redhat.com/software/rhel/kernel26/ And that page certainly isn't an exhaustive list... M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Which plattform do you recommend I run PostgreSQL for
In the MySQL manual it says that MySQL performs best with Linux 2.4 with ReiserFS on x86. Can anyone official, or in the know, give similar information regarding PostgreSQL? I'm neither official, nor in the know, but I do have a spare moment! I can tell you that any *NIX variant on any modern hardware platform will give you good performance, except for Cygwin/x86. Any differences between OSes on the same hardware are completely swamped by far more direct concerns like IO systems, database design, OS tuning etc. Picking the OS you're most familiar with is usually a good recommendation (and not just for Postgres). ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Opteron vs RHAT
> >>trainwreck... If you're going through IBM, then they won't want to > >>respond to any issues if you're not running a > "bog-standard" RHAS/RHES > >>release from Red Hat. ...> To be fair, we keep on actually running into things that > _can't_ be backported, like fibrechannel drivers that were > written to take advantage of changes in the SCSI support in 2.6. I thought IBM had good support for SUSE? I don't know why I thought that... ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Select with qualified join condition / Batch inserts
> SELECT cmp.WELL_INDEX, cmp.COMPOUND, con.CONCENTRATION > FROM SCR_WELL_COMPOUND cmp, SCR_WELL_CONCENTRATION con > WHERE cmp.BARCODE=con.BARCODE > AND cmp.WELL_INDEX=con.WELL_INDEX > AND cmp.MAT_ID=con.MAT_ID > AND cmp.MAT_ID = 3 > AND cmp.BARCODE='910125864' > AND cmp.ID_LEVEL = 1; Quick guess - type mismatch forcing sequential scan. Try some quotes: AND cmp.MAT_ID = '3' AND cmp.BARCODE='910125864' AND cmp.ID_LEVEL = '1'; M ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
Basically you set a default in seconds for the HTML results to be cached, and then have triggers set that force the cache to regenerate (whenever CRUD happens to the content, for example). Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a believer out of me! Nice to have it in a library, but if you want to be that simplistic then it's easy in any language. What if a process on server B modifies an important value that server A has cached though? Coherency (albeit that the client may choose to not use it) is a must for a general solution. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
It might be easiest to shove the caching logic into pgpool instead. ... When pg_pool is told to cache a query, it can get a table list and monitor for changes. When it gets changes, simply dumps the cache. It's certainly the case that the typical web app (which, along with warehouses, seems to be one half of the needy apps), could probably do worse than use pooling as well. I'm not well up enough on pooling to know how bulletproof it is though, which is why I included it in my list of things that make me go 'hmm'. It would be really nice not to have to take both things together. More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. Taking an existing production app and justifying a switch to an extra layer of pooling software is relatively hard compared with grabbing data from a view instead of a table (or setting a variable, or adding a tweak to a query, or however else it might be implemented). Eminently doable in pgpool though, and just the right thing for anyone already using it. M ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
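A hand-rolled sketch of the same idea - cache results in a table, invalidate on change - just to show the shape of it (all names invented, 7.4-style single-quoted function body):
CREATE TABLE sales_summary_cache (
    region    text PRIMARY KEY,
    total     numeric,
    cached_at timestamptz DEFAULT now()
);
CREATE OR REPLACE FUNCTION flush_sales_cache() RETURNS trigger AS '
BEGIN
    DELETE FROM sales_summary_cache;   -- any change to the base table empties the cache
    RETURN NULL;
END;' LANGUAGE plpgsql;
CREATE TRIGGER sales_cache_flush
    AFTER INSERT OR UPDATE OR DELETE ON sales
    FOR EACH STATEMENT EXECUTE PROCEDURE flush_sales_cache();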
Re: [PERFORM] Caching of Queries
More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. How does putting it into PGPool make it any less trivial for people to use? The answers are at http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html . Specifically, it's a separate application that needs configuration, the homepage has no real discussion of the potential pitfalls of pooling and what this implementation does to get around them, you get the idea. I'm sure it's great software, but it doesn't come as part of the DB server, so 95% of people who would benefit from query caching being implemented in it never will. If it shipped with and was turned on by default in SUSE or RedHat that would be a different matter. Which I realise makes me look like one of those people who doesn't appreciate code unless it's 'popular', but I hope I'm not *that* bad... Oh OK, I'll say it, this is a perfect example of why My*** has so much more mindshare. It's not better, but it sure makes the average Joe _feel_ better. Sorry, I've got my corporate hat on today, I'm sure I'll feel a little less cynical tomorrow. M
Re: [PERFORM] Caching of Queries
Any competently written application where caching results would be a suitable performance boost can already implement application or middleware caching fairly easily, and increase performance much more than putting result caching into the database would. I guess the performance increase is that you can spend $10,000 on a developer, or $10,000 on hardware, and for the most part get a more reliable result the second way. MemcacheD is fine(ish), but it's not a panacea, and it's more than easy to shoot yourself in the foot with it. Caching is hard enough that lots of people do it badly - I'd rather use an implementation from the PG team than almost anywhere else. I don't see caching results in the database as much of a win for most well written applications. Toy benchmarks, sure, but for real apps it seems it would add a lot of complexity, and violate the whole point of using an ACID database. Well the point surely is to _remove_ complexity from the application, which is written by God Knows Who, and put it in the DB, which is written by God And You. And you can still have ACID (cached data is not the same as stale data, although once you have the former, the latter can begin to look tempting sometimes). M ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
If you're not using a connection pool of some kind then you might as well forget query plan caching, because your connect overhead will swamp the planning cost. This does not mean you have to use something like pgpool (which makes some rather questionable claims IMO); any decent web application language/environment will support connection pooling. Hmm, a question of definition - there's a difference between a pool and a persistent connection. Pretty much all web apps have one connection per process, which is persistent (i.e. not dropped and remade for each request), but not shared between processes, therefore not pooled. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
> OK, that'd work too... the point is if you're re-connecting > all the time it doesn't really matter what else you do for > performance. Yeah, although there is the chap who was asking questions on the list recently who had some very long-running code on his app servers, so was best off closing the connection because he had far too many postmaster processes just sitting there idle all the time! But you're right, it's a killer usually. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
> I don't know what you are exactly referring to in above URL > when you are talking about "potential pitfalls of pooling". > Please explain more. Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that some people aren't necessarily aware of them up front. For instance, pgpool does an 'abort transaction' and a 'reset all' in lieu of a full reconnect (of course, since a full reconnect is exactly what we are trying to avoid). Is this is enough to guarantee that a given pooled connection behaves exactly as a non-pooled connection would from a client perspective? For instance, temporary tables are usually dropped at the end of a session, so a client (badly coded perhaps) that does not already use persistent connections might be confused when the sequence 'connect, create temp table foo ..., disconnect, connect, create temp table foo ...' results in the error 'Relation 'foo' already exists'. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
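The kind of sequence I'm thinking of, which works fine on a throwaway connection but not necessarily on a recycled one (hypothetical table name):
CREATE TEMP TABLE foo (id integer);
-- client 'disconnects', but the pooled backend session stays alive...
-- client 'reconnects' and innocently runs its setup code again:
CREATE TEMP TABLE foo (id integer);
-- fails with something like: relation "foo" already exists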
[PERFORM] Speeding up this function
Hello, I've thought it would be nice to index certain aspects of my apache log files for analysis. I've used several different techniques and have something usable now, but I'd like to tweak it one step further. My first performance optimization was to change the log format into a CSV format. I processed the logfiles with PHP and plsql stored procedures. Unfortunately, it took more than 24 hours to process 1 day's worth of log files. I've now switched to using C# (using mono) to create hash-tables to do almost all of the pre-processing. This has brought the time down to about 3 hours. Actually, if I take out one step it brought the process down to about 6 minutes, which is a tremendous improvement. The one step that is adding 2.5+ hours to the job is not easily done in C#, as far as I know. Once the mostly-normalized data has been put into a table called usage_raw_access I then use this query:
insert into usage_access
select *, usage_normalize_session(accountid, client, atime) as sessionid
from usage_raw_access;
All it does is try to "link" pageviews together into a session. Here's the function:
create or replace function usage_normalize_session (varchar(12), inet, timestamptz)
returns integer as '
DECLARE
    -- $1 = Account ID, $2 = IP Address, $3 = Time
    RecordSet record;
BEGIN
    SELECT INTO RecordSet DISTINCT sessionid
    FROM usage_access ua
    WHERE ua.accountid = $1
      AND ua.client = $2
      AND ua.atime <= ($3 - ''20 min''::interval)::timestamptz;
    if found then
        return RecordSet.sessionid;
    end if;
    return nextval(''usage_session_ids'');
END;' language plpgsql;
And the table usage_access looks like this:
      Table "public.usage_access"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 [snip]
 client    | inet                     |
 atime     | timestamp with time zone |
 accountid | character varying(12)    |
 sessionid | integer                  |
Indexes: usage_acccess_req_url btree (req_url),
         usage_access_accountid btree (accountid),
         usage_access_atime btree (atime),
         usage_access_hostid btree (hostid),
         usage_access_sessionid btree (sessionid),
         usage_access_sessionlookup btree (accountid, client, atime);
As you can see, this looks for clients who have visited the same site within 20 min. If there is no match, a unique sessionid is assigned from a sequence. If there is a visit, the session id assigned to them is used. I'm only able to process about 25 records per second with my setup. My window to do this job is 3-4 hours and the shorter the better. 
Here is an explain analyze of the query I do (note I limited it to 1000):
EXPLAIN ANALYZE insert into usage_access
select *, usage_normalize_session(accountid, client, atime) as sessionid
from usage_raw_access limit 1000;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Subquery Scan "*SELECT*" (cost=0.00..20.00 rows=1000 width=196) (actual time=51.63..47634.22 rows=1000 loops=1)
   -> Limit (cost=0.00..20.00 rows=1000 width=196) (actual time=51.59..47610.23 rows=1000 loops=1)
         -> Seq Scan on usage_raw_access (cost=0.00..20.00 rows=1000 width=196) (actual time=51.58..47606.14 rows=1001 loops=1)
 Total runtime: 48980.54 msec
I also did an explain of the query that's performed inside the function:
EXPLAIN ANALYZE select sessionid from usage_access ua
where ua.accountid = 'XYZ'
  and ua.client = '64.68.88.45'::inet
  and ua.atime <= '2003-11-02 04:50:01-05'::timestamptz;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using usage_access_sessionlookup on usage_access ua (cost=0.00..6.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=1)
   Index Cond: ((accountid = 'XYZ'::character varying) AND (client = '64.68.88.45'::inet) AND (atime <= '2003-11-02 04:50:01-05'::timestamp with time zone))
 Total runtime: 0.35 msec
(3 rows)
What I'd really like to know is if someone knows a way to do any of the following:
a: Make the INSERT into ... SELECT *, usage_access_sessionlookup().. work faster
b: Make the usage_access_sessionlookup() smarter, better, etc.
c: Do this in C# using a hash-table or some other procedure that would be quicker.
d: Find an algorithm to create the sessionid without having to do any database or hash-table lookups. As the dataset gets bigger, it won't fit in RAM and the lookup queries will become I/O bound, drastically slowing things down.
d: is my first choice. For some reason I just can't seem to get my mind around the data. I wonder if there'
Re: [PERFORM] Speeding up this function
On Tue, 19 Oct 2004 15:49:45 -0400, Jeremy Dunn <[EMAIL PROTECTED]> wrote: > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED] On Behalf Of > > Matt Nuzum > > Sent: Tuesday, October 19, 2004 3:35 PM > > To: pgsql-performance > > Subject: [PERFORM] Speeding up this function > > > > > This is probably a stupid question, but why are you trying to create > sessions after the fact? Since it appears that users of your site must > login, why not just assign a sessionID to them at login time, and keep > it in the URL for the duration of the session? Then it would be easy to > track where they've been. > > - Jeremy > > You don't have to log in to visit the sites. These log files are actually for many domains. Right now, we do logging with a web-bug and it does handle the sessions, but it relies on javascript and we want to track a lot more than we are now. Plus, that code is in JavaScript and one of our primary motiviations is to ditch MySQL completely. -- Matthew Nuzum | Makers of "Elite Content Management System" www.followers.net | View samples of Elite CMS in action [EMAIL PROTECTED] | http://www.followers.net/portfolio/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] OS desicion
You are asking the wrong question. The best OS is the OS you (and/or the customer) know and can administer competently. The real performance differences between unices are so small as to be ignorable in this context. The context switching bug is not OS-dependent, but varies in severity across machine architectures (I understand it to be mostly P4/Athlon related, but don't take my word for it). M Tom Fischer wrote: Hi List, I have a Dual-Xeon 3Ghz System with GB RAM and an Adaptec 212ß SCSI RAID with 4 SCA Harddiscs. Our customer wants to have the Machine tuned for best Database performance. Which OS should we used? We are tending between Linux 2.6 or FreeBSD. The Database Size is 5GB and ascending. Most SQL-Queries are Selects, the Tablesizes are beetween 300k and up to 10 MB. I've read the Hardware Performance Guide and the result was to take FreeBSD in the Decision too :) And what is on this Context Switiching Bug i have read in the Archive? Hope you can help me Regards Tom ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] how much mem to give postgres?
How would I turn that off? In the kernel config? Not too familiar with that. I have a 2 proc xeon with 4 gigs of mem on the way for postgres, so I hope HT isn't a problem. If HT is turned off, does it just not use the other "half" of the processor? Or does the processor just work as one unit? You turn it off in the BIOS. There is no 'other half', the processor is just pretending to have two cores by shuffling registers around, which gives maybe a 5-10% performance gain in certain multithreaded situations. A hack to overcome marchitectural limitations due to the overly long pipeline in the Prescott core. Really of most use for desktop interactivity rather than actual throughput. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] OS desicion
The real performance differences between unices are so small as to be ignorable in this context. Well, at least the difference between Linux and BSD. There are substantial tradeoffs should you choose to use Solaris or UnixWare. Yes, quite right, I should have said 'popular x86-based unices'.
Re: [PERFORM] how much mem to give postgres?
Hyperthreading is actually an excellent architectural feature that can give significant performance gains when implemented well and used for an appropriate workload under a decently HT aware OS. IMO, typical RDBMS streams are not an obviously appropriate workload, Intel didn't implement it particularly well and I don't think there are any OSes that support it particularly well. But don't write off using it in the future, when it's been improved at both the OS and the silicon levels. You are quite right of course - unfortunately the current Intel implementation meets nearly none of these criteria! As Rod Taylor pointed out off-list, IBM's SMT implementation on the Power5 is vastly superior. Though he's also just told me that Sun is beating IBM on price/performance for his workload, so who knows how reliable a chap he is... ;-) M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Anything to be gained from a 'Postgres Filesystem'?
I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out. I'm thinking along the lines of an FS that's aware of PG's strategies and requirements and therefore optimised to make those activities as efficient as possible - possibly even being aware of PG's disk layout and treating files differently on that basis. Not being an FS guru I'm not really clear on whether this would help much (enough to be worth it anyway) or not - any thoughts? And if there were useful gains to be had, would it need a whole new FS or could an existing one be modified? So there might be (as I said, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the time, rather than requiring actual writes to disk (so long as the writes eventually happen in order...)?
Matt
Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
> Looking at that list, I got the feeling that you'd want to > push that PG-awareness down into the block-io layer as well, > then, so as to be able to optimise for (perhaps) conflicting > goals depending on what the app does; for the IO system to be > able to read the apps mind it needs to have some knowledge of > what the app is / needs / wants and I get the impression that > this awareness needs to go deeper than the FS only. That's a fair point, it would need to be a kernel patch really, although not necessarily a very big one, more a case of looking at FDs and if they're flagged in some way then get the PGfs to do the job instead of/as well as the normal code path. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
I don't have iostat on that machine, but vmstat shows a lot of writes to the drives, and the runnable processes are more than 1: 6 1 0 3617652 292936 279192800 0 52430 1347 4681 25 19 20 37 Assuming that's the output of 'vmstat 1' and not some other delay, 50MB/second of sustained writes is usually considered 'a lot'. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
and certainly anyone who's been around a computer more than a week or two knows which direction "in" and "out" are customarily seen from. regards, tom lane Apparently not whoever wrote the man page that everyone copied ;-) Interesting. I checked this on several machines. They actually say different things.
Redhat 9      - bi: Blocks sent to a block device (blocks/s).
Latest Cygwin - bi: Blocks sent to a block device (blocks/s).
Redhat 7.x    - bi: Blocks sent to a block device (blocks/s).
Redhat AS3    - bi: blocks sent out to a block device (in blocks/s)
I would say that I probably agree, things should be relative to the cpu. However, it doesn't seem to be something that was universally agreed upon. Or maybe the man-pages were all wrong, and only got updated recently. Looks like the man pages are wrong, for RH7.3 at least. It says bi is 'blocks written', but an actual test like 'dd if=/dev/zero of=/tmp/test bs=1024 count=16384' on an otherwise nearly idle RH7.3 box gives:
   procs                        memory     swap        io     system        cpu
 r  b  w   swpd    free    buff   cache   si  so  bi     bo   in    cs  us sy  id
 0  0  0  75936  474704  230452  953580    0   0   0      0  106  2527   0  0  99
 0  0  0  75936  474704  230452  953580    0   0   0  16512  376  2572   0  2  98
 0  0  0  75936  474704  230452  953580    0   0   0      0  105  2537   0  0 100
Which is in line with bo being 'blocks written'. M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] preloading indexes
The best way to get all the stuff needed by a query into RAM is to run the query. Is it more that you want to 'pin' the data in RAM so it doesn't get overwritten by other queries?
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 03 November 2004 17:31
To: [EMAIL PROTECTED]
Subject: [PERFORM] preloading indexes
I am working with some pretty convoluted queries that work very slowly the first time they’re called but perform fine on the second call. I am fairly certain that these differences are due to the caching. Can someone point me in a direction that would allow me to pre-cache the critical indexes?
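Something like this is the crude warm-up I mean, run once after a restart (table, column and index names are invented; the second statement just coaxes the planner into walking the index so its pages get cached too):
SELECT count(*) FROM big_table;              -- drags the table's heap pages through the cache
SET enable_seqscan = off;
SELECT count(*) FROM big_table
 WHERE created_on > '2004-01-01';            -- likely an index scan on created_on, warming that index
RESET enable_seqscan;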
Re: [PERFORM] Restricting Postgres
> I have a dual processor system that can support over 150 concurrent > connections handling normal traffic and load. Now suppose I setup > Apache to spawn all of it's children instantly, what will ... > This will spawn 150 children in a short order of time and as > this takes "Doctor, it hurts when I do this!" "Well, don't do that then..." Sorry, couldn't resist ;-) Our Apache/PG driven website also needs to be able to deal with occasional large peaks, so what we do is:
StartServers 15        # Don't create too many children initially
MinSpareServers 10     # Always have at least 10 spares lying around
MaxSpareServers 20     # But no more than 20
MaxClients 150         # Up to 150 - the default 256 is too much for our RAM
So on server restart 15 Apache children are created, then one new child every second up to a maximum of 150. Apache's 'ListenBackLog' is around 500 by default, so there's plenty of scope for queuing inbound requests while we wait for sufficient children to be spawned. In addition we (as _every_ high load site should) run Squid as an accelerator, which dramatically increases the number of client connections that can be handled. Across 2 webservers at peak times we've had 50,000 concurrently open http & https client connections to Squid, with 150 Apache children doing the work that squid can't (i.e. all the dynamic stuff), and PG (on a separate box of course) whipping through nearly 800 mixed selects, inserts and updates per second - and then had to restart Apache on one of the servers for a config change... Not a problem :-) One little tip - if you run squid on the same machine as apache, and use a dual-proc box, then because squid is single-threaded it will _never_ take more than half the CPU - nicely self balancing in a way. M ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Restricting Postgres
> Apache::DBI overall works better to what I require, even if > it is not a > pool per sey. Now if pgpool supported variable rate pooling like > Apache does with it's children, it might help to even things > out. That > and you'd still get the spike if you have to start the webserver and > database server at or around the same time. I still don't quite get it though - you shouldn't be getting more than one child per second being launched by Apache, so that's only one PG postmaster per second, which is really a trivial load. That is unless you have 'StartServers' set high, in which case the 'obvious' answer is to lower it. Are you launching multiple DB connections per Apache process as well? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Restricting Postgres
> Case in point: A first time visitor hits your home page. A > dynamic page is generated (in about 1 second) and served > (taking 2 more seconds) which contains links to 20 additional The gain from an accelerator is actually even more that that, as it takes essentially zero seconds for Apache to return the generated content (which in the case of a message board could be quite large) to Squid, which can then feed it slowly to the user, leaving Apache free again to generate another page. When serving dialup users large dynamic pages this can be a _huge_ gain. I think Martin's pages (dimly recalling another thread) take a pretty long time to generate though, so he may not see quite such a significant gain. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Restricting Postgres
> Correct the 75% of all hits are on a script that can take > anywhere from > a few seconds to a half an hour to complete.The script > essentially > auto-flushes to the browser so they get new information as it arrives > creating the illusion of on demand generation. This is more like a streaming data server, which is a very different beast from a webserver, and probably better suited to the job. Usually either multithreaded or single-process using select() (just like Squid). You could probably build one pretty easily. Using a 30MB Apache process to serve one client for half an hour seems like a hell of a waste of RAM. > A squid proxy would probably cause severe problems when > dealing with a > script that does not complete output for a variable rate of time. No, it's fine, squid gives it to the client as it gets it, but can receive from the server faster. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Restricting Postgres
1- You have a query that runs for half an hour and you spoon feed the results to the client ? (argh) 2- Your script looks for new data every few seconds, sends a packet, then sleeps, and loops ? If it's 2 I have a readymade solution for you, just ask. I'm guessing (2) - PG doesn't give the results of a query in a stream. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Restricting Postgres
These are CGI scripts at the lowest level, nothing more and nothing less. While I could probably embed a small webserver directly into the perl scripts and run that as a daemon, it would take away the portability that the scripts currently offer. If they're CGI *scripts* then they just use the CGI environment, not Apache, so a daemon that accepts the inbound connections, then compiles the scripts a-la Apache::Registry, but puts each in a separate thread would be, er, relatively easy for someone better at multithreaded stuff than me. This should be my last question on the matter, does squid report the proper IP address of the client themselves?That's a critical requirement for the scripts. In the X-Forwarded-For header. Not that you can be sure you're seeing the true client IP anyway if they've gone through an ISP proxy beforehand. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Restricting Postgres
In your webpage include an iframe with a Javascript to refresh it every five seconds. The iframe fetches a page from the server which brings in the new data in form of generated JavaScript which writes in the parent window. Thus, you get a very short request every 5 seconds to fetch new data, and it is displayed in the client's window very naturally. ... Yup. If you go the JS route then you can do even better by using JS to load data into JS objects in the background and manipulate the page content directly, no need for even an Iframe. Ignore the dullards who have JS turned off - it's essential for modern web apps, and refusing JS conflicts absolutely with proper semantic markup. http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good starting point. It's clear that this discussion has moved way away from PG! Although in the context of DB backed web apps I guess it remains a bit on-topic... M ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] appropriate indexing
- ITEM table will, grow, grow, grow (sold items are not deleted) WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and (ITEM.KIND=2 or ITEM.KIND=3) Partial index on item.status ? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
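Something along these lines is what I have in mind, assuming those really are the column names in the ITEM table and that status=1 rows are a small fraction of the whole:
-- The query's WHERE clause must include the predicate (status=1) for the planner to use it;
-- indexing on kind lets the kind=2/kind=3 test use the index as well.
CREATE INDEX item_open ON item (kind) WHERE status = 1;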
Re: [PERFORM] Better Hardware, worst Results
[EMAIL PROTECTED] wrote:
Quoting Rod Taylor <[EMAIL PROTECTED]>:
Please send an explain analyze from both.
I'm sendin three explains. In the first the Dell machine didn't use existing indexes, so I turn enable_seqscan off (this is the second explain). The total cost decreased, but the total time not. The third explain refers to the cheaper (and faster) machine. The last thing is the query itself.
 Nested Loop (cost=9008.68..13596.97 rows=1 width=317) (actual time=9272.803..65287.304 rows=2604 loops=1)
 Nested Loop (cost=5155.51..19320.20 rows=1 width=317) (actual time=480.311..62530.121 rows=2604 loops=1)
 Hash Join (cost=2.23..11191.77 rows=9 width=134) (actual time=341.708..21868.167 rows=2604 loops=1)
Well the plan is completely different on the dev machine. Therefore either the PG version or the postgresql.conf is different. No other possible answer. M
Re: [PERFORM] Better Hardware, worst Results
All 3 plans have crappy estimates. Run ANALYZE in production, then send another explain analyze (as an attachment please, to avoid linewrap). Er, no other possible answer except Rod's :-) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Restricting Postgres
Javascript is too powerful to turn for any random web page. It is only essential for web pages because people write their web pages to only work with javascript. Hmm... I respectfully disagree. It is so powerful that it is impossible to ignore when implementing a sophisticated app. And it is not dangerous to the user so long as they have a popup blocker. Commercially, I can ignore the people who turn it off, and I can gain a huge benefit from knowing that 95% of people have it turned on, because it gives my users a hugely better experience than the equivalent XHTML only page (which I deliver, and which works, but which is a fairly depressing experience compared to the JS enabled version). It is _amazing_ how much crud you can take out of a page if you let JS do the dynamic stuff (with CSS still in full control of the styling). Nice, clean, semantically sensible XHTML, that can be transformed for multiple devices - it's great. An example: /previews/foo.wmv But we want it to appear in a popup when viewed in certain devices Easy - Attach an 'onclick' event handler (or just set the target attribute) when the device has a suitable screen & media player, but leave the markup clean for the rest of the world. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Restricting Postgres
A note though : you'll have to turn off HTTP persistent connections in your server (not in your proxy) or you're back to square one.
I hadn't considered that. On the client side it would seem to be up to the client whether to use a persistent connection or not. If it does, then yeah, a request every 5 seconds would still just hold open a server. One more reason to use a proxy I s'pose.
It's clear that this discussion has moved way away from PG! Although in the context of DB backed web apps I guess it remains a bit on-topic...
I find it very on-topic as - it's a way to help this guy solve his "pg problem" which was in fact a design problem - it's the future of database driven web apps (no more reloading the whole page !) I think in the future there will be a good bit of presentation logic in the client...
Not if Bruno has his way ;-) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Restricting Postgres
Pierre-Frédéric Caillaud wrote: check this marvelus piece of 5 minutes of work : http://boutiquenumerique.com/test/iframe_feed.html cela m'a fait le sourire :-) (apologies for bad french) M ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Strange (?) Index behavior?
> For some reason it's a requirement that partial wildcard > searches are done on this field, such as "SELECT ... WHERE > field LIKE 'A%'" > > I thought an interesting way to do this would be to simply > create partial indexes for each letter on that field, and it > works when the query matches the WHERE clause in the index > exactly like above. The problem is thus: I thought PG could use an ordinary index for 'like' conditions with just a terminating '%'? My other thought is that like 'A%' should grab about 1/26th of the table anyway (if the initial character distribution is random), and so a sequential scan might be the best plan anyway... M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] What is the difference between these?
To me, these three queries seem identical... why doesn't the first one (simplest to understand and write) go the same speed as the third one? All I'm trying to do is get statistics for one day (in this case, today) summarized. Table has ~25M rows. I'm using postgres 7.3.? on rh linux 7.3 (note that I think the difference between the first two might just be related to the data being in memory for the second query).
EXPLAIN ANALYZE select count(distinct sessionid) from usage_access where atime > date_trunc('day', now());
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (cost=933439.69..933439.69 rows=1 width=4) (actual time=580350.43..580350.43 rows=1 loops=1)
   -> Seq Scan on usage_access (cost=0.00..912400.11 rows=8415831 width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 580350.65 msec
(4 rows)
EXPLAIN ANALYZE select count(distinct sessionid) from (select * from usage_access where atime > date_trunc('day', now())) as temp;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (cost=933439.69..933439.69 rows=1 width=4) (actual time=348012.85..348012.85 rows=1 loops=1)
   -> Seq Scan on usage_access (cost=0.00..912400.11 rows=8415831 width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 348013.10 msec
(4 rows)
EXPLAIN ANALYZE select count(distinct sessionid) from usage_access where atime between date_trunc('day', now()) and date_trunc('day', now()) + '1 day'::interval;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (cost=89324.98..89324.98 rows=1 width=4) (actual time=27.84..27.84 rows=1 loops=1)
   -> Index Scan using usage_access_atime on usage_access (cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37 rows=2964 loops=1)
         Index Cond: ((atime >= date_trunc('day'::text, now())) AND (atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
 Total runtime: 28.11 msec
(4 rows)
-- Matthew Nuzum | Makers of "Elite Content Management System" www.followers.net | View samples of Elite CMS in action [EMAIL PROTECTED] | http://www.followers.net/portfolio/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Strange (?) Index behavior?
> With that many rows, and a normal index on the field, > postgres figures the best option for say "I%" is not an index > scan, but a sequential scan on the table, with a filter -- > quite obviously this is slow as heck, and yes, I've run > analyze several times and in fact have the vacuum analyze automated. Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an ordinary index OK? If so then... The planner would usually assume (from what Tom usually says) that 1/26 selectivity isn't worth doing an index scan for, but in your case it's wrong (maybe because the rows are very big?) You may be able to get the planner to go for an index scan on "like 'I%'" by tweaking the foo_cost variables in postgresql.conf Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%' ... ", or do that as a stored proc. > With the partial index the index scan is used and the cost > drops from 0..2million to 0..9000 -- a vast improvement. So there are really only 9000 rows out of 76 million starting with 'I'? How about combining some techniques - you could create an index on the first two chars of the field (should be selective enough to give an index scan), select from that, and select the actual data with the like clause. CREATE INDEX idx_firstletters ON table (substr(field, 1, 2)); CREATE INDEX idx_all ON table (field); SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') AS approx WHERE field LIKE 'DE%'; Any good? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Strange (?) Index behavior?
So there are really only 9000 rows out of 76 million starting with 'I'? How about combining some techniques - you could create an index on the first two chars of the field (should be selective enough to give an index scan), select from that, and select the actual data with the like clause.
I was talking about the cost, not the number of rows. About 74,000 rows are returned but the query only takes about 8 seconds to run with the partial index in place.
Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an indexscan is better, and also no surprise that the planner can't know that I is such an uncommon initial char.
CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);
SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') AS approx WHERE field LIKE 'DE%';
That looks like a pretty slick way to create an index, I didn't know there was such a way to do it.. but it appears that this will not work with queries where the WHERE clause wants to find substrings longer than 2 characters.
I don't see why not, it just uses the functional index to grab the 1/(ascii_chars^2) of the rows that are of obvious interest, and then uses the standard index to filter that set.. Where it won't work is where you just want one initial char! Which is why I suggested the silly query rewrite...
Going back to the initial problem -- having only one large, complete index on the table (no partial indexes) the query "SELECT field FROM table WHERE field LIKE 'A%';" does not use the index. The query "SELECT field FROM table WHERE field LIKE 'AB%';" however, does use the single large index if it exists.
If you were planning the query, what would you do? Assuming we're talking about A-Z as possible first chars, and assuming we don't know the distribution of those chars, then we have to assume 1/26 probability of each char, so a seq scan makes sense. Whereas like 'JK%' should only pull 1/500 rows.
Adding the partial index "CREATE INDEX idx_table_substrfield_A ON table (field) WHERE field LIKE 'A%';" causes all queries with substrings of any length to do index scans, provided I issue the query as:
SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
-- or even --
SELECT field FROM table WHERE field LIKE 'A%';
The latter query, without the partial index described, does a sequential scan on the table itself instead of an index scan.
Yes, because (I assume, Tom will no doubt clarify/correct), by creating the partial indices you create a lot more information about the distribution of the first char - either that, or the planner simply always uses an exactly matching partial index if available. I _think_ that creating 26 partial indexes on '?%' is essentially the same thing as creating one functional index on substr(field,1,1), just messier, unless the partial indexes cause the planner to do something special... M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
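For completeness, the functional-index version of the same trick (7.4-or-later expression index; 'mytable' and 'field' are placeholders):
CREATE INDEX idx_first_char ON mytable (substr(field, 1, 1));
SELECT field
  FROM (SELECT field FROM mytable
         WHERE substr(field, 1, 1) = 'I') AS approx   -- uses idx_first_char
 WHERE field LIKE 'I%';                               -- exact filter on the small result set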
Re: [PERFORM] ext3 journalling type
> Am I right to assume that "writeback" is both fastest and at > the same time as safe to use as ordered? Maybe any of you > did some benchmarks? It should be fastest because it has the least overhead, and safe because postgres does its own write-order guaranteeing through fsync(). You should also mount the FS with the 'noatime' option. But for some workloads, there are tests showing that 'data=journal' can be the fastest! This is because although the data is written twice (once to the journal, and then to its real location on disk) in this mode data is written _sequentially_ to the journal, and later written out to its destination, which may be at a quieter time. There's a discussion (based around 7.2) here: http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.txt M ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Speed in V8.0
Another man working to the bitter end this Christmas! There could be many reasons, but maybe first you should look at the amount of RAM available? If the tables fit in RAM on the production server but not on the dev server, then that will easily defeat the improvement due to using the native DB version. Why don't you install cygwin on the dev box and do the comparison using the same hardware? M > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Thomas Wegner > Sent: 21 December 2004 23:03 > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Speed in V8.0 > > > Hello, i have a problem between V7.4.3 Cygwin and > V8.0RC2 W2K. I have 2 systems: > > 1. Production Machine > - Dual P4 3000MHz > - 2 GB RAM > - W2K > - PostgreSQL 7.4.3 under Cygwin > - i connect to it over a DSL Line > 2. Develop Machine > - P4 1800MHz > - 760 MB RAM > - PostgreSQL Native Windows > - local connection 100MB/FD > > Both systems use the default postgresql.conf. Now the > problem. I have an (unoptimized, dynamic) query wich was > execute on the production machine over DSL in 2 seconds and > on my develop machine, connected over local LAN, in 119 seconds! > > Whats this? I can not post the query details here public, its > a commercial project. Any first idea? I execute on both > machine the same query with the same database design! > - > Thomas Wegner > CabrioMeter - The Weather Plugin for Trillian > http://www.wegner24.de/cabriometer > > > > ---(end of > broadcast)--- > TIP 4: Don't 'kill -9' the postmaster > ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Swapping on Solaris
> Kevin Schroeder wrote: > It looks to me like you are using no (device or file) swap at all, and > have 1.3G of real memory free, so could in fact give Postgres more of it :-) > Indeed. If you DO run into trouble after giving Postgres more RAM, use the vmstat command. You can use this command like "vmstat 10". (ignore the first line) Keep an eye on the "pi" and "po" parameters. (kilobytes paged in and out) HTH, Matt -- Matt Casters <[EMAIL PROTECTED]> i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Swapping on Solaris
This page may be of use: http://www.serverworldmagazine.com/monthly/2003/02/solaris.shtml From personal experience, for god's sake don't think Solaris' VM/swap implementation is easy - it's damn good, but it ain't easy! Matt Kevin Schroeder wrote: I think it's probably just reserving them. I can't think of anything else. Also, when I run swap activity with sar I don't see any activity, which also points to reserved swap space, not used swap space. swap -s reports total: 358336k bytes allocated + 181144k reserved = 539480k used, 2988840k available Kevin - Original Message - From: "Alan Stange" <[EMAIL PROTECTED]> To: "Kevin Schroeder" <[EMAIL PROTECTED]> Cc: Sent: Wednesday, January 19, 2005 11:04 AM Subject: Re: [PERFORM] Swapping on Solaris Kevin Schroeder wrote: I may be asking the question the wrong way, but when I start up PostgreSQL swap is what gets used the most of. I've got 1282MB free RAM right now and and 515MB swap in use. Granted, swap file usage probably wouldn't be zero, but I would guess that it should be a lot lower so something must be keeping PostgreSQL from using the free RAM that my system is reporting. For example, one of my postgres processes is 201M in size but on 72M is resident in RAM. That extra 130M is available in RAM, according to top, but postgres isn't using it. The test you're doing doesn't measure what you think you're measuring. First, what else is running on the machine?Note that some shared memory allocations do reserve backing pages in swap, even though the pages aren't currently in use. Perhaps this is what you're measuring? "swap -s" has better numbers than top. You'd be better by trying a reboot then starting pgsql and seeing what memory is used. Just because you start a process and see the swap number increase doesn't mean that the new process is in swap. It means some anonymous pages had to be evicted to swap to make room for the new process or some pages had to be reserved in swap for future use. Typically a new process won't be paged out unless something else is causing enormous memory pressure... -- Alan ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM]
Hi, I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) My questions to the list are: has this sort of thing been attempted before? If so, what were the performance results compared to Oracle? I've been reading up on partitioned tables on pgsql; will the performance benefit be comparable to Oracle partitioned tables? What are the gotchas? Should I be testing on 8 or the 7 version? While I didn't find any documents immediately, are there any fine manuals to read on data warehouse performance tuning on PostgreSQL? Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Best regards, Matt -- Matt Casters <[EMAIL PROTECTED]> i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM]
Thanks Stephen, My main concern is to get as much read performance on the disks as possible on this given system. CPU is rarely a problem on a typical data warehouse system, this one's not any different. We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third one coming along.(around 350Gb) I was kind of hoping that the new PGSQL tablespaces would allow me to create a storage container spanning multiple file-systems, but unfortunately, that seems to be not the case. Is this correct? That tells me that I probably need to do a full reconfiguration of the disks on the Solaris level to get maximum performance out of the system. Mmmm. This is going to be a though one to crack. Perhaps it will be possible to get some extra juice out of placing the indexes on the smaller disks (150G) and the data on the bigger ones? Thanks! Matt -Oorspronkelijk bericht- Van: Stephen Frost [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 15:26 Aan: Matt Casters CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] * Matt Casters ([EMAIL PROTECTED]) wrote: > I have the go ahead of a customer to do some testing on Postgresql in > a couple of weeks as a replacement for Oracle. > The reason for the test is that the number of users of the warehouse > is going to increase and this will have a serious impact on licencing > costs. (I bet that sounds familiar) Rather familiar, yes... :) > We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. > Basically we have 2 large fact tables to deal with: one going for 400M > rows, the other will be hitting 1B rows soon. > (around 250Gb of data) Quite a bit of data. There's one big thing to note here I think- Postgres will not take advantage of multiple CPUs for a given query, Oracle will. So, it depends on your workload as to how that may impact you. Situations where this will be unlikely to affect you: Your main bottle-neck is IO/disk and not CPU. You run multiple queries in parallel frequently. There are other processes on the system which chew up CPU time anyway. Situations where you're likely to be affected would be: You periodically run one big query. You run a set of queries in sequential order. > My questions to the list are: has this sort of thing been attempted > before? If so, what where the performance results compared to Oracle? I'm pretty sure it's been attempted before but unfortunately I don't have any numbers on it myself. My data sets aren't that large (couple million rows) but I've found PostgreSQL at least as fast as Oracle for what we do, and much easier to work with. > I've been reading up on partitioned tabes on pgsql, will the > performance benefit will be comparable to Oracle partitioned tables? In this case I would think so, except that PostgreSQL still won't use multiple CPUs for a given query, even against partitioned tables, aiui. > What are the gotchas? See above? :) Other issues are things having to do w/ your specific SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it, something like select x,y from a,b where x=%y; to do a right-join, iirc). > Should I be testing on 8 or the 7 version? Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x releases before you go production, if you have time before you have to go into production with the new solution (sounds like you do- changing databases takes time anyway). > Thanks in advance for any help you may have, I'll do my best to keep > pgsql-performance up to date on the results. Hope that helps. 
Others on here will correct me if I misspoke. :) Stephen ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
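To illustrate the tablespace point raised above: a tablespace maps to a single directory, so the usual workaround is one tablespace per filesystem and then assigning objects to them, e.g. indexes on the smaller array and the big fact tables on the larger one. A minimal sketch (8.0 syntax; the mount points and object names are made up for illustration):

CREATE TABLESPACE ts_fact  LOCATION '/raid_300gb/pgdata';
CREATE TABLESPACE ts_index LOCATION '/raid_150gb/pgindex';

-- Place a large table on the big array and its index on the smaller one:
CREATE TABLE fact_sales (
    sale_id   bigint,
    sale_date date,
    amount    numeric
) TABLESPACE ts_fact;

CREATE INDEX idx_fact_sales_date ON fact_sales (sale_date) TABLESPACE ts_index;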
Re: [PERFORM]
Joshua, Actually that's a great idea! I'll have to check if Solaris wants to play ball though. We'll have to see as we don't have the new disks yet, ETA is next week. Cheers, Matt -Oorspronkelijk bericht- Van: Joshua D. Drake [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 21:26 Aan: [EMAIL PROTECTED] CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] Matt Casters wrote: > > Thanks Stephen, > > My main concern is to get as much read performance on the disks as > possible on this given system. CPU is rarely a problem on a typical > data warehouse system, this one's not any different. > > We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third > one coming along.(around 350Gb) Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for your database? Push all of your extra drives into the RAID 10. Sincerely, Joshua D. Drake > I was kind of hoping that the new PGSQL tablespaces would allow me to > create a storage container spanning multiple file-systems, but > unfortunately, that seems to be not the case. Is this correct? > > That tells me that I probably need to do a full reconfiguration of the > disks on the Solaris level to get maximum performance out of the system. > Mmmm. This is going to be a though one to crack. Perhaps it will be > possible to get some extra juice out of placing the indexes on the > smaller disks (150G) and the data on the bigger ones? > > Thanks! > > Matt > > -Oorspronkelijk bericht- > Van: Stephen Frost [mailto:[EMAIL PROTECTED] > Verzonden: donderdag 20 januari 2005 15:26 > Aan: Matt Casters > CC: pgsql-performance@postgresql.org > Onderwerp: Re: [PERFORM] > > * Matt Casters ([EMAIL PROTECTED]) wrote: > >>I have the go ahead of a customer to do some testing on Postgresql in >>a couple of weeks as a replacement for Oracle. >>The reason for the test is that the number of users of the warehouse >>is going to increase and this will have a serious impact on licencing >>costs. (I bet that sounds familiar) > > > Rather familiar, yes... :) > > >>We're running a medium sized data warehouse on a Solaris box (4CPU, >>8Gb > > RAM) on Oracle. > >>Basically we have 2 large fact tables to deal with: one going for 400M >>rows, the other will be hitting 1B rows soon. >>(around 250Gb of data) > > > Quite a bit of data. There's one big thing to note here I think- > Postgres will not take advantage of multiple CPUs for a given query, Oracle will. > So, it depends on your workload as to how that may impact you. > Situations where this will be unlikely to affect you: > > Your main bottle-neck is IO/disk and not CPU. > You run multiple queries in parallel frequently. > There are other processes on the system which chew up CPU time anyway. > > Situations where you're likely to be affected would be: > > You periodically run one big query. > You run a set of queries in sequential order. > > >>My questions to the list are: has this sort of thing been attempted >>before? If so, what where the performance results compared to Oracle? > > > I'm pretty sure it's been attempted before but unfortunately I don't > have any numbers on it myself. My data sets aren't that large (couple > million > rows) but I've found PostgreSQL at least as fast as Oracle for what we > do, and much easier to work with. > > >>I've been reading up on partitioned tabes on pgsql, will the >>performance benefit will be comparable to Oracle partitioned tables? 
> > > In this case I would think so, except that PostgreSQL still won't use > multiple CPUs for a given query, even against partitioned tables, aiui. > > >>What are the gotchas? > > > See above? :) Other issues are things having to do w/ your specific > SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is > it, something like select x,y from a,b where x=%y; to do a right-join, iirc). > > >>Should I be testing on 8 or the 7 version? > > > Now that 8.0 is out I'd say probably test with that and just watch for > 8.0.x releases before you go production, if you have time before you > have to go into production with the new solution (sounds like you do- > changing databases takes time anyway). > > >>Thanks in advance for any help you may have, I'll do my best to keep >>pgsql-performance up to date on the results. > > > Hope that helps. Others on here will correct me if I misspoke. :) > > Stephen > > > > ---(e
[PERFORM] DWH on Postgresql
Hi, I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) My questions to the list are: has this sort of thing been attempted before? If so, what were the results? I've been reading up on partitioned tables on pgsql; will the performance benefit be comparable to Oracle partitioned tables? What are the gotchas? Should I be testing on 8 or the 7 version? Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Best regards, Matt ___ Matt Casters i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 OKEGEM, Belgium Tel. 054/25.01.37 GSM 0486/97.29.37
Re: [PERFORM]
> On Thu, Jan 20, 2005 at 11:31:29 -0500, > Alex Turner <[EMAIL PROTECTED]> wrote: >> I am curious - I wasn't aware that postgresql supported partitioned tables, >> Could someone point me to the docs on this. > > Some people have been doing it using a union view. There isn't actually > a partition feature. > > Actually, there is. I found this example on pgsql-performance: >> CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC ); >> ANALYZE super_foo ; >> >> CREATE TABLE sub_foo1 () INHERITS ( super_foo ); >> INSERT INTO sub_foo1 VALUES ( 1, 1 ); >> -- repeat insert until sub_foo1 has 1,000,000 rows >> CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition ); >> ANALYZE sub_foo1 ; >> >> CREATE TABLE sub_foo2 () INHERITS ( super_foo ); >> INSERT INTO sub_foo2 VALUES ( 2, 1 ); >> -- repeat insert until sub_foo2 has 1,000,000 rows >> CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition ); >> ANALYZE sub_foo2 ; >> I think that in certain cases this system even beats Oracle as it stores less information in the table partitions (and in doing so causes less disk IO). BTW, internally, Oracle sees partitions as tables too. Even the "Union all" system that MS SQL Server uses works fine as long as the optimiser supports pruning it correctly. Cheers, Matt -- Matt Casters <[EMAIL PROTECTED]> i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
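As a quick sketch of how the quoted example might be filled and queried (generate_series() stands in for the "repeat insert" comments; this is illustrative only):

INSERT INTO sub_foo1 SELECT 1, s FROM generate_series(1, 1000000) AS s;
INSERT INTO sub_foo2 SELECT 2, s FROM generate_series(1, 1000000) AS s;
ANALYZE sub_foo1;
ANALYZE sub_foo2;

-- Queries go through the parent; the per-child index on "partition"
-- keeps the child that holds no matching rows cheap to skip over:
EXPLAIN SELECT count(*) FROM super_foo WHERE partition = 2 AND bar < 1000;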
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Presumably it can't _ever_ know without being explicitly told, because even for a plain SELECT there might be triggers involved that update tables, or it might be a select of a stored proc, etc. So in the general case, you can't assume that a select doesn't cause an update, and you can't be sure that the table list in an update is a complete list of the tables that might be updated. Tatsuo Ishii wrote: Can I ask a question? Suppose table A gets updated on the master at time 00:00. Until 00:03 pgpool needs to send all queries regarding A to the master only. My question is, how can pgpool know a query is related to A? -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
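A tiny example of why a "read" can still write (all names here are hypothetical; assumes plpgsql is installed in the database):

-- Hypothetical audit table and function with a hidden side effect.
CREATE TABLE audit_log (noted timestamptz DEFAULT now(), note text);

CREATE FUNCTION note_access(p_note text) RETURNS integer AS '
BEGIN
    -- the write hidden behind a SELECT
    INSERT INTO audit_log(note) VALUES (p_note);
    RETURN 1;
END;
' LANGUAGE plpgsql;

-- Looks read-only to anything that merely inspects the SQL text:
SELECT note_access('someone ran a report');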
Re: [PERFORM]
>> > Some people have been doing it using a union view. There isn't actually >> > a partition feature. >> >> Actually, there is. I found this example on pgsql-performance: >> >> >> CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC ); >> >> ANALYZE super_foo ; >> >> >> >> CREATE TABLE sub_foo1 () INHERITS ( super_foo ); > [...] >> >> >> >> CREATE TABLE sub_foo2 () INHERITS ( super_foo ); > [...] >> >> > > Yes, this could be used instead of a view. But there is one thing > missing. You can't just insert into super_foo and acquire the "correct > partition". You will still have to insert into the correct underlying > table. "Real" partitioning will take care of correct partition > selection. This IS bad news. It would mean a serious change in the ETL. I think I can solve the other problems, but I don't know about this one... Regards, Matt ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
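One possible workaround for the ETL problem (a sketch only, using the names from the quoted example): put rewrite rules, or a trigger, on the parent so loads can keep targeting super_foo and the rows get redirected to the matching child.

CREATE RULE super_foo_ins_p1 AS ON INSERT TO super_foo
    WHERE NEW.partition = 1
    DO INSTEAD INSERT INTO sub_foo1 VALUES (NEW.partition, NEW.bar);

CREATE RULE super_foo_ins_p2 AS ON INSERT TO super_foo
    WHERE NEW.partition = 2
    DO INSTEAD INSERT INTO sub_foo2 VALUES (NEW.partition, NEW.bar);

-- Rows for partitions without a matching rule still land in the parent,
-- so one rule (or a trigger branch) is needed per partition.
INSERT INTO super_foo VALUES (2, 42);  -- ends up in sub_foo2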
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Yes, I wasn't really choosing my examples particularly carefully, but I think the conclusion stands: pgpool (or anyone/thing except for the server) cannot in general tell from the SQL it is handed by the client whether an update will occur, nor which tables might be affected. That's not to say that pgpool couldn't make a good guess in the majority of cases! M Joshua D. Drake wrote: Matt Clark wrote: Presumably it can't _ever_ know without being explicitly told, because even for a plain SELECT there might be triggers involved that update tables, or it might be a select of a stored proc, etc. So in the general case, you can't assume that a select doesn't cause an update, and you can't be sure that the table list in an update is a complete list of the tables that might be updated. Uhmmm no :) There is no such thing as a select trigger. The closest you would get is a function that is called via select which could be detected by making sure you are prepending with a BEGIN or START Transaction. Thus yes pgPool can be made to do this. Sincerely, Joshua D. Drake ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Query plan excluding index on view
I'm trying to fine tune this query to return in a reasonable amount of time and am having difficulties getting the query to run the way I'd like. I have a couple of semi-related entities that are stored in individual tables, say, A and B. There is then a view created that pulls together the common fields from these 2 tables. These are then related through an m:m relationship to a classification. Quick definitions of all of this follow: Table: ItemA id <- primary key name description Table: ItemB id <- primary key name description View: Combined SELECT id, name, description from ItemA UNION ALL SELECT id, name, description from ItemB Table: xref id <- primary key item_id <- indexed, points to either ItemA.id or ItemB.id classification_id <- indexed, points to classification.id Table: classification id <- primary key name I'm trying to query from the classification, through the xref, and to the view to get a list of Items (either A or B) that are tied to a specific classification. My query is rather simple, basically as follows: SELECT id, name, description FROM combined c INNER JOIN xref on c.id = xref.item_id WHERE xref.classification_id = 1 This query runs in about 2-3 minutes (I should mention that ItemA has ~18M records and xref has ~26M records - and both will continue to grow). The explain text shows a disregard for the indexes on ItemA and ItemB and a sequential scan is done on both of them. However, if I rewrite this query to join directly to ItemA rather than to the view, it runs in ~50ms because it now uses the proper index. I know it's generally requested to include the EXPLAIN text when submitting a specific question, but I thought perhaps this was generic enough that someone might at least have some suggestions. If required I can certainly work up a simpler example, or I could include my actual explain (though it doesn't exactly match everything defined above as I tried to keep things rather generic). Any links would be nice as well; from all my searching the past few days, most of the performance tuning resources I could find were about tuning the server itself, not really a specific query - at least not one that dealt with this issue. If you've read this far - thank you much!
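For reference, a minimal runnable sketch of the structure described above (column types are guesses; only the shape matters):

CREATE TABLE itema (id integer PRIMARY KEY, name text, description text);
CREATE TABLE itemb (id integer PRIMARY KEY, name text, description text);

CREATE VIEW combined AS
    SELECT id, name, description FROM itema
    UNION ALL
    SELECT id, name, description FROM itemb;

CREATE TABLE classification (id integer PRIMARY KEY, name text);
CREATE TABLE xref (
    id                integer PRIMARY KEY,
    item_id           integer,  -- points to either itema.id or itemb.id
    classification_id integer REFERENCES classification (id)
);
CREATE INDEX idx_xref_item_id ON xref (item_id);
CREATE INDEX idx_xref_classification_id ON xref (classification_id);

-- The problem query: join the UNION ALL view to xref.
SELECT c.id, c.name, c.description
FROM combined c
JOIN xref ON c.id = xref.item_id
WHERE xref.classification_id = 1;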
Re: [PERFORM] Query plan excluding index on view
Sorry for not including this extra bit originally. Below is the explain detail from both the query to the view that takes longer and then the query directly to the single table that performs quickly.

Hash Join  (cost=49082.96..1940745.80 rows=11412 width=76)
  Hash Cond: (outer.?column1? = inner.listing_fid)
  ->  Append  (cost=0.00..1290709.94 rows=18487347 width=252)
        ->  Subquery Scan *SELECT* 1  (cost=0.00..1285922.80 rows=18384890 width=251)
              ->  Seq Scan on company  (cost=0.00..1102073.90 rows=18384890 width=251)
        ->  Subquery Scan *SELECT* 2  (cost=0.00..4787.14 rows=102457 width=252)
              ->  Seq Scan on school  (cost=0.00..3762.57 rows=102457 width=252)
  ->  Hash  (cost=49042.64..49042.64 rows=16130 width=8)
        ->  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64 rows=16130 width=8)
              Recheck Cond: (node_fid = 173204537)
              ->  Bitmap Index Scan on idx_listing_node_xref_node_fid  (cost=0.00..102.45 rows=16130 width=0)
                    Index Cond: (node_fid = 173204537)

Nested Loop  (cost=102.45..98564.97 rows=11349 width=517)
  ->  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64 rows=16130 width=8)
        Recheck Cond: (node_fid = 173204537)
        ->  Bitmap Index Scan on idx_listing_node_xref_node_fid  (cost=0.00..102.45 rows=16130 width=0)
              Index Cond: (node_fid = 173204537)
  ->  Index Scan using idx_pki_company_id on company c  (cost=0.00..3.06 rows=1 width=517)
        Index Cond: (c.id = outer.listing_fid)

On Thu, Apr 3, 2008 at 7:19 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Matt Klinker" <[EMAIL PROTECTED]> writes: > > I knew I'd forget something! I've tried this on both 8.2 and 8.3 with the > > same results. > > Then you're going to have to provide more details ... > >regards, tom lane >
Re: [PERFORM] Query plan excluding index on view
.304.90 rows=16240 width=0) Index Cond: (node_fid = 173204537) -> Index Scan using pk_company_id on company c (cost=0.00..9.67 rows=1 width=424) Index Cond: (c.id = xref.listing_fid) On Thu, Apr 3, 2008 at 11:49 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Matt Klinker" <[EMAIL PROTECTED]> writes: > > Sorry for not including this extra bit originally. Below is the explain > > detail from both the query to the view that takes longer and then the > query > > directly to the single table that performs quickly. > ... > > -> Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 > rows=18384890 > > width=251) > > -> Seq Scan on company (cost=0.00..1102073.90 > rows=18384890 > > The presence of a Subquery Scan node tells me that either this is a much > older PG version than you stated, or there are some interesting details > to the query that you omitted. Please drop the fan-dance routine and > show us a complete reproducible test case. > >regards, tom lane >
Re: [SOLVED] [PERFORM] Query plan excluding index on view
Removing the constants definitely did take care of the issue on 8.3 (still same query plan on 8.1). Thanks for your help in getting this resolved, and sorry again for not including all relevant information on my initial request On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Matt Klinker" <[EMAIL PROTECTED]> writes: > > --Joined View: > > CREATE OR REPLACE VIEW directory_listing AS > > SELECT school.id, school.name, school.description, 119075291 AS > > listing_type_fid > >FROM school > > UNION ALL > > SELECT company.id, company.name, company.description, 119074833 AS > > listing_type_fid > >FROM company; > > Ah, there's the problem :-(. Can you get rid of the constants here? > The planner's currently not smart about UNION ALL subqueries unless > their SELECT lists contain just simple column references. > > (Yes, fixing that is on the todo list, but don't hold your breath... > it'll be 8.4 material at the earliest.) > >regards, tom lane >
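For anyone hitting the same thing, a sketch of the kind of rewrite being described: keep each UNION ALL arm down to simple column references. (The ALTER TABLE lines are just one possible way to keep a listing type available; they are an assumption, not necessarily the fix the original poster used.)

DROP VIEW directory_listing;  -- recreated from scratch since the column list changes
CREATE VIEW directory_listing AS
    SELECT id, name, description FROM school
    UNION ALL
    SELECT id, name, description FROM company;

-- If the type id is still needed, store it as a real column instead of a
-- constant in the view, so the select lists stay simple column references:
-- ALTER TABLE school  ADD COLUMN listing_type_fid integer DEFAULT 119075291;
-- ALTER TABLE company ADD COLUMN listing_type_fid integer DEFAULT 119074833;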
Re: [PERFORM] limit clause breaks query planner?
Hi David, Early in this thread, Pavel suggested: > you should partial index > > create index foo(b) on mytable where a is null; Rather, you might try the opposite partial index (where a is NOT null) as a replacement for the original unqualified index on column A. This new index will be ignored by the query you're trying to tune, but it'll be available to the other queries that filter to a non-null value of column A. (Omitting NULL from that index should be ok because you normally wouldn't want to use an index when 95% of the table's rows match the filtered key.) Then you can temporarily disable Seq Scans in your session for just this one query, as follows: SQL> create table my_table ( a int, b int ) ; CREATE TABLE SQL> create index idx_a_not_null on my_table ( a ) where a is not null ; CREATE INDEX SQL> create index idx_b on my_table ( b ) ; CREATE INDEX SQL> insert into my_table (a, b) select case when random() <= 0.95 then null else i end as a, mod(i, 10) as b from generate_series(1, 1000) s(i) ; INSERT 0 1000 SQL> analyze my_table ; ANALYZE Review the statistics available to the optimizer: SQL> select attname, null_frac, n_distinct, most_common_vals, most_common_freqs, histogram_bounds, correlation from pg_stats where tablename = 'my_table' order by attname ; attname | null_frac | n_distinct | most_common_vals| most_common_freqs | histogram_bounds| correlation -+---++---+--++- a | 0.945 | -1 | | | {2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768} | 1 b | 0 | 10 | {9,4,3,1,2,6,8,5,7,0} | {0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.098,0.096,0.09} | |0.127294 (2 rows) SQL> select relname, reltuples, relpages from pg_class where relname in ('my_table', 'idx_a_not_null', 'idx_b') order by relname ; relname | reltuples | relpages +---+-- idx_a_not_null |499955 | 1100 idx_b | 1e+07 |21946 my_table | 1e+07 |39492 (3 rows) Run the test query, first without disabling Seq Scan to show this example reproduces the plan you're trying to avoid. SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ; QUERY PLAN --- Limit (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 loops=1) -> Seq Scan on my_table (cost=0.00..164492.00 rows=929250 width=8) (actual time=0.061..0.159 rows=15 loops=1) Filter: ((a IS NULL) AND (b = 5)) Total runtime: 0.371 ms (4 rows) Now run the same query without the Seq Scan option. SQL> set enable_seqscan = false ; SET SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ; QUERY PLAN -- Limit (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 loops=1) -> Index Scan using idx_b on my_table (cost=0.00..2869913.63 rows=929250 width=8) (actual time=0.072..0.130 rows=15 loops=1) Index Cond: (b = 5) Filter: (a IS NULL) Total runtime: 0.341 ms (5 rows) SQL> reset enable_seqscan ; RESET Yes, it's unsavory to temporarily adjust a session-level parameter to tune a single query, but I don't know of a less intrusive way to avoid the SeqScan. Here's why I think it might be your simplest option: As far as I can tell, the plan nodes for accessing the table/index are unaware of the LIMIT. The cost of the Limit node is estimated as the cost of its input row-source multiplied by the ratio of requested/returned rows. 
For example, from the preceding plan output: 2869913.63 for "Index Scan" upper cost * (15 row limit / 929250 returned rows) = 46.326 upper cost for the "Limit" node The underlying plan nodes each assume that all the rows matching their filter predicates will be returned up the pipeline; the cost estimate is only reduced at the Limit node. A Seq Scan and an Index Scan (over a complete index) will both expected the same number of input rows (pg_class.reltuples). They also produce the same estimate
Re: [PERFORM] limit clause breaks query planner?
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Matt Smiley" <[EMAIL PROTECTED]> writes: > > So an Index Scan is always going to have a higher cost estimate than > > an equivalent Seq Scan returning the same result rows (unless > > random_page_cost is < 1). That's why I think the planner is always > > preferring the plan that uses a Seq Scan. > > If that were the case, we'd never choose an indexscan at all... You're right, that was a silly guess. > It's true that a plain indexscan is not preferred for queries that will > return a large fraction of the table. However, it should be willing to > use a bitmap scan for this query, given default cost settings (the > default cost settings will cause it to prefer bitmap scan for retrieving > up to about a third of the table, in my experience). I too am confused > about why it doesn't prefer that choice in the OP's example. It looks like the bitmap scan has a higher cost estimate because the entire bitmap index must be built before beginning the heap scan and returning rows up the pipeline. The row-count limit can't be pushed lower than the bitmap-heap-scan like it can for the basic index-scan. test_8_3_3=# set enable_seqscan = false ; SET test_8_3_3=# set enable_indexscan = false ; SET test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN Limit (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 rows=15 loops=1) -> Bitmap Heap Scan on my_table (cost=17070.22..69478.96 rows=988217 width=8) (actual time=606.892..606.983 rows=15 loops=1) Recheck Cond: (b = 3) Filter: (a IS NULL) -> Bitmap Index Scan on idx_b (cost=0.00..16823.17 rows=109 width=0) (actual time=592.657..592.657 rows=100 loops=1) Index Cond: (b = 3) Total runtime: 607.340 ms (7 rows) > It would be interesting to alter the random_page_cost setting and see if he > gets > different results. Using an unmodified postgresql.conf, the cost estimate for an index-scan were so much higher than for a seqscan that random_page_cost had to be set below 0.2 before the index-scan was preferred. However, it looks like this was mainly because effective_cache_size was too small. The planner thought the cache was only 128 MB, and the size of the complete table+index was 39492 + 21946 pages * 8 KB/block = 330 MB. It makes sense for the cost estimate to be so much higher if blocks are expected to be repeatedly re-fetched from disk. I wonder if David's effective_cache_size is too small. 
test_8_3_3=# reset all ; RESET test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN --- Limit (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 loops=1) -> Seq Scan on my_table (cost=0.00..164492.74 rows=988217 width=8) (actual time=0.028..0.138 rows=15 loops=1) Filter: ((a IS NULL) AND (b = 3)) Total runtime: 0.338 ms (4 rows) test_8_3_3=# set enable_seqscan = false ; SET test_8_3_3=# show random_page_cost ; random_page_cost -- 4 (1 row) test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN -- Limit (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 loops=1) -> Index Scan using idx_b on my_table (cost=0.00..3029924.36 rows=988217 width=8) (actual time=0.043..0.100 rows=15 loops=1) Index Cond: (b = 3) Filter: (a IS NULL) Total runtime: 0.308 ms (5 rows) test_8_3_3=# set random_page_cost = 0.19 ; SET test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN - Limit (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 loops=1) -> Index Scan using idx_b on my_table (cost=0.00..161190.65 rows=988217 width=8) (actual time=0.042..0.097 rows=15 loops=1) Index Cond: (b = 3) Filter: (a IS NULL) Total runtime: 0.307 ms (5 rows) Now fix effective_cache_size and try again. test_8_3_3=# reset all ; RESET test_8_3_3=# set effect
Re: [PERFORM] limit clause breaks query planner?
"Tom Lane" <[EMAIL PROTECTED]> writes: > I'm not sure offhand whether the existing correlation stats would be of use > for > it, or whether we'd have to get ANALYZE to gather additional data. Please forgive the tangent, but would it be practical to add support for gathering statistics on an arbitrary expression associated with a table, rather than just on materialized columns? For example: analyze my_tab for expression 'my_func(my_tab.col)' ; It seems like any time you'd consider using a functional index, this feature would let the planner calculate decent selectivity estimates for the expression's otherwise opaque data distribution. The expression might be treated as a virtual column on the table; not sure if that helps or hurts. Should I post this question on pgsql-hackers? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] inaccurate stats on large tables
Hi Kiran, You gave great info on your problem. First, is this the query you're actually trying to speed up, or is it a simplified version? It looks like the optimizer has already chosen the best execution plan for the given query. Since the query has no joins, we only have to consider access paths. You're fetching 58221/37909009 = 0.15% of the rows, so a sequential scan is clearly inappropriate. A basic index scan is likely to incur extra scattered I/O, so a bitmap index scan is favored. To improve on this query's runtime, you could try any of the following: - Reorganize the data to reduce this query's scattered I/O (i.e. cluster on "paliasorigin_search3_idx" rather than "paliasorigin_alias_casefold_idx"). Bear in mind, this may adversely affect other queries. - Increase the cache hit frequency by ensuring the underlying filesystem cache has plenty of RAM (usually so under Linux) and checking that other concurrent queries aren't polluting the cache. Consider adding RAM if you think the working set of blocks required by most queries is larger than the combined Postgres and filesystem caches. If other processes than the db do I/O on this machine, consider them as resource consumers, too. - Restructure the table, partitioning along a column that would be useful for pruning whole partitions for your painful queries. In this case, origin_id or tax_id seems like a good bet, but again, consider other queries against this table. 38 million rows probably makes your table around 2 GB (guessing about 55 bytes/row). Depending on the size and growth rate of the table, it may be time to consider partitioning. Out of curiosity, what runtime are you typically seeing from this query? The explain-analyze ran in 113 ms, which I'm guessing is the effect of caching, not the runtime you're trying to improve. - Rebuild the indexes on this table. Under certain use conditions, btree indexes can get horribly bloated. Rebuilding the indexes returns them to their most compact and balanced form. For example: reindex index "paliasorigin_search3_idx"; Apart from the locking and CPU usage during the rebuild, this has no negative consequences, so I'd try this before something drastic like partitioning. First review the current size of the index for comparison: select pg_size_pretty(pg_relation_size('paliasorigin_search3_idx')); Since you asked specifically about improving the row-count estimate, like the previous responder said, you should consider increasing the statistics target. This will help if individual columns are being underestimated, but not if the overestimate is due to joint variation. In other words, the optimizer has no way to tell if there is there a logical relationship between columns A and B such that certain values in B only occur with certain values of A. Just judging from the names, it sounds like origin_id and tax_id might have a parent-child relationship, so I thought it was worth mentioning. Do the columns individually have good estimates? explain analyze select * from paliasorigin where origin_id=20; explain analyze select * from paliasorigin where tax_id=9606; If not, increase the statistics on that column, reanalyze the table, and recheck the selectivity estimate: alter table paliasorigin alter column origin_id set statistics 20; analyze paliasorigin; explain analyze select * from paliasorigin where origin_id=20; Good luck! Matt -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
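For the clustering suggestion above, a short sketch of the reorganisation step (which form applies depends on the server version):

-- On 8.3 and later:
CLUSTER paliasorigin USING paliasorigin_search3_idx;
-- (on older releases the equivalent is: CLUSTER paliasorigin_search3_idx ON paliasorigin;)

-- CLUSTER takes an exclusive lock and rewrites the table, so run it in a
-- maintenance window, then refresh the statistics:
ANALYZE paliasorigin;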
Re: [PERFORM] too many clog files
Hi Duan, As others have said, you should probably attempt to run pg_dump to export your database. If that doesn't work, consider restoring from backup. If the dump does work, you can create a clean PGDATA directory (using initdb like when you setup your original installation), and create a fresh copy of your database using the dump file. Then abandon your potentially damaged PGDATA directory. For future reference: - The "autovacuum" parameter in postgresql.conf is off by default under Postgres 8.1. You should probably turn it on to ensure regular vacuuming, unless you have your own cronjob to do the vacuuming. - About finding old transactions, there are 2 places you have to look for old transactions. The usual place is in pg_stat_activity. The 2nd place is "pg_prepared_xacts", where prepared transactions are listed. If there's a prepared transaction in your system, it might explain why your old commit-logs aren't being purged. The following query shows both prepared and normal transactions: select l.transactionid, age(l.transactionid) as age, /* measured in number of other transactions elapsed, not in terms of time */ l.pid, case when l.pid is null then false else true end as is_prepared, a.backend_start, p.prepared as time_xact_was_prepared, p.gid as prepared_name from pg_locks l left outer join pg_stat_activity a on l.pid = a.procpid left outer join pg_prepared_xacts p on l.transactionid = p.transaction where l.locktype = 'transactionid' and l.mode = 'ExclusiveLock' and l.granted order by age(l.transactionid) desc ; transactionid | age | pid | is_prepared | backend_start | time_xact_was_prepared | prepared_name ---+-+--+-+---+---+-- 316645 | 44 | | f | | 2008-09-09 00:31:46.724178-07 | my_prepared_transaction1 316689 | 0 | 6093 | t | 2008-09-09 00:40:10.928287-07 | | (2 rows) Note that unless you run this query as a superuser (e.g. "postgres"), the columns from pg_stat_activity will only be visible for sessions that belong to you. To rollback this example prepared transaction, you'd type: ROLLBACK PREPARED 'my_prepared_transaction1'; Hope this helps! Matt -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
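A quick way to double-check the autovacuum point above from SQL rather than by reading postgresql.conf (the stats_* settings matter because 8.1's autovacuum needs row-level statistics collection turned on):

SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum', 'stats_start_collector', 'stats_row_level');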
Re: [PERFORM] too many clog files
Alvaro Herrera wrote: > Move the old clog files back where they were, and run VACUUM FREEZE in > all your databases. That should clean up all the old pg_clog files, if > you're really that desperate. Has anyone actually seen a CLOG file get removed under 8.2 or 8.3? How about 8.1? I'm probably missing something, but looking at src/backend/commands/vacuum.c (under 8.2.9 and 8.3.3), it seems like vac_truncate_clog() scans through *all* tuples of pg_database looking for the oldest datfrozenxid. Won't that always be template0, which as far as I know can never be vacuumed (or otherwise connected to)? postgres=# select datname, datfrozenxid, age(datfrozenxid), datallowconn from pg_database order by age(datfrozenxid), datname ; datname | datfrozenxid | age| datallowconn --+--+--+-- template1| 36347792 | 3859 | t postgres | 36347733 | 3918 | t mss_test | 36347436 | 4215 | t template0| 526 | 36351125 | f (4 rows) I looked at several of my 8.2 databases' pg_clog directories, and they all have all the sequentially numbered segments ( through current segment). Would it be reasonable for vac_truncate_clog() to skip databases where datallowconn is false (i.e. template0)? Looking back to the 8.1.13 code, it does exactly that: if (!dbform->datallowconn) continue; Also, Duan, if you have lots of files under pg_clog, you may be burning through transactions faster than necessary. Do your applications leave autocommit turned on? And since no one else mentioned it, as a work-around for a small filesystem you can potentially shutdown your database, move the pg_clog directory to a separate filesystem, and create a symlink to it under your PGDATA directory. That's not a solution, just a mitigation. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Arjen van der Meijden wrote: > Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're > not identical in layout etc), so it would be a bit weird if they > performed much less than the similar LSI's wouldn't you think? I've recently had to replace a PERC4/DC with the exact same card made by LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10 actually appears to be concatenated RAID1 arrays. Since replacing it and rebuilding the array on the LSI card, performance has been considerably better (14 disk SCSI shelf) > Areca may be the fastest around right now, but if you'd like to get it > all from one supplier, its not too bad to be stuck with Dell's perc 5 or > 6 series. The PERC6 isn't too bad, however it grinds to a halt when the IO queue gets large and it has the serious limitation of not supporting more than 8 spans, so trying to build a RAID10 array greater than 16 disks is pointless if you're not just after the extra capacity. Are there any reasonable choices for bigger (3+ shelf) direct-connected RAID10 arrays, or are hideously expensive SANs the only option? I've checked out the latest Areca controllers, but the manual available on their website states there's a limitation of 32 disks in an array... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Glyn Astill wrote: > Did you try flashing the PERC with the LSI firmware? > > I tried flashing a PERC3/dc with LSI firmware, it worked fine but I > saw no difference in performance so I assumed it must be something > else on the board that cripples it. No, for a few reasons: 1. I read somewhere on the interwebs that doing so would brick the card 2. I don't have access to a DOS/Windows machine 3. Dodgy hardware isn't what you want when dealing with large databases If it's not just a firmware issue it wouldn't surprise me if you could just link a couple of pins/contacts/etc on the card and gain the LSI's capabilities, but it's not an idea I'd entertain outside of personal use... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Rajesh Kumar Mallah wrote: >> I've checked out the latest Areca controllers, but the manual >> available on their website states there's a limitation of 32 disks >> in an array... > > Where exactly is there limitation of 32 drives. the datasheet of > 1680 states support upto 128drives using enclosures. The 1680 manual: http://www.areca.us//support/download/RaidCards/Documents/Manual_Spec/SAS_Manual.zip Page 25: > Note: > > 1. The maximum no. is 32 disk drived included in a single RAID set Page 49: > 1. Up to 32 disk drives can be included in a single RAID set. > 2. Up to 8 RAID sets can be created per controller (point 2 meaning you can't do s/w RAID over umpteen h/w RAID1 pairs) Page 50: > To create RAID 30/50/60 volume, you need create multiple RAID sets > first with the same disk members on each RAID set. The max no. disk > drives per volume set: 32 for RAID 0/1/10/3/5/6 and 128 for RAID > 30/50/60. ...and a few more times saying the same thing -- -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Scott Carey wrote: > You probably don’t want a single array with more than 32 drives anyway, > its almost always better to start carving out chunks and using software > raid 0 or 1 on top of that for various reasons. I wouldn’t put more than > 16 drives in one array on any of these RAID cards, they’re just not > optimized for really big arrays and tend to fade between 6 to 16 in one > array, depending on the quality. This is what I'm looking at now. The server I'm working on at the moment currently has a PERC6/e and 3xMD1000s which needs to be tested in a few setups. I need to code a benchmarker yet (I haven't found one yet that can come close to replicating our DB usage patterns), but I intend to try: 1. 3x h/w RAID10 (one per shelf), sofware RAID0 2. lots x h/w RAID1, software RAID0 if the PERC will let me create enough arrays 3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks 4. 2x h/w RAID30, software RAID0 I'm not holding much hope out for the last one :) I'm just glad work on a rewrite of my inherited backend systems should start soon; get rid of the multi-TB MySQL hell and move to a distributed PG setup on dirt cheap Dell R200s/blades > You can do direct-attached storage to 100+ drives or more if you want. > The price and manageability cost go up a lot if it gets too big > however. Having global hot spare drives is critical. Not that the cost > of using SAN’s and such is low... SAS expanders have made DAS with > large arrays very accessible though. For large storage arrays (RAID60 or similar) you can't beat a RAID controller and disk shelf(s), especially if you keep the raidsets small and use cheap ludicrous capacity SATA disks You just need to be aware that performance doesn't scale well/easily over 1-2 shelves on the things -- -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Glyn Astill wrote: >> Stupid question, but why do people bother with the Perc line of >> cards if the LSI brand is better? It seems the headache of trying >> to get the Perc cards to perform is not worth any money saved. > > I think in most cases the dell cards actually cost more, people end > up stuck with them because they come bundled with their servers - > they find out too late that they've got a lemon. That's what's been happening with me... The fact Dell prices can have a fair bit of downward movement when you get the account manager on the phone makes them especially attractive to the people controlling the purse strings. The biggest reason for me however is the lack of comparative reviews. I struggled to get the LSI card to replace the PERC3 because all I had to go on was qualitative mailing list/forum posts from strangers. The only way I got it was to make the argument that other than trying the LSI, we'd have no choice other than replacing the server+shelf+disks. I want to see just how much better a high-end Areca/Adaptec controller is, but I just don't think I can get approval for a £1000 card "because some guy on the internet said the PERC sucks". Would that same person say it sucked if it came in Areca packaging? Am I listening to the advice of a professional, or a fanboy? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow-ish Query Needs Some Love
On Feb 2, 6:06 am, Edgardo Portal wrote: > On 2010-02-02, Matt White wrote: > > > I have a relatively straightforward query that by itself isn't that > > slow, but we have to run it up to 40 times on one webpage load, so it > > needs to run much faster than it does. Here it is: > > > SELECT COUNT(*) FROM users, user_groups > > WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND > > user_groups.partner_id IN > > (partner_id_1, partner_id_2); > > > The structure is partners have user groups which have users. In the > > test data there are over 200,000 user groups and users but only ~3000 > > partners. Anyone have any bright ideas on how to speed this query up? > > Can you avoid running it 40 times, maybe by restructuring the > query (or making a view) along the lines of the following and > adding some logic to your page? > > SELECT p.partner_id, ug.user_group_id, u.id, count(*) > FROM partners p > LEFT JOIN user_groups ug > ON ug.partner_id=p.partner_id > LEFT JOIN users u > ON u.user_group_id=ug.id > WHERE NOT u.deleted > GROUP BY 1,2,3 > ; Thanks for the suggestion. The view didn't seem to speed things up. Perhaps we can reduce the number of times it's called, we'll see. Any additional ideas would be helpful. Thanks. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
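One further option, sketched under the same table and column names as the query above: fetch the counts for all of the page's partners in a single query and let the application pick out each number, so the statement runs once instead of 40 times.

-- One round trip for the whole page instead of 40 COUNT(*) queries.
SELECT ug.partner_id, count(*) AS active_users
FROM user_groups ug
JOIN users u ON u.user_group_id = ug.id
WHERE NOT u.deleted
  AND ug.partner_id IN (1, 2, 3)  -- the page's partner ids go here
GROUP BY ug.partner_id;

Partners with no matching users simply won't appear in the result, so the page should treat a missing id as a count of zero.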