[PERFORM] SSD options, small database, ZFS
Hello all,

I've spent some time looking through previous posts regarding postgres and SSD drives and have also been reading up on the subject of SSDs in general elsewhere.

Some quick background: We're currently looking at changing our basic database setup as we migrate away from some rather old (circa 2006 or so) hardware to more current gear. Our previous setup basically consisted of two types of 1U boxes - dual-xeon with a low-end Adaptec RAID card paired with two U320 or U160 10K SCSI drives for database and light web frontend and mail submission duties, and single P4 and dual PIII boxes with commodity drives handling mail deliveries.

Most of the delivery servers have been migrated to current hardware (single and dual quad-core xeons, 4 WD RE3 SATA drives, FreeBSD w/ZFS), and we've also moved a handful of the db/web services to one of these servers just to see what a large increase in RAM and CPU can do for us. As we sort of expected, the performance overall was better, but there's a serious disk IO wall we hit when running nightly jobs that are write-heavy and long-running.

There are currently four db/web servers that have a combined total of 133GB of on-disk postgres data. Given the age of the servers, the fact that each only has two drives on rather anemic RAID controllers with no cache/BBU, and none has more than 2GB RAM, I think it's a safe bet that consolidating this down to two modern servers can give us better performance, allow for growth over the next few years, and let us split db duties from web/mail submission. One live db server, one standby.

And this is where SSDs come in. We're not looking at terabytes of data here, and I don't see us growing much beyond our current database size in the next few years. SSDs are getting cheap enough that this feels like a good match - we can afford CPU and RAM, we can't afford some massive 24 drive monster and a pile of SAS drives. The current db boxes top out at 300 IOPS, the SATA boxes maybe about half that.
If I can achieve 300x4 IOPS (or better, preferably...) on 2 or 4 SSDs, I'm pretty much sold. From what I've been reading here, this sounds quite possible. I understand the Intel 320 series are the best "bargain" option since they can survive an unexpected shutdown, so I'm not going to go looking beyond that - OCZ makes me a bit nervous, and finding a clear list of which drives have the "supercapacitor" has been difficult.

I have no qualms about buying enough SSDs to mirror them all. I am aware I need to have automated alerts for various SMART stats so I know when danger is imminent. I know I need to have this box replicated even with mirrors and monitoring up the wazoo.

Here are my remaining questions:

-I'm calling our combined databases at 133GB "small", fair assumption?

-Is there any chance that a server with dual quad core xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower than the 4 old servers described above? I'm beating those on raw cpu, quadrupling the amount of RAM (and consolidating said RAM), and going from disks that top out at 4x300 IOPS to SSDs that conservatively should provide 2000 IOPS.

-We're also finally automating more stuff and trying to standardize server configs. One tough decision we made that has paid off quite well was to move to ZFS. We find the features helpful to admin tasks outweigh the drawbacks, and RAM is cheap enough that we can deal with its tendency to eat RAM. Is ZFS + Postgres + SSDs a bad combo?

-Should I even be looking at the option of ZFS on SATA or low-end SAS drives with ZIL and L2ARC on SSDs? Initially this intrigued me, but I can't quite get my head around how the SSD-based ZIL can deal with flushing the metadata out when the whole system is under any sort of extreme write-heavy load - I mean if the ZIL is absorbing 2000 IOPS of metadata writes, at some point it has to get full as it's trying to flush this data to much slower spinning drives.
-Should my standby box be the same configuration, or should I look at actual spinning disks on that? How rough is replication on the underlying storage? Would the total data written on the slave be less than or equal to the master?

Any input is appreciated. I did really mean for this to be a much shorter post...

Thanks,

Charles

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
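On the automated SMART alerting mentioned above, here is a minimal sketch of what a nightly check could look like, assuming smartmontools is installed. The device nodes, alert address, and threshold of 20 are made-up examples; on Intel drives, normalized attribute 233 (Media_Wearout_Indicator) counts down from 100 toward 1 as the NAND wears out.

```shell
#!/bin/sh
# Hypothetical nightly SSD wearout check for Intel 320-class drives.
THRESHOLD=20
ALERT_TO="ops@example.com"   # placeholder address

check_wearout() {
    # $1 = device name, $2 = current normalized value of attribute 233
    if [ "$2" -le "$THRESHOLD" ]; then
        echo "WARNING: $1 wearout indicator at $2 (threshold $THRESHOLD)"
        return 1
    fi
    return 0
}

for dev in /dev/ada2 /dev/ada3; do   # placeholder SSD device nodes
    # Column 4 of "smartctl -A" is the normalized VALUE for the attribute.
    val=$(smartctl -A "$dev" | awk '$1 == 233 { print $4 }')
    check_wearout "$dev" "$val" || \
        echo "SSD wearout warning for $dev" | mail -s "SSD alert" "$ALERT_TO"
done
```

The same loop is an obvious place to also watch reallocated sectors (attribute 5) and the SMART overall-health result.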
Re: [PERFORM] SSD options, small database, ZFS
Resurrecting this long-dormant thread...

On Oct 14, 2011, at 6:41 AM, Arjen van der Meijden wrote:

> On 14-10-2011 10:23, CSS wrote:
>> -I'm calling our combined databases at 133GB "small", fair
>> assumption? -Is there any chance that a server with dual quad core
>> xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
>> than the 4 old servers described above? I'm beating those on raw
>> cpu, quadrupling the amount of RAM (and consolidating said RAM),
>> and going from disks that top out at 4x300 IOPS with SSDs that
>> conservatively should provide 2000 IOPS.
>
> Whether 133GB is small or not probably mostly depends on how much of it is
> actually touched during use. But I'd agree that it isn't a terribly large
> database; I'd guess a few simple SSDs would be plenty to achieve 2000 IOPS.
> For linear writes, they're still not really faster than normal disks, but if
> that's combined with random access (either read or write) you ought to be ok.
> We went from 15x 15k SAS disks to 6x SSD several years back in our MySQL box,
> but since we also increased the RAM from 16GB to 72GB, the IO load dropped so
> much the SSDs are normally only lightly loaded...

Thanks for your input on this. It's taken some time, but I do finally have some hardware on hand (http://imgur.com/LEC5I) and as more trickles in over the coming days, I'll be putting together our first SSD-based postgres box. I have much testing to do, and I'm going to have some questions regarding that subject in another thread.

> Btw, the 5500 and 5600 Xeons are normally more efficient with a multiple of 6
> ram-modules, so you may want to have a look at 24GB (6x4), 36GB (6x4+6x2) or
> 48GB (12x4 or 6x8) RAM.

Thanks - I really had a hard time wrapping my head around the rules on populating the banks. If I understand it correctly, this is due to the memory controller moving from the northbridge to being integrated in the CPU.
> Given the historical questions on the list, there is always a risk of getting
> slower queries with hardware that should be much faster. For instance, the
> huge increase in RAM may trigger a less efficient query plan. Or the disks
> abide by the flush policies more correctly.
> Assuming the queries are still getting good plans and there are no such
> special differences, I'd agree with the assumption that it's a win on every
> count.
> Or your update to a newer OS and PostgreSQL may trigger some worse query plan
> or hardware usage.

That's an interesting point, I'd not even considered that. Is there any sort of simple documentation on the query planner that might cover how things like increased RAM could impact how a query is executed?

>> -Should I even be looking at the option of ZFS on SATA or low-end
>> SAS drives and ZIL and L2ARC on SSDs? Initially this intrigued me,
>> but I can't quite get my head around how the SSD-based ZIL can deal
>> with flushing the metadata out when the whole system is under any
>> sort of extreme write-heavy load - I mean if the ZIL is absorbing
>> 2000 IOPS of metadata writes, at some point it has to get full as
>> it's trying to flush this data to much slower spinning drives.
>
> A fail-safe set-up with SSDs in ZFS assumes at least 3 in total, i.e. a pair
> of SSDs for ZIL and as many as you want for L2ARC. Given your database size,
> 4x160GB SSD (in "raid10") or 2x300GB should yield plenty of space. So given
> the same choice, I wouldn't bother with a set of large capacity SATA disks
> and ZIL/L2ARC SSDs; I'd just go with 4x160GB or 2x300GB SSDs.

Well, I've bought 4x160GB, so that's what I'll use. I will still do some tests with two SATA drives plus ZIL, just to see what happens.

>
>> -Should my standby box be the same configuration or should I look
>> at actual spinning disks on that? How rough is replication on the
>> underlying storage? Would the total data written on the slave be
>> less or equal to the master?
>
> How bad is it for you if the performance of your database potentially drops a
> fair bit when your slave becomes the master? If you have a read-mostly
> database, you may not even need SSDs in your master db (given your amount of
> RAM). But honestly, I don't know the answer to this question :)

It's complicated - during the day we're mostly looking at very scattered reads and writes, probably a bit biased towards writes. But each evening we kick off a number of jobs to pre-generate stats for more complex queries... If the job could still complete in 6-8 hours, we'd probably be OK, but if it starts clogging up our normal que
[PERFORM] Benchmarking tools, methods
Hello,

I'm going to be testing some new hardware (see http://archives.postgresql.org/pgsql-performance/2011-11/msg00230.php) and while I've done some very rudimentary before/after tests with pgbench, I'm looking to pull more info than I have in the past, and I'd really like to automate things further.

I'll be starting with basic disk benchmarks (bonnie++ and iozone) and then moving on to pgbench. I'm running FreeBSD and I'm interested in getting some baseline info on UFS2 single disk (SATA 7200/WD RE4), gmirror, zfs mirror, zfs raidz1, and a zfs set of two mirrors (i.e. two mirrored vdevs, striped). Then I'm repeating that with the 4 Intel 320 SSDs, and just to satisfy my curiosity, a zfs mirror with two of the SSDs mirrored as the ZIL.

Once that's narrowed down to a few practical choices, I'm moving on to pgbench. I've found some good info here regarding pgbench that is unfortunately a bit dated: http://www.westnet.com/~gsmith/content/postgresql/

A few questions:

-Any favorite automation or graphing tools beyond what's on Greg's site?
-Any detailed information on creating "custom" pgbench tests?
-Any other postgres benchmarking tools?

I'm also curious about benchmarking using my own data. I tried something long ago that at least gave the illusion of working, but didn't seem quite right to me. I enabled basic query logging on one of our busier servers, dumped the db, and let it run for 24 hours. That gave me the normal random data from users throughout the day as well as our batch jobs that run overnight. I had to grep out and reformat the actual queries from the logfile, but that was not difficult. I then loaded the dump into the test server and basically fed the saved queries into it and timed the result. I also hacked together a script to sample cpu and disk stats every 2s and had that feeding into an rrd database so I could see how "busy" things were.

In theory, this sounded good (to me), but I'm not sure I trust the results. Any suggestions on the general concept?
Is it sound? Is there a better way to do it? I really like the idea of using (our) real data.

Lastly, any general suggestions on tools to collect system data during tests and graph it are more than welcome. I can homebrew, but I'm sure I'd be reinventing the wheel.

Oh, and if anyone wants any tests run that would not take an insane amount of time and would be valuable to those on this list, please let me know. Since SSDs have been a hot topic lately and not everyone has 4 SSDs lying around, I'd like to sort of focus on anything that would shed some light on the whole SSD craze. The box under test ultimately will have 32GB RAM, 2 quad-core 2.13GHz Xeon 5506 CPUs and 4 Intel 320 160GB SSDs. I'm recycling some older boxes as well, so I have much more RAM on hand until those are finished.

Thanks,

Charles

ps - considering the new PostgreSQL Performance book that Packt has, any strong feelings about that one way or the other? Does it go very far beyond what's on the wiki?
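On the "custom pgbench tests" question above: pgbench can replay your own statements from a script file instead of the built-in TPC-B set. A minimal sketch using the 9.0-era \setrandom syntax; the table, column, and database names here are invented for illustration:

```shell
# Write a hypothetical custom transaction script; :uid is redrawn
# at random for every transaction.
cat > custom.sql <<'EOF'
\setrandom uid 1 100000
SELECT abalance FROM accounts WHERE user_id = :uid;
EOF

# Replay it: 8 clients, 2 worker threads, 5 minutes, skip vacuuming the
# built-in tables (-n), against a hypothetical "bench" database.
if command -v pgbench >/dev/null 2>&1; then
    pgbench -n -f custom.sql -c 8 -j 2 -T 300 bench
fi
```

Whether pgbench-tools can drive a script like this directly and still produce its graphs, I haven't checked.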
Re: [PERFORM] Benchmarking tools, methods
On Nov 19, 2011, at 11:21 AM, Greg Smith wrote:

> On 11/18/2011 04:55 AM, CSS wrote:
>> I'm also curious about benchmarking using my own data. I tried something
>> long ago that at least gave the illusion of working, but didn't seem quite
>> right to me. I enabled basic query logging on one of our busier servers,
>> dumped the db, and let it run for 24 hours. That gave me the normal random
>> data from users throughout the day as well as our batch jobs that run
>> overnight. I had to grep out and reformat the actual queries from the
>> logfile, but that was not difficult. I then loaded the dump into the test
>> server and basically fed the saved queries into it and timed the result. I
>> also hacked together a script to sample cpu and disk stats every 2s and had
>> that feeding into an rrd database so I could see how "busy" things were.
>>
>> In theory, this sounded good (to me), but I'm not sure I trust the results.
>> Any suggestions on the general concept? Is it sound? Is there a better way
>> to do it? I really like the idea of using (our) real data.
>
> The thing that's hard to do here is replay the activity with the right
> timing. Some benchmarks, such as pgbench, will hit the database as fast as
> it will process work. That's not realistic. You really need to consider
> that real applications have pauses in them, and worry about that both in
> playback speed and in results analysis.
>
> See http://wiki.postgresql.org/wiki/Statement_Playback for some more info on
> this.

Thanks so much for this, and thanks to Cédric for also pointing out Tsung specifically on that page. I had no idea any of these tools existed. I really like the idea of "application specific" testing; it makes total sense for the kind of things we're trying to measure.

I also wanted to thank everyone else that posted in this thread, all of this info is tremendously helpful.
This is a really excellent list, and I really appreciate all the people posting here that make their living doing paid consulting taking the time to monitor and post on this list. Yet another way for me to validate choosing postgres over that "other" open source db.

>> ps - considering the new PostgreSQL Performance book that Packt has, any
>> strong feelings about that one way or the other? Does it go very far beyond
>> what's on the wiki?
>
> Pages 21 through 97 are about general benchmarking and hardware setup; 189
> through 208 cover just pgbench. There's almost no overlap between those
> sections and the wiki, which is mainly focused on PostgreSQL usage issues.
> Unless you're much smarter than me, you can expect to spend months to years
> reinventing wheels described there before reaching new ground in the areas it
> covers. From the questions you've been asking, you may not find as much
> about ZFS tuning and SSDs as you'd like though.

We're grabbing a copy of it for the office. Packt is running a sale, so we're also going to grab the "cookbook", it looks intriguing.

> http://www.2ndquadrant.com/en/talks/ has some updated material about things
> discovered since the book was published. The "Bottom-Up Database
> Benchmarking" there shows the tests I'm running nowadays, which have evolved
> a bit in the last year.

Looks like good stuff, thanks.

Charles

> --
> Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
[PERFORM] rough benchmarks, sata vs. ssd
Hello all,

Just wanted to share some results from some very basic benchmarking runs comparing three disk configurations on the same hardware:

http://morefoo.com/bench.html

Before I launch into any questions about the results (I don't see anything particularly shocking here), I'll describe the hardware and configurations in use here.

Hardware:
*Tyan B7016 mainboard w/onboard LSI SAS controller
*2x4 core xeon E5506 (2.13GHz)
*64GB ECC RAM (8GBx8 ECC, 1033MHz)
*2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
*2x160GB Intel 320 SSD drives

Software:
*FreeBSD 8.2 STABLE snapshot from 6/2011 (includes zfsv28, this is our production snapshot)
*PostgreSQL 9.0.6 (also what we run in production)
*pgbench-tools 0.5 (to automate the test runs and make nice graphs)

I was mainly looking to compare three variations of drive combinations and verify that we don't see any glaring performance issues with Postgres running on ZFS. We mostly run 1U boxes and we're looking for ways to get better performance without having to invest in some monster box that can hold a few dozen cheap SATA drives - hence SSDs, or SATA with SSDs hosting the "ZIL" (ZFS Intent Log). The ZIL is a bit of a cheat, as it allows you to throw all the synchronous writes to the SSD - I was particularly curious about how this would benchmark even though we will not likely use ZIL in production (at least not on this db box).

background thread: http://archives.postgresql.org/pgsql-performance/2011-10/msg00137.php

So the three sets of results I've linked are all pgbench-tools runs of the "tpc-b" benchmark: one using the two SATA drives in a ZFS mirror, one with the same two drives in a ZFS mirror with two of the Intel 320s as ZIL for that pool, and one with just two Intel 320s in a ZFS mirror.

Note that I also included graphs in the pgbench results of some basic system metrics. That's from a few simple scripts that collect some vmstat, iostat and "zpool iostat" info during the runs at 1 sample/second.
They are a bit ugly, but give a good enough visual representation of how swamped the drives are during the course of the tests.

Why ZFS? Well, we adopted it pretty early for other tasks and it makes a number of tasks easy. It's been stable for us for the most part, and our latest wave of boxes all use cheap SATA disks, which gives us two things - a ton of cheap space (in 1U) for snapshots and all the other space-consuming toys ZFS gives us, and on this cheaper disk type, a guarantee that we're not dealing with silent data corruption (these are probably the normal fanboy talking points).

ZFS snapshots are also a big time-saver when benchmarking. For our own application testing I load the data once, shut down postgres, snapshot pgsql + the app homedir and start postgres. After each run that changes on-disk data, I simply rollback the snapshot.

I don't have any real questions for the list, but I'd love to get some feedback, especially on the ZIL results. The ZIL results interest me because I have not settled on what sort of box we'll be using as a replication slave for this one - I was going to either go the somewhat risky route of another all-SSD box or look at just how cheap I can go with lots of 2.5" SAS drives in a 2U.

I'm hoping that the general "call for discussion" is an acceptable request for this list, which seems to cater more often to very specific tuning questions. If not, let me know. If you have any test requests that can be quickly run on the above hardware, let me know. I'll have the box easily accessible for the next few days at least (and I wouldn't mind pushing more writes through to two of my four SSDs before deploying the whole mess, in case it is true that SSDs fail at the same write cycle count).
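For anyone curious, the snapshot/rollback cycle described above looks roughly like this; the dataset name (tank/pgsql) and the FreeBSD rc.d script path are assumptions, not necessarily our actual layout:

```shell
# One-time setup: load the data, stop postgres, snapshot the clean state.
/usr/local/etc/rc.d/postgresql stop
zfs snapshot tank/pgsql@pristine
/usr/local/etc/rc.d/postgresql start

# ...run a benchmark that dirties the on-disk data...

# Reset for the next run: roll the dataset back to the clean snapshot.
/usr/local/etc/rc.d/postgresql stop
zfs rollback tank/pgsql@pristine
/usr/local/etc/rc.d/postgresql start
```

The rollback takes seconds regardless of how much the run changed, which is the whole appeal versus reloading a dump.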
I'll be doing more tests for my own curiosity, such as making sure UFS2 doesn't wildly outperform ZFS on the SSD-only setup, testing with the expected final config of 4 Intel 320s, and then lots of application-specific tests, and finally digging a bit more thoroughly into Greg's book to make sure I squeeze all I can out of this thing.

Thanks,

Charles
Re: [PERFORM] rough benchmarks, sata vs. ssd
On Feb 3, 2012, at 6:23 AM, Ivan Voras wrote:

> On 31/01/2012 09:07, CSS wrote:
>> Hello all,
>>
>> Just wanted to share some results from some very basic benchmarking
>> runs comparing three disk configurations on the same hardware:
>>
>> http://morefoo.com/bench.html
>
> That's great!

Thanks. I did spend a fair amount of time on it. It was also a good excuse to learn a little about gnuplot, which I used to draw the (somewhat oddly combined) system stats. I really wanted to see IO and CPU info over the duration of a test even if I couldn't really know what part of the test was running. Don't ask me why iostat sometimes shows greater than 100% in the "busy" column though. It is in the raw iostat output I used to create the graphs.

>> *Tyan B7016 mainboard w/onboard LSI SAS controller
>> *2x4 core xeon E5506 (2.13GHz)
>> *64GB ECC RAM (8GBx8 ECC, 1033MHz)
>> *2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
>> *2x160GB Intel 320 SSD drives
>
> It shows that you can have large cheap SATA drives and small fast SSDs, and
> up to a point have the best of both worlds. Could you send me (privately) a
> tgz of the results (i.e. the pages+images from the above URL), I'd like to
> host them somewhere more permanently.

Sent offlist, including raw vmstat, iostat and zpool iostat output.

>> The ZIL is a bit of a cheat, as it allows you to throw all the
>> synchronous writes to the SSD
>
> This is one of the main reasons it was made. It's not a cheat, it's by design.

I meant that only in the best way. Some of my proudest achievements are cheats. :)

It's a clever way of moving cache to something non-volatile and providing a fallback, although the fallback would be insanely slow in comparison.

>> Why ZFS? Well, we adopted it pretty early for other tasks and it
>> makes a number of tasks easy.
>> It's been stable for us for the most part and our latest wave of boxes
>> all use cheap SATA disks, which gives us two things - a ton of cheap
>> space (in 1U) for snapshots and all the other space-consuming toys ZFS
>> gives us, and on this cheaper disk type, a guarantee that we're not
>> dealing with silent data corruption (these are probably the normal
>> fanboy talking points). ZFS snapshots are also a big time-saver when
>> benchmarking. For our own application testing I load the data once,
>> shut down postgres, snapshot pgsql + the app homedir and start
>> postgres. After each run that changes on-disk data, I simply rollback
>> the snapshot.
>
> Did you tune ZFS block size for the postgresql data directory (you'll need to
> re-create the file system to do this)? When I investigated it in the past, it
> really did help performance.

I actually did not. A year or so ago I was doing some basic tests on cheap SATA drives with ZFS and at least with pgbench, I could see no difference at all. I actually still have some of that info, so I'll include it here. This was a 4-core xeon E5506 2.1GHz, 4 1TB WD RE3 drives in a RAIDZ1 array, 8GB RAM. I tested three things - time to load an 8.5GB dump of one of our dbs, time to run through a querylog of real data (1.4M queries), and then pgbench with a scaling factor of 100, 20 clients, 10K transactions per client.
default 128K zfs recordsize:
-9 minutes to load data
-17 minutes to run query log
-pgbench output:

transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 20
number of transactions per client: 10000
number of transactions actually processed: 200000/200000
tps = 100.884540 (including connections establishing)
tps = 100.887593 (excluding connections establishing)

8K zfs recordsize (wipe data dir and reinit db):
-10 minutes to load data
-21 minutes to run query log
-pgbench output:

transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 20
number of transactions per client: 10000
number of transactions actually processed: 200000/200000
tps = 97.896038 (including connections establishing)
tps = 97.898279 (excluding connections establishing)

Just thought I'd include that since I have the data.

>> I don't have any real questions for the list, but I'd love to get
>> some feedback, especially on the ZIL results. The ZIL results
>> interest me because I have not settled on what sort of box we'll be
>> using as a replication slave for this one - I was going to either go
>> the somewhat risky route of another all-SSD box or looking at just
>> how cheap I can go with lots of 2.5" SAS drives in a 2U.
>
> You probably know the answer to that: if you need lots of storage, you'll
> probably be better off using large SATA d
Re: [PERFORM] rough benchmarks, sata vs. ssd
For the top-post scanners, I updated the ssd test to include changing the zfs recordsize to 8k.

On Feb 11, 2012, at 1:35 AM, CSS wrote:

> On Feb 3, 2012, at 6:23 AM, Ivan Voras wrote:
>
>> On 31/01/2012 09:07, CSS wrote:
>>> Hello all,
>>>
>>> Just wanted to share some results from some very basic benchmarking
>>> runs comparing three disk configurations on the same hardware:
>>>
>>> http://morefoo.com/bench.html
>>
>> That's great!
>
> Thanks. I did spend a fair amount of time on it. It was also a
> good excuse to learn a little about gnuplot, which I used to draw
> the (somewhat oddly combined) system stats. I really wanted to see
> IO and CPU info over the duration of a test even if I couldn't
> really know what part of the test was running. Don't ask me why
> iostat sometimes shows greater than 100% in the "busy" column
> though. It is in the raw iostat output I used to create the graphs.
>
>>> *Tyan B7016 mainboard w/onboard LSI SAS controller
>>> *2x4 core xeon E5506 (2.13GHz)
>>> *64GB ECC RAM (8GBx8 ECC, 1033MHz)
>>> *2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
>>> *2x160GB Intel 320 SSD drives
>>
>> It shows that you can have large cheap SATA drives and small fast SSD-s, and
>> up to a point have best of both worlds. Could you send me (privately) a tgz
>> of the results (i.e. the pages+images from the above URL), I'd like to host
>> them somewhere more permanently.
>
> Sent offlist, including raw vmstat, iostat and zpool iostat output.
>
>>> The ZIL is a bit of a cheat, as it allows you to throw all the
>>> synchronous writes to the SSD
>>
>> This is one of the main reasons it was made. It's not a cheat, it's by
>> design.
>
> I meant that only in the best way. Some of my proudest achievements
> are cheats. :)
>
> It's a clever way of moving cache to something non-volatile and
> providing a fallback, although the fallback would be insanely slow
> in comparison.
>
>>> Why ZFS?
>>> Well, we adopted it pretty early for other tasks and it
>>> makes a number of tasks easy. It's been stable for us for the most
>>> part and our latest wave of boxes all use cheap SATA disks, which
>>> gives us two things - a ton of cheap space (in 1U) for snapshots and
>>> all the other space-consuming toys ZFS gives us, and on this cheaper
>>> disk type, a guarantee that we're not dealing with silent data
>>> corruption (these are probably the normal fanboy talking points).
>>> ZFS snapshots are also a big time-saver when benchmarking. For our
>>> own application testing I load the data once, shut down postgres,
>>> snapshot pgsql + the app homedir and start postgres. After each run
>>> that changes on-disk data, I simply rollback the snapshot.
>>
>> Did you tune ZFS block size for the postgresql data directory (you'll need
>> to re-create the file system to do this)? When I investigated it in the
>> past, it really did help performance.

Well, now I did - I added the results to http://ns.morefoo.com/bench.html and it looks like there's certainly an improvement. That's with the only change from the previous test being to copy the postgres data dir, wipe the original, set the zfs recordsize to 8K (default is 128K), and then copy the data dir back.

Things that stand out on first glance:

-at a scaling factor of 10 or greater, there is a much more gentle decline in TPS than with the default zfs recordsize
-on the raw *disk* IOPS graph, I now see writes peaking at around 11K/second compared to 1.5K/second
-on the zpool iostat graph, I do not see those huge write peaks, which is a bit confusing
-on both iostat graphs, the datapoints look more scattered with the 8K recordsize

Any comments are certainly welcome. I understand an 8K recordsize should perform better since that's the size of the chunks of data postgresql is dealing with, but the effects on the system graphs are interesting and I'm not quite following how it all relates.
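For reference, the copy-out/copy-back dance above as commands; dataset and path names are placeholders. The key detail is that recordsize only applies to newly written files, so the existing files have to be rewritten to pick up the 8K records:

```shell
/usr/local/etc/rc.d/postgresql stop

# Copy the data dir off the dataset, then change the recordsize.
cp -Rp /tank/pgsql/data /tank/scratch/data
zfs set recordsize=8k tank/pgsql      # default is 128k

# Copy it back; the rewritten files now use 8K records, matching
# postgres' 8K block size.
rm -rf /tank/pgsql/data
cp -Rp /tank/scratch/data /tank/pgsql/data

/usr/local/etc/rc.d/postgresql start
```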
I wonder if the recordsize impacts the SSD write amplification at all...

Thanks,

Charles

> I actually did not. A year or so ago I was doing some basic tests
> on cheap SATA drives with ZFS and at least with pgbench, I could see
> no difference at all. I actually still have some of that info, so
> I'll include it here. This was a 4-core xeon, E5506 2.1GHz, 4 1TB
> WD RE3 drives in a RAIDZ1 array, 8GB RAM.
>
> I tested three things - time to load an 8.5GB dump of one of our
> dbs, time to run through a queryl
[PERFORM] select operations that generate disk writes
Hello,

Time for a broad question. I'm aware of some specific select queries that will generate disk writes - for example, a sort operation when there's not enough work_mem can cause PG to write out some temp tables (not the correct terminology?). That scenario is easily remedied by enabling "log_temp_files" and specifying the threshold in temp file size at which you want logging to happen.

I've recently been trying to put some of my recent reading of Greg's book and other performance-related documentation to use by seeking out queries that take an inordinate amount of time to run. Given that we're usually disk-bound, I've gotten in the habit of running an iostat in a terminal while running and tweaking some of the problem queries. I find this gives me some nice instant feedback on how hard the query is causing PG to hit the disks.

What's currently puzzling me are some selects with complex joins and sorts that generate some fairly large bursts of write activity while they run. I was able to reduce this by increasing work_mem (client-side) to give the sorts an opportunity to happen in memory. I now see no temp file writes being logged, and indeed the query sped up.

So my question is, what else can generate writes when doing read-only operations? I know it sounds like a simple question, but I'm just not finding a concise answer anywhere.

Thanks,

Charles
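As a concrete version of the work_mem experiment described above (the table and database names are invented, and setting log_temp_files per-session needs superuser):

```shell
psql bench <<'EOF'
-- Log every temp file created (0 = no minimum size), so spills show up
-- in the server log. Can also be set in postgresql.conf.
SET log_temp_files = 0;

-- With a small work_mem the sort spills to disk; EXPLAIN ANALYZE reports
-- something like "Sort Method: external merge  Disk: ...kB".
SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY created_at;

-- Give the sort room to stay in memory ("Sort Method: quicksort").
SET work_mem = '256MB';
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY created_at;
EOF
```

The EXPLAIN ANALYZE output plus the temp-file log lines together make it obvious which queries are generating the write bursts.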
[PERFORM] Anyone running Intel S3700 SSDs?
Considering this list is where I first learned of the Intel 320 drives (AFAIK, the only non-enterprise SSDs that are power-failure safe), I thought I'd see if any of the folks here that tend to test new stuff have got their hands on these yet.

I had no idea these drives were out (they're still a bit pricey, but cheaper than any spinning drives that would give the same sort of random IO performance), and while trying to find a place to source some spare 300GB 320s, I found this review: http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review

Of most interest to me was this:

"Along one edge of the drive Intel uses two 35V 47µF capacitors, enough to allow the controller to commit any data (and most non-data) to NAND in the event of a power failure. The capacitors in the S3700 are periodically tested by the controller. In the event that they fail, the controller disables all write buffering and throws a SMART error flag."

This is also the first new Intel drive in a long time to use an Intel controller rather than a SandForce (which, frankly, I don't trust).

Anyone have any benchmarks to share? Are there any other sub-$1K drives out there currently that incorporate power-loss protection like this and the 320s do?

Thanks,

Charles
Re: [PERFORM] New server setup
On Mar 13, 2013, at 3:23 PM, Steve Crawford wrote:

> On 03/13/2013 09:15 AM, John Lister wrote:
>> On 13/03/2013 15:50, Greg Jaskiewicz wrote:
>>> SSDs have much shorter life than spinning drives, so what do you do when
>>> one inevitably fails in your system?
>> Define much shorter? I accept they have a limited no of writes, but that
>> depends on load. You can actively monitor the drives' "health" level...
>
> What concerns me more than wear is this:
>
> InfoWorld Article:
> http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715
>
> Referenced research paper:
> https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault
>
> Kind of messes with the "D" in ACID.

Have a look at this: http://blog.2ndquadrant.com/intel_ssd_now_off_the_sherr_sh/

I'm not sure what other SSDs offer this, but Intel's newest entry will, and it's attractively priced.

Another way we leverage SSDs that can be more reliable in the face of total SSD meltdown is to use them as ZFS Intent Log caches. All the sync writes get handled on the SSDs. We deploy them as mirrored vdevs, so if one fails, we're OK. If both fail, we're really slow until someone can replace them. On modest hardware, I was able to get about 20K TPS out of pgbench with the SSDs configured as ZIL and 4 10K Raptors as the spinny disks.

In either case, the amount of money you'd have to spend on the two dozen or so SAS drives (and the controllers, enclosure, etc.) that would equal a few pairs of SSDs in random IO performance is non-trivial, even if you plan on proactively retiring your SSDs every year.

Just another take on the issue..
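For reference, the layout I'm describing can be sketched as a single zpool command; the device names are placeholders. Four spinning disks as two mirrored vdevs (striped), with a mirrored pair of SSDs as the log device:

```shell
zpool create tank \
    mirror /dev/ada0 /dev/ada1 \
    mirror /dev/ada2 /dev/ada3 \
    log mirror /dev/ada4 /dev/ada5

# "zpool status tank" shows the SSD mirror under a separate "logs"
# section; sync writes (e.g. postgres WAL flushes) land there first.
zpool status tank
```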
Charles

> Cheers,
> Steve
Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication
On May 22, 2013, at 4:06 PM, Greg Smith wrote:

> And there are some other products with interesting price/performance/capacity
> combinations that are also sensitive to wearout. Seagate's hybrid drives
> have turned interesting now that they cache writes safely for example.
> There's no cheaper way to get 1TB with flash write speeds for small commits
> than that drive right now. (Test results on that drive coming soon, along
> with my full DC S3700 review)

I am really looking forward to that. Will you announce here or just post on the 2ndQuadrant blog?

Another "hybrid" solution is to run ZFS on some decent hard drives and then put the ZFS intent log on SSDs. With very synthetic benchmarks, the random write performance is excellent.

All of these discussions about alternate storage media are great - everyone has different needs, and there are certainly a number of deployments that can "get away" with spending much less money by adding some solid state storage. There's really an amazing number of options today…

Thanks,

Charles