[PERFORM] SSD options, small database, ZFS

2011-10-14 Thread CSS
Hello all,

I've spent some time looking through previous posts regarding
postgres and SSD drives and have also been reading up on the
subject of SSDs in general elsewhere.

Some quick background:

We're currently looking at changing our basic database setup as we
migrate away from some rather old (circa 2006 or so) hardware to
more current gear.  Our previous setup basically consisted of two
types of 1U boxes: dual-Xeon machines with a low-end Adaptec RAID
card and two U320 or U160 10K SCSI drives handling database, light
web frontend, and mail submission duties; and single-P4 and
dual-PIII boxes with commodity drives handling mail deliveries.
Most of the
delivery servers have been migrated to current hardware (single and
dual quad-core xeons, 4 WD RE3 SATA drives, FreeBSD w/ZFS), and
we've also moved a handful of the db/web services to one of these
servers just to see what a large increase in RAM and CPU can do for
us.  As we sort of expected, the performance overall was better,
but there's a serious disk IO wall we hit when running nightly jobs
that are write heavy and long-running.

There are currently four db/web servers that have a combined total
of 133GB of on-disk postgres data.  Given the age of the servers,
the fact that each only has two drives on rather anemic RAID
controllers with no cache/BBU, and none has more than 2GB RAM I
think it's a safe bet that consolidating this down to two modern
servers can give us better performance, allow for growth over the
next few years, and let us split db duties from web/mail
submission.  One live db server, one standby.

And this is where SSDs come in.  We're not looking at terabytes of
data here, and I don't see us growing much beyond our current
database size in the next few years.  SSDs are getting cheap enough
that this feels like a good match - we can afford CPU and RAM, we
can't afford some massive 24 drive monster and a pile of SAS
drives.  The current db boxes top out at 300 IOPS, the SATA boxes
maybe about half that.  If I can achieve 300x4 IOPS (or better,
preferably...) on 2 or 4 SSDs, I'm pretty much sold.

From what I've been reading here, this sounds quite possible.  I
understand the Intel 320 series are the best "bargain" option since
they can survive an unexpected shutdown, so I'm not going to go
looking beyond that - OCZ makes me a bit nervous and finding a
clear list of which drives have the "supercapacitor" has been
difficult.  I have no qualms about buying enough SSDs to mirror
them all.  I am aware I need to have automated alerts for various
SMART stats so I know when danger is imminent.  I know I need to
have this box replicated even with mirrors and monitoring up the
wazoo.
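
A minimal sketch of the sort of check I have in mind, using smartctl
from the smartmontools port, dropped into cron (device names and the
threshold are assumptions; verify the attribute number against your
own drives):

#!/bin/sh
# Warn when an Intel SSD's media wearout indicator gets low.  On Intel
# drives, SMART attribute 233 (Media_Wearout_Indicator) starts at 100
# and counts down toward 1 as the NAND wears out.
THRESHOLD=20
for dev in /dev/ada2 /dev/ada3; do
    wear=$(smartctl -A "$dev" | awk '$1 == 233 { print $4 }')
    if [ -n "$wear" ] && [ "$wear" -lt "$THRESHOLD" ]; then
        echo "$dev wearout indicator down to $wear" |
            mail -s "SSD wear alert on $(hostname)" root
    fi
done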

Here's my remaining questions:

-I'm calling our combined databases at 133GB "small", fair
assumption?

-Is there any chance that a server with dual quad-core Xeons, 32GB
RAM, and 2 or 4 SSDs (assume mirrored) could be slower than the 4
old servers described above?  I'm beating those on raw CPU,
quadrupling the amount of RAM (and consolidating said RAM), and
going from disks that top out at 4x300 IOPS to SSDs that
conservatively should provide 2000 IOPS.

-We're also finally automating more stuff and trying to standardize
server configs.  One tough decision we made that has paid off quite
well was to move to ZFS.  We find the features helpful to admin
tasks outweigh the drawbacks and RAM is cheap enough that we can
deal with its tendency to eat RAM.  Is ZFS + Postgres + SSDs a bad
combo?

-Should I even be looking at the option of ZFS on SATA or low-end
SAS drives with the ZIL and L2ARC on SSDs?  Initially this intrigued
me, but I can't quite get my head around how an SSD-based ZIL can
keep up when the whole system is under extreme write-heavy load - if
the ZIL is absorbing 2000 IOPS of synchronous writes, at some point
it has to fill up while that data is still being flushed out to the
much slower spinning drives.

-Should my standby box be the same configuration or should I look
at actual spinning disks on that?  How rough is replication on the
underlying storage?  Would the total data written on the slave be
less or equal to the master?

Any input is appreciated.  I did really mean for this to be a much
shorter post...

Thanks,

Charles


Re: [PERFORM] SSD options, small database, ZFS

2011-11-17 Thread CSS
Resurrecting this long-dormant thread...

On Oct 14, 2011, at 6:41 AM, Arjen van der Meijden wrote:

> On 14-10-2011 10:23, CSS wrote:
>> -I'm calling our combined databases at 133GB "small", fair
>> assumption?  -Is there any chance that a server with dual quad core
>> xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
>> than the 4 old servers described above?  I'm beating those on raw
>> cpu, quadrupling the amount of RAM (and consolidating said RAM),
>> and going from disks that top out at 4x300 IOPS with SSDs that
>> conservatively should provide 2000 IOPS.
> 
> Whether 133GB is small or not probably mostly depends on how much of it is 
> actually touched during use. But I'd agree that it isn't a terribly large 
> database, I'd guess a few simple SSDs would be plenty to achieve 2000 IOPS. 
> For linear writes, they're still not really faster than normal disks, but if 
> that's combined with random access (either read or write) you ought to be ok.
> We went from 15x 15k sas-disks to 6x ssd several years back in our MySQL-box, 
> but since we also increased the ram from 16GB to 72GB, the io-load dropped so 
> much the ssd's are normally only lightly loaded...

Thanks for your input on this.  It's taken some time, but I do finally have 
some hardware on hand  (http://imgur.com/LEC5I) and as more trickles in over 
the coming days, I'll be putting together our first SSD-based postgres box.  I 
have much testing to do, and I'm going to have some questions regarding that 
subject in another thread.

> Btw, the 5500 and 5600 Xeons are normally more efficient with a multiple of 6 
> ram-modules, so you may want to have a look at 24GB (6x4), 36GB (6x4+6x2) or 
> 48GB (12x4 or 6x8) RAM.

Thanks - I really had a hard time wrapping my head around the rules on 
populating the banks.  If I understand it correctly, this is due to the memory 
controller moving from the northbridge to being integrated into the CPU.

> Given the historical questions on the list, there is always a risk of getting 
> slower queries with hardware that should be much faster. For instance, the 
> huge increase in RAM may trigger a less efficient query-plan. Or the disks 
> abide by the flush-policies more correctly.
> Assuming the queries are still getting good plans and there are no such 
> special differences, I'd agree with the assumption that it's a win on every 
> count.
> Or your update to a newer OS and PostgreSQL may trigger some worse query plan 
> or hardware-usage.

That's an interesting point, I'd not even considered that.  Is there any sort 
of simple documentation on the query planner that might cover how things like 
increased RAM could impact how a query is executed?
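
The one planner-facing knob I do know is tied directly to RAM is
effective_cache_size, so I assume that's part of the answer -
something like this (database name, values, and paths are just
illustration):

# effective_cache_size allocates nothing; it's the planner's estimate
# of how much data the OS cache plus shared_buffers can hold, so
# raising it after a big RAM upgrade makes index scans look cheaper
# relative to sequential scans.
psql -d mydb -c "SHOW effective_cache_size;"
# after moving from 2GB to 32GB boxes, raise it in postgresql.conf
# (e.g. effective_cache_size = 24GB) and reload:
pg_ctl reload -D /usr/local/pgsql/data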

>> -Should I even be looking at the option of ZFS on SATA or low-end
>> SAS drives and ZIL and L2ARC on SSDs?  Initially this intrigued me,
>> but I can't quite get my head around how the SSD-based ZIL can deal
>> with flushing the metadata out when the whole system is under any
>> sort of extreme write-heavy load - I mean if the ZIL is absorbing
>> 2000 IOPS of metadata writes, at some point it has to get full as
>> it's trying to flush this data to much slower spinning drives.
> 
> A fail-safe set-up with SSD's in ZFS assumes at least 3 in total, i.e. a pair 
> of SSD's for ZIL and as many as you want for L2ARC. Given your database size, 
> 4x160GB SSD (in "raid10") or 2x 300GB should yield plenty of space. So given 
> the same choice, I wouldn't bother with a set of large capacity sata disks 
> and ZIL/L2ARC-SSD's, I'd just go with 4x160GB or 2x300GB SSD's.

Well, I've bought 4x160GB, so that's what I'll use.  I will still do some tests 
with two SATA drives plus ZIL, just to see what happens.

> 
>> -Should my standby box be the same configuration or should I look
>> at actual spinning disks on that?  How rough is replication on the
>> underlying storage?  Would the total data written on the slave be
>> less or equal to the master?
> 
> How bad is it for you if the performance of your database potentially drops a 
> fair bit when your slave becomes the master? If you have a read-mostly 
> database, you may not even need SSD's in your master-db (given your amount of 
> RAM). But honestly, I don't know the answer to this question :)

It's complicated - during the day we're mostly looking at very scattered reads 
and writes, probably a bit biased towards writes.  But each evening we kick off 
a number of jobs to pre-generate stats for more complex queries...  If the job 
could still complete in 6-8 hours, we'd probably be OK, but if it starts 
clogging up our normal que

[PERFORM] Benchmarking tools, methods

2011-11-18 Thread CSS
Hello,

I'm going to be testing some new hardware (see 
http://archives.postgresql.org/pgsql-performance/2011-11/msg00230.php) and 
while I've done some very rudimentary before/after tests with pgbench, I'm 
looking to pull more info than I have in the past, and I'd really like to 
automate things further.

I'll be starting with basic disk benchmarks (bonnie++ and iozone) and then 
moving on to pgbench.

I'm running FreeBSD and I'm interested in getting some baseline info on UFS2 
single disk (SATA 7200/WD RE4), gmirror, zfs mirror, zfs raidz1, and a zfs pool 
of two mirrors (i.e., two mirrored vdevs striped together - the ZFS analogue of 
RAID 10).  Then I'm repeating that with the 
4 Intel 320 SSDs, and just to satisfy my curiosity, a zfs mirror with two of 
the SSDs mirrored as the ZIL.
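
For reference, the ZFS layouts above would be created roughly like so
(pool and device names are hypothetical):

# plain mirror
zpool create tank mirror ada1 ada2
# raidz1 across four disks
zpool create tank raidz ada1 ada2 ada3 ada4
# two mirrored vdevs striped together (the RAID 10 analogue)
zpool create tank mirror ada1 ada2 mirror ada3 ada4
# SATA mirror with a mirrored SSD ZIL (slog)
zpool create tank mirror ada1 ada2 log mirror da0 da1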

Once that's narrowed down to a few practical choices, I'm moving on to pgbench. 
I've found some good info here regarding pgbench that is unfortunately a bit 
dated:  http://www.westnet.com/~gsmith/content/postgresql/

A few questions:

-Any favorite automation or graphing tools beyond what's on Greg's site?
-Any detailed information on creating "custom" pgbench tests?
-Any other postgres benchmarking tools?
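
On the "custom" test question, what I have in mind is a transaction
script fed to pgbench with -f, along these lines (this is the 9.0-era
\setrandom syntax; the database name is hypothetical):

cat > custom.sql <<'EOF'
\setrandom aid 1 1000000
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF
# run the custom script for 5 minutes with 20 clients
pgbench -f custom.sql -c 20 -T 300 bench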

I'm also curious about benchmarking using my own data.  I tried something long 
ago that at least gave the illusion of working, but didn't seem quite right to 
me.  I enabled basic query logging on one of our busier servers, dumped the db, 
and let it run for 24 hours.  That gave me the normal random data from users 
throughout the day as well as our batch jobs that run overnight.  I had to grep 
out and reformat the actual queries from the logfile, but that was not 
difficult.   I then loaded the dump into the test server and basically fed the 
saved queries into it and timed the result.  I also hacked together a script to 
sample CPU and disk stats every 2 seconds and had that feeding into an RRD database so 
I could see how "busy" things were.
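
The replay itself was nothing fancier than this sort of thing (paths
and database name hypothetical):

# sample disk activity in the background, replay the cleaned-up query
# log, and time the run.  Note this replays queries back-to-back as
# fast as they'll go - no attempt to preserve the original pacing.
iostat -x 2 > iostat.log &
SAMPLER=$!
time psql -q -d testdb -f replayed_queries.sql > /dev/null
kill $SAMPLER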

In theory, this sounded good (to me), but I'm not sure I trust the results.  
Any suggestions on the general concept?  Is it sound?  Is there a better way to 
do it?  I really like the idea of using (our) real data.

Lastly, any general suggestions on tools to collect system data during tests 
and graph it are more than welcome.  I can homebrew, but I'm sure I'd be 
reinventing the wheel.

Oh, and if anyone wants any tests run that would not take an insane amount of 
time and would be valuable to those on this list, please let me know.  Since 
SSDs have been a hot topic lately and not everyone has 4 SSDs lying around, 
I'd like to sort of focus on anything that would shed some light on the whole 
SSD craze.

The box under test ultimately will have 32GB RAM, 2 quad core 2.13GHz Xeon 5506 
cpus and 4 Intel 320 160GB SSDs.  I'm recycling some older boxes as well, so I 
have much more RAM on hand until those are finished.

Thanks,

Charles

ps - considering the new PostgreSQL Performance book that Packt has, any strong 
feelings about that one way or the other?  Does it go very far beyond what's on 
the wiki?


Re: [PERFORM] Benchmarking tools, methods

2011-11-28 Thread CSS
On Nov 19, 2011, at 11:21 AM, Greg Smith wrote:

> On 11/18/2011 04:55 AM, CSS wrote:
>> I'm also curious about benchmarking using my own data.  I tried something 
>> long ago that at least gave the illusion of working, but didn't seem quite 
>> right to me.  I enabled basic query logging on one of our busier servers, 
>> dumped the db, and let it run for 24 hours.  That gave me the normal random 
>> data from users throughout the day as well as our batch jobs that run 
>> overnight.  I had to grep out and reformat the actual queries from the 
>> logfile, but that was not difficult.   I then loaded the dump into the test 
>> server and basically fed the saved queries into it and timed the result.  I 
>> also hacked together a script to sample CPU and disk stats every 2 seconds and had 
>> that feeding into an rrd database so I could see how "busy" things were.
>> 
>> In theory, this sounded good (to me), but I'm not sure I trust the results.  
>> Any suggestions on the general concept?  Is it sound?  Is there a better way 
>> to do it?  I really like the idea of using (our) real data.
>>   
> 
> The thing that's hard to do here is replay the activity with the right 
> timing.  Some benchmarks, such as pgbench, will hit the database as fast as 
> it will process work.  That's not realistic.  You really need to consider 
> that real applications have pauses in them, and worry about that both in 
> playback speed and in results analysis.
> 
> See http://wiki.postgresql.org/wiki/Statement_Playback for some more info on 
> this.

Thanks so much for this, and thanks to Cédric for also pointing out Tsung 
specifically on that page.  I had no idea any of these tools existed.  I really 
like the idea of "application specific" testing, it makes total sense for the 
kind of things we're trying to measure.

I also wanted to thank everyone else that posted in this thread, all of this 
info is tremendously helpful.  This is a really excellent list, and I really 
appreciate all the people posting here that make their living doing paid 
consulting taking the time to monitor and post on this list.  Yet another way 
for me to validate choosing postgres over that "other" open source db.


>> ps - considering the new PostgreSQL Performance book that Packt has, any 
>> strong feelings about that one way or the other?  Does it go very far beyond 
>> what's on the wiki?
>>   
> 
> Pages 21 through 97 are about general benchmarking and hardware setup; 189 
> through 208 cover just pgbench.  There's almost no overlap between those 
> sections and the wiki, which is mainly focused on PostgreSQL usage issues.  
> Unless you're much smarter than me, you can expect to spend months to years 
> reinventing wheels described there before reaching new ground in the areas it 
> covers.  From the questions you've been asking, you may not find as much 
> about ZFS tuning and SSDs as you'd like though.

We're grabbing a copy of it for the office.  Packt is running a sale, so we're 
also going to grab the "cookbook", it looks intriguing.

> http://www.2ndquadrant.com/en/talks/ has some updated material about things 
> discovered since the book was published.  The "Bottom-Up Database 
> Benchmarking" there shows the tests I'm running nowadays, which have evolved 
> a bit in the last year.

Looks like good stuff, thanks.

Charles

> -- 
> Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




[PERFORM] rough benchmarks, sata vs. ssd

2012-01-31 Thread CSS
Hello all,

Just wanted to share some results from some very basic benchmarking
runs comparing three disk configurations on the same hardware:

http://morefoo.com/bench.html

Before I launch into any questions about the results (I don't see
anything particularly shocking here), I'll describe the hardware and
configurations in use here.

Hardware:

*Tyan B7016 mainboard w/onboard LSI SAS controller
*2x4 core xeon E5506 (2.13GHz)
*64GB ECC RAM (8GBx8 ECC, 1033MHz)
*2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
*2x160GB Intel 320 SSD drives

Software:

*FreeBSD 8.2 STABLE snapshot from 6/2011 (includes zfsv28, this is
our production snapshot) 
*PostgreSQL 9.0.6 (also what we run in production) 
*pgbench-tools 0.5 (to automate the test runs and make nice graphs)

I was mainly looking to compare three variations of drive
combinations and verify that we don't see any glaring performance
issues with Postgres running on ZFS.  We mostly run 1U boxes and
we're looking for ways to get better performance without having to
invest in some monster box that can hold a few dozen cheap SATA
drives - hence either all SSDs, or SATA drives with SSDs hosting the
"ZIL" (ZFS Intent Log).  The ZIL is a bit of a cheat, as it allows
you to throw all the synchronous writes to the SSD - I was
particularly curious about how this would benchmark even though we
will not likely use a dedicated ZIL device in production (at least
not on this db box).

background thread: 
http://archives.postgresql.org/pgsql-performance/2011-10/msg00137.php

So the three sets of results I've linked are all pgbench-tools runs
of the "tpc-b" benchmark.  One using the two SATA drives in a ZFS
mirror, one with the same two drives in a ZFS mirror with two of the
Intel 320s as ZIL for that pool, and one with just two Intel 320s in
a ZFS mirror.  Note that I also included graphs in the pgbench
results of some basic system metrics.  That's from a few simple
scripts that collect some vmstat, iostat and "zpool iostat" info
during the runs at 1 sample/second.  They are a bit ugly, but give a
good enough visual representation of how swamped the drives are
during the course of the tests.
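
The collection side is nothing fancy - roughly this, with the logs
fed to the graphing scripts afterwards (pool name hypothetical):

# one sample per second of system and pool activity during a run
vmstat 1 > vmstat.log & V=$!
iostat -x 1 > iostat.log & I=$!
zpool iostat -v tank 1 > zpool_iostat.log & Z=$!
# ... run the benchmark ...
kill $V $I $Z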

Why ZFS?  Well, we adopted it pretty early for other tasks and it
makes a number of tasks easy.  It's been stable for us for the most
part and our latest wave of boxes all use cheap SATA disks, which
gives us two things - a ton of cheap space (in 1U) for snapshots and
all the other space-consuming toys ZFS gives us, and on this cheaper
disk type, a guarantee that we're not dealing with silent data
corruption (these are probably the normal fanboy talking points).
ZFS snapshots are also a big time-saver when benchmarking.  For our
own application testing I load the data once, shut down postgres,
snapshot pgsql + the app homedir and start postgres.  After each run
that changes on-disk data, I simply rollback the snapshot.
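
Concretely, that workflow is just (dataset name hypothetical):

# one-time: load the data, stop postgres, snapshot the baseline
zfs snapshot tank/pgsql@loaded
# after any run that dirties the data: stop postgres, roll back,
# start postgres again
zfs rollback tank/pgsql@loaded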

I don't have any real questions for the list, but I'd love to get
some feedback, especially on the ZIL results.  The ZIL results
interest me because I have not settled on what sort of box we'll be
using as a replication slave for this one - I was going to either go
the somewhat risky route of another all-SSD box or looking at just
how cheap I can go with lots of 2.5" SAS drives in a 2U.

I'm hoping that the general "call for discussion" is an acceptable
request for this list, which seems to cater more often to very
specific tuning questions.  If not, let me know.

If you have any test requests that can be quickly run on the above
hardware, let me know.  I'll have the box easily accessible for the
next few days at least (and I wouldn't mind pushing more writes
through to two of my four ssds before deploying the whole mess in
case it is true that SSDs fail at the same write-cycle count).  I'll
be doing more tests for my own curiosity such as making sure UFS2
doesn't wildly outperform ZFS on the SSD-only setup, testing with
the expected final config of 4 Intel 320s, and then lots of
application-specific tests, and finally digging a bit more
thoroughly into Greg's book to make sure I squeeze all I can out of
this thing.

Thanks,

Charles


Re: [PERFORM] rough benchmarks, sata vs. ssd

2012-02-10 Thread CSS

On Feb 3, 2012, at 6:23 AM, Ivan Voras wrote:

> On 31/01/2012 09:07, CSS wrote:
>> Hello all,
>> 
>> Just wanted to share some results from some very basic benchmarking
>> runs comparing three disk configurations on the same hardware:
>> 
>> http://morefoo.com/bench.html
> 
> That's great!

Thanks.  I did spend a fair amount of time on it.  It was also a
good excuse to learn a little about gnuplot, which I used to draw
the (somewhat oddly combined) system stats.  I really wanted to see
IO and CPU info over the duration of a test even if I couldn't
really know what part of the test was running.  Don't ask me why
iostat sometimes shows greater than 100% in the "busy" column
though.  It is in the raw iostat output I used to create the graphs.

> 
>> *Tyan B7016 mainboard w/onboard LSI SAS controller
>> *2x4 core xeon E5506 (2.13GHz)
>> *64GB ECC RAM (8GBx8 ECC, 1033MHz)
>> *2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
>> *2x160GB Intel 320 SSD drives
> 
> It shows that you can have large cheap SATA drives and small fast SSD-s, and 
> up to a point have best of both worlds. Could you send me (privately) a tgz 
> of the results (i.e. the pages+images from the above URL), I'd like to host 
> them somewhere more permanently.

Sent offlist, including raw vmstat, iostat and zpool iostat output.

> 
>> The ZIL is a bit of a cheat, as it allows you to throw all the
>> synchronous writes to the SSD
> 
> This is one of the main reasons it was made. It's not a cheat, it's by design.

I meant that only in the best way.  Some of my proudest achievements
are cheats. :)

It's a clever way of moving cache to something non-volatile and
providing a fallback, although the fallback would be insanely slow
in comparison.

> 
>> Why ZFS?  Well, we adopted it pretty early for other tasks and it
>> makes a number of tasks easy.  It's been stable for us for the most
>> part and our latest wave of boxes all use cheap SATA disks, which
>> gives us two things - a ton of cheap space (in 1U) for snapshots and
>> all the other space-consuming toys ZFS gives us, and on this cheaper
>> disk type, a guarantee that we're not dealing with silent data
>> corruption (these are probably the normal fanboy talking points).
>> ZFS snapshots are also a big time-saver when benchmarking.  For our
>> own application testing I load the data once, shut down postgres,
>> snapshot pgsql + the app homedir and start postgres.  After each run
>> that changes on-disk data, I simply rollback the snapshot.
> 
> Did you tune ZFS block size for the postgresql data directory (you'll need to 
> re-create the file system to do this)? When I investigated it in the past, it 
> really did help performance.

I actually did not.  A year or so ago I was doing some basic tests
on cheap SATA drives with ZFS and at least with pgbench, I could see
no difference at all.  I actually still have some of that info, so
I'll include it here.  This was a 4-core Xeon E5506 (2.13GHz), 4 1TB
WD RE3 drives in a RAIDZ1 array, 8GB RAM.

I tested three things - time to load an 8.5GB dump of one of our
dbs, time to run through a querylog of real data (1.4M queries), and
then pgbench with a scaling factor of 100, 20 clients, 10K
transactions per client.
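
For reference, that amounts to roughly (database name hypothetical):

pgbench -i -s 100 bench        # initialize at scaling factor 100
pgbench -c 20 -t 10000 bench   # 20 clients, 10K transactions each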

default 128K zfs recordsize:

-9 minutes to load data
-17 minutes to run query log
-pgbench output

transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 20
number of transactions per client: 10000
number of transactions actually processed: 200000/200000
tps = 100.884540 (including connections establishing)
tps = 100.887593 (excluding connections establishing)

8K zfs recordsize (wipe data dir and reinit db)

-10 minutes to load data
-21 minutes to run query log
-pgbench output

transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 20
number of transactions per client: 10000
number of transactions actually processed: 200000/200000
tps = 97.896038 (including connections establishing)
tps = 97.898279 (excluding connections establishing)

Just thought I'd include that since I have the data.

> 
>> I don't have any real questions for the list, but I'd love to get
>> some feedback, especially on the ZIL results.  The ZIL results
>> interest me because I have not settled on what sort of box we'll be
>> using as a replication slave for this one - I was going to either go
>> the somewhat risky route of another all-SSD box or looking at just
>> how cheap I can go with lots of 2.5" SAS drives in a 2U.
> 
> You probably know the answer to that: if you need lots of storage, you'll 
> probably be better off using large SATA drives.

Re: [PERFORM] rough benchmarks, sata vs. ssd

2012-02-13 Thread CSS
For the top-post scanners, I updated the ssd test to include
changing the zfs recordsize to 8k.

On Feb 11, 2012, at 1:35 AM, CSS wrote:

> 
> On Feb 3, 2012, at 6:23 AM, Ivan Voras wrote:
> 
>> On 31/01/2012 09:07, CSS wrote:
>>> Hello all,
>>> 
>>> Just wanted to share some results from some very basic benchmarking
>>> runs comparing three disk configurations on the same hardware:
>>> 
>>> http://morefoo.com/bench.html
>> 
>> That's great!
> 
> Thanks.  I did spend a fair amount of time on it.  It was also a
> good excuse to learn a little about gnuplot, which I used to draw
> the (somewhat oddly combined) system stats.  I really wanted to see
> IO and CPU info over the duration of a test even if I couldn't
> really know what part of the test was running.  Don't ask me why
> iostat sometimes shows greater than 100% in the "busy" column
> though.  It is in the raw iostat output I used to create the graphs.
> 
>> 
>>> *Tyan B7016 mainboard w/onboard LSI SAS controller
>>> *2x4 core xeon E5506 (2.13GHz)
>>> *64GB ECC RAM (8GBx8 ECC, 1033MHz)
>>> *2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
>>> *2x160GB Intel 320 SSD drives
>> 
>> It shows that you can have large cheap SATA drives and small fast SSD-s, and 
>> up to a point have best of both worlds. Could you send me (privately) a tgz 
>> of the results (i.e. the pages+images from the above URL), I'd like to host 
>> them somewhere more permanently.
> 
> Sent offlist, including raw vmstat, iostat and zpool iostat output.
> 
>> 
>>> The ZIL is a bit of a cheat, as it allows you to throw all the
>>> synchronous writes to the SSD
>> 
>> This is one of the main reasons it was made. It's not a cheat, it's by 
>> design.
> 
> I meant that only in the best way.  Some of my proudest achievements
> are cheats. :)
> 
> It's a clever way of moving cache to something non-volatile and
> providing a fallback, although the fallback would be insanely slow
> in comparison.
> 
>> 
>>> Why ZFS?  Well, we adopted it pretty early for other tasks and it
>>> makes a number of tasks easy.  It's been stable for us for the most
>>> part and our latest wave of boxes all use cheap SATA disks, which
>>> gives us two things - a ton of cheap space (in 1U) for snapshots and
>>> all the other space-consuming toys ZFS gives us, and on this cheaper
>>> disk type, a guarantee that we're not dealing with silent data
>>> corruption (these are probably the normal fanboy talking points).
>>> ZFS snapshots are also a big time-saver when benchmarking.  For our
>>> own application testing I load the data once, shut down postgres,
>>> snapshot pgsql + the app homedir and start postgres.  After each run
>>> that changes on-disk data, I simply rollback the snapshot.
>> 
>> Did you tune ZFS block size for the postgresql data directory (you'll need 
>> to re-create the file system to do this)? When I investigated it in the 
>> past, it really did help performance.
> 

Well now I did - I added the results to
http://ns.morefoo.com/bench.html and it looks like there's
certainly an improvement.  That's with the only change from the
previous test being to copy the postgres data dir, wipe the
original, set the zfs recordsize to 8K (default is 128K), and then
copy the data dir back.
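
In shell terms the change was along these lines (paths and dataset
name are hypothetical, and postgres was stopped for the duration):

# recordsize only applies to newly written files, so the data has to
# be copied off the dataset and written back for it to take effect
tar -C /tank/pgsql -cf /var/tmp/pgdata.tar .
rm -rf /tank/pgsql/*
zfs set recordsize=8K tank/pgsql
tar -C /tank/pgsql -xf /var/tmp/pgdata.tar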

Things that stand out at first glance:

-at a scaling factor of 10 or greater, there is a much more gentle
 decline in TPS than with the default zfs recordsize
-on the raw *disk* IOPS graph, I now see writes peaking at around
 11K/second compared to 1.5K/second
-on the zpool iostat graph, I do not see those huge write peaks,
 which is a bit confusing
-on both iostat graphs, the datapoints look more scattered with the
 8K recordsize

Any comments are certainly welcome.  I understand an 8K recordsize
should perform better since it matches the 8K block size PostgreSQL
does its I/O in, but the effects on the system graphs are
interesting and I'm not quite following how it all relates.

I wonder if the recordsize impacts the ssd write amplification at
all...

Thanks,

Charles


> I actually did not.  A year or so ago I was doing some basic tests
> on cheap SATA drives with ZFS and at least with pgbench, I could see
> no difference at all.  I actually still have some of that info, so
> I'll include it here.  This was a 4-core Xeon E5506 (2.13GHz), 4 1TB
> WD RE3 drives in a RAIDZ1 array, 8GB RAM.
> 
> I tested three things - time to load an 8.5GB dump of one of our
> dbs, time to run through a querylog

[PERFORM] select operations that generate disk writes

2012-07-05 Thread CSS
Hello,

Time for a broad question.  I'm aware of some specific select queries that will 
generate disk writes - for example, a sort operation when there's not enough 
work_mem can cause PG to write out temporary files.  That scenario is easily 
remedied by enabling "log_temp_files" and specifying the threshold in temp file 
size at which you want logging to happen.
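
For anyone searching the archives, that's just the following (stock
FreeBSD port paths assumed):

# log every temp file of 10MB or more (the value is in kB; 0 would
# log them all, -1 disables the logging)
echo "log_temp_files = 10240" >> /usr/local/pgsql/data/postgresql.conf
pg_ctl reload -D /usr/local/pgsql/data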

I've recently been trying to put some of my recent reading of Greg's book and 
other performance-related documentation to use by seeking out queries that take 
an inordinate amount of time to run.  Given that we're usually disk-bound, I've 
gotten in the habit of running an iostat in a terminal while running and 
tweaking some of the problem queries.  I find this gives me some nice instant 
feedback on how hard the query is causing PG to hit the disks.  What's 
currently puzzling me are some selects with complex joins and sorts that 
generate some fairly large bursts of write activity while they run.  I was able 
to reduce this by increasing work_mem (client-side) to give the sorts an 
opportunity to happen in memory.  I now see no temp file writes being logged, 
and indeed the query sped up.
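
The client-side bump is just a session-level SET, along these lines
(table and column names hypothetical):

psql -d mydb -c "SET work_mem = '256MB';
                 EXPLAIN ANALYZE SELECT * FROM orders ORDER BY placed_at;"
# with enough work_mem, EXPLAIN ANALYZE reports the sort as an
# in-memory quicksort rather than an external merge on disk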

So my question is, what else can generate writes when doing read-only 
operations?  I know it sounds like a simple question, but I'm just not finding 
a concise answer anywhere.

Thanks,

Charles


[PERFORM] Anyone running Intel S3700 SSDs?

2013-03-07 Thread CSS
Considering this list is where I first learned of the Intel 320 drives (AFAIK, 
the only non-enterprise SSDs that are power-failure safe), I thought I'd see if 
any of the folks here that tend to test new stuff have got their hands on these 
yet.

I had no idea these drives were out.  They're still a bit pricey, but cheaper 
than any set of spinning drives that would give the same sort of random IO 
performance.  While trying to find a place to source some spare 300GB 320s, I 
found this review:

http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review

Of most interest to me was this:

"Along one edge of the drive Intel uses two 35V 47µF capacitors, enough to 
allow the controller to commit any data (and most non-data) to NAND in the 
event of a power failure. The capacitors in the S3700 are periodically tested 
by the controller. In the event that they fail, the controller disables all 
write buffering and throws a SMART error flag."

This is also the first new Intel drive in a long time to use an Intel 
controller rather than a SandForce (which frankly, I don't trust).

Anyone have any benchmarks to share?

Are there any other sub-$1K drives out there currently that incorporate 
power-loss protection the way this drive and the 320s do?

Thanks,

Charles



Re: [PERFORM] New server setup

2013-03-13 Thread CSS
On Mar 13, 2013, at 3:23 PM, Steve Crawford wrote:

> On 03/13/2013 09:15 AM, John Lister wrote:
>> On 13/03/2013 15:50, Greg Jaskiewicz wrote:
>>> SSDs have much shorter life than spinning drives, so what do you do when 
>>> one inevitably fails in your system ?
>> Define much shorter? I accept they have a limited number of writes, but that 
>> depends on load. You can actively monitor the drives "health" level...
> 
> What concerns me more than wear is this:
> 
> InfoWorld Article:
> http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715
> 
> Referenced research paper:
> https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault
> 
> Kind of messes with the "D" in ACID.

Have a look at this:

http://blog.2ndquadrant.com/intel_ssd_now_off_the_sherr_sh/

I'm not sure what other ssds offer this, but Intel's newest entry will, and 
it's attractively priced.

Another way we leverage SSDs that can be more reliable in the face of total SSD 
meltdown is to use them as ZFS Intent Log caches.  All the sync writes get 
handled on the SSDs.  We deploy them as mirrored vdevs, so if one fails, we're 
OK.  If both fail, we're really slow until someone can replace them.  On modest 
hardware, I was able to get about 20K TPS out of pgbench with the SSDs 
configured as ZIL and 4 10K raptors as the spinny disks.
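
Setting that up - and recovering from a dead log SSD - is only a
couple of commands (device names hypothetical):

# add a mirrored SSD log (slog) to an existing pool of spinning disks
zpool add tank log mirror da4 da5
# if one of the log SSDs dies, swap it like any other mirrored device
zpool replace tank da4 da6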

In either case, the amount of money you'd have to spend on the two-dozen or so 
SAS drives (and the controllers, enclosure, etc.) that would equal a few pairs 
of SSDs in random IO performance is non-trivial, even if you plan on 
proactively retiring your SSDs every year.

Just another take on the issue..

Charles

> 
> Cheers,
> Steve





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread CSS
On May 22, 2013, at 4:06 PM, Greg Smith wrote:

> And there are some other products with interesting price/performance/capacity 
> combinations that are also sensitive to wearout.  Seagate's hybrid drives 
> have turned interesting now that they cache writes safely for example.  
> There's no cheaper way to get 1TB with flash write speeds for small commits 
> than that drive right now.  (Test results on that drive coming soon, along 
> with my full DC S3700 review)

I am really looking forward to that.  Will you announce here or just post on 
the 2ndQuadrant blog?

Another "hybrid" solution is to run ZFS on some decent hard drives and then put 
the ZFS intent log on SSDs.  With very synthetic benchmarks, the random write 
performance is excellent.

All of these discussions about alternate storage media are great - everyone has 
different needs and there are certainly a number of deployments that can "get 
away" with spending much less money by adding some solid state storage.  
There's really an amazing number of options today…

Thanks,

Charles
