Re: [ceph-users] SSD journal deployment experiences

2014-09-09 Thread Christian Balzer
On Tue, 9 Sep 2014 10:57:26 -0700 Craig Lewis wrote: > On Sat, Sep 6, 2014 at 9:27 AM, Christian Balzer wrote: > > > On Sat, 06 Sep 2014 16:06:56 + Scott Laird wrote: > > > > > Backing up slightly, have you considered RAID 5 over your SSDs? > > > Practically speaking, there's no performance

Re: [ceph-users] SSD journal deployment experiences

2014-09-09 Thread Craig Lewis
On Sat, Sep 6, 2014 at 9:27 AM, Christian Balzer wrote: > On Sat, 06 Sep 2014 16:06:56 + Scott Laird wrote: > > > Backing up slightly, have you considered RAID 5 over your SSDs? > > Practically speaking, there's no performance downside to RAID 5 when > > your devices aren't IOPS-bound. > > >

Re: [ceph-users] SSD journal deployment experiences

2014-09-09 Thread Craig Lewis
On Sat, Sep 6, 2014 at 7:50 AM, Dan van der Ster wrote: > > BTW, do you happen to know, _if_ we re-use an OSD after the journal has > failed, are any object inconsistencies going to be found by a > scrub/deep-scrub? > I haven't tested this, but I did something I *think* is similar. I deleted an
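
A practical way to answer the scrub question, once a journal-failed OSD has been re-used, is to force a deep-scrub of that OSD and then repair whatever turns up inconsistent. A minimal sketch in Python -- the OSD id is an invented example, and the parsing assumes the usual "pg X.Y is ... inconsistent ..." lines from ceph health detail:

    import subprocess

    osd_id = 12  # hypothetical OSD number; substitute the re-used OSD

    # Ask the OSD to deep-scrub every PG it holds.
    subprocess.check_call(["ceph", "osd", "deep-scrub", str(osd_id)])

    # Once the scrubs have finished, pick out PGs flagged inconsistent.
    detail = subprocess.check_output(["ceph", "health", "detail"]).decode()
    bad = [line.split()[1] for line in detail.splitlines()
           if line.startswith("pg ") and "inconsistent" in line]
    print("inconsistent PGs:", bad)

    # Repair each inconsistent PG from its healthy replicas.
    for pgid in bad:
        subprocess.check_call(["ceph", "pg", "repair", pgid])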

Re: [ceph-users] SSD journal deployment experiences

2014-09-08 Thread Christian Balzer
stat. Christian > Regards, > Quenten Grasso > > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Christian Balzer Sent: Sunday, 7 September 2014 1:38 AM > To: ceph-users > Subject: Re: [ceph-users] SSD journ

Re: [ceph-users] SSD journal deployment experiences

2014-09-08 Thread Quenten Grasso
Of Christian Balzer Sent: Sunday, 7 September 2014 1:38 AM To: ceph-users Subject: Re: [ceph-users] SSD journal deployment experiences On Sat, 6 Sep 2014 14:50:20 + Dan van der Ster wrote: > September 6 2014 4:01 PM, "Christian Balzer" wrote: > > On Sat, 6 Sep 201

Re: [ceph-users] SSD journal deployment experiences

2014-09-08 Thread Dan Van Der Ster
Hi Scott, > On 06 Sep 2014, at 20:39, Scott Laird wrote: > > IOPS are weird things with SSDs. In theory, you'd see 25% of the write IOPS > when writing to a 4-way RAID5 device, since you write to all 4 devices in > parallel. Except that's not actually true--unlike HDs where an IOP is an > I

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Scott Laird
IOPS are weird things with SSDs. In theory, you'd see 25% of the write IOPS when writing to a 4-way RAID5 device, since you write to all 4 devices in parallel. Except that's not actually true--unlike HDs where an IOP is an IOP, SSD IOPS limits are really just a function of request size. Because
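
Scott's point can be made concrete with a toy model: once requests are big enough that an SSD is bandwidth-bound rather than IOPS-bound, the achievable write rate is just bandwidth divided by request size, so RAID5 parity writes cost extra bandwidth rather than fixed "IOPS slots". A sketch with placeholder figures (not datasheet numbers):

    # Placeholder figures, purely illustrative.
    bw_mb_s = 460.0           # assumed per-SSD sequential write bandwidth
    small_iops_limit = 36000  # assumed per-SSD 4k random write IOPS ceiling

    def effective_write_iops(request_kb):
        """Writes per second one SSD can sustain at a given request size."""
        bandwidth_bound = bw_mb_s * 1024.0 / request_kb
        return min(small_iops_limit, bandwidth_bound)

    for kb in (4, 64, 256, 1024):
        print("%5d kB writes: ~%d IOPS" % (kb, effective_write_iops(kb)))

    # At journal-sized sequential writes the drive is bandwidth-bound, so a
    # RAID5 full-stripe or read-modify-write costs bandwidth, not "IOPS".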

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Dan Van Der Ster
RAID5... Hadn't considered it due to the IOPS penalty (it would get 1/4th of the IOPS of separated journal devices, according to some online raid calc). Compared to RAID10, I guess we'd get 50% more capacity, but lower performance. After the anecdotes that the DC S3700 is very rarely failing, and
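
For what it's worth, the trade-off being weighed here reduces to simple arithmetic, assuming four identical journal SSDs and the textbook write penalties (4 for RAID5, 2 for RAID10); the per-SSD IOPS figure is a placeholder:

    n_ssd = 4
    per_ssd_iops = 36000  # assumed 4k random write IOPS per SSD

    layouts = {
        "separate journals": (n_ssd,     n_ssd * per_ssd_iops),       # no redundancy
        "RAID10":            (n_ssd / 2, n_ssd * per_ssd_iops // 2),
        "RAID5":             (n_ssd - 1, n_ssd * per_ssd_iops // 4),
    }
    for name, (capacity, write_iops) in layouts.items():
        print("%-17s capacity=%s drives, write IOPS~%d" % (name, capacity, write_iops))

    # RAID5 keeps 3 of the 4 drives' capacity (50% more than RAID10's 2)
    # but only ~1/4 of the aggregate write IOPS -- the trade-off above.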

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Sat, 06 Sep 2014 16:06:56 + Scott Laird wrote: > Backing up slightly, have you considered RAID 5 over your SSDs? > Practically speaking, there's no performance downside to RAID 5 when > your devices aren't IOPS-bound. > Well... For starters with RAID5 you would lose 25% throughput in bo

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Scott Laird
Backing up slightly, have you considered RAID 5 over your SSDs? Practically speaking, there's no performance downside to RAID 5 when your devices aren't IOPS-bound. On Sat Sep 06 2014 at 8:37:56 AM Christian Balzer wrote: > On Sat, 6 Sep 2014 14:50:20 + Dan van der Ster wrote: > > > Septemb

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Sat, 6 Sep 2014 14:50:20 + Dan van der Ster wrote: > September 6 2014 4:01 PM, "Christian Balzer" wrote: > > On Sat, 6 Sep 2014 13:07:27 + Dan van der Ster wrote: > > > >> Hi Christian, > >> > >> Let's keep debating until a dev corrects us ;) > > > > For the time being, I give the

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Dan van der Ster
September 6 2014 4:01 PM, "Christian Balzer" wrote: > On Sat, 6 Sep 2014 13:07:27 + Dan van der Ster wrote: > >> Hi Christian, >> >> Let's keep debating until a dev corrects us ;) > > For the time being, I give the recent: > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg1220

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Sat, 6 Sep 2014 13:07:27 + Dan van der Ster wrote: > Hi Christian, > > Let's keep debating until a dev corrects us ;) > For the time being, I give the recent: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12203.html And not so recent: http://www.spinics.net/lists/ceph-users/

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Dan van der Ster
Hi Christian, Let's keep debating until a dev corrects us ;) September 6 2014 1:27 PM, "Christian Balzer" wrote: > On Fri, 5 Sep 2014 09:42:02 + Dan Van Der Ster wrote: > >>> On 05 Sep 2014, at 11:04, Christian Balzer wrote: >>> >>> On Fri, 5 Sep 2014 07:46:12 + Dan Van Der Ster wrot

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Fri, 5 Sep 2014 09:42:02 + Dan Van Der Ster wrote: > > > On 05 Sep 2014, at 11:04, Christian Balzer wrote: > > > > On Fri, 5 Sep 2014 07:46:12 + Dan Van Der Ster wrote: > >> > >>> On 05 Sep 2014, at 03:09, Christian Balzer wrote: > >>> > >>> On Thu, 4 Sep 2014 14:49:39 -0700 Craig

Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Dan Van Der Ster
> On 05 Sep 2014, at 11:04, Christian Balzer wrote: > > > Hello Dan, > > On Fri, 5 Sep 2014 07:46:12 + Dan Van Der Ster wrote: > >> Hi Christian, >> >>> On 05 Sep 2014, at 03:09, Christian Balzer wrote: >>> >>> >>> Hello, >>> >>> On Thu, 4 Sep 2014 14:49:39 -0700 Craig Lewis wrote: >

Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Dan Van Der Ster
> On 05 Sep 2014, at 10:30, Nigel Williams wrote: > > On Fri, Sep 5, 2014 at 5:46 PM, Dan Van Der Ster > wrote: >>> On 05 Sep 2014, at 03:09, Christian Balzer wrote: >>> You might want to look into cache pools (and dedicated SSD servers with >>> fast controllers and CPUs) in your test cluster

Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Christian Balzer
Hello Dan, On Fri, 5 Sep 2014 07:46:12 + Dan Van Der Ster wrote: > Hi Christian, > > > On 05 Sep 2014, at 03:09, Christian Balzer wrote: > > > > > > Hello, > > > > On Thu, 4 Sep 2014 14:49:39 -0700 Craig Lewis wrote: > > > >> On Thu, Sep 4, 2014 at 9:21 AM, Dan Van Der Ster > >> wrote

Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Nigel Williams
On Fri, Sep 5, 2014 at 5:46 PM, Dan Van Der Ster wrote: >> On 05 Sep 2014, at 03:09, Christian Balzer wrote: >> You might want to look into cache pools (and dedicated SSD servers with >> fast controllers and CPUs) in your test cluster and for the future. >> Right now my impression is that there i
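
For anyone who hasn't looked at the cache pools being mentioned: the Firefly-era cache tiering setup comes down to a handful of ceph commands. A hedged sketch follows (pool names, PG count and the size cap are invented, and actually pinning the cache pool to SSD hosts needs a CRUSH rule that is not shown here):

    import subprocess

    def ceph(*args):
        subprocess.check_call(("ceph",) + args)

    ceph("osd", "pool", "create", "ssd-cache", "1024")      # pool for the SSD OSDs
    ceph("osd", "tier", "add", "rbd", "ssd-cache")          # attach as a tier of 'rbd'
    ceph("osd", "tier", "cache-mode", "ssd-cache", "writeback")
    ceph("osd", "tier", "set-overlay", "rbd", "ssd-cache")  # route client I/O via the cache
    ceph("osd", "pool", "set", "ssd-cache", "hit_set_type", "bloom")
    ceph("osd", "pool", "set", "ssd-cache", "target_max_bytes", str(500 * 2**30))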

Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Dan Van Der Ster
Hi Christian, > On 05 Sep 2014, at 03:09, Christian Balzer wrote: > > > Hello, > > On Thu, 4 Sep 2014 14:49:39 -0700 Craig Lewis wrote: > >> On Thu, Sep 4, 2014 at 9:21 AM, Dan Van Der Ster >> wrote: >> >>> >>> >>> 1) How often are DC S3700's failing in your deployments? >>> >> >> None

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Martin B Nielsen
On Thu, Sep 4, 2014 at 10:23 PM, Dan van der Ster wrote: > Hi Martin, > > September 4 2014 10:07 PM, "Martin B Nielsen" wrote: > > Hi Dan, > > > > We took a different approach (and our cluster is tiny compared to many > others) - we have two pools; > > normal and ssd. > > > > We use 14 disks in

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Christian Balzer
Hello, On Thu, 4 Sep 2014 14:49:39 -0700 Craig Lewis wrote: > On Thu, Sep 4, 2014 at 9:21 AM, Dan Van Der Ster > wrote: > > > > > > > 1) How often are DC S3700's failing in your deployments? > > > > None of mine have failed yet. I am planning to monitor the wear level > indicator, and preemp

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Mark Kirkwood
On 05/09/14 10:05, Dan van der Ster wrote: That's good to know. I would plan similarly for the wear out. But I want to also prepare for catastrophic failures -- in the past we've had SSDs just disappear like a device unplug. Those were older OCZ's though... Yes - the Intel dc style drives s

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Dan van der Ster
Hi Craig, September 4 2014 11:50 PM, "Craig Lewis" wrote: > On Thu, Sep 4, 2014 at 9:21 AM, Dan Van Der Ster > wrote: > >> 1) How often are DC S3700's failing in your deployments? > > None of mine have failed yet. I am planning to monitor the wear level > indicator, and preemptively > repl
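
The wear-level check described here is easy to script around smartctl; on Intel DC-series drives the relevant SMART attribute is, as far as I know, 233 Media_Wearout_Indicator (a normalised value counting down from 100). A rough sketch, with an example device path and the 10% threshold from above:

    import subprocess

    def wearout(dev):
        out = subprocess.check_output(["smartctl", "-A", dev]).decode()
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == "233":   # Media_Wearout_Indicator
                return int(fields[3])           # normalised current value
        return None

    dev = "/dev/sdb"  # example journal SSD
    value = wearout(dev)
    if value is not None and value <= 10:
        print("%s wear level %d%% -- schedule preemptive replacement" % (dev, value))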

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Craig Lewis
On Thu, Sep 4, 2014 at 9:21 AM, Dan Van Der Ster wrote: > > > 1) How often are DC S3700's failing in your deployments? > None of mine have failed yet. I am planning to monitor the wear level indicator, and preemptively replace any SSDs that go below 10%. Manually flushing the journal, replacin
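
The flush-and-replace procedure sketched above maps onto the standard ceph-osd flags. The following is only an outline -- the OSD id is invented, and the service commands and journal partition handling depend on the distro and deployment:

    import subprocess

    def run(*cmd):
        subprocess.check_call(cmd)

    osd_id = "12"  # hypothetical OSD number

    run("service", "ceph", "stop", "osd." + osd_id)    # stop the daemon first
    run("ceph-osd", "-i", osd_id, "--flush-journal")   # drain the journal into the filestore
    # ... physically swap the SSD, recreate the journal partition, and point
    # the OSD's journal symlink at the new partition ...
    run("ceph-osd", "-i", osd_id, "--mkjournal")       # initialise the new journal
    run("service", "ceph", "start", "osd." + osd_id)   # bring the OSD back up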

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Dan van der Ster
Hi Martin, September 4 2014 10:07 PM, "Martin B Nielsen" wrote: > Hi Dan, > > We took a different approach (and our cluster is tiny compared to many > others) - we have two pools; > normal and ssd. > > We use 14 disks in each osd-server; 8 platter and 4 ssd for ceph, and 2 ssd > for OS/journ

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Martin B Nielsen
Hi Dan, We took a different approach (and our cluster is tiny compared to many others) - we have two pools; normal and ssd. We use 14 disks in each osd-server; 8 platter and 4 ssd for ceph, and 2 ssd for OS/journals. We partitioned the two OS ssd as raid1 using about half the space for the OS and

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Dan van der Ster
Hi Stefan, September 4 2014 9:13 PM, "Stefan Priebe" wrote: > Hi Dan, hi Robert, > > Am 04.09.2014 21:09, schrieb Dan van der Ster: > >> Thanks again for all of your input. I agree with your assessment -- in >> our cluster we avg <3ms for a random (hot) 4k read already, but > 40ms >> for a 4k

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Robert LeBlanc
This is good to know. I just recompiled the CentOS 7 3.10 kernel to enable bcache (I doubt they patched bcache since they don't compile/enable it). When I ran Ceph in VMs on my workstation I saw oopses with bcache, but doing the bcache device and the backend device even with two co
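
As an aside (not from the thread), it is worth confirming whether the running kernel was built with bcache before reaching for a rebuild; a tiny check that assumes the usual /boot/config-<release> file exists:

    import platform

    cfg = "/boot/config-" + platform.release()   # e.g. /boot/config-3.10.0-...
    with open(cfg) as f:
        opts = dict(line.strip().split("=", 1)
                    for line in f
                    if line.startswith("CONFIG_BCACHE"))
    print(opts.get("CONFIG_BCACHE", "CONFIG_BCACHE is not set"))
    # "m" or "y" means bcache already ships with the kernel; otherwise a
    # rebuild (as described above) is needed.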

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Stefan Priebe
Hi Dan, hi Robert, Am 04.09.2014 21:09, schrieb Dan van der Ster: Thanks again for all of your input. I agree with your assessment -- in our cluster we avg <3ms for a random (hot) 4k read already, but > 40ms for a 4k write. That's why we're adding the SSDs -- you just can't run a proportioned RB

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Dan van der Ster
Thanks again for all of your input. I agree with your assessment -- in our cluster we avg <3ms for a random (hot) 4k read already, but > 40ms for a 4k write. That's why we're adding the SSDs -- you just can't run a proportioned RBD service without them. I'll definitely give bcache a try in my test setup, but more reading has kinda tempere

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Robert LeBlanc
You should be able to use any block device in a bcache device. Right now, we are OK losing one SSD and it takes out 5 OSDs. We would rather have twice the cache. Our opinion may change in the future. We wanted to keep overhead as low as possible. I think we may spend the extra on heavier du

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Dan Van Der Ster
I've just been reading the bcache docs. It's a pity the mirrored writes aren't implemented yet. Do you know if you can use an md RAID1 as a cache dev? And is the graceful failover from wb to writethrough actually working without data loss? Also, write behind sure would help the filestore, since
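
Written out, the md-RAID1-as-cache idea would look like the commands below -- untested, and whether bcache will accept an md device as its cache set is exactly the open question being asked; device names are placeholders:

    import subprocess

    def run(*cmd):
        subprocess.check_call(cmd)

    # Mirror the two journal SSDs first...
    run("mdadm", "--create", "/dev/md0", "--level=1",
        "--raid-devices=2", "/dev/sdb", "/dev/sdc")
    # ...then try offering the mirror to bcache as the cache device.
    run("make-bcache", "-C", "/dev/md0")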

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Robert LeBlanc
So far it has worked really well; we can raise/lower/disable/enable the cache in real time and watch how the load and traffic change. There have been some positive subjective results, but definitive results are still forthcoming. bcache on CentOS 7 was not easy, which makes me wish we were running Debian

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Dan Van Der Ster
Hi Robert, That's actually a pretty good idea, since bcache would also accelerate the filestore flushes and leveldb. I actually wonder if an SSD-only pool would even be faster than such a setup... probably not. We're using an ancient enterprise distro, so it will be a bit of a headache to ge

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Robert LeBlanc
We are still pretty early on in our testing of how to best use SSDs as well. What we are trying right now, for some of the reasons you mentioned already, is to use bcache as a cache for both journal and data. We have 10 spindles in our boxes with 2 SSDs. We created two bcaches (one for each SSD) an
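
A hedged sketch of the layout being described -- two cache sets (one per SSD) with the ten spindles split between them. Device names are placeholders, and the final attach step is left as a comment because the cache-set UUIDs only exist on the live system:

    import subprocess

    def run(*cmd):
        subprocess.check_call(cmd)

    ssds = ["/dev/sdk", "/dev/sdl"]                     # placeholder cache SSDs
    spindles = ["/dev/sd%s" % c for c in "abcdefghij"]  # 10 platter OSD drives

    for dev in ssds:
        run("make-bcache", "-C", dev)    # format each SSD as a cache set
    for dev in spindles:
        run("make-bcache", "-B", dev)    # format each spindle as a backing device

    # Attach five spindles to each cache set via sysfs, e.g.
    #   echo <cset-uuid-of-first-ssd> > /sys/block/bcache0/bcache/attach
    # where the UUID comes from bcache-super-show on the SSD.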