On 11/05/2015 11:03 PM, Michal Kozanecki wrote:
> Why did you guys go with partitioning the SSD for the Ceph journals instead of 
> just using the whole SSD for bcache and leaving the journal on the filesystem 
> (which itself sits on top of bcache)? Was there really a benefit to separating 
> the journals from the bcache-fronted HDDs?
> 
> I ask because it has been shown in the past that separating the journal 
> doesn't really do much for SSD-based pools.
> 

Well, the journal I/O bypasses bcache completely in this case. We figured the
less code the I/O has to travel through, the better.
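
To illustrate (a rough sketch; the exact partition each journal symlink
resolves to differs per OSD): the OSD data directory is mounted from the
bcache device, while the journal points straight at a raw partition on the
S3700, so journal writes never touch bcache. On a host that can be checked
with something like:

root@ceph11:~# mount | grep ceph-60
root@ceph11:~# readlink -f /var/lib/ceph/osd/ceph-60/journal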

We didn't try putting the journal on bcache. This setup works for us, so we
didn't bother testing anything different.

Wido

> Michal Kozanecki | Linux Administrator | mkozane...@evertz.com
> 
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
> den Hollander
> Sent: October-28-15 5:49 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph OSDs with bcache experience
> 
> 
> 
> On 21-10-15 15:30, Mark Nelson wrote:
>>
>>
>> On 10/21/2015 01:59 AM, Wido den Hollander wrote:
>>> On 10/20/2015 07:44 PM, Mark Nelson wrote:
>>>> On 10/20/2015 09:00 AM, Wido den Hollander wrote:
>>>>> Hi,
>>>>>
>>>>> In the "newstore direction" thread on ceph-devel I wrote that I'm 
>>>>> using bcache in production and Mark Nelson asked me to share some details.
>>>>>
>>>>> Bcache is now running in two clusters that I manage, but I'll keep 
>>>>> this information to one of them (the one at PCextreme behind CloudStack).
>>>>>
>>>>> This cluster has been running for over 2 years now:
>>>>>
>>>>> epoch 284353
>>>>> fsid 0d56dd8f-7ae0-4447-b51b-f8b818749307
>>>>> created 2013-09-23 11:06:11.819520
>>>>> modified 2015-10-20 15:27:48.734213
>>>>>
>>>>> The system consists of 39 hosts:
>>>>>
>>>>> 2U SuperMicro chassis:
>>>>> * 80GB Intel SSD for OS
>>>>> * 240GB Intel S3700 SSD for Journaling + Bcache
>>>>> * 6x 3TB disk
>>>>>
>>>>> This isn't the newest hardware. The next batch of hardware will have 
>>>>> more disks per chassis, but this is what we have for now.
>>>>>
>>>>> All systems were installed with Ubuntu 12.04, but they are all 
>>>>> running
>>>>> 14.04 now with bcache.
>>>>>
>>>>> The Intel S3700 SSD is partitioned with a GPT label:
>>>>> - 5GB Journal for each OSD
>>>>> - 200GB Partition for bcache
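>>>>>
>>>>> Roughly, that layout was created along these lines (a sketch from memory; 
>>>>> device names and partition numbers are just examples, and the single 
>>>>> cache set is shared by all six backing disks):
>>>>>
>>>>> # one 5GB journal partition per OSD plus a ~200GB cache partition
>>>>> sgdisk -n 1:0:+5G -c 1:"journal-1" /dev/sdg
>>>>> # ... repeat for journal partitions 2 through 6 ...
>>>>> sgdisk -n 7:0:+200G -c 7:"bcache" /dev/sdg
>>>>> # format the cache partition and the six backing disks for bcache
>>>>> make-bcache -C /dev/sdg7
>>>>> make-bcache -B /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
>>>>> # attach each backing device to the cache set (repeat for bcache1..5,
>>>>> # cset UUID taken from bcache-super-show /dev/sdg7)
>>>>> echo <cset-uuid> > /sys/block/bcache0/bcache/attach
>>>>> # each /dev/bcacheN is then formatted and mounted as an OSD data dir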
>>>>>
>>>>> root@ceph11:~# df -h|grep osd
>>>>> /dev/bcache0    2.8T  1.1T  1.8T  38% /var/lib/ceph/osd/ceph-60
>>>>> /dev/bcache1    2.8T  1.2T  1.7T  41% /var/lib/ceph/osd/ceph-61
>>>>> /dev/bcache2    2.8T  930G  1.9T  34% /var/lib/ceph/osd/ceph-62
>>>>> /dev/bcache3    2.8T  970G  1.8T  35% /var/lib/ceph/osd/ceph-63
>>>>> /dev/bcache4    2.8T  814G  2.0T  30% /var/lib/ceph/osd/ceph-64
>>>>> /dev/bcache5    2.8T  915G  1.9T  33% /var/lib/ceph/osd/ceph-65
>>>>> root@ceph11:~#
>>>>>
>>>>> root@ceph11:~# lsb_release -a
>>>>> No LSB modules are available.
>>>>> Distributor ID:    Ubuntu
>>>>> Description:    Ubuntu 14.04.3 LTS
>>>>> Release:    14.04
>>>>> Codename:    trusty
>>>>> root@ceph11:~# uname -r
>>>>> 3.19.0-30-generic
>>>>> root@ceph11:~#
>>>>>
>>>>> "apply_latency": {
>>>>>       "avgcount": 2985023,
>>>>>       "sum": 226219.891559000
>>>>> }
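>>>>>
>>>>> (That works out to roughly 226219.89 s / 2985023 ops, i.e. about 76 ms 
>>>>> average apply latency over the lifetime of the counters. The numbers 
>>>>> come from the OSD admin socket, with something like:)
>>>>>
>>>>> ceph daemon osd.60 perf dump | grep -A 3 apply_latency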
>>>>>
>>>>> What did we notice?
>>>>> - Less spikes on the disk
>>>>> - Lower commit latencies on the OSDs
>>>>> - Almost no 'slow requests' during backfills
>>>>> - Cache-hit ratio of about 60%
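>>>>>
>>>>> (The hit ratio can be read from bcache's own sysfs statistics, e.g.:)
>>>>>
>>>>> cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio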
>>>>>
>>>>> Max backfills and recovery active are both set to 1 on all OSDs.
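>>>>>
>>>>> (In ceph.conf terms that corresponds to roughly:)
>>>>>
>>>>> [osd]
>>>>> osd max backfills = 1
>>>>> osd recovery max active = 1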
>>>>>
>>>>> For the next generation of hardware we are looking into 3U chassis 
>>>>> with 16x 4TB SATA drives and a 1.2TB NVMe SSD for bcache, but we 
>>>>> haven't tested those yet, so there is nothing to say about them.
>>>>>
>>>>> The current setup is 200GB of cache for 18TB of disks, roughly a 1% 
>>>>> cache-to-data ratio. The new setup will be 1200GB for 64TB, close to 
>>>>> 2%, so we are curious to see what that does.
>>>>>
>>>>> Our main conclusion, however, is that it smooths out the I/O pattern 
>>>>> towards the disks, which gives an overall better response from the 
>>>>> disks.
>>>>
>>>> Hi Wido, thanks for the big writeup!  Did you guys happen to do any 
>>>> benchmarking?  I think Xiaoxi looked at flashcache a while back but 
>>>> had mixed results if I remember right.  It would be interesting to 
>>>> know how bcache is affecting performance in different scenarios.
>>>>
>>>
>>> No, we didn't do any benchmarking. Initially this cluster was built 
>>> for just the RADOS Gateway, so we went for 2Gbit (2x 1Gbit) per 
>>> machine. 90% is still Gbit networking and we are in the process of 
>>> upgrading it all to 10Gbit.
>>>
>>> Since the 1Gbit network latency is about 4 times higher than 10Gbit, 
>>> we aren't really benchmarking the cluster.
>>>
>>> What counts for us most is that we can do recovery operations without 
>>> any slow requests.
>>>
>>> Before bcache we saw disks spike to 100% busy while a backfill was 
>>> running. Now bcache smooths this out and we see peaks of maybe 70%, 
>>> but that's it.
>>
>> In the testing I was doing to figure out our new lab hardware, I also 
>> saw SSDs handle recovery dramatically better than spinning disks during 
>> ceph_test_rados runs.  It might be worth digging in to see what the IO 
>> patterns look like.  In the meantime, though, it's very interesting 
>> that bcache helps so much in this case.  Good to know!
>>
> 
> To add to this: we still had to enable the hashpspool flag on a few pools, so 
> we did. Degradation on the cluster went to 39% and it has been recovering for 
> over 48 hours now.
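> 
> (Enabling the flag is a one-liner per pool; the pool name below is just an 
> example, and setting the flag is what kicks off the data movement:)
> 
> ceph osd pool set rbd hashpspool true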
> 
> Not a single slow request while we had the OSD complaint time set to 5 
> seconds. After setting this to 0.5 seconds we saw some slow requests, but 
> nothing dramatic.
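> 
> (The threshold in question is osd_op_complaint_time, which can be changed at 
> runtime with something like:)
> 
> ceph tell osd.\* injectargs '--osd_op_complaint_time 0.5'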
> 
> For us, bcache works really well with spinning disks.
> 
> Wido
> 
>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>>
>>>>> Wido
>>>>>
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
