Thanks

Wouldn't it be amazing to put a 2TB NVMe card in each compute node, make one 
config change and presto! Users see a 10-fold increase in performance :) with 
95% of reads served from cache and all writes acknowledged as soon as they are 
written to cache. For writes you might want dual NVMe in RAID 1 so you're fully 
covered.
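
Back-of-the-envelope numbers for where a 10-fold figure could come from, as a 
quick Python sketch (the latencies below are illustrative assumptions, not 
measurements):

# Rough average-latency model for a client-side read cache.
# Assumed figures, for illustration only: ~0.1 ms for a local NVMe cache hit,
# ~2.0 ms for a read served by the Ceph cluster, 95% cache hit rate.
hit_latency_ms = 0.1
miss_latency_ms = 2.0
hit_rate = 0.95

avg_cached = hit_rate * hit_latency_ms + (1 - hit_rate) * miss_latency_ms
print(f"average read latency with cache: {avg_cached:.3f} ms")
print(f"speedup vs. uncached reads:      {miss_latency_ms / avg_cached:.1f}x")
# -> roughly 0.2 ms on average, i.e. about a 10x improvement on reads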



-----Original Message-----
From: Ric Wheeler [mailto:rwhee...@redhat.com] 
Sent: 27 March 2016 09:27
To: Daniel Niasoff <dan...@redactus.co.uk>; Van Leeuwen, Robert 
<rovanleeu...@ebay.com>; Jason Dillaman <dilla...@redhat.com>
Cc: ceph-users@lists.ceph.com; Mike Snitzer <snit...@redhat.com>; Joe Thornber 
<thorn...@redhat.com>
Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node.

On 03/27/2016 11:13 AM, Daniel Niasoff wrote:
> Hi Ric,
>
> But you would still have to set up a dm-cache per rbd volume, which makes it 
> difficult to manage.
>
> There needs to be a global setting either within kvm or ceph that caches 
> reads/writes before they hit the rbd device.
>
> Thanks
>
> Daniel

Correct, it is per block device - effectively it is a layer on top of the rbd 
device if you want to set up a caching layer like this.

As you mention, you can cache at other layers of the system as well.

How difficult that is to manage and assemble depends on tooling. I don't see 
doing it in kvm as really any easier than doing it under kvm, but I am a big 
believer in the need for much better tools to help manage things like this so 
that users don't see the complexity.
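
For concreteness, here is a minimal sketch (Python shelling out to standard 
rbd/LVM tooling) of what such a layer on top of the rbd device can look like 
using lvmcache, the LVM front end to dm-cache. The image name, device paths 
and cache size are hypothetical, and this is per-volume plumbing rather than a 
global kvm or ceph setting:

import subprocess

def run(cmd):
    # Echo each step before running it so the sequence is easy to follow.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Map the rbd image on the compute node (image name is hypothetical);
# assume it shows up as /dev/rbd0.
run(["rbd", "map", "rbd/vm-disk-1"])

# Build a volume group spanning the rbd device and the local NVMe.
run(["pvcreate", "/dev/rbd0", "/dev/nvme0n1"])
run(["vgcreate", "vgrbd", "/dev/rbd0", "/dev/nvme0n1"])

# Origin LV on the rbd device, cache pool on the NVMe (size is illustrative).
run(["lvcreate", "-n", "origin", "-l", "100%PVS", "vgrbd", "/dev/rbd0"])
run(["lvcreate", "--type", "cache-pool", "-L", "200G", "-n", "cpool",
     "vgrbd", "/dev/nvme0n1"])

# Attach the cache pool to the origin in writeback mode; the guest then
# uses /dev/vgrbd/origin instead of /dev/rbd0.
run(["lvconvert", "--type", "cache", "--cachepool", "vgrbd/cpool",
     "--cachemode", "writeback", "-y", "vgrbd/origin"])

Repeating that for every volume is exactly the management overhead being 
discussed, which is where better tooling would help.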

Ric

>
> -----Original Message-----
> From: Ric Wheeler [mailto:rwhee...@redhat.com]
> Sent: 27 March 2016 09:00
> To: Van Leeuwen, Robert <rovanleeu...@ebay.com>; Daniel Niasoff 
> <dan...@redactus.co.uk>; Jason Dillaman <dilla...@redhat.com>
> Cc: ceph-users@lists.ceph.com; Mike Snitzer <snit...@redhat.com>; Joe 
> Thornber <thorn...@redhat.com>
> Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node.
>
> On 03/16/2016 12:15 PM, Van Leeuwen, Robert wrote:
>>> My understanding of how a writeback cache should work is that it should 
>>> only take a few seconds for writes to be streamed onto the network and is 
>>> focussed on resolving the speed issue of small sync writes. The writes 
>>> would be bundled into larger writes that are not time sensitive.
>>>
>>> So there is potential for a few seconds data loss but compared to the 
>>> current trend of using ephemeral storage to solve this issue, it's a major 
>>> improvement.
>> I think it is a bit worse than just a few seconds of data:
>> As mentioned in the blueprint for ceph you would need some kind of ordered 
>> write-back cache that maintains checkpoints internally.
>>
>> I am not that familiar with the internals of dm-cache but I do not think it 
>> guarantees any write order.
>> E.g. By default it will bypass the cache for sequential IO.
>>
>> So I think it is very likely the "few seconds of data loss" in this case 
>> means the filesystem is corrupt and you could lose the whole thing.
>> At the very least you will need to run fsck on it and hope it can sort out 
>> all of the errors with minimal data loss.
>>
>>
>> So, for me, it seems conflicting to use persistent storage and then hope 
>> your volumes survive a power outage.
>>
>> If you can survive missing that data you are probably better off running 
>> fully from ephemeral storage in the first place.
>>
>> Cheers,
>> Robert van Leeuwen
> Hi Robert,
>
> I might be misunderstanding your point above, but dm-cache provides 
> persistent storage. It will be there when you reboot and look for data on 
> that same box.
> dm-cache is also power failure safe and tested to survive this kind of outage.
>
> If you try to look at the rbd device under dm-cache from another host, of 
> course any data that was cached on the dm-cache layer will be missing since 
> the dm-cache device itself is local to the host you wrote the data from 
> originally.
>
> In a similar way, using dm-cache for write caching (or any write cache local 
> to a client) will also mean that your data has a single point of failure 
> since that data will not be replicated out to the backing store until it is 
> destaged from cache.
>
> I would note that this is exactly the kind of write cache that is popular 
> these days on clients in front of enterprise storage arrays, so this is not 
> really uncommon.
>
> Regards,
>
> Ric
>
>
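
On Robert's write-ordering point above, here is a toy Python model (not 
dm-cache code) of why destaging dirty blocks out of order can leave the 
backing image inconsistent after a crash. It assumes a journal-style rule: the 
commit record C must only become durable after the data blocks D1 and D2 it 
covers:

import random

WRITES = ["D1", "D2", "C"]   # issue order: data blocks, then commit record

def consistent(destaged):
    # Recoverable if the commit record is absent, or present together with
    # both data blocks it covers.
    return "C" not in destaged or {"D1", "D2"} <= set(destaged)

def crashes(ordered, trials=10000):
    bad = 0
    for _ in range(trials):
        order = WRITES[:] if ordered else random.sample(WRITES, len(WRITES))
        cut = random.randint(0, len(order))   # crash after `cut` destages
        bad += not consistent(order[:cut])
    return bad

print("inconsistent crash states, ordered destage:  ", crashes(True))
print("inconsistent crash states, unordered destage:", crashes(False))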

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
