> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Gregory Farnum
> Sent: 29 March 2016 18:52
> To: Nick Fisk <n...@fisk.me.uk>
> Cc: Ceph Users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Redirect snapshot COW to alternative pool
> 
> On Sat, Mar 26, 2016 at 3:13 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> >
> > Evening All,
> >
> > I’ve been testing the RBD snapshot functionality, and one thing I have
> > seen is that once you take a snapshot of an RBD and perform small random IO
> > on the original RBD, performance is really bad due to the amount of write
> > amplification going on doing the COWs, i.e. every IO to the parent, no
> > matter what size, equals 12MB of writes.
> >
> > I was wondering if there was any way to redirect these writes to a
> > different pool. Since only a small capacity would be required, an SSD/NVMe
> > pool could be provisioned very cheaply and would hopefully provide enough
> > performance to allow IO operations to the parent to continue unaffected.
> >
> > I’ve looked at RBD layering, which looks like it lets you do things like
> > this and also change the order. But it seems you have to base it on an
> > existing snapshot, so I believe I would still have the same problem. Or is
> > there a “hidden feature” to make normal snapshots use this layering
> > functionality?
> >
> > Nick
> 

Thanks for your response, Greg.

> This isn't quite making sense to me. When you do a snapshot, then as you
> say it's copy-on-write and every operation copies the data to new blocks
> (whole-object copies with XFS; mere local blocks with btrfs) inside of the
> OSD. 

I think this is where I see slow performance. If you are doing large IO, then 
copying 4MB objects (assuming defaults) is maybe only 2x the original IO to 
the disk. However, if you are doing smaller IO, then from what I can see a 
single 4kB write would lead to a 4MB object being copied for the snapshot; 
with 3x replication, this could be amplification in the thousands. Is my 
understanding correct here? It's certainly what I see.
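
To put rough numbers on that worst case, here is a back-of-the-envelope 
sketch (my own assumed figures, not a measurement):

```python
# Worst-case amplification for a small write that hits a snapshotted
# 4MB object whose COW copy hasn't happened yet (assumed figures).
OBJECT_SIZE = 4 * 1024 * 1024   # default RBD object size, 4MB
REPLICATION = 3                 # 3x replicated pool
io_size = 4 * 1024              # one 4kB client write

# The OSD copies the whole object before applying the write, and every
# byte lands on all three replicas.
bytes_written = (OBJECT_SIZE + io_size) * REPLICATION
amplification = bytes_written / io_size

print(f"{bytes_written / (1024 * 1024):.1f} MB written for one 4kB IO")
print(f"amplification ~{amplification:.0f}x")
```

Which lines up with the ~12MB per IO I mentioned originally.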

> With RBD layering, you do whole-object copy-on-write from the client.
> Doing it from the client does let you put "child" images inside of a
> faster pool, yes. But creating new objects doesn't make the *old* ones
> slow, so why do you think there's still the same problem? (Other than
> "the pool is faster" being perhaps too optimistic about the improvement
> you'd get under this workload.)

From reading the RBD layering docs, it looked like you could also specify a 
different object size for the target. If there were some way for the snapshot 
to have a different object size, or some sort of dirty bitmap, this would 
reduce the amount of data that has to be copied on each write.
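
For illustration, a sketch of how the COW copy unit would shrink with the 
object order (object size = 2**order bytes; 22 is the default, the smaller 
orders here are hypothetical):

```python
# Copied data per first-write-after-snapshot, as a function of RBD
# object order (object size = 2**order bytes), with 3x replication.
REPLICATION = 3
for order in (22, 20, 17):              # 4MB, 1MB, 128kB objects
    obj_size = 2 ** order
    copied = obj_size * REPLICATION     # one whole-object copy, replicated
    print(f"order {order}: {obj_size // 1024:>5} kB object -> "
          f"{copied // 1024:>6} kB copied per COW")
```

(With the rbd CLI, this corresponds to the --order option at image creation 
time, if I'm reading the docs right.)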

What I meant about it slowing down the pool is that, due to the extra 4MB copy 
writes, the maximum small IO you can do is dramatically reduced, as each small 
IO is now a 4MB IO. By shifting the COW to a different pool, you could reduce 
the load on the primary pool and the effect on primary workloads. You are 
effectively shifting this snapshot "tax" onto an isolated set of disks/SSDs.

To give it some context, here is the background of what I am trying to 
achieve. We are currently migrating our OLB service from LVM thin pools to 
Ceph. As part of the service we offer, we take regular archive backups to tape 
and also offer DR tests. Both of these require snapshots, to allow the normal 
OLB backups to continue uninterrupted, and these snapshots can potentially be 
left open for several days at a time. As it's OLB, as you can imagine, there 
is a lot of write IO.

Currently with LVM, although there is a slight performance hit, the chunk size 
in LVM roughly matches the average IO size (128-512kB), so the COW process 
doesn't cause much overhead. When I did some quick fio tests with Ceph, it 
seemed to have a much greater knock-on effect when using 4MB-object RBDs.
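
As a crude comparison of the two COW units against a 256kB average IO (my 
assumed midpoint of the 128-512kB range above):

```python
# Extra data copied per average IO under COW: LVM chunk vs Ceph object.
avg_io = 256 * 1024            # assumed average IO size
lvm_chunk = 256 * 1024         # LVM snapshot chunk roughly matching the IO
ceph_object = 4 * 1024 * 1024  # default RBD object size

lvm_factor = lvm_chunk / avg_io      # copies ~1x the IO size
ceph_factor = ceph_object / avg_io   # copies 16x the IO size
print(f"LVM COW copies {lvm_factor:.0f}x the IO, "
      f"Ceph COW copies {ceph_factor:.0f}x")
```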

We can probably work around this by having a cluster with more disks, or by 
reducing the RBD object size, but I thought it was worth asking in case there 
was another way round it.

Nick

> There's definitely nothing integrated into the Ceph codebase
> about internal layering, or a way to redirect snapshots outside of the OSD,
> though you could always experiment with flashcache et al.
> -Greg
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
