Hello,

On Sun, 22 Jun 2014 23:27:01 -0700 Greg Poirier wrote:

> How does RBD cache work? I wasn't able to find an adequate explanation in
> the docs.
>
The mailing list archive is your friend; I asked pretty much the same
question back in January.

In short, it mimics the cache of a typical hard disk: at its default
settings it is of a similar size, and it comes with the same gotchas (it
needs to be flushed at the right times, which any non-ancient OS will do).
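
If memory serves (do check the docs before trusting these numbers), the
defaults are in this ballpark:
---
[client]
rbd cache size = 33554432          # 32MB
rbd cache max dirty = 25165824     # 24MB
rbd cache target dirty = 16777216  # 16MB
rbd cache max dirty age = 1.0      # seconds
---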

However keep reading below.
 
> On Sunday, June 22, 2014, Mark Kirkwood <mark.kirkw...@catalyst.net.nz>
> wrote:
> 
> > Good point, I had neglected to do that.
> >
> > So, amending my ceph.conf [1]:
> >
> > [client]
> > rbd cache = true
> > rbd cache size = 2147483648

To any Inktank engineer reading this: I really wish we could use K/M/G
suffixes instead of having to whip out a calculator every time we set
values like these in Ceph.
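Right now getting at the 2GB value above means something like this:
---
$ echo $((2 * 1024 * 1024 * 1024))
2147483648
---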

> > rbd cache max dirty = 1073741824
> > rbd cache max dirty age = 100
> >

Mark, you're giving it a 2GB cache for a write test that is 1GB in size.
"Aggressively set" is a bit of an understatement here. ^o^
Most people will not want to spend this much memory on write-only caching.
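
Back of the envelope, using the dd line further down (bs=16k, count=65535):
---
65535 * 16384 bytes = 1073725440 bytes (just under 1GB)
rbd cache max dirty = 1073741824 bytes (exactly 1GB)
---
So the whole test fits into the dirty cache.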

Of course with these settings that test will yield impressive results. 

However, if you observe your storage nodes (the OSDs), you will see that
it still takes just as long until the data is actually written to disk.
The same goes for kernelspace RBD with caching enabled in the VM.
Doing similar tests with fio I managed to fill the cache and got fantastic
IOPS, but it then took minutes to finally drain.
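
Something along these lines (from memory, the job name and file name are
just placeholders):
---
# inside the VM: a direct sequential write that fits into a 1GB dirty cache
fio --name=cachefill --filename=/mnt/vol1/scratch/fio.dat \
    --rw=write --bs=16k --size=1g --direct=1 --ioengine=libaio --iodepth=16
---
Meanwhile "iostat -x 5" on the storage nodes shows the OSD disks still
grinding away long after fio has reported its (cached) numbers.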

This resulted in hung task warnings for the jbd2 process(es), like this:
---
May 28 16:58:56 tvm-03 kernel: [  960.320182] INFO: task jbd2/vda1-8:153 blocked
 for more than 120 seconds.
May 28 16:58:56 tvm-03 kernel: [  960.320866] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
---

Now this doesn't actively break things AFAICT, but it left me feeling
quite uncomfortable nevertheless. 

Also, what happens if something "bad" happens to the VM or its host before
the cache is drained?
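
For a planned shutdown a flush from inside the guest should push everything
out first; as far as I know cache='writeback' passes the guest's flush
requests through to librbd:
---
# inside the VM, before shutting it down: flush filesystem buffers, which
# should also drain the librbd cache
sync
---
An unclean crash of the host is another matter, just as with a real disk's
write cache.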

From where I'm standing, the RBD cache is fine for merging really small
writes, and that's it.

Regards,

Christian
> > and also the VM's xml def to include cache to writeback:
> >
> >     <disk type='network' device='disk'>
> >       <driver name='qemu' type='raw' cache='writeback' io='native'/>
> >       <auth username='admin'>
> >         <secret type='ceph'
> > uuid='cd2d3ab1-2d31-41e0-ab08-3d0c6e2fafa0'/> </auth>
> >       <source protocol='rbd' name='rbd/vol1'>
> >         <host name='192.168.1.64' port='6789'/>
> >       </source>
> >       <target dev='vdb' bus='virtio'/>
> >       <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
> > function='0x0'/>
> >     </disk>
> >
> > Retesting from inside the VM:
> >
> > $ dd if=/dev/zero of=/mnt/vol1/scratch/file bs=16k count=65535 oflag=direct
> > 65535+0 records in
> > 65535+0 records out
> > 1073725440 bytes (1.1 GB) copied, 8.1686 s, 131 MB/s
> >
> > Which is much better, so certainly for the librbd case enabling the rbd
> > cache seems to nail this particular issue.
> >
> > Regards
> >
> > Mark
> >
> > [1] possibly somewhat aggressively set, but at least a noticeable
> > difference :-)
> >
> > On 22/06/14 19:02, Haomai Wang wrote:
> >
> >> Hi Mark,
> >>
> >> Do you enable rbdcache? I test on my ssd cluster(only one ssd), it
> >> seemed ok.
> >>
> >>  dd if=/dev/zero of=test bs=16k count=65536 oflag=direct
> >>>
> >>
> >> 82.3MB/s
> >>
> >>
> >>
> >


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
