On Fri, 21 Jun 2013, Alex Bligh wrote:
> Sage,
>
> --On 20 June 2013 08:58:19 -0700 Sage Weil <s...@inktank.com> wrote:
>
> > > I'd like to hear from Ceph folks what their position on kernel rbd vs
> > > librados is. Which one do they recommend for QEMU guests and what are
> > > the pros/cons?
> >
> > I agree that a flashcache/bcache-like persistent cache would be a big win
> > for qemu + rbd users.
>
> Great.
>
> I think Stefan was really after testing my received wisdom that
> ceph+librbd will give greater performance than ceph+blkdev+kernelrbd
> (even without the persistent cache), and if so why.
Oh, right. At this point the performance differential is strictly related
to the cache behavior. If there were feature parity, I would not expect
any significant difference. There may be a net difference of a data copy,
but I'm not sure it will be significant.

> > There are a few important issues with librbd vs kernel rbd:
> >
> > * librbd tends to get new features more quickly than the kernel rbd
> >   (although now that layering has landed in 3.10 this will be less
> >   painful than it was).
> >
> > * Using kernel rbd means users need bleeding edge kernels, a non-starter
> >   for many orgs that are still running things like RHEL. Bug fixes are
> >   difficult to roll out, etc.
> >
> > * librbd has an in-memory cache that behaves similarly to an HDD's cache
> >   (e.g., it forces writeback on flush). This improves performance
> >   significantly for many workloads. Of course, having a bcache-like
> >   layer mitigates this..
> >
> > I'm not really sure what the best path forward is. Putting the
> > functionality in qemu would benefit lots of other storage backends,
> > putting it in librbd would capture various other librbd users (xen, tgt,
> > and future users like hyper-v), and using new kernels works today but
> > creates a lot of friction for operations.
>
> To be honest I'd not even thought of putting it in librbd (which might
> be simpler). I suspect it might be easier to get patches into librbd
> than into qemu, and that ensuring cache coherency might be simpler.
> If I get time to look at this, would you be interested in taking patches
> for this?

Certainly! It will be a bit tricky to integrate in a lightweight way,
however, so I would be sure to sketch out a design before diving too far
into the coding. I suspect the best path forward would be to extend the
ObjectCacher. This has the added benefit that ceph-fuse and libcephfs
could benefit as well.

The dev summit for the next release (emperor) is coming up in a few
weeks.. this would be a great project to submit a blueprint for so we can
discuss it then.

sage
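
As background on the two paths being compared in this thread: with kernel
rbd the image is mapped to a block device ("rbd map <pool>/<image>" giving
something like /dev/rbd0) and qemu is pointed at that device, while with
librbd qemu opens the image in userspace and gets the in-memory cache
described above. The sketch below is only an illustration of the librbd
path and of where the client-side cache option lives; the pool name "rbd"
and image name "myimage" are made up for the example, and nothing here is
part of any proposed patch.

    /* Minimal sketch: open an RBD image through librbd (the userspace path
     * qemu uses) with the client-side cache enabled.  Build with:
     *   gcc rbd_example.c -o rbd_example -lrbd -lrados
     */
    #include <stdio.h>
    #include <rados/librados.h>
    #include <rbd/librbd.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        rbd_image_t image;
        char buf[4096];

        if (rados_create(&cluster, "admin") < 0)
            return 1;
        rados_conf_read_file(cluster, NULL);   /* default /etc/ceph/ceph.conf */

        /* The in-memory writeback cache discussed above; with the second
         * option it stays in writethrough mode until the first flush. */
        rados_conf_set(cluster, "rbd_cache", "true");
        rados_conf_set(cluster, "rbd_cache_writethrough_until_flush", "true");

        if (rados_connect(cluster) < 0)
            goto out_cluster;
        if (rados_ioctx_create(cluster, "rbd", &io) < 0)     /* pool (illustrative) */
            goto out_cluster;
        if (rbd_open(io, "myimage", &image, NULL) < 0)       /* image (illustrative) */
            goto out_ioctx;

        if (rbd_read(image, 0, sizeof(buf), buf) < 0)
            fprintf(stderr, "read failed\n");
        rbd_flush(image);   /* flush forces writeback of anything dirty */

        rbd_close(image);
    out_ioctx:
        rados_ioctx_destroy(io);
    out_cluster:
        rados_shutdown(cluster);
        return 0;
    }

The kernel rbd path, by contrast, is just "rbd map" plus an ordinary block
device, so a comparable cache has to come from something layered on top
(bcache/flashcache), which is the gap this thread is about.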