Re: [Qemu-devel] Block layer roadmap on wiki

Stefan Hajnoczi Mon, 22 Aug 2011 10:59:24 -0700

On Mon, Aug 22, 2011 at 09:27:12AM -0500, Ryan Harper wrote:
> * Stefan Hajnoczi <stefa...@gmail.com> [2011-08-22 08:35]:
> > At KVM Forum Kevin, Christoph, and I had an opportunity to get
> > together for a Block Layer BoF.  We went through the recent "roadmap"
> > mailing list thread and touched on each proposed feature.
> > 
> > Here is the block layer roadmap wiki page:
> > http://wiki.qemu.org/BlockRoadmap
> > 
> > Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
> > mentioned you want it for the next release.
> > 
> > My main take-away from the BoF was that integrating support for host
> > block devices and storage appliances will allow us to reduce the
> > amount of effort spent on image formats.  In order to make image
> > formats support the desired features and performance we end up
> > implementing much of the storage stack and file systems in userspace -
> > code that is duplicated and cannot take advantage of the existing
> > storage stack.
> 
> +1
> 
> > 
> > Storage management features are not just available in remote SAN and
> > NAS appliances anymore.  For local storage, btrfs has file-level
> > clones and thin-dev is significantly improving LVM snapshots.
> > 
> > Thin-dev is bringing a much more efficient and scalable snapshot model
> > to LVM.  This device-mapper feature will make LVM attractive for high
> > performance I/O without giving up snapshot and clone features.  It
> > also supports cloning off block devices that are not in the pool (e.g.
> > external storage, much like QEMU's backing files feature):
> > https://github.com/jthornber/linux-2.6/tree/thin-dev
> > 
> > This will not replace image formats overnight because image formats
> > are still widely used and will continue to be useful for
> > transferring and sharing disk images.  But focussing on the larger
> 
> Any thoughts on how to make this easily usable with LVM?  What if there
> were an export/import between image files and LVM?  Is that sufficient?
> Is anything like this in existence?


Forgot to mention a major advantage of a raw-oriented storage stack: we need
good support for raw + storage appliance anyway.  Users want to hook up their
SAN or NAS just like they can with other hypervisors.  Time spent on image
formats would be better spent fleshing out integration with LVM, btrfs, SAN,
NAS, and friends.

Back to import/export, it serves two purposes:
1. Efficient transport.  Uploading and downloading image files in a
   compact form that represents zero blocks efficiently and perhaps
   compresses data.
2. Compatibility with other hypervisors and external tools.  Here it's
   all about using a well-defined file format.
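
To make the first point concrete, here is a minimal Python sketch of a
compact transport encoding.  It assumes a hypothetical container (not
qcow2, VMDK, or any real format): the raw image is cut into fixed-size
blocks, all-zero blocks are recorded only in an index, and the remaining
blocks are zlib-compressed.  `pack_sparse`, `unpack_sparse`, and
`BLOCK_SIZE` are illustrative names.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical transport block size


def pack_sparse(raw, block_size=BLOCK_SIZE):
    """Encode a raw image as (length, index, payload).

    index[i] is the position of block i in payload, or None when block i is
    all zeros and is therefore not transmitted at all.
    """
    zero = bytes(block_size)
    index, payload = [], []
    for off in range(0, len(raw), block_size):
        block = raw[off:off + block_size]
        if block == zero[:len(block)]:
            index.append(None)  # zero block: one index entry, no data
        else:
            index.append(len(payload))
            payload.append(zlib.compress(block))  # optional compression
    return len(raw), index, payload


def unpack_sparse(length, index, payload, block_size=BLOCK_SIZE):
    """Rebuild the raw image; skipped blocks come back as zeros."""
    out = bytearray(length)  # starts zero-filled
    for i, slot in enumerate(index):
        if slot is not None:
            block = zlib.decompress(payload[slot])
            out[i * block_size:i * block_size + len(block)] = block
    return bytes(out)
```

On import, the same index tells the receiver which regions it can skip
entirely, provided the target volume already reads back zeros there.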

In order to pull off a raw-oriented storage stack we need to do
import/export well.  So this is an area where we have to focus.

Image streaming is a good approach for import because it allows the VM
to start instantly (even before the image is fully imported).  A
qemu-nbd process serves up the image data and we stream it into a
logical volume.
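
The streaming import can be modelled roughly as copy-on-read plus a
background copy loop.  The Python sketch below is a toy model, not
QEMU's actual streaming code: `source_read` stands in for reads served
by the qemu-nbd process, a `bytearray` stands in for the logical volume,
and the class and method names are hypothetical.  The detail worth
noticing is that guest writes must mark a block as populated so the
background loop never overwrites newer guest data with stale source data.

```python
class StreamingVolume:
    """Toy model of streaming an image into a volume while the VM runs."""

    def __init__(self, source_read, num_blocks, block_size):
        self.source_read = source_read  # callable: block number -> bytes
        self.block_size = block_size
        self.target = bytearray(num_blocks * block_size)  # the "volume"
        self.populated = [False] * num_blocks
        self.cursor = 0  # next block the background loop will look at

    def _span(self, n):
        return slice(n * self.block_size, (n + 1) * self.block_size)

    def _populate(self, n):
        if not self.populated[n]:
            self.target[self._span(n)] = self.source_read(n)
            self.populated[n] = True

    def read_block(self, n):
        # Guest read: fetch on demand, so the VM need not wait for import.
        self._populate(n)
        return bytes(self.target[self._span(n)])

    def write_block(self, n, data):
        # Guest write of a whole block: the source copy is now obsolete.
        if len(data) != self.block_size:
            raise ValueError("sketch only handles full-block writes")
        self.target[self._span(n)] = data
        self.populated[n] = True

    def stream_step(self):
        # Background work: copy the next missing block; False when finished.
        while self.cursor < len(self.populated) and self.populated[self.cursor]:
            self.cursor += 1
        if self.cursor == len(self.populated):
            return False
        self._populate(self.cursor)
        return True
```

Once `stream_step()` returns False, every block lives in the volume and
the source connection is no longer needed.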

For export we can use a FUSE file system that presents logical volumes as
image files.  That way existing applications can get at the data as if there
were real image files sitting on the file system.  Sequential reads are easy
for all formats; random reads are more difficult but should be doable for most
formats (the exception would be stream-compressed formats that are not
designed for random access).
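
To illustrate why random reads are doable for most formats, here is a
Python sketch of the lookup a FUSE read handler might perform for a
hypothetical cluster-indexed image (the table layout and the `read_at`
name are assumptions, not any particular format).  Each cluster of the
virtual disk is either unallocated or stored at a known offset, so any
(offset, length) request reduces to one table lookup per cluster
touched.  A single compressed stream has no such table, so reaching a
given offset means decompressing everything before it.

```python
def read_at(table, data, cluster_size, virtual_size, offset, length):
    """Serve a random read from a hypothetical cluster-indexed image.

    table[i] is the byte offset of cluster i inside `data`, or None if the
    cluster is unallocated (it reads as zeros).
    """
    if offset >= virtual_size:
        return b""
    end = offset + min(length, virtual_size - offset)  # clamp at disk end
    out = bytearray()
    pos = offset
    while pos < end:
        idx, within = divmod(pos, cluster_size)
        n = min(cluster_size - within, end - pos)  # stay inside this cluster
        base = table[idx]
        if base is None:
            out += bytes(n)
        else:
            out += data[base + within:base + within + n]
        pos += n
    return bytes(out)
```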

So moving to a raw-oriented storage stack does not mean we get rid of
image formats.  We still need them, but they are outside the critical I/O
path.  Their role changes because we no longer push features into the
formats.

Side note: iSCSI vs NBD came up during the BoF.  Although NBD has not
seen maintenance or activity recently, it's perfectly possible to build
on it.  The first feature we need is a flush command (so that NBD can do
non-O_DSYNC accesses for speed).  At that point we have a bare-bones
remote block protocol that can be used for migration and for connecting
up userspace image formats.  iSCSI is more complex and better suited to
permanent storage, whereas NBD is simple but perhaps not a protocol we
want to access data over for a long period of time.
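
For reference, the flush command that NBD later gained is small on the
wire.  The Python sketch below encodes requests using the layout from
the current NBD protocol specification (28-byte big-endian header: magic
0x25609513, 16-bit command flags, 16-bit type, 64-bit handle, 64-bit
offset, 32-bit length; NBD_CMD_FLUSH is type 3).  The `handle_request`
dispatcher is hypothetical: it models a server whose backing file was
opened without O_DSYNC, so writes may sit in the host page cache until a
FLUSH calls fsync().  A real server would also check that flush support
was negotiated (the NBD_FLAG_SEND_FLUSH transmission flag) and send a
reply header for each request; both are omitted here.

```python
import os
import struct

# Request magic and command types from the NBD protocol specification.
NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
NBD_CMD_DISC = 2
NBD_CMD_FLUSH = 3

# magic, command flags, type, handle, offset, length (all big-endian)
_REQUEST = struct.Struct(">IHHQQI")  # 28 bytes


def encode_request(cmd, handle, offset=0, length=0, flags=0):
    return _REQUEST.pack(NBD_REQUEST_MAGIC, flags, cmd, handle, offset, length)


def decode_request(buf):
    magic, _flags, cmd, handle, offset, length = _REQUEST.unpack(buf)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad NBD request magic")
    return cmd, handle, offset, length


def handle_request(fd, buf, payload=b""):
    """Hypothetical dispatch for a backing file opened WITHOUT O_DSYNC."""
    cmd, _handle, offset, length = decode_request(buf)
    if cmd == NBD_CMD_WRITE:
        os.pwrite(fd, payload[:length], offset)  # may stay in page cache
        return b""
    if cmd == NBD_CMD_READ:
        return os.pread(fd, length, offset)
    if cmd == NBD_CMD_FLUSH:
        os.fsync(fd)  # make all previously completed writes durable
        return b""
    raise ValueError(f"unsupported command {cmd}")
```

The design point is that one flush amortizes the cost of syncing over
many cached writes, instead of paying for synchronous I/O on every write.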

Stefan
