Re: [qemu-web PATCH] Add a blog post about FUSE block exports

Hanna Reitz Fri, 20 Aug 2021 02:05:06 -0700

On 19.08.21 20:22, Klaus Kiwi wrote:

On Thu, Aug 19, 2021 at 7:27 AM Hanna Reitz <hre...@redhat.com> wrote:


    This post explains when FUSE block exports are useful, how they work,
    and that it is fun to export an image file on its own path so it looks
    like your image file (in whatever format it was) is a raw image now.


Thanks Hanna, great work. Even if you explained this to me multiple times,
thanks to this I think I now finally understand *how* it works.


Oops, sorry for forgetting to CC you...

    Signed-off-by: Hanna Reitz <hre...@redhat.com>
    ---
    You can also find this patch here:
    https://gitlab.com/hreitz/qemu-web fuse-blkexport-v1

    My first patch to qemu-web, so I hope I am not doing anything overly
    stupid here (adding SVGs with extremely long lines comes to mind)...
    ---
     _posts/2021-08-18-fuse-blkexport.md       | 488 ++++++++++++++++++++++
     screenshots/2021-08-18-block-graph-a.svg  |   2 +
     screenshots/2021-08-18-block-graph-b.svg  |   2 +
     screenshots/2021-08-18-block-graph-c.svg  |   2 +
     screenshots/2021-08-18-block-graph-d.svg  |   2 +
     screenshots/2021-08-18-block-graph-e.svg  |   2 +
     screenshots/2021-08-18-root-directory.svg |   2 +
     screenshots/2021-08-18-root-file.svg      |   2 +
     8 files changed, 502 insertions(+)
     create mode 100644 _posts/2021-08-18-fuse-blkexport.md
     create mode 100644 screenshots/2021-08-18-block-graph-a.svg
     create mode 100644 screenshots/2021-08-18-block-graph-b.svg
     create mode 100644 screenshots/2021-08-18-block-graph-c.svg
     create mode 100644 screenshots/2021-08-18-block-graph-d.svg
     create mode 100644 screenshots/2021-08-18-block-graph-e.svg
     create mode 100644 screenshots/2021-08-18-root-directory.svg
     create mode 100644 screenshots/2021-08-18-root-file.svg

    diff --git a/_posts/2021-08-18-fuse-blkexport.md
    b/_posts/2021-08-18-fuse-blkexport.md
    new file mode 100644
    index 0000000..e6a55d0
    --- /dev/null
    +++ b/_posts/2021-08-18-fuse-blkexport.md
    @@ -0,0 +1,488 @@
    +---
    +layout: post
    +title:  "Exporting block devices as raw image files with FUSE"
    +date:   2021-08-18 18:00:00 +0200
    +author: Hanna Reitz
    +categories: [storage, features, tutorials]

Non-fatal, but I feel that the title doesn't summarize all that this blog post is about. An alternate suggestion might be along the lines of "A look into QEMU's FUSE export feature, and how to use it to manipulate guest images".

Hmm, I don’t know. The feature itself doesn’t really allow you to manipulate guest images, it only provides a translation layer so that other tools can do it. I can definitely replace “Exporting block devices” by “Presenting guest images”, but I’m not sure I want to go much further, actually.
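
To make the translation-layer point concrete, here is a minimal sketch of a FUSE export with the storage daemon. It assumes a QEMU build with FUSE support; `$image_path`, `$mount_point`, the export id `fuse-export` and the function name are hypothetical placeholders, the node names are borrowed from the post's example, and `$mount_point` has to be an existing regular file:

```shell
# Hypothetical placeholders: a qcow2 image and an existing regular file
# onto which the raw view of that image is exported.
image_path=${image_path:-/path/to/image.qcow2}
mount_point=${mount_point:-/path/to/raw-view}

# Start a storage daemon that presents the qcow2 image's content as a
# raw image file at $mount_point (sketch only; runs in the foreground).
present_as_raw() {
    qemu-storage-daemon \
        --blockdev node-name=prot-node,driver=file,filename="$image_path" \
        --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
        --export type=fuse,id=fuse-export,node-name=fmt-node,mountpoint="$mount_point",writable=on
}
```

While `present_as_raw` is running, the actual manipulation is still done by other tools that expect a raw image (for example `parted "$mount_point" print`); the export only translates.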

    +---
    +Sometimes, there is a VM disk image whose contents you want to
    manipulate
    +without booting the VM.  For raw images, that process is usually
    fairly simple,
    +because most Linux systems bring tools for the job, e.g.:
    +* *dd* to just copy data to and from given offsets,
    +* *parted* to manipulate the partition table,
    +* *kpartx* to present all partitions as block devices,
    +* *mount* to access filesystems’ contents.
    +
    +Sadly, but naturally, such tools only work for raw images, and
    not for images
    +e.g. in QEMU’s qcow2 format.  To access such an image’s content,
    the format has
    +to be translated to create a raw image, for example by:
    +* Exporting the image file with `qemu-nbd -c` as an NBD block
    device file,
    +* Converting between image formats using `qemu-img convert`,
    +* Accessing the image from a guest, where it appears as a normal
    block device.
    +

Guessing that this would be the best place to mention guestmount/libguestfs, as Stefan mentioned in another reply to this thread?


Yes, probably replacing the “Accessing the image from a guest” point.

Bonus points if you can identify (dis)advantages, similarly to what you did below with the other methods.
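
For readers who have not used it, a minimal sketch of the guestmount approach. It assumes the libguestfs tools are installed; `$image_path`, `$mount_point` and the two function names are hypothetical placeholders, not something taken from the post:

```shell
# Hypothetical placeholders: a guest image and a directory to mount it on.
image_path=${image_path:-/path/to/image.qcow2}
mount_point=${mount_point:-/tmp/guest-root}

# Mount the guest's filesystems read-only on $mount_point.
# -a adds the disk image, -i inspects it to find the filesystems to mount.
mount_guest_image() {
    mkdir -p "$mount_point"
    guestmount -a "$image_path" -i --ro "$mount_point"
}

# Unmount again once done.
unmount_guest_image() {
    guestunmount "$mount_point"
}
```

Unlike the methods listed in the post, this presents the filesystems' contents directly rather than a raw view of the whole disk.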

    +Unfortunately, none of these methods is perfect: `qemu-nbd -c`
    generally
    +requires root rights, converting to a temporary raw copy requires
    additional
    +disk space and the conversion process takes time, and accessing
    the image from a
    +guest is just quite cumbersome in general (and also specifically
    something that
    +we set out to avoid in the first sentence of this blog post).
    +
    +As of QEMU 6.0, there is another method, namely FUSE block exports.
    +Conceptually, these are rather similar to using `qemu-nbd -c`,
    but they do not
    +require root rights.
    +
    +**Note**: FUSE block exports are a feature that can be enabled or
    disabled
    +during the build process with `--enable-fuse` or
    `--disable-fuse`, respectively;
    +omitting either configure option will enable the feature if and
    only if libfuse3
    +is present.  It is possible that the QEMU build you are using
    does not have FUSE
    +block export support, because it was not compiled in.
    +
    +FUSE (*Filesystem in Userspace*) is a technology to let userspace
    processes
    +provide filesystem drivers.  For example, *sshfs* is a program
    that allows
    +mounting remote directories from a machine accessible via SSH.
    +

Nitpicking, but maybe FUSE here could link to another tutorial/Wikipedia page with more info?

The best I could do is link to Wikipedia, I suppose, but would that really be helpful? I think this post itself kind of provides an intro into what FUSE is.

    +QEMU can use FUSE to make a virtual block device appear as a
    normal file on the
    +host, so that tools like *kpartx* can interact with it regardless
    of the image
    +format.
    +
    +## Background information
    +
    +### File mounts

I must confess that, as I went through the document, this felt a bit like breaking the flow (probably due to my preconception of always mounting a resource into some directory to see its content, which I guess is what I was expecting to come before any talk of mounting files).

I understand now, however, that this introduction is necessary, but perhaps something like "Before we are able to use QEMU's FUSE exports, we need to clarify some fundamental concepts on the VFS and mountpoints: It is a little-known fact that <...>" would help me understand the flow better here.


Oh, sure!

    +A perhaps little-known fact is that, on Linux, filesystems do not
    need to have
    +a root directory, they only need to have a root node.  A
    filesystem that only
    +provides a single regular file is perfectly valid.
    +
    +Conceptually, every filesystem is a tree, and mounting works by
    replacing one
    +subtree of the global VFS tree by the mounted filesystem’s tree. 
    Normally, a
    +filesystem’s root node is a directory, like in the following example:
    +
    +|![Regular filesystem: Root directory is mounted to a directory
    mount point](/screenshots/2021-08-18-root-directory.svg)|
    +|:--:|
    +|*Fig. 1: Mounting a regular filesystem with a directory as its
    root node*|
    +
    +Here, the directory `/foo` and its content (the files `/foo/a`
    and `/foo/b`) are
    +shadowed by the new filesystem (showing `/foo/x` and `/foo/y`).
    +

Must confess that I wish there were a better term for it than 'shadowed directory' or 'shadowed file', avoiding potential confusion with things like /etc/shadow or 'shadow memory'... But I couldn't think of any.

    +Note that a filesystem’s root node generally has no name. After
    mounting, the
    +filesystem’s root directory’s name is determined by the original
    name of the
    +mount point.
    +
    +Because a tree does not need to have multiple nodes but may
    consist of just a
    +single leaf, a filesystem with a file for its root node works
    just as well,
    +though:
    +
    +|![Mounting a file root node to a regular file mount
    point](/screenshots/2021-08-18-root-file.svg)|
    +|:--:|
    +|*Fig. 2: Mounting a filesystem with a regular (unnamed) file as
    its root node*|
    +
    +Here, FS B only consists of a single node, a regular file with no
    name.  (As
    +above, a filesystem’s root node is generally unnamed.)
    Consequently, the mount
    +point for it must also be a regular file (`/foo/a` in our
    example), and just
    +like before, the content of `/foo/a` is shadowed, and when
    opening it, one will
    +instead see the contents of FS B’s unnamed root node.
    +
    +### QEMU block exports
    +
    +QEMU allows exporting block nodes via various protocols (as of
    6.0: NBD,
    +vhost-user, FUSE).  A block node is an element of QEMU’s block
    graph (see e.g.
    +[Managing the New Block
    Layer](http://events17.linuxfoundation.org/sites/events/files/slides/talk\_11.pdf),
    +a talk given at KVM Forum 2017), which can for example be
    attached to guest
    +devices.  Here is a very simple example:
    +
    +|![Block graph: image file <-> file node (label: prot-node) <->
    qcow2 node (label: fmt-node) <-> virtio-blk guest
    device](/screenshots/2021-08-18-block-graph-a.svg)|
    +|:--:|
    +|*Fig. 3: A simple block graph for attaching a qcow2 image to a
    virtio-blk guest device*|
    +
    +This is the simplest example for a block graph that connects a
    *virtio-blk*
    +guest device to a qcow2 image file.  The *file* block driver,
    instanced in the
    +form of a block node named *prot-node*, accesses the actual file
    and provides
    +the node above it access to the raw content.  This node above,
    named *fmt-node*,
    +is handled by the *qcow2* block driver, which is capable of
    interpreting the
    +qcow2 format.  Parents of this node will therefore see the actual
    content of the
    +virtual disk that is represented by the qcow2 image.  There is
    only one parent
    +here, which is the *virtio-blk* guest device, which will thus see
    the virtual
    +disk.
    +
    +The command line to achieve the above could look something like this:
    +```
    +$ qemu-system-x86_64 \
    +    -blockdev node-name=prot-node,driver=file,filename=$image_path \
    +    -blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
    +    -device virtio-blk,drive=fmt-node
    +```
    +
    +Besides attaching guest devices to block nodes, you can also
    export them for
    +users outside of qemu, for example via NBD.  Say you have a QMP
    channel open for
    +the QEMU instance above, then you could do this:

As much as I hate to say it, wouldn't it be better to give the example below using (legacy?) QEMU monitor commands, instead of JSON? Unless it cannot be done that way, of course; they feel more intuitive/recognizable to me, I think.

nbd_server_start exists as an HMP command, but there’s no direct equivalent of block-export-add. We do have nbd_server_add, but of note is that the nbd-server-add QMP command is deprecated.

In any case, I prefer using the JSON QMP commands here, because they map directly to the storage daemon’s command line (--nbd-server and --export).

If this is too confusing, then I’d rather jump directly to the storage daemon; but I feel like there’s value in showing that block exports work in the system emulator, too.
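
A sketch of that mapping, reusing the node names and NBD parameters from the post's example. `$image_path` and the function name are hypothetical placeholders, and the exact option spelling should be checked against the storage daemon documentation for the QEMU version in use:

```shell
# Hypothetical placeholder; point this at your own qcow2 image.
image_path=${image_path:-/path/to/image.qcow2}

# Storage-daemon counterpart of the two QMP commands in the post:
# --nbd-server mirrors nbd-server-start, --export mirrors block-export-add.
export_over_nbd() {
    qemu-storage-daemon \
        --blockdev node-name=prot-node,driver=file,filename="$image_path" \
        --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
        --nbd-server addr.type=inet,addr.host=localhost,addr.port=10809 \
        --export type=nbd,id=fmt-node-export,node-name=fmt-node,name=guest-disk
}
```

Calling `export_over_nbd` keeps the daemon in the foreground until it is terminated; each dotted key (`addr.type`, `addr.host`, `addr.port`) corresponds to one level of nesting in the JSON arguments.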


    +```json
    +{
    +    "execute": "nbd-server-start",
    +    "arguments": {
    +        "addr": {
    +            "type": "inet",
    +            "data": {
    +                "host": "localhost",
    +                "port": "10809"
    +            }
    +        }
    +    }
    +}
    +{
    +    "execute": "block-export-add",
    +    "arguments": {
    +        "type": "nbd",
    +        "id": "fmt-node-export",
    +        "node-name": "fmt-node",
    +        "name": "guest-disk"
    +    }
    +}
    +```


[...]

The rest of it is very didactic and educational - thanks! And since none of my comments are critical:

Reviewed-by: Klaus Heinrich Kiwi <kk...@redhat.com>


Thanks!

Hanna
