On 19.08.21 20:22, Klaus Kiwi wrote:
On Thu, Aug 19, 2021 at 7:27 AM Hanna Reitz <hre...@redhat.com> wrote:
This post explains when FUSE block exports are useful, how they work,
and that it is fun to export an image file on its own path so it looks
like your image file (in whatever format it was) is a raw image now.
Thanks Hanna, great work. Even if you explained this to me multiple times,
thanks to this I think I now finally understand *how* it works.
Oops, sorry for forgetting to CC you...
Signed-off-by: Hanna Reitz <hre...@redhat.com>
---
You can also find this patch here:
https://gitlab.com/hreitz/qemu-web fuse-blkexport-v1
My first patch to qemu-web, so I hope I am not doing anything overly
stupid here (adding SVGs with extremely long lines comes to mind)...
---
_posts/2021-08-18-fuse-blkexport.md | 488 ++++++++++++++++++++++
screenshots/2021-08-18-block-graph-a.svg | 2 +
screenshots/2021-08-18-block-graph-b.svg | 2 +
screenshots/2021-08-18-block-graph-c.svg | 2 +
screenshots/2021-08-18-block-graph-d.svg | 2 +
screenshots/2021-08-18-block-graph-e.svg | 2 +
screenshots/2021-08-18-root-directory.svg | 2 +
screenshots/2021-08-18-root-file.svg | 2 +
8 files changed, 502 insertions(+)
create mode 100644 _posts/2021-08-18-fuse-blkexport.md
create mode 100644 screenshots/2021-08-18-block-graph-a.svg
create mode 100644 screenshots/2021-08-18-block-graph-b.svg
create mode 100644 screenshots/2021-08-18-block-graph-c.svg
create mode 100644 screenshots/2021-08-18-block-graph-d.svg
create mode 100644 screenshots/2021-08-18-block-graph-e.svg
create mode 100644 screenshots/2021-08-18-root-directory.svg
create mode 100644 screenshots/2021-08-18-root-file.svg
diff --git a/_posts/2021-08-18-fuse-blkexport.md b/_posts/2021-08-18-fuse-blkexport.md
new file mode 100644
index 0000000..e6a55d0
--- /dev/null
+++ b/_posts/2021-08-18-fuse-blkexport.md
@@ -0,0 +1,488 @@
+---
+layout: post
+title: "Exporting block devices as raw image files with FUSE"
+date: 2021-08-18 18:00:00 +0200
+author: Hanna Reitz
+categories: [storage, features, tutorials]
Non-fatal, but I feel that the title doesn't summarize all that this
blog post is about.
An alternate suggestion might be along the lines of "A look into QEMU's
FUSE export feature, and how to use it to manipulate guest images".
Hmm, I don’t know. The feature itself doesn’t really allow you to
manipulate guest images, it only provides a translation layer so that
other tools can do it. I can definitely replace “Exporting block
devices” by “Presenting guest images”, but I’m not sure I want to go
much further, actually.
+---
+Sometimes, there is a VM disk image whose contents you want to manipulate
+without booting the VM. For raw images, that process is usually fairly simple,
+because most Linux systems bring tools for the job, e.g.:
+* *dd* to just copy data to and from given offsets,
+* *parted* to manipulate the partition table,
+* *kpartx* to present all partitions as block devices,
+* *mount* to access filesystems’ contents.
+
+Sadly, but naturally, such tools only work for raw images, and not for images
+e.g. in QEMU’s qcow2 format. To access such an image’s content, the format has
+to be translated to create a raw image, for example by:
+* Exporting the image file with `qemu-nbd -c` as an NBD block device file,
+* Converting between image formats using `qemu-img convert`,
+* Accessing the image from a guest, where it appears as a normal block device.
+
Guessing that this would be the best place to mention
guestmount/libguestfs, as Stefan
mentioned in another reply to this thread?
Yes, probably replacing the “Accessing the image from a guest” point.
Bonus points if you can identify (dis)advantages, similar to what you did
below for the other methods.
+Unfortunately, none of these methods is perfect: `qemu-nbd -c` generally
+requires root rights, converting to a temporary raw copy requires additional
+disk space and the conversion process takes time, and accessing the image from a
+guest is just quite cumbersome in general (and also specifically something that
+we set out to avoid in the first sentence of this blog post).
+
+As of QEMU 6.0, there is another method, namely FUSE block exports.
+Conceptually, these are rather similar to using `qemu-nbd -c`, but they do not
+require root rights.
+
+**Note**: FUSE block exports are a feature that can be enabled or disabled
+during the build process with `--enable-fuse` or `--disable-fuse`, respectively;
+omitting either configure option will enable the feature if and only if libfuse3
+is present. It is possible that the QEMU build you are using does not have FUSE
+block export support, because it was not compiled in.
+
+FUSE (*Filesystem in Userspace*) is a technology to let userspace processes
+provide filesystem drivers. For example, *sshfs* is a program that allows
+mounting remote directories from a machine accessible via SSH.
+
Nitpicking but maybe FUSE here could link to another
tutorial/wikipedia page
with more info?
The best I could do is link to Wikipedia, I suppose, but would that
really be helpful? I think this post itself kind of provides an intro
into what FUSE is.
+QEMU can use FUSE to make a virtual block device appear as a normal file on the
+host, so that tools like *kpartx* can interact with it regardless of the image
+format.
+
+## Background information
+
+### File mounts
I must confess that, as I went through the document, this felt a bit like
breaking the flow (probably due to my preconception of always mounting a
resource into some directory to see its content, which I guess is what I was
expecting this to cover before talking about mounting files).
I understand now, however, that this introduction is necessary, but perhaps
something like "Before we are able to use QEMU's FUSE exports, we need to
clarify some fundamental concepts on the VFS and mountpoints: It is a
little-known fact that <...>" would help me understand the flow better here.
Oh, sure!
+A perhaps little-known fact is that, on Linux, filesystems do not need to have
+a root directory, they only need to have a root node. A filesystem that only
+provides a single regular file is perfectly valid.
+
+Conceptually, every filesystem is a tree, and mounting works by replacing one
+subtree of the global VFS tree by the mounted filesystem’s tree. Normally, a
+filesystem’s root node is a directory, like in the following example:
+
+||
+|:--:|
+|*Fig. 1: Mounting a regular filesystem with a directory as its root node*|
+
+Here, the directory `/foo` and its content (the files `/foo/a` and `/foo/b`) are
+shadowed by the new filesystem (showing `/foo/x` and `/foo/y`).
+
Must confess that I wish there were a better term for it than 'shadowed
directory' or 'shadowed file', avoiding potential confusion with things like
/etc/shadow or 'shadow memory'. But I couldn't think of any.
+Note that a filesystem’s root node generally has no name. After mounting, the
+filesystem’s root directory’s name is determined by the original name of the
+mount point.
+
+Because a tree does not need to have multiple nodes but may consist of just a
+single leaf, a filesystem with a file for its root node works just as well,
+though:
+
+||
+|:--:|
+|*Fig. 2: Mounting a filesystem with a regular (unnamed) file as its root node*|
+
+Here, FS B only consists of a single node, a regular file with no name. (As
+above, a filesystem’s root node is generally unnamed.) Consequently, the mount
+point for it must also be a regular file (`/foo/a` in our example), and just
+like before, the content of `/foo/a` is shadowed, and when opening it, one will
+instead see the contents of FS B’s unnamed root node.
+
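As a small aside on file mounts: one way to check whether a mount point on a running system is a regular file rather than a directory is to read `/proc/self/mountinfo`. The following is a minimal Python sketch and not part of the patch. The helper names (`unescape`, `mount_points`, `file_mounts`) are made up for illustration, and it assumes the usual Linux mountinfo layout in which the fifth whitespace-separated field is the mount point.

```python
import os
import re


def unescape(field):
    """Undo the octal escapes (e.g. \\040 for a space) used in mountinfo."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_points(mountinfo_text):
    """Return the mount point (fifth field) of every mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(unescape(fields[4]))
    return points


def file_mounts(mountinfo_text, is_file=os.path.isfile):
    """Return only those mount points that are regular files."""
    return [p for p in mount_points(mountinfo_text) if is_file(p)]
```

On a Linux host this could be called as `file_mounts(open("/proc/self/mountinfo").read())`. The `is_file` parameter exists only so the check can be swapped out, for example when the mount point cannot be inspected by the calling user.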
+### QEMU block exports
+
+QEMU allows exporting block nodes via various protocols (as of 6.0: NBD,
+vhost-user, FUSE). A block node is an element of QEMU’s block graph (see e.g.
+[Managing the New Block Layer](http://events17.linuxfoundation.org/sites/events/files/slides/talk\_11.pdf),
+a talk given at KVM Forum 2017), which can for example be attached to guest
+devices. Here is a very simple example:
+
+||
+|:--:|
+|*Fig. 3: A simple block graph for attaching a qcow2 image to a virtio-blk guest device*|
+
+This is the simplest example for a block graph that connects a *virtio-blk*
+guest device to a qcow2 image file. The *file* block driver, instanced in the
+form of a block node named *prot-node*, accesses the actual file and provides
+the node above it access to the raw content. This node above, named *fmt-node*,
+is handled by the *qcow2* block driver, which is capable of interpreting the
+qcow2 format. Parents of this node will therefore see the actual content of the
+virtual disk that is represented by the qcow2 image. There is only one parent
+here, which is the *virtio-blk* guest device, which will thus see the virtual
+disk.
+
+The command line to achieve the above could look something like this:
+```
+$ qemu-system-x86_64 \
+ -blockdev node-name=prot-node,driver=file,filename=$image_path \
+ -blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
+ -device virtio-blk,drive=fmt-node
+```
+
+Besides attaching guest devices to block nodes, you can also export them for
+users outside of qemu, for example via NBD. Say you have a QMP channel open for
+the QEMU instance above, then you could do this:
As much as I hate to say it, wouldn't it be better to give the example below
using (legacy?) qemu monitor commands instead of JSON? Unless it cannot be
done that way, of course; they feel more intuitive/recognizable to me, I think.
nbd_server_start exists as an HMP command, but there’s no direct
equivalent of block-export-add. We do have nbd_server_add, but of note
is that the nbd-server-add QMP command is deprecated.
In any case, I prefer using the JSON QMP commands here, because they map
directly to the storage daemon’s command line (--nbd-server and --export).
If this is too confusing, then I’d rather jump directly to the storage
daemon; but I feel like there’s value in showing that block exports work
in the system emulator, too.
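
To illustrate that mapping for `--export`, here is a minimal Python sketch (not part of the patch) that flattens the `block-export-add` arguments from the example into a comma-separated `key=value` string. The helper name `to_keyval` is made up, and the sketch assumes the usual QEMU conventions of dotted names for nested keys, `on`/`off` for booleans, and `,,` for a literal comma. It is only applied to `block-export-add` here; it does not assume that every QMP command translates this mechanically.

```python
def to_keyval(options, prefix=""):
    """Flatten a (possibly nested) dict into 'a=1,b.c=2' form."""
    parts = []
    for key, value in options.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            nested = to_keyval(value, prefix=name + ".")
            if nested:
                parts.append(nested)
        elif isinstance(value, bool):
            parts.append(f"{name}={'on' if value else 'off'}")
        else:
            # A literal comma inside a value is written as two commas.
            parts.append(f"{name}={str(value).replace(',', ',,')}")
    return ",".join(parts)


export_args = {
    "type": "nbd",
    "id": "fmt-node-export",
    "node-name": "fmt-node",
    "name": "guest-disk",
}
print("--export " + to_keyval(export_args))
```

For the arguments above this prints `--export type=nbd,id=fmt-node-export,node-name=fmt-node,name=guest-disk`.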
+```json
+{
+    "execute": "nbd-server-start",
+    "arguments": {
+        "addr": {
+            "type": "inet",
+            "data": {
+                "host": "localhost",
+                "port": "10809"
+            }
+        }
+    }
+}
+{
+    "execute": "block-export-add",
+    "arguments": {
+        "type": "nbd",
+        "id": "fmt-node-export",
+        "node-name": "fmt-node",
+        "name": "guest-disk"
+    }
+}
+```
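
For readers who want to try the two QMP commands from a script, the following Python sketch (not part of the patch) sends them over an already connected QMP socket. The function names `qmp_execute` and `export_over_nbd` are made up for illustration. The sketch assumes the standard QMP handshake, in which the server sends a greeting and the client must issue `qmp_capabilities` before any other command, and it does no error recovery.

```python
import json
import socket


def qmp_execute(chan, command, arguments=None):
    """Send one QMP command and return the content of its "return" key."""
    request = {"execute": command}
    if arguments is not None:
        request["arguments"] = arguments
    chan.write(json.dumps(request) + "\n")
    chan.flush()
    while True:
        line = chan.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" in reply:
            # Asynchronous events may arrive between command and reply.
            continue
        if "error" in reply:
            raise RuntimeError(reply["error"].get("desc", "QMP command failed"))
        return reply["return"]


def export_over_nbd(sock):
    """Start an NBD server and export fmt-node, as in the example above."""
    chan = sock.makefile("rw", encoding="utf-8")
    greeting = json.loads(chan.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting")
    qmp_execute(chan, "qmp_capabilities")
    qmp_execute(chan, "nbd-server-start", {
        "addr": {
            "type": "inet",
            "data": {"host": "localhost", "port": "10809"},
        },
    })
    qmp_execute(chan, "block-export-add", {
        "type": "nbd",
        "id": "fmt-node-export",
        "node-name": "fmt-node",
        "name": "guest-disk",
    })
```

Assuming QEMU was started with something like `-qmp unix:/tmp/qmp.sock,server=on,wait=off` (the socket path is a placeholder), one would create a `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)`, connect it to that path, and pass it to `export_over_nbd()`.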
[...]
The rest of it is very didactic and educational - thanks! And since
none of my comments are critical:
Reviewed-by: Klaus Heinrich Kiwi <kk...@redhat.com>
Thanks!
Hanna