Kevin, Max and I took the opportunity to meet and discuss block layer matters. We examined two topics in some depth: BlockBackend, and block filters and dynamic reconfiguration.
Not nearly enough people to call it a block summit. But the local dialect is known for its use of diminutives, and "Gipfele" is the diminutive of "summit" :)

= BlockBackend =

Background: BlockBackend (BB) was split off BlockDriverState (BDS) to separate the block layer's external interface (BB) from its internal building block (BDS). Block layer clients such as device models and the NBD server attach to a BB by BB name. A BB has zero or one BDS (zero means no medium).

Multiple device models using the same BB is dangerous, so we allow attaching only one. We don't currently enforce an "only one" restriction for other clients. This is problematic, because

* Different clients may want to configure the BB in conflicting ways, e.g. writeback caching mode (still to be moved from the BDS's enable_write_cache to the BB).

* When the BDS graph gets dynamically reconfigured, say when a block filter gets spliced in, clients that started out in the same spot may need to move differently.

Instead, each client should connect to its own BB. This leads to the next question: how should this BB be created?

Initially, what is now the BB was mashed into the BDS. In a way, the BB got created along with the BDS. The current code lets you create a BB along with a BDS when you need one, or create a new BB for an existing BDS. The BB has a name, and the BDS may have a node-name.

The obvious low-level building blocks would be "create BB", "connect BB to a BDS" (we have that as x-blockdev-insert-medium), "disconnect BB from a BDS" (x-blockdev-remove-medium) and "destroy BB" (x-blockdev-del). Management applications probably don't mind having to work at this low level, but for human users, it's cumbersome.

Perhaps the BB should be created along with the client, at least optionally. Means to create BBs separately are mostly useful when the BB needs to be configured by the user: instead of duplicating the BB configuration within each client, we keep it neatly separate. We're not aware of user-configurable knobs, though.

Currently, a client is configured to attach to a BB by specifying a BB name. For instance, a device model has a "drive" property that names a BB. If we create the BB automatically, client configuration needs to name a BDS instead, i.e. we need a node-name instead of a BB name.

Of course, we'll have to keep the legacy configuration working. The "drive" property will have to refer to a BDS, like it did before BBs were invented. We could:

* Move the BB name back into the BDS.

* Move the BB name into DriveInfo, where the other legacy stuff lives. DriveInfo then needs to be changed to hang off the BDS rather than the BB.

Regardless, dynamic reconfiguration may have to move the name / the DriveInfo to a different BDS.

We're not entirely sure whether automatic creation of BBs is worthwhile.

Next steps:

* Support multiple BBs sharing the same BDS.

* Restrict a BB to only one client of any kind instead of special-casing device models.

* Block jobs should go through a BB.

* Investigate automatic creation of BBs.

= Block filters =

We already have a few block filters:

* blkdebug, blkverify, quorum

Encryption should become another one.

Moreover, we have a few things mashed into the BDS that should be filters:

* throttle (only at a root, i.e. right below a BB), copy-on-read, notifier (for the backup block job), detect-zero

Dynamic reconfiguration means altering the BDS graph while it's in use. Existing mutators:

* snapshot, mirror-complete, commit-complete, x-blockdev-change
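All of these mutators boil down to redirecting parent/child links in a graph that is in active use. As a purely illustrative sketch (a toy model in C, not QEMU code; Node and splice_filter are made-up names), splicing a filter in looks like this:

    /*
     * Toy model, not QEMU code: Node and splice_filter are made up for
     * illustration.  A mutator redirects parent/child links in a chain
     * that is in active use.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node {
        const char *name;
        struct Node *child;     /* stands in for the file/backing link */
    } Node;

    static Node *node_new(const char *name, Node *child)
    {
        Node *n = calloc(1, sizeof(*n));
        n->name = name;
        n->child = child;
        return n;
    }

    /* Splice @filter in between @parent and its current child. */
    static void splice_filter(Node *parent, Node *filter)
    {
        filter->child = parent->child;
        parent->child = filter;
    }

    static void print_chain(const Node *top)
    {
        for (const Node *n = top; n; n = n->child) {
            printf("%s%s", n->name, n->child ? " -> " : "\n");
        }
    }

    int main(void)
    {
        Node *bds = node_new("BDS", NULL);
        Node *bb = node_new("BB", bds);

        print_chain(bb);                            /* BB -> BDS */
        splice_filter(bb, node_new("throttle", NULL));
        print_chain(bb);                            /* BB -> throttle -> BDS */
        return 0;
    }

The pointer surgery itself is trivial; the open questions are about which spot in the graph a mutation should apply to once implicit filters are present, which is what follows.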
Things become interesting when nodes get implicitly inserted into the graph, e.g.:

* A backup job inserts its notifier filter.

* We create an implicit throttle filter to implement legacy throttling configuration.

And so forth. Nothing of the sort exists just yet.

What should happen when the user asks for a mutation at a place where we have implicit filter(s)? First, let's examine what such a chain could look like.

If we read the current code correctly, it behaves as if we had a chain

    BB
    |
    throttle
    |
    detect-zero
    |
    copy-on-read
    |
    BDS

Except for the backup job, which behaves as if we had

      backup job
     /
    notifier
    |
    detect-zero
    |
    BDS

We believe that the following cleaned up filter stack should work:

    BB
    |
    throttle       \
    |               \
    copy-on-read     ) fixed at creation time
    |               /
    detect-zero    /
    |
    |   backup job
    |  /
    notifier         ) dynamically inserted by the job
    |
    BDS

Clients (device model, NBD server) connect through a BB on top.

Snapshot cuts in between the BDS and its implicit filters, like this:

    BB
    |
    throttle
    |
    copy-on-read
    |
    detect-zero
    |
    qcow2       \
    snapshot     ) inserted by snapshot
    overlay     /
    |
    BDS

The notifier filter is not shown, because we can't currently snapshot while a block job is active.

Still to do: similar analysis for the other mutators.
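To restate the snapshot case above in code form, here is a purely illustrative sketch (again a toy model in C, not QEMU code; Node and snapshot_insert are made-up names): the snapshot overlay gets spliced in directly above the old BDS, underneath the implicit filters, rather than directly below the BB.

    /*
     * Toy model, not QEMU code: Node, snapshot_insert and the chain
     * below are made up for illustration.  The point is the insertion
     * site: the overlay goes directly above the old BDS, so the
     * implicit filters stay on top of it.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node {
        const char *name;
        struct Node *child;
    } Node;

    static Node *node_new(const char *name, Node *child)
    {
        Node *n = calloc(1, sizeof(*n));
        n->name = name;
        n->child = child;
        return n;
    }

    /* Insert @overlay directly above @old_bds (assumed to be in the chain). */
    static void snapshot_insert(Node *top, Node *old_bds, Node *overlay)
    {
        Node *parent = top;
        while (parent->child != old_bds) {
            parent = parent->child;     /* walk past the implicit filters */
        }
        overlay->child = old_bds;
        parent->child = overlay;
    }

    static void print_chain(const Node *top)
    {
        for (const Node *n = top; n; n = n->child) {
            printf("%s%s", n->name, n->child ? " -> " : "\n");
        }
    }

    int main(void)
    {
        Node *bds = node_new("BDS", NULL);
        Node *bb = node_new("BB",
                   node_new("throttle",
                   node_new("copy-on-read",
                   node_new("detect-zero", bds))));

        snapshot_insert(bb, bds, node_new("overlay", NULL));
        print_chain(bb);
        /* BB -> throttle -> copy-on-read -> detect-zero -> overlay -> BDS */
        return 0;
    }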