[Qemu-devel] Meeting notes on -blockdev, dynamic backend reconfiguration

Markus Armbruster Mon, 05 Dec 2016 04:07:25 -0800

I recently met Kevin, and we discussed two block layer topics in some
depth.


= -blockdev =

We want a command line option to mirror QMP blockdev-add for 2.9.
QemuOpts has to grow from "list of (key, simple value) plus conventions
to support lists of simple values in limited ways" to the expressive
power of JSON.

== Basic idea ==

QMP pipeline: JSON string - JSON parser - QObject - QObject input
visitor - QAPI object.  For commands with 'gen': false, we stop at
QObject.  These are rare.

Command line now: option argument string - QemuOpts parser - QemuOpts.

We occasionally continue - options or string input visitor - QAPI
object.  Both visitors can't do arbitrary QAPI objects.  Both visitors
extend QemuOpts syntax.

Daniel Berrange posted patches to instead do - crumple - QObject -
qobject input visitor - QAPI object.  Arbitrary QObjects (thus QAPI
objects) are possible with dotted key convention, which is already used
by block layer.
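
To illustrate the dotted key convention, here is a minimal Python sketch
of a "crumple" step that turns flat dotted keys into a nested structure.
It is not Daniel's implementation: the names crumple() and _listify()
are made up for this example, and treating all-numeric keys as list
indices is an assumption about how lists would be expressed.

```python
def crumple(flat):
    """Turn {'a.b': 'x'} into {'a': {'b': 'x'}} (illustrative sketch only)."""
    root = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = root
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{dotted_key!r} conflicts with a scalar key")
            node = child
        if leaf in node:
            raise ValueError(f"{dotted_key!r} conflicts with an existing key")
        node[leaf] = value
    return _listify(root)


def _listify(node):
    """Assumption: a dict whose keys are exactly 0..n-1 stands for a list."""
    if not isinstance(node, dict):
        return node
    converted = {key: _listify(child) for key, child in node.items()}
    if converted and all(k.isascii() and k.isdigit() for k in converted):
        by_index = {int(k): child for k, child in converted.items()}
        if sorted(by_index) != list(range(len(converted))):
            raise ValueError("list indices must be 0..n-1 without gaps")
        return [by_index[i] for i in range(len(by_index))]
    return converted


# Every value is still a string at this point; see "Type ambiguity" below.
nested = crumple({
    "driver": "qcow2",
    "file.driver": "file",
    "file.filename": "disk.qcow2",
})
```

The result is a plain nested structure, the Python analogue of a QObject
holding only string leaves.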

As before, a visitor sits on top of QemuOpts, providing syntax
extensions.  Stacking parsers like that is not a good idea.  We want
*one* option argument parser, and we need it to yield a QObject.

== Backward compatibility issues ==

* Traditional key=value,... syntax

* The "repeated key is list" hack

* Options and string input visitor syntax extensions

* Dotted key convention

Hopefully, most of the solutions can be adapted from Daniel's patches.

== Type ambiguity ==

In JSON, the type of a value is syntactically obvious.  The JSON parser
yields QObject with these types.  The QObject input visitor rejects
values with types that don't match the QAPI schema.

In the traditional key=value command line syntax, the type of a value
isn't obvious.  Options and string input visitor convert the string
value to the type expected by the QAPI schema.

Unlike a QObject from JSON, a QObject from QemuOpts has only string
values, and the QObject input visitor needs to be able to convert
instead of reject.  Daniel's patches do that.
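
The difference between the two modes can be sketched in Python as
follows.  This is only an illustration under assumptions: the schema is
a hypothetical dict mapping member names to Python types, visit() and
from_string() are invented names, and the accepted boolean spellings are
a guess rather than the set QEMU actually accepts.

```python
TRUE_WORDS = {"on", "yes", "true"}     # assumed spellings
FALSE_WORDS = {"off", "no", "false"}   # assumed spellings


def from_string(text, expected):
    """Convert a string value to the type the schema expects."""
    if expected is str:
        return text
    if expected is int:
        return int(text, 10)           # raises ValueError on junk
    if expected is bool:
        if text in TRUE_WORDS:
            return True
        if text in FALSE_WORDS:
            return False
        raise ValueError(f"not a boolean: {text!r}")
    raise TypeError(f"unsupported schema type {expected!r}")


def visit(qobject, schema, strings_only=False):
    """Check (JSON mode) or convert (strings_only mode) each member."""
    result = {}
    for key, value in qobject.items():
        if key not in schema:
            raise KeyError(f"unexpected member {key!r}")
        expected = schema[key]
        if strings_only:
            value = from_string(value, expected)
        elif type(value) is not expected:
            raise TypeError(f"{key!r}: expected {expected.__name__}")
        result[key] = value
    return result


# Hypothetical schema, for illustration only.
SCHEMA = {"driver": str, "read-only": bool, "size": int}

from_json = visit({"driver": "raw", "size": 4096}, SCHEMA)
from_opts = visit({"read-only": "on", "size": "4096"}, SCHEMA,
                  strings_only=True)
```

In JSON mode, {"size": "4096"} is rejected because a string is not an
integer; in strings_only mode the same value is converted.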

== Action item ==

Markus to explore the proposed solution as soon as possible.


= Dynamic block backend reconfiguration =

== Mirror job ==

State before the job:

    frontend
        |
       BB
        |
       BDS

Frontend writes flow down.

Passive mirror job, as it currently works:

    frontend   mirror-job
        |       |      |
        BB      BB'    BB2
        |  ____/       |
        | /            |
       BDS            BDS2

The mirror job copies the contents of BDS to BDS2.  To handle frontend
writes, BDS maintains a dirty bitmap, which mirror-job uses to copy
updates from BB' to BB2.

Pivot to mirror on job completion: replace BB's child BDS by BDS2,
delete mirror-job and its BB', BB2.

    frontend
        |
        BB
         \_____________
                       \
       BDS            BDS2


Future mirror job using a mirror-filter:

    frontend   mirror-job
        |         |
        BB       BB'
        |        /
    mirror-filter
        |        \
       BDS      BDS2

Passive mirror-filter: maintains dirty bitmap, copies from BDS to BDS2.

Active mirror-filter: no dirty bitmap, mirrors writes to BDS2 directly.

Can easily switch from passive to active at any time.
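
A toy Python model of the two modes, to make the switch concrete.
Images are plain lists of blocks, and the class and method names are
invented.  One simplification to note: the sketch keeps a single set of
blocks that still need copying, which in passive mode doubles as the
dirty bitmap; how the real filter would track the initial bulk copy in
active mode is not settled in these notes.

```python
class MirrorFilter:
    """Toy model: source stands in for BDS, target for BDS2."""

    def __init__(self, source, target, active=False):
        self.source = source
        self.target = target
        self.active = active
        # Blocks whose current contents are not yet on the target.
        self.dirty = set(range(len(source)))

    def write(self, index, data):
        """Frontend write passing through the filter."""
        self.source[index] = data
        if self.active:
            # Active: mirror the write to the target right away.
            self.target[index] = data
            self.dirty.discard(index)
        else:
            # Passive: only remember that the block changed.
            self.dirty.add(index)

    def job_step(self):
        """One iteration of the mirror job; returns False when in sync."""
        if not self.dirty:
            return False
        index = min(self.dirty)
        self.dirty.remove(index)
        self.target[index] = self.source[index]
        return True


bds = ["a", "b", "c"]
bds2 = [None, None, None]
mirror = MirrorFilter(bds, bds2)

mirror.job_step()          # job copies block 0
mirror.write(0, "A")       # passive: block 0 becomes dirty again
mirror.active = True       # switch from passive to active
mirror.write(1, "B")       # active: written to both images at once
while mirror.job_step():   # job drains what is left
    pass
```

Once job_step() returns False, source and target are identical and the
pivot can happen.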

Pivot: in the parent of mirror-filter, replace the child mirror-filter
by BDS2, then delete mirror-job and its BB'.  We say "parent of" rather
than BB in case other filters have been inserted: we drop the ones below
mirror-filter, and keep the ones above.

== Backup job ==

Current backup job:

    frontend   backup-job
        |       |      |
        BB      BB'    BB2
        |  ____/       |
        | /            |
       BDS            BDS2

The backup job copies the contents of BDS to BDS2.  To handle frontend
writes, BDS provides a before-write notifier, which backup-job uses to
copy old data from BB' to BB2 right before it is overwritten.

Pivot: delete backup-job and its BB', BB2.

    frontend
        |
        BB
        |
        |
       BDS            BDS2

Future backup job using a backup-filter:

    frontend   backup-job
        |         |
        BB       BB'
        |        /
    backup-filter
        |       \
       BDS      BDS2

backup-filter copies old data from BDS to BDS2 before it forwards the
write to BDS.
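
The copy-before-write behaviour can be sketched in Python like this.
Images are plain lists of blocks and all names are invented; the set of
pending blocks is an assumption about how the filter and the job would
agree on what has already been saved.

```python
class BackupFilter:
    """Toy model: source stands in for BDS, target for BDS2."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        # Blocks whose point-in-time contents are not yet on the target.
        self.pending = set(range(len(source)))

    def _save(self, index):
        """Copy the old contents of one block to the target, once."""
        if index in self.pending:
            self.target[index] = self.source[index]
            self.pending.remove(index)

    def write(self, index, data):
        """Frontend write: save the old data first, then forward."""
        self._save(index)
        self.source[index] = data

    def job_step(self):
        """One iteration of the backup job; returns False when done."""
        if not self.pending:
            return False
        self._save(min(self.pending))
        return True


bds = ["a", "b", "c"]
bds2 = [None, None, None]
backup = BackupFilter(bds, bds2)

backup.write(1, "B")        # old "b" is saved before being overwritten
while backup.job_step():    # job copies the untouched blocks
    pass
```

The target ends up with the contents BDS had when the job started, even
though BDS itself has moved on.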

Pivot: in the parent of backup-filter, replace the child backup-filter
by BDS2, then delete backup-job and its BB'.

== Commit job ==

State before the job:

       frontend
           |
           BB
           |
         QCOW2
    file /   \ backing
        /     \
       /       \
     BDS1     QCOW2_top
         file /   \ backing
             /     .
          BDS_top   .
                     \
                  BDS_base

"file" and "backing" are the QCOW2 child names for the delta image and
the backing image, respectively.

Frontend writes flow to BDS1.

Current commit job to commit from QCOW2_top down to BDS_base:

       frontend
           |
           BB               commit-job
           |                 /     \
         QCOW2           BB_top  BB_base
    file /   \ backing     /       /
        /     \   ________/       /
       /       \ /               /
     BDS1     QCOW2_top         /
         file /   \ backing    /
             /     .          /
          BDS_top   .   _____/
                     \ /
                  BDS_base

commit-job copies anything allocated above BDS_base up to BDS_top from
BB_top to BB_base.
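
As a rough Python sketch of that copy loop: each image is modelled as a
dict from block index to data, holding only the blocks allocated in that
image, and the chain is listed base first.  The function names and this
representation are invented for illustration.

```python
def read(chain, index):
    """Read a block through the chain: the topmost allocation wins."""
    for layer in reversed(chain):
        if index in layer:
            return layer[index]
    return None  # unallocated everywhere


def commit(chain):
    """Copy everything allocated above the base down into the base."""
    base, overlays = chain[0], chain[1:]
    allocated_above = set().union(*overlays)
    for index in sorted(allocated_above):
        base[index] = read(chain, index)


bds_base = {0: "a", 1: "b"}
intermediate = {1: "B"}
bds_top = {1: "BB", 2: "C"}

commit([bds_base, intermediate, bds_top])
```

After the loop, reading from the base alone gives the same data as
reading through the whole chain, which is what makes the pivot safe.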

Pivot: replace backing child of QCOW2_top by BDS_base, delete commit-job
and its BB_top, BB_base.

       frontend
           |
           BB
           |
         QCOW2
    file /   \ backing
        /     \
       /       \
     BDS1     QCOW2_top
         file /   \ backing
             /     \
          BDS_top  BDS_base

This drops any filters inserted in the meantime between QCOW2_top and
BDS_base.  Should we have an (otherwise no-op) commit-filter node to
provide a place for filters we want to keep?  Would op blockers need /
profit from such a filter?

== Streaming job ==

Just like commit (hopefully).

== Basic dynamic reconfiguration operation ==

The basic operation is "replace child".

Beware of race conditions.  Consider:

          BB
          |
    mirror-filter
          |
         BDS

Add a throttle filter under BB while the mirror job is running.  First
step, create the filter:

          BB    throttle-filter
          |     /
    mirror-filter
          |
         BDS

Second step, replace child of BB by the new filter:

          BB
          |
   throttle-filter
          |
    mirror-filter
          |
         BDS

But: if mirror-filter goes away between the two steps, the replace
brings it right back!

To guard against such races, we need to specify both ends of the edge
being replaced, i.e. parent, child name, actual child.  Then the replace
step fails if the mirror-filter has gone away.  We can either fail the
whole operation, or start over.
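
A minimal Python sketch of such a guarded replace, replaying the
throttle-filter example.  The Node class, replace_child(),
StaleEdgeError and the child names "root" and "file" are all
hypothetical; nothing here is an existing QEMU interface.

```python
class StaleEdgeError(Exception):
    """The edge to be replaced no longer looks the way the caller saw it."""


class Node:
    def __init__(self, name):
        self.name = name
        self.children = {}   # child name -> Node


def replace_child(parent, child_name, expected_child, new_child):
    """Replace parent's child only if it is still expected_child."""
    if parent.children.get(child_name) is not expected_child:
        raise StaleEdgeError(f"{parent.name}.{child_name} has changed")
    parent.children[child_name] = new_child


bb, mirror, bds = Node("BB"), Node("mirror-filter"), Node("BDS")
bb.children["root"] = mirror
mirror.children["file"] = bds

# First step: create the throttle filter on top of mirror-filter.
throttle = Node("throttle-filter")
throttle.children["file"] = mirror

# Meanwhile the mirror job completes and mirror-filter goes away.
replace_child(bb, "root", mirror, bds)

# Second step: the guarded replace notices the stale edge.
try:
    replace_child(bb, "root", mirror, throttle)
except StaleEdgeError:
    # Start over against the current child instead of failing outright.
    current = bb.children["root"]
    throttle.children["file"] = current
    replace_child(bb, "root", current, throttle)
```

Without the expected_child check, the second step would have put
mirror-filter back under BB.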

Alternatively, transactions, but that feels much more complex.
