I recently met Kevin, and we discussed two block layer topics in some depth.

= -blockdev =

We want a command line option to mirror QMP blockdev-add for 2.9.
QemuOpts has to grow from "list of (key, simple value) plus conventions
to support lists of simple values in limited ways" to the expressive
power of JSON.

== Basic idea ==

QMP pipeline: JSON string -> JSON parser -> QObject -> QObject input
visitor -> QAPI object.

For commands with 'gen': false, we stop at QObject.  These are rare.

Command line now: option argument string -> QemuOpts parser -> QemuOpts.

We occasionally continue -> options or string input visitor -> QAPI
object.  Neither visitor can handle arbitrary QAPI objects.  Both
visitors extend QemuOpts syntax.

Daniel Berrange posted patches to instead do -> crumple -> QObject ->
QObject input visitor -> QAPI object.  Arbitrary QObjects (thus QAPI
objects) are possible with the dotted key convention, which is already
used by the block layer.  As before, a visitor sits on top of QemuOpts,
providing syntax extensions.

Stacking parsers like that is not a good idea.  We want *one* option
argument parser, and we need it to yield a QObject.

== Backward compatibility issues ==

* Traditional key=value,... syntax

* The "repeated key is list" hack

* Options and string input visitor syntax extensions

* Dotted key convention

Hopefully, most of the solutions can be adapted from Daniel's patches.

== Type ambiguity ==

In JSON, the type of a value is syntactically obvious.  The JSON parser
yields QObject with these types.  The QObject input visitor rejects
values with types that don't match the QAPI schema.

In the traditional key=value command line syntax, the type of a value
isn't obvious.  Options and string input visitor convert the string
value to the type expected by the QAPI schema.

Unlike a QObject from JSON, a QObject from QemuOpts has only string
values, so the QObject input visitor needs to be able to convert
instead of reject.  Daniel's patches do that.

== Action item ==

Markus to explore the proposed solution as soon as possible.
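
To make the dotted key convention from "Basic idea" concrete, here is a
minimal Python sketch of turning a key=value,... argument into a nested
object.  It is an illustration only, not Daniel's patches or QEMU code:
the names parse_option_arg() and crumple() are hypothetical, and the
sketch assumes no ",," escaping, no "repeated key is list" hack, and
that all-numeric keys denote list indexes.

```python
def crumple(flat):
    """Turn a flat dict with dotted keys into nested dicts and lists."""
    root = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = root
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"key {key!r} conflicts with a scalar value")
            node = child
        if parts[-1] in node:
            raise ValueError(f"key {key!r} conflicts with an earlier key")
        node[parts[-1]] = value
    return _listify(root)


def _listify(node):
    """Recursively convert dicts keyed exactly 0..n-1 into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: _listify(v) for k, v in node.items()}
    if converted and all(k.isdecimal() for k in converted):
        items = sorted(((int(k), v) for k, v in converted.items()),
                       key=lambda item: item[0])
        if [i for i, _ in items] != list(range(len(items))):
            raise ValueError("list indexes must be contiguous from 0")
        return [v for _, v in items]
    return converted


def parse_option_arg(arg):
    """Parse 'k1=v1,k2.sub=v2' (simplified: no ',,' escapes) and crumple it."""
    flat = {}
    for item in arg.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        flat[key] = value
    return crumple(flat)
```

Note that every leaf in the result is still a string, which is exactly
the type ambiguity described above: a value such as "on" or "42" can
only be converted to bool or int by something that knows the schema.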

= Dynamic block backend reconfiguration =

== Mirror job ==

State before the job:

        frontend
            |
            BB
            |
           BDS

Frontend writes flow down.

Passive mirror job, as it currently works:

        frontend        mirror-job
            |            |      |
            BB          BB'    BB2
            |      ____/        |
            |     /             |
           BDS                 BDS2

The mirror job copies the contents of BDS to BDS2.  To handle frontend
writes, BDS maintains a dirty bitmap, which mirror-job uses to copy
updates from BB' to BB2.

Pivot to mirror on job completion: replace BB's child BDS by BDS2,
delete mirror-job and its BB', BB2.

        frontend
            |
            BB
             \_____________
                           \
           BDS             BDS2

Future mirror job using a mirror-filter:

        frontend     mirror-job
            |            |
            BB          BB'
            |          /
         mirror-filter
            |       \
           BDS      BDS2

Passive mirror-filter: maintains dirty bitmap, copies from BDS to BDS2.

Active mirror-filter: no dirty bitmap, mirrors writes to BDS2 directly.

Can easily switch from passive to active at any time.

Pivot: in mirror-filter's parent, replace the child mirror-filter by
BDS2, then delete the mirror job and its BB'.  We say "mirror-filter's
parent" rather than BB because other filters may have been inserted:
we drop the ones below mirror-filter, and keep the ones above.

== Backup job ==

Current backup job:

        frontend        backup-job
            |            |      |
            BB          BB'    BB2
            |      ____/        |
            |     /             |
           BDS                 BDS2

The backup job copies the contents of BDS to BDS2.  To handle frontend
writes, BDS provides a before-write-notifier, which backup-job uses to
copy old data from BB' to BB2 right before it's overwritten.

Pivot: delete backup-job and its BB', BB2.

        frontend
            |
            BB
            |
            |
           BDS             BDS2

Future backup job using a backup-filter:

        frontend     backup-job
            |            |
            BB          BB'
            |          /
         backup-filter
            |       \
           BDS      BDS2

backup-filter copies old data from BDS to BDS2 before it forwards the
write to BDS.

Pivot: in backup-filter's parent, replace the child backup-filter by
BDS, then delete backup-job and its BB'.

== Commit job ==

State before the job:

        frontend
            |
            BB
            |
          QCOW2
     file /   \ backing
         /     \
        /       \
     BDS1     QCOW2_top
         file /   \ backing
             /     .
        BDS_top    .
                    \
                  BDS_base

"file" and "backing" are the QCOW2 child names for the delta image and
the backing image, respectively.

Frontend writes flow to BDS1.

Current commit job to commit from QCOW2_top down to BDS_base:

        frontend
            |
            BB                    commit-job
            |                      /      \
          QCOW2               BB_top     BB_base
     file /   \ backing         /           /
         /     \       ________/           /
        /       \     /                   /
     BDS1     QCOW2_top                  /
         file /   \ backing             /
             /     .                   /
        BDS_top    .             _____/
                    \           /
                  BDS_base

commit-job copies anything allocated above BDS_base up to BDS_top from
BB_top to BB_base.

Pivot: replace backing child of QCOW2_top by BDS_base, delete commit-job
and its BB_top, BB_base.

        frontend
            |
            BB
            |
          QCOW2
     file /   \ backing
         /     \
        /       \
     BDS1     QCOW2_top
         file /   \ backing
             /     \
        BDS_top   BDS_base

This drops any filters inserted between QCOW2_top and BDS_base in the
meantime.  Should we have an (otherwise no-op) commit-filter node to
provide a place for filters we want to keep?  Would op blockers need or
benefit from such a filter?

== Streaming job ==

Just like commit (hopefully).

== Basic dynamic reconfiguration operation ==

The basic operation is "replace child".

Beware of race conditions.  Consider:

            BB
            |
       mirror-filter
            |
           BDS

Add a throttle filter under BB while the mirror job is running.

First step, create the filter:

            BB      throttle-filter
            |      /
       mirror-filter
            |
           BDS

Second step, replace child of BB by the new filter:

            BB
            |
      throttle-filter
            |
       mirror-filter
            |
           BDS

But: if mirror-filter goes away between the two steps, the replace
brings it right back!

To guard against such races, we need to specify both ends of the edge
being replaced, i.e. parent, child name, actual child.  Then the
replace step fails if the mirror-filter has gone away.  We can either
fail the whole operation, or start over.

Alternatively, we could use transactions, but that feels much more
complex.
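
The guarded "replace child" operation can be sketched in a few lines of
Python.  This is a toy model under stated assumptions, not QEMU code:
Node, StaleEdge and replace_child() are hypothetical names, the child
names "root" and "file" are made up for the example, and the race is
simulated sequentially rather than with real concurrency.

```python
class StaleEdge(Exception):
    """The edge to be replaced no longer looks the way the caller expects."""


class Node:
    """A node in the block graph (BB, filter or BDS) with named child edges."""

    def __init__(self, name, **children):
        self.name = name
        self.children = dict(children)


def replace_child(parent, child_name, expected_child, new_child):
    """Point parent's edge child_name at new_child.

    Fails unless the edge currently points at expected_child, so a
    concurrent graph change is detected instead of silently undone.
    """
    actual = parent.children.get(child_name)
    if actual is not expected_child:
        found = actual.name if actual is not None else "nothing"
        raise StaleEdge(f"{parent.name}.{child_name} is {found}, "
                        f"expected {expected_child.name}")
    parent.children[child_name] = new_child


# Initial graph: BB -> mirror-filter -> BDS
bds = Node("BDS")
mirror = Node("mirror-filter", file=bds)
bb = Node("BB", root=mirror)

# First step: create the throttle filter on top of BB's current child.
throttle = Node("throttle-filter", file=bb.children["root"])

# Meanwhile, the mirror-filter goes away: BB now points at BDS directly.
replace_child(bb, "root", mirror, bds)

# Second step: the guarded replace notices that the edge has changed.
try:
    replace_child(bb, "root", mirror, throttle)
    raced = False
except StaleEdge:
    raced = True
    # Start over against the current graph.
    current = bb.children["root"]
    throttle.children["file"] = current
    replace_child(bb, "root", current, throttle)
```

Here the second step fails because BB's child is no longer the
mirror-filter, and the retry ends with BB -> throttle-filter -> BDS.
Without the expected_child check, the same call would have put the
mirror-filter back into the graph.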