On 10/21/2015 04:27 PM, Markus Armbruster wrote: > Sorry for my slow reply. > > Kevin Wolf <kw...@redhat.com> writes: > >> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben: >>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben: >>>> Max Reitz <mre...@redhat.com> writes: >>>>> E.g. you may have a block filter in the future where you want to >>>>> exchange its child BDS. This exchange should be an atomic operation, so >>>>> we cannot use this interface there anyway. For quorum, such an exchange >>>>> does not need to be atomic, since you can just add the new child first >>>>> and remove the old one afterwards. >>>>> >>>>> So maybe in the future we get some block driver other than quorum for >>>>> which adding and removing children (as opposed to atomically exchanging >>>>> them) makes sense, but for now I can only see quorum. Therefore, that >>>>> this works for quorum only is in my opinion not a reason to make it >>>>> experimental. I think we actually want to keep it that way. >>>> >>>> Are you telling us the existing interface is insufficiently general? >>>> That the general interface neeeds to support atomic replacement? >>>> >>>> If yes, why isn't the general interface is what we should do for quorum? >>>> Delete is atomic replacement by nothing, add is atomic replacement of >>>> nothing. >>> >>> The general thing is what we used to call "dynamic reconfiguration". If >>> we want a single command to enable it (which would be great if we >>> could), we need to some more design work first. Atomic replacement might >>> be the operation we're looking for, but we have to be sure. >>> >>> So far we haven't thought about dynamic reconfiguation enough that we >>> would know the right solution, but enough that we know it's hard. > > Yes. > >>> That >>> would be an argument for me that makes adding an x-* command now >>> acceptable. On the other hand, the fact that we want a single command in >>> the end makes me want to keep it experimental. > > We have a bit more time to think until the release calcifies interfaces. > >>> What types of dynamic reconfiguration do we need to support? I'll start >>> with a small list, feel free to extend it: >>> >>> * Replace a child node with another node. This works pretty much >>> everywhere in the tree - including the root, i.e. BlockBackend! >>> Working just on BDSes doesn't seem to be enough. > > Two ways to replace a node: > > 1. Operation on a node, i.e. replace node O by node N *everywhere*. > > 2. Operation on an edge, i.e. replace an edge to node O by an edge to > node N. > > Less abstract: > > 1. Node replacement takes a BDS *O. It either replaces in place > (something we've worked hard to get rid of), or needs to find all > variables containing the value O. > > 2. Edge replacement takes a BDS **O. Naturally, whoever contains that > thing may need to be notified, so the actual interface may well be a > bit more involved. > >>> * Adding a child to a node that can take additional children (e.g. >>> quorum can take an arbitrary number; but also COW image formats have >>> an option child, so you could add a backing file to a node originally >>> opened with BDRV_O_NO_BACKING) >>> >>> Same as atomically replacing nothing by a node. > > Yes. > >>> * Removing an optional child from a node that remains valid with that >>> child removed. The same examples apply. >>> >>> Same as atomically replacing a child by nothing. > > Yes. > >>> * Add a new node between two existing nodes. An example is taking a live >>> snapshot, which inserts a new node between the BlockBackend and the >>> first BDS. Or it could be used to insert a filter somewhere in the >>> graph. >>> >>> Same as creating the new node pointing to node B (or dynamically >>> adding it) and then atomically replacing the reference of node A that >>> pointed to B with a reference to the new node. > > Footnote: between creation of the new node and the atomic replace, the > old node has another parent. Requires support for a BDS to have > multiple parents, but that's a given, I think. > >>> * Add a new node between multiple existing nodes. This is one of the >>> examples we always used to discuss with dynamic reconfiguration: >>> >>> base <- sn1 <- sn2 <--+-- virtio-blk >>> | >>> +-- NBD server >>> >>> Adding a filter could result in this: >>> >>> base <- sn1 <- sn2 <- throttling <--+-- virtio-blk >>> | >>> +-- NBD server >>> >>> Or this: >>> >>> base <- sn1 <- sn2 <--+-- throttling <- virtio-blk >>> | >>> +-- NBD server >>> >>> Or this: >>> >>> base <- sn1 <- sn2 <--+-- virtio-blk >>> | >>> +-- throttling <- NBD server >>> >>> All of these are different kinds of "adding a filter", and all of >>> them are valid operations that a user could want to perform. >>> >>> Case 2 and 3 are really just "add a new node between two existing >>> nodes", as explained above. > > Yes. > >>> Case 1 is new: We still create the throttling node so that it already >>> points to sn2, but now we have to atomically change the children of >>> two things (the BlockBackends of virtio-blk and the NBD server). Not a >>> problem per se because we can just do that, but it raises the question >>> whether the atomic replacement operation needs to be transactionable. > > Where "transactionable" means you can bind it together with other > transactionable operations, and the resulting transaction either > succeeds or fails completely. Right? > > In increasing flexibility order: > > a. atomic single replacement, not transactionable > > b. atomic multiple replacements, not transactionable > > c. single replacement, transactionable > > In your throttling example, having to insert the throttle in two > separate operations (either of which can fail) seems tolerable. > > Can we think of an example where a similar multi-replacement should > either succeed or fail completely? I.e. one that needs at least b. > > Can we think of an example where we need c? > >>> * Remove a node between two (or more) other nodes. >>> >>> Same as atomically replacing a child by a grandchild. For more than >>> two involved nodes, again a transactional version might be needed. >>> >>> So at the first sight, this operation seems to work as the general >>> interface for dynamic reconfiguration. > > Yes, we seem to be onto something here. Finally! > >> And actually, after re-reading the other branch of this email thread >> where I replied to Berto that he wants to use bdrv_reopen() to change >> the voting threshold for quorum, it occurred to me that bdrv_reopen() >> should possibly also be this atomatic replacement function. >> >> After all, the references to other nodes are blockdev-add options, and >> if we implement bdrv_reopen() fully so that it can change any options >> that can be given to blockdev-add, that means that we must be able to >> replace children. > > You got a point. Do you think implementing it will be hard? > >> >> Kevin >> >>> One problem we discussed earlier that I'm not sure whether it's related >>> is filter nodes inserted automatically once we change e.g. the I/O >>> throttling QMP commands to add a throttling filter BDS to the graph. >>> >>> If the user creates nodes A and B, but sets throttling options, they >>> might end up with a chain like this: >>> >>> A <- throttling <- B > > Is "sets throttling options" == "uses the (then) legacy interface for > throttling instead of the more powerful 'insert throttling node' > interface"? > >>> Now imagine that they want to add another filter between A and B, let's >>> say blkdebug. They would need to know that they have to insert the new >>> node between A and throttling or B and throttling, but not between A and >>> B. If they tried to insert it between A and B, the algorithm above says >>> that they would let blkdebug point to A, and replace B's child with >>> blkdebug, the resulting tree wouldn't be throttled any more! >>> >>> A <- blkdebug <- B >>> >>> throttling (ref = 0 -> delete) > > Can we phrase the operation differently? Instead of "insert between A > and B (silently replacing everything that is now between A and B)", > say one of > > 1a. Replace node A by A <- blkdebug > > 1b. Replace node B by blkdebug <- B > > 2a. Replace edge A <- B by <- blkdebug <- > Impossible in the current state, because there is no such edge.
What does 'edge' mean? Thanks Wen Congyang > > 2b. Replace edge A <- by <- blkdebug <- > > 2c. Replace edge <- B by <- blkdebug <- > > See also node vs. edge replacement above. > >>> Even if they knew that they have to consider the throttling node, it >>> currently wouldn't have a node-name, and with Jeff's autogenerated names >>> it wouldn't be predictable. >>> >>> Maybe the dynamic reconfiguration interface does need to be a bit >>> cleverer. >>> >>> >>> Anyway, after writing all of this, I'm almost convinced now that an >>> experimental interface is the right thing to do in this patch series. > > Let's do experimental now, and reconsider before the release. > > . >