On Tue, 09/03 18:24, Benoît Canet wrote:
>
> Hello list,
>
> I am thinking about QEMU block filters lately.
>
> I am not a block.c/blockdev.c expert so tell me what you think of the
> following.
>
> The use cases I see would be:
>
> -$user wants to have some real cryptography on top of qcow2/qed or another
>  format.
>  Snapshots and other block features should continue to work.
>
> -$user wants to use a RAID-like feature such as QUORUM in QEMU.
>  Other features should continue to work.
>
> -$user wants to use the future SSD deduplication implementation with
>  metadata on SSD and data on spinning disks.
>  Other features should continue to work.
>
> -$user wants to I/O throttle one drive of their VM.
>
> -$user wants to do Copy On Read.
>
> -$user wants to do a combination of the above.
>
> -$developer wants to take the minimum number of steps required, to keep
>  changes small.
>
> -$developer wants to keep user interface changes for later.
>
> Let's take the example of a user wanting I/O throttled, encrypted QUORUM
> on top of QCOW2.
>
> Assuming we want to implement throttling and encryption as something
> remotely like a block filter, this makes a pretty complex BlockDriverState
> tree.
>
> The tree would look like the following:
>
>       I/O throttling BlockDriverState (bs)
>                         |
>                         |
>         Encryption BlockDriverState (bs)
>                         |
>                         |
>           Quorum BlockDriverState (bs)
>                     /   |   \
>                   /     |     \
>                 /       |       \
>          QCOW2 bs   QCOW2 bs   QCOW2 bs
>              |          |          |
>              |          |          |
>           RAW bs     RAW bs     RAW bs
>
> An external snapshot should result in a tree like the following:
>
>       I/O throttling BlockDriverState (bs)
>                         |
>                         |
>         Encryption BlockDriverState (bs)
>                         |
>                         |
>           Quorum BlockDriverState (bs)
>                     /   |   \
>                   /     |     \
>                 /       |       \
>          QCOW2 bs   QCOW2 bs   QCOW2 bs
>              |          |          |
>              |          |          |
>          QCOW2 bs   QCOW2 bs   QCOW2 bs
>              |          |          |
>              |          |          |
>           RAW bs     RAW bs     RAW bs
>
> In the current state of QEMU we can code some block drivers to implement
> this tree.
>
> However, when doing operations like snapshots, blockdev.c would have no
> real idea of what should be snapshotted and how. (The 3 top bs should be
> kept on top.)
>
> Moreover, it would have no easy way to manipulate this tree of
> BlockDriverStates, as each one is encapsulated in its parent.
>
> Also, there is no generic way to tell the block layer that two or more
> BlockDriverStates are siblings.
>
> This mail proposes some additional structures to cope with these
> problems.
>
> The overall strategy of the proposed structures is to push the
> relationships between BlockDriverStates out of each BlockDriverState.
>
> The idea is that it would be easier for the block layer to manipulate a
> well-known structure instead of being forced to deal with the specifics
> of each BlockDriverState.
>
> The first structure is the BlockStackNode.
>
> The BlockStackNode would be used to represent the relationship between
> the various BlockDriverStates.
>
> struct BlockStackNode {
>     BlockDriverState *bs; /* the BlockDriverState held by this node */
>
>     /* this doubly linked list entry points to the child node and the
>      * parent node
>      */
>     QLIST_ENTRY(BlockStackNode) down;
>
>     /* this doubly linked list entry points to the siblings of this node
>      */
>     QLIST_ENTRY(BlockStackNode) siblings;
>
>     /* a hash or an array of the siblings of this node for fast access;
>      * should be recomputed when updating the tree */
>     QHASH_ENTRY<BlockStackNode, index> siblings_hash;
> };
>
> The BlockBackend would be the structure used to hold the "drive" the
> guest uses.
>
> struct BlockBackend {
>     /* the following doubly linked list header points to the top
>      * BlockStackNode; in our case it's the one containing the I/O
>      * throttling bs
>      */
>     QLIST_HEAD(, BlockStackNode) block_stack_head;
>     /* this is a pointer to the topmost node below the block filter
>      * chain; in our case the first QCOW2 sibling
>      */
>     BlockStackNode *top_node_below_filter;
> };
>
>
> Updated diagram:
>
> (Here bsn means BlockStackNode)
>
>     ------------------------BlockBackend
>     |                            |
>     |                    block_stack_head
>     |                            |
>     |                            |
>     |     I/O throttling BlockStackNode (contains its bs)
>     |                            |
>     |                          down
>     |                            |
>     |                            |
> top_node_below_filter Encryption BlockStackNode (contains its bs)
>     |                            |
>     |                          down
>     |                            |
>     |                            |
>     |         Quorum BlockStackNode (contains its bs)
>     |               /
>     |            down
>     |            /
>     |           /      S               S
>     ------ QCOW2 bsn---i---QCOW2 bsn---i---QCOW2 bsn   (each bsn contains a bs)
>                |       b       |       b       |
>              down      l     down      l     down
>                |       i       |       i       |
>                |       n       |       n       |
>                |       g       |       g       |
>                |       s       |       s       |
>                |               |               |
>             RAW bsn         RAW bsn         RAW bsn   (each bsn contains a bs)
>
>
> Block driver point of view:
>
> To construct the tree, each BlockDriver would have some utility functions
> looking like:
>
> bdrv_register_child_bs(bs, child_bs, int index);
>
> Multiple calls to this function could be made to register multiple
> sibling children, identified by their index.
>
> This way something like quorum could register multiple QCOW2 instances.
>
> Drivers would have a
> BlockDriverState *bdrv_access_child(bs, int index);
>
> to access their children.
>
> These functions can be implemented without the driver knowing about
> BlockStackNodes, using container_of.
>
> blockdev point of view: (here I need your help)
>
> When doing a snapshot, blockdev.c would access
> BlockBackend->top_node_below_filter and make a snapshot of the bs
> contained in this node and its siblings.
>

Since BlockDriver.bdrv_snapshot_create() is an optional operation, blockdev.c
can navigate down the tree from the top node until it hits a layer where the
op is implemented (the QCOW2 bs), so we can get rid of this
top_node_below_filter pointer.

Is this the only use case of top_node_below_filter?

Fam

> After each individual snapshot, the linked lists and the hash/arrays would
> be updated to point to the new top bsn.
> The snapshot operation can be done without violating any of the top block
> filter BlockDriverStates.
>
> What do you think of this idea?
> How would this fit in block.c/blockdev.c?
>
> Best regards
>
> Benoît
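
To make the walk-down suggestion concrete, here is a minimal standalone C
sketch. It does not use QEMU's real types: Node, Driver, register_child(),
access_child(), snapshot_from_top() and demo_snapshot_count() are all
hypothetical names standing in for BlockStackNode, BlockDriver and the
proposed bdrv_register_child_bs()/bdrv_access_child(). It assumes a
depth-first descent that stops at the first layer implementing the optional
snapshot op, which is only one possible reading of the idea and not how
blockdev.c behaves today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT QEMU's real definitions. */
#define MAX_CHILDREN 4

typedef struct Node Node;

typedef struct Driver {
    const char *name;
    /* Optional op: NULL when the driver cannot snapshot (filters, raw). */
    int (*snapshot_create)(Node *node);
} Driver;

struct Node {
    const Driver *drv;
    Node *children[MAX_CHILDREN]; /* sibling children, addressed by index */
    int snapshots;                /* number of snapshots taken of this node */
};

/* Sketch of the proposed bdrv_register_child_bs(bs, child_bs, index). */
static int register_child(Node *parent, Node *child, int index)
{
    assert(parent && child);
    if (index < 0 || index >= MAX_CHILDREN) {
        return -1;
    }
    parent->children[index] = child;
    return 0;
}

/* Sketch of the proposed bdrv_access_child(bs, index). */
static Node *access_child(Node *parent, int index)
{
    if (index < 0 || index >= MAX_CHILDREN) {
        return NULL;
    }
    return parent->children[index];
}

/*
 * Descend from the top node. The first node on each branch whose driver
 * implements snapshot_create is snapshotted and the descent stops there.
 * Returns the number of snapshots taken, or -1 on error.
 */
static int snapshot_from_top(Node *node)
{
    int total = 0;

    if (node->drv->snapshot_create) {
        return node->drv->snapshot_create(node) < 0 ? -1 : 1;
    }
    for (int i = 0; i < MAX_CHILDREN; i++) {
        Node *child = access_child(node, i);
        if (!child) {
            continue;
        }
        int ret = snapshot_from_top(child);
        if (ret < 0) {
            return -1;
        }
        total += ret;
    }
    return total;
}

/* Fake snapshot op: only records that it was called. */
static int fake_qcow2_snapshot(Node *node)
{
    node->snapshots++;
    return 0;
}

static const Driver throttle_drv = { "throttle", NULL };
static const Driver crypt_drv    = { "encryption", NULL };
static const Driver quorum_drv   = { "quorum", NULL };
static const Driver qcow2_drv    = { "qcow2", fake_qcow2_snapshot };
static const Driver raw_drv      = { "raw", NULL };

/* Build the example tree from the mail and snapshot it from the top. */
int demo_snapshot_count(void)
{
    Node raw[3], qcow2[3];
    Node quorum = { .drv = &quorum_drv };
    Node crypt = { .drv = &crypt_drv };
    Node throttle = { .drv = &throttle_drv };

    for (int i = 0; i < 3; i++) {
        raw[i] = (Node){ .drv = &raw_drv };
        qcow2[i] = (Node){ .drv = &qcow2_drv };
        register_child(&qcow2[i], &raw[i], 0);
        register_child(&quorum, &qcow2[i], i);
    }
    register_child(&crypt, &quorum, 0);
    register_child(&throttle, &crypt, 0);

    return snapshot_from_top(&throttle);
}
```

With the example tree from the mail, demo_snapshot_count() snapshots the
three QCOW2 nodes and leaves the throttling, encryption and Quorum layers,
as well as the RAW nodes, untouched. Because the descent visits every child
of a node that lacks the op, the siblings are reached without a separate
top_node_below_filter pointer.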