The code is pretty confused about format vs. protocol, and so are we. Let's try to figure them out.
From cruising altitude, all this format, protocol, stacking business
doesn't matter.  We provide a bunch of arguments, and get an image.

If you look more closely, providing that image involves sub-tasks.
One is to haul bits.  Another one is to translate between bits in
different formats.

Working hypothesis:

* A protocol hauls image bits.  Examples: file, host_device, nbd.

* A format translates image formats.  Examples: raw, qcow2.

Note: this does *not* follow the code's use of the terms.  It had
better not, because the code is confused.

Both protocol and format provide an image.  That's why we can and in
fact do have a common abstraction for them: BlockDriver.

Our data type for a block driver instance is BlockDriverState.

Nothing stops a block driver from translating and hauling at the same
time.  We generally separate the two jobs, because it lets us combine
the different ways to translate with the different ways to haul.
Nevertheless, a block driver *can* be both format and protocol.
Example: vvfat arguably both translates and hauls.

Let's call a format that isn't also a protocol a pure format, and a
protocol that isn't also a format a pure protocol.

Obviously, pure formats need to sit on top of something providing
images to translate.  Formats don't care whether those somethings
translate or haul.

Therefore, a pure format is always stacked on one or more
BlockDriverStates.

Example: raw is always stacked on exactly one BlockDriverState (stored
in bs->file).

Example: qcow2 is always stacked on exactly two BlockDriverStates
(stored in bs->file and bs->backing_hd).

Conversely, anything that isn't stacked on any BlockDriverState can't
be a pure format, and thus must be a protocol.

Example: file hauls an ordinary file's bits, nbd hauls bits over TCP
using the NBD protocol.

Summary so far:

1. BlockDriverStates form a tree.

2. The leaves of the tree are protocols, not pure formats.

3. The non-leaf nodes may be anything.  We haven't found a reason why
   not.
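To make the summary concrete, here is a toy model of the tree in Python. It is a sketch of the working hypothesis, not of the QEMU code: Driver, Node and leaves_are_protocols are names made up for illustration, and the translates/hauls flags simply encode the classification used above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Driver:
    """Hypothetical stand-in for a BlockDriver."""
    name: str
    translates: bool  # acts as a format
    hauls: bool       # acts as a protocol

    @property
    def pure_format(self):
        return self.translates and not self.hauls


@dataclass
class Node:
    """Hypothetical stand-in for a BlockDriverState."""
    driver: Driver
    children: list = field(default_factory=list)


def leaves_are_protocols(node):
    """Return True if no pure format sits at a leaf of the tree."""
    if not node.children:
        return not node.driver.pure_format
    return all(leaves_are_protocols(child) for child in node.children)


FILE = Driver("file", translates=False, hauls=True)
RAW = Driver("raw", translates=True, hauls=False)
QCOW2 = Driver("qcow2", translates=True, hauls=False)
VVFAT = Driver("vvfat", translates=True, hauls=True)

# qcow2 stacked on its image file and on a raw backing image
tree = Node(QCOW2, [Node(FILE), Node(RAW, [Node(FILE)])])
```

In this model leaves_are_protocols(tree) holds, a lone Node(RAW) violates the rule, and a lone Node(VVFAT) is fine because vvfat also hauls.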
In general, a block driver needs some arguments to create an instance.
The current code provides two BlockDriver methods for that:

* bdrv_open() takes a flags argument.

* bdrv_file_open() takes a flags argument and a filename argument.

This is woefully inadequate for anything but the simplest block
drivers.  Any driver taking more complex arguments has to extract them
out of the "filename".  Example: http extracts url and optional
readahead.

A saner interface would pass flags and a suitable argument dictionary
such as QemuOpts.

Now, let's review our existing interface to create such a tree of block
drivers.  Beware, royal mess ahead.

There are two interfaces.

The first one is bdrv_open().  It takes three arguments: filename,
flags and an optional block driver argument.

If flag BDRV_O_SNAPSHOT is set, we do snapshot magic.  Omitted here in
an attempt to protect reader sanity.

If the block driver is missing, we guess one.  More on that below.

The block driver is instantiated to set up the root of the tree.  Let's
call it the root block driver, and its instance bs.

If the root block driver provides method bdrv_file_open(), it is used,
and gets the flags and filename arguments.

Else, we first instantiate *another* driver.  We use the second
interface for that: bdrv_file_open().  It takes filename and flags
arguments like bdrv_open(), but no block driver argument.  It chooses
the block driver by looking at filename.  If filename names a host
device, use the protocol for hauling that device's bits.  If it starts
with P:, where P is some driver's "protocol_name", use that driver.
Else fail.  Except I just lied; the actual rules are messier than that.
Unlike bdrv_open(), bdrv_file_open() ignores flag BDRV_O_SNAPSHOT, and
always behaves as if flag BDRV_O_NO_BACKING were set.

We store the instance in bs->file.

Then the root block driver is instantiated with method bdrv_open().  It
gets the flags argument.  It stacks on top of bs->file, but that's mere
convention.
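The control flow just described can be sketched as a toy model in Python. This is not the QEMU code: open_image, file_open and PROTOCOL_PREFIX are made-up names, the driver table is a small hand-picked subset, and flags, snapshot magic, host devices and driver guessing are all left out. Sending filenames without a colon to file is an assumption that goes beyond the simplified rules stated above.

```python
# Hypothetical driver table: driver name -> "protocol_name" prefix, or
# None for drivers that do not provide a bdrv_file_open() method.
PROTOCOL_PREFIX = {
    "file": "file",
    "vvfat": "fat",
    "nbd": "nbd",
    "raw": None,
    "qcow2": None,
}


def file_open(filename):
    """Model of bdrv_file_open(): choose a driver from the "P:" prefix."""
    prefix, colon, _ = filename.partition(":")
    if not colon:
        # Assumption, not part of the simplified rules in the text.
        return "file"
    for name, proto in PROTOCOL_PREFIX.items():
        if proto == prefix:
            return name
    raise LookupError(f'no driver named "{prefix}"')


def open_image(filename, driver):
    """Model of bdrv_open() with an explicit block driver argument.

    Returns the resulting stack of driver names, root first.
    """
    if PROTOCOL_PREFIX[driver] is not None:
        # The root driver has bdrv_file_open(): it gets the filename as
        # is, and nothing is stacked below it.
        return [driver]
    # Otherwise instantiate bs->file first, then stack the root on it.
    return [driver, file_open(filename)]
```

For instance, open_image("fat:duck", "raw") stacks raw onto vvfat, open_image("fat:duck", "file") yields file alone, and open_image("scruffy:duck", "raw") fails because no driver claims the "scruffy" prefix.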
Note: one of the code's ideas on format vs. protocol is "protocols
provide bdrv_file_open(), formats do not".  I don't think that idea is
helpful.

The root block driver may ask for a backing file.  To do that, it sets
bs->backing_filename and optionally bs->backing_format, both strings.

Example: qcow2 reads the two strings from the image header.

We instantiate the backing file bs->backing_hd with bdrv_open().
Recursion.  Arguments: bs->backing_filename, flags derived from our own
flags argument, and the driver named by bs->backing_format.  If
bs->backing_format is unset, pick one just like -drive does when its
format option is unset.

The root block driver stacks on top of bs->backing_hd, by convention.

Flag BDRV_O_NO_BACKING suppresses backing file setup, but let's ignore
that here.

This provides for common stacking, but it's not general.  Block drivers
can and do instantiate other block drivers on their own, for their
stacking needs.

Example: blkdebug instantiates bs->file with bdrv_file_open().  It
passes on its flags argument and the part of its filename argument it
doesn't use itself.

What could a saner interface look like?

An obvious interface for building trees lets you build bottom up: the
tree node constructor takes children and whatever other arguments it
needs.

COW backing files complicate matters.  We need to open the COW to find
its backing file information.  I'd build a tree without the backing
file normally, read the backing file information, create the tree for
the backing file, and attach it to the COW node.

Next, let's review the encoding of the filename argument.  It is
decoded in the block driver bdrv_file_open() methods.  Every block
driver has its own ad hoc encoding.

Example: file interprets it as a filename.

Example: nbd parses

    "nbd:" [ "unix:" filename | host ":" port ]

Additionally, bdrv_file_open() recognizes P: (see above).  This breaks
when the block driver's encoding is incompatible with that.
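To show what such an ad hoc encoding means for a driver, here is a rough Python sketch of a decoder for the nbd syntax quoted above. It is not the code nbd uses: parse_nbd and the shape of its result are invented for illustration, and it assumes the port is a plain decimal number and the host contains no colon.

```python
def parse_nbd(filename):
    """Decode "nbd:unix:<filename>" or "nbd:<host>:<port>" (hypothetical)."""
    if not filename.startswith("nbd:"):
        raise ValueError("not an nbd filename")
    rest = filename[len("nbd:"):]

    if rest.startswith("unix:"):
        # Everything after "unix:" is the socket's filename.
        return {"transport": "unix", "path": rest[len("unix:"):]}

    # Assumption: the host contains no colon, so the first one splits
    # host from port.
    host, colon, port = rest.partition(":")
    if not colon or not host or not port.isdigit():
        raise ValueError("expected host:port")
    return {"transport": "tcp", "host": host, "port": int(port)}
```

Each driver carrying its own little parser like this is exactly what an argument dictionary would make unnecessary.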
Examples of how the P: prefix gets in the way:

    bdrv_open() arguments         behavior
    filename      block driver
    scruffy:duck  none            fails: no driver named "scruffy"
    scruffy:duck  bdrv_raw        fails: no driver named "scruffy"
    scruffy:duck  bdrv_file       bdrv_file uses file "scruffy:duck"
    fat:duck      none            bdrv_raw stacks onto bdrv_vvfat
                                  uses directory "duck"
    fat:duck      bdrv_raw        bdrv_raw stacks onto bdrv_vvfat
                                  uses directory "duck"
    fat:duck      bdrv_file       bdrv_file uses file "fat:duck"

Bizarre, isn't it?

More examples: try to use a qcow2 image named "fat:duck"

    bdrv_open() arguments         behavior
    filename       block driver
    fat:duck       qcow2          bdrv_qcow2 stacks onto bdrv_vvfat
                                  uses directory "duck"
                                  fails: vvfat doesn't provide a qcow2
                                  image
    file:fat:duck  qcow2          bdrv_qcow2 stacks onto bdrv_file
                                  uses file "file:fat:duck"

Close, but no cigar.

-drive & friends expose this mess in the user interface as follows:

* They use bdrv_open().

* Option format selects its block driver argument.  It need not be a
  format.  Any block driver will do.  Pearls like "format=file" confuse
  users (What format is "file"?  And how does it differ from "raw"?).
  Note that you need format=file if you have colons in your filenames.

* Option file is the filename argument.  It's not really a filename,
  but an encoding of block driver name and arguments.

  If the block driver selected by format makes bdrv_open() instantiate
  a second block driver (because it wants to stack on it), then this
  argument also selects that block driver.  But you can only select
  block drivers that support the funny colon syntax.

* Options snapshot, cache, aio, readonly are combined into the flags
  argument.

We need to think about a saner user interface, but I figure this
message is already plenty long, so I'll stop here.


[*] It must provide bdrv_file_open(), or else death by infinite
recursion (I think).