On 04/04/2016 05:08 PM, Wouter Verhelst wrote:
> On Mon, Apr 04, 2016 at 10:54:02PM +0300, Denis V. Lunev wrote:
>> saying about dirtiness, we would soon come to the fact, that
>> we can have several dirtiness states regarding different
>> lines of incremental backups. This complexity is hidden
>> inside QEMU and it would be very difficult to publish and
>> reuse it.
>
> How about this then.
>
> A reply to GET_BLOCK_STATUS containing chunks of this:
>
> 32-bit length
> 32-bit "snapshot status"
> if bit 0 in the latter field is set, that means the block is allocated
>   on the original device
> if bit 1 is set, that means the block is allocated on the first-level
>   snapshot
> if bit 2 is set, that means the block is allocated on the second-level
>   snapshot
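A minimal C sketch of the chunk proposed above, to make the discussion
concrete. The struct and function names are hypothetical, and big-endian
field order is an assumption carried over from the rest of NBD; the
proposal itself only specifies the two 32-bit fields and the meaning of
the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of one proposed reply chunk. */
struct block_status_chunk {
    uint32_t length;          /* number of bytes this chunk describes */
    uint32_t snapshot_status; /* bit N set: allocated at snapshot level N */
};

/* Read a 32-bit big-endian value (byte order is an assumption here). */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one 8-byte chunk from the wire. */
static struct block_status_chunk parse_chunk(const unsigned char buf[8])
{
    struct block_status_chunk c;
    c.length = get_be32(buf);
    c.snapshot_status = get_be32(buf + 4);
    return c;
}

/* Nonzero if the status marks the block as allocated at the given level
 * (level 0 is the original device, level 1 the first-level snapshot). */
static int allocated_at_level(uint32_t snapshot_status, unsigned level)
{
    assert(level < 32); /* a 32-bit field has room for 32 levels only */
    return (int)((snapshot_status >> level) & 1u);
}

/* Under the quoted proposal, all flags cleared means the block is a hole. */
static int chunk_is_hole(uint32_t snapshot_status)
{
    return snapshot_status == 0;
}
```

For example, the eight wire bytes 00 01 00 00 00 00 00 05 would parse as
a 65536-byte chunk that is allocated on the original device and on the
second-level snapshot, but not on the first-level one.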

The idea of allocation is orthogonal to the idea of reads as zeroes.
That is, a client may usefully guarantee that something reads as zeroes,
whether or not it is allocated (but knowing whether it is a hole or
allocated will determine whether future writes to that area will cause
file system fragmentation or be at risk of ENOSPC on thin-provisioning).

If we want to expose the notion of depth (and I'm not sure about that
yet), we may want to reserve bit 0 for 'reads as zero' and bits 1-30 as
'allocated at depth "bit-1"' (and bit 31 as 'allocated at depth 30 or
greater').

I don't know if the idea of depth of allocation is useful enough to
expose in this manner; qemu could certainly advertise depth if the
protocol calls it out, but I'm still not sure whether knowing depth
helps any algorithms.

>
> If all flags are cleared, that means the block is not allocated (i.e.,
> is a hole) and MUST read as zeroes.

That's too strong. NBD_CMD_TRIM says that we can create holes whose
data does not necessarily read as zeroes (and SCSI definitely has
semantics like this - not all devices guarantee zero reads when you
UNMAP; and WRITE_SAME has an UNMAP flag to control whether you are okay
with the faster unmapping operation at the expense of bad reads, or
slower explicit writes). Hence my complaint that we have to treat
'reads as zero' as an orthogonal bit to 'allocated at depth X'.

>
> If a flag is set at a particular level X, that means the device is dirty
> at the Xth-level snapshot.
>
> If at least one flag is set for a region, that means the data may read
> as "not zero".
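
To make the alternative layout from earlier in this reply concrete (bit
0 for 'reads as zero', bits 1-30 for 'allocated at depth "bit-1"', bit
31 for 'allocated at depth 30 or greater'), here is a rough C sketch.
All names are hypothetical and none of this is part of the NBD protocol;
it only shows that 'reads as zero' and 'allocated' can be queried
independently, so a trimmed hole with undefined contents and an
allocated region of zeroes are both representable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for the layout floated in this mail. */
#define STATUS_READS_ZERO UINT32_C(0x00000001) /* bit 0 */
#define STATUS_ALLOC_MASK UINT32_C(0xFFFFFFFE) /* bits 1-31 */

/* Bit marking allocation at a given depth: depth d maps to bit d+1,
 * and every depth of 30 or more shares bit 31. */
static uint32_t status_depth_bit(unsigned depth)
{
    unsigned bit = depth >= 30 ? 31 : depth + 1;
    assert(bit >= 1 && bit <= 31); /* bit 0 is never an allocation bit */
    return UINT32_C(1) << bit;
}

/* True if the server promises the region reads as zeroes. */
static bool status_reads_zero(uint32_t status)
{
    return (status & STATUS_READS_ZERO) != 0;
}

/* True if the region is not allocated at any depth. */
static bool status_is_hole(uint32_t status)
{
    return (status & STATUS_ALLOC_MASK) == 0;
}

/* Shallowest depth at which the region is allocated, or -1 for a hole.
 * A result of 30 means "depth 30 or greater". */
static int status_shallowest_depth(uint32_t status)
{
    for (unsigned bit = 1; bit <= 31; bit++) {
        if (status & (UINT32_C(1) << bit))
            return (int)bit - 1;
    }
    return -1;
}
```

With this encoding a status of 0 is a hole whose contents are not
guaranteed (what NBD_CMD_TRIM may leave behind), while a status of 1 is
a hole that is known to read as zeroes.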
>
> The protocol does not define what it means to have multiple levels of
> snapshots, other than:
>
> - Any write command (WRITE or WRITE_ZEROES) MUST NOT clear or set the
>   Xth level flag if the Yth level flag is not also cleared at the same
>   time, for any Y > X
> - Any write (as above) MAY set or clear multiple levels of flags at the
>   same time, as long as the above holds
>
> Having a 32-bit snapshot status field allows for 32 levels of snapshots.
> We could switch length and flags to 64 bits so that things continue to
> align nicely, and then we have a maximum of 64 levels of snapshots.

64 bits may not lay out as nicely (a 12-byte struct is not as efficient
to copy between the wire and a C array as an 8-byte struct).

>
> (I'm not going to write this up formally at this time of night, but you
> get the general idea)

The idea may make it possible to expose dirty information as a layer of
depth (from the qemu perspective, each qcow2 file would occupy 2 layers
of depth: one if dirty, and another if allocated; then deeper layers are
determined by backing files). But I'm also worried that it may be more
complicated than the original question at hand (qemu wants to know, in
advance of a read, which portions of a file are worth reading, because
they are either allocated, or because they are dirty; but doesn't care
to what depth the server has to go to actually perform the reads).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org