Hi Vladimir, On Tue, Nov 29, 2016 at 03:41:10PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Hi, > > 29.11.2016 13:50, Wouter Verhelst wrote: > > Hi, > > On Mon, Nov 28, 2016 at 06:33:24PM +0100, Wouter Verhelst wrote: > > However, I'm arguing that if we're going to provide information about > snapshots, we should be able to properly refer to these snapshots from > within an NBD context. My previous mail suggested adding a negotiation > message that would essentially ask the server "tell me about the > snapshots you know about", giving them an NBD identifier in the > process > (accompanied by a "foreign" identifier that is decidedly *not* an NBD > identifier and that could be used to match the NBD identifier to > something implementation-defined). This would be read-only > information; > the client cannot ask the server to create new snapshots. We can then > later in the protocol refer to these snapshots by way of that NBD > identifier. > > To make this a bit more concrete, I've changed the proposal like so: > [...] > +##### Allocation contexts > + > +Allocation context 0 is the basic "exists at all" allocation context. If > +an extent is not allocated at allocation context 0, it MUST NOT be > +listed as allocated at another allocation context. This supports sparse > > > allocated here is range with unset NBD_STATE_HOLE bit?
Eh, yes. I should clarify that a bit further (no time right now though, but patches are certainly welcome). > +file semantics on the server side. If a server has only one allocation > +context (the default), then writing to an extent which is allocated in > +that allocation context 0 MUST NOT fail with ENOSPC. > + > +For all other cases, this specification requires no specific semantics > +of allocation contexts. Implementations could support allocation > +contexts with semantics like the following: > + > +- Incremental snapshots; if a block is allocated in one allocation > + context, that implies that it is also allocated in the next level up. > +- Various bits of data about the backend of the storage; e.g., if the > + storage is written on a RAID array, an allocation context could > + return information about the redundancy level of a given extent > +- If the backend implements a write-through cache of some sort, or > + synchronises with other servers, an allocation context could state > + that an extent is "allocated" once it has reached permanent storage > + and/or is synchronized with other servers. > + > +The only requirement of an allocation context is that it MUST be > +representable with the flags as defined for `NBD_CMD_BLOCK_STATUS`. > + > +Likewise, the syntax of query strings is not specified by this document. > + > +Server implementations SHOULD document their syntax for query strings > +and semantics for resulting allocation contexts in a document like this > +one. > > > IMHO, redundant paragraph for this spec. The "SHOULD document" one? I want that there at any rate, just to make clear that it's probably a good thing to have it (but no, it's not part of the formal protocol spec). > + > ### Transmission phase > > #### Flag fields > @@ -983,6 +1065,9 @@ valid may depend on negotiation during the handshake > phase. > content chunk in reply. MUST NOT be set unless the transmission > flags include `NBD_FLAG_SEND_DF`. Use of this flag MAY trigger an > `EOVERFLOW` error chunk, if the request length is too large. > +- bit 3, `NBD_CMD_FLAG_REQ_ONE`; valid during `NBD_CMD_BLOCK_STATUS`. If > + set, the client is interested in only one extent per allocation > + context. > > ##### Structured reply flags > > @@ -1371,38 +1456,48 @@ adds a new `NBD_CMD_BLOCK_STATUS` command which > returns a list of > ranges with their respective states. This extension is not available > unless the client also negotiates the `STRUCTURED_REPLY` extension. > > -* `NBD_FLAG_SEND_BLOCK_STATUS` > - > - The server SHOULD set this transmission flag to 1 if structured > - replies have been negotiated, and the `NBD_CMD_BLOCK_STATUS` > - request is supported. > - > * `NBD_REPLY_TYPE_BLOCK_STATUS` > > - *length* MUST be a positive integer multiple of 8. This reply > + *length* MUST be 4 + (a positive integer multiple of 8). This reply > represents a series of consecutive block descriptors where the sum > of the lengths of the descriptors MUST not be greater than the > - length of the original request. This chunk type MUST appear at most > - once in a structured reply. Valid as a reply to > + length of the original request. This chunk type MUST appear at most > + once per allocation ID in a structured reply. Valid as a reply to > `NBD_CMD_BLOCK_STATUS`. > > - The payload is structured as a list of one or more descriptors, > - each with this layout: > + Servers MUST return an `NBD_REPLY_TYPE_BLOCK_STATUS` chunk for every > + allocation context ID, except if the semantics of particular > + allocation contexts mean that the information for one allocation > + context is implied by the information for another. > > > So, actually, instead of selecting with a nbd_cmd or with external tool which > bitmap we want to access, we just reply with all bitmaps (negotiated at the > beginning). Personally I dislike this. With such approach, we will have to > export allocation bitmap always, even if we need only dirtyness. Consider, > that > requesting allocation bitmap may be much more expensive in time that > requesting > dirtyness. Ah, yes, didn't consider that. I suppose more updates will be required, then. What would you suggest instead, then? > Or allocation bitmap information is implied by dirtyness? > > Furthermore, as allocation context semantics defined externally, 'semantics > mean that information is implied' states nothing, and we actually return to > way > #3, where external tool decides which bitmap to export. > > > + > + The payload starts with: > + > + * 32 bits, allocation context ID > + > + and is followed by a list of one or more descriptors, each with this > + layout: > > * 32 bits, length (unsigned, MUST NOT be zero) > * 32 bits, status flags > > - The definition of the status flags is determined based on the > - flags present in the original request. > + If the client used the `NBD_CMD_FLAG_REQ_ONE` flag in the request, > + then every reply chunk MUST NOT contain more than one descriptor. > + > + Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in > + its request, the server MAY return less descriptors in the reply > + than would be required to fully specify the whole range of requested > + information to the client, if the number of descriptors would be > + over 16 otherwise and looking up the information would be too > + resource-intensive for the server. > > > So, if there are <=16 extents, they all MUST be present in reply.. (just note) That's the proposal, yes. I think it makes sense to have a minimum that MUST be present (unless the client asked for REQ_ONE), although the exact count can be different from 16, if needs be. [...] > Thoughts? > > For me considering dirtyness like one of allocation contexts sounds a bit > weird, as dirtyness is not allocation.. But it is not so important. I would certainly be willing to change the name, if that helps. The idea is that you have various types of information that you can query the server about. I called these "allocation contexts", but I'm certainly not of the opinion that it *has* to be called that. Allocation is one of the information types, but there can be more; my proposed spec explicitly does not go into detail about the others. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12