On 03/29/2016 11:53 AM, Wouter Verhelst wrote: > Hi Eric, > > Having read this in more detail now: > > On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote: >> + The server MUST ensure that each read chunk lies within the original >> + offset and length of the original client request, MUST NOT send read >> + chunks that would cover the same offset more than once, and MUST >> + send at least one byte of data in addition to the offset field of >> + each read chunk. The server MAY send read chunks out of order, and >> + may interleave other responses between read replies. The server >> + MUST NOT set the error field of a read chunk; if an error occurs, it >> + MAY immediately end the sequence of structured response messages, >> + MUST send the error in the concluding normal response, and SHOULD >> + keep the connection open. The final non-structured response MUST >> + set an error unless the sum of data sent by all read chunks totals >> + the original client length request. > > I'm thinking it would probably be a good idea to have the concluding > response (if the error field is nonzero) have an offset too; the server > could use that to specify where, exactly, the error occurred (so that a > client which sent a very large read request doesn't have to go through a > binary search or some such to figure out where the read error happened) > > i.e., > > C: read X bytes at offset Y > S: (X bytes) > S: (error, offset Z)
Here, I'm assuming that you mean X > Z. Unfortunately, I chose the design of 0 or more structured replies followed by a normal reply, so that the normal reply is a reliable indicator that the read is complete (whether successful or not); and the whole goal of the extension is to avoid sending any data payload on a normal reply. I'm not sure how to send the offset in the normal reply without violating the premise that a normal reply has no payload. But what we could do is allow for the server to send a structured reply data chunk of zero bytes, with the offset in question, as the offset where an error occurred, prior to then sending the normal reply with the final error indicator. I guess that also means that if we don't have the DF command flag set, the server could then report multiple failed reads interspersed among larger successful clusters, when trying to recover as much of the failing disk as possible, if each failure is reported via a separate structured read of zero bytes. Hmm, that also means that we have to be careful on the wording - if we allow a structured reply with 0 data bytes to report an error, after already sending a larger reply with partially valid bytes, then that means that a client will receive more than one read chunk visiting the same offset, so we'd have to make the wording permit that. > client now has Z-1 bytes of valid data (with the rest being garbage, > plus a read error) > > The alternative (in the above) would be that the client has 0 bytes of > valid data, and would have to issue another read request to figure out > which parts of the data are valid. So if I'm understanding you, you are trying to state that the server may report the header for X bytes, then fail partway through those X bytes; it must still send X bytes, but can then report how many are valid (that is, a client must assume that 0 of the X bytes received are valid _unless_ the server also reported where it failed). But I was envisioning the opposite: the server must NOT send X bytes unless it knows they are valid; if it encounters a read error at Z, then it sends a structured read of Z-1 bytes before the final normal message that reports overall failure. The client then assumes that all X bytes received are valid. But I also documented that the client MAY, but not MUST, abort the read at the first error; so the idea of being able to report multiple errors and/or send headers prior to learning whether there are read errors means that your interpretation is probably safer than mine. I guess it will help to have actual v2 wording in front of us to further fine-tune the wording. > >> + The client SHOULD immediately close the connection if it detects >> + that the server has sent an offset more than once (whether or not >> + the overlapping data claimed to have the same contents), or if >> + receives the concluding normal reply without an error set but >> + without all bytes covered by read chunk(s). A future extension may > > I would reword this to... > > The client MAY immediately close the connection if it detects that > [...]. The server MUST NOT send an offset more than once. > >> + add a command flag that would allow the server to skip read chunks >> + for portions of the file that read as all zeroes. > > Not sure if that part is necessary or helpful, really. I envision such an extension in parallel to (or as part of) the proposed NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more efficient to skip reads of holes with a single read command flag than it is to first read status to determine where holes are and only then issue reads for the non-hole regions. But I can also buy your argument that such language belongs in the extension for sparse reads, and doesn't need to be present in the extension for structured reads. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature