On Tue, Mar 29, 2016 at 12:23:31PM -0600, Eric Blake wrote: > On 03/29/2016 11:53 AM, Wouter Verhelst wrote: > > Hi Eric, > > > > Having read this in more detail now: > > > > On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote: > >> + The server MUST ensure that each read chunk lies within the original > >> + offset and length of the original client request, MUST NOT send read > >> + chunks that would cover the same offset more than once, and MUST > >> + send at least one byte of data in addition to the offset field of > >> + each read chunk. The server MAY send read chunks out of order, and > >> + may interleave other responses between read replies. The server > >> + MUST NOT set the error field of a read chunk; if an error occurs, it > >> + MAY immediately end the sequence of structured response messages, > >> + MUST send the error in the concluding normal response, and SHOULD > >> + keep the connection open. The final non-structured response MUST > >> + set an error unless the sum of data sent by all read chunks totals > >> + the original client length request. > > > > I'm thinking it would probably be a good idea to have the concluding > > response (if the error field is nonzero) have an offset too; the server > > could use that to specify where, exactly, the error occurred (so that a > > client which sent a very large read request doesn't have to go through a > > binary search or some such to figure out where the read error happened) > > > > i.e., > > > > C: read X bytes at offset Y > > S: (X bytes) > > S: (error, offset Z) > > Here, I'm assuming that you mean X > Z.
Yes, obviously. > Unfortunately, I chose the design of 0 or more structured replies > followed by a normal reply, so that the normal reply is a reliable > indicator that the read is complete (whether successful or not); and the > whole goal of the extension is to avoid sending any data payload on a > normal reply. I'm not sure how to send the offset in the normal reply > without violating the premise that a normal reply has no payload. Oh. I thought you meant for the concluding message to also be a structured reply with the length field be zero, but you mean for it to be a non-structured reply message? If so, you should clarify that a bit more (this wasn't clear to me)... [...] > But what we could do is allow for the server to send a structured reply > data chunk of zero bytes, with the offset in question, as the offset > where an error occurred, prior to then sending the normal reply with the > final error indicator. I guess that also means that if we don't have > the DF command flag set, the server could then report multiple failed > reads interspersed among larger successful clusters, when trying to > recover as much of the failing disk as possible, if each failure is > reported via a separate structured read of zero bytes. Hmm, that also > means that we have to be careful on the wording - if we allow a > structured reply with 0 data bytes to report an error, after already > sending a larger reply with partially valid bytes, then that means that > a client will receive more than one read chunk visiting the same offset, > so we'd have to make the wording permit that. > > > client now has Z-1 bytes of valid data (with the rest being garbage, > > plus a read error) > > > > The alternative (in the above) would be that the client has 0 bytes of > > valid data, and would have to issue another read request to figure out > > which parts of the data are valid. > > So if I'm understanding you, you are trying to state that the server may > report the header for X bytes, then fail partway through those X bytes; > it must still send X bytes, but can then report how many are valid (that > is, a client must assume that 0 of the X bytes received are valid > _unless_ the server also reported where it failed). Yes. > But I was envisioning the opposite: the server must NOT send X bytes > unless it knows they are valid; if it encounters a read error at Z, > then it sends a structured read of Z-1 bytes before the final normal > message that reports overall failure. The client then assumes that > all X bytes received are valid. The problem with that approach is that it makes it impossible for a server to use a sendfile()-like system call, where you don't know that there's a read error until start sending out data to the client (which implies that you must've already sent out the header). > But I also documented that the client MAY, but not MUST, abort the read > at the first error; so the idea of being able to report multiple errors > and/or send headers prior to learning whether there are read errors > means that your interpretation is probably safer than mine. I didn't mean to imply that. I do think that aborting the read at the first error is probably a good idea. If the error occurs because the disk is dying, having the server go ahead and try to read more data anyway is probably not a very good idea (unless instructed to do so by the user, i.e., client). > I guess it will help to have actual v2 wording in front of us to further > fine-tune the wording. Certainly :-) > >> + The client SHOULD immediately close the connection if it detects > >> + that the server has sent an offset more than once (whether or not > >> + the overlapping data claimed to have the same contents), or if > >> + receives the concluding normal reply without an error set but > >> + without all bytes covered by read chunk(s). A future extension may > > > > I would reword this to... > > > > The client MAY immediately close the connection if it detects that > > [...]. The server MUST NOT send an offset more than once. > > > >> + add a command flag that would allow the server to skip read chunks > >> + for portions of the file that read as all zeroes. > > > > Not sure if that part is necessary or helpful, really. > > I envision such an extension in parallel to (or as part of) the proposed > NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more > efficient to skip reads of holes with a single read command flag than it > is to first read status to determine where holes are and only then issue > reads for the non-hole regions. Sure. > But I can also buy your argument that such language belongs in the > extension for sparse reads, and doesn't need to be present in the > extension for structured reads. Right, that was my point, mainly. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12