Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension

Wouter Verhelst Tue, 29 Mar 2016 11:53:22 -0700

On Tue, Mar 29, 2016 at 12:23:31PM -0600, Eric Blake wrote:
> On 03/29/2016 11:53 AM, Wouter Verhelst wrote:
> > Hi Eric,
> > 
> > Having read this in more detail now:
> > 
> > On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote:
> >> +  The server MUST ensure that each read chunk lies within the original
> >> +  offset and length of the original client request, MUST NOT send read
> >> +  chunks that would cover the same offset more than once, and MUST
> >> +  send at least one byte of data in addition to the offset field of
> >> +  each read chunk.  The server MAY send read chunks out of order, and
> >> +  may interleave other responses between read replies.  The server
> >> +  MUST NOT set the error field of a read chunk; if an error occurs, it
> >> +  MAY immediately end the sequence of structured response messages,
> >> +  MUST send the error in the concluding normal response, and SHOULD
> >> +  keep the connection open.  The final non-structured response MUST
> >> +  set an error unless the sum of data sent by all read chunks totals
> >> +  the original client length request.
> > 
> > I'm thinking it would probably be a good idea to have the concluding
> > response (if the error field is nonzero) have an offset too; the server
> > could use that to specify where, exactly, the error occurred (so that a
> > client which sent a very large read request doesn't have to go through a
> > binary search or some such to figure out where the read error happened)
> > 
> > i.e.,
> > 
> > C: read X bytes at offset Y
> > S: (X bytes)
> > S: (error, offset Z)
> 
> Here, I'm assuming that you mean X > Z.


Yes, obviously.

> Unfortunately, I chose the design of 0 or more structured replies
> followed by a normal reply, so that the normal reply is a reliable
> indicator that the read is complete (whether successful or not); and the
> whole goal of the extension is to avoid sending any data payload on a
> normal reply.  I'm not sure how to send the offset in the normal reply
> without violating the premise that a normal reply has no payload.

Oh. I thought you meant for the concluding message to also be a
structured reply with the length field be zero, but you mean for it to
be a non-structured reply message? If so, you should clarify that a bit
more (this wasn't clear to me)...

[...]
> But what we could do is allow for the server to send a structured reply
> data chunk of zero bytes, with the offset in question, as the offset
> where an error occurred, prior to then sending the normal reply with the
> final error indicator.  I guess that also means that if we don't have
> the DF command flag set, the server could then report multiple failed
> reads interspersed among larger successful clusters, when trying to
> recover as much of the failing disk as possible, if each failure is
> reported via a separate structured read of zero bytes.  Hmm, that also
> means that we have to be careful on the wording - if we allow a
> structured reply with 0 data bytes to report an error, after already
> sending a larger reply with partially valid bytes, then that means that
> a client will receive more than one read chunk visiting the same offset,
> so we'd have to make the wording permit that.
> 
> > client now has Z-1 bytes of valid data (with the rest being garbage,
> > plus a read error)
> > 
> > The alternative (in the above) would be that the client has 0 bytes of
> > valid data, and would have to issue another read request to figure out
> > which parts of the data are valid.
> 
> So if I'm understanding you, you are trying to state that the server may
> report the header for X bytes, then fail partway through those X bytes;
> it must still send X bytes, but can then report how many are valid (that
> is, a client must assume that 0 of the X bytes received are valid
> _unless_ the server also reported where it failed).

Yes.

> But I was envisioning the opposite: the server must NOT send X bytes
> unless it knows they are valid; if it encounters a read error at Z,
> then it sends a structured read of Z-1 bytes before the final normal
> message that reports overall failure.  The client then assumes that
> all X bytes received are valid.

The problem with that approach is that it makes it impossible for a
server to use a sendfile()-like system call, where you don't know that
there's a read error until start sending out data to the client (which
implies that you must've already sent out the header).

> But I also documented that the client MAY, but not MUST, abort the read
> at the first error; so the idea of being able to report multiple errors
> and/or send headers prior to learning whether there are read errors
> means that your interpretation is probably safer than mine.

I didn't mean to imply that. I do think that aborting the read at the
first error is probably a good idea. If the error occurs because the
disk is dying, having the server go ahead and try to read more data
anyway is probably not a very good idea (unless instructed to do so by
the user, i.e., client).

> I guess it will help to have actual v2 wording in front of us to further
> fine-tune the wording.

Certainly :-)

> >> +  The client SHOULD immediately close the connection if it detects
> >> +  that the server has sent an offset more than once (whether or not
> >> +  the overlapping data claimed to have the same contents), or if
> >> +  receives the concluding normal reply without an error set but
> >> +  without all bytes covered by read chunk(s). A future extension may
> > 
> > I would reword this to...
> > 
> > The client MAY immediately close the connection if it detects that
> > [...]. The server MUST NOT send an offset more than once.
> > 
> >> +  add a command flag that would allow the server to skip read chunks
> >> +  for portions of the file that read as all zeroes.
> > 
> > Not sure if that part is necessary or helpful, really.
> 
> I envision such an extension in parallel to (or as part of) the proposed
> NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more
> efficient to skip reads of holes with a single read command flag than it
> is to first read status to determine where holes are and only then issue
> reads for the non-hole regions.

Sure.

> But I can also buy your argument that such language belongs in the
> extension for sparse reads, and doesn't need to be present in the
> extension for structured reads.

Right, that was my point, mainly.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension

Reply via email to