Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension

Eric Blake Tue, 29 Mar 2016 11:25:01 -0700

On 03/29/2016 11:53 AM, Wouter Verhelst wrote:
> Hi Eric,
> 
> Having read this in more detail now:
> 
> On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote:
>> +  The server MUST ensure that each read chunk lies within the original
>> +  offset and length of the original client request, MUST NOT send read
>> +  chunks that would cover the same offset more than once, and MUST
>> +  send at least one byte of data in addition to the offset field of
>> +  each read chunk.  The server MAY send read chunks out of order, and
>> +  may interleave other responses between read replies.  The server
>> +  MUST NOT set the error field of a read chunk; if an error occurs, it
>> +  MAY immediately end the sequence of structured response messages,
>> +  MUST send the error in the concluding normal response, and SHOULD
>> +  keep the connection open.  The final non-structured response MUST
>> +  set an error unless the sum of data sent by all read chunks totals
>> +  the original client length request.
> 
> I'm thinking it would probably be a good idea to have the concluding
> response (if the error field is nonzero) have an offset too; the server
> could use that to specify where, exactly, the error occurred (so that a
> client which sent a very large read request doesn't have to go through a
> binary search or some such to figure out where the read error happened)
> 
> i.e.,
> 
> C: read X bytes at offset Y
> S: (X bytes)
> S: (error, offset Z)


Here, I'm assuming that you mean X > Z.

Unfortunately, I chose the design of 0 or more structured replies
followed by a normal reply, so that the normal reply is a reliable
indicator that the read is complete (whether successful or not); and the
whole goal of the extension is to avoid sending any data payload on a
normal reply.  I'm not sure how to send the offset in the normal reply
without violating the premise that a normal reply has no payload.

But what we could do is allow for the server to send a structured reply
data chunk of zero bytes, with the offset in question, as the offset
where an error occurred, prior to then sending the normal reply with the
final error indicator.  I guess that also means that if we don't have
the DF command flag set, the server could then report multiple failed
reads interspersed among larger successful clusters, when trying to
recover as much of the failing disk as possible, if each failure is
reported via a separate structured read of zero bytes.  Hmm, that also
means that we have to be careful with the wording: if we allow a
structured reply with 0 data bytes to report an error after already
sending a larger reply with partially valid bytes, then a client will
receive more than one read chunk visiting the same offset, so the
wording would have to permit that.
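
To make the zero-byte-chunk idea concrete, here is a minimal client-side
sketch.  It is not the proposed wire format: the ReadChunk type, the
assemble() helper, and the use of 5 as an EIO-style error value are all
assumptions made up for illustration.  It treats a chunk with an empty
payload as "an error occurred at this offset" and collects the ranges
that actually carried data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReadChunk:
    """Hypothetical in-memory form of one structured read chunk."""
    offset: int   # absolute offset within the export
    data: bytes   # empty payload marks an error at `offset` (assumption)


def assemble(req_offset: int, req_length: int,
             chunks: List[ReadChunk],
             final_error: int) -> Tuple[bytes, List[Tuple[int, int]], List[int]]:
    """Collect chunks for one read request.

    Returns (buffer, covered_ranges, error_offsets).  Bytes outside
    covered_ranges are left as zero filler and must not be trusted.
    """
    buf = bytearray(req_length)
    covered: List[Tuple[int, int]] = []
    error_offsets: List[int] = []
    req_end = req_offset + req_length

    for chunk in chunks:
        end = chunk.offset + len(chunk.data)
        # Every chunk must lie within the original request.
        if chunk.offset < req_offset or end > req_end:
            raise ValueError("chunk outside the requested range")
        if not chunk.data:
            # Zero-byte chunk: only reports where a failure happened.
            error_offsets.append(chunk.offset)
            continue
        start = chunk.offset - req_offset
        buf[start:start + len(chunk.data)] = chunk.data
        covered.append((chunk.offset, end))

    # The concluding normal reply must carry an error unless the data
    # chunks add up to the full requested length.
    total = sum(e - s for s, e in covered)
    if final_error == 0 and total != req_length:
        raise ValueError("success reported but data is incomplete")
    return bytes(buf), covered, error_offsets


# Example: 12-byte read where the middle 4 bytes failed at offset 6.
data, covered, errors = assemble(
    0, 12,
    [ReadChunk(0, b"abcd"), ReadChunk(6, b""), ReadChunk(8, b"ijkl")],
    final_error=5,  # EIO-style value, chosen for illustration
)
```

With several zero-byte chunks, the same loop would report several
failed regions in one reply sequence, which is the "recover as much of
the failing disk as possible" case.  Overlap detection is deliberately
left out here.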

> client now has Z-1 bytes of valid data (with the rest being garbage,
> plus a read error)
> 
> The alternative (in the above) would be that the client has 0 bytes of
> valid data, and would have to issue another read request to figure out
> which parts of the data are valid.

So if I'm understanding you, you are trying to state that the server may
report the header for X bytes, then fail partway through those X bytes;
it must still send X bytes, but can then report how many are valid (that
is, a client must assume that 0 of the X bytes received are valid
_unless_ the server also reported where it failed).  But I was
envisioning the opposite: the server must NOT send X bytes unless it
knows they are valid; if it encounters a read error at Z, then it sends
a structured read of Z-1 bytes before the final normal message that
reports overall failure.  The client can then assume that every byte it
received is valid.

But I also documented that the client MAY, but not MUST, abort the read
at the first error; so the idea of being able to report multiple errors
and/or send headers prior to learning whether there are read errors
means that your interpretation is probably safer than mine.
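
The two interpretations differ only in which received bytes the client
may trust.  The sketch below spells that out; the function names are
hypothetical, and it assumes the reported error offset is the absolute
offset of the first bad byte, which is a simplification that the
thread itself does not pin down.

```python
from typing import Optional, Tuple


def trusted_send_all(offset: int, length: int, final_error: int,
                     error_offset: Optional[int] = None) -> Tuple[int, int]:
    """Server always sends `length` bytes, then may say where it failed.

    Returns the half-open range [start, end) the client may trust.
    """
    if final_error == 0:
        return (offset, offset + length)   # everything is valid
    if error_offset is None:
        return (offset, offset)            # failure, location unknown: trust nothing
    return (offset, error_offset)          # trust only bytes before the failure


def trusted_send_valid_only(offset: int,
                            received_length: int) -> Tuple[int, int]:
    """Server sends only bytes it knows are valid, so all of them count."""
    return (offset, offset + received_length)
```

For a 50-byte read at offset 100 that fails at offset 120, both models
end with the client trusting [100, 120); the difference is whether the
remaining 30 bytes cross the wire as garbage or are never sent.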

I guess it will help to have actual v2 wording in front of us before
fine-tuning this any further.

> 
>> +  The client SHOULD immediately close the connection if it detects
>> +  that the server has sent an offset more than once (whether or not
>> +  the overlapping data claimed to have the same contents), or if
>> +  receives the concluding normal reply without an error set but
>> +  without all bytes covered by read chunk(s). A future extension may
> 
> I would reword this to...
> 
> The client MAY immediately close the connection if it detects that
> [...]. The server MUST NOT send an offset more than once.
> 
>> +  add a command flag that would allow the server to skip read chunks
>> +  for portions of the file that read as all zeroes.
> 
> Not sure if that part is necessary or helpful, really.

I envision such an extension in parallel to (or as part of) the proposed
NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more
efficient to skip reads of holes with a single read command flag than it
is to first read status to determine where holes are and only then issue
reads for the non-hole regions.  But I can also buy your argument that
such language belongs in the extension for sparse reads, and doesn't
need to be present in the extension for structured reads.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
