On 29 Mar 2016, at 19:51, Wouter Verhelst <w...@uter.be> wrote:
>>
>> But I was envisioning the opposite: the server must NOT send X bytes
>> unless it knows they are valid; if it encounters a read error at Z,
>> then it sends a structured read of Z-1 bytes before the final normal
>> message that reports overall failure. The client then assumes that
>> all X bytes received are valid.
>
> The problem with that approach is that it makes it impossible for a
> server to use a sendfile()-like system call, where you don't know that
> there's a read error until you start sending out data to the client
> (which implies that you must've already sent out the header).
I don't think sendfile semantics are ever compatible with reporting
read errors *unless* you pad after the read. IIRC the way sendfile
works is that you specify a pointer to an offset, and sendfile sends
as much as it can read (up to the length specified) and updates the
offset by the length read. Naturally, at the start of the read you
don't know where the error is going to occur, so you must declare the
length of the data as the length of the whole chunk. sendfile then
does its stuff and fills up either the whole thing or part of it. In
the case where only part of the data is available, you can't report
the error there and then, because the client is expecting chunk data,
so you must either close the connection (potentially disruptive) or
pad the data and report the error at the end.

Using Eric's current scheme, you have no way of knowing where the
error occurred. Remember the chunks could arrive out of order: you
might, say, get chunks 1, 3, 5, 7 and 9 in, and then an error, so you
have no idea where the failure was. It could be within chunks 1, 3,
5, 7 or 9 (where the server might have padded the rest of the chunk),
or in a chunk that was never sent (2, 4, 6, 8 or 10). This seems
undesirable.

I think we are paying too much attention to trying to keep
NBD_RESPONSE intact. The justification for this was (I think) that it
made things easier for existing protocol analysers. It doesn't,
really, as all the data is going to come BEFORE the NBD_RESPONSE
(unlike NBD_CMD_READ in other situations).

I think we should therefore look at this the other way around. Here's
a straw-man proposal as an alternative for the reply bits.

For a structured reply, ALL we get is the chunks. The final chunk
(possibly the only chunk) is marked specially. Each chunk looks
something like:

  offset
  0000   32 bit   NBD_STRUCTURED_REPLY_MAGIC
  0004   64 bit   handle
  000C   32 bit   Flags
  0010   32 bit   Payload length

We have a couple of flags defined:

NBD_CHUNK_IS_DATA: the chunk is data, and the payload is a 64-bit
offset plus the data read.

NBD_CHUNK_IS_HOLE: the chunk is zeroes, and the payload is a 64-bit
offset (only).

NBD_CHUNK_IS_END: must be the final chunk. The payload is a 64-bit
offset plus a 32-bit error code (zero if no error). If there is no
error, the offset must be set to the total amount read. If there is
an error, the offset MAY indicate the position of the error. If an
error occurs, no more chunks should be sent.

The advantages of this scheme are:

1. There is only one packet type in the reply (chunks).

2. It's no more difficult to implement wireshark decoding of this (in
addition to the normal NBD protocol) than of the current proposal;
I'd suggest it could in fact be easier.

3. Chunks that error part way through (the sendfile case) must still
be padded, but can now indicate the error location.

4. It would be possible to allow EVERY server reply to be a
structured reply that simply sets NBD_CHUNK_IS_END. That gives us a
convenient route to servers which only implement structured replies.
With DF, this would be little harder than implementing the current
protocol.
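By way of illustration, here's roughly what a client-side parser for
the above would look like. This is only a sketch: the magic value,
the flag bit assignments, and the choice of request-relative offsets
are all invented for illustration (the straw man doesn't assign any
of them). All integers are network byte order, as elsewhere in NBD.

  #include <stdint.h>
  #include <unistd.h>

  #define NBD_STRUCTURED_REPLY_MAGIC 0x668e33efU /* placeholder value */
  #define NBD_CHUNK_IS_DATA (1U << 0)            /* placeholder bit */
  #define NBD_CHUNK_IS_HOLE (1U << 1)            /* placeholder bit */
  #define NBD_CHUNK_IS_END  (1U << 2)            /* placeholder bit */

  /* Read exactly len bytes, looping over short reads. */
  static int read_full(int fd, void *buf, size_t len)
  {
      unsigned char *p = buf;
      while (len) {
          ssize_t n = read(fd, p, len);
          if (n <= 0)
              return -1;          /* error or unexpected EOF */
          p += n;
          len -= (size_t)n;
      }
      return 0;
  }

  static uint32_t get32(const unsigned char *p)
  {
      return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
  }

  static uint64_t get64(const unsigned char *p)
  {
      return ((uint64_t)get32(p) << 32) | get32(p + 4);
  }

  /* Consume one structured reply into buf[0..buflen). Returns the
   * error code from the END chunk (0 on success), or -1 on a
   * protocol violation. buf must be zero-filled by the caller, so
   * HOLE chunks (which carry no length in this straw man) need no
   * action here. */
  int read_structured_reply(int fd, unsigned char *buf, uint64_t buflen)
  {
      for (;;) {
          unsigned char hdr[20], pay[12];

          if (read_full(fd, hdr, sizeof hdr) < 0)
              return -1;
          if (get32(hdr) != NBD_STRUCTURED_REPLY_MAGIC)
              return -1;
          /* hdr+4 holds the 64-bit handle; matching it against the
           * outstanding request is elided here. */
          uint32_t flags  = get32(hdr + 12);
          uint32_t paylen = get32(hdr + 16);

          if (flags & NBD_CHUNK_IS_END) {
              /* Payload: 64-bit offset plus 32-bit error code. On
               * success the offset is the total amount read; on
               * error it MAY locate the failure. */
              if (paylen != 12 || read_full(fd, pay, 12) < 0)
                  return -1;
              return (int)get32(pay + 8);
          }
          if (flags & NBD_CHUNK_IS_DATA) {
              /* Payload: 64-bit offset plus the data. The offset is
               * assumed relative to the start of the request. */
              if (paylen < 8 || read_full(fd, pay, 8) < 0)
                  return -1;
              uint64_t off = get64(pay);
              uint32_t dlen = paylen - 8;
              if (off > buflen || dlen > buflen - off)
                  return -1;      /* chunk overruns the request */
              if (read_full(fd, buf + off, dlen) < 0)
                  return -1;
          } else if (flags & NBD_CHUNK_IS_HOLE) {
              /* Payload: 64-bit offset only. The hole's extent is
               * implicit; a zero-filled buffer already holds the
               * right bytes, so nothing to do. */
              if (paylen != 8 || read_full(fd, pay, 8) < 0)
                  return -1;
          } else {
              return -1;          /* unknown chunk type */
          }
      }
  }

One consequence worth noting: because NBD_CHUNK_IS_HOLE carries only
an offset, a client that zero-fills its buffer up front never needs
to know a hole's extent, as the data chunks overwrite everything
else; and the sendfile padding in (3) above is then just literal
zero bytes inside the already-declared chunk length.

--
Alex Bligh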