On 7/21/23 14:50, Eric Blake wrote: > On Thu, Jun 01, 2023 at 03:42:30PM +0200, Laszlo Ersek wrote: >> On 5/25/23 15:00, Eric Blake wrote: >>> One of the benefits of extended replies is that we can do a >>> fixed-length read for the entire header of every server reply, which >>> is fewer syscalls than the split-read approach required by structured >>> replies. >> >> (Totally tangential comment: recvmsg() could scatter the incoming >> traffic into non-contiguous fields of a non-packed structure. But I >> don't know if TLS has anything similar. The current "linear receive" >> approach is probably the least demanding of the underlying socket >> abstractions. At the same time it requires us to use packed structs, >> and/or multiple syscalls.) >> >>> But one of the drawbacks of doing a large read is that if >>> the server is non-compliant (not a problem for normal servers, but >>> something I hit rather more than I'd like to admit while developing >>> extended header support in servers), >> >> Haha, so this is where it's coming from! :) I can totally relate. >> >>> nbd_pwrite() and friends will >>> deadlock if the server replies with the wrong header. Add in some >>> code to catch that failure mode and move the state machine to DEAD >>> sooner, to make it easier to diagnose the fault in the server. >>> >>> Unlike in the case of an unexpected simply reply from a structured >> >> (1) s/simply/simple/ > > Yep. > >> >>> server (where we never over-read, and can therefore >> >> (2) At this point my English parser gets thrown off: >> >>> commit b31e7bac >>> can merely fail the command with EPROTO and successfully move on to >>> the next server reply), > > Fixing the parenthetical as follows: > > (where, since commit b31e7bac, we can merely fail the command with > EPROTO and successfully move on to the next server reply, because we > didn't read too many bytes yet) > >> >> resync here: >> >>> in this case we really do have to move to >>> DEAD: in addition to having already read the 16 or 20 bytes that the >>> server sent in its (short) reply for this command, we may have already >>> read the initial bytes of the server's next reply, but we have no way >>> to push those extra bytes back onto our read stream for parsing on our >>> next pass through the state machine. >>> >>> Signed-off-by: Eric Blake <ebl...@redhat.com> >>> --- >>> generator/states-reply.c | 23 ++++++++++++++++++++++- >>> 1 file changed, 22 insertions(+), 1 deletion(-) >>> >>> diff --git a/generator/states-reply.c b/generator/states-reply.c >>> index 4e9f2dde..d4710d91 100644 >>> --- a/generator/states-reply.c >>> +++ b/generator/states-reply.c >>> @@ -109,7 +109,28 @@ REPLY.START: >>> REPLY.RECV_REPLY: >>> switch (recv_into_rbuf (h)) { >>> case -1: SET_NEXT_STATE (%.DEAD); return 0; >>> - case 1: SET_NEXT_STATE (%.READY); return 0; >>> + case 1: SET_NEXT_STATE (%.READY); >>> + /* Special case: if we have a short read, but got at least far >>> + * enough to decode the magic number, we can check if the server >>> + * is matching our expectations. This lets us avoid deadlocking if >>> + * a buggy server sends only 16 bytes of a simple reply, and is >>> + * waiting for our next command, while we are blocked waiting for >>> + * the server to send 32 bytes of an extended reply. >>> + */ >> >> (4) Slight inconsistency between commit message and code comment: the >> former mentions "20 bytes", but the latter doesn't mention "structured >> reply". > > Did I miss (3)? But yes, I can improve this comment.
My mistake, probably due to over-editing! :) Thanks Laszlo > >> >>> + if (h->extended_headers && >>> + (char *)h->rbuf >= (char *)&h->sbuf.reply.hdr.extended.flags) { >> >> (.) I wonder if (address-of-magic + size-of magic) might be more >> readable / greppable. Just in case. >> >> Feel free to ignore. > > Actually, I agree that it is nicer (although a bit longer). > >> >>> + uint32_t magic = be32toh (h->sbuf.reply.hdr.extended.magic); >>> + if (magic != NBD_EXTENDED_REPLY_MAGIC) { >>> + SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */ >>> + set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, >>> magic); >>> +#if 0 /* uncomment to see desynchronized data */ >>> + nbd_internal_hexdump (&h->sbuf.reply.hdr.extended.flags, >>> + sizeof (h->sbuf.reply.hdr.extended.flags), >>> + stderr); >>> +#endif >>> + } >>> + } >>> + return 0; >> >> (5) This could be factored out to a helper function, to share the >> "invalid:" label logic with the previous patch. >> >> (6) Commencing a dump from "flags" onward looks questionable. At this >> point, the "flags" field need not to be complete, or even started -- all >> we know is that the "magic" field *before" "flags" is complete. >> >> I think we should dump "h->sbuf.reply.hdr.simple", like in patch#5. > > Yep, I noticed that while addressing your point (4), just before > reading your point (6). Bug dumping the full hdr.simple is also too > much; really, given the preconditions above, all we can dump is > hdr.magic. At which point, the hexdump is not providing us any > further information than what the set_error() call already output. > I'm just deleting the #if 0 segment, which in turn eliminates the need > to address point (5), as the two invalid: labels are no longer > identical. > >> >> (.) Side comment (so no bullet number assigned): because we can take >> multiple iterations on RECV_REPLY, we may end up checking the "magic" >> field multiple times. Not very costly, just something to be aware of. > > Indeed. In the common case, short reads are rare, and even when they > do happen, it is even more rare to hit it more than once per packet. > I wonder if nc or similar has a way to force a server's response to be > flushed after every byte, to see the performance impact of maximum > network overhead. > > But your observation means you are correctly aware of a larger design > aspect of the state machine: all code handling short reads (case 1 of > recv_into_rbuf) must be re-startable (no messing with h->rbuf, for > example). Stateful changes can only occur when the read is complete > (case 0) or irretrievably failed (case -1). > >> >> (7) Assume that we have a short read first, but then complete >> REPLY.RECV_REPLY successfully, and move to >> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY. > ... >> >> *HOWEVER*! :) If we do *not* see some short-but-together-long-enough >> reads first, but see a full read at once, then our "pre-check" in >> RECV_REPLY does not get activated at all! And then all the conditions in >> CHECK_SIMPLE_OR_STRUCTURED_REPLY remain necessary. >> >> So this is my long-winded way to ask that: >> >>> case 0: SET_NEXT_STATE (%CHECK_SIMPLE_OR_STRUCTURED_REPLY); >>> } >>> return 0; >> >> you please consider adding a comment here, saying that, while the short >> read special case *does* short-circuit a number of checks under >> CHECK_SIMPLE_OR_STRUCTURED_REPLY, all the checks in >> CHECK_SIMPLE_OR_STRUCTURED_REPLY remain required if under RECV_REPLY we >> get a full read immediately. > > Good idea. Added for v4. > _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs