On Fri, Feb 24, 2023 at 06:41:49AM +0100, Laszlo Ersek wrote:
> On 2/23/23 21:40, Eric Blake wrote:
> > libnbd's copy/copy-nbd-error.sh was triggering an assertion failure
> > in nbdkit:
> >
> > $ nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
> > ...
> > nbdkit: pattern.2: debug: error-inject: pread count=262144 offset=4718592
> > nbdkit: pattern.2: debug: pattern: pread count=262144 offset=4718592
> > nbdkit: pattern.1: debug: error-inject: pread count=262144 offset=4456448
> > nbdkit: pattern.1: error: injecting EIO error into pread
> > nbdkit: pattern.1: debug: sending error reply: Input/output error
> > nbdkit: pattern.3: debug: error-inject: pread count=262144 offset=4980736
> > nbdkit: pattern.3: debug: pattern: pread count=262144 offset=4980736
> > nbdkit: pattern.4: error: write data: NBD_CMD_READ: Broken pipe
> > nbdkit: pattern.4: debug: exiting worker thread pattern.4
> > nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.
> >
> > When there are multiple queued requests, and the client hangs up
> > abruptly as soon as the first error response is seen (rather than
> > waiting until all other queued responses are flushed then sending
> > NBD_CMD_DISC), another thread that has a queued response ready to
> > send sees EPIPE (the client is no longer reading) and closes the
> > socket, then the third thread triggers the assertion failure because
> > it is not expecting to attempt a send to a closed socket.  Thus, my
> > claim that sock would always be non-negative after collecting data
> > from the plugin was wrong.  The fix is to gracefully handle an abrupt
> > client disconnect, by not attempting to send on an already-closed
> > socket.
> >
> > For the nbdcopy example, --exit-with-parent means it's just an
> > annoyance (nbdcopy has already exited, and no further client will be
> > connecting); but for a longer-running nbdkit server accepting
> > parallel clients, it means any one client can trigger the SIGABRT by
> > intentionally queueing multiple NBD_CMD_READ then disconnecting
> > early, and thereby kill nbdkit and starve other clients.  Whether it
> > rises to the level of a CVE depends on whether you consider one
> > client being able to starve others a privilege escalation (if you
> > are not using TLS, there are other ways for a bad client to starve
> > peers; if you are using TLS, then the starved client has the same
> > credentials as the client that caused the SIGABRT, so there is no
> > privilege boundary escalation).
> >
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2173054
> > Fixes: daef505e ("server: Give client EOF when we are done writing", v1.32.4)
> > ---
> >  server/connections.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/server/connections.c b/server/connections.c
> > index 4d776f2a..1b63eb73 100644
> > --- a/server/connections.c
> > +++ b/server/connections.c
> > @@ -1,5 +1,5 @@
> >  /* nbdkit
> > - * Copyright (C) 2013-2022 Red Hat Inc.
> > + * Copyright (C) 2013-2023 Red Hat Inc.
> >   *
> >   * Redistribution and use in source and binary forms, with or without
> >   * modification, are permitted provided that the following conditions are
> > @@ -400,7 +400,10 @@ raw_send_socket (const void *vbuf, size_t len, int flags)
> >    ssize_t r;
> >    int f = 0;
> >
> > -  assert (sock >= 0);
> > +  if (sock < 0) {
> > +    errno = EBADF;
> > +    return -1;
> > +  }
> >  #ifdef MSG_MORE
> >    if (flags & SEND_MORE)
> >      f |= MSG_MORE;
>
> Is it possible that, after the second thread sees EPIPE and closes the
> socket, a new client connects, the original file descriptor is
> (effectively) repurposed, and the third thread now attempts to send on
> a valid, but *wrong* socket?
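
[A stand-alone illustration, not nbdkit code, of the descriptor reuse
being asked about: POSIX hands out the lowest free descriptor number,
so the next socket created after a close() typically gets the old
number back, and a thread still holding the old value would then be
writing to an unrelated socket.  The program below only sketches that
kernel-level behaviour.]

/* fd-reuse.c: compile with `cc fd-reuse.c` and run. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

int
main (void)
{
  /* Stands in for the descriptor stored in conn->sockout. */
  int old_fd = socket (AF_INET, SOCK_STREAM, 0);
  printf ("first socket:  fd %d\n", old_fd);

  /* What the EPIPE-handling thread effectively does when it closes the socket. */
  close (old_fd);

  /* Stands in for a new client connection accepted in another thread. */
  int new_fd = socket (AF_INET, SOCK_STREAM, 0);
  printf ("second socket: fd %d\n", new_fd);   /* usually the same number */

  /* A thread still holding old_fd would now be writing to new_fd's socket. */
  close (new_fd);
  return 0;
}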
Good question.  raw_send_socket() is never invoked directly, but rather
used through the function pointer conn->send() (that's how we switch
between raw_send_socket() and crypto_send() when enabling TLS).  More
importantly, note how the function starts:

static int
raw_send_socket (const void *vbuf, size_t len, int flags)
{
  GET_CONN;
  int sock = conn->sockout;

and looking at the definition of GET_CONN, conn is a non-NULL pointer in
thread-local storage.

You are indeed correct that once we close our socket, another client can
be connecting in parallel in a different thread and open its own fd on
the number we had previously been storing in conn->sockout.  So now we
have to check whether all assignments and uses of conn->sockout are
thread-safe.

conn->sockout is directly referenced when protocol-handshake-newstyle.c
calls crypto_negotiate_tls() - but at that point, we know the socket is
still open.  All other references to conn->sockout are in connections.c.
new_connection() assigns it as part of creating the socket - at that
point, conn is thread-local, and we haven't used the fd yet, so it is
not invalid.  raw_close() (reached by conn->close()) assigns -1 to
conn->sockout after closing it on SHUT_WR, but leaves the variable
unchanged after closing it on SHUT_RDWR.  So next we check the two
callers: conn->close(SHUT_RDWR) is only called from free_connection(),
which is only reached after handle_single_connection() has joined all
worker threads for conn, so there is no other thread that can attempt to
read the value that we left stale.  Meanwhile, conn->close(SHUT_WR) is
only called from connection_set_status(), which is guarded by the mutex
conn->status_lock if more than one thread can be using the connection.
We'll use that knowledge below.

The only other access of conn->sockout is when raw_send_socket() reads
its value to learn where to attempt the send, and here we have to widen
our search to observe that conn->send() is reached in multiple places;
however, anything related to handshaking is done while conn is still
single-threaded (handle_single_connection() calls protocol_handshake()
before spawning worker threads), so the only time we can have more than
one thread using conn->send() simultaneously is during
protocol_recv_request_send_reply().  All such uses of conn->send() from
that function are guarded by the mutex conn->write_lock.

But the question you asked is whether there can ever be a race where:

thread 1                            thread 2                                   thread 3
protocol_recv_request_send_reply()  protocol_recv_request_send_reply()
- read from socket
- call into plugin
- connection_get_status():
  - grab conn->status_lock
  - observe live socket (aka
    conn->sockout still valid)
  - release status_lock
..                                  - grab conn->read_lock
..                                  - conn->recv() detects failure
..                                  - call connection_set_status(STATUS_DEAD)
..                                    - grab conn->status_lock
..                                    - call conn->close(SHUT_WR)
..                                      - call closesocket()                  > window where another
- call send_*() to send reply       ..                                        > client can connect,
  - grab conn->write_lock           ..                                        > thereby claiming the
  - call conn->send()               ..                                        > fd still present in
    - sock=conn->sockout            ..                                        > thread 1's conn->sockout
..                                    - assign conn->sockout=-1
- write to sock

Eww.  It looks like your hunch was right.  I don't see adequate
protection to prevent thread 1 from attempting to use a stale
conn->sockout value, and it IS possible that another client connecting
at the same time can have re-allocated that fd.
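
[A self-contained sketch of the locking discipline discussed below,
using invented names (demo_conn, demo_send, demo_close_write_side)
rather than nbdkit's real structures - an illustration of the idea, not
the eventual v2 patch: the reader of the descriptor and the code path
that invalidates it synchronize on one mutex, so no thread can act on a
descriptor number after the close path has returned it to the kernel.]

/* sock-guard.c: compile with `cc sock-guard.c -pthread`. */
#include <errno.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct demo_conn {                /* hypothetical stand-in for the connection */
  pthread_mutex_t lock;           /* plays the role of conn->status_lock */
  int sockout;                    /* -1 once the write side has been closed */
};

/* Send wrapper: only touches sockout while holding the lock, so it can
 * never use a descriptor number that the close path has already freed. */
static ssize_t
demo_send (struct demo_conn *c, const void *buf, size_t len)
{
  ssize_t r;

  pthread_mutex_lock (&c->lock);
  if (c->sockout < 0) {           /* already closed: fail without touching any fd */
    pthread_mutex_unlock (&c->lock);
    errno = EBADF;
    return -1;
  }
  r = send (c->sockout, buf, len, 0);
  pthread_mutex_unlock (&c->lock);
  return r;
}

/* Close wrapper: invalidates sockout under the same lock *before* close(),
 * so a descriptor reused by a new connection cannot be picked up here. */
static void
demo_close_write_side (struct demo_conn *c)
{
  pthread_mutex_lock (&c->lock);
  if (c->sockout >= 0) {
    int fd = c->sockout;
    c->sockout = -1;
    close (fd);
  }
  pthread_mutex_unlock (&c->lock);
}

int
main (void)
{
  struct demo_conn c = { PTHREAD_MUTEX_INITIALIZER, -1 };

  c.sockout = socket (AF_INET, SOCK_STREAM, 0);
  demo_send (&c, "x", 1);         /* fails (socket not connected), but safely */
  demo_close_write_side (&c);
  demo_send (&c, "x", 1);         /* now fails with EBADF; no fd is touched */
  return 0;
}

[Holding the lock across send() is deliberately heavy-handed - it simply
makes the happens-before ordering explicit; nbdkit already serializes
senders with conn->write_lock, so the eventual fix may look different.]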
> I don't know if "multiple clients for a single nbdkit server" is a
> thing, but I believe that at least "multiconn" could lead to the same
> phenomenon.

multi-conn is precisely multiple clients for a single nbdkit server!
But yes, it is also possible for a single server to serve multiple
unrelated clients even without multi-conn (perhaps even serving
different data to parallel clients, such as the file plugin in dir
mode, where the export name requested by the client determines which
file is served over that connection).

> If this problem exists, then I guess the socket abstraction (?) would
> have to be extended with a refcount, under the protection of a mutex.
> Whenever sending fails (and note that EPIPE is a persistent error, so
> once one thread sees EPIPE, the others will keep seeing that too!),
> the refcount is decreased, and the one thread that reaches refcount 0
> closes the socket.
>
> This is just brainstorming, could be totally disconnected from
> nbdkit's current behavior.

No, it is spot on.  And it means I need to do a v2 of this patch, which
properly protects conn->sockout from stale reuse by a second thread.  I
don't know whether ref-counting is easiest, or just ensuring that the
assignment to conn->sockout always has a happens-before relationship
with any other thread's use of conn->sockout.  conn->read_lock and
conn->write_lock are meant to be held for potentially long periods of
time (the duration for which one direction of the socket is in
exclusive use by one half of a message, which may require several
network packets), but conn->status_lock is intended to be held for as
short a time as possible.  Right now it only protects reads and writes
of conn->status, but extending it to also cover reads and writes of
conn->sockin/sockout may be sufficient.  (It may be possible to use
atomics instead of a mutex, except that I'm not sure what code base to
copy from - nbdkit's license means I can't copy from Paolo's nice
atomics framework in qemu.)

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs