Re: [PATCH 1/1] qemu-nbd: fix regression with qemu-nbd --fork run over ssh

Eric Blake Fri, 14 Jul 2023 08:36:04 -0700

On Thu, Jul 06, 2023 at 09:15:45PM +0200, Denis V. Lunev wrote:
> Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
>     Author: Hanna Reitz <hre...@redhat.com>
>     Date:   Wed May 8 23:18:18 2019 +0200
>     qemu-nbd: Do not close stderr
> has introduced an interesting regression. Original behavior of
>     ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
> was the following:
>  * qemu-nbd was started as a daemon
>  * the command execution is done and ssh exited with success
> 
> The patch has changed this behavior and 'ssh' command now hangs forever.
> 
> According to the normal specification of the daemon() call, we should
> endup with STDERR pointing to /dev/null. That should be done at the
> very end of the successful startup sequence when the pipe to the
> bootstrap process (used for diagnostics) is no longer needed.
> 
> This could be achived in the same way as done for 'qemu-nbd -c' case.


That was commit 0eaf453e, also fixing up e6df58a5.

> STDOUT copying to STDERR does the trick.

It took me a while to see how this worked (the double-negative of
passing 0 to the 'noclose' parameter of daemon() doesn't make it easy
to track what is supposed to happen), but I agree that our goal is
that after daemon()izing, we temporarily set stderr to the inherited
pipe for passing back any startup error messages, then usually want to
restore it to /dev/null for the remainder of the process.

> 
> This also leads to proper 'ssh' connection closing which fixes my
> original problem.
> 
> Signed-off-by: Denis V. Lunev <d...@openvz.org>
> CC: Eric Blake <ebl...@redhat.com>
> CC: Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru>
> CC: Hanna Reitz <hre...@redhat.com>
> ---
>  qemu-nbd.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)

I indeed see how this fixes a regression under 'fork_process'.
However, the code that calls qemu_daemon() is also reachable under the
condition 'device && !verbose'.  Does it make sense for either:

qemu-nbd -v -c /dev/nbd0
qemu-nbd -f -v -c /dev/nbd0

as a way to connect to the kernel device, but explicitly ask to remain
verbose, as a way to debug issues in connecting to the kernel via
stderr?

Going back to the emails at the time of Hanna's original commit...

https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01872.html

I don't see any consideration about that case; capturing the original
stderr was done to fix what looked like an easy bug fix to a botched
implementation of old_stderr in commit ffb31e1d without considering
the ramifications.

But seeing as how pre-existing code DID want at least 'qemu-nbd -v -c
/dev/nbd0' to log indefinitely, I think we need to squash in:

diff --git i/qemu-nbd.c w/qemu-nbd.c
index 4276163564b..4d037798b9b 100644
--- i/qemu-nbd.c
+++ w/qemu-nbd.c
@@ -313,7 +313,7 @@ static void *nbd_client_thread(void *arg)
     /* update partition table */
     pthread_create(&show_parts_thread, NULL, show_parts, device);

-    if (verbose) {
+    if (verbose && !fork_process) {
         fprintf(stderr, "NBD device %s is now connected to %s\n",
                 device, srcpath);
     } else {


With that tweak, I'm fine with adding:

Reviewed-by: Eric Blake <ebl...@redhat.com>

Do you agree with my tweak?  If so, I can queue this through my NBD
tree without needing to see a v2 post.  However...

> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 4276163564..e9e118dfdb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -575,7 +575,6 @@ int main(int argc, char **argv)
>      bool writethrough = false; /* Client will flush as needed. */
>      bool fork_process = false;
>      bool list = false;
> -    int old_stderr = -1;
>      unsigned socket_activation;
>      const char *pid_file_name = NULL;
>      const char *selinux_label = NULL;
> @@ -930,11 +929,6 @@ int main(int argc, char **argv)
>          } else if (pid == 0) {
>              close(stderr_fd[0]);
>  
> -            /* Remember parent's stderr if we will be restoring it. */
> -            if (fork_process) {
> -                old_stderr = dup(STDERR_FILENO);
> -            }
> -
>              ret = qemu_daemon(1, 0);
>  
>              /* Temporarily redirect stderr to the parent's pipe...  */
> @@ -1152,8 +1146,7 @@ int main(int argc, char **argv)

Pre-existing, but your patch made me notice nearby (minor corner-case)
bugs:

            ret = qemu_daemon(1, 0);

            /* Temporarily redirect stderr to the parent's pipe...  */
            dup2(stderr_fd[1], STDERR_FILENO);

We aren't checking for dup2() failure.  But even if it succeeds (which
we expect), it could have modified the contents of errno compared to
what they were after qemu_daemon()...

            if (ret < 0) {
                error_report("Failed to daemonize: %s", strerror(errno));

...so this could be reporting garbage.  I wouldn't mind additional
patches to make the code more robust against unlikely failure
scenarios.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/1] qemu-nbd: fix regression with qemu-nbd --fork run over ssh

Reply via email to