Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

Laszlo Ersek Fri, 14 Jul 2023 00:14:05 -0700

On 7/13/23 21:29, Eric Blake wrote:
> The documentation has claimed since commit 6f4dcdab that any
> completion callback will be called exactly once; but this is not
> consistent with the code: if nbd_aio_* itself returns an error, then
> nothing is queued and the user does not need to wait for a completion
> callback to know how the command failed.  We could tweak the generator
> to call completion.callback no matter what, but since the
> completion.free callback already serves that role, it's easier to fix
> the documentation to match reality.  After all, one only needs
> completion status if an aio command returned success (if it returned
> failure, we know that there is nothing that is going to complete
> later).
> 
> However, there was one place where we indeed fail to call
> completion.callback, even though the corresponding aio call returned
> success, which can strand a user that was depending on the callback to
> know that the pending aio command failed after all.  That's when a
> call to nbd_close() interrupts a connection while commands are in
> flight.  This problem appears to have been around even before commit
> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in
> the first place.
> 
> Beef up the closure-lifetimes unit test to more robustly check the
> various conditions guaranteed by the updated documentation, and to
> expose the previous skip of a completion callback during nbd_close.
> 
> In summary, the behavior we want (where sequence is important) is:
> 
> - aio command fails:
> mid-command .callback: 0 calls
> completion .callback: 0 calls
> mid-command .free: 1 call
> completion .free: 1 call
> 
> - aio command succeeds:
> mid-command .callback: 0, 1, or multiple calls
> completion .callback: 1 call
> mid-command .free: 1 call
> completion .free: 1 call
> 
> Reported-by: Tage Johansson <tage.j.li...@posteo.net>
> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors", v1.11.8)
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
>  docs/libnbd.pod           | 26 ++++++++++++++++----------
>  lib/handle.c              |  3 +++
>  tests/closure-lifetimes.c | 14 ++++++++++++++
>  3 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/docs/libnbd.pod b/docs/libnbd.pod
> index 72f74053..433479e6 100644
> --- a/docs/libnbd.pod
> +++ b/docs/libnbd.pod
> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
>  =head2 Completion callbacks
> 
>  All of the asychronous commands have an optional completion callback
> -function that is used right before the command is marked complete,
> -after any mid-command callbacks have finished, and before any free
> -functions.
> +function that is used if the asynchronous command succeeded, right
> +before the command is marked complete, after any mid-command callbacks
> +have finished, and before any free functions.  The completion callback
> +is not reached if the asynchronous command itself fails, while free
> +functions are reached regardless of the initial result of the
> +asynchronous command.
> 
>  When the completion callback returns C<1>, the command is
>  automatically retired (there is no need to call


I agree with this approach (i.e., with adapting the documentation to
reality), but I find the language somewhat confusing. We have three terms:

- "asynchronous command succeeds"
- "command is marked complete"
- "command is retired"

The last two are mostly interchangeable in my view, and are also *not*
confusing in the documentation. But the first term is confusing, IMO; it
can easily be mistaken for meanings #2/#3. What we mean by #1 instead is
the successful queueing (or submission) of the command. The text below
does say "queued", but the text above doesn't. So I'd suggest replacing
term#1 above with "call to asynchronous API succeeds" or "asynchronous
command is successfully submitted" or something like that.

(Side comment: I'd kinda prefer an all-or-nothing approach for async
APIs. If the API fails at once, it should not take ownership of
anything; i.e., it shouldn't call either completion or free callbacks.
And if it succeeds, then it should take complete ownership. I'm not
suggesting to rework callbacks whole-sale for this, though.)

With the language clarified a bit:

Acked-by: Laszlo Ersek <ler...@redhat.com>

Thanks
Laszlo



> @@ -946,13 +949,16 @@ mid-command callback may be reached more times than 
> expected if the
>  server is non-compliant.
> 
>  On the other hand, if a completion callback is supplied (only possible
> -with asynchronous commands), it will always be reached exactly once,
> -and the completion callback must not ignore the value pointed to by
> -C<error>.  In particular, the content of a buffer passed to
> -L<nbd_aio_pread(3)> or L<nbd_aio_pread_structured(3)> is undefined
> -if C<*error> is non-zero on entry to the completion callback.  It is
> -recommended that if you choose to use automatic command retirement
> -(where the completion callback returns C<1> to avoid needing to call
> +with asynchronous commands), it will not be called if the initial API
> +call fails (such as attempting an asynchronous command in the wrong
> +state - there is nothing to be completed since the command was not
> +queued), but will otherwise be called exactly once, and the completion
> +callback must not ignore the value pointed to by C<error>.  In
> +particular, the content of a buffer passed to L<nbd_aio_pread(3)> or
> +L<nbd_aio_pread_structured(3)> is undefined if C<*error> is non-zero
> +on entry to the completion callback.  It is recommended that if you
> +choose to use automatic command retirement (where the completion
> +callback returns C<1> to avoid needing to call
>  L<nbd_aio_command_completed(3)> later), your completion function
>  should return C<1> on all control paths, even when handling errors
>  (note that with automatic retirement, assigning into C<error> is
> diff --git a/lib/handle.c b/lib/handle.c
> index 0f11bee5..1d66d585 100644
> --- a/lib/handle.c
> +++ b/lib/handle.c
> @@ -37,7 +37,10 @@ free_cmd_list (struct command *list)
>    struct command *cmd, *cmd_next;
> 
>    for (cmd = list; cmd != NULL; cmd = cmd_next) {
> +    int error = cmd->error ? : ENOTCONN;
> +
>      cmd_next = cmd->next;
> +    CALL_CALLBACK (cmd->cb.completion, &error);
>      nbd_internal_retire_and_free_command (cmd);
>    }
>  }
> diff --git a/tests/closure-lifetimes.c b/tests/closure-lifetimes.c
> index 9bb4e120..23691631 100644
> --- a/tests/closure-lifetimes.c
> +++ b/tests/closure-lifetimes.c
> @@ -66,6 +66,7 @@ read_cb (void *opaque,
>           uint64_t offset, unsigned status, int *error)
>  {
>    assert (!read_cb_freed);
> +  assert (!completion_cb_called);
>    read_cb_called++;
>    return 0;
>  }
> @@ -73,6 +74,7 @@ read_cb (void *opaque,
>  static void
>  read_cb_free (void *opaque)
>  {
> +  assert (!completion_cb_freed);
>    read_cb_freed++;
>  }
> 
> @@ -81,6 +83,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
> offset,
>                   uint32_t *entries, size_t nr_entries, int *error)
>  {
>    assert (!block_status_cb_freed);
> +  assert (!completion_cb_called);
>    block_status_cb_called++;
>    return 0;
>  }
> @@ -88,6 +91,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
> offset,
>  static void
>  block_status_cb_free (void *opaque)
>  {
> +  assert (!completion_cb_freed);
>    block_status_cb_freed++;
>  }
> 
> @@ -150,6 +154,10 @@ main (int argc, char *argv[])
>    cookie = nbd_aio_pread_structured (nbd, buf, sizeof buf, 0, chunk_callback,
>                                       completion_callback, 0);
>    if (cookie == -1) NBD_ERROR;
> +  /* read_cb_called is indeterminate at this point, as state machine
> +   * progress may vary based on task schduling and network speed factors.
> +   */
> +  assert (completion_cb_called == 0);
>    assert (read_cb_freed == 0);
>    assert (completion_cb_freed == 0);
>    while (!nbd_aio_command_completed (nbd, cookie)) {
> @@ -179,6 +187,8 @@ main (int argc, char *argv[])
>    nbd_kill_subprocess (nbd, 0);
>    nbd_close (nbd);
> 
> +  /* read_cb_called is indeterminate based on timing of kill. */
> +  assert (completion_cb_called == 1);
>    assert (read_cb_freed == 1);
>    assert (completion_cb_freed == 1);
> 
> @@ -198,6 +208,8 @@ main (int argc, char *argv[])
>      fprintf (stderr, "%s: Expecting block_status failure\n", argv[0]);
>      exit (EXIT_FAILURE);
>    }
> +  assert (block_status_cb_called == 0);
> +  assert (completion_cb_called == 0);
>    assert (block_status_cb_freed == 1);
>    assert (completion_cb_freed == 1);
> 
> @@ -212,6 +224,8 @@ main (int argc, char *argv[])
>      fprintf (stderr, "%s: Expecting block_status failure\n", argv[0]);
>      exit (EXIT_FAILURE);
>    }
> +  assert (block_status_cb_called == 0);
> +  assert (completion_cb_called == 0);
>    assert (block_status_cb_freed == 1);
>    assert (completion_cb_freed == 1);
> 

_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

Reply via email to