Re: [PATCH v2 5/5] vhost: add device started check in migration set log

Raphael Norwitz Wed, 06 May 2020 15:09:19 -0700

As you correctly point out, this code needs to be looked at more
carefully so that
if the device does disconnect in the background we can handle the migration path
gracefully. In particular, we need to decide whether a migration
should be allowed
to continue if a device disconnects durning the migration stage.


mst, any thoughts?

Have you looked at the suggestion I gave Li Feng to move vhost_dev_cleanup()
into the connection path in vhost-user-blk? I’m not sure if he’s
actively working on it,
but I would prefer if we can find a way to keep some state around
between reconnects
so we aren’t constantly checking dev->started. A device can be stopped
for reasons
other than backend disconnect so I’d rather not reuse this field to
check for backend
disconnect failures.

On Thu, Apr 30, 2020 at 9:57 AM Dima Stepanov <dimas...@yandex-team.ru> wrote:
>
> If vhost-user daemon is used as a backend for the vhost device, then we
> should consider a possibility of disconnect at any moment. If such
> disconnect happened in the vhost_migration_log() routine the vhost
> device structure will be clean up.
> At the start of the vhost_migration_log() function there is a check:
>   if (!dev->started) {
>       dev->log_enabled = enable;
>       return 0;
>   }
> To be consistent with this check add the same check after calling the
> vhost_dev_set_log() routine. This in general help not to break a

Could you point to the specific asserts which are being triggered?

> migration due the assert() message. But it looks like that this code
> should be revised to handle these errors more carefully.
>
> In case of vhost-user device backend the fail paths should consider the
> state of the device. In this case we should skip some function calls
> during rollback on the error paths, so not to get the NULL dereference
> errors.
>
> Signed-off-by: Dima Stepanov <dimas...@yandex-team.ru>
> ---
>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 3ee50c4..d5ab96d 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>  {
>      int r, i, idx;

A couple points here


(1) This will fail the live migration if the device is disconnected.
That my be the right thing
      to do, but if there are cases where migrations can proceed with
a disconnected device,
      this may not be desirable.

(2) This looks racy. As far as I can tell vhost_dev_set_log() is only
called by vhost_migration_log(),
      and as you say one of the first things vhost_migration_log does
is return if dev->started is not
      set. What’s to stop a disconnect from clearing the vdev right
after this check, just before
      vhost_dev_set_features() is called?

As stated above, I would prefer it if we could add some state which
would persist between
reconnects which could then be checked in the vhost-user code before
interacting with
the backend. I understand this will be a much more involved change and
will require a lot
of thought.

Also, regarding (1) above, if the original check in
vhost_migration_log() returns success if the
device is not started why return an error here? I imagine this could
lead to some inconsistent
behavior if the device disconnects before the first check verses
before the second.

> +
> +    if (!dev->started) {
> +        /*
> +         * If vhost-user daemon is used as a backend for the
> +         * device and the connection is broken, then the vhost_dev
> +         * structure will be reset all its values to 0.
> +         * Add additional check for the device state.
> +         */
> +        return -1;
> +    }
> +
>      r = vhost_dev_set_features(dev, enable_log);
>      if (r < 0) {
>          goto err_features;
> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, 
> bool enable_log)
>      }

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

Reply via email to