> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Tuesday, June 23, 2020 4:56 PM
> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>
> Hi Matan,
>
> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >
> > From: Maxime Coquelin:
> >> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>> Cc: dev@dpdk.org
> >>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>
> >>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Maxime
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>> The issue is that if you only check the ready state before and
> >>>>>>>>>>>> after the message affecting the ring is handled, it can be
> >>>>>>>>>>>> ready at both stages, while the rings have changed and the
> >>>>>>>>>>>> state change callback should have been called.
> >>>>>>>>>>> But in this version I checked twice, before the message handler
> >>>>>>>>>>> and after the message handler, so it should catch any update.
> >>>>>>>>>>
> >>>>>>>>>> No, this is not enough; we also have to check during some
> >>>>>>>>>> handlers, so that the ready state is invalidated, because
> >>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>> handler but with different values.
> >>>>>>>>>>
> >>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>
> >>>>>>>>>> ...
> >>>>>>>>>>
> >>>>>>>>>> 	if (vq->kickfd >= 0)
> >>>>>>>>>> 		close(vq->kickfd);
> >>>>>>>>>> +
> >>>>>>>>>> +	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>> +
> >>>>>>>>>> +	vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>> +
> >>>>>>>>>> 	vq->kickfd = file.fd;
> >>>>>>>>>>
> >>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>> after the kickfd change and the driver won't be notified.
> >>>>>>>>>
> >>>>>>>>> The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE
> >>>>>>>>> message according to v1.
> >>>>>>>>>
> >>>>>>>>> One of the assumptions we agreed on in the design mail is that it
> >>>>>>>>> doesn't make sense for QEMU to change the queue configuration
> >>>>>>>>> without enabling the queue again.
> >>>>>>>>> Because of that, we decided to force calling the state callback
> >>>>>>>>> again when QEMU sends a VHOST_USER_SET_VRING_ENABLE(1) message,
> >>>>>>>>> even if the queue is already ready.
> >>>>>>>>> So when the driver/app sees state enable->enable, it should take
> >>>>>>>>> into account that the queue configuration was probably changed.
> >>>>>>>>>
> >>>>>>>>> I think that this assumption is correct according to the QEMU code.
> >>>>>>>>
> >>>>>>>> Yes, this was our initial assumption.
> >>>>>>>> But now, looking into the details of the implementation, I find it
> >>>>>>>> even cleaner & clearer not to make this assumption.
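[Editor's note] The invalidation pattern in the example patch above can be modeled as a tiny standalone C sketch. All names here (`update_vring_state`, the `vring` struct, the sentinel value) are illustrative stand-ins, not the actual DPDK vhost API:

```c
#include <stdbool.h>

/* Illustrative stand-in for VIRTIO_UNINITIALIZED_EVENTFD. */
#define UNINITIALIZED_EVENTFD (-2)

struct vring {
	int kickfd;
	int callfd;
	bool enabled;
	bool ready;		/* last ready state notified to the driver */
	int notifications;	/* number of state-change callbacks fired */
};

/* A ring is considered ready when enabled and both eventfds are valid. */
static bool vring_is_ready(const struct vring *vq)
{
	return vq->enabled && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Stand-in for vhost_user_update_vring_state(): notify the driver only
 * when the computed ready state differs from the last notified one. */
static void update_vring_state(struct vring *vq)
{
	bool ready = vring_is_ready(vq);

	if (ready != vq->ready) {
		vq->ready = ready;
		vq->notifications++;
	}
}

/* Kick FD change following the example patch: invalidate the old FD and
 * notify (ready -> not ready), then install the new FD and notify again
 * (not ready -> ready). Without the invalidation step, the ring would
 * look ready both before and after, and no callback would fire. */
static void set_vring_kick(struct vring *vq, int new_fd)
{
	vq->kickfd = UNINITIALIZED_EVENTFD;
	update_vring_state(vq);

	vq->kickfd = new_fd;
	update_vring_state(vq);
}
```

With a ring that is already ready, `set_vring_kick()` produces exactly two notifications — one for the invalidation and one once the new FD is in place — matching the point that a disable/enable pair from the master would also produce two.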
> >>>>>>>>
> >>>>>>>>> That's why I prefer to collect all the ready check callbacks
> >>>>>>>>> (queue state and device new\conf) into one function that will be
> >>>>>>>>> called after the message handler:
> >>>>>>>>> Pseudo:
> >>>>>>>>> vhost_user_update_ready_statuses() {
> >>>>>>>>> 	switch (msg):
> >>>>>>>>> 	case enable:
> >>>>>>>>> 		if (enable is 1)
> >>>>>>>>> 			force queue state = 1.
> >>>>>>>>> 	case callfd
> >>>>>>>>> 	case kickfd
> >>>>>>>>> 	.....
> >>>>>>>>> 		check queue and device ready + call callbacks if needed..
> >>>>>>>>> 	default
> >>>>>>>>> 		return;
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> I find it more natural to "invalidate" the ready state where it is
> >>>>>>>> handled (after vring_invalidate(), before setting a new FD for
> >>>>>>>> call & kick, ...)
> >>>>>>>
> >>>>>>> I think that if you go in this direction, then if the first queue
> >>>>>>> pair is invalidated, you need to notify the app\driver also about
> >>>>>>> the device ready change.
> >>>>>>> Also, it will cause 2 notifications to the driver instead of one in
> >>>>>>> case of an FD change.
> >>>>>>
> >>>>>> You'll always end up with two notifications: either Qemu has sent
> >>>>>> the disable, and so you'll have one notification for the disable and
> >>>>>> one for the enable, or it didn't send the disable and the two will
> >>>>>> happen at old-value invalidation time and after the new value is
> >>>>>> taken into account.
> >>>>>
> >>>>> I don't see it in the current QEMU behavior.
> >>>>> When working with MQ, I see that some virtqs get configuration
> >>>>> messages while they are in enabled state.
> >>>>> Then, an enable message is sent again later.
> >>>>
> >>>> I guess you mean the first queue pair? And it would not be in ready
> >>>> state, as it would be the initial configuration of the queue?
> >>>
> >>> Even after initialization, when the queue is ready.
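[Editor's note] For concreteness, the dispatcher pseudo-code quoted above can be rendered as a compilable C sketch. The message names, the `queue` struct, and the notify helper are all hypothetical; the point is the single post-handler update, including the forced callback on enable->enable:

```c
#include <stdbool.h>

enum vhost_msg_type {
	MSG_SET_VRING_ENABLE,
	MSG_SET_VRING_CALL,
	MSG_SET_VRING_KICK,
	MSG_OTHER,
};

struct queue {
	bool enabled;
	int kickfd;
	int callfd;
	bool ready;		/* last state reported to the driver */
	int state_calls;	/* state callback invocations */
};

static bool queue_is_ready(const struct queue *q)
{
	return q->enabled && q->kickfd >= 0 && q->callfd >= 0;
}

/* Hypothetical state callback: record the new state. */
static void notify_state(struct queue *q, bool ready)
{
	q->ready = ready;
	q->state_calls++;
}

/* One function, called once after the message handler, deciding whether
 * the ready-state callback must fire. */
static void update_ready_statuses(struct queue *q,
				  enum vhost_msg_type msg, int enable)
{
	bool ready;

	switch (msg) {
	case MSG_SET_VRING_ENABLE:
		if (enable == 1) {
			/* Force the callback even if already ready:
			 * enable -> enable means the configuration may
			 * have changed under the discussed assumption. */
			q->enabled = true;
			notify_state(q, true);
			return;
		}
		q->enabled = false;
		/* fall through to the generic check */
	case MSG_SET_VRING_CALL:
	case MSG_SET_VRING_KICK:
		ready = queue_is_ready(q);
		if (ready != q->ready)
			notify_state(q, ready);
		return;
	default:
		return;
	}
}
```

Note how a callfd/kickfd message on an already-ready queue fires no callback here, which is exactly the behavior the two sides are debating.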
> >>>
> >>>>>
> >>>>>>> Why not take this correct assumption and update the ready state
> >>>>>>> only at one point in the code instead of doing it in all the
> >>>>>>> configuration handlers around?
> >>>>>>> IMO, it is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>
> >>>>>> I just looked closer at the Vhost-user spec, and I'm no longer so
> >>>>>> sure this is a correct assumption:
> >>>>>>
> >>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>> client must support changing some configuration aspects on the fly."
> >>>>>
> >>>>> Ok, but this doesn't explain how the configuration is changed on the fly.
> >>>>
> >>>> I agree it lacks a bit of clarity.
> >>>>
> >>>>> As I mentioned, QEMU always sends the enable message after a
> >>>>> configuration message.
> >>>>
> >>>> Yes, but we should not make assumptions about the current Qemu version
> >>>> when possible. Better to be safe and follow the specification; it will
> >>>> be more robust.
> >>>> There is also the Virtio-user PMD to take into account, for example.
> >>>
> >>> I understand your point here, but do you really want to be ready for
> >>> any configuration update at run time?
> >>> What does it mean? How should the datapath handle configuration from
> >>> the control thread at run time while traffic is on?
> >>> For example, changing queue sizes \ addresses must stop traffic before...
> >>> Also, changing FDs is very sensitive.
> >>>
> >>> It doesn't make sense to me.
> >>>
> >>> Also, according to the "on the fly" direction, we should not disable
> >>> the queue unless an enable message comes to disable it.
> >
> > No response, so it looks like you agree that it doesn't make sense.
>
> No, my reply was general to all your comments.
>
> With a SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring, because we protect it with a lock, so we
> are sure the ring won't be accessed by another thread while doing the
> change.
> For the vDPA case that's more problematic, because we have no such
> locking mechanism.
>
> For example, with memory hotplug, Qemu does not seem to disable the
> queues, so we need to stop the vDPA device one way or another so that it
> does not process the rings while the Vhost lib remaps the memory areas.
>
> >>> In addition:
> >>> Do you really want to toggle vDPA drivers\apps for any configuration
> >>> message? It may cause queue recreation for each one (at least for mlx5).
> >>
> >> I want to have something robust and maintainable.
> >
> > Me too.
> >
> >> These messages arriving after a queue has been configured once are rare
> >> events, but this is usually the kind of thing that causes maintenance
> >> burden.
> >
> > In the case of guest poll mode (testpmd virtio) we get callfd twice all
> > the time.
>
> Right.
>
> >> If you look at my example patch, you will understand that with my
> >> proposal, there won't be any more state change notifications than with
> >> your proposal when Qemu or any other Vhost-user master sends a disable
> >> request before sending the request that impacts the queue state.
> >
> > We didn't talk about disable time - this one is very simple.
> >
> > Yes, in case the queue is disabled, your proposal doesn't send an extra
> > notification, same as mine.
> > But in case the queue is ready, your proposal sends an extra not-ready
> > notification for kickfd, callfd and set_vring_base configurations.
>
> I think this is necessary for synchronization with the Vhost-user master
> (in case the master asks for this synchronization, like set_mem_table for
> instance when reply-ack is enabled).
>
> >> It just adds more robustness if this unlikely event happens, by
> >> invalidating the ring state to not ready before doing the actual ring
> >> configuration change.
> >> So that this config change is not missed by the vDPA driver or the
> >> application.
> >
> > One more issue here is that there is some time where the device is ready
> > (already configured) and the first virtq-pair is not ready (your
> > invalidate proposal for set_vring_base).
> >
> > It doesn't preserve the concept that the device is ready only in case
> > the first virtq-pair is ready.
>
> I understand the spec as "the device is ready as soon as the first queue
> pair is ready", but I might be wrong.
>
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the rings of the
> first queue pair receives a disable request or, with my patch, when one of
> the rings receives a request that changes the ring state?
I mean, your proposal actually may make the first virtq-pair ready state
disabled while the device is ready.
So, yes, it leads to calling device close\destroy.

> > I will not insist anymore on waiting for enable for notifying, although
> > I'm not a fan of it.
> >
> > So, I suggest to create 1 notification function to be called after the
> > message handler and before the reply.
> > This function is the only one which notifies ready states in the next
> > options:
> >
> > 1. virtq ready state is changed in the queue.
> > 2. virtq ready state stays on after a configuration message handler.
> > 3. device state will be enabled when the first queue pair is ready.
>
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with the memory hotplug
> case I mentioned earlier.

It will do enable again, which means - something was modified.

> Even for the callfd double change it can be problematic, as Vhost-lib
> will close the first one while it will still be used by the driver (Btw,
> I see my example patch is also buggy in this regard: it should reset the
> call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).

Yes, this one leads to different handling for each message.
Maybe it leads to a new queue modify operation.
So, the queue doesn't send the state - it just does the configuration
change on the fly.

What do you think?

> Thanks,
> Maxime
>
> > Matan
> >
> >> Maxime
> >
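[Editor's note] As an aside on the "device is ready as soon as the first queue pair is ready" definition debated above, a minimal sketch of that check, under the usual virtio convention that queue pair i maps to virtqueues 2*i and 2*i+1 (all names illustrative, not the actual vhost code):

```c
#include <stdbool.h>

struct vring {
	bool enabled;
	int kickfd;
	int callfd;
};

static bool vring_is_ready(const struct vring *vq)
{
	return vq->enabled && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Device-level readiness under the interpretation discussed above:
 * only the first queue pair (virtqueues 0 and 1) has to be ready,
 * whatever the state of the other queue pairs. */
static bool device_is_ready(const struct vring *vq, int nr_vring)
{
	if (nr_vring < 2)
		return false;
	return vring_is_ready(&vq[0]) && vring_is_ready(&vq[1]);
}
```

Invalidating one FD of virtqueue 0 or 1 during a message handler then makes the whole device transiently not ready, which is exactly the window Matan points out for the set_vring_base invalidation proposal.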