Hi Maxime, good morning.
From: Maxime Coquelin:
> On 6/23/20 4:52 PM, Matan Azrad wrote:
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Tuesday, June 23, 2020 4:56 PM
> >> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>
> >> Hi Matan,
> >>
> >> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin:
> >>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Maxime
> >>>>>>>>>>>
> >>>>>>>>>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>>>> The issue is if you only check the ready state before and
> >>>>>>>>>>>>>> after the message affecting the ring is handled: it can be
> >>>>>>>>>>>>>> ready at both stages while the rings have changed, and the
> >>>>>>>>>>>>>> state change callback should have been called.
> >>>>>>>>>>>>> But in this version I checked twice, before the message
> >>>>>>>>>>>>> handler and after the message handler, so it should catch
> >>>>>>>>>>>>> any update.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>>>> handlers, so that the ready state is invalidated, because
> >>>>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>>>> handler but with different values.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net **pdev,
> >>>>>>>>>>>> 		struct VhostUserMsg *msg,
> >>>>>>>>>>>>
> >>>>>>>>>>>> ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> 	if (vq->kickfd >= 0)
> >>>>>>>>>>>> 		close(vq->kickfd);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> 	vq->kickfd = file.fd;
> >>>>>>>>>>>>
> >>>>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>>>>>
> >>>>>>>>>>> The driver will be notified in the next
> >>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE message according to v1.
> >>>>>>>>>>>
> >>>>>>>>>>> One of the assumptions we agreed on in the design mail is
> >>>>>>>>>>> that it doesn't make sense that QEMU will change the queue
> >>>>>>>>>>> configuration without enabling the queue again.
> >>>>>>>>>>> Because of that we decided to force calling the state
> >>>>>>>>>>> callback again when QEMU sends the
> >>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE(1) message, even if the queue
> >>>>>>>>>>> is already ready.
> >>>>>>>>>>> So when the driver/app sees state enable->enable, it should
> >>>>>>>>>>> take into account that the queue configuration was probably
> >>>>>>>>>>> changed.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that this assumption is correct according to the
> >>>>>>>>>>> QEMU code.
> >>>>>>>>>>
> >>>>>>>>>> Yes, this was our initial assumption.
> >>>>>>>>>> But now looking into the details of the implementation, I
> >>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
> >>>>>>>>>>
> >>>>>>>>>>> That's why I prefer to collect all the ready-check
> >>>>>>>>>>> callbacks (queue state and device new\conf) into one
> >>>>>>>>>>> function that will be called after the message handler:
> >>>>>>>>>>> Pseudo:
> >>>>>>>>>>> vhost_user_update_ready_statuses() {
> >>>>>>>>>>> 	switch (msg):
> >>>>>>>>>>> 	case enable:
> >>>>>>>>>>> 		if (enable is 1)
> >>>>>>>>>>> 			force queue state = 1
> >>>>>>>>>>> 	case callfd:
> >>>>>>>>>>> 	case kickfd:
> >>>>>>>>>>> 	.....
> >>>>>>>>>>> 		check queue and device ready + call callbacks if needed
> >>>>>>>>>>> 	default:
> >>>>>>>>>>> 		return;
> >>>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> I find it more natural to "invalidate" the ready state where
> >>>>>>>>>> it is handled (after vring_invalidate(), before setting a
> >>>>>>>>>> new FD for call & kick, ...)
> >>>>>>>>>
> >>>>>>>>> I think that if you go in this direction, when the first
> >>>>>>>>> queue pair is invalidated you need to notify the app\driver
> >>>>>>>>> also about the device ready change.
> >>>>>>>>> Also it will cause 2 notifications to the driver instead of
> >>>>>>>>> one in case of an FD change.
> >>>>>>>>
> >>>>>>>> You'll always end up with two notifications: either Qemu has
> >>>>>>>> sent the disable, and so you'll have one notification for the
> >>>>>>>> disable and one for the enable, or it didn't send the disable,
> >>>>>>>> and it will happen at old-value invalidation time and after
> >>>>>>>> the new value is taken into account.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I don't see it in the current QEMU behavior.
> >>>>>>> When working with MQ, I see that some virtqs get configuration
> >>>>>>> messages while they are in enabled state.
> >>>>>>> Then, an enable message is sent again later.
> >>>>>>
> >>>>>> I guess you mean the first queue pair? And it would not be in
> >>>>>> ready state as it would be the initial configuration of the
> >>>>>> queue?
> >>>>>
> >>>>> Even after initialization, when the queue is ready.
> >>>>>
> >>>>>>>
> >>>>>>>>> Why not take this correct assumption and update the ready
> >>>>>>>>> state only at one point in the code instead of doing it in
> >>>>>>>>> all the configuration handlers around?
> >>>>>>>>> IMO, it is correct, less intrusive, simpler, clearer and
> >>>>>>>>> cleaner.
> >>>>>>>>
> >>>>>>>> I just looked closer at the Vhost-user spec, and I'm no longer
> >>>>>>>> so sure this is a correct assumption:
> >>>>>>>>
> >>>>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>>>> client must support changing some configuration aspects on the
> >>>>>>>> fly."
> >>>>>>>
> >>>>>>> Ok, this doesn't explain how configuration is changed on the
> >>>>>>> fly.
> >>>>>>
> >>>>>> I agree it lacks a bit of clarity.
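To make my pseudo-code above more concrete, here is a rough, untested sketch of the single post-handler function in vhost_user.c. vhost_user_notify_ready() is a hypothetical placeholder for the existing ready checks plus the queue-state and device new\conf callbacks; only messages that can affect readiness fall through to it:

static void
vhost_user_update_ready_statuses(struct virtio_net *dev,
		const struct VhostUserMsg *msg)
{
	bool force_notify = false;
	uint32_t vring_idx;

	switch (msg->request.master) {
	case VHOST_USER_SET_VRING_ENABLE:
		/* enable(1) re-notifies even if the queue is already
		 * ready, so the driver knows the configuration may
		 * have changed underneath it.
		 */
		force_notify = !!msg->payload.state.num;
		vring_idx = msg->payload.state.index;
		break;
	case VHOST_USER_SET_VRING_KICK:
	case VHOST_USER_SET_VRING_CALL:
		vring_idx = msg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
		break;
	default:
		/* The message cannot affect the ready states: no-op. */
		return;
	}

	/* Re-check the queue and device ready states and invoke the
	 * state/new/conf callbacks on any transition, or when forced.
	 */
	vhost_user_notify_ready(dev, vring_idx, force_notify);
}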
> >>>>>>
> >>>>>>> As I mentioned, QEMU always sends the enable message after a
> >>>>>>> configuration message.
> >>>>>>
> >>>>>> Yes, but we should not make assumptions about the current Qemu
> >>>>>> version when possible. Better to be safe and follow the
> >>>>>> specification, it will be more robust.
> >>>>>> There is also the Virtio-user PMD to take into account, for
> >>>>>> example.
> >>>>>
> >>>>> I understand your point here, but do you really want to be ready
> >>>>> for any configuration update at run time?
> >>>>> What does it mean? How should the datapath handle configuration
> >>>>> from the control thread at run time while traffic is on?
> >>>>> For example, changing the queue size \ addresses must stop
> >>>>> traffic before...
> >>>>> Also changing FDs is very sensitive.
> >>>>>
> >>>>> It doesn't make sense to me.
> >>>>>
> >>>>> Also, according to the "on the fly" direction, we should not
> >>>>> disable the queue unless an enable message comes to disable it.
> >>>
> >>> No response, so it looks like you agree that it doesn't make sense.
> >>
> >> No, my reply was general to all your comments.
> >>
> >> With a SW backend, I agree we don't need to disable the rings in
> >> case of asynchronous changes to the ring, because we protect it
> >> with a lock, so we are sure the ring won't be accessed by another
> >> thread while doing the change.
> >>
> >> For the vDPA case that's more problematic, because we have no such
> >> locking mechanism.
> >>
> >> For example memory hotplug: Qemu does not seem to disable the
> >> queues, so we need to stop the vDPA device one way or another so
> >> that it does not process the rings while the Vhost lib remaps the
> >> memory areas.
> >>
> >>>>> In addition:
> >>>>> Do you really want to toggle vDPA drivers\apps for every
> >>>>> configuration message? It may cause queue recreation for each
> >>>>> one (at least for mlx5).
> >>>>
> >>>> I want to have something robust and maintainable.
> >>>
> >>> Me too.
> >>>
> >>>> These messages arriving after a queue has been configured once
> >>>> are rare events, but this is usually the kind of thing that
> >>>> causes maintenance burden.
> >>>
> >>> In case of guest poll mode (testpmd virtio) we get callfd twice
> >>> all the time.
> >>
> >> Right.
> >>
> >>>> If you look at my example patch, you will understand that with my
> >>>> proposal there won't be any more state change notifications than
> >>>> with your proposal when Qemu or any other Vhost-user master sends
> >>>> a disable request before sending the request that impacts the
> >>>> queue state.
> >>>
> >>> We didn't talk about disable time - this one is very simple.
> >>>
> >>> Yes, in case the queue is disabled your proposal doesn't send an
> >>> extra notification, same as mine.
> >>> But in case the queue is ready, your proposal sends an extra
> >>> not-ready notification for kickfd, callfd and set_vring_base
> >>> configurations.
> >>
> >> I think this is necessary for synchronization with the Vhost-user
> >> master (in case the master asks for this synchronization, like
> >> set_mem_table for instance when reply-ack is enabled).
> >>
> >>>> It just adds more robustness if this unlikely event happens, by
> >>>> invalidating the ring state to not ready before doing the actual
> >>>> ring configuration change.
> >>>> So that this config change is not missed by the vDPA driver or
> >>>> the application.
> >>>
> >>> One more issue here is that there is some time when the device is
> >>> ready (already configured) and the first virtq-pair is not ready
> >>> (your invalidate proposal for set_vring_base).
> >>
> >>> It doesn't preserve the concept that the device is ready only in
> >>> case the first virtq-pair is ready.
> >>
> >> I understand the spec as "the device is ready as soon as the first
> >> queue pair is ready", but I might be wrong.
> >>
> >> Do you suggest to call the dev_close() vDPA callback and the
> >> destroy_device() application callback as soon as one of the rings
> >> of the first queue pair receives a disable request or, with my
> >> patch, when one of the rings receives a request that changes the
> >> ring state?
> >
> > I mean, your proposal actually may make the first virtq-pair ready
> > state disabled while the device is ready.
> > So, yes, it leads to calling device close\destroy.
>
> No it doesn't, there is no call to .dev_close()/.destroy_device()
> with my patch if the first queue pair gets disabled.
>
> >>> I will not insist anymore on waiting for enable before notifying,
> >>> although I am not a fan of it.
> >>>
> >>> So, I suggest to create one notification function to be called
> >>> after the message handler and before the reply.
> >>> This function is the only one which notifies ready states, in the
> >>> next options:
> >>>
> >>> 1. The virtq ready state is changed in the queue.
> >>> 2. The virtq ready state stays on after a configuration message
> >>>    handler.
> >>> 3. The device state will be enabled when the first queue pair is
> >>>    ready.
> >>
> >> IIUC, it will not disable the queues when there is a state change,
> >> is that correct? If so, I think it does not work with the memory
> >> hotplug case I mentioned earlier.
> >
> > It will do enable again, which means - something was modified.
>
> Ok, thanks for the clarification.
>
> I think it is not enough for the examples I gave below. For
> set_mem_table, we need to stop the device from processing the vrings
> before the set_mem_table handler calls munmap(), and re-enable it
> after the mmap() (I did that wrong in my example patch, I just did
> that after the munmap/mmap happened, which is too late).
>
> >> Even for the callfd double change it can be problematic, as the
> >> Vhost lib will close the first one while it will still be used by
> >> the driver (BTW, I see my example patch is also buggy in this
> >> regard: it should reset the call_fd value in the virtqueue, then
> >> call vhost_user_update_vring_state() and finally close the FD).
> >
> > Yes, this one leads to a different handling for each message.
> >
> > Maybe it leads to a new queue modify operation.
> > So, the queue doesn't send the state - it just does the
> > configuration change on the fly.
> >
> > What do you think?
>
> I think that configuration on the fly doesn't fly.
> We would at least need to stop the device from processing the rings
> for the memory hotplug case, so why not just send a disable
> notification?

Yes, the driver needs a notification here.

> And for the double callfd, that does not look right to me not to
> request the driver to stop using it before it is closed, isn't it?

Yes, and some drivers (including mlx5) may stop the traffic in this
case too.

A modify\update operation would solve all of these cases (see the
sketch at the end of this mail):

In memory hotplug:
	Do the new mmap.
	Call modify.
	Do munmap for the old one.

In a callfd\kickfd change:
	Set the new FD.
	Call modify.
	Close the old FD.

Modify is clearer, saves calls and is faster (the datapath comes back
faster).

> Thanks,
> Maxime
>
> >
> >> Thanks,
> >> Maxime
> >>>
> >>> Matan
> >>>
> >>>> Maxime
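As mentioned above, here is a rough sketch of the modify flow for the callfd\kickfd case. Everything in it is hypothetical: queue_modify() is the new vDPA operation I propose, it does not exist in the Vhost lib today, and the surrounding structures are reduced to the one field that matters:

#include <stdint.h>
#include <unistd.h>

/* Proposed new vDPA operation (hypothetical): the driver re-reads the
 * queue configuration and atomically switches its datapath to it,
 * quiescing traffic internally as needed.
 */
struct vdpa_queue_ops {
	int (*queue_modify)(int vid, uint16_t qid);
};

/* callfd\kickfd change: install the new FD, let the driver switch
 * over, and only then close the old FD, so the driver never touches
 * a closed FD.
 */
static void
replace_event_fd(int *fdp, const struct vdpa_queue_ops *ops,
		int vid, uint16_t qid, int new_fd)
{
	int old_fd = *fdp;

	*fdp = new_fd;			/* 1. set the new FD */
	ops->queue_modify(vid, qid);	/* 2. driver switches over */
	if (old_fd >= 0)
		close(old_fd);		/* 3. old FD no longer in use */
}

Memory hotplug follows the same ordering: mmap the new regions first, call modify so the driver switches over, and only then munmap the old regions. The datapath is then stopped only for the duration of the modify call itself.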