> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Tuesday, June 23, 2020 4:56 PM
> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>
> Hi Matan,
>
> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >
> > From: Maxime Coquelin:
> >> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>> Cc: dev@dpdk.org
> >>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>
> >>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Maxime
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>> The issue is that if you only check the ready state before and
> >>>>>>>>>>>> after the message affecting the ring is handled, it can be
> >>>>>>>>>>>> ready at both stages, while the rings have changed and the
> >>>>>>>>>>>> state change callback should have been called.
> >>>>>>>>>>> But in this version I checked twice, before the message handler
> >>>>>>>>>>> and after the message handler, so it should catch any update.
> >>>>>>>>>>
> >>>>>>>>>> No, this is not enough; we also have to check during some
> >>>>>>>>>> handlers, so that the ready state is invalidated, because
> >>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>> handler but with different values.
> >>>>>>>>>>
> >>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>
> >>>>>>>>>> ...
> >>>>>>>>>>
> >>>>>>>>>> 	if (vq->kickfd >= 0)
> >>>>>>>>>> 		close(vq->kickfd);
> >>>>>>>>>> +
> >>>>>>>>>> +	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>> +
> >>>>>>>>>> +	vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>> +
> >>>>>>>>>> 	vq->kickfd = file.fd;
> >>>>>>>>>>
> >>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>> after the kickfd change and the driver won't be notified.
> >>>>>>>>>
> >>>>>>>>> The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE
> >>>>>>>>> message according to v1.
> >>>>>>>>>
> >>>>>>>>> One of the assumptions we agreed on in the design mail is that it
> >>>>>>>>> doesn't make sense for QEMU to change the queue configuration
> >>>>>>>>> without enabling the queue again.
> >>>>>>>>> Because of that, we decided to force calling the state callback
> >>>>>>>>> again when QEMU sends a VHOST_USER_SET_VRING_ENABLE(1) message,
> >>>>>>>>> even if the queue is already ready.
> >>>>>>>>> So when the driver/app sees state enable->enable, it should take
> >>>>>>>>> into account that the queue configuration was probably changed.
> >>>>>>>>>
> >>>>>>>>> I think that this assumption is correct according to the QEMU code.
> >>>>>>>>
> >>>>>>>> Yes, this was our initial assumption.
> >>>>>>>> But now, looking into the details of the implementation, I find it
> >>>>>>>> even cleaner & clearer not to make this assumption.
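[Editor's note] The invalidation pattern in the example patch above can be modeled as a tiny standalone C sketch. All names here (`update_vring_state`, the `vring` struct, the sentinel value) are illustrative stand-ins, not the actual DPDK vhost API:

```c
#include <stdbool.h>

/* Illustrative stand-in for VIRTIO_UNINITIALIZED_EVENTFD. */
#define UNINITIALIZED_EVENTFD (-2)

struct vring {
	int kickfd;
	int callfd;
	bool enabled;
	bool ready;		/* last ready state notified to the driver */
	int notifications;	/* number of state-change callbacks fired */
};

/* A ring is considered ready when enabled and both eventfds are valid. */
static bool vring_is_ready(const struct vring *vq)
{
	return vq->enabled && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Stand-in for vhost_user_update_vring_state(): notify the driver only
 * when the computed ready state differs from the last notified one. */
static void update_vring_state(struct vring *vq)
{
	bool ready = vring_is_ready(vq);

	if (ready != vq->ready) {
		vq->ready = ready;
		vq->notifications++;
	}
}

/* Kick FD change following the example patch: invalidate the old FD and
 * notify (ready -> not ready), then install the new FD and notify again
 * (not ready -> ready). Without the invalidation step, the ring would
 * look ready both before and after, and no callback would fire. */
static void set_vring_kick(struct vring *vq, int new_fd)
{
	vq->kickfd = UNINITIALIZED_EVENTFD;
	update_vring_state(vq);

	vq->kickfd = new_fd;
	update_vring_state(vq);
}
```

With a ring that is already ready, `set_vring_kick()` produces exactly two notifications — one for the invalidation and one once the new FD is in place — matching the point that a disable/enable pair from the master would also produce two.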
> >>>>>>>>
> >>>>>>>>> That's why I prefer to collect all the ready check callbacks
> >>>>>>>>> (queue state and device new\conf) into one function that will be
> >>>>>>>>> called after the message handler:
> >>>>>>>>> Pseudo:
> >>>>>>>>> vhost_user_update_ready_statuses() {
> >>>>>>>>> 	switch (msg):
> >>>>>>>>> 	case enable:
> >>>>>>>>> 		if (enable is 1)
> >>>>>>>>> 			force queue state = 1.
> >>>>>>>>> 	case callfd
> >>>>>>>>> 	case kickfd
> >>>>>>>>> 	.....
> >>>>>>>>> 		check queue and device ready + call callbacks if needed..
> >>>>>>>>> 	default
> >>>>>>>>> 		return;
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> I find it more natural to "invalidate" the ready state where it is
> >>>>>>>> handled (after vring_invalidate(), before setting a new FD for
> >>>>>>>> call & kick, ...)
> >>>>>>>
> >>>>>>> I think that if you go in this direction, then if the first queue
> >>>>>>> pair is invalidated, you need to notify the app\driver also about
> >>>>>>> the device ready change.
> >>>>>>> Also, it will cause 2 notifications to the driver instead of one in
> >>>>>>> case of an FD change.
> >>>>>>
> >>>>>> You'll always end up with two notifications: either Qemu has sent
> >>>>>> the disable, and so you'll have one notification for the disable and
> >>>>>> one for the enable, or it didn't send the disable and the two will
> >>>>>> happen at old-value invalidation time and after the new value is
> >>>>>> taken into account.
> >>>>>
> >>>>> I don't see it in the current QEMU behavior.
> >>>>> When working with MQ, I see that some virtqs get configuration
> >>>>> messages while they are in enabled state.
> >>>>> Then, an enable message is sent again later.
> >>>>
> >>>> I guess you mean the first queue pair? And it would not be in ready
> >>>> state, as it would be the initial configuration of the queue?
> >>>
> >>> Even after initialization, when the queue is ready.
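[Editor's note] For concreteness, the dispatcher pseudo-code quoted above can be rendered as a compilable C sketch. The message names, the `queue` struct, and the notify helper are all hypothetical; the point is the single post-handler update, including the forced callback on enable->enable:

```c
#include <stdbool.h>

enum vhost_msg_type {
	MSG_SET_VRING_ENABLE,
	MSG_SET_VRING_CALL,
	MSG_SET_VRING_KICK,
	MSG_OTHER,
};

struct queue {
	bool enabled;
	int kickfd;
	int callfd;
	bool ready;		/* last state reported to the driver */
	int state_calls;	/* state callback invocations */
};

static bool queue_is_ready(const struct queue *q)
{
	return q->enabled && q->kickfd >= 0 && q->callfd >= 0;
}

/* Hypothetical state callback: record the new state. */
static void notify_state(struct queue *q, bool ready)
{
	q->ready = ready;
	q->state_calls++;
}

/* One function, called once after the message handler, deciding whether
 * the ready-state callback must fire. */
static void update_ready_statuses(struct queue *q,
				  enum vhost_msg_type msg, int enable)
{
	bool ready;

	switch (msg) {
	case MSG_SET_VRING_ENABLE:
		if (enable == 1) {
			/* Force the callback even if already ready:
			 * enable -> enable means the configuration may
			 * have changed under the discussed assumption. */
			q->enabled = true;
			notify_state(q, true);
			return;
		}
		q->enabled = false;
		/* fall through to the generic check */
	case MSG_SET_VRING_CALL:
	case MSG_SET_VRING_KICK:
		ready = queue_is_ready(q);
		if (ready != q->ready)
			notify_state(q, ready);
		return;
	default:
		return;
	}
}
```

Note how a callfd/kickfd message on an already-ready queue fires no callback here, which is exactly the behavior the two sides are debating.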
> >>>
> >>>>>
> >>>>>>> Why not take this correct assumption and update the ready state
> >>>>>>> only at one point in the code instead of doing it in all the
> >>>>>>> configuration handlers around?
> >>>>>>> IMO, it is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>
> >>>>>> I just looked closer at the Vhost-user spec, and I'm no longer so
> >>>>>> sure this is a correct assumption:
> >>>>>>
> >>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>> client must support changing some configuration aspects on the fly."
> >>>>>
> >>>>> Ok, but this doesn't explain how the configuration is changed on the fly.
> >>>>
> >>>> I agree it lacks a bit of clarity.
> >>>>
> >>>>> As I mentioned, QEMU always sends the enable message after a
> >>>>> configuration message.
> >>>>
> >>>> Yes, but we should not make assumptions about the current Qemu version
> >>>> when possible. Better to be safe and follow the specification; it will
> >>>> be more robust.
> >>>> There is also the Virtio-user PMD to take into account, for example.
> >>>
> >>> I understand your point here, but do you really want to be ready for
> >>> any configuration update at run time?
> >>> What does it mean? How should the datapath handle configuration from
> >>> the control thread at run time while traffic is on?
> >>> For example, changing queue sizes \ addresses must stop traffic before...
> >>> Also, changing FDs is very sensitive.
> >>>
> >>> It doesn't make sense to me.
> >>>
> >>> Also, according to the "on the fly" direction, we should not disable
> >>> the queue unless an enable message comes to disable it.
> >
> > No response, so it looks like you agree that it doesn't make sense.
>
> No, my reply was general to all your comments.
>
> With a SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring, because we protect it with a lock, so we
> are sure the ring won't be accessed by another thread while doing the
> change.
> For the vDPA case that's more problematic, because we have no such
> locking mechanism.
>
> For example, with memory hotplug, Qemu does not seem to disable the
> queues, so we need to stop the vDPA device one way or another so that it
> does not process the rings while the Vhost lib remaps the memory areas.
>
> >>> In addition:
> >>> Do you really want to toggle vDPA drivers\apps for any configuration
> >>> message? It may cause queue recreation for each one (at least for mlx5).
> >>
> >> I want to have something robust and maintainable.
> >
> > Me too.
> >
> >> These messages arriving after a queue has been configured once are rare
> >> events, but this is usually the kind of thing that causes maintenance
> >> burden.
> >
> > In the case of guest poll mode (testpmd virtio) we get callfd twice all
> > the time.
>
> Right.
>
> >> If you look at my example patch, you will understand that with my
> >> proposal, there won't be any more state change notifications than with
> >> your proposal when Qemu or any other Vhost-user master sends a disable
> >> request before sending the request that impacts the queue state.
> >
> > We didn't talk about disable time - this one is very simple.
> >
> > Yes, in case the queue is disabled, your proposal doesn't send an extra
> > notification, same as mine.
> > But in case the queue is ready, your proposal sends an extra not-ready
> > notification for kickfd, callfd and set_vring_base configurations.
>
> I think this is necessary for synchronization with the Vhost-user master
> (in case the master asks for this synchronization, like set_mem_table for
> instance when reply-ack is enabled).
>
> >> It just adds more robustness if this unlikely event happens, by
> >> invalidating the ring state to not ready before doing the actual ring
> >> configuration change.
> >> So that this config change is not missed by the vDPA driver or the
> >> application.
> >
> > One more issue here is that there is some time where the device is ready
> > (already configured) and the first virtq-pair is not ready (your
> > invalidate proposal for set_vring_base).
> >
> > It doesn't preserve the concept that the device is ready only in case
> > the first virtq-pair is ready.
>
> I understand the spec as "the device is ready as soon as the first queue
> pair is ready", but I might be wrong.
>
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the rings of the
> first queue pair receives a disable request or, with my patch, when one of
> the rings receives a request that changes the ring state?
I mean, your proposal actually may make the first virtq-pair ready state
disabled while the device is ready.
So, yes, it leads to calling device close\destroy.

> > I will not insist anymore on waiting for enable for notifying, although
> > I'm not a fan of it.
> >
> > So, I suggest to create 1 notification function to be called after the
> > message handler and before the reply.
> > This function is the only one which notifies ready states in the next
> > options:
> >
> > 1. virtq ready state is changed in the queue.
> > 2. virtq ready state stays on after a configuration message handler.
> > 3. device state will be enabled when the first queue pair is ready.
>
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with the memory hotplug
> case I mentioned earlier.

It will do enable again, which means - something was modified.

> Even for the callfd double change it can be problematic, as Vhost-lib
> will close the first one while it will still be used by the driver (Btw,
> I see my example patch is also buggy in this regard: it should reset the
> call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).

Yes, this one leads to different handling for each message.
Maybe it leads to a new queue modify operation.
So, the queue doesn't send the state - it just does the configuration
change on the fly.

What do you think?

> Thanks,
> Maxime
>
> > Matan
> >
> >> Maxime
> >
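[Editor's note] As an aside on the "device is ready as soon as the first queue pair is ready" definition debated above, a minimal sketch of that check, under the usual virtio convention that queue pair i maps to virtqueues 2*i and 2*i+1 (all names illustrative, not the actual vhost code):

```c
#include <stdbool.h>

struct vring {
	bool enabled;
	int kickfd;
	int callfd;
};

static bool vring_is_ready(const struct vring *vq)
{
	return vq->enabled && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Device-level readiness under the interpretation discussed above:
 * only the first queue pair (virtqueues 0 and 1) has to be ready,
 * whatever the state of the other queue pairs. */
static bool device_is_ready(const struct vring *vq, int nr_vring)
{
	if (nr_vring < 2)
		return false;
	return vring_is_ready(&vq[0]) && vring_is_ready(&vq[1]);
}
```

Invalidating one FD of virtqueue 0 or 1 during a message handler then makes the whole device transiently not ready, which is exactly the window Matan points out for the set_vring_base invalidation proposal.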