Hi Maxime, good morning.
From: Maxime Coquelin:
> On 6/23/20 4:52 PM, Matan Azrad wrote:
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Tuesday, June 23, 2020 4:56 PM
> >> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>
> >> Hi Matan,
> >>
> >> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin:
> >>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Maxime
> >>>>>>>>>>>
> >>>>>>>>>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>>>> To: Matan Azrad <ma...@mellanox.com>; Xiao Wang <xiao.w.w...@intel.com>
> >>>>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>>>> The issue is if you only check the ready state before and
> >>>>>>>>>>>>>> after the message affecting the ring is handled: it can be
> >>>>>>>>>>>>>> ready at both stages while the rings have changed, and the
> >>>>>>>>>>>>>> state change callback should have been called.
> >>>>>>>>>>>>> But in this version I checked twice, before the message
> >>>>>>>>>>>>> handler and after the message handler, so it should catch
> >>>>>>>>>>>>> any update.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>>>> handlers, so that the ready state is invalidated, because
> >>>>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>>>> handler but with different values.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net **pdev,
> >>>>>>>>>>>> 		struct VhostUserMsg *msg,
> >>>>>>>>>>>>
> >>>>>>>>>>>> ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> 	if (vq->kickfd >= 0)
> >>>>>>>>>>>> 		close(vq->kickfd);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> 	vq->kickfd = file.fd;
> >>>>>>>>>>>>
> >>>>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>>>>>
> >>>>>>>>>>> The driver will be notified in the next
> >>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE message according to v1.
> >>>>>>>>>>>
> >>>>>>>>>>> One of the assumptions we agreed on in the design mail is
> >>>>>>>>>>> that it doesn't make sense that QEMU will change the queue
> >>>>>>>>>>> configuration without enabling the queue again.
> >>>>>>>>>>> Because of that we decided to force calling the state
> >>>>>>>>>>> callback again when QEMU sends the
> >>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE(1) message, even if the queue
> >>>>>>>>>>> is already ready.
> >>>>>>>>>>> So when the driver/app sees state enable->enable, it should
> >>>>>>>>>>> take into account that the queue configuration was probably
> >>>>>>>>>>> changed.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that this assumption is correct according to the
> >>>>>>>>>>> QEMU code.
> >>>>>>>>>>
> >>>>>>>>>> Yes, this was our initial assumption.
> >>>>>>>>>> But now looking into the details of the implementation, I
> >>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
> >>>>>>>>>>
> >>>>>>>>>>> That's why I prefer to collect all the ready-check
> >>>>>>>>>>> callbacks (queue state and device new\conf) into one
> >>>>>>>>>>> function that will be called after the message handler:
> >>>>>>>>>>> Pseudo:
> >>>>>>>>>>> vhost_user_update_ready_statuses() {
> >>>>>>>>>>> 	switch (msg):
> >>>>>>>>>>> 	case enable:
> >>>>>>>>>>> 		if (enable is 1)
> >>>>>>>>>>> 			force queue state = 1
> >>>>>>>>>>> 	case callfd:
> >>>>>>>>>>> 	case kickfd:
> >>>>>>>>>>> 	.....
> >>>>>>>>>>> 		check queue and device ready + call callbacks if needed
> >>>>>>>>>>> 	default:
> >>>>>>>>>>> 		return;
> >>>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> I find it more natural to "invalidate" the ready state where
> >>>>>>>>>> it is handled (after vring_invalidate(), before setting a
> >>>>>>>>>> new FD for call & kick, ...)
> >>>>>>>>>
> >>>>>>>>> I think that if you go in this direction, when the first
> >>>>>>>>> queue pair is invalidated you need to notify the app\driver
> >>>>>>>>> also about the device ready change.
> >>>>>>>>> Also it will cause 2 notifications to the driver instead of
> >>>>>>>>> one in case of an FD change.
> >>>>>>>>
> >>>>>>>> You'll always end up with two notifications: either Qemu has
> >>>>>>>> sent the disable, and so you'll have one notification for the
> >>>>>>>> disable and one for the enable, or it didn't send the disable,
> >>>>>>>> and it will happen at old-value invalidation time and after
> >>>>>>>> the new value is taken into account.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I don't see it in the current QEMU behavior.
> >>>>>>> When working with MQ, I see that some virtqs get configuration
> >>>>>>> messages while they are in enabled state.
> >>>>>>> Then, an enable message is sent again later.
> >>>>>>
> >>>>>> I guess you mean the first queue pair? And it would not be in
> >>>>>> ready state as it would be the initial configuration of the
> >>>>>> queue?
> >>>>>
> >>>>> Even after initialization, when the queue is ready.
> >>>>>
> >>>>>>>
> >>>>>>>>> Why not take this correct assumption and update the ready
> >>>>>>>>> state only at one point in the code instead of doing it in
> >>>>>>>>> all the configuration handlers around?
> >>>>>>>>> IMO, it is correct, less intrusive, simpler, clearer and
> >>>>>>>>> cleaner.
> >>>>>>>>
> >>>>>>>> I just looked closer at the Vhost-user spec, and I'm no longer
> >>>>>>>> so sure this is a correct assumption:
> >>>>>>>>
> >>>>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>>>> client must support changing some configuration aspects on the
> >>>>>>>> fly."
> >>>>>>>
> >>>>>>> Ok, this doesn't explain how configuration is changed on the
> >>>>>>> fly.
> >>>>>>
> >>>>>> I agree it lacks a bit of clarity.
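To make my pseudo-code above more concrete, here is a rough, untested sketch of the single post-handler function in vhost_user.c. vhost_user_notify_ready() is a hypothetical placeholder for the existing ready checks plus the queue-state and device new\conf callbacks; only messages that can affect readiness fall through to it:

static void
vhost_user_update_ready_statuses(struct virtio_net *dev,
		const struct VhostUserMsg *msg)
{
	bool force_notify = false;
	uint32_t vring_idx;

	switch (msg->request.master) {
	case VHOST_USER_SET_VRING_ENABLE:
		/* enable(1) re-notifies even if the queue is already
		 * ready, so the driver knows the configuration may
		 * have changed underneath it.
		 */
		force_notify = !!msg->payload.state.num;
		vring_idx = msg->payload.state.index;
		break;
	case VHOST_USER_SET_VRING_KICK:
	case VHOST_USER_SET_VRING_CALL:
		vring_idx = msg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
		break;
	default:
		/* The message cannot affect the ready states: no-op. */
		return;
	}

	/* Re-check the queue and device ready states and invoke the
	 * state/new/conf callbacks on any transition, or when forced.
	 */
	vhost_user_notify_ready(dev, vring_idx, force_notify);
}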
> >>>>>>
> >>>>>>> As I mentioned, QEMU always sends the enable message after a
> >>>>>>> configuration message.
> >>>>>>
> >>>>>> Yes, but we should not make assumptions about the current Qemu
> >>>>>> version when possible. Better to be safe and follow the
> >>>>>> specification, it will be more robust.
> >>>>>> There is also the Virtio-user PMD to take into account, for
> >>>>>> example.
> >>>>>
> >>>>> I understand your point here, but do you really want to be ready
> >>>>> for any configuration update at run time?
> >>>>> What does it mean? How should the datapath handle configuration
> >>>>> from the control thread at run time while traffic is on?
> >>>>> For example, changing the queue size \ addresses must stop
> >>>>> traffic before...
> >>>>> Also changing FDs is very sensitive.
> >>>>>
> >>>>> It doesn't make sense to me.
> >>>>>
> >>>>> Also, according to the "on the fly" direction, we should not
> >>>>> disable the queue unless an enable message comes to disable it.
> >>>
> >>> No response, so it looks like you agree that it doesn't make sense.
> >>
> >> No, my reply was general to all your comments.
> >>
> >> With a SW backend, I agree we don't need to disable the rings in
> >> case of asynchronous changes to the ring, because we protect it
> >> with a lock, so we are sure the ring won't be accessed by another
> >> thread while doing the change.
> >>
> >> For the vDPA case that's more problematic, because we have no such
> >> locking mechanism.
> >>
> >> For example memory hotplug: Qemu does not seem to disable the
> >> queues, so we need to stop the vDPA device one way or another so
> >> that it does not process the rings while the Vhost lib remaps the
> >> memory areas.
> >>
> >>>>> In addition:
> >>>>> Do you really want to toggle vDPA drivers\apps for every
> >>>>> configuration message? It may cause queue recreation for each
> >>>>> one (at least for mlx5).
> >>>>
> >>>> I want to have something robust and maintainable.
> >>>
> >>> Me too.
> >>>
> >>>> These messages arriving after a queue has been configured once
> >>>> are rare events, but this is usually the kind of thing that
> >>>> causes maintenance burden.
> >>>
> >>> In case of guest poll mode (testpmd virtio) we get callfd twice
> >>> all the time.
> >>
> >> Right.
> >>
> >>>> If you look at my example patch, you will understand that with my
> >>>> proposal there won't be any more state change notifications than
> >>>> with your proposal when Qemu or any other Vhost-user master sends
> >>>> a disable request before sending the request that impacts the
> >>>> queue state.
> >>>
> >>> We didn't talk about disable time - this one is very simple.
> >>>
> >>> Yes, in case the queue is disabled your proposal doesn't send an
> >>> extra notification, same as mine.
> >>> But in case the queue is ready, your proposal sends an extra
> >>> not-ready notification for kickfd, callfd and set_vring_base
> >>> configurations.
> >>
> >> I think this is necessary for synchronization with the Vhost-user
> >> master (in case the master asks for this synchronization, like
> >> set_mem_table for instance when reply-ack is enabled).
> >>
> >>>> It just adds more robustness if this unlikely event happens, by
> >>>> invalidating the ring state to not ready before doing the actual
> >>>> ring configuration change.
> >>>> So that this config change is not missed by the vDPA driver or
> >>>> the application.
> >>>
> >>> One more issue here is that there is some time when the device is
> >>> ready (already configured) and the first virtq-pair is not ready
> >>> (your invalidate proposal for set_vring_base).
> >>
> >>> It doesn't preserve the concept that the device is ready only in
> >>> case the first virtq-pair is ready.
> >>
> >> I understand the spec as "the device is ready as soon as the first
> >> queue pair is ready", but I might be wrong.
> >>
> >> Do you suggest to call the dev_close() vDPA callback and the
> >> destroy_device() application callback as soon as one of the rings
> >> of the first queue pair receives a disable request or, with my
> >> patch, when one of the rings receives a request that changes the
> >> ring state?
> >
> > I mean, your proposal actually may make the first virtq-pair ready
> > state disabled while the device is ready.
> > So, yes, it leads to calling device close\destroy.
>
> No it doesn't, there is no call to .dev_close()/.destroy_device()
> with my patch if the first queue pair gets disabled.
>
> >>> I will not insist anymore on waiting for enable before notifying,
> >>> although I am not a fan of it.
> >>>
> >>> So, I suggest to create one notification function to be called
> >>> after the message handler and before the reply.
> >>> This function is the only one which notifies ready states, in the
> >>> next options:
> >>>
> >>> 1. The virtq ready state is changed in the queue.
> >>> 2. The virtq ready state stays on after a configuration message
> >>>    handler.
> >>> 3. The device state will be enabled when the first queue pair is
> >>>    ready.
> >>
> >> IIUC, it will not disable the queues when there is a state change,
> >> is that correct? If so, I think it does not work with the memory
> >> hotplug case I mentioned earlier.
> >
> > It will do enable again, which means - something was modified.
>
> Ok, thanks for the clarification.
>
> I think it is not enough for the examples I gave below. For
> set_mem_table, we need to stop the device from processing the vrings
> before the set_mem_table handler calls munmap(), and re-enable it
> after the mmap() (I did that wrong in my example patch, I just did
> that after the munmap/mmap happened, which is too late).
>
> >> Even for the callfd double change it can be problematic, as the
> >> Vhost lib will close the first one while it will still be used by
> >> the driver (BTW, I see my example patch is also buggy in this
> >> regard: it should reset the call_fd value in the virtqueue, then
> >> call vhost_user_update_vring_state() and finally close the FD).
> >
> > Yes, this one leads to a different handling for each message.
> >
> > Maybe it leads to a new queue modify operation.
> > So, the queue doesn't send the state - it just does the
> > configuration change on the fly.
> >
> > What do you think?
>
> I think that configuration on the fly doesn't fly.
> We would at least need to stop the device from processing the rings
> for the memory hotplug case, so why not just send a disable
> notification?

Yes, the driver needs a notification here.

> And for the double callfd, that does not look right to me not to
> request the driver to stop using it before it is closed, isn't it?

Yes, and some drivers (including mlx5) may stop the traffic in this
case too.

A modify\update operation would solve all of these cases (see the
sketch at the end of this mail):

In memory hotplug:
	Do the new mmap.
	Call modify.
	Do munmap for the old one.

In a callfd\kickfd change:
	Set the new FD.
	Call modify.
	Close the old FD.

Modify is clearer, saves calls and is faster (the datapath comes back
faster).

> Thanks,
> Maxime
>
> >
> >> Thanks,
> >> Maxime
> >>>
> >>> Matan
> >>>
> >>>> Maxime
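As mentioned above, here is a rough sketch of the modify flow for the callfd\kickfd case. Everything in it is hypothetical: queue_modify() is the new vDPA operation I propose, it does not exist in the Vhost lib today, and the surrounding structures are reduced to the one field that matters:

#include <stdint.h>
#include <unistd.h>

/* Proposed new vDPA operation (hypothetical): the driver re-reads the
 * queue configuration and atomically switches its datapath to it,
 * quiescing traffic internally as needed.
 */
struct vdpa_queue_ops {
	int (*queue_modify)(int vid, uint16_t qid);
};

/* callfd\kickfd change: install the new FD, let the driver switch
 * over, and only then close the old FD, so the driver never touches
 * a closed FD.
 */
static void
replace_event_fd(int *fdp, const struct vdpa_queue_ops *ops,
		int vid, uint16_t qid, int new_fd)
{
	int old_fd = *fdp;

	*fdp = new_fd;			/* 1. set the new FD */
	ops->queue_modify(vid, qid);	/* 2. driver switches over */
	if (old_fd >= 0)
		close(old_fd);		/* 3. old FD no longer in use */
}

Memory hotplug follows the same ordering: mmap the new regions first, call modify so the driver switches over, and only then munmap the old regions. The datapath is then stopped only for the duration of the modify call itself.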