On Tue, Oct 10, 2023 at 4:57 AM Yajun Wu <yaj...@nvidia.com> wrote: > > > On 10/9/2023 6:28 PM, German Maglione wrote: > > External email: Use caution opening links or attachments > > > > > > On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu <yaj...@nvidia.com> wrote: > >> > >> On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote: > >>> External email: Use caution opening links or attachments > >>> > >>> > >>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: > >>>> On 06.10.23 11:26, Michael S. Tsirkin wrote: > >>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: > >>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: > >>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: > >>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: > >>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: > >>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > >>>>>>>>>>> There is no clearly defined purpose for the virtio status byte in > >>>>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for > >>>>>>>>>>> virtio > >>>>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > >>>>>>>>>>> protocol extension, it is possible for SET_FEATURES to return > >>>>>>>>>>> errors > >>>>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > >>>>>>>>>>> > >>>>>>>>>>> As for implementations, SET_STATUS is not widely implemented. > >>>>>>>>>>> dpdk does > >>>>>>>>>>> implement it, but only uses it to signal feature negotiation > >>>>>>>>>>> failure. > >>>>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it > >>>>>>>>>>> effectively > >>>>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, > >>>>>>>>>>> and today > >>>>>>>>>>> means the same thing as RESET_DEVICE). > >>>>>>>>>>> > >>>>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does > >>>>>>>>>>> not > >>>>>>>>>>> forward the guest-set status byte, but instead just makes it up > >>>>>>>>>>> internally, and actually completely ignores what the back-end > >>>>>>>>>>> returns, > >>>>>>>>>>> only using it as the template for a subsequent SET_STATUS to add > >>>>>>>>>>> single > >>>>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads > >>>>>>>>>>> it back > >>>>>>>>>>> to see whether the flag is still set, which is the only way in > >>>>>>>>>>> which > >>>>>>>>>>> dpdk uses the status byte. > >>>>>>>>>>> > >>>>>>>>>>> As-is, no front-end or back-end can rely on the other side > >>>>>>>>>>> handling this > >>>>>>>>>>> field in a useful manner, and it also provides no practical use > >>>>>>>>>>> over > >>>>>>>>>>> other mechanisms the vhost-user protocol has, which are more > >>>>>>>>>>> clearly > >>>>>>>>>>> defined. Deprecate it. > >>>>>>>>>>> > >>>>>>>>>>> Suggested-by: Stefan Hajnoczi <stefa...@redhat.com> > >>>>>>>>>>> Signed-off-by: Hanna Czenczek <hre...@redhat.com> > >>>>>>>>>>> --- > >>>>>>>>>>> docs/interop/vhost-user.rst | 28 > >>>>>>>>>>> +++++++++++++++++++++------- > >>>>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) > >>>>>>>>>> Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com> > >>>>>>>>> SET_STATUS is the only way to signal failure to acknowledge > >>>>>>>>> FEATURES_OK. > >>>>>>>>> The fact current backends never check errors does not mean they > >>>>>>>>> never > >>>>>>>>> will. So no, not applying this. > >>>>>>>> Can this not be done with REPLY_ACK? I.e., with the following > >>>>>>>> message > >>>>>>>> order: > >>>>>>>> > >>>>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is > >>>>>>>> present > >>>>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get > >>>>>>>> VHOST_USER_PROTOCOL_F_REPLY_ACK > >>>>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK > >>>>>>>> 4. SET_FEATURES with need_reply > >>>>>>>> > >>>>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while > >>>>>>>> when the > >>>>>>>> vCPUs are stopped, which generally seems to request a device reset. > >>>>>>>> If we > >>>>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends > >>>>>>>> that will > >>>>>>>> implement SET_STATUS later may break with at least these qemu > >>>>>>>> versions. But > >>>>>>>> documenting that a particular use of the status byte is to be > >>>>>>>> ignored would > >>>>>>>> be really strange. > >>>>>>>> > >>>>>>>> Hanna > >>>>>>> Hmm I guess. Though just following virtio spec seems cleaner to me... > >>>>>>> vhost-user reconfigures the state fully on start. > >>>>>> Not the internal device state, though. virtiofsd has internal state, > >>>>>> and > >>>>>> other devices like vhost-gpu back-ends would probably, too. > >>>>>> > >>>>>> Stefan has recently sent a series > >>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) > >>>>>> to > >>>>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a > >>>>>> reset). > >>>>>> > >>>>>> I really don’t like our current approach with the status byte. > >>>>>> Following the > >>>>>> virtio specification to me would mean that the guest directly controls > >>>>>> this > >>>>>> byte, which it does not. qemu makes up values as it deems > >>>>>> appropriate, and > >>>>>> this includes sending a SET_STATUS 0 when the guest is just paused, > >>>>>> i.e. > >>>>>> when the guest really doesn’t want a device reset. > >>>>>> > >>>>>> That means that qemu does not treat this as a virtio device field > >>>>>> (because > >>>>>> that would mean exposing it to the guest driver), but instead treats > >>>>>> it as > >>>>>> part of the vhost(-user) protocol. It doesn’t feel right to me that > >>>>>> we use > >>>>>> a virtio-defined feature for communication on the vhost level, i.e. > >>>>>> between > >>>>>> front-end and back-end, and not between guest driver and device. I > >>>>>> think > >>>>>> all vhost-level protocol features should be fully defined in the > >>>>>> vhost-user > >>>>>> specification, which REPLY_ACK is. > >>>>> Hmm that makes sense. Maybe we should have done what stefan's patch > >>>>> is doing. > >>>>> > >>>>> Do look at the original commit that introduced it to understand why > >>>>> it was added. > >>>> I don’t understand why this was added to the stop/cont code, though. If > >>>> it > >>>> is time consuming to make these changes, why are they done every time > >>>> the VM > >>>> is paused > >>>> and resumed? It makes sense that this would be done for the initial > >>>> configuration (where a reset also wouldn’t hurt), but here it seems > >>>> wrong. > >>>> > >>>> (To be clear, a reset in the stop/cont code is wrong, because it breaks > >>>> stateful devices.) > >>>> > >>>> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as > >>>> originally introduced was wrong even for non-stateful devices, because it > >>>> occurred before we fetched the state (vring indices) so we could restore > >>>> it > >>>> later. I don’t know how 923b8921d21 was tested, but if the back-end used > >>>> for testing implemented SET_STATUS 0 as a reset, it could not have > >>>> survived > >>>> either migration or a stop/cont in general, because the vring indices > >>>> would > >>>> have been reset to 0. > >>>> > >>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke > >>>> all > >>>> devices that would implement them as per virtio spec, and even today it’s > >>>> broken for stateful devices. The mentioned performance issue is likely > >>>> real, but we can’t address it by making up SET_STATUS calls that are > >>>> wrong. > >>>> > >>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do > >>>> all > >>>> final configuration that would happen upon a DRIVER_OK once the first > >>>> vring > >>>> is started (i.e. receives a kick). That has the added benefit of being > >>>> asynchronous because it doesn’t block any vhost-user messages (which are > >>>> synchronous, and thus block downtime). > >>>> > >>>> Hanna > >>> For better or worse kick is per ring. It's out of spec to start rings > >>> that were not kicked but I guess you could do configuration ... > >>> Seems somewhat asymmetrical though. > >>> > >>> Let's wait until next week, hopefully Yajun Wu will answer. > >> The main motivation of adding VHOST_USER_SET_STATUS is to let backend > >> DPDK know > >> when DRIVER_OK bit is valid. It's an indication of all VQ configuration > >> has sent, > >> otherwise DPDK has to rely on first queue pair is ready, then > >> receiving/applying > >> VQ configuration one by one. > >> > >> During live migration, configuring VQ one by one is very time consuming. > >> For VIRTIO > >> net vDPA, HW needs to know how many VQs are enabled to set > >> RSS(Receive-Side Scaling). > >> > >> If you don’t want SET_STATUS message, backend can remove protocol > >> feature bit > >> VHOST_USER_PROTOCOL_F_STATUS. > >> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device > >> close/reset. > > This is incorrect, resetting the device on GET_VRING_BASE breaks > > the stop/cont. Since you don't want to reset the VQs on stop/cont. > Sorry for the misunderstanding, dpdk vhost backend framework doesn't > have RESET concept(only device level .dev_conf and .dev_close). On > receiving DRIVER_OK does dev_conf, on receiving GET_VRING_BASE does > dev_close. For every VM suspend/resume, dpdk issues dev_close then dev_conf.
(sorry I did not explain myself well) I meant that resetting the VQs upon receiveng GET_VRING_BASE makes the backend to fail if qemu continues after a "stop". I notice that in dpdk, when it receives a GET_VRING_BASE[0], it calls 'vring_invalidate(dev, vq);'[1], resetting the VQ[2], doing that is incorrect. [0] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2135 [1] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2201 [2] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost.c#L580 > > > >> I'm not involved in discussion about adding SET_STATUS in Vhost > >> protocol. This feature > >> is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS). > >> > >> Thanks, > >> Yajun > >>>>>> Now, we could hand full control of the status byte to the guest, and > >>>>>> that > >>>>>> would make me content. But I feel like that doesn’t really work, > >>>>>> because > >>>>>> qemu needs to intercept the status byte anyway (it needs to know when > >>>>>> there > >>>>>> is a reset, probably wants to know when the device is configured, > >>>>>> etc.), so > >>>>>> I don’t think having the status byte in vhost-user really gains us > >>>>>> much when > >>>>>> qemu could translate status byte changes to/from other vhost-user > >>>>>> commands. > >>>>>> > >>>>>> Hanna > >>>>> well it intercepts it but I think it could pass it on unchanged. > >>>>> > >>>>> > >>>>>>> I guess symmetry was the > >>>>>>> point. So I don't see why SET_STATUS 0 has to be ignored. > >>>>>>> > >>>>>>> > >>>>>>> SET_STATUS was introduced by: > >>>>>>> > >>>>>>> commit 923b8921d210763359e96246a58658ac0db6c645 > >>>>>>> Author: Yajun Wu <yaj...@nvidia.com> > >>>>>>> Date: Mon Oct 17 14:44:52 2022 +0800 > >>>>>>> > >>>>>>> vhost-user: Support vhost_dev_start > >>>>>>> > >>>>>>> CC the author. > >>>>>>> > >> _______________________________________________ > >> Virtio-fs mailing list > >> virtio...@redhat.com > >> https://listman.redhat.com/mailman/listinfo/virtio-fs > > > > > > -- > > German > > > -- German