On 2015/06/23 18:41, Michael S. Tsirkin wrote:
> On Tue, Jun 23, 2015 at 05:31:06PM +0900, Tetsuya Mukawa wrote:
>> On 2015/06/22 17:14, Michael S. Tsirkin wrote:
>>> On Mon, Jun 22, 2015 at 12:50:43PM +0900, Tetsuya Mukawa wrote:
>>>> Hi guys,
>>>>
>>>> Here are patches to add feature to start QEMU without vhost-user backend.
>>>> Currently, if we want to use vhost-user backend, the backend must start
>>>> before QEMU.
>>> Aren't you looking for something like xinetd?
>> It will help for restricting starting order, but not help for
>> reconnection of vhost-user backend.
> At this point, I suggest that we always connect at vm start.
> With that, you can add an option to reset the VM
> on backend disconnect.
> So
> - detect backend disconnect
> - reset and stop (notifying management)
> - reconnect or detect backend reconnect
> - proceed with boot
>
> As I tried to explain below, getting the full functionality
> will require guest driver changes. They aren't hard to get
> done, patches welcome.
>
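For context on the question just below: by DEVICE_NEEDS_RESET I mean the
virtio 1.0 device status bit (VIRTIO_CONFIG_S_NEEDS_RESET in
linux/virtio_config.h). Here is a minimal, self-contained sketch of the
guest-side check I have in mind; the helper is my own placeholder, not code
from any real driver.

/* Sketch only: VIRTIO_CONFIG_S_NEEDS_RESET (0x40) is the real status bit
 * from linux/virtio_config.h, everything else here is a placeholder. */
#include <stdio.h>

#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40

/* A guest driver could test this bit when it gets a config interrupt. */
static int device_needs_reset(unsigned char status)
{
    return (status & VIRTIO_CONFIG_S_NEEDS_RESET) != 0;
}

int main(void)
{
    /* Pretend the hypervisor set the bit after a backend disconnect. */
    unsigned char status = VIRTIO_CONFIG_S_NEEDS_RESET;

    if (device_needs_reset(status))
        printf("guest driver should re-initialize the device\n");
    return 0;
}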
Could you please let me know your thinking about using DEVICE_NEEDS_RESET
for vhost-user reconnection? If it works, I will try to submit it.

>
>>>> Also, if QEMU or the backend is closed unexpectedly, there is no way to
>>>> recover without restarting both applications.
>>> This was previously discussed:
>>> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00585.html
>>> It doesn't look like any of the issues have been addressed.
>>>
>>> In my opinion, the right way to handle this requires
>>> guest cooperation. For example, we can extend virtio 1 to allow a
>>> single queue to reset.
>>>
>>> We then can do something like the following:
>>> - hypervisor detects queue backend disconnect.
>>> - hypervisor sets flag and notifies guest
>>> - guest resets queue, optionally discarding queued packets
>>> - hypervisor detects (or requests if it's a client), backend reconnect
>>> - hypervisor detects guest queue reset
>>> - hypervisor notifies guests about backend reconnect
>>> - hypervisor sends hypervisor device state (e.g. queue addresses and
>>> indices) to backend
>>>
>>> Reconnect isn't robust without such an extension.
>> I appreciate your all comments. It's truly helpful.
>> Do you have any plans to support such queue reset features in virtio spec?
> Maybe, but I don't have the time to work on this :(
> You are welcome to contribute this.

To do that, I need to persuade my boss to let me join OASIS :'(
That probably won't be easy, so I will just wait if we need to change the
spec itself.

>> Also, I haven't known following behavior.
>>
>>> some guests crash if we never complete handling buffers,
>>> or networking breaks, etc ...
>> Does it mean that device needs to return buffers through used ring as
>> soon as possible?
>> If so, when backend switch is under heavy load, some guests could be
>> crashed?
> Yes. Some guests crash when disk times out, too.

That's bad news. I hadn't considered that case while writing the patches.

>>>> Practically, it's not useful.
>>> One thing we do seem to lack is specifying an action
>>> taken on disconnect. It could be an action similar to r/werror.
>>> One extra useful option could be to exit qemu.
>> My implementation treats disconnection as link down.
>> And when backend is connected, link status will be changed.
>> So here is behavior of my patch.
>>
>> - hypervisor detects queue backend disconnect.
>> - hypervisor sets link status down and notifies guest
>> - optionally, guest stops sending and receiving packets
>> => As you mentioned above, this step may crash some guests.
> It's not just that.
> Under the virtio protocol, you can not, just by looking at the ring,
> detect which buffers have been used.
> Imagine:
>
> - guest adds buffer A
> - guest adds buffer B * 10000
> - backend uses buffer B * 10000
>
> buffer A is nowhere in the available ring since it has
> since wrapped around many times.
> So backend that crashes and re-connects simply can not recover
> and complete buffer A.
>

Let me make sure I understand the issue correctly. I guess one cause of the
above issue is that the guest does not check the used ring before refilling
the avail ring, i.e. it does not confirm that the buffers it is about to
overwrite have already been returned through the used ring. If the guest did
check the used ring, performance would probably be poor, but no buffers would
be lost. I also guess the issue is more likely to happen when the backend is
too slow, because the guest keeps overwriting the avail ring without checking.
Is this correct?
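To check my own understanding with numbers, here is a small standalone
simulation of the wrap-around described above. The queue size of 256 is just
an assumption for the example; the point is only that buffer A's entry in the
avail ring gets overwritten, so a reconnecting backend cannot find it by
scanning the ring.

#include <stdio.h>
#include <stdint.h>

#define QUEUE_SIZE 256   /* assumed queue size (power of two) */

int main(void)
{
    uint16_t avail_ring[QUEUE_SIZE];
    uint16_t avail_idx = 0;

    /* guest adds buffer A (descriptor head 0); it is never used */
    avail_ring[avail_idx % QUEUE_SIZE] = 0;
    avail_idx++;

    /* guest adds buffer B 10000 times; the backend uses each one right
     * away, so the guest keeps recycling descriptor head 1 */
    for (int i = 0; i < 10000; i++) {
        avail_ring[avail_idx % QUEUE_SIZE] = 1;
        avail_idx++;
    }

    /* the slot that once recorded buffer A now holds something else */
    printf("avail_idx = %u, ring wrapped %u times\n",
           (unsigned)avail_idx, (unsigned)(avail_idx / QUEUE_SIZE));
    printf("slot 0 now holds descriptor head %u, buffer A's entry is gone\n",
           (unsigned)avail_ring[0]);
    return 0;
}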
>
>
>
>> - hypervisor detects (or requests if it's a client), backend reconnect
>> - hypervisor make sure backend supports features that the former backend
>> supports.
>> - hypervisor sends hypervisor device state (e.g. queue addresses and
>> indices) to backend
>> - backend reads virtqueues, and recover correct indices.
>> => Some backends needs to have this feature to reconnect correctly.
>> - hypervisor notifies guests about backend reconnect as link status up
>>
>>>> This patch series adds following features.
>>>> - QEMU can start before the backend.
>>> I don't see the point of this one.
>> Sorry for my bad English. Here is correct.
>> - Guest can start before the backend.
>>
>> Currently, guest will not start without vhost-user backend application
>> when vhost-user is specified.
> E.g. with control VQ (which vhost user doesn't handle, but really
> should, and I think it will in the future), guest will loop waiting for
> the command to complete.

I agree. I guess the control queue should work fine even when the link
status is down, but my patches don't handle that.

>>>> - QEMU or the backend can restart anytime.
>>>> connectivity will be recovered automatically, when app starts again.
>>>> (if QEMU is server, QEMU just wait reconnection)
>>>> while lost connection, link status of virtio-net device is down,
>>>> so virtio-net driver on the guest can know it
>>>>
>>>> To work like above, the patch introduces flags to specify features
>>>> vhost-user backend will support.
>>>>
>>>> Here are examples.
>>>> ('backend_features' is the new flags. Each bit of the flag represents
>>>> VIRTIO_NET_F_* in linux/virtio_net.h)
>>>>
>>>> * QEMU is configured as vhost-user client.
>>>> -chardev socket,id=chr0,path=/tmp/sock,reconnect=3 \
>>>> -netdev vhost-user,id=net0,chardev=chr0,vhostforce,backend-features=0x68000 \
>>>> -device virtio-net-pci,netdev=net0 \
>>>>
>>>> * QEMU is configured as vhost-user server.
>>>> -chardev socket,id=chr0,path=/tmp/sock,server,nowait \
>>>> -netdev vhost-user,id=net0,chardev=chr0,vhostforce,backend-features=0x68000 \
>>>> -device virtio-net-pci,netdev=net0 \
>>> This relies on management knowing the feature bitmap encoding of the
>>> backend, not sure how practical that is.
>> If we can assume which vhost-user backend application is used, we can
>> specify backend-features value like above.
>> Probably such an assumption is not so strange in practice.
>> And above option allow us to start VM before starting backend application.
>>
>> I guess it's useful, but if you don't think, I could remove this feature.
> To make it useful, management needs a way to find out what are the
> values. We don't want it to deal with the virtio spec, do we?

I agree.

>
>> In the case, still VM needs to start before backend.
>> And when backend is connected, virtio-net device will keep backend
>> features for future reconnection.
>> (When backend reconnects again, device will make sure that backend
>> supports features that the former backend supports.)
>>
>> Regards,
>> Tetsuya
> I think that
> - reconnect requires guest support. let's add this to virtio
> - when not connected, the cleanest thing is to block VM
>   options to exit and reset might also make sense
> - if you want to start guest before backend, I think
>   we are looking at a spec extension, to tell guest
>   not to access device since backend is not ready.
>   we also need to solve two problems
>   - how does management find out features supported by qemu?
>   - how does management find out features supported by backend?
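As an aside, to make the backend-features=0x68000 example above easier to
read: if I decode it with the bit positions from linux/virtio_net.h
(MRG_RXBUF = 15, CTRL_VQ = 17, CTRL_RX = 18, assuming I read the header
correctly), only those three feature bits are set.

#include <stdio.h>

/* Bit positions as I read them in linux/virtio_net.h. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_NET_F_CTRL_VQ   17
#define VIRTIO_NET_F_CTRL_RX   18

int main(void)
{
    unsigned long features = 0x68000;  /* value from the example above */

    printf("MRG_RXBUF: %lu\n", (features >> VIRTIO_NET_F_MRG_RXBUF) & 1);
    printf("CTRL_VQ:   %lu\n", (features >> VIRTIO_NET_F_CTRL_VQ) & 1);
    printf("CTRL_RX:   %lu\n", (features >> VIRTIO_NET_F_CTRL_RX) & 1);
    return 0;
}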
Thanks for clarifying. I understand the correct steps.

Regards,
Tetsuya