On 2016/03/28 10:53, Tetsuya Mukawa wrote:
> On 2016/03/26 3:00, Marc-André Lureau wrote:
>> Hi
>>
>> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu
>> <yuanhan....@linux.intel.com> wrote:
>>>>> The following series starts from the idea that the slave can request a
>>>>> "managed" shutdown instead and later recover (I guess the use case for
>>>>> this is to allow for example to update static dispatching/filter rules
>>>>> etc)
>>> What if the backend crashes, that no such request will be sent? And
>>> I'm wondering why this request is needed, as we are able to detect
>>> the disconnect now (with your patches).
>> I don't think trying to handle backend crashes is really a thing we
>> need to take care of. If the backend is bad enough to crash, it may as
>> well corrupt the guest memory (mst: my understanding of vhost-user is
>> that backend must be trusted, or it could just throw garbage in the
>> queue descriptors with surprising consequences or elsewhere in the
>> guest memory actually, right?).
>>
>>> BTW, you meant to let QEMU as the server and the backend as the client
>>> here, right? Honestly, that's what we've thought of, too, in the first
>>> time.
>>> However, I'm wondering could we still go with the QEMU as the client
>>> and the backend as the server (the default and the only way DPDK
>>> supports), and let QEMU to try to reconnect when the backend crashes
>>> and restarts. In such case, we need enable the "reconnect" option
>>> for vhost-user, and once I have done that, it basically works in my
>>> test:
>>>
>> Conceptually, I think if we allow the backend to disconnect, it makes
>> sense that qemu is actually the socket server. But it doesn't matter
>> much, it's simple to teach qemu to reconnect a timer... So we should
>> probably allow both cases anyway.
>>
>>> - start DPDK vhost-switch example
>>>
>>> - start QEMU, which will connect to DPDK vhost-user
>>>
>>> link is good now.
>>>
>>> - kill DPDK vhost-switch
>>>
>>> link is broken at this stage
>>>
>>> - start DPDK vhost-switch again
>>>
>>> you will find that the link is back again.
>>>
>>>
>>> Will that makes sense to you? If so, we may need do nothing (or just
>>> very few) changes at all to DPDK to get the reconnect work.
>> The main issue with handling crashes (gone at any time) is that the
>> backend my not have time to sync the used idx (at the least). It may
>> already have processed incoming packets, so on reconnect, it may
>> duplicate the receiving/dispatching work. Similarly, on the backend
>> receiving end, some packets may be lost, never received by the VM, and
>> later overwritten by the backend after reconnect (for the same used
>> idx update reason). This may not be a big deal for unreliable
>> protocols, but I am not familiar enough with network usage to know if
>> that's fine in all cases. It may be fine for some packets, such as
>> udp.
>>
>> However, in general, vhost-user should not be specific to network
>> transmission, and it would be nice to have a reliable way for the the
>> backend to reconnect. That's what I try to do in this series. I'll
>> repost it after I have done more testing.
>>
>> thanks
>>
> Hi Yuanhan,
>
> Probably, we have 2 options here.
> One is using DEVICE_NEEDS_RESET, or adding one more new status like
> QUEUE_NEEDS_RESET to virtio specification.
> In this case, we will need to fix virtio-net drivers and virtio-net
> device of QEMU, so it might need to fix a lot of code, but we can handle
> unexpected shutdown of vhost-user backend.
> The other option is Marc's simple solution. In this case, we don't need
> to change virtio-net drivers, but we cannot handle unexpected shutdown.

Let me add a bit.
Actually, we can use both options at the same time.
For example, we could use the DEVICE_NEEDS_RESET status only when the
vhost-user backend closes unexpectedly.
So it is probably best to start by merging Marc's patches.
Anyway, if we want to handle unexpected shutdown properly, we will
probably need some kind of DEVICE_NEEDS_RESET status.

Tetsuya

> Thanks,
> Tetsuya
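
To make the combined approach a bit more concrete, here is a rough sketch of the decision it implies on the QEMU side. It is only an illustration, not code from Marc's series or from QEMU: the function name and the `managed_shutdown` flag are hypothetical, and the only value taken from the virtio specification is the DEVICE_NEEDS_RESET status bit (64).

```python
# DEVICE_NEEDS_RESET device status bit as defined by the virtio specification.
DEVICE_NEEDS_RESET = 0x40


def status_after_disconnect(status: int, managed_shutdown: bool) -> int:
    """Return the device status to expose to the guest once the backend is gone.

    `managed_shutdown` is a hypothetical flag: True when the backend asked for
    a shutdown before closing the socket (the case Marc's patches cover).
    """
    if managed_shutdown:
        # The backend had the chance to sync its ring state before leaving,
        # so the guest driver does not need to be involved.
        return status
    # Unexpected close: the used idx may be stale, so ask the driver to reset.
    return status | DEVICE_NEEDS_RESET
```

With this split, existing virtio-net drivers keep working for the managed case, and only the unexpected-close path depends on driver support for DEVICE_NEEDS_RESET.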