Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection

Marc-André Lureau Tue, 29 Mar 2016 03:53:59 -0700

Hi

On Tue, Mar 29, 2016 at 10:10 AM, Yuanhan Liu
<yuanhan....@linux.intel.com> wrote:
> Backend crash may be not that normal in production usage, it is in
> development stage. It would be handy if we can handle that as well,
> as it's a painful experience for me that every time I do a VM2VM test
> and once my changes to backend causes a crash (unfortunately, it
> happens often in early stage), I have to reboot the two VMs.
>
> If we have reconnect, life could be easier. I then don't need worry
> that I will mess the backend and crash it any more.
>
> And again, the reason I mentioned crash here again is not because
> we need handle it, specially. Instead, I was thinking we might be
> able to handle the two reasons both.


I think crash could be handle with queue reset Michael proposed, but
that would probably require guest side changes.

>> Conceptually, I think if we allow the backend to disconnect, it makes
>> sense that qemu is actually the socket server. But it doesn't matter
>> much, it's simple to teach qemu to reconnect a timer...
>
> We already have that, right? I mean, the char-dev "reconnect" option.

Sure, what I mean is that it sounds cleaner from a design pov for qemu
to be the server (since it is the one actually waiting for backend in
this case), beside a timer is often a pretty rough solution to
anything.

>> So we should
>> probably allow both cases anyway.
>
> Yes, I think both should work. I may be wrong (that I may miss
> something), but it seems it's (far) easier to me to keep QEMU
> as the client, and adding the "reconnect" option, judging that
> we have all stuff to make it work ready. In this way,
>
> - if backend crashes/quits, you just need restart the backend,
>   and QEMU will retry the connection.
>
> - if QEMU crashes/quits, you just need restart the QEMU, and
>   then QEMU will start a fresh connection.

It may all depend on use cases, it's not more obvious or easy than the
other to me.

> However, if let QEMU as the server, there are 2 major works need
> to be done in the backend side (take DPDK as example, that just
> acts as vhost-user server only so far):
>
> - Introducing a new API or extending current API to let it to
>   connect the server socket from QEMU.
>
> - If QEMU crashes/quits, we need add code to backend to keep
>   reconnecting unless connection is established, which is something
>   similar to the "reconnect" stuff in QEMU.
> As you can see, it needs more effort (though it's not something
> you care :). And it has duplicate work.


Ah, I am looking at this from qemu angle, backend may need to adapt if
it doesn't already handle both "socket role" (client & server).

>
>>
>> > - start DPDK vhost-switch example
>> >
>> > - start QEMU, which will connect to DPDK vhost-user
>> >
>> >   link is good now.
>> >
>> > - kill DPDK vhost-switch
>> >
>> >   link is broken at this stage
>> >
>> > - start DPDK vhost-switch again
>> >
>> >   you will find that the link is back again.
>> >
>> >
>> > Will that makes sense to you? If so, we may need do nothing (or just
>> > very few) changes at all to DPDK to get the reconnect work.
>>
>> The main issue with handling crashes (gone at any time) is that the
>> backend my not have time to sync the used idx (at the least). It may
>> already have processed incoming packets, so on reconnect, it may
>> duplicate the receiving/dispatching work.
>
> That's not the case for DPDK backend implementation: incoming packets
> won't be delivered for processing before we update the used idx.

It could be ok on incoming packet side (vm->dpdk), but I don't see how
you could avoid packet loss on the other side (dpdk->vm), since the
packets must be added to queues before updating used idx.

>
>> Similarly, on the backend
>> receiving end, some packets may be lost, never received by the VM, and
>> later overwritten by the backend after reconnect (for the same used
>> idx update reason). This may not be a big deal for unreliable
>> protocols, but I am not familiar enough with network usage to know if
>> that's fine in all cases. It may be fine for some packets, such as
>> udp.
>>
>> However, in general, vhost-user should not be specific to network
>> transmission, and it would be nice to have a reliable way for the the
>> backend to reconnect. That's what I try to do in this series.
>
> Agreed.
>
>> I'll
>> repost it after I have done more testing.
>
> Great, and thanks for the work. I will also take some time to give
> closer review.

thanks


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection

Reply via email to