On Tue, Dec 15, 2015 at 12:43 PM, Thibaut Collet <thibaut.collet at 6wind.com> wrote:
>
> On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu <peterx at redhat.com> wrote:
>
>> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
>> > To tell the truth, i don't know. I am also learning qemu internals on
>> > the fly. Indeed, i see that it should announce itself. But
>> > this brings up a question: why do we need special announce procedure in
>> > vhost-user then?
>>
>> I have the same question. Here is my guess...
>>
>> In customized networks, maybe people are not using ARP at all? When
>> we use DPDK, we directly pass through the network logic inside
>> kernel itself. So logically all the network protocols could be
>> customized by the user of it. In the customized network, maybe there
>> is some other protocol (rather than RARP) that would do the same
>> thing as what ARP/RARP does. So, this SEND_RARP request could give
>> the vhost-user backend a chance to format its own announce packet
>> and broadcast (in the SEND_RARP request, the guest's mac address
>> will be appended).
>>
>> CCing Victor to better know the truth...
>>
>> Peter
>>
>
> Hi,
>
> After a migration, to avoid network outage, the guest must announce its
> new location to the L2 layer, typically with a GARP. Otherwise requests
> sent to the guest arrive to the old host until an ARP request is sent (after
> 30 seconds) or the guest sends some data.
>
> QEMU implementation of self announce after a migration with a vhost
> backend is the following:
> - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest
> sends automatically a GARP.
> - Else if the vhost backend implements VHOST_USER_SEND_RARP this request
> is sent to the vhost backend. When this message is received the vhost
> backend must act as it receives a RARP from the guest (purpose of this RARP
> is to update switches' MAC->port mapping as a GARP). This RARP is a false
> one, created by the vhost backend,
> - Else nothing is done and we have a network outage until an ARP is sent
> or the guest sends some data.
>
> VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
> - the vhost backend announces the support of this feature. Maybe QEMU
> can be updated to support unconditionally this feature
> - the virtio driver of the guest implements this feature. It is not the
> case for old kernel or dpdk virtio pmd.
>
> Regarding dpdk to have a migration of vhost interface with limited network
> outage we have to:
>
> - Implement management VHOST_USER_SEND_RARP request to emulate a fake
> RARP for guest
>
> To do that we have to consider two kinds of guest:
> 1. Guest with virtio driver implementing VIRTIO_GUEST_ANNOUNCE feature
> 2. Guest with virtio driver that does not have the VIRTIO_GUEST_ANNOUNCE
> feature. This is the case with old kernel or guest running a dpdk (virtio
> pmd of dpdk does not have this feature)
>
> Guest with VIRTIO_GUEST_ANNOUNCE feature sends automatically some GARP
> after a migration if this feature has been negotiated. So the only thing to
> do it is to negotiate the VIRTIO_GUEST_ANNOUNCE feature between QEMU, vhost
> backend and the guest.
> For this kind of guest the vhost-backend must announce the support of
> VIRTIO_GUEST_ANNOUNCE feature. As vhost-backend has no particular action to
> do in this case the support of VIRTIO_GUEST_ANNOUNCE feature can be
> unconditionally set in QEMU in the future.
>
> For guest without VIRTIO_GUEST_ANNOUNCE feature we have to send a fake
> RARP: QEMU knows the MAC address of the guest and can create and broadcast
> a RARP. But in case of vhost-backend QEMU is not able to broadcast this
> fake RARP and must ask to the vhost backend to do it through the
> VHOST_USER_SEND_RARP request. When the vhost backend receives this message
> it must create a fake RARP message (as done by QEMU) and do the appropriate
> operation as this message has been sent by the guest through the virtio
> rings.
>
>
> To solve this point 2 solutions are implemented:
> - After the migration the guest automatically sends GARP.
> This solution occurs if VIRTIO_GUEST_ANNOUNCE feature has been negotiated
> between QEMU and the guest.
> * VIRTIO_GUEST_ANNOUNCE
>

Sorry, my previous message was sent by mistake (it was a draft still being
reworked). The full explanation is:

Hi,

After a migration, to avoid a network outage, the guest must announce its
new location to the L2 layer, typically with a GARP. Otherwise, requests
sent to the guest arrive at the old host until an ARP request is sent
(after 30 seconds) or the guest sends some data.

The QEMU implementation of self-announce after a migration with a vhost
backend is the following:
 - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated, the guest
   automatically sends a GARP.
 - Else, if the vhost backend implements VHOST_USER_SEND_RARP, this request
   is sent to the vhost backend. When this message is received, the vhost
   backend must act as if it had received a RARP from the guest (the
   purpose of this RARP is to update the switches' MAC->port mapping, as a
   GARP does). This RARP is a fake one, created by the vhost backend.
 - Else, nothing is done and we have a network outage until an ARP is sent
   or the guest sends some data.

The VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
 - the vhost backend announces support for this feature. Maybe QEMU can be
   updated to support this feature unconditionally.
 - the virtio driver of the guest implements this feature. This is not the
   case for old kernels or for the dpdk virtio pmd.

Regarding dpdk, to have a migration of a vhost interface with limited
network outage we have to:
 - In the vhost pmd
    * Announce support of the VIRTIO_GUEST_ANNOUNCE feature
    * Implement handling of the VHOST_USER_SEND_RARP request to emulate a
      fake RARP if the VIRTIO_GUEST_ANNOUNCE feature is not implemented by
      the guest
 - In the virtio pmd
    * Support the VIRTIO_GUEST_ANNOUNCE feature to avoid RARP emission by
      the host after a migration.

Hope this explanation helps.

Regards.

Thibaut.
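
PS: To make the fallback order above concrete, here is a minimal Python
sketch. It is only an illustration, not QEMU or DPDK code: the names
choose_announce and build_fake_rarp are hypothetical, and the frame layout
is my assumption of a plain RARP reverse request (EtherType 0x8035,
opcode 3) with zeroed IP fields, padded to the 60-byte minimum Ethernet
frame size.

```python
import struct
from enum import Enum

ETH_P_RARP = 0x8035          # EtherType for RARP
ARP_HTYPE_ETHERNET = 1       # hardware type: Ethernet
ARP_PTYPE_IPV4 = 0x0800      # protocol type: IPv4
ARP_OP_REVERSE_REQUEST = 3   # RARP "request reverse" opcode
ETH_MIN_FRAME = 60           # assumed minimum frame length (without FCS)


class Announce(Enum):
    GUEST_GARP = "guest sends a GARP itself"
    BACKEND_RARP = "vhost backend emits a fake RARP"
    NONE = "no announce; outage until an ARP or guest traffic"


def choose_announce(guest_announce_negotiated: bool,
                    backend_has_send_rarp: bool) -> Announce:
    """Hypothetical helper mirroring the three-way fallback in the mail."""
    if guest_announce_negotiated:
        return Announce.GUEST_GARP
    if backend_has_send_rarp:
        return Announce.BACKEND_RARP
    return Announce.NONE


def build_fake_rarp(mac: bytes) -> bytes:
    """Build a broadcast RARP frame for the guest MAC (assumed layout)."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Ethernet header: broadcast destination, guest MAC as source.
    eth = struct.pack("!6s6sH", b"\xff" * 6, mac, ETH_P_RARP)
    # RARP body: sender and target hardware address are both the guest
    # MAC; both IP addresses are left as 0.0.0.0.
    rarp = struct.pack("!HHBBH6s4s6s4s",
                       ARP_HTYPE_ETHERNET, ARP_PTYPE_IPV4, 6, 4,
                       ARP_OP_REVERSE_REQUEST,
                       mac, bytes(4), mac, bytes(4))
    # Pad with zeros up to the minimum Ethernet frame length.
    return (eth + rarp).ljust(ETH_MIN_FRAME, b"\x00")
```

A backend handling VHOST_USER_SEND_RARP would then inject such a frame as
if the guest had sent it through the virtio rings; how that injection is
done depends on the backend and is not shown here.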