On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote: > On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote: > > On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote: > >> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: > >>> Thanks, looks good. > >>> Some minor comments below, > >>> > >>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: > >>>> It's hard to track all mac addresses and their configurations (e.g > >>>> vlan or ipv6) in qemu. Without those informations, it's impossible to > >>> s/those informations/this information/ > >>> > >>>> build proper garp packet after migration. The only possible solution > >>>> to this is let guest (who knew all configurations) to do this. > >>> s/knew/knows/ > >>> > >>>> So, this patch introduces a new readonly config status bit of virtio-net, > >>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce > >>>> presence of its link through config update interrupt.When guest has > >>>> done the announcement, it should ack the notification through > >>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new > >>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by > >>>> Linux guest). > >>>> > >>>> During load, a counter of announcing rounds were set so that the after > >>> s/were/is/ > >>> s/the after/after/ > >> Will correct those typos. > >>>> the vm is running it can trigger rounds of config interrupts to notify > >>>> the guest to build and send the correct garps. > >>>> > >>>> Tested with ping to guest with vlan during migration. Without the > >>>> patch, lots of the packets were lost after migration. With the patch, > >>>> could not notice packet loss after migration. > >>> below changelog should go after ---, until the ack list. > >>> > >> Ok. > >>>> Reference: > >>>> RFC v2: > >>>> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html > >>>> RFC v1: > >>>> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html > >>>> V7: > >>>> https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html > >>>> > >>>> Changes from RFC v2: > >>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME > >>>> - compat self announce for 2.0 machine type > >>>> > >>>> Changes from RFC v1: > >>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset > >>>> - free announce timer during clean > >>>> - make announce work for non-vhost case > >>>> > >>>> Changes from V7: > >>>> - Instead of introducing a global method for each kind of nic, this > >>>> version limits the changes to virtio-net itself. > >>>> > >>>> Cc: Liuyongan <liuyon...@huawei.com> > >>>> Cc: Amos Kong <ak...@redhat.com> > >>>> Signed-off-by: Jason Wang <jasow...@redhat.com> > >>>> --- > >>>> hw/net/virtio-net.c | 48 > >>>> ++++++++++++++++++++++++++++++++++++++++ > >>>> include/hw/i386/pc.h | 5 ++++ > >>>> include/hw/virtio/virtio-net.h | 16 +++++++++++++ > >>>> 3 files changed, 69 insertions(+), 0 deletions(-) > >>>> > >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >>>> index 940a7cf..98d59e9 100644 > >>>> --- a/hw/net/virtio-net.c > >>>> +++ b/hw/net/virtio-net.c > >>>> @@ -25,6 +25,7 @@ > >>>> #include "monitor/monitor.h" > >>>> > >>>> #define VIRTIO_NET_VM_VERSION 11 > >>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS 3 > >>>> > >>>> #define MAC_TABLE_ENTRIES 64 > >>>> #define MAX_VLAN (1 << 12) /* Per 802.1Q definition */ > >>> I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS > >>> in savevm.c > >>> > >>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h > >>> and reuse it. > >> Ok. > >>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t > >>>> status) > >>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; > >>>> } > >>>> > >>>> +static void virtio_net_announce_timer(void *opaque) > >>>> +{ > >>>> + VirtIONet *n = opaque; > >>>> + VirtIODevice *vdev = VIRTIO_DEVICE(n); > >>>> + > >>>> + if (n->announce && > >>> I would make it > 0 here, just in case it becomes negative as a result > >>> of some bug. > >> Sure. > >>>> + virtio_net_started(n, vdev->status) && > >>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) && > >>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) { > >>>> + > >>>> + n->announce--; > >>>> + n->status |= VIRTIO_NET_S_ANNOUNCE; > >>>> + virtio_notify_config(vdev); > >>>> + } else { > >>>> + timer_del(n->announce_timer); > >>> why do this here? > >>> > >>>> + n->announce = 0; > >>> why is this here? > >>> > >> If guest shuts down the device, there's no need to do the announcing. > > It's still weird. > > Either announce is 0 and then timer was not running > > or it's > 0 and then this won't trigger. > > Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another > question, we use QEMU_CLOCK_VIRTUAL while qemu is using > QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure > whether this is safe.
Meaning QEMU_CLOCK_REALTIME that qemu uses? Not sure either but it doesn't modify guest state so it seems safe. > And if the link was down, it's better for us to > stop the announcing? I think that it doesn't matter: guest won't announce when link is down, anyway. Not worth it to write extra logic here in the host. > > > >>>> + } > >>>> +} > >>>> + > >>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) > >>>> { > >>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); > >>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct > >>>> VirtIODevice *vdev, uint8_t status) > >>>> > >>>> virtio_net_vhost_status(n, status); > >>>> > >>>> + virtio_net_announce_timer(n); > >>>> + > >>> why do this here? why not right after we set announce counter? > >> The reasons are: > >> > >> - The counters were set in load, but the device is not running so we > >> could not inject the interrupt at that time. > > I see. This makes sense - but this isn't intuitive. > > Why don't we simply start timer with current time? > > Need to make sure it runs fine if time passes, but > > I think it's fine. > > Not sure I get the point, I didn't see any differences except for an > extra timer fire. The only reason you call virtio_net_announce_timer from set_status is because it gets run on vm start/stop. It's true but not intuitive. Just run timer always from timer, it's clearer this way :) > > > >> - We can stop the progress when guest is shutting down the device. > > On shut down guest will reset device stopping timer - this seems enough. > > Yes, I see. > >>>> for (i = 0; i < n->max_queues; i++) { > >>>> q = &n->vqs[i]; > >>>> > >>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) > >>>> n->nobcast = 0; > >>>> /* multiqueue is disabled by default */ > >>>> n->curr_queues = 1; > >>>> + timer_del(n->announce_timer); > >>>> + n->announce = 0; > >>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; > >>>> > >>>> /* Flush any MAC and VLAN filter table state */ > >>>> n->mac_table.in_use = 0; > >>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet > >>>> *n, uint8_t cmd, > >>>> return VIRTIO_NET_OK; > >>>> } > >>>> > >>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd, > >>>> + struct iovec *iov, unsigned int > >>>> iov_cnt) > >>>> +{ > >>>> + if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { > >>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; > >>>> + if (n->announce) { > >>> I would make it > 0 here, just in case it becomes negative as a result > >>> of some bug. > >> Ok. > >>>> + timer_mod(n->announce_timer, > >>>> + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 + > >>>> + (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * > >>>> 100); > >>> savevm.c has this code, and a comment: > >>> /* delay 50ms, 150ms, 250ms, ... */ > >>> 50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100 > >>> > >>> how about an API in include/migration/vmstate.h > >>> > >>> static inline > >>> int64_t self_announce_delay(int round) > >>> { > >>> assert(round < SELF_ANNOUNCE_ROUNDS && round > 0); > >>> /* delay 50ms, 150ms, 250ms, ... */ > >>> return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; > >>> } > >>> > >>> or something to this end. > >>> > >> Good idea, will do this. > >>>> + } > >>>> + return VIRTIO_NET_OK; > >>>> + } else { > >>>> + return VIRTIO_NET_ERR; > >>>> + } > >>>> +} > >>>> + > >>>> static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, > >>>> struct iovec *iov, unsigned int iov_cnt) > >>>> { > >>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice > >>>> *vdev, VirtQueue *vq) > >>>> status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); > >>>> } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { > >>>> status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, > >>>> iov_cnt); > >>>> + } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { > >>>> + status = virtio_net_handle_announce(n, ctrl.cmd, iov, > >>>> iov_cnt); > >>>> } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { > >>>> status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt); > >>>> } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) { > >>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void > >>>> *opaque, int version_id) > >>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; > >>>> } > >>>> > >>>> + n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS; > >>> Well if virtio_net_handle_announce runs now it will start timer > >>> in the past, right? > >>> This seems wrong. > >> Not sure I get the case. When in virtio_net_load() the vm is not even > >> running so looks like virtio_net_handle_announce() could not run in the > >> same time. > > I see, this works because you decrement it when VM starts running. > > I think it's not a good idea to rely on this though, > > better do everything from timer, and handle > > the case of command arriving too early. > > > > Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer. > > For the case of command arriving too early. Is there anything else we > need to do? Since we only start the next timer when we get ack command. > > Thanks I think we need to make sure we don't set the timer in the past or very far in the future.