On Mon, May 19, 2014 at 05:34:38PM +0800, Jason Wang wrote: > On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote: > > On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote: > >> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote: > >>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote: > >>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: > >>>>> Thanks, looks good. > >>>>> Some minor comments below, > >>>>> > >>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: > >>>>>> It's hard to track all mac addresses and their configurations (e.g > >>>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to > >>>>> s/those informations/this information/ > >>>>> > >>>>>> build proper garp packet after migration. The only possible solution > >>>>>> to this is let guest (who knew all configurations) to do this. > >>>>> s/knew/knows/ > >>>>> > >>>>>> So, this patch introduces a new readonly config status bit of > >>>>>> virtio-net, > >>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce > >>>>>> presence of its link through config update interrupt.When guest has > >>>>>> done the announcement, it should ack the notification through > >>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new > >>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by > >>>>>> Linux guest). > >>>>>> > >>>>>> During load, a counter of announcing rounds were set so that the after > >>>>> s/were/is/ > >>>>> s/the after/after/ > >>>> Will correct those typos. > >>>>>> the vm is running it can trigger rounds of config interrupts to notify > >>>>>> the guest to build and send the correct garps. > >>>>>> > >>>>>> Tested with ping to guest with vlan during migration. Without the > >>>>>> patch, lots of the packets were lost after migration. With the patch, > >>>>>> could not notice packet loss after migration. > >>>>> below changelog should go after ---, until the ack list. > >>>>> > >>>> Ok. > >>>>>> Reference: > >>>>>> RFC v2: > >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html > >>>>>> RFC v1: > >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html > >>>>>> V7: > >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html > >>>>>> > >>>>>> Changes from RFC v2: > >>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME > >>>>>> - compat self announce for 2.0 machine type > >>>>>> > >>>>>> Changes from RFC v1: > >>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset > >>>>>> - free announce timer during clean > >>>>>> - make announce work for non-vhost case > >>>>>> > >>>>>> Changes from V7: > >>>>>> - Instead of introducing a global method for each kind of nic, this > >>>>>> version limits the changes to virtio-net itself. > >>>>>> > >>>>>> Cc: Liuyongan <liuyon...@huawei.com> > >>>>>> Cc: Amos Kong <ak...@redhat.com> > >>>>>> Signed-off-by: Jason Wang <jasow...@redhat.com> > >>>>>> --- > >>>>>> hw/net/virtio-net.c | 48 > >>>>>> ++++++++++++++++++++++++++++++++++++++++ > >>>>>> include/hw/i386/pc.h | 5 ++++ > >>>>>> include/hw/virtio/virtio-net.h | 16 +++++++++++++ > >>>>>> 3 files changed, 69 insertions(+), 0 deletions(-) > >>>>>> > >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >>>>>> index 940a7cf..98d59e9 100644 > >>>>>> --- a/hw/net/virtio-net.c > >>>>>> +++ b/hw/net/virtio-net.c > >>>>>> @@ -25,6 +25,7 @@ > >>>>>> #include "monitor/monitor.h" > >>>>>> > >>>>>> #define VIRTIO_NET_VM_VERSION 11 > >>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS 3 > >>>>>> > >>>>>> #define MAC_TABLE_ENTRIES 64 > >>>>>> #define MAX_VLAN (1 << 12) /* Per 802.1Q definition */ > >>>>> I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS > >>>>> in savevm.c > >>>>> > >>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h > >>>>> and reuse it. > >>>> Ok. > >>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, > >>>>>> uint8_t status) > >>>>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; > >>>>>> } > >>>>>> > >>>>>> +static void virtio_net_announce_timer(void *opaque) > >>>>>> +{ > >>>>>> + VirtIONet *n = opaque; > >>>>>> + VirtIODevice *vdev = VIRTIO_DEVICE(n); > >>>>>> + > >>>>>> + if (n->announce && > >>>>> I would make it > 0 here, just in case it becomes negative as a result > >>>>> of some bug. > >>>> Sure. > >>>>>> + virtio_net_started(n, vdev->status) && > >>>>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) && > >>>>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) { > >>>>>> + > >>>>>> + n->announce--; > >>>>>> + n->status |= VIRTIO_NET_S_ANNOUNCE; > >>>>>> + virtio_notify_config(vdev); > >>>>>> + } else { > >>>>>> + timer_del(n->announce_timer); > >>>>> why do this here? > >>>>> > >>>>>> + n->announce = 0; > >>>>> why is this here? > >>>>> > >>>> If guest shuts down the device, there's no need to do the announcing. > >>> It's still weird. > >>> Either announce is 0 and then timer was not running > >>> or it's > 0 and then this won't trigger. > >> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another > >> question, we use QEMU_CLOCK_VIRTUAL while qemu is using > >> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure > >> whether this is safe. > > Meaning QEMU_CLOCK_REALTIME that qemu uses? > > Not sure either but it doesn't modify guest state so it seems safe. > > > > > >> And if the link was down, it's better for us to > >> stop the announcing? > > I think that it doesn't matter: guest won't announce when > > link is down, anyway. > > Not worth it to write extra logic here in the host. > > Ok. > >>>>>> + } > >>>>>> +} > >>>>>> + > >>>>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) > >>>>>> { > >>>>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); > >>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct > >>>>>> VirtIODevice *vdev, uint8_t status) > >>>>>> > >>>>>> virtio_net_vhost_status(n, status); > >>>>>> > >>>>>> + virtio_net_announce_timer(n); > >>>>>> + > >>>>> why do this here? why not right after we set announce counter? > >>>> The reasons are: > >>>> > >>>> - The counters were set in load, but the device is not running so we > >>>> could not inject the interrupt at that time. > >>> I see. This makes sense - but this isn't intuitive. > >>> Why don't we simply start timer with current time? > >>> Need to make sure it runs fine if time passes, but > >>> I think it's fine. > >> Not sure I get the point, I didn't see any differences except for an > >> extra timer fire. > > The only reason you call virtio_net_announce_timer from set_status > > is because it gets run on vm start/stop. > > It's true but not intuitive. > > Just run timer always from timer, it's clearer this way :) > > > > Sure. > >>>> - We can stop the progress when guest is shutting down the device. > >>> On shut down guest will reset device stopping timer - this seems enough. > >> Yes, I see. > >>>>>> for (i = 0; i < n->max_queues; i++) { > >>>>>> q = &n->vqs[i]; > >>>>>> > >>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) > >>>>>> n->nobcast = 0; > >>>>>> /* multiqueue is disabled by default */ > >>>>>> n->curr_queues = 1; > >>>>>> + timer_del(n->announce_timer); > >>>>>> + n->announce = 0; > >>>>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; > >>>>>> > >>>>>> /* Flush any MAC and VLAN filter table state */ > >>>>>> n->mac_table.in_use = 0; > >>>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet > >>>>>> *n, uint8_t cmd, > >>>>>> return VIRTIO_NET_OK; > >>>>>> } > >>>>>> > >>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd, > >>>>>> + struct iovec *iov, unsigned int > >>>>>> iov_cnt) > >>>>>> +{ > >>>>>> + if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { > >>>>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; > >>>>>> + if (n->announce) { > >>>>> I would make it > 0 here, just in case it becomes negative as a result > >>>>> of some bug. > >>>> Ok. > >>>>>> + timer_mod(n->announce_timer, > >>>>>> + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 + > >>>>>> + (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) > >>>>>> * 100); > >>>>> savevm.c has this code, and a comment: > >>>>> /* delay 50ms, 150ms, 250ms, ... */ > >>>>> 50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100 > >>>>> > >>>>> how about an API in include/migration/vmstate.h > >>>>> > >>>>> static inline > >>>>> int64_t self_announce_delay(int round) > >>>>> { > >>>>> assert(round < SELF_ANNOUNCE_ROUNDS && round > 0); > >>>>> /* delay 50ms, 150ms, 250ms, ... */ > >>>>> return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; > >>>>> } > >>>>> > >>>>> or something to this end. > >>>>> > >>>> Good idea, will do this. > >>>>>> + } > >>>>>> + return VIRTIO_NET_OK; > >>>>>> + } else { > >>>>>> + return VIRTIO_NET_ERR; > >>>>>> + } > >>>>>> +} > >>>>>> + > >>>>>> static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, > >>>>>> struct iovec *iov, unsigned int > >>>>>> iov_cnt) > >>>>>> { > >>>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice > >>>>>> *vdev, VirtQueue *vq) > >>>>>> status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); > >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { > >>>>>> status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, > >>>>>> iov_cnt); > >>>>>> + } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { > >>>>>> + status = virtio_net_handle_announce(n, ctrl.cmd, iov, > >>>>>> iov_cnt); > >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { > >>>>>> status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt); > >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) { > >>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void > >>>>>> *opaque, int version_id) > >>>>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; > >>>>>> } > >>>>>> > >>>>>> + n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS; > >>>>> Well if virtio_net_handle_announce runs now it will start timer > >>>>> in the past, right? > >>>>> This seems wrong. > >>>> Not sure I get the case. When in virtio_net_load() the vm is not even > >>>> running so looks like virtio_net_handle_announce() could not run in the > >>>> same time. > >>> I see, this works because you decrement it when VM starts running. > >>> I think it's not a good idea to rely on this though, > >>> better do everything from timer, and handle > >>> the case of command arriving too early. > >>> > >> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer. > >> > >> For the case of command arriving too early. Is there anything else we > >> need to do? Since we only start the next timer when we get ack command. > >> > >> Thanks > > I think we need to make sure we don't set the timer in the past or > > very far in the future. > > n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing > timer_mod(), so it looks ok.
All I am saying, make sure it's within the range even if handle announce is called before timer_mod. Can not happen with your current patch, but must make sure it's still correct after you make changes :) Also, maybe add an assert there. > There's another case that if vm was stopped > for a long time, the timer will also be delayed, but it's still safe in > this case.