On 05/19/2014 06:06 PM, Michael S. Tsirkin wrote: > On Mon, May 19, 2014 at 05:34:38PM +0800, Jason Wang wrote: >> > On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote: >>> > > On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote: >>>> > >> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote: >>>>> > >>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote: >>>>>> > >>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: >>>>>>> > >>>>> Thanks, looks good. >>>>>>> > >>>>> Some minor comments below, >>>>>>> > >>>>> >>>>>>> > >>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: >>>>>>>> > >>>>>> It's hard to track all mac addresses and their configurations >>>>>>>> > >>>>>> (e.g >>>>>>>> > >>>>>> vlan or ipv6) in qemu. Without those informations, it's >>>>>>>> > >>>>>> impossible to >>>>>>> > >>>>> s/those informations/this information/ >>>>>>> > >>>>> >>>>>>>> > >>>>>> build proper garp packet after migration. The only possible >>>>>>>> > >>>>>> solution >>>>>>>> > >>>>>> to this is let guest (who knew all configurations) to do this. >>>>>>> > >>>>> s/knew/knows/ >>>>>>> > >>>>> >>>>>>>> > >>>>>> So, this patch introduces a new readonly config status bit of >>>>>>>> > >>>>>> virtio-net, >>>>>>>> > >>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to >>>>>>>> > >>>>>> announce >>>>>>>> > >>>>>> presence of its link through config update interrupt.When >>>>>>>> > >>>>>> guest has >>>>>>>> > >>>>>> done the announcement, it should ack the notification through >>>>>>>> > >>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated >>>>>>>> > >>>>>> by a new >>>>>>>> > >>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been >>>>>>>> > >>>>>> supported by >>>>>>>> > >>>>>> Linux guest). >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> During load, a counter of announcing rounds were set so that >>>>>>>> > >>>>>> the after >>>>>>> > >>>>> s/were/is/ >>>>>>> > >>>>> s/the after/after/ >>>>>> > >>>> Will correct those typos. >>>>>>>> > >>>>>> the vm is running it can trigger rounds of config interrupts >>>>>>>> > >>>>>> to notify >>>>>>>> > >>>>>> the guest to build and send the correct garps. >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> Tested with ping to guest with vlan during migration. Without >>>>>>>> > >>>>>> the >>>>>>>> > >>>>>> patch, lots of the packets were lost after migration. With >>>>>>>> > >>>>>> the patch, >>>>>>>> > >>>>>> could not notice packet loss after migration. >>>>>>> > >>>>> below changelog should go after ---, until the ack list. >>>>>>> > >>>>> >>>>>> > >>>> Ok. >>>>>>>> > >>>>>> Reference: >>>>>>>> > >>>>>> RFC v2: >>>>>>>> > >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html >>>>>>>> > >>>>>> RFC v1: >>>>>>>> > >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html >>>>>>>> > >>>>>> V7: >>>>>>>> > >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> Changes from RFC v2: >>>>>>>> > >>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME >>>>>>>> > >>>>>> - compat self announce for 2.0 machine type >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> Changes from RFC v1: >>>>>>>> > >>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset >>>>>>>> > >>>>>> - free announce timer during clean >>>>>>>> > >>>>>> - make announce work for non-vhost case >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> Changes from V7: >>>>>>>> > >>>>>> - Instead of introducing a global method for each kind of >>>>>>>> > >>>>>> nic, this >>>>>>>> > >>>>>> version limits the changes to virtio-net itself. >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> Cc: Liuyongan <liuyon...@huawei.com> >>>>>>>> > >>>>>> Cc: Amos Kong <ak...@redhat.com> >>>>>>>> > >>>>>> Signed-off-by: Jason Wang <jasow...@redhat.com> >>>>>>>> > >>>>>> --- >>>>>>>> > >>>>>> hw/net/virtio-net.c | 48 >>>>>>>> > >>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>> > >>>>>> include/hw/i386/pc.h | 5 ++++ >>>>>>>> > >>>>>> include/hw/virtio/virtio-net.h | 16 +++++++++++++ >>>>>>>> > >>>>>> 3 files changed, 69 insertions(+), 0 deletions(-) >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>>>> > >>>>>> index 940a7cf..98d59e9 100644 >>>>>>>> > >>>>>> --- a/hw/net/virtio-net.c >>>>>>>> > >>>>>> +++ b/hw/net/virtio-net.c >>>>>>>> > >>>>>> @@ -25,6 +25,7 @@ >>>>>>>> > >>>>>> #include "monitor/monitor.h" >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> #define VIRTIO_NET_VM_VERSION 11 >>>>>>>> > >>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS 3 >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> #define MAC_TABLE_ENTRIES 64 >>>>>>>> > >>>>>> #define MAX_VLAN (1 << 12) /* Per 802.1Q definition */ >>>>>>> > >>>>> I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS >>>>>>> > >>>>> in savevm.c >>>>>>> > >>>>> >>>>>>> > >>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to >>>>>>> > >>>>> include/migration/vmstate.h >>>>>>> > >>>>> and reuse it. >>>>>> > >>>> Ok. >>>>>>>> > >>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet >>>>>>>> > >>>>>> *n, uint8_t status) >>>>>>>> > >>>>>> (n->status & VIRTIO_NET_S_LINK_UP) && >>>>>>>> > >>>>>> vdev->vm_running; >>>>>>>> > >>>>>> } >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> +static void virtio_net_announce_timer(void *opaque) >>>>>>>> > >>>>>> +{ >>>>>>>> > >>>>>> + VirtIONet *n = opaque; >>>>>>>> > >>>>>> + VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>>>>>> > >>>>>> + >>>>>>>> > >>>>>> + if (n->announce && >>>>>>> > >>>>> I would make it > 0 here, just in case it becomes negative as a >>>>>>> > >>>>> result >>>>>>> > >>>>> of some bug. >>>>>> > >>>> Sure. >>>>>>>> > >>>>>> + virtio_net_started(n, vdev->status) && >>>>>>>> > >>>>>> + vdev->guest_features & (0x1 << >>>>>>>> > >>>>>> VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>>>>>> > >>>>>> + vdev->guest_features & (0x1 << >>>>>>>> > >>>>>> VIRTIO_NET_F_CTRL_VQ)) { >>>>>>>> > >>>>>> + >>>>>>>> > >>>>>> + n->announce--; >>>>>>>> > >>>>>> + n->status |= VIRTIO_NET_S_ANNOUNCE; >>>>>>>> > >>>>>> + virtio_notify_config(vdev); >>>>>>>> > >>>>>> + } else { >>>>>>>> > >>>>>> + timer_del(n->announce_timer); >>>>>>> > >>>>> why do this here? >>>>>>> > >>>>> >>>>>>>> > >>>>>> + n->announce = 0; >>>>>>> > >>>>> why is this here? >>>>>>> > >>>>> >>>>>> > >>>> If guest shuts down the device, there's no need to do the >>>>>> > >>>> announcing. >>>>> > >>> It's still weird. >>>>> > >>> Either announce is 0 and then timer was not running >>>>> > >>> or it's > 0 and then this won't trigger. >>>> > >> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another >>>> > >> question, we use QEMU_CLOCK_VIRTUAL while qemu is using >>>> > >> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure >>>> > >> whether this is safe. >>> > > Meaning QEMU_CLOCK_REALTIME that qemu uses? >>> > > Not sure either but it doesn't modify guest state so it seems safe. >>> > > >>> > > >>>> > >> And if the link was down, it's better for us to >>>> > >> stop the announcing? >>> > > I think that it doesn't matter: guest won't announce when >>> > > link is down, anyway. >>> > > Not worth it to write extra logic here in the host. >> > >> > Ok. >>>>>>>> > >>>>>> + } >>>>>>>> > >>>>>> +} >>>>>>>> > >>>>>> + >>>>>>>> > >>>>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t >>>>>>>> > >>>>>> status) >>>>>>>> > >>>>>> { >>>>>>>> > >>>>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>>>>>> > >>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct >>>>>>>> > >>>>>> VirtIODevice *vdev, uint8_t status) >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> virtio_net_vhost_status(n, status); >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> + virtio_net_announce_timer(n); >>>>>>>> > >>>>>> + >>>>>>> > >>>>> why do this here? why not right after we set announce counter? >>>>>> > >>>> The reasons are: >>>>>> > >>>> >>>>>> > >>>> - The counters were set in load, but the device is not running so >>>>>> > >>>> we >>>>>> > >>>> could not inject the interrupt at that time. >>>>> > >>> I see. This makes sense - but this isn't intuitive. >>>>> > >>> Why don't we simply start timer with current time? >>>>> > >>> Need to make sure it runs fine if time passes, but >>>>> > >>> I think it's fine. >>>> > >> Not sure I get the point, I didn't see any differences except for an >>>> > >> extra timer fire. >>> > > The only reason you call virtio_net_announce_timer from set_status >>> > > is because it gets run on vm start/stop. >>> > > It's true but not intuitive. >>> > > Just run timer always from timer, it's clearer this way :) >>> > > >> > >> > Sure. >>>>>> > >>>> - We can stop the progress when guest is shutting down the device. >>>>> > >>> On shut down guest will reset device stopping timer - this seems >>>>> > >>> enough. >>>> > >> Yes, I see. >>>>>>>> > >>>>>> for (i = 0; i < n->max_queues; i++) { >>>>>>>> > >>>>>> q = &n->vqs[i]; >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice >>>>>>>> > >>>>>> *vdev) >>>>>>>> > >>>>>> n->nobcast = 0; >>>>>>>> > >>>>>> /* multiqueue is disabled by default */ >>>>>>>> > >>>>>> n->curr_queues = 1; >>>>>>>> > >>>>>> + timer_del(n->announce_timer); >>>>>>>> > >>>>>> + n->announce = 0; >>>>>>>> > >>>>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> /* Flush any MAC and VLAN filter table state */ >>>>>>>> > >>>>>> n->mac_table.in_use = 0; >>>>>>>> > >>>>>> @@ -731,6 +756,22 @@ static int >>>>>>>> > >>>>>> virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd, >>>>>>>> > >>>>>> return VIRTIO_NET_OK; >>>>>>>> > >>>>>> } >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t >>>>>>>> > >>>>>> cmd, >>>>>>>> > >>>>>> + struct iovec *iov, >>>>>>>> > >>>>>> unsigned int iov_cnt) >>>>>>>> > >>>>>> +{ >>>>>>>> > >>>>>> + if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { >>>>>>>> > >>>>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>>>> > >>>>>> + if (n->announce) { >>>>>>> > >>>>> I would make it > 0 here, just in case it becomes negative as a >>>>>>> > >>>>> result >>>>>>> > >>>>> of some bug. >>>>>> > >>>> Ok. >>>>>>>> > >>>>>> + timer_mod(n->announce_timer, >>>>>>>> > >>>>>> + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) >>>>>>>> > >>>>>> + 50 + >>>>>>>> > >>>>>> + (VIRTIO_NET_ANNOUNCE_ROUNDS - >>>>>>>> > >>>>>> n->announce - 1) * 100); >>>>>>> > >>>>> savevm.c has this code, and a comment: >>>>>>> > >>>>> /* delay 50ms, 150ms, 250ms, ... */ >>>>>>> > >>>>> 50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100 >>>>>>> > >>>>> >>>>>>> > >>>>> how about an API in include/migration/vmstate.h >>>>>>> > >>>>> >>>>>>> > >>>>> static inline >>>>>>> > >>>>> int64_t self_announce_delay(int round) >>>>>>> > >>>>> { >>>>>>> > >>>>> assert(round < SELF_ANNOUNCE_ROUNDS && round > 0); >>>>>>> > >>>>> /* delay 50ms, 150ms, 250ms, ... */ >>>>>>> > >>>>> return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; >>>>>>> > >>>>> } >>>>>>> > >>>>> >>>>>>> > >>>>> or something to this end. >>>>>>> > >>>>> >>>>>> > >>>> Good idea, will do this. >>>>>>>> > >>>>>> + } >>>>>>>> > >>>>>> + return VIRTIO_NET_OK; >>>>>>>> > >>>>>> + } else { >>>>>>>> > >>>>>> + return VIRTIO_NET_ERR; >>>>>>>> > >>>>>> + } >>>>>>>> > >>>>>> +} >>>>>>>> > >>>>>> + >>>>>>>> > >>>>>> static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, >>>>>>>> > >>>>>> struct iovec *iov, unsigned >>>>>>>> > >>>>>> int iov_cnt) >>>>>>>> > >>>>>> { >>>>>>>> > >>>>>> @@ -794,6 +835,8 @@ static void >>>>>>>> > >>>>>> virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) >>>>>>>> > >>>>>> status = virtio_net_handle_mac(n, ctrl.cmd, iov, >>>>>>>> > >>>>>> iov_cnt); >>>>>>>> > >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { >>>>>>>> > >>>>>> status = virtio_net_handle_vlan_table(n, >>>>>>>> > >>>>>> ctrl.cmd, iov, iov_cnt); >>>>>>>> > >>>>>> + } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { >>>>>>>> > >>>>>> + status = virtio_net_handle_announce(n, ctrl.cmd, >>>>>>>> > >>>>>> iov, iov_cnt); >>>>>>>> > >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { >>>>>>>> > >>>>>> status = virtio_net_handle_mq(n, ctrl.cmd, iov, >>>>>>>> > >>>>>> iov_cnt); >>>>>>>> > >>>>>> } else if (ctrl.class == >>>>>>>> > >>>>>> VIRTIO_NET_CTRL_GUEST_OFFLOADS) { >>>>>>>> > >>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, >>>>>>>> > >>>>>> void *opaque, int version_id) >>>>>>>> > >>>>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; >>>>>>>> > >>>>>> } >>>>>>>> > >>>>>> >>>>>>>> > >>>>>> + n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS; >>>>>>> > >>>>> Well if virtio_net_handle_announce runs now it will start timer >>>>>>> > >>>>> in the past, right? >>>>>>> > >>>>> This seems wrong. >>>>>> > >>>> Not sure I get the case. When in virtio_net_load() the vm is not >>>>>> > >>>> even >>>>>> > >>>> running so looks like virtio_net_handle_announce() could not run >>>>>> > >>>> in the >>>>>> > >>>> same time. >>>>> > >>> I see, this works because you decrement it when VM starts running. >>>>> > >>> I think it's not a good idea to rely on this though, >>>>> > >>> better do everything from timer, and handle >>>>> > >>> the case of command arriving too early. >>>>> > >>> >>>> > >> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer. >>>> > >> >>>> > >> For the case of command arriving too early. Is there anything else we >>>> > >> need to do? Since we only start the next timer when we get ack >>>> > >> command. >>>> > >> >>>> > >> Thanks >>> > > I think we need to make sure we don't set the timer in the past or >>> > > very far in the future. >> > >> > n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing >> > timer_mod(), so it looks ok. > All I am saying, make sure it's within the range even if > handle announce is called before timer_mod. > Can not happen with your current patch, but must make sure > it's still correct after you make changes :) > Also, maybe add an assert there. >
I get your meaning. Will post V2. Thanks