On 2014/8/18 20:27, Dr. David Alan Gilbert wrote:
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
For all NICs(except virtio-net) emulated by qemu,
Such as e1000, rtl8139, pcnet and ne2k_pci,
Qemu can still receive packets when VM is not running.
If this happened in *migration's* last PAUSE VM stage,
The new dirty RAM related to the packets will be missed,
And this will lead serious network fault in VM.
I'd like to understand more about when exactly this happens.
migration.c:migration_thread in the last step, takes the iothread, puts
the runstate into RUN_STATE_FINISH_MIGRATE and only then calls
qemu_savevm_state_complete
before unlocking the iothread.
Hmm, Sorry, the description above may not be exact.
Actually, the action of receiving packets action happens after the
migration thread unlock the iothread-lock(when VM is stopped), but
before the end of the migration(to be exact, before close the socket
fd, which is used to send data to destination).
I think before close the socket fd, we can not assure all the RAM of the
VM has been copied to the memory of network card or has been send to
the destination, we still have the chance to modify its content. (If i
am wrong, please let me know, Thanks. ;) )
If the above situation happens, it will dirty parts of the RAM which
will be send and parts of new dirty RAM page may be missed. As a result,
The contents of memory in the destination is not integral, And this
will lead network fault in VM.
Here i have made a test:
(1) Download the new qemu.
(2) Extend the time window between qemu_savevm_state_complete and
migrate_fd_cleanup(where qemu_fclose(s->file) will be called), patch
like(this will extend the chances of receiving packets between the time
window):
diff --git a/migration.c b/migration.c
index 8d675b3..597abf9 100644
--- a/migration.c
+++ b/migration.c
@@ -614,7 +614,7 @@ static void *migration_thread(void *opaque)
qemu_savevm_state_complete(s->file);
}
qemu_mutex_unlock_iothread();
-
+ sleep(2);/*extend the time window between stop vm and
end migration*/
if (ret < 0) {
migrate_set_state(s, MIG_STATE_ACTIVE,
MIG_STATE_ERROR);
break;
(3) Start VM (suse11sp1) and in VM ping xxx -i 0.1
(4) Migrate the VM to another host.
And the *PING* command in VM will very likely fail with message:
'Destination HOST Unreachable', the NIC in VM will stay unavailable
unless you run 'service network restart'.(without step 2, you may have
to migration lots of times between two hosts before that happens).
Thanks,
zhanghailiang
If that's true, how does this problem happen - can the net code
still receive packets with the iothread lock taken?
qemu_savevm_state_complete does a call to the RAM code, so I think this should
get
any RAM changes that happened before the lock was taken.
I'm gently worried about threading stuff as well - is all this net code
running off fd handlers on the main thread or are there separate threads?
Dave
To avoid this, we forbid receiving packets in generic net code when
VM is not running. Also, when the runstate changes back to running,
we definitely need to flush queues to get packets flowing again.
Here we implement this in the net layer:
(1) Judge the vm runstate in qemu_can_send_packet
(2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState,
Which will listen for VM runstate changes.
(3) Register a handler function for VMstate change.
When vm changes back to running, we flush all queues in the callback function.
(4) Remove checking vm state in virtio_net_can_receive
Signed-off-by: zhanghailiang<zhang.zhanghaili...@huawei.com>
---
hw/net/virtio-net.c | 4 ----
include/net/net.h | 2 ++
net/net.c | 32 ++++++++++++++++++++++++++++++++
3 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 268eff9..287d762 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -839,10 +839,6 @@ static int virtio_net_can_receive(NetClientState *nc)
VirtIODevice *vdev = VIRTIO_DEVICE(n);
VirtIONetQueue *q = virtio_net_get_subqueue(nc);
- if (!vdev->vm_running) {
- return 0;
- }
-
if (nc->queue_index>= n->curr_queues) {
return 0;
}
diff --git a/include/net/net.h b/include/net/net.h
index ed594f9..a294277 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -8,6 +8,7 @@
#include "net/queue.h"
#include "migration/vmstate.h"
#include "qapi-types.h"
+#include "sysemu/sysemu.h"
#define MAX_QUEUE_NUM 1024
@@ -96,6 +97,7 @@ typedef struct NICState {
NICConf *conf;
void *opaque;
bool peer_deleted;
+ VMChangeStateEntry *vmstate;
} NICState;
NetClientState *qemu_find_netdev(const char *id);
diff --git a/net/net.c b/net/net.c
index 6d930ea..21f0d48 100644
--- a/net/net.c
+++ b/net/net.c
@@ -242,6 +242,29 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
return nc;
}
+static void nic_vmstate_change_handler(void *opaque,
+ int running,
+ RunState state)
+{
+ NICState *nic = opaque;
+ NetClientState *nc;
+ int i, queues;
+
+ if (!running) {
+ return;
+ }
+
+ queues = MAX(1, nic->conf->peers.queues);
+ for (i = 0; i< queues; i++) {
+ nc =&nic->ncs[i];
+ if (nc->receive_disabled
+ || (nc->info->can_receive&& !nc->info->can_receive(nc))) {
+ continue;
+ }
+ qemu_flush_queued_packets(nc);
+ }
+}
+
NICState *qemu_new_nic(NetClientInfo *info,
NICConf *conf,
const char *model,
@@ -259,6 +282,8 @@ NICState *qemu_new_nic(NetClientInfo *info,
nic->ncs = (void *)nic + info->size;
nic->conf = conf;
nic->opaque = opaque;
+ nic->vmstate = qemu_add_vm_change_state_handler(nic_vmstate_change_handler,
+ nic);
for (i = 0; i< queues; i++) {
qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name,
@@ -379,6 +404,7 @@ void qemu_del_nic(NICState *nic)
qemu_free_net_client(nc);
}
+ qemu_del_vm_change_state_handler(nic->vmstate);
g_free(nic);
}
@@ -452,6 +478,12 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
int qemu_can_send_packet(NetClientState *sender)
{
+ int vmstat = runstate_is_running();
+
+ if (!vmstat) {
+ return 0;
+ }
+
if (!sender->peer) {
return 1;
}
--
1.7.12.4
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
.