On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
> On 2014/8/21 12:29, Jason Wang wrote:
>> On 08/20/2014 05:23 PM, Zhangjie (HZ) wrote:
>>> On 2014/8/19 12:56, Jason Wang wrote:
>>>> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>>> call it before setting
>>>> Zhang Jie, please test this patch to see if it fixes the issue.
>>>> +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
>>>> +{
>>>> + net->dev.vq_index = vq_index;
>>>> +}
>>> int vq_index)
>>>> ...
>>> With vhost_net_set_vq_index, the VM can be started successfully.
>>> But after about 80 migrations in my environment, the virtual NIC
>>> became unreachable again.
>>> When I use a jprobe to notify tap, the virtual NIC becomes reachable
>>> again. This shows that missing interrupts cause the problem.
>>
>> Thanks for the testing. One question: can you reproduce this when vhost
>> is disabled?
> After migration, vhost is not disabled; the virtual NIC became unreachable
> because vhost is not woken up.
> By the logic of EVENT_IDX, virtio-net will not kick vhost again if the used
> idx is not updated.
> So, if one interrupt is lost during migration, virtio_net will not kick
> vhost again.
> Then no skb from virtio-net can be sent to tap.
>
> Jason's patch reduced the probability of occurrence from about 1/20 to 1/80.
> It is really effective. I think the patch should be acked.
> Maybe we can try to solve the problem from another perspective. Do you have
> a way to detect the migration?
> Then we could make up for the lost notification with a signal from
> virtio-net after the migration.
>
>> Anyway, I will try to reproduce it by myself.
>
> The test environment is really terrible. I built an environment myself, but
> the problem did not occur there.
> The environment I use now comes from a colleague responsible for testing.
> There are two hosts, each with about 20 VMs, and they send packets (IPv4
> and IPv6) to each other.
> The VM to be migrated also sends packets itself, and another host pings it
> (ping -i 0.001).
> The physical NIC is 1GbE, connected through an internal network.

Yes.
I'm trying to reproduce it locally, but with my patch applied, the network is still available after more than 5000 migrations (I stress the guest network at the same time). What qemu command line did you use, and did you enable zerocopy?