On 12/27/2013 12:44 PM, Alexey Kardashevskiy wrote:
> On 12/27/2013 02:12 AM, Michael S. Tsirkin wrote:
>> On Fri, Dec 27, 2013 at 01:59:19AM +1100, Alexey Kardashevskiy wrote:
>>> On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
>>>>>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>>>>>>>>>> happening there.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>>>>>>> sleep 210
>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>>>>>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>>>>>>         } else {
>>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>>>>>>>> happens to cause races.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since it's all around startup,
>>>>>>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>>>>>>> vhost_net_start.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>>>>>>>>>>          if (r < 0) {
>>>>>>>>>>>>>              goto err;
>>>>>>>>>>>>>          }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>>>>>>> +            .index = i
>>>>>>>>>>>>> +        };
>>>>>>>>>>>>> +        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>>>>>>
>>>>>>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>>>>>>> To kick you write 1 there:
>>>>>>>>>>>> uint6_t v = 1;
>>>>>>>>>>>> write(fd, &v, sizeof v);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> What is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>>>>>>
>>>>>>>>>> Sorry, should have been uint64_t.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
>>>>>>>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
>>>>>>>>> it signals from the user space does not work...
>>>>>>>>
>>>>>>>> You can run a timer in qemu and signal the eventfd from there
>>>>>>>> periodically.
>>>>>>>>
>>>>>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>>>>>>> but tcpdump in host on tun device does not show any packets?
>>>>>>>
>>>>>>>
>>>>>>> Ok. Figured it out about disabling interfaces in Fedora19.
>>>>>>> I was wrong, something is happening on the host's TAP - the guest sends
>>>>>>> ARP request, the response is visible on the TAP interface but not in the
>>>>>>> guest.
>>>>>>
>>>>>> Okay. So problem is on host to guest path then.
>>>>>> Things to try:
>>>>>>
>>>>>> 1. trace handle_rx [vhost_net]
>>>>>> 2. trace tun_put_user [tun]
>>>>>> 3. I suspect some host bug in one of the features.
>>>>>> Let's try to disable some flags with device property:
>>>>>> you can get the list by doing:
>>>>>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
>>>>>> Things I would try turning off is guest offloads (ones that start with
>>>>>> guest_)
>>>>>> event_idx,any_layout,mq.
>>>>>> Turn them all off, if it helps try to find the one that helped.
>>>>>
>>>>>
>>>>> Heh. It still would be awesome to read basics about this vhost thing as I
>>>>> am debugging blindly :)
>>>>>
>>>>> Regarding your suggestions.
>>>>>
>>>>> 1. I put "printk" in handle_rx and tun_put_user.
>>>>
>>>> Fine, though it's easier with ftrace http://lwn.net/Articles/370423/
>>>> look for function filtering.
>>>>
>>>>> handle_rx stopped being called after 2:40 from the guest start,
>>>>> tun_put_user stopped after 0:20 from the guest start. Accuracy is 5
>>>>> seconds.
>>>>> If I bring the guest's eth0 up while handle_rx is still printing, it
>>>>> works, i.e. tun_put_user is called a lot. Once handle_rx stopped, nothing
>>>>> can bring eth0 back to live.
>>>>
>>>> OK so what should happen is that handle rx is called
>>>> when you bring eth0 up.
>>>> Do you see this?
>>>> The way it is supposed to work is this:
>>>>
>>>> vhost_net_enable_vq calls vhost_poll_start then
>>>
>>>
>>> This and what follows it is called when QEMU is just booting (in response
>>> to PCI enable? somewhere in the middle of PCI discovery process) and then
>>> VHOST_NET_SET_BACKEND is not called ever again.
>>>
>>
>> What should happen is up/down in guest
>> will call virtio_net_vhost_status in qemu
>> and then vhost_net_start/vhost_net_stop is called
>> accordingly.
>> These call VHOST_NET_SET_BACKEND ioctls
>>
>> you don't see this?
>
>
> Nope. What I see is that vhost_net_start is only called on
> VIRTIO_PCI_STATUS and never after that as PCI status does not change (does
> not it?).
>
> The log of QEMU + gdb with some breakpoints:
> http://pastebin.com/CSN6iSn6
>
> In this example, I did not wait ~240 seconds so it works but still does not
> print what you say it should print :)
>
> Here is what I run:
> http://ozlabs.ru/gitweb/?p=qemu/.git;a=shortlog;h=refs/heads/vhostdbg
>
> Thanks!
>
> [ time to go to the ocean :) ]

I am back. Are you? :)

I looked a bit further.

In the guest's virtnet_set_rx_mode() (drivers/net/virtio_net.c) I added this:
===
struct scatterlist sg;
struct virtio_net_ctrl_mq s;

s.virtqueue_pairs = 1;
sg_init_one(&sg, &s, sizeof(s));
virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
                     VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg, NULL);
===
... in the desperate hope that it would signal QEMU to stop vhost in
virtio_net_vhost_status(). But QEMU does not call vhost_net_stop() because
the link is up - it has been up since virtnet_probe() and it never goes
down, so I guess this is by design.

So... the vhost-net thread in the host goes to sleep and there is no way to
wake it up from the guest, as "ifconfig eth0 down ; ifconfig eth0 up" changes
neither the link status nor @VirtIODevice::status.

What would be the right thing to do now? Implement link state management?
Or invent a "virtio link" and leave QEMU's nc->peer->link_down alone? Or is
there some way to tell the kernel thread not to sleep?

Thanks!


>
>
>>>
>>>> this calls mask = file->f_op->poll(file, &poll->table)
>>>> on the tun file.
>>>> this calls tun_chr_poll.
>>>> at this point there are packets queued on tun already
>>>> so that returns POLLIN | POLLRDNORM;
>>>> this calls vhost_poll_wakeup and that checks mask against
>>>> the key.
>>>> key is POLLIN so vhost_poll_queue is called.
>>>> this in turn calls vhost_work_queue
>>>> work list is either empty then we wake up worker
>>>> or it's not empty then worker is running out job anyway.
>>>> this will then invoke handle_rx_net.
>>>>
>>>>
>>>>> 2. This is exactly how I run QEMU now. I basically set "off" for every
>>>>> on/off parameters. This did not change anything.
>>>>>
>>>>> ./qemu-system-ppc64 \
>>>>> -enable-kvm \
>>>>> -m 2048 \
>>>>> -L qemu-ppc64-bios/ \
>>>>> -machine pseries \
>>>>> -trace events=qemu_trace_events \
>>>>> -kernel vml312 \
>>>>> -append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>>>>> -nographic \
>>>>> -vga none \
>>>>> -nodefaults \
>>>>> -chardev stdio,id=id0,signal=off,mux=on \
>>>>> -device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>>>>> -mon id=id2,chardev=id0,mode=readline \
>>>>> -netdev tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
>>>>> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
>>>>> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
>>>>> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
>>>>> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
>>>>> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
>>>>> command_serr_enable=off \
>>>>> -netdev user,id=id5,hostfwd=tcp::5000-:22 \
>>>>> -device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
>>>>>
>>>>
>>>> Yes this looks like some kind of race.
>>>
>>>
>>> --
>>> Alexey
>
>


--
Alexey