On Mon, Jul 14, 2014 at 01:55:05AM +0000, Wangkai (Kevin,C) wrote:
> > -----Original Message-----
> > From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> > Sent: Friday, July 11, 2014 9:04 PM
> > To: Wangkai (Kevin,C)
> > Cc: qemu-devel@nongnu.org; aligu...@amazon.com; Lee yang
> > Subject: Re: [PATCH] Tap: fix vcpu long time io blocking on tap
> >
> > On Fri, Jul 11, 2014 at 01:05:30AM +0000, Wangkai (Kevin,C) wrote:
> > > When a tap is used as the net driver for a VM and too many packets
> > > are delivered to the guest OS via the tap interface, the guest OS
> > > is blocked on I/O events for a long time while the tap driver is
> > > busy processing packets.
> > >
> > > KVM vcpu thread blocked on the I/O lock, call trace:
> > > __lll_lock_wait
> > > _L_lock_1004
> > > __pthread_mutex_lock
> > > qemu_mutex_lock
> > > kvm_cpu_exec
> > > qemu_kvm_cpu_thread_fn
> > > start_thread
> > >
> > > QEMU I/O thread call trace:
> > > ...
> > > qemu_net_queue_send
> > > tap_send
> > > qemu_iohandler_poll
> > > main_loop_wait
> > > main_loop
> > >
> > > I think the QEMU I/O lock time should be as small as possible, and
> > > the I/O work slice should be limited to a particular ratio or time.
> > >
> > > ---
> > > Signed-off-by: Wangkai <wangka...@huawei.com>
> >
> > How many packets are you seeing in a single tap_send() call?
> >
> > Have you profiled the tap_send() code path?  Maybe it is performing
> > some operation that is very slow.
> >
> > By the way, if you want good performance you should use vhost_net
> > instead of userspace virtio-net.  Userspace virtio-net is not very
> > optimized.
> >
> > Stefan
>
> Hi Stefan,
>
> I did not profile it, I just debugged with gdb and reviewed the code.
It's worth understanding the root cause for this behavior because
something is probably wrong.

> When packets were delivered, I found the VM was hung, and I checked
> the QEMU run state with gdb.  I saw the call traces for the I/O thread
> and vcpu thread, and I added debug info to count how many packets were
> handled within tap_send().  The info is below:
>
> total recv 393520 time 1539821 us
> total recv 1270 time 4931 us
> total recv 257872 time 995828 us
> total recv 10745 time 41438 us
> total recv 505387 time 2000925 us

505387 packets or 505387 bytes?

If that's packets, then even with small 64-byte packets that would mean
32 MB of pending data!

Are you running a networking benchmark where you'd expect lots of
packets?

Have you checked how long the time between tap_send() calls is?
Perhaps something else in QEMU is blocking the event loop so packets
accumulate in the host kernel.

Stefan