Hi,

The problem doesn't seem to happen with the latest ovs 2.5.0 git branch (commit ac93328273238b5dc86353222264fa4f30ad95e8, dpdk 16.04). These are the stack traces we got with the 2.5.0 release.

Before running traffic:
(gdb) info threads
  Id   Target Id         Frame
  29   Thread 0x7fb44f004700 (LWP 4498) "dpdk_watchdog1" 0x00007fb44f0bef2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  28   Thread 0x7fb44e803700 (LWP 4499) "vhost_thread2" 0x00007fb44f0e6ae3 in select () at ../sysdeps/unix/syscall-template.S:81
  27   Thread 0x7fb44e002700 (LWP 4500) "urcu3" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  26   Thread 0x7fb3fbfff700 (LWP 4601) "handler82" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  25   Thread 0x7fb418ff9700 (LWP 4602) "handler79" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  24   Thread 0x7fb4197fa700 (LWP 4603) "handler78" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  23   Thread 0x7fb419ffb700 (LWP 4604) "handler77" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  22   Thread 0x7fb44d400700 (LWP 4605) "handler80" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  21   Thread 0x7fb44cbff700 (LWP 4606) "handler81" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  20   Thread 0x7fb43ffff700 (LWP 4607) "handler83" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  19   Thread 0x7fb43f7fe700 (LWP 4608) "handler84" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  18   Thread 0x7fb43effd700 (LWP 4609) "handler85" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  17   Thread 0x7fb43e7fc700 (LWP 4610) "handler86" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  16   Thread 0x7fb43dffb700 (LWP 4611) "handler87" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  15   Thread 0x7fb43d7fa700 (LWP 4612) "handler89" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  14   Thread 0x7fb43cff9700 (LWP 4613) "handler88" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  13   Thread 0x7fb423fff700 (LWP 4614) "handler91" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  12   Thread 0x7fb4237fe700 (LWP 4615) "handler90" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  11   Thread 0x7fb422ffd700 (LWP 4616) "handler92" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7fb4227fc700 (LWP 4617) "handler93" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  9    Thread 0x7fb421ffb700 (LWP 4618) "revalidator94" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  8    Thread 0x7fb4217fa700 (LWP 4619) "revalidator95" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  7    Thread 0x7fb420ff9700 (LWP 4620) "revalidator96" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  6    Thread 0x7fb41bfff700 (LWP 4621) "revalidator97" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x7fb41b7fe700 (LWP 4622) "revalidator98" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7fb41affd700 (LWP 4623) "revalidator99" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7fb41a7fc700 (LWP 4624) "revalidator100" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7fb3fb7fe700 (LWP 4625) "pmd101" 0x00000000005c8088 in dp_netdev_process_rxq_port.isra ()
* 1    Thread 0x7fb45074eb00 (LWP 4497) "ovs-vswitchd" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb3fb7fe700 (LWP 4625))]
#0  0x00000000005c8088 in dp_netdev_process_rxq_port.isra ()
(gdb) bt
#0  0x00000000005c8088 in dp_netdev_process_rxq_port.isra ()
#1  0x00000000005c84aa in pmd_thread_main ()
#2  0x0000000000648c54 in ovsthread_wrapper ()
#3  0x00007fb44f8c10a4 in start_thread (arg=0x7fb3fb7fe700) at pthread_create.c:309
#4  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

After running traffic, with ovs stuck:

(gdb) info threads
  Id   Target Id         Frame
  29   Thread 0x7fb44f004700 (LWP 4498) "dpdk_watchdog1" 0x00007fb44f0bef2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  28   Thread 0x7fb44e803700 (LWP 4499) "vhost_thread2" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  27   Thread 0x7fb44e002700 (LWP 4500) "urcu3" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  26   Thread 0x7fb3fbfff700 (LWP 4601) "handler82" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  25   Thread 0x7fb418ff9700 (LWP 4602) "handler79" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  24   Thread 0x7fb4197fa700 (LWP 4603) "handler78" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  23   Thread 0x7fb419ffb700 (LWP 4604) "handler77" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  22   Thread 0x7fb44d400700 (LWP 4605) "handler80" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  21   Thread 0x7fb44cbff700 (LWP 4606) "handler81" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  20   Thread 0x7fb43ffff700 (LWP 4607) "handler83" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  19   Thread 0x7fb43f7fe700 (LWP 4608) "handler84" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  18   Thread 0x7fb43effd700 (LWP 4609) "handler85" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  17   Thread 0x7fb43e7fc700 (LWP 4610) "handler86" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  16   Thread 0x7fb43dffb700 (LWP 4611) "handler87" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  15   Thread 0x7fb43d7fa700 (LWP 4612) "handler89" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  14   Thread 0x7fb43cff9700 (LWP 4613) "handler88" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  13   Thread 0x7fb423fff700 (LWP 4614) "handler91" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  12   Thread 0x7fb4237fe700 (LWP 4615) "handler90" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  11   Thread 0x7fb422ffd700 (LWP 4616) "handler92" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7fb4227fc700 (LWP 4617) "handler93" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  9    Thread 0x7fb421ffb700 (LWP 4618) "revalidator94" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  8    Thread 0x7fb4217fa700 (LWP 4619) "revalidator95" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  7    Thread 0x7fb420ff9700 (LWP 4620) "revalidator96" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  6    Thread 0x7fb41bfff700 (LWP 4621) "revalidator97" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x7fb41b7fe700 (LWP 4622) "revalidator98" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7fb41affd700 (LWP 4623) "revalidator99" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7fb41a7fc700 (LWP 4624) "revalidator100" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7fb3fb7fe700 (LWP 4625) "pmd101" 0x00000000004664c9 in rte_vhost_dequeue_burst ()
* 1    Thread 0x7fb45074eb00 (LWP 4497) "ovs-vswitchd" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb3fb7fe700 (LWP 4625))]
#0  0x00000000004664c9 in rte_vhost_dequeue_burst ()
(gdb) bt
#0  0x00000000004664c9 in rte_vhost_dequeue_burst ()
#1  0x000000000069fb85 in netdev_dpdk_vhost_rxq_recv ()
#2  0x00000000005f25d1 in netdev_rxq_recv ()
#3  0x00000000005c8076 in dp_netdev_process_rxq_port.isra ()
#4  0x00000000005c84aa in pmd_thread_main ()
#5  0x0000000000648c54 in ovsthread_wrapper ()
#6  0x00007fb44f8c10a4 in start_thread (arg=0x7fb3fb7fe700) at pthread_create.c:309
#7  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Detach and re-attach:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb450719b00 (LWP 5315))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
238     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fb44f6b3ce3 in handle_fildes_io (arg=<optimized out>) at ../sysdeps/pthread/aio_misc.c:645
#2  0x00007fb44f8c10a4 in start_thread (arg=0x7fb450719b00) at pthread_create.c:309
#3  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Detach and re-attach:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb3fb7fe700 (LWP 4625))]
#0  0x0000000000463844 in rte_pktmbuf_free ()
(gdb) bt
#0  0x0000000000463844 in rte_pktmbuf_free ()
#1  0x00000000004668f4 in rte_vhost_dequeue_burst ()
#2  0x000000000069fb85 in netdev_dpdk_vhost_rxq_recv ()
#3  0x00000000005f25d1 in netdev_rxq_recv ()
#4  0x00000000005c8076 in dp_netdev_process_rxq_port.isra ()
#5  0x00000000005c84aa in pmd_thread_main ()
#6  0x0000000000648c54 in ovsthread_wrapper ()
#7  0x00007fb44f8c10a4 in start_thread (arg=0x7fb3fb7fe700) at pthread_create.c:309
#8  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thanks

On Tuesday, 26 April 2016 11:29 AM, Daniele Di Proietto <diproiet...@ovn.org> wrote:

2016-04-26 9:08 GMT-07:00 Traynor, Kevin <kevin.tray...@intel.com>:

> -----Original Message-----
> From: discuss [mailto:discuss-boun...@openvswitch.org] On Behalf Of
> Kochba, Alon
> Sent: Tuesday, April 26, 2016 4:38 PM
> To: Ben Pfaff <b...@ovn.org>; Yi Ba <yby.develo...@yahoo.com>
> Cc: b...@openvswitch.org
> Subject: Re: [ovs-discuss] ovs get stuck when running traffic from VM
> to VM on same compute
>
> Hi Ben,
>
> Could you point us to the commit that fixed this issue?
> We already tried patching with this commit, which seemed relevant, but
> the issue still reproduced:
> https://github.com/openvswitch/ovs/commit/f519a72d9a3708fbc5f796f176e7c8bd3dcfb738
>
> We will retry with your suggestion of using the 2.5 branch code, but
> we might want to backport the specific fix unless there is a 2.5.1
> release including it.
> If the commit linked above is the one you were thinking of, please
> note a small difference: in the commit, rcu is blocked waiting for
> vhost_thread to quiesce, while in our case rcu is blocked waiting for
> pmd to quiesce.

It sounds similar to the problem that this commit fixed. If so, the fix is applied to the master and 2.5 branches.

https://github.com/openvswitch/ovs/commit/61c4e39460a7db3be7262a3b2af767a84167a9d8

Could you try applying the above commit and see if it fixes the problem?

If you manage to reproduce the problem, could you get a backtrace of the blocked thread (pmd101 in this case)?

Thanks,

Daniele
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss