*Sure. When I tried to delete my br-int, OVS hung.*
*Basically, the main thread joins the revalidator threads, while each revalidator thread is blocked either in recvmsg() or on the mutex.*
*Following is the trace:*

  Id   Target Id         Frame
  47   Thread 0x7f18bed6a700 (LWP 338) "revalidator57" 0x00007f18bf8528ad in recvmsg () at ../sysdeps/unix/syscall-template.S:81
  46   Thread 0x7f18be569700 (LWP 337) "revalidator56" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  6    Thread 0x7f18bcd66700 (LWP 32584) "urcu5" 0x00007f18bf05cfbd in poll () at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7f18c02ab980 (LWP 32553) "ovs-vswitchd" 0x00007f18bf84c66b in pthread_join (threadid=139744249288448, thread_return=thread_return@entry=0x0) at pthread_join.c:92

*The main thread will udpif_flush, which stops all revalidators...*

#0  0x00007f18bf84c66b in pthread_join (threadid=139744249288448, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x0000000000495d39 in xpthread_join (arg1=<optimized out>, arg2=arg2@entry=0x0) at lib/ovs-thread.c:173
#2  0x000000000042ca8a in udpif_stop_threads (udpif=0x2294bb0) at ofproto/ofproto-dpif-upcall.c:317
#3  0x000000000042e48c in udpif_flush (udpif=0x2294bb0) at ofproto/ofproto-dpif-upcall.c:482
#4  0x0000000000419101 in ofproto_flush__ (ofproto=ofproto@entry=0x227dc80) at ofproto/ofproto.c:1281
#5  0x0000000000419285 in ofproto_destroy (p=0x227dc80) at ofproto/ofproto.c:1363
#6  0x0000000000408a90 in bridge_destroy (br=br@entry=0x21fe2c0) at vswitchd/bridge.c:2716
#7  0x0000000000408e81 in add_del_bridges (cfg=0x21b8a90, cfg=0x21b8a90) at vswitchd/bridge.c:1428
#8  0x000000000040a977 in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x21b8a90) at vswitchd/bridge.c:538
#9  0x000000000040e214 in bridge_run () at vswitchd/bridge.c:2387
#10 0x0000000000405e15 in main (argc=6, argv=0x7fff70c4a268) at vswitchd/ovs-vswitchd.c:116

*At the same time, revalidator is dumping flow and blocked,*

*revalidator57*

#0  0x00007f18bf8528ad in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x00000000004d7bfb in nl_sock_recv__ (buf=buf@entry=0x7f18b0000a10, wait=wait@entry=true, sock=<optimized out>, sock=<optimized out>) at lib/netlink-socket.c:337
#2  0x00000000004d8d1d in nl_dump_refill (dump=<optimized out>, dump=<optimized out>, buffer=<optimized out>) at lib/netlink-socket.c:727
#3  nl_dump_next (dump=dump@entry=0x7f18b4004ac8, reply=reply@entry=0x7f18bed67170, buffer=buffer@entry=0x7f18b0000a10) at lib/netlink-socket.c:804
#4  0x00000000004ce628 in dpif_linux_flow_dump_next (thread_=0x7f18b0000980, flows=0x7f18bed67370, max_flows=50) at lib/dpif-linux.c:1279
#5  0x0000000000450152 in dpif_flow_dump_next (thread=thread@entry=0x7f18b0000980, flows=flows@entry=0x7f18bed67370, max_flows=max_flows@entry=50) at lib/dpif.c:1048
#6  0x000000000042d97f in revalidate (revalidator=0x227cb50) at ofproto/ofproto-dpif-upcall.c:1375
#7  0x000000000042ddcb in udpif_revalidator (arg=0x227cb50) at ofproto/ofproto-dpif-upcall.c:599
#8  0x0000000000495531 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:329
#9  0x00007f18bf84b182 in start_thread (arg=0x7f18bed6a700) at pthread_create.c:312
#10 0x00007f18bf06a30d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

*revalidator56*

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f18bf84d657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f18bf84d480 in __GI___pthread_mutex_lock (mutex=0x7f18b4004ad8) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000495568 in ovs_mutex_lock_at (l_=l_@entry=0x7f18b4004ad8, where=where@entry=0x51bbf6 "lib/netlink-socket.c:816") at lib/ovs-thread.c:73
#4  0x00000000004d8dc1 in nl_dump_next (dump=dump@entry=0x7f18b4004ac8, reply=reply@entry=0x7f18be566170, buffer=buffer@entry=0x7f18b4004f80) at lib/netlink-socket.c:816
#5  0x00000000004ce628 in dpif_linux_flow_dump_next (thread_=0x7f18b4004ef0, flows=0x7f18be566370, max_flows=50) at lib/dpif-linux.c:1279
#6  0x0000000000450152 in dpif_flow_dump_next (thread=thread@entry=0x7f18b4004ef0, flows=flows@entry=0x7f18be566370, max_flows=max_flows@entry=50) at lib/dpif.c:1048
#7  0x000000000042d97f in revalidate (revalidator=0x227cb30) at ofproto/ofproto-dpif-upcall.c:1375
#8  0x000000000042ddcb in udpif_revalidator (arg=0x227cb30) at ofproto/ofproto-dpif-upcall.c:599
#9  0x0000000000495531 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:329
#10 0x00007f18bf84b182 in start_thread (arg=0x7f18be569700) at pthread_create.c:312
#11 0x00007f18bf06a30d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

On Fri, Jul 18, 2014 at 2:51 PM, Ben Pfaff <b...@nicira.com> wrote:

> On Fri, Jul 18, 2014 at 02:45:47PM -0700, Alex Wang wrote:
> > Commit 93295354 (netlink-socket: Simplify multithreaded dumping
> > to match Linux reality.) makes the call to recvmsg() block if no
> > messages are available. This can cause revalidator threads hanging
> > for long time or even deadlock when main thread tries to stop the
> > revalidator threads.
> >
> > This commit fixes the issue by enabling the MSG_DONTWAIT flag in
> > the call to recvmsg().
> >
> > Signed-off-by: Alex Wang <al...@nicira.com>
>
> It's a reasonable fix but I'd like to learn more about the situation
> where the problem arises. It seems like there might be more to it.
> Can you explain more, or maybe give a backtrace?
>
> Thanks,
>
> Ben.
> _______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
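
To illustrate the effect of MSG_DONTWAIT described in the quoted commit message, here is a minimal standalone sketch. It is not the actual patch and does not use the OVS netlink code. Every name in it (demo_dump, demo_recv_nowait, demo_worker, demo_run) is hypothetical, and an AF_UNIX datagram socketpair stands in for the netlink socket. It assumes, as the traces above suggest, that one thread sits in recvmsg() while holding the dump mutex and the other waits for that mutex. With MSG_DONTWAIT, recvmsg() on an empty socket should fail with EAGAIN instead of blocking, so the mutex is released and pthread_join() in the main thread can return.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared dump state: one socket guarded by
 * one mutex.  This is NOT the OVS 'struct nl_dump'. */
struct demo_dump {
    pthread_mutex_t mutex;
    int fd;
};

/* Tries to receive one message without blocking.  Returns 0 and stores the
 * message length in '*n' on success, otherwise a positive errno value
 * (EAGAIN or EWOULDBLOCK when nothing is queued). */
static int
demo_recv_nowait(int fd, void *buf, size_t size, ssize_t *n)
{
    struct iovec iov = { .iov_base = buf, .iov_len = size };
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    do {
        /* MSG_DONTWAIT makes this single call non-blocking. */
        *n = recvmsg(fd, &msg, MSG_DONTWAIT);
    } while (*n < 0 && errno == EINTR);
    return *n < 0 ? errno : 0;
}

/* Worker: takes the mutex, attempts one receive, releases the mutex, and
 * returns the errno value from the receive as its thread result. */
static void *
demo_worker(void *aux)
{
    struct demo_dump *dump = aux;
    char buf[64];
    ssize_t n;
    int error;

    pthread_mutex_lock(&dump->mutex);
    error = demo_recv_nowait(dump->fd, buf, sizeof buf, &n);
    pthread_mutex_unlock(&dump->mutex);
    return (void *) (intptr_t) error;
}

/* Starts two workers on an empty socket and joins them.  Returns how many
 * workers saw EAGAIN/EWOULDBLOCK, or -1 if the socketpair could not be
 * created. */
static int
demo_run(void)
{
    struct demo_dump dump;
    pthread_t threads[2];
    int fds[2];
    int n_started = 0;
    int n_eagain = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds)) {
        return -1;
    }
    dump.fd = fds[0];
    pthread_mutex_init(&dump.mutex, NULL);

    for (int i = 0; i < 2; i++) {
        if (!pthread_create(&threads[n_started], NULL, demo_worker, &dump)) {
            n_started++;
        }
    }
    for (int i = 0; i < n_started; i++) {
        void *ret;

        /* Returns promptly because no worker can block in recvmsg(). */
        pthread_join(threads[i], &ret);
        int error = (int) (intptr_t) ret;
        if (error == EAGAIN || error == EWOULDBLOCK) {
            n_eagain++;
        }
    }

    pthread_mutex_destroy(&dump.mutex);
    close(fds[0]);
    close(fds[1]);
    return n_eagain;
}
```

Calling demo_run() from a main() should return once both workers have given up on the empty socket. Passing 0 instead of MSG_DONTWAIT in demo_recv_nowait() would be expected to reproduce the shape of the hang in the traces: one worker in recvmsg(), the other in pthread_mutex_lock(), and the caller stuck in pthread_join().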