On Fri, Mar 29, 2013 at 8:38 PM, Tianpeng Zhang (Gmail)
<tianpeng0...@gmail.com> wrote:
> Hi All,
>
> I ran into an issue when running DRBD in XenServer with ovs-1.7.1. DRBD
> works fine when creating resources and syncing data. But when I try to take
> a DRBD resource down, ovs-vswitchd hangs for about 20 minutes, and then all
> network connections break.
>
> I added some debug traces, and ovs-vswitchd finally stopped at sendmsg()
> for a netlink message. The call path is:
>
>   bridge_run_fast()->ofproto_run_fast()->run_fast()->handle_upcalls()
>   ->handle_miss_upcalls()->dpif_operate()->dpif_linux_operate__()
>   ->nl_sock_transact_multiple()->nl_sock_transact_multiple__()->sendmsg()
>
> vswitchd stops here because sendmsg() does not return:
>
>     memset(&msg, 0, sizeof msg);
>     msg.msg_iov = iovs;
>     msg.msg_iovlen = n;
>     do {
>         error = sendmsg(sock->fd, &msg, 0) < 0 ? errno : 0;
>     } while (error == EINTR);
>
> Several people have hit a similar issue before on the Xen/DRBD mailing
> lists, but the only solution was to stop OVS and use the Linux bridge. I am
> thinking the issue may be that, before DRBD stops a resource, it does some
> cleanup of its netlink socket, and this conflicts with OVS's handling?
It looks like DRBD also uses genetlink to communicate with userspace. There's
a global genetlink lock, so I suspect that DRBD is holding it for a long time,
which blocks OVS. There could also be a deadlock if another shared lock is
taken in a different order, but this seems somewhat less likely since DRBD and
OVS don't have a lot in common.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev