On 21.07.2016 12:37, Yuanhan Liu wrote:
> On Thu, Jul 21, 2016 at 11:21:15AM +0300, Ilya Maximets wrote:
>> If something abnormal happened to QEMU, 'connect()' can block calling
>> thread (e.g. main thread of OVS) forever or for a really long time.
>> This can break whole application or block the reconnection thread.
>>
>> Example with OVS:
>>
>> ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
>> (gdb) bt
>> #0 connect () from /lib64/libpthread.so.0
>> #1 vhost_user_create_client (vsocket=0xa816e0)
>> #2 rte_vhost_driver_register
>> #3 netdev_dpdk_vhost_user_construct
>> #4 netdev_open (name=0xa664b0 "vhost1")
>> [...]
>> #11 main
>>
>> Fix that by setting non-blocking mode for client sockets for connection.
>>
>> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
>
> Thanks for spotting and fixing yet another bug!
>
>>
>> +static int
>> +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
>
> I don't quite understand why this is needed: connect() with O_NONBLOCK
> flag set is not enough?

There is a small issue with a non-blocking connect() call: connection
establishment may be started, but connect() returns -1 with
'errno = EINPROGRESS'. In this case we must wait on the fd until it
becomes writable. After that we need to check the actual status of the
connection using getsockopt() (SO_ERROR at level SOL_SOCKET).

I'm not sure we can actually get into this situation, but it's
documented, so I think we should handle it.
See 'man connect' for details.

Best regards, Ilya Maximets.