On 21.07.2016 12:37, Yuanhan Liu wrote:
> On Thu, Jul 21, 2016 at 11:21:15AM +0300, Ilya Maximets wrote:
>> If something abnormal happened to QEMU, 'connect()' can block calling
>> thread (e.g. main thread of OVS) forever or for a really long time.
>> This can break whole application or block the reconnection thread.
>>
>> Example with OVS:
>>
>>      ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
>>      (gdb) bt
>>      #0  connect () from /lib64/libpthread.so.0
>>      #1  vhost_user_create_client (vsocket=0xa816e0)
>>      #2  rte_vhost_driver_register
>>      #3  netdev_dpdk_vhost_user_construct
>>      #4  netdev_open (name=0xa664b0 "vhost1")
>>      [...]
>>      #11 main
>>
>> Fix that by setting non-blocking mode for client sockets for connection.
>>
>> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
> 
> Thanks for spotting and fixing yet another bug!
> 
>>  
>> +static int
>> +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
> 
> I don't quite understand why this is needed: connect() with O_NONBLOCK
> flag set is not enough?

There is a small issue with a non-blocking connect() call: connection
establishment may start, but connect() returns '-1' with 'errno = EINPROGRESS'.
In that case we must wait on the fd until it becomes writable, and then
check the actual status of the connection using getsockopt().

I'm not sure we can actually hit this situation, but it's documented,
and I think we should handle it.

See 'man connect' for details.

Best regards, Ilya Maximets.
