On Wed, Aug 24, 2022 at 05:40:27PM +0800, Bin Meng wrote:
> From: Bin Meng <bin.m...@windriver.com>
> 
> Random failures were observed when running qtests on Windows due to
> "Broken pipe" detected by qmp_fd_receive(). What happens is that
> the qtest executable sends test data over a socket to the QEMU
> under test but no response is received. The errno of the recv()
> call from the qtest executable indicates ETIMEDOUT, because the qmp
> chardev's tcp_chr_read() is never called to receive the test data,
> hence no response is sent to the other side.
> 
> tcp_chr_read() is registered as the callback of the socket watch
> GSource. The reason the callback is not called by glib is that
> the source check fails to indicate the source is ready. There
> are two socket watch sources created to monitor the same socket
> event object from the char-socket backend in update_ioc_handlers().
> During the source check phase, qio_channel_socket_source_check()
> calls WSAEnumNetworkEvents() to discover occurrences of network
> events for the indicated socket, clear internal network event records,
> and reset the event object. Testing shows that if we don't reset the
> event object, by not passing the event handle to WSAEnumNetworkEvents(),
> the symptom goes away and qtest runs very stably.
> 
> It looks like we don't need to call WSAEnumNetworkEvents() at all, as we
> don't parse the WSANETWORKEVENTS result returned from this API.
> We use select() to poll the socket status. Fix this instability by
> dropping the WSAEnumNetworkEvents() call.
> 
> Signed-off-by: Bin Meng <bin.m...@windriver.com>
> ---
> During testing, I removed the following code in update_ioc_handlers():
> 
>     remove_hup_source(s);
>     s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
>     g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
>                           chr, NULL);
>     g_source_attach(s->hup_source, chr->gcontext);
> 
> and that change also makes the symptom go away.
> 
> And if I move the above code to the beginning, before the call to
> io_add_watch_poll(), the symptom also goes away.
> 
> It seems that having two sources watching the same socket event
> object is the key that leads to the instability. The order in which
> the source watches are added also seems to play a role, but I can't
> explain why.
> Hopefully a Windows and glib expert could explain this behavior.
> 
>  io/channel-watch.c | 4 ----
>  1 file changed, 4 deletions(-)
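
For readers following the thread, here is a minimal sketch of why polling
with select() is safe when two watch sources share one socket: a
zero-timeout select() only reports the current readiness and does not
consume or reset anything, so a second check sees the same state as the
first. This is a POSIX illustration, not the QEMU or Winsock code;
fd_is_readable() and demo_two_checks() are hypothetical names made up for
this example, and a socketpair() stands in for the qtest connection.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): report whether fd has data
 * to read. The zero timeout makes select() return immediately.
 * Returns 1 if readable, 0 if not, -1 on error.
 */
static int fd_is_readable(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };

    /* FD_SET is only defined for descriptors in [0, FD_SETSIZE). */
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) {
        return -1;
    }
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/*
 * Two back-to-back checks stand in for two watch sources on one socket.
 * Returns 0 if the socket is idle before the write and both checks see
 * the pending byte afterwards, -1 otherwise.
 */
static int demo_two_checks(void)
{
    int sv[2];
    int before, first, second;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    before = fd_is_readable(sv[0]);      /* nothing written yet */

    if (write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    first = fd_is_readable(sv[0]);       /* first "source" checks */
    second = fd_is_readable(sv[0]);      /* second "source" checks */

    close(sv[0]);
    close(sv[1]);

    return (before == 0 && first == 1 && second == 1) ? 0 : -1;
}
```

Calling demo_two_checks() from a test program should return 0 on a POSIX
system, since the unread byte keeps the socket readable for both checks.
The contrast with the removed code is that passing the event handle to
WSAEnumNetworkEvents() resets the shared event object as a side effect.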

Reviewed-by: Daniel P. Berrangé <berra...@redhat.com>

and queued


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

