On Tue, Oct 18, 2022 at 06:55:08PM +0100, Peter Maydell wrote: > I've been looking at a (long-standing) bug where an avocado test > intermittently fails. > > This happens because at the avocado end we write "halt\r" to the > serial console, which is wired up to a Unix socket; but at the UART > model we only ever see the 'h' character and no further data. As far > as I can tell this happens because Avocado closes the socket and the > QEMU socket chardev layer loses the last few characters of data that > the guest hasn't yet read at that point. > > This is what seems to me to be going on: > * Avocado writes the data ('halt\r') and closes the socket > pretty much immediately afterwards > * At the glib layer, the socket is polled, and it gets G_IO_IN > and G_IO_HUP, indicating "readable, and also closed" > * glib's source dispatch mechanism first calls tcp_chr_read() > to handle the G_IO_IN part > * tcp_chr_read() reads a single byte (the 'h'), because > SocketChardev::max_size is 1 (which in turn is because the > device model's can_write function returned 1 to say that's > all it can accept for now). So there's still data to be > read in future > * glib now calls tcp_chr_hup() because of the G_IO_HUP (as part > of the same handle-all-the-sources loop) > * tcp_chr_hup() calls tcp_chr_disconnect(), which basically > frees everything, tells the chardev backend that the connection > just closed, etc > * the data remaining in the socket to be read after the 'h' > is never read > > How is this intended to work? I guess the socket ought to go > into some kind of "disconnecting" state, but not actually do > a tcp_chr_disconnect() until all the data has been read via > tcp_chr_read() and it's finally got an EOF indication back from > tcp_chr_recv() ?
Right, this is basically broken by (lack of) design right now. The main problem here is that we're watching the socket twice. One set of callbacks added with io_add_watch_poll, and then a second callback added with qio_chanel_create_watch just for G_IO_HUP. We need there to be only 1 callback, and when that callback gets G_IO_IN, it should *ignore* G_IO_HUP until tcp_chr_recv returns 0 to indicate EOF. This would cause tcp_chr_read to be invoked repeatedly with G_IO_IN | G_IO_HUP, as we read "halt\r" one byte at a time. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|