On Wed, 22 Jan 2014 17:53:52 +0200 Stratos Psomadakis <pso...@grnet.gr> wrote:
> Hi,
>
> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>
> 1) Client A connects to qmp socket with socat
> 2) Client A gets greeting message {"QMP": {"version": ..}
> 3) Client A waits (select on the socket's fd)
> 4) Client B tries to connect to the *same* qmp socket with socat
> 5) Client B does *NOT* get any greeting message
> 6) Client B waits (select on the socket's fd)
> 7) Client B closes connection (kill socat)
> 8) Client A quits too
> 9) Client C connects to qmp socket
> 10) Client C gets *two* greeting messages!!!
>
> After some investigation, we traced it down to the monitor_flush()
> function in monitor.c. Specifically, when a second client connects to
> the qmp (client B), while another one is already using it (client A), we
> get the following from stracing the second client (client B):
>
> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> select(4, [0 3], [], [], NULL
>
> So, the connect() syscall from client B succeeds, although client B
> connection has not yet been accepted by the qmp server (it's still in
> the backlog of the qmp listening socket).
>
> After killing client B and then client A, we see the following when
> stracing the qemu proc:
>
> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> 22363 fcntl(9, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) = -1 EPIPE (Broken pipe)
> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>
> The qmp server / qemu accepts the connection from client B (who has now
> closed the connection) and tries to write the greeting message to the
> socket fd. This results in write returning an error (EPIPE).
>
> The monitor_flush() function doesn't seem to handle this case (write
> error). Instead, it adds a watch / handler to retry the write operation.
> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> greeting messages for the next client to connect.
>
> The following seems to do the trick.
>
> diff --git a/monitor.c b/monitor.c
> index 845f608..5622f20 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>
>      if (len && !mon->mux_out) {
>          rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> -        if (rc == len) {
> -            /* all flushed */
> +        if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> +            /* all flushed or error */
>              QDECREF(mon->outbuf);
>              mon->outbuf = qstring_new();
>              return;
>
> Comments?

I can reproduce the problem very easily and I can't think of a better way
to fix it. The right thing to do, I guess, would be to propagate the error
up and kill the connection, if there is one. But last time I checked, the
chardev layer did not have an API to kill a connection...
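
For anyone who wants to see the effect outside of QEMU, here is a minimal
standalone sketch of the condition in the patch. It is not QEMU code:
struct outbuf, outbuf_append(), flush_outbuf(), the drop_on_error switch and
demo_peer_closed() are all hypothetical names made up for this example, and
a plain write() on a socketpair stands in for qemu_chr_fe_write(). The only
thing it tries to show is that, when the peer is already gone, the queued
greeting stays in the buffer unless a hard write error also clears it.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for mon->outbuf: bytes queued but not yet written. */
struct outbuf {
    char data[256];
    size_t len;
};

static void outbuf_append(struct outbuf *ob, const char *s)
{
    size_t n = strlen(s);

    assert(ob->len + n <= sizeof(ob->data));
    memcpy(ob->data + ob->len, s, n);
    ob->len += n;
}

/*
 * Try to write the queued bytes to fd.
 * Returns 0 if the buffer is empty afterwards, 1 if bytes are still queued
 * and the caller is expected to retry once fd becomes writable.
 * drop_on_error selects the patched (1) or the original (0) condition.
 */
static int flush_outbuf(int fd, struct outbuf *ob, int drop_on_error)
{
    ssize_t rc;

    if (ob->len == 0) {
        return 0;
    }

    rc = write(fd, ob->data, ob->len);
    if (rc == (ssize_t)ob->len ||
        (drop_on_error && rc < 0 && errno != EAGAIN)) {
        /* all flushed or hard error: nothing is left for a later client */
        ob->len = 0;
        return 0;
    }
    if (rc > 0) {
        /* partial write: keep only the unsent tail */
        memmove(ob->data, ob->data + rc, ob->len - (size_t)rc);
        ob->len -= (size_t)rc;
    }
    return 1;
}

/*
 * Mimic "client B": the peer closes before the greeting is written.
 * Returns the number of bytes still queued after the flush attempt.
 */
static size_t demo_peer_closed(int drop_on_error)
{
    int sv[2];
    struct outbuf ob = { .len = 0 };

    signal(SIGPIPE, SIG_IGN);           /* get EPIPE instead of being killed */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return (size_t)-1;
    }
    close(sv[1]);                       /* the peer goes away first */
    outbuf_append(&ob, "greeting\r\n"); /* placeholder, not the real QMP banner */
    flush_outbuf(sv[0], &ob, drop_on_error);
    close(sv[0]);
    return ob.len;
}
```

Calling demo_peer_closed(0) leaves the placeholder greeting queued, which is
the state that would later be flushed to the next client on top of its own
greeting; demo_peer_closed(1) leaves the buffer empty. Like the patch, the
sketch treats only EAGAIN as "retry later".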