On 01/23/2014 05:07 AM, Fam Zheng wrote:
> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
>> Hi,
>>
>> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>>
>> 1) Client A connects to qmp socket with socat
>> 2) Client A gets greeting message {"QMP": {"version": ..}
>> 3) Client A waits (select on the socket's fd)
>> 4) Client B tries to connect to the *same* qmp socket with socat
>> 5) Client B does *NOT* get any greeting message
>> 6) Client B waits (select on the socket's fd)
>> 7) Client B closes connection (kill socat)
>> 8) Client A quits too
>> 9) Client C connects to qmp socket
>> 10) Client C gets *two* greeting messages!!!
> Hi Stratos, thank you for debugging and reporting this.
>
> I tested this sequence but can't fully reproduce this. What I see is 5) but no
> 10). Client C acts normally. And your patch below doesn't solve it for me.

Hm, which qemu version (or repo branch / tag) did you use? We did a quick scan
of the master branch code / commits, but we didn't find anything that might fix
the issue.

> To submit a patch, please follow instructions as described in
> http://wiki.qemu.org/Contribute/SubmitAPatch
> so it could be picked up by maintainers. Specifically, you need to format your
> patch email with "git format-patch" and add a "Signed-off-by:" line in your
> patch email.

Ok. If any dev can confirm that this is a bug (and that the patch below is the
correct way to fix it) I'll resubmit it properly.

Thanks,
Stratos

> Thanks,
>
> Fam
>
>> After some investigation, we traced it down to the monitor_flush()
>> function in monitor.c. Specifically, when a second client connects to
>> the qmp (client B), while another one is already using it (client A), we
>> get the following from stracing the second client (client B):
>>
>> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
>> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
>> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
>> select(4, [0 3], [], [], NULL
>>
>> So, the connect() syscall from client B succeeds, although client B
>> connection has not yet been accepted by the qmp server (it's still in
>> the backlog of the qmp listening socket).
>>
>> After killing client B and then client A, we see the following when
>> stracing the qemu proc:
>>
>> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
>> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>> 22363 fcntl(9, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) = -1 EPIPE (Broken pipe)
>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>>
>> The qmp server / qemu accepts the connection from client B (who has now
>> closed the connection) and tries to write the greeting message to the
>> socket fd. This results in write returning an error (EPIPE).
>>
>> The monitor_flush() function doesn't seem to handle this case (write
>> error). Instead, it adds a watch / handler to retry the write operation.
>> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
>> greeting messages for the next client to connect.
>>
>> The following seems to do the trick.
>>
>> diff --git a/monitor.c b/monitor.c
>> index 845f608..5622f20 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>>
>>      if (len && !mon->mux_out) {
>>          rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
>> -        if (rc == len) {
>> -            /* all flushed */
>> +        if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
>> +            /* all flushed or error */
>>              QDECREF(mon->outbuf);
>>              mon->outbuf = qstring_new();
>>              return;
>>
>> Comments?
>>
>> Thanks,
>> Stratos
>>
>> --
>> Stratos Psomadakis
>> <pso...@grnet.gr>
>>
>

--
Stratos Psomadakis
<pso...@grnet.gr>
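
For anyone who wants to poke at the failure mode outside of qemu, here is a
minimal standalone sketch of the two flush policies discussed above. It is not
qemu code: struct outbuf, outbuf_append(), flush_outbuf(), GREETING and
stale_bytes_after_dead_client() are all hypothetical names made up for the
illustration, a socketpair() stands in for the monitor socket, and the
partial-write handling is my own assumption. The only thing taken from the
thread is the condition from the proposed patch (drop the buffer on a full
write, or on a write error other than EAGAIN).

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define GREETING "{\"QMP\": \"hypothetical greeting\"}\r\n"

/* Hypothetical stand-in for the monitor's pending output (mon->outbuf). */
struct outbuf {
    char data[256];
    size_t len;
};

static void outbuf_append(struct outbuf *b, const char *s)
{
    size_t n = strlen(s);
    assert(b->len + n <= sizeof(b->data));
    memcpy(b->data + b->len, s, n);
    b->len += n;
}

/*
 * Try to write the pending bytes to fd.
 * drop_on_error == 0: clear the buffer only after a complete write.
 * drop_on_error == 1: also clear it when write() fails with anything other
 *                     than EAGAIN (the condition from the proposed patch).
 * Returns 1 if the buffer was emptied, 0 if bytes were kept for a retry.
 */
static int flush_outbuf(int fd, struct outbuf *b, int drop_on_error)
{
    ssize_t rc = write(fd, b->data, b->len);
    int hard_error = (rc < 0 && errno != EAGAIN);

    if ((drop_on_error && hard_error) || (rc >= 0 && (size_t)rc == b->len)) {
        b->len = 0;             /* all flushed, or error: nothing to retry */
        return 1;
    }
    if (rc > 0) {               /* partial write: keep only the unsent tail */
        memmove(b->data, b->data + rc, b->len - (size_t)rc);
        b->len -= (size_t)rc;
    }
    return 0;
}

/*
 * Queue a greeting for a client that has already gone away, flush once,
 * and report how many bytes are still queued for whoever connects next.
 */
static size_t stale_bytes_after_dead_client(int drop_on_error)
{
    struct outbuf buf = { .len = 0 };
    int sv[2];

    signal(SIGPIPE, SIG_IGN);   /* get EPIPE from write() instead of dying */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return (size_t)-1;
    }
    close(sv[1]);               /* the peer has closed before we write */

    outbuf_append(&buf, GREETING);
    flush_outbuf(sv[0], &buf, drop_on_error);
    close(sv[0]);
    return buf.len;
}
```

Calling stale_bytes_after_dead_client(0) and stale_bytes_after_dead_client(1)
from a small main() should show the difference: with the first policy the
whole greeting stays queued after the EPIPE, which is what would end up in
front of the next client's own greeting, while with the second policy nothing
is left over. Whether dropping the buffer is the right behaviour for every
error qemu_chr_fe_write() can return is a separate question that this sketch
does not answer.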