On Fri, 24 Jan 2014 12:14:12 +0200 Stratos Psomadakis <pso...@grnet.gr> wrote:
> On 01/23/2014 08:28 PM, Luiz Capitulino wrote: > > On Thu, 23 Jan 2014 17:33:33 +0200 > > Stratos Psomadakis <pso...@grnet.gr> wrote: > > > >> On 01/23/2014 03:54 PM, Luiz Capitulino wrote: > >>> On Thu, 23 Jan 2014 08:44:02 -0500 > >>> Luiz Capitulino <lcapitul...@redhat.com> wrote: > >>> > >>>> On Thu, 23 Jan 2014 19:23:51 +0800 > >>>> Fam Zheng <f...@redhat.com> wrote: > >>>> > >>>>> Bcc: > >>>>> Subject: Re: [Qemu-devel] Possible bug in monitor code > >>>>> Reply-To: > >>>>> In-Reply-To: <52e0ec4b.7010...@grnet.gr> > >>>>> > >>>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote: > >>>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote: > >>>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) > >>>>>>>> behavior > >>>>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the > >>>>>>>> issue: > >>>>>>>> > >>>>>>>> 1) Client A connects to qmp socket with socat > >>>>>>>> 2) Client A gets greeting message {"QMP": {"version": ..} > >>>>>>>> 3) Client A waits (select on the socket's fd) > >>>>>>>> 4) Client B tries to connect to the *same* qmp socket with socat > >>>> QMP/HMP can only handle a single client per connection, so this is > >>>> not supported. What you could do is to create multiple QMP/HMP instances > >>>> on the command-line, but this has never been fully tested so we don't > >>>> officially support this either (although it may work). > >>>> > >>>> Now, not gracefully failing on step 4 is a real bug here. I think the > >>>> best thing to do would be to close client's B connection. Patches are > >>>> welcome :) > >>> Thinking about this again, I think this should already be the case > >>> (as I don't believe chardev code is sitting on accept()). So maybe > >>> you triggered a race on chardev code or broke something there. > >> Yes, the chardev code doesn't accept any more connections until the > >> currently active connection is closed. > >> > >> However, when a new client tries to connect (while there's another qmp / > >> hmp connection active) the kernel, as long as there's room in the socket > >> backlog, will return to the client that the connection has been > >> successfully established and will queue the connection to be accepted > >> later, when qmp actually finishes with its active connection and > >> re-executes accept(). > >> > >> The problem arises if the new client closes the connection before the > >> older one is finished. When qmp runs accept() again, it will get a > >> socket fd for a client who has now closed the connection. At this point, > >> the monitor control event handler will get triggered and try to write / > >> flush the greeting message to the client. The client however has closed > >> its socket so the write will fail with SIGPIPE / EPIPE. Neither the > >> qemu-char nor the monitor code seem to handle this situation. > >> > >> Btw, have you tried to reproduce it? > > Not yet, I may have some time tomorrow. How reproducible is it for you? > > We can trigger it (by following the steps described in the first mail) > consistently. > > > Another question: have you tried to reproduce with an old qemu version > > (say v1.0) to see if this bug always existed? If the bug was introduced > > in some recent QEMU version you could try to bisect it. > > v1.1 is not affected. I checked the code and it seems the monitor code > has been refactored since v1.1. > > > Maybe you could try to reproduce with a different subsystem so that we > > can rule out or confirm monitor's involvement? Like -serial? > > It's actually a fault of the monitor_flush() function. As far as I can > understand, monitor_flush() calls qemu_chr_fe_write() and doesn't handle > all of the return codes / error cases properly (as I described in a > previous mail). If you check the function, you'll see that the final > case (where it set ups a watch / callback) always assumes an EAGAIN / > EWOULDBLOCK error. > > If you can verify / confirm that this is the case and that the patch > sent resolves the issue in a sane / correct way, I'll resubmit it > properly (with git-format-patch, a git log msg etc). I'll check that later today.