On Tue, Nov 20, 2018 at 06:44:27PM +0100, Markus Armbruster wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
> >> >> I reproduced the error with a FreeBSD guest and this change (which
> >> >> possibly can be squashed into "tests: qmp-test: add queue full test")
> >> >> worked for me:
> >> >> 
> >> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> >> >> index 6989acbca4..83f353db4f 100644
> >> >> --- a/tests/qmp-test.c
> >> >> +++ b/tests/qmp-test.c
> >> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
> >> >>       * will only be able to be handled after the queue is shrinked, so
> >> >>       * it'll be processed only after one existing in-band command
> >> >>       * finishes.
> >> >> +     *
> >> >> +     * NOTE: we need to feed the queue one extra request to make sure
> >> >> +     * it gets stuck: by the time the Nth request has been sent, the
> >> >> +     * server may already have popped and started processing the 1st
> >> >> +     * one, so the Nth request (which could then be only the [N-1]th
> >> >> +     * element on the queue) might not trigger the monitor-full
> >> >> +     * condition deterministically.
> >> >>       */
> >> >> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >> >>          send_cmd_that_blocks(qts, id);
> >> >>          g_free(id);
> >> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
> >> >>      unblock_blocked_cmd();
> >> >>      recv_cmd_id(qts, "queue-blocks-1");
> >> >>      recv_cmd_id(qts, "oob-1");
> >> >> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >>          unblock_blocked_cmd();
> >> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >> >>          recv_cmd_id(qts, id);
> >> >> 
> >> >> So the problem here is that the queue-blocks-N command might not always
> >> >> suspend the monitor: if the 1st request has already been popped, the Nth
> >> >> request effectively becomes only the (N-1)th element on the queue, so the
> >> >> parser keeps going, eats the oob command, and lets it "preempt" the
> >> >> previous commands.
> >> >> 
> >> >> Maybe FreeBSD schedules the threads in a pattern that makes this happen
> >> >> only there, and quite consistently, but in any case this should be a
> >> >> general fix to the test program.
> >> 
> >> I feel particularly dense right now, and need a more detailed analysis
> >> to understand.  Let me try.
> >> 
> >> Here's what the test does before your fix
> >> 
> >>     Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
> >>     queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
> >>     blocks until we explicitly unblock it.
> >> 
> >>     Send an out-of-band command oob-1.
> >> 
> >>     Unblock queue-blocks-1.
> >> 
> >>     Receive response to queue-blocks-1.
> >> 
> >>     Receive response to oob-1.
> >> 
> >>     Unblock queue-blocks-2, receive response
> >>     ...
> >> 
> >> Here's what the test expects QEMU to do:
> >> 
> >>     In I/O thread:
> >> 
> >>         Receive and queue queue-blocks-1, ...
> >> 
> >>         Queue is full, stop receiving commands.
> >> 
> >>     In main thread:
> >> 
> >>         Dequeue queue-blocks-1.
> >> 
> >>         Execute queue-blocks-1, block for a while, finish and send
> >>         response.
> >> 
> >>     In I/O thread:
> >> 
> >>         Queue is no longer full, resume receiving commands.
> >> 
> >>         Receive and execute oob-1, send response
> >> 
> >>     In main thread:
> >> 
> >>         Dequeue queue-blocks-2, execute, block for a while, finish and
> >>         send response.
> >>         ...
> >> 
> >> No good, because the two threads race.  On my Linux box the test passes.
> >> Let's have a closer look at how it does that.  The appended patch adds
> >> another tracepoint.  With tracepoints monitor_* and handle_qmp_command
> >> enabled, I get:
> >> 
> >>     19271@1542641581.372890:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-1", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-1.
> >> 
> >>     19271@1542641581.372954:monitor_qmp_cmd_in_band queue-blocks-1
> >> 
> >> Main thread dequeues queue-blocks-1 and starts executing it.
> >> 
> >> We already lost: the queue won't actually fill up.  The test passes
> >> anyway, but it doesn't actually test "queue full".  But why does it
> >> pass?  Let's see.
> >> 
> >>     19271@1542641581.373211:monitor_qmp_respond queue-blocks-1
> >> 
> >> Main thread sends response to queue-blocks-1.  This means queue-blocks-1
> >> has been unblocked already.  This is possible, because qmp-test sends
> >> all the commands and then immediately unblocks queue-blocks-1, and the
> >> main thread can win the race with the I/O thread.
> >> 
> >>     19271@1542641581.373826:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-2", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.373864:monitor_qmp_cmd_in_band queue-blocks-2
> >>     19271@1542641581.374575:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-3", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.375344:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-4", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.376162:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-5", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.376996:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-6", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.377790:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-7", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.378562:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-8", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-EZroW3/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >> 
> >>     19271@1542641581.378796:handle_qmp_command mon 0x563e19be4e80 req: 
> >> {"exec-oob": "migrate-pause"}
> >>     19271@1542641581.378800:monitor_qmp_cmd_out_of_band oob-1
> >>     19271@1542641581.378823:monitor_qmp_respond oob-1
> >> 
> >> I/O thread receives, executes and responds to oob-1.
> >> 
> >>     19271@1542641581.379197:monitor_qmp_respond queue-blocks-2
> >>     19271@1542641581.379228:monitor_qmp_cmd_in_band queue-blocks-3
> >>     19271@1542641581.379435:monitor_qmp_respond queue-blocks-3
> >>     19271@1542641581.379477:monitor_qmp_cmd_in_band queue-blocks-4
> >>     19271@1542641581.379597:monitor_qmp_respond queue-blocks-4
> >>     19271@1542641581.379621:monitor_qmp_cmd_in_band queue-blocks-5
> >>     19271@1542641581.379800:monitor_qmp_respond queue-blocks-5
> >>     19271@1542641581.379824:monitor_qmp_cmd_in_band queue-blocks-6
> >>     19271@1542641581.379953:monitor_qmp_respond queue-blocks-6
> >>     19271@1542641581.379992:monitor_qmp_cmd_in_band queue-blocks-7
> >>     19271@1542641581.380132:monitor_qmp_respond queue-blocks-7
> >>     19271@1542641581.380170:monitor_qmp_cmd_in_band queue-blocks-8
> >>     19271@1542641581.380283:monitor_qmp_respond queue-blocks-8
> >> 
> >> Main thread dequeues, executes and responds to queue-blocks-2,
> >> ... queue-blocks-8.
> >> 
> >> If I make the main thread lose the race by sticking a sleep(1) before
> >> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
> >> 
> >>     22762@1542641707.818838:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-1", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-1.
> >> 
> >>     22762@1542641707.818856:monitor_qmp_cmd_in_band queue-blocks-1
> >> 
> >> Main thread dequeues queue-blocks-1 and starts executing it.  We already
> >> lost.
> >> 
> >>     22762@1542641707.819585:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-2", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.820331:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-3", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.821072:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-4", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.821821:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-5", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.822569:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-6", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.823314:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-7", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.824069:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-8", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-PqO2uz/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >> 
> >>     22762@1542641707.824253:handle_qmp_command mon 0x55f5b6a5de80 req: 
> >> {"exec-oob": "migrate-pause"}
> >>     22762@1542641707.824257:monitor_qmp_cmd_out_of_band oob-1
> >>     22762@1542641707.824282:monitor_qmp_respond oob-1
> >> 
> >> I/O thread receives, executes and responds to oob-1.
> >> 
> >> Test is going to fail: response to oob-1 sent before response to
> >> queue-blocks-1.
> >> 
> >> Now let me try your fix.  Here's what I see:
> >> 
> >>     16585@1542650042.980953:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-1", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.981018:monitor_qmp_cmd_in_band queue-blocks-1
> >>     16585@1542650042.981252:monitor_qmp_respond queue-blocks-1
> >>     16585@1542650042.981838:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-2", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.981916:monitor_qmp_cmd_in_band queue-blocks-2
> >>     16585@1542650042.982656:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-3", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.983406:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-4", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.984170:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-5", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.984975:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-6", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.985778:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-7", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.986565:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-8", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.987396:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, 
> >> "node-name": "queue-blocks-9", "driver": "blkdebug", "config": 
> >> "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.987626:handle_qmp_command mon 0x5597b8e0de80 req: 
> >> {"exec-oob": "migrate-pause"}
> >>     16585@1542650042.987630:monitor_qmp_cmd_out_of_band oob-1
> >>     16585@1542650042.987676:monitor_qmp_respond oob-1
> >>     16585@1542650042.988035:monitor_qmp_respond queue-blocks-2
> >>     16585@1542650042.988066:monitor_qmp_cmd_in_band queue-blocks-3
> >>     16585@1542650042.988329:monitor_qmp_respond queue-blocks-3
> >>     16585@1542650042.988360:monitor_qmp_cmd_in_band queue-blocks-4
> >>     16585@1542650042.988575:monitor_qmp_respond queue-blocks-4
> >>     16585@1542650042.988616:monitor_qmp_cmd_in_band queue-blocks-5
> >>     16585@1542650042.988803:monitor_qmp_respond queue-blocks-5
> >>     16585@1542650042.988826:monitor_qmp_cmd_in_band queue-blocks-6
> >>     16585@1542650042.988977:monitor_qmp_respond queue-blocks-6
> >>     16585@1542650042.989001:monitor_qmp_cmd_in_band queue-blocks-7
> >>     16585@1542650042.989223:monitor_qmp_respond queue-blocks-7
> >>     16585@1542650042.989279:monitor_qmp_cmd_in_band queue-blocks-8
> >>     16585@1542650042.989447:monitor_qmp_respond queue-blocks-8
> >>     16585@1542650042.989469:monitor_qmp_cmd_in_band queue-blocks-9
> >>     16585@1542650042.989582:monitor_qmp_respond queue-blocks-9
> >> 
> >> Tracepoint monitor_suspend still not hit.  This means the queue does not
> >> fill up.
> >
> > You are right.  I think I figured out one thing but missed another:
> > "sent the command" does not mean "received by the QMP server".  The
> > client can never tell when the queue is full, yet that is exactly the
> > state this unit test needs to exercise.
> >
> > For example, what we want to achieve in this test is:
> >
> >   for i in 1 .. N:
> >       send blocking commands Bi
> >   ... wait until queue is full ...  [1]
> >   send oob command O1
> >   unblock blocking command B1
> >   receive response B1
> >   receive response O1
> >   for i in 2 .. N:
> >       unblock blocking command Bi
> >       receive response Bi
> >
> > All of this looks sane except the assumption at [1].  The test does
> > nothing for [1], assuming it happens naturally.  It doesn't: when the
> > first for loop finishes we have only queued the commands into the
> > client's send buffer, with no guarantee that they have been received
> > and queued in the QMP request list on the server side.  When the
> > server's QMP request queue does fill up, it merely suspends the
> > monitor, and that is invisible from the client's point of view, so
> > IMHO we have no good way to guarantee [1].  E.g., we could do a very
> > long sleep(100) at [1] to make it very likely that the queue has
> > filled up, but even that could break on a super slow machine.
> >
> > In other words, I don't see a good way to automate the queue-full test
> > for now (though I'm pretty sure queue full itself works: if you insert
> > a sleep you'll definitely see the monitor_suspend tracepoint).  We
> > might need something like an event so the client can detect that the
> > queue has filled up before it sends the next out-of-band command, but
> > then we're going backwards, and I don't see much point in introducing
> > a new event only for testing purposes...
> 
> We want the test to drive QEMU into the "queue full" state with an
> out-of-band command unreceived, then watch the queue drain and the
> out-of-band command overtaking part of it.
> 
> Driving QEMU into this state is easy enough: we can use a blocking
> in-band command to plug the queue's drain.
> 
> Once we have QEMU in this state, we can unblock that command and watch
> the show.  But we really have to wait until QEMU reaches this state
> before we unblock.
> 
> Our problem is the test can't easily observe when QEMU reaches this
> state.
> 
> You mentioned sending a "queue full" event, and that adding such an
> event just for testing is hard to justify.  I agree.  Could such an
> event have a non-testing use?  All I can think of is letting the
> management application detect "we mismanaged the queue and compromised
> monitor availability for out-of-band commands".  Doesn't sound
> particularly convincing to me, but we can talk with libvirt guys to see
> whether they want it.
> 
> Can the test observe the monitor is suspended just from the behavior of
> its monitor socket?  Not reliably: sending lots of input eventually
> fails with EAGAIN when the monitor is suspended, but it can also fail
> when the monitor reads the input too slowly.
> 
> Can the test observe the monitor_suspend trace point?  Awkward, because
> the trace backends are configurable.  We could have the test monitor the
> "simple" backend's output file, skipping the test when "simple" isn't
> available.  Hmm.
> 
> Here comes the heavy hammer: extend the qtest protocol.  When the
> monitor suspends / resumes, send a suitable asynchronous message to the
> qtest socket.  Have libqtest use that to track monitor suspend state.
> The test can then busy-wait for the state to flip to suspended.
> 
> Other ideas?

I don't have more to add (that's already a lot of ideas! :-).  I'd
prefer to start with the queue-full event; we can discuss it with
libvirt after 3.1 when appropriate.  And besides libvirt (where I'm
not quite sure the event will be useful in the near future), it could
also serve as a hint for any new QMP client when someone complains
"hey, why didn't my QMP client get any response from QEMU" - if the
client got no response but did see this event, we know the answer even
before debugging further.
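
For concreteness, here is a rough sketch of how the first loop in
test_qmp_oob() could synchronize on such an event before sending the
out-of-band command.  Nothing below exists today: the event name
COMMAND_QUEUE_FULL is made up, and the sketch only assumes the helpers
the test already has (send_cmd_that_blocks()) plus libqtest's
qtest_qmp_eventwait():

    /* Hypothetical: COMMAND_QUEUE_FULL would be emitted when the
     * request queue fills up and the monitor gets suspended. */
    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
        id = g_strdup_printf("queue-blocks-%d", i);
        send_cmd_that_blocks(qts, id);
        g_free(id);
    }
    /* Wait until the server has really filled its queue, so the oob
     * command sent next is guaranteed to hit the "queue full" path
     * instead of racing with the main thread's dequeue. */
    qtest_qmp_eventwait(qts, "COMMAND_QUEUE_FULL");

The same wait could be driven by the qtest-protocol message you
suggest instead; either way the point is that the client gets an
explicit "monitor suspended" signal to synchronize on.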

> 
> >> > Markus, do you want me to repost a new version with this change?  Is
> >> > it still possible to have the oob-default series for 3.1?
> >> 
> >> We're trying to, but we need to get the test to work.
> >> 
> >> Two problems:
> >> 
> >> 1. The test doesn't seem to succeed at testing "queue full".
> >> 
> >> 2. The test might still be racy.
> >> 
> >> I suspect the code we test is solid, and it's "only" the test we screwed
> >> up.
> >
> > I agree.  Do you have a better idea for testing queue full?  I'd be
> > more than glad to try any suggestion; otherwise I'm thinking we could
> > simply drop the queue-full test (the last patch of the oob series) for
> > now and merge the rest.  We can introduce a new queue-full test after
> > 3.1 if we want, though again I really doubt we can do it without
> > introducing something new into the procedure.
> >
> > Thoughts?
> 
> I'm afraid dropping the test (for now?) is what we have to do.

Agreed.

Thanks,

-- 
Peter Xu
