On Thu, Mar 31, 2022, 10:37 AM John Snow <js...@redhat.com> wrote:

>
>
> On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizh...@suse.de> wrote:
>
>> On 3/31/22 08:10, Hanna Reitz wrote:
>> > On 29.03.22 18:49, Li Zhang wrote:
>> >> Update what I observed.
>> >>
>> >> It seems that aqmp is not stable when running test cases.
>> >> So I revert the patches as the following, iotest40/41 test cases work
>> >> well.
>> >
>> > Thanks for bisecting.  I haven’t seen this problem before, so I didn’t
>> > look into it; CC-ing John, perhaps he figures something.
>> >
>> Thanks a lot.
>> It happens occassionally. I didn't find a way to reproduce it manually.
>>
>
> Sorry for the instability.
>
> A few questions then:
>
> - what operating system and version
> - kernel version
> - architecture (looks like x86 and others?)
> - what commit/version of QEMU are you testing?
> - what python version are you using to run iotests?
>

I'll also point out these patchsets, because we're in different timezones
and if I'm lucky it will be the answer:

[1] https://patchew.org/QEMU/20220201041134.1237016-1-js...@redhat.com/

[2] https://patchew.org/QEMU/20220225205948.3693480-1-js...@redhat.com/

#1 wasn't sufficient to fix the problem, but I had hoped #2 was. The second
patchset was merged somewhat recently, just before rc testing for 7.0.

Thanks,
--js



>
>
>> > Hanna
>> >
>> >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>> >>
>> >> Author: John Snow <js...@redhat.com>
>> >>
>> >> Date:   Tue Oct 26 13:56:12 2021 -0400
>> >>
>> >>
>> >>
>> >>     python, iotests: replace qmp with aqmp
>> >>
>> >>
>> >>
>> >>     Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with
>> >> the sync
>> >>
>> >>     wrapper from qemu.aqmp instead.
>> >>
>> >>
>> >>
>> >>     Add an escape hatch in the form of the environment variable
>> >>
>> >>     QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into
>> >> using
>> >>
>> >>     the old implementation, proving that both implementations work
>> >>
>> >>     concurrently.
>> >>
>> >>
>> >>
>> >>     Signed-off-by: John Snow <js...@redhat.com>
>> >>
>> >>     Reviewed-by: Kevin Wolf <kw...@redhat.com>
>> >>
>> >>     Reviewed-by: Hanna Reitz <hre...@redhat.com>
>> >>
>> >>     Message-id: 20211026175612.4127598-9-js...@redhat.com
>> >>
>> >>     Signed-off-by: John Snow <js...@redhat.com>
>> >>
>> >>
>> >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>> >>
>> >> Author: John Snow <js...@redhat.com>
>> >>
>> >> Date:   Thu Nov 18 15:46:18 2021 -0500
>> >>
>> >>
>> >>
>> >>     python/machine: handle "fast" QEMU terminations
>> >>
>> >>
>> >>
>> >>     In the case that the QEMU process actually launches -- but then
>> >> dies so
>> >>
>> >>     quickly that we can't establish a QMP connection to it --
>> QEMUMachine
>> >>
>> >>     currently calls _post_shutdown() assuming that it never launched
>> >> the VM
>> >>
>> >>     process.
>> >>
>> >>
>> >>
>> >>     This isn't true, though: it "merely" may have failed to establish
>> >> a QMP
>> >>
>> >>     connection and the process is in the middle of its own exit path.
>> >>
>> >>
>> >>
>> >>     If we don't wait for the subprocess, the caller may get a bogus
>> >> `None`
>> >>
>> >>     return for .exitcode(). This behavior was observed from
>> >>
>> >>     device-crash-test; after the switch to Async QMP, the timings were
>> >>
>> >>     changed such that it was now seemingly possible to witness the
>> >> failure
>> >>
>> >>     of "vm.launch()" *prior* to the exitcode becoming available.
>> >>
>> >>
>> >>
>> >>     The semantic of the `_launched` property is changed in this
>> >>
>> >>     patch. Instead of representing the condition "launch() executed
>> >>
>> >>     successfully", it will now represent "has forked a child process
>> >>
>> >>     successfully". This way, wait() when called in the exit path won't
>> >>
>> >>     become a no-op.
>> >>
>> >>
>> >>
>> >>     Signed-off-by: John Snow <js...@redhat.com>
>> >>
>> >>     Reviewed-by: Willian Rampazzo <willi...@redhat.com>
>> >>
>> >>     Message-id: 20211118204620.1897674-6-js...@redhat.com
>> >>
>> >>     Signed-off-by: John Snow <js...@redhat.com>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 3/25/22 11:17, Li Zhang wrote:
>> >>> Hi,
>> >>>
>> >>> I backport some iotests patches to the tree and change timeout.
>> >>> It doesn't work.
>> >>>
>> >>> Sometimes, iotest41 also reports the errors.
>> >>> [ 1347s]
>> >>>
>> +======================================================================
>> >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
>> (__main__.TestSingleDrive)
>> >>> [ 1347s]
>> >>>
>> +----------------------------------------------------------------------
>> >>> [ 1347s] +Traceback (most recent call last):
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>> line 399, in launch
>> >>> [ 1347s] +    self._launch()
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>> line 434, in _launch
>> >>> [ 1347s] +    self._post_launch()
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line
>> >>> 147, in _post_launch
>> >>> [ 1347s] +    super()._post_launch()
>> >>> [ 1347s] +  File
>> >>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>> line 340, in _post_launch
>> >>> [ 1347s] +    self._qmp.accept(self._qmp_timer)
>> >>> [ 1347s] +  File
>> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>> line 69, in accept
>> >>> [ 1347s] +    timeout
>> >>> [ 1347s] +  File
>> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>> line 42, in _sync
>> >>> [ 1347s] +    asyncio.wait_for(future, timeout=timeout)
>> >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/base_events.py", line
>> >>> 488, in run_until_complete
>> >>> [ 1347s] +    return future.result()
>> >>> [ 1347s] +  File "/usr/lib64/python3.6/asyncio/tasks.py", line 362,
>> >>> in wait_for
>> >>> [ 1347s] +    raise futures.TimeoutError()
>> >>> [ 1347s] +concurrent.futures._base.TimeoutError
>> >>>
>> >>>
>> >>> I can see other errors like this, it's the problem of the socket.
>> >>>
>> >>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>> >>> [ 1535s] **
>> >>> [ 1535s]
>> >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
>> >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>> >>>
>> >>>
>> >>> The script is running the command as this:
>> >>> /usr/bin/make -O -j4 check-block V=1
>> >>>
>> >>> I can see the errors on ppc, arm or x86.
>> >>> But I couldn't reproduce it when I run it manually.
>> >>>
>> >>> It will be appreciated if any suggestions. Thanks.
>> >>>
>> >>>
>> >>> On 3/24/22 14:47, Li Zhang wrote:
>> >>>> Hi,
>> >>>>
>> >>>> When I run the testsuit on our buidling system, it reports a timeout
>> >>>> sometimes not always as the following.
>> >>>> It couldn't connect qmp socket. Any ideas about this problem?
>> >>>>
>> >>>>
>> >>>> [ 1989s] ---
>> >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>> >>>> [ 1989s] +++ 040.out.bad
>> >>>> [ 1989s] @@ -1,5 +1,55 @@
>> >>>> [ 1989s]
>> >>>> -.................................................................
>> >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>> >>>> establish connection: asyncio.exceptions.CancelledError
>> >>>> [ 1989s]
>> >>>>
>> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
>>
>> >>>> to establish connection: asyncio.exceptions.CancelledError
>> >>>> [ 1989s] +E.........................
>> >>>> [ 1989s]
>> >>>>
>> +======================================================================
>> >>>> [ 1989s] +ERROR: test_commit_node
>> (__main__.TestActiveZeroLengthImage)
>> >>>> [ 1989s]
>> >>>>
>> +----------------------------------------------------------------------
>> >>>> [ 1989s] +Traceback (most recent call last):
>> >>>> [ 1989s] +  File
>> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>> >>>> line 94, in setUp
>> >>>> [ 1989s] +    self.vm.launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>>> line 399, in launch
>> >>>> [ 1989s] +    self._launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>>> line 434, in _launch
>> >>>> [ 1989s] +    self._post_launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>> >>>> line 147, in _post_launch
>> >>>> [ 1989s] +    super()._post_launch()
>> >>>> [ 1989s] +  File
>> >>>>
>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>> >>>> line 340, in _post_launch
>> >>>> [ 1989s] +    self._qmp.accept(self._qmp_timer)
>> >>>> [ 1989s] +  File
>> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>>> line 67, in accept
>> >>>> [ 1989s] +    self._sync(
>> >>>> [ 1989s] +  File
>> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>> >>>> line 41, in _sync
>> >>>> [ 1989s] +    return self._aloop.run_until_complete(
>> >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/base_events.py", line
>> >>>> 616, in run_until_complete
>> >>>> [ 1989s] +    return future.result()
>> >>>> [ 1989s] +  File "/usr/lib64/python3.8/asyncio/tasks.py", line 501,
>> >>>> in wait_for
>> >>>> [ 1989s] +    raise exceptions.TimeoutError()
>> >>>> [ 1989s] +asyncio.exceptions.TimeoutError
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>

Reply via email to