On Thu, Mar 31, 2022, 10:37 AM John Snow <js...@redhat.com> wrote: > > > On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizh...@suse.de> wrote: > >> On 3/31/22 08:10, Hanna Reitz wrote: >> > On 29.03.22 18:49, Li Zhang wrote: >> >> Update what I observed. >> >> >> >> It seems that aqmp is not stable when running test cases. >> >> So I revert the patches as the following, iotest40/41 test cases work >> >> well. >> > >> > Thanks for bisecting. I haven’t seen this problem before, so I didn’t >> > look into it; CC-ing John, perhaps he figures something. >> > >> Thanks a lot. >> It happens occassionally. I didn't find a way to reproduce it manually. >> > > Sorry for the instability. > > A few questions then: > > - what operating system and version > - kernel version > - architecture (looks like x86 and others?) > - what commit/version of QEMU are you testing? > - what python version are you using to run iotests? >
I'll also point out these patchsets, because we're in different timezones and if I'm lucky it will be the answer: [1] https://patchew.org/QEMU/20220201041134.1237016-1-js...@redhat.com/ [2] https://patchew.org/QEMU/20220225205948.3693480-1-js...@redhat.com/ #1 wasn't sufficient to fix the problem, but I had hoped #2 was. The second patchset was merged somewhat recently, just before rc testing for 7.0. Thanks, --js > > >> > Hanna >> > >> >> commit 76cd358671e6b8e7c435ec65b1c44200254514a9 >> >> >> >> Author: John Snow <js...@redhat.com> >> >> >> >> Date: Tue Oct 26 13:56:12 2021 -0400 >> >> >> >> >> >> >> >> python, iotests: replace qmp with aqmp >> >> >> >> >> >> >> >> Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with >> >> the sync >> >> >> >> wrapper from qemu.aqmp instead. >> >> >> >> >> >> >> >> Add an escape hatch in the form of the environment variable >> >> >> >> QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into >> >> using >> >> >> >> the old implementation, proving that both implementations work >> >> >> >> concurrently. >> >> >> >> >> >> >> >> Signed-off-by: John Snow <js...@redhat.com> >> >> >> >> Reviewed-by: Kevin Wolf <kw...@redhat.com> >> >> >> >> Reviewed-by: Hanna Reitz <hre...@redhat.com> >> >> >> >> Message-id: 20211026175612.4127598-9-js...@redhat.com >> >> >> >> Signed-off-by: John Snow <js...@redhat.com> >> >> >> >> >> >> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5 >> >> >> >> Author: John Snow <js...@redhat.com> >> >> >> >> Date: Thu Nov 18 15:46:18 2021 -0500 >> >> >> >> >> >> >> >> python/machine: handle "fast" QEMU terminations >> >> >> >> >> >> >> >> In the case that the QEMU process actually launches -- but then >> >> dies so >> >> >> >> quickly that we can't establish a QMP connection to it -- >> QEMUMachine >> >> >> >> currently calls _post_shutdown() assuming that it never launched >> >> the VM >> >> >> >> process. >> >> >> >> >> >> >> >> This isn't true, though: it "merely" may have failed to establish >> >> a QMP >> >> >> >> connection and the process is in the middle of its own exit path. >> >> >> >> >> >> >> >> If we don't wait for the subprocess, the caller may get a bogus >> >> `None` >> >> >> >> return for .exitcode(). This behavior was observed from >> >> >> >> device-crash-test; after the switch to Async QMP, the timings were >> >> >> >> changed such that it was now seemingly possible to witness the >> >> failure >> >> >> >> of "vm.launch()" *prior* to the exitcode becoming available. >> >> >> >> >> >> >> >> The semantic of the `_launched` property is changed in this >> >> >> >> patch. Instead of representing the condition "launch() executed >> >> >> >> successfully", it will now represent "has forked a child process >> >> >> >> successfully". This way, wait() when called in the exit path won't >> >> >> >> become a no-op. >> >> >> >> >> >> >> >> Signed-off-by: John Snow <js...@redhat.com> >> >> >> >> Reviewed-by: Willian Rampazzo <willi...@redhat.com> >> >> >> >> Message-id: 20211118204620.1897674-6-js...@redhat.com >> >> >> >> Signed-off-by: John Snow <js...@redhat.com> >> >> >> >> >> >> >> >> >> >> >> >> On 3/25/22 11:17, Li Zhang wrote: >> >>> Hi, >> >>> >> >>> I backport some iotests patches to the tree and change timeout. >> >>> It doesn't work. >> >>> >> >>> Sometimes, iotest41 also reports the errors. >> >>> [ 1347s] >> >>> >> +====================================================================== >> >>> [ 1347s] +ERROR: test_top_node_in_wrong_chain >> (__main__.TestSingleDrive) >> >>> [ 1347s] >> >>> >> +---------------------------------------------------------------------- >> >>> [ 1347s] +Traceback (most recent call last): >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>> line 399, in launch >> >>> [ 1347s] + self._launch() >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>> line 434, in _launch >> >>> [ 1347s] + self._post_launch() >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", line >> >>> 147, in _post_launch >> >>> [ 1347s] + super()._post_launch() >> >>> [ 1347s] + File >> >>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>> line 340, in _post_launch >> >>> [ 1347s] + self._qmp.accept(self._qmp_timer) >> >>> [ 1347s] + File >> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>> line 69, in accept >> >>> [ 1347s] + timeout >> >>> [ 1347s] + File >> >>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>> line 42, in _sync >> >>> [ 1347s] + asyncio.wait_for(future, timeout=timeout) >> >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/base_events.py", line >> >>> 488, in run_until_complete >> >>> [ 1347s] + return future.result() >> >>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py", line 362, >> >>> in wait_for >> >>> [ 1347s] + raise futures.TimeoutError() >> >>> [ 1347s] +concurrent.futures._base.TimeoutError >> >>> >> >>> >> >>> I can see other errors like this, it's the problem of the socket. >> >>> >> >>> [ 1535s] socket_accept failed: Resource temporarily unavailable >> >>> [ 1535s] ** >> >>> [ 1535s] >> >>> ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: >> >>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0) >> >>> >> >>> >> >>> The script is running the command as this: >> >>> /usr/bin/make -O -j4 check-block V=1 >> >>> >> >>> I can see the errors on ppc, arm or x86. >> >>> But I couldn't reproduce it when I run it manually. >> >>> >> >>> It will be appreciated if any suggestions. Thanks. >> >>> >> >>> >> >>> On 3/24/22 14:47, Li Zhang wrote: >> >>>> Hi, >> >>>> >> >>>> When I run the testsuit on our buidling system, it reports a timeout >> >>>> sometimes not always as the following. >> >>>> It couldn't connect qmp socket. Any ideas about this problem? >> >>>> >> >>>> >> >>>> [ 1989s] --- >> >>>> /home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out >> >>>> [ 1989s] +++ 040.out.bad >> >>>> [ 1989s] @@ -1,5 +1,55 @@ >> >>>> [ 1989s] >> >>>> -................................................................. >> >>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to >> >>>> establish connection: asyncio.exceptions.CancelledError >> >>>> [ 1989s] >> >>>> >> +E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed >> >> >>>> to establish connection: asyncio.exceptions.CancelledError >> >>>> [ 1989s] +E......................... >> >>>> [ 1989s] >> >>>> >> +====================================================================== >> >>>> [ 1989s] +ERROR: test_commit_node >> (__main__.TestActiveZeroLengthImage) >> >>>> [ 1989s] >> >>>> >> +---------------------------------------------------------------------- >> >>>> [ 1989s] +Traceback (most recent call last): >> >>>> [ 1989s] + File >> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040", >> >>>> line 94, in setUp >> >>>> [ 1989s] + self.vm.launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>>> line 399, in launch >> >>>> [ 1989s] + self._launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>>> line 434, in _launch >> >>>> [ 1989s] + self._post_launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py", >> >>>> line 147, in _post_launch >> >>>> [ 1989s] + super()._post_launch() >> >>>> [ 1989s] + File >> >>>> >> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py", >> >>>> line 340, in _post_launch >> >>>> [ 1989s] + self._qmp.accept(self._qmp_timer) >> >>>> [ 1989s] + File >> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>>> line 67, in accept >> >>>> [ 1989s] + self._sync( >> >>>> [ 1989s] + File >> >>>> "/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py", >> >>>> line 41, in _sync >> >>>> [ 1989s] + return self._aloop.run_until_complete( >> >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/base_events.py", line >> >>>> 616, in run_until_complete >> >>>> [ 1989s] + return future.result() >> >>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py", line 501, >> >>>> in wait_for >> >>>> [ 1989s] + raise exceptions.TimeoutError() >> >>>> [ 1989s] +asyncio.exceptions.TimeoutError >> >>>> >> >>> >> >>> >> >> >> > >> >> >>