Stefan Berger <stef...@linux.ibm.com> writes:

> On 10/15/24 6:02 PM, Fabiano Rosas wrote:
>> Stefan Berger <stef...@linux.ibm.com> writes:
>>
>>> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>>>> Stefan Berger <stef...@linux.ibm.com> writes:
>>>>
>>>>>
>>>>> So this here is failing for you every time?
>>>>>
>>>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>>>
>>>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>>>> fails. I just ran each of these in a terminal window:
>>>>
>>>> $ for i in $(seq 1 999); do echo "$i =============";
>>>> QTEST_QEMU_BINARY=./qemu-system-aarch64
>>>> ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>>>
>>> On my Fedora 40 host this command line here alone has been running for
>>> 250 loop iterations now and is still continuing.
>>>
>>>> $ make -j$(nproc) check
>>>
>>> So this needs to be run in parallel to the above command line to cause
>>> the failure?
>>>
>>
>> Yes, I've been using that method to reproduce live migration race
>> conditions as well. It's quite effective.
>>
>> If you don't think you'll be able to find the root cause due to the
>> unreproducibility on your side, maybe we could at least add an assert
>> that bcount is not larger than rsp_size. I think that would at least
>> give an explicit error instead of a buffer overflow.
>>
>> I can also try to dig deeper into this when I get some time. At the
>> moment I know nothing about the tpm device emulation.
>>
>
> The loop has run 3000 times by itself, so that part is stable. However,
> it seems there is some other test case that the loop cannot run in
> parallel with. So, yes, there is 'something'. ... ... Just having all
> CPUs in a system busy requires waiting for migration to be complete on
> the dst_qemu side as well.
> Can you try it with this patch:
>
> diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
> index fb94496bbd..b52cd44841 100644
> --- a/tests/qtest/tpm-tests.c
> +++ b/tests/qtest/tpm-tests.c
> @@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char
> *src_tpm_path,
>
>      tpm_util_migrate(src_qemu, uri);
>      tpm_util_wait_for_migration_complete(src_qemu);
> +    tpm_util_wait_for_migration_complete(dst_qemu);
>
>      tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
>                       sizeof(tpm_pcrread_resp));
>
> For me this fixes the issue I had seen where reading the STS register
> was done too early, before all the TPM TIS state was completely
> restored: the active locality was -1, STS returned 0xffffffff, and
> from then on things went bad.
Thanks, that fixes the issue. Could you send a patch please?