Stefan Berger <stef...@linux.ibm.com> writes:

> On 10/15/24 6:02 PM, Fabiano Rosas wrote:
>> Stefan Berger <stef...@linux.ibm.com> writes:
>> 
>>> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>>>> Stefan Berger <stef...@linux.ibm.com> writes:
>>>>
>>>
>>>>>
>>>>> So this here is failing for you every time?
>>>>>
>>>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>>>
>>>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>>>> fails. I just ran each of these in a terminal window:
>>>>
>>>> $ for i in $(seq 1 999); do echo "$i =============";  
>>>> QTEST_QEMU_BINARY=./qemu-system-aarch64 
>>>> ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>>>
>>> On my Fedora 40 host this command line here alone has been running for
>>> 250 loop iterations now and is still continuing.
>>>
>>>> $ make -j$(nproc) check
>>>
>>> So this needs to be run in parallel to the above command line to cause
>>> the failure?
>>>
>> 
>> Yes, I've been using that method to reproduce live migration race
>> conditions as well. It's quite effective.
>> 
>> If you don't think you'll be able to find the root cause, given that
>> it doesn't reproduce on your side, maybe we could at least add an
>> assert that bcount is not larger than rsp_size. That would give an
>> explicit error instead of a buffer overflow.
>> 
>> I can also try to dig deeper into this when I get some time. At the
>> moment I know nothing about the tpm device emulation.
>> 
>
> The loop has run 3000 times by itself, so that part is stable. However,
> it seems there is some other test case that the loop cannot run in
> parallel with, so yes, there is 'something'. With all CPUs in the
> system busy, the test also has to wait for migration to be complete on
> the dst_qemu side. Can you try it with this patch:
>
> diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
> index fb94496bbd..b52cd44841 100644
> --- a/tests/qtest/tpm-tests.c
> +++ b/tests/qtest/tpm-tests.c
> @@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
>
>       tpm_util_migrate(src_qemu, uri);
>       tpm_util_wait_for_migration_complete(src_qemu);
> +    tpm_util_wait_for_migration_complete(dst_qemu);
>
>       tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
>                        sizeof(tpm_pcrread_resp));
>
>
> For me this fixes the issue I had seen, where the STS register was read 
> too early, before all of the TPM TIS state had been restored. The 
> active locality was -1, STS returned 0xffffffff, and from then on 
> things went bad.

Thanks, that fixes the issue. Could you send a patch please?
