On 20/02/2024 02.13, Nicholas Piggin wrote:
On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
On 19/02/2024 07.17, Nicholas Piggin wrote:
The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
aligned ranges forgot to bring the TCG TLB up to date after clearing
some of the dirty memory bitmap bits. This can result in stores though
the TCG TLB not setting the dirty memory bitmap and ultimately causes
memory corruption / lost updates during migration from a TCG host.

Fix this by exporting an abstracted function to call when dirty bits
have been cleared.

Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
Signed-off-by: Nicholas Piggin <npig...@gmail.com>
---

Sounds promising! ... but it doesn't seem to fix the migration-test qtest
with s390x when it gets enabled again:

Did it fix kvm-unit-tests for you?

It does, indeed! With your QEMU patch here, your new selftest-migration test of the k-u-t is working reliably with TCG now, indeed. Thus feel free to add:

Tested-by: Thomas Huth <th...@redhat.com>

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
           return g_test_run();
       }

-    /*
-     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
-     * there until the problems are resolved
-     */
-    if (g_str_equal(arch, "s390x") && !has_kvm) {
-        g_test_message("Skipping test: s390x host with KVM is required");
-        return g_test_run();
-    }
-
       tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
       if (!tmpfs) {
           g_test_message("Can't create temporary directory in %s: %s",

I wonder whether there is more stuff like this necessary somewhere?

Possibly. That's what the commit logs for the TCG disable indicate. I
have found another dirty bitmap TCG race too. I'll send it out after
some more testing.

Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to
see whether that works fine now?

Hmm, I did try and so far ppc64 is not failing even with upstream QEMU.

Oh, indeed! Actually, now that you mentioned it, I remembered that I checked it a couple of weeks ago already:

https://lore.kernel.org/qemu-devel/7d4f5624-83d2-4330-9315-b23869529...@redhat.com/

I'll try with s390x. Any additional build or runtime options to make it
break? How long does it take for breakage to be evident?

For me, it normally breaks after running the migration test a couple of few times already, let's say one time out of ten runs?

 Thomas


Reply via email to