Yu,
On 07/03/2024 00:30, Philippe Mathieu-Daudé wrote: > Cc'ing RDMA migration reviewers/maintainers: > > $ ./scripts/get_maintainer.pl -f migration/rdma.c > Li Zhijian <lizhij...@fujitsu.com> (reviewer:RDMA Migration) > Peter Xu <pet...@redhat.com> (maintainer:Migration) > Fabiano Rosas <faro...@suse.de> (maintainer:Migration) > > On 5/3/24 22:32, Yu Zhang wrote: >> Hello Het and all, >> >> while I was testing qemu-8.2, I saw a lot of our migration test cases failed. >> After debugging the commits of the 8.2 branch, I saw the issue and mad a >> diff: >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index 6a29e53daf..f10d56f556 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -3353,9 +3353,9 @@ static int qemu_rdma_accept(RDMAContext *rdma) >> goto err_rdma_dest_wait; >> } >> >> - isock->host = rdma->host; >> + isock->host = g_strdup_printf("%s", rdma->host); >> isock->port = g_strdup_printf("%d", rdma->port); Thanks for your analysis. It will be great if you send this as a patch. isock is defined as a _autoptr VVV 3333 _autoptr(InetSocketAddress) isock = g_new0(InetSocketAddress, 1); I'm surprised that it seems the auto free scheme will free the member of isock as well see below valrind log. That will cause a double free. ==809138== Invalid free() / delete / delete[] / realloc() ==809138== at 0x483A9F5: free (vg_replace_malloc.c:538) ==809138== by 0x598F70C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8) ==809138== by 0x79B6AD: qemu_rdma_cleanup (rdma.c:2432) ==809138== by 0x79CEE6: qio_channel_rdma_close_rcu (rdma.c:3108) ==809138== by 0xC2E339: call_rcu_thread (rcu.c:301) ==809138== by 0xC2116A: qemu_thread_start (qemu-thread-posix.c:541) ==809138== by 0x72683F8: ??? (in /usr/lib64/libpthread-2.32.so) ==809138== by 0x73824C2: clone (in /usr/lib64/libc-2.32.so) ==809138== Address 0x13daa070 is 0 bytes inside a block of size 14 free'd ==809138== at 0x483A9F5: free (vg_replace_malloc.c:538) ==809138== by 0x598F70C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8) ==809138== by 0xC058CF: qapi_dealloc_type_str (qapi-dealloc-visitor.c:68) ==809138== by 0xC09EF3: visit_type_str (qapi-visit-core.c:349) ==809138== by 0xBDDECC: visit_type_InetSocketAddressBase_members (qapi-visit-sockets.c:29) ==809138== by 0xBDE055: visit_type_InetSocketAddress_members (qapi-visit-sockets.c:67) ==809138== by 0xBDE30D: visit_type_InetSocketAddress (qapi-visit-sockets.c:119) ==809138== by 0xBDDB38: qapi_free_InetSocketAddress (qapi-types-sockets.c:51) ==809138== by 0x792351: glib_autoptr_clear_InetSocketAddress (qapi-types-sockets.h:109) ==809138== by 0x79236F: glib_autoptr_cleanup_InetSocketAddress (qapi-types-sockets.h:109) ==809138== by 0x79D956: qemu_rdma_accept (rdma.c:3341) ==809138== by 0x79F05A: rdma_accept_incoming_migration (rdma.c:4041) ==809138== Block was alloc'd at ==809138== at 0x4839809: malloc (vg_replace_malloc.c:307) ==809138== by 0x5992BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) ==809138== by 0x59A7FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8) ==809138== by 0x79C2A8: qemu_rdma_data_init (rdma.c:2731) ==809138== by 0x79F183: rdma_start_incoming_migration (rdma.c:4081) ==809138== by 0x76F200: qemu_start_incoming_migration (migration.c:581) ==809138== by 0x77193A: qmp_migrate_incoming (migration.c:1735) ==809138== by 0x74B3D3: qmp_x_exit_preconfig (vl.c:2718) ==809138== by 0x74DB6F: qemu_init (vl.c:3753) ==809138== by 0xA14F3F: main (main.c:47) Thanks Zhijian >> >> which was introduced by the commit below: >> >> commit 3fa9642ff7d51f7fc3ba68e6ccd13a939d5bd609 (HEAD) >> Author: Het Gala <het.g...@nutanix.com> >> Date: Mon Oct 23 15:20:45 2023 -0300 >> >> migration: convert rdma backend to accept MigrateAddress >> >> RDMA based transport backend for 'migrate'/'migrate-incoming' QAPIs >> accept new wire protocol of MigrateAddress struct. >> >> It is achived by parsing 'uri' string and storing migration parameters >> required for RDMA connection into well defined InetSocketAddress struct. >> ... >> >> A debug line >> isock->host = rdma->host; >> isock->port = g_strdup_printf("%d", rdma->port); >> +fprintf(stdout, "QEMU: %s, host %s, port %s\n", __func__, >> isock->host, isock->port); >> >> produced this error: >> QEMU: qemu_rdma_accept, host ::, port 8089 >> corrupted size vs. prev_size in fastbins >> >> on the target host, which may indicate a crash related to the memory >> allocation or a memory >> corruption of the data. With the patch, it doesn't happen any more, >> and the migration is fine. >> Could you be kind to test this and confirm the issue? >> >> Furthermore, I'm confused by the two struct: >> >> struct InetSocketAddressBase { >> char *host; >> char *port; >> }; >> >> struct InetSocketAddress { >> /* Members inherited from InetSocketAddressBase: */ >> char *host; >> char *port; >> >> To my understanding, they are used to consolidate the separated data >> to a well-defined >> struct "MigrateAddress", while the struct whose member receive their >> data has a different type: >> >> typedef struct RDMAContext { >> char *host; >> int port; >> ... >> } >> >> Is there any reason to keep "port" like this (char* instead of int) or >> can we improve it? >> Thank you so much for any of your comments! >> >> Best regards, >> >> Yu Zhang @ IONOS Compute Platform >> 05.03.2024 >> >