On Tue, May 16, 2023 at 02:37:00PM +0200, Laszlo Ersek wrote:
> Something is not adding up.
>
> * I've run "ldd" on my locally built virt-v2v binary, to learn what shared
> libraries it uses. Then I located all the packages (installed RPMs) providing
> those libraries (symlinks in fact), using "rpm -qf". Then I installed the
> debuginfo packages for each of those RPMs.
I've just tried it on RHEL 9 with upstream virt-v2v + commit c0bb624a151b.
I'm seeing some failures but they look quite different to yours and all
seem to be caused by a single leak in libvirt or how we use libvirt (at
least potentially, I've not investigated, and I don't see this happening
in Fedora).

I have:

glibc-2.34-67.el9.x86_64
glibc-debuginfo-2.34-67.el9.x86_64
glibc-debugsource-2.34-67.el9.x86_64
valgrind-3.19.0-3.el9.x86_64
valgrind-devel-3.19.0-3.el9.x86_64
libvirt-9.3.0-1.el9.x86_64
libvirt-debuginfo-9.3.0-1.el9.x86_64
libvirt-debugsource-9.3.0-1.el9.x86_64

How many of the tests fail for you? Just a small number or all of them?
If it's a small number, which ones?

Rich.

> I *still* get stack dumps like the following (taken from
> "tests/test-v2v-fedora-luks-on-lvm-conversion.sh.log"):
>
> ==34448== Conditional jump or move depends on uninitialised value(s)
> ==34448==    at 0x40191DD: __GI___tunables_init (dl-tunables.c:211)
> ==34448==    by 0x4020056: _dl_sysdep_start (dl-sysdep.c:110)
> ==34448==    by 0x4021A07: _dl_start (rtld.c:502)
> ==34448==    by 0x4020AD7: ??? (in /usr/lib64/ld-linux-x86-64.so.2)
> ==34448==    by 0xE: ???
> ==34448==    by 0x1FFEFFE352: ???
> ==34448==    by 0x1FFEFFE35B: ???
> ==34448==    by 0x1FFEFFE366: ???
> ==34448==    by 0x1FFEFFE369: ???
> ==34448==    by 0x1FFEFFE36E: ???
> ==34448==    by 0x1FFEFFE39F: ???
> ==34448==    by 0x1FFEFFE3A2: ???
> ==34448==    by 0x1FFEFFE3A7: ???
> ==34448==    by 0x1FFEFFE3AD: ???
> ==34448==    by 0x1FFEFFE3CA: ???
> ==34448==    by 0x1FFEFFE3D0: ???
> ==34448==    by 0x1FFEFFE3EB: ???
> ==34448==    by 0x1FFEFFE3F1: ???
> ==34448==    by 0x1FFEFFE40C: ???
> ==34448==    by 0x1FFEFFE412: ???
>
> Note the address 0x4020AD7. Valgrind itself says that the address is
> somewhere inside "/usr/lib64/ld-linux-x86-64.so.2". Problem is, I *do* have
> the debuginfo package installed (with correct version) for that binary.
> The binary comes from "glibc-2.34-40.el9_1.1.x86_64", and I've got the
> matching "glibc-debuginfo-2.34-40.el9_1.1.x86_64" package installed.
>
> * Now, from that kind of (useless) backtrace, I have four instances in this
> test case log, in total. However, there's a different kind too (just one
> instance):
>
> ==34448== Conditional jump or move depends on uninitialised value(s)
> ==34448==    at 0x484A608: strlen (vg_replace_strmem.c:495)
> ==34448==    by 0x5443D32: strdup (strdup.c:41)
> ==34448==    by 0x4F09819: guestfs_int_copy_string_list (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4F091DD: guestfs_int_copy_environ (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4EB6B67: run_command (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4EB778D: guestfs_int_cmd_run (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4EC7B10: qemu_img_supports_U_option (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4EC775A: get_json_output (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4EC745D: guestfs_impl_disk_format (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x4E8769C: guestfs_disk_format (in
> /home/lacos/src/v2v/libguestfs/lib/.libs/libguestfs.so.0.513.0)
> ==34448==    by 0x3B2A67: guestfs_int_ocaml_disk_format (in
> /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
> ==34448==    by 0x31B9D6: camlGuestfs__fun_12954 (guestfs.ml:1186)
> ==34448==    by 0x334370: camlStdlib__list__map_233 (in
> /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
> ==34448==    by 0x2AE27A: camlInput_disk__detect_local_input_format_217
> (input_disk.ml:142)
> ==34448==    by 0x2ADE82: camlInput_disk__setup_216 (input_disk.ml:88)
> ==34448==    by 0x28E671: camlV2v__main_202 (v2v.ml:552)
> ==34448==    by 0x2DD3C1: camlTools_utils__run_main_and_handle_errors_510
> (tools_utils.ml:228)
> ==34448==    by 0x290D07: camlV2v__entry (v2v.ml:700)
> ==34448==    by 0x27FB28: caml_program (in
> /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
> ==34448==    by 0x41AD53: caml_start_program (in
> /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
> ==34448==    by 0x41B166: caml_startup_common (in
> /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
> ==34448==    by 0x41B1AC: caml_startup (in
> /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
> ==34448==    by 0x27F16F: main (in /home/lacos/src/v2v/virt-v2v/v2v/virt-v2v)
>
> Here all addresses seem to be resolved, even those that point into my locally
> built libguestfs. What I don't understand however are the topmost two frames.
> I *think* those come from valgrind itself! So is valgrind complaining
> about... valgrind???
>
> "vg_replace_strmem.c" is definitely a valgrind source file. I've cloned the
> upstream git repo and checked -- it is "shared/vg_replace_strmem.c", and that
> file has existed since November 2013. Yet, when I install
> valgrind-debugsource and valgrind-debuginfo (matching the installed valgrind
> version -- "valgrind-3.19.0-3.el9.x86_64"), *none* of the files in those
> packages are "vg_replace_strmem.c".
>
> After downloading the SRPM from Brew and build-prepping it, I find, in
> "shared/vg_replace_strmem.c":
>
>     476 /*---------------------- strlen ----------------------*/
>     477
>     478 // Note that this replacement often doesn't get used because gcc inlines
>     479 // calls to strlen() with its own built-in version. This can be very
>     480 // confusing if you aren't expecting it. Other small functions in
>     481 // this file may also be inline by gcc.
>     482
>     483 #define STRLEN(soname, fnname) \
>     484    SizeT VG_REPLACE_FUNCTION_EZU(20070,soname,fnname) \
>     485       ( const char* str ); \
>     486    SizeT VG_REPLACE_FUNCTION_EZU(20070,soname,fnname) \
>     487       ( const char* str ) \
>     488    { \
>     489       SizeT i = 0; \
>     490       while (str[i] != 0) i++; \
>     491       return i; \
>     492    }
>     493
>     494 #if defined(VGO_linux)
>     495  STRLEN(VG_Z_LIBC_SONAME, strlen)
>
> So basically valgrind tries to preempt the strlen() symbol from glibc with
> its own implementation.
>
> Then, "strdup.c" is not a valgrind source file, but I found it from the glibc
> debug packages --
> "/usr/src/debug/glibc-2.34-40.el9_1.1.x86_64/string/strdup.c". (How
> *incredibly* useful of valgrind *not* to print the *full* pathname of a
> source file.) It goes like this:
>
>     37 /* Duplicate S, returning an identical malloc'd string. */
>     38 char *
>     39 __strdup (const char *s)
>     40 {
>     41   size_t len = strlen (s) + 1;
>     42   void *new = malloc (len);
>     43
>     44   if (new == NULL)
>     45     return NULL;
>     46
>     47   return (char *) memcpy (new, s, len);
>     48 }
>
> So guestfs_int_copy_string_list() calls strdup() calls strlen(), with strdup
> coming from glibc and strlen coming from valgrind itself. And then valgrind
> complains about its own strlen implementation (fun!), which is BTW an
> incorrect complaint, because the *C-language* code at lines 488-492 is proper.
>
> This whole thing looks completely busted. I'll try to fool around with glibc
> tunables.
>
> Laszlo

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
test-suite.log.xz
Description: application/xz