On 6/19/23 13:18, Vincent MAILHOL wrote:
> On Fri. 16 June 2023 at 16:34, Richard W.M. Jones <rjo...@redhat.com> wrote:
> (...)
>>> Last thing, the segfault on ldmtool [1] still seems a valid issue.
>>> Even if I now do have a workaround for my problem, that segfault might
>>> be worth a bit more investigation.
>>
>> Yes that does look like a real problem.  Does it crash if you just run
>> ldmtool as a normal command, nothing to do with libguestfs?  Might be
>> a good idea to try to get a stack trace of the crash.
> 
> The fact is that it only crashes with the UID 65534 in the qemu VM. I
> am not sure what command line is passed to ldmtool for this crash to
> occur.
> 
> I can help to gather information, but my biggest issue is that I do
> not know how to interact with the VM under /tmp/.guestfs-1001/
> 
>   [    0.777352] ldmtool[164]: segfault at 0 ip 0000563a225cd6a5 sp 00007ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
>                                          ^^^^ ^^^^^^^^^^^^^^^^^^^
> This smells like a NULL pointer dereference.

... Hey, this is actually my line from an email I started writing
earlier today :), but I then decided not to send it.

It certainly looks like a null pointer dereference, and disassembling
the instruction byte dump (the "Code:" line from the kernel log) with,
e.g., ndisasm confirms it. You get something like:

00000025  E8DBFDFFFF        call 0xfffffffffffffe05
0000002A  4C8B20            mov r12,[rax]              <---- crash
0000002D  4889442408        mov [rsp+0x8],rax
00000032  4C89E7            mov rdi,r12
00000035  E80BE1FFFF        call 0xffffffffffffe145

with the "mov r12,[rax]" instruction faulting (with the previously
called function presumably having returned 0 in rax). See the "<4c> 8b
20" substring in the "Code:" line -- the angle brackets point at the
first byte of the crashing instruction.
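
A minimal Python sketch of how the "Code:" line can be read, in case it
helps. The sample string is only an excerpt rebuilt from the five
instructions listed above, not the real "Code:" line from the log, and
the helper names parse_code_line and call_target are made up for
illustration. It locates the byte in angle brackets and shows why the
call targets look odd: ndisasm decodes the dump starting at offset 0, so
the relative call lands at a negative address wrapped around to 64 bits.

```python
import re


def parse_code_line(code):
    """Return (raw_bytes, fault_index) for a kernel 'Code:' hex dump.

    fault_index is the position of the byte wrapped in <..>, i.e. the
    first byte of the faulting instruction, or None if nothing is marked.
    """
    raw = bytearray()
    fault_index = None
    for token in code.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            fault_index = len(raw)
            token = marked.group(1)
        raw.append(int(token, 16))
    return bytes(raw), fault_index


def call_target(insn_offset, insn):
    """Target of a 5-byte 'call rel32' (opcode E8), wrapped to 64 bits."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    rel32 = int.from_bytes(insn[1:5], "little", signed=True)
    return (insn_offset + 5 + rel32) & 0xFFFFFFFFFFFFFFFF


# Excerpt rebuilt from the disassembly above (assumption, not the real log).
sample = "e8 db fd ff ff <4c> 8b 20 48 89 44 24 08"
raw, idx = parse_code_line(sample)
print(idx, raw[idx:idx + 3].hex())           # position and bytes of the faulting insn
print(hex(call_target(0x25, raw[0:5])))      # call located at dump offset 0x25
```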

I didn't send the email ultimately because your email included a link
[1] pointing at a particular line number:

https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164

and so I assumed you actually traced the crash to that line.

Is that the case?

Or did you perhaps mistake *PID* 164 (from the kernel log) for the line
number?

> The instruction pointer
> being 563a225cd6a5, I installed libguestfs-tools-dbgsym and tried a:
> 
>   addr2line -e /usr/bin/ldmtool 564a892506a5
> 
> Results:
> 
>   ??:0
> 
> Without conviction, I also tried in GDB:
> 
>   $ gdb /usr/bin/ldmtool
>   (...)
>   Reading symbols from /usr/bin/ldmtool...
>   Reading symbols from
> /usr/lib/debug/.build-id/21/37b4a64903ebe427c242be08b8d496ba570583.debug...
>   (gdb) info line *0x564a892506a5
>   No line number information available for address 0x564a892506a5
> 
> Debug symbols are correctly installed but impossible to convert that
> instruction pointer into a line number. It is as if the ldmtool on my
> host and the ldmtool in the qemu VM were from a different build. I
> tried to mount /tmp/.guestfs-1001/appliance.d/root but that disk image
> did not contain ldmtool.
> 
> I am not sure how to generate a stack trace or a core dump within that
> qemu VM. If you can tell me how to get an interactive prompt (or any
> other guidance) I can try to collect more information.

The IP where the crash occurs is 0000563a225cd6a5. According to the
kernel log, the ldmtool binary itself (as opposed to a shared object /
library) is mapped into the process's address space at 563a225cb000,
for a length of 0x3000 bytes. So the offending instruction sits at
offset 0000563a225cd6a5 - 563a225cb000 = 0x26A5 within that mapping.
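
The same subtraction as a small Python sketch, for anyone who wants to
repeat it on other segfault lines. The regular expression is an
assumption based on the single log line quoted in this thread, and
mapping_offset is a hypothetical helper name.

```python
import re

# Pattern modelled on the one kernel log line quoted in this thread.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<name>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def mapping_offset(line):
    """Return (object_name, ip - mapping_base) for a kernel segfault line."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    if not base <= ip < base + size:
        raise ValueError("ip lies outside the reported mapping")
    return m["name"], ip - base


LINE = ("[    0.777352] ldmtool[164]: segfault at 0 ip 0000563a225cd6a5 "
        "sp 00007ffe54965a60 error 4 in ldmtool[563a225cb000+3000]")

name, offset = mapping_offset(LINE)
print(name, hex(offset))
```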

With the debug symbols installed, can you attach the output of

  objdump --headers --wide -S /usr/bin/ldmtool

?

Can you try

  addr2line -p -i -f -e /usr/bin/ldmtool 26A5

?

(This still may not be good enough; we might have to add some base
address related to the .text section to the offset 0x26A5... The
objdump output should help us experiment.)
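
One possible form of that adjustment, as a hedged Python sketch. It
assumes that the mapping named in the kernel log is the executable
PT_LOAD segment and that the mapping starts at the page-aligned virtual
address of that segment. The segment address 0x2000 below is purely
hypothetical; the real value would have to come from the program
headers (objdump or readelf -l output), which are not available yet.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def link_time_address(offset_in_mapping, segment_vaddr, page_size=PAGE_SIZE):
    """Convert an offset inside a mapping to a link-time (file) address.

    segment_vaddr is the virtual address of the PT_LOAD segment backing
    the mapping; the mapping is assumed to begin at that address rounded
    down to a page boundary.
    """
    page_start = segment_vaddr & ~(page_size - 1)
    return page_start + offset_in_mapping


# HYPOTHETICAL segment address, for illustration only.
candidate = link_time_address(0x26A5, 0x2000)
print(hex(candidate))  # value one could then try with addr2line
```

If the real segment address turns out to be 0, the result is just the
plain offset 0x26A5 again.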

Laszlo
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs
