Re: [Libguestfs] Libguestfs Failure on latest Ubuntu 22.04 LTS

Laszlo Ersek Mon, 20 Mar 2023 23:35:46 -0700

Hi Justin,

On 3/20/23 16:47, Justin Churchey wrote:
> Hello Laszlo,
> 
> Thank you for the rundown. I enabled the
> additional LIBGUESTFS_BACKEND_SETTINGS, and I have attached a follow up
> to the libguestfs-test-tool output.


Your computer has faulty RAM.

Your libguestfs-test-tool log file contains the following line (read it
very carefully):

  LIBGUESTFS_BANKEND_SETTINGS=force_tcg

I was staring my eyes out at your log, not understanding why the
"force_tcg" setting wouldn't take effect -- because it didn't, the log
file confirms the repeated test run still used KVM.

That was when I copied and pasted the above line (before the equal sign)
into a git-grep, and then a "git log -S". It turns out that libguestfs
never checks the variable name that was captured in the
libguestfs-test-tool log; worse, libguestfs has *never* checked it over
its entire history.
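
For reference, that kind of search can be sketched roughly as below. This
is only an illustration: "identifier_known" is a made-up helper name, the
exact options (-q, -F, --oneline) are my own choice, and it has to be run
from inside a git checkout of the project being searched.

```shell
#!/bin/sh
# Sketch only: identifier_known is a hypothetical helper, not part of libguestfs.
# Returns 0 if NAME occurs in the tracked files or was ever added/removed in
# the history of the current branch, and 1 otherwise.
identifier_known() {
  name=$1
  # Working tree: -F treats the name as a fixed string, -q only sets the exit status.
  if git grep -q -F -e "$name"; then
    return 0
  fi
  # History: -S lists commits that changed the number of occurrences of the string.
  if [ -n "$(git log --oneline -S"$name")" ]; then
    return 0
  fi
  return 1
}
```

Calling "identifier_known LIBGUESTFS_BANKEND_SETTINGS" in a source tree
then reports, via its exit status, whether the name was ever referenced.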

So then I thought, "aha, Justin must have typed the variable name from
memory, instead of using the clipboard". But that's not possible: even
if you mistyped the variable name when setting the environment,
libguestfs-test-tool would not look for that (misnamed) variable, and
log it!

So the only explanation is that your RAM is faulty; a single character
in the variable name got corrupted in this instance (C -> N):

  LIBGUESTFS_BACKEND_SETTINGS
  LIBGUESTFS_BANKEND_SETTINGS
               ^
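
The position of the difference can also be found mechanically. The
following is a minimal sketch in POSIX shell; "first_diff" is a
hypothetical helper written just for this comparison.

```shell
#!/bin/sh
# Sketch only: first_diff is a hypothetical helper written for this illustration.
# Prints the 1-based position and the two differing characters for the first
# mismatch; returns 1 if the strings agree over their common length.
first_diff() {
  a=$1
  b=$2
  i=1
  while [ "$i" -le "${#a}" ] && [ "$i" -le "${#b}" ]; do
    # Extract the i-th character of each string.
    ca=$(printf '%s' "$a" | cut -c "$i")
    cb=$(printf '%s' "$b" | cut -c "$i")
    if [ "$ca" != "$cb" ]; then
      printf '%d %s %s\n' "$i" "$ca" "$cb"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Prints "14 C N": the 14th character differs, C versus N.
first_diff LIBGUESTFS_BACKEND_SETTINGS LIBGUESTFS_BANKEND_SETTINGS
```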

With faulty RAM, there's nothing more to investigate here; the guest
kernel crash (page fault) can be trivially explained by a pointer
getting corrupted and pointing into outer space.

I suggest running MemTest86 or MemTest86+.

(NB, faulty RAM is not as infrequent as one would think. In my life, if
I count right, this is actually the third occasion that I've determined
faulty RAM for a user -- not necessarily via the same program /
misbehavior, of course. Also I think a faulty disk is much less likely:
non-ECC RAM exists, but disks without redundancy checks don't /
shouldn't exist, as far as I know.)

Laszlo

> 
> I also checked out my CPU settings (cat /proc/cpuinfo output attached),
> and the host does appear to support PCLMULQDQ (AMD Ryzen 7 5700X).  I
> also checked the cpuinfo in one of the guests I have created (Ubuntu
> 18.04, unstable due to intermittent kernel panics), and the cpuinfo
> indicates that this feature seems to be passed down to my guest as well. 
> 
> I noticed that the libguestfs-test-tool didn't seem to like the qemu
> settings it tried to boot with.  So, I went back to basics and built a
> disk using qemu-img (qcow2) and utilized qemu-system-x86_64 to do the
> base install (Ubuntu 18.04).  The resulting image boots and I import the
> resulting image with virt-install. However, the GUI/console seems to
> want to lock up shortly after boot if I am using virt-tools.  The guest
> seems more stable when I boot it directly with `qemu-system`, and this
> may be my workaround for now. 
> 
> In virt-tools, I can consistently get a panic on the guest by trying to
> enable the qemu-guest-agent: `systemctl enable qemu-guest-agent`.
> Unfortunately, I cannot get the full output from that panic (attached).
> It would seem that this problem is more than just libguestfs-tools. Is
> there a KVM listserv that this might be more appropriate for?
> 
> Sincerely,
> 
> On Mon, Mar 20, 2023 at 1:31 AM Laszlo Ersek <ler...@redhat.com
> <mailto:ler...@redhat.com>> wrote:
> 
>     On 3/17/23 16:10, Justin Churchey wrote:
>     > Hello Everyone,
>     >
>     > I was having some difficulties converting OVA images yesterday. At
>     > first, I thought it may have been a compatibility issue with
>     > VirtualBox 7.0.  However, when I went to run libguestfs-test-tool, it
>     > began failing with the exact same error as the conversions, which
>     > leads me to believe the issue may lie with libguestfs and not the
>     > images themselves.
>     >
>     > To test further, I created a fresh install of Ubuntu 22.04, and the
>     > libguestfs-test-tool seems to fail with the same error, even on a
>     > fresh install.  I am attaching the libguestfs-test-tool output for
>     > reference.
>     >
>     > Ubuntu 22.04 is running libguestfs-tools 1.46.2-10ubuntu3
>     >
>     > If anybody has any insight into the issue, or if you feel a bug report
>     > needs to be filed, please let me know.
> 
>     Your appliance kernel crashes.
> 
>     Here's my theory on why this might happen, based on your log.
> 
>     The guestfish appliance runs with KVM acceleration.
> 
>     The crash happens after/while inserting the modules crc32-pclmul.ko and
>     crct10dif-pclmul.ko.
> 
>     The "pclmul" in the names of those modules indicates that these modules
>     calculate various (crc32) checksums with the PCLMULQDQ instruction. I
>     believe that PCLMULQDQ is an advanced / accelerated instruction and not
>     all CPUs may support it.
> 
>     Your appliance guest is started with "-cpu max" on the QEMU command line
>     (from libguestfs commit 30f74f38bd6e, "appliance: Use -cpu max.",
>     2021-01-28). This is probably why the appliance kernel thinks PCLMULQDQ
>     is available.
> 
>     I think the PCLMULQDQ instruction may cause an issue here. I don't know
>     why it misbehaves under KVM, but that's my suspicion anyway.
> 
>     Note that the kernel crash log provides the following instruction
>     (assembly binary) dump:
> 
>     46 70 48 8b 56 68 48 03 97 90 01 00 00 48 c1 e0 06 48 03 46 20 48 89 97
>     08 02 00 00 48 be ab aa aa aa aa aa aa aa 48 8b 48 10 <48> 89 0a 48 8b
>     50 20 48 8b 8f 08 02 00 00 48 89 d0 48 f7 e6 48 c1
> 
>     with the instruction starting at <48> causing the page fault, as the
>     direct symptom. Now, we can disassemble this:
> 
>     printf \
>       '%b' \
>       '\x46\x70\x48\x8b\x56\x68\x48\x03\x97\x90\x01\x00\x00\x48\xc1\xe0\x06\x48\x03\x46\x20\x48\x89\x97\x08\x02\x00\x00\x48\xbe\xab\xaa\xaa\xaa\xaa\xaa\xaa\xaa\x48\x8b\x48\x10\x48\x89\x0a\x48\x8b\x50\x20\x48\x8b\x8f\x08\x02\x00\x00\x48\x89\xd0\x48\xf7\xe6\x48\xc1' \
>       > bin
> 
>     $ ndisasm -b64 bin
> 
>     00000000  467048            jo 0x4b
>     00000003  8B5668            mov edx,[rsi+0x68]
>     00000006  48039790010000    add rdx,[rdi+0x190]
>     0000000D  48C1E006          shl rax,byte 0x6
>     00000011  48034620          add rax,[rsi+0x20]
>     00000015  48899708020000    mov [rdi+0x208],rdx
>     0000001C  48BEABAAAAAAAAAA  mov rsi,0xaaaaaaaaaaaaaaab
>              -AAAA
>     00000026  488B4810          mov rcx,[rax+0x10]
>     0000002A  48890A            mov [rdx],rcx        <----------- crash
>     0000002D  488B5020          mov rdx,[rax+0x20]
>     00000031  488B8F08020000    mov rcx,[rdi+0x208]
>     00000038  4889D0            mov rax,rdx
>     0000003B  48F7E6            mul rsi
>     0000003E  48                rex.w
>     0000003F  C1                db 0xc1
> 
>     Note the constant 0xaaaaaaaaaaaaaaab; that seems very special. We can
>     search the kernel tree for it (I'm not bothering about checking out the
>     particular ubuntu kernel version for now):
> 
>     $ git grep -i aaaaaaaaaaaaaaab
>     arch/x86/math-emu/poly_atan.c:/*  0xaaaaaaaaaaaaaaabLL,  transferred to fixedpterm[] */
>     arch/x86/math-emu/poly_sin.c:   0xaaaaaaaaaaaaaaabLL,
>     arch/x86/math-emu/poly_tan.c:static const unsigned long long twothirds = 0xaaaaaaaaaaaaaaabLL;
> 
>     In particular, the last file (poly_tan.c) contains a snippet like
> 
>             mul64_Xsig(&accum, &twothirds);
> 
>     which seems vaguely related to
> 
>     0000001C  48BEABAAAAAAAAAA  mov rsi,0xaaaaaaaaaaaaaaab
>              -AAAA
>     ...
>     0000003B  48F7E6            mul rsi
> 
>     Now this does not seem connected to PCLMULQDQ, but it does somehow look
>     connected to multiplication.
> 
>     I don't really know where to go with this, except for asking KVM
>     experts.
> 
>     For now, can you try:
> 
>       export LIBGUESTFS_BACKEND_SETTINGS=force_tcg
> 
>     from <https://libguestfs.org/guestfs.3.html#backend-settings>, and
>     see if that makes a difference?
> 
>     Laszlo
> 

_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs
