Public bug reported:

linux-image-5.13.0-39-generic:
  Installed: 5.13.0-39.44~20.04.1

Description:    Ubuntu 20.04.1 LTS
Release:        20.04

I use qemu to run short-lived Linux VMs as part of a CI pipeline, using
nested KVM on Intel CPUs. With fairly high probability, one of the qemu
processes managing the VMs exits without any output. I've been able to
track the behaviour to the L1 qemu receiving KVM_EXIT_SHUTDOWN from the
KVM_RUN ioctl:

    ...
    15268@1647341556.924605:kvm_run_exit cpu_index 0, reason 2
    15268@1647341556.928341:kvm_run_exit cpu_index 0, reason 8

    on QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.21)
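
For anyone reading the qemu trace above: the numeric "reason" values can be mapped back to names with a small script. This is only a sketch. The table is a subset of the KVM_EXIT_* constants from include/uapi/linux/kvm.h, and the function name decode_run_exits is made up for illustration.

```python
import re

# Subset of the KVM_EXIT_* values from include/uapi/linux/kvm.h.
KVM_EXIT_REASONS = {
    0: "KVM_EXIT_UNKNOWN",
    1: "KVM_EXIT_EXCEPTION",
    2: "KVM_EXIT_IO",
    3: "KVM_EXIT_HYPERCALL",
    4: "KVM_EXIT_DEBUG",
    5: "KVM_EXIT_HLT",
    6: "KVM_EXIT_MMIO",
    7: "KVM_EXIT_IRQ_WINDOW_OPEN",
    8: "KVM_EXIT_SHUTDOWN",
    9: "KVM_EXIT_FAIL_ENTRY",
    10: "KVM_EXIT_INTR",
}

# Matches qemu trace lines such as
# "15268@1647341556.928341:kvm_run_exit cpu_index 0, reason 8".
TRACE_RE = re.compile(r"kvm_run_exit cpu_index (\d+), reason (\d+)")


def decode_run_exits(lines):
    """Return a (cpu_index, reason_name) tuple for every kvm_run_exit line."""
    decoded = []
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            cpu, reason = int(match.group(1)), int(match.group(2))
            name = KVM_EXIT_REASONS.get(reason, f"unknown ({reason})")
            decoded.append((cpu, name))
    return decoded


# The two lines quoted in this report.
sample = [
    "15268@1647341556.924605:kvm_run_exit cpu_index 0, reason 2",
    "15268@1647341556.928341:kvm_run_exit cpu_index 0, reason 8",
]
for cpu, name in decode_run_exits(sample):
    print(f"cpu {cpu}: {name}")
```

With the two lines above, the last exit decodes to KVM_EXIT_SHUTDOWN, preceded by an ordinary KVM_EXIT_IO.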

Digging deeper, I managed to capture the following trace from the L1
kernel (via perf record -a -e "kvm:*"):

    ...
    [001]   770.850287:                      kvm:kvm_entry: vcpu 0, rip 0x100146
    [001]   770.850307:                       kvm:kvm_exit: vcpu 0 reason TRIPLE_FAULT rip 0x100146 info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
    [001]   770.850313:                        kvm:kvm_fpu: unload
    [001]   770.850316:             kvm:kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8)

    on Linux 5.13.0-30-generic #33~20.04.1-Ubuntu SMP Mon Feb 7 14:25:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Immediately prior to the triple fault there are a number of
EXTERNAL_INTERRUPT exits as well as reads and writes of MSRs and CRs.
The crash seems independent of the Linux version running in L2: I see
it across several LTS kernels. Unfortunately I don't know which version
of Linux / Ubuntu is running in L0.
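
To summarise what leads up to the triple fault, the kvm_exit events can be tallied with something like the sketch below. It assumes each event sits on a single line in the "kvm:kvm_exit: vcpu N reason NAME rip 0x..." layout shown in the trace above, so wrapped lines need joining first. The function name and the first three sample lines (including their rip values) are invented for illustration; only the TRIPLE_FAULT line matches the real trace.

```python
import re
from collections import Counter, deque

# Matches the kvm_exit part of a trace line, assuming the
# "vcpu N reason NAME rip 0x..." layout seen in this report.
EXIT_RE = re.compile(r"kvm:kvm_exit: vcpu (\d+) reason (\S+) rip (0x[0-9a-fA-F]+)")


def exits_before_triple_fault(lines, window=50):
    """Tally the reasons of up to `window` kvm_exit events that precede
    the first TRIPLE_FAULT exit.

    Returns (Counter of reasons, rip of the TRIPLE_FAULT exit), with the
    rip set to None if no TRIPLE_FAULT is found.
    """
    recent = deque(maxlen=window)  # keeps only the most recent reasons
    for line in lines:
        match = EXIT_RE.search(line)
        if not match:
            continue
        reason, rip = match.group(2), match.group(3)
        if reason == "TRIPLE_FAULT":
            return Counter(recent), rip
        recent.append(reason)
    return Counter(recent), None


# Illustrative input: the first three lines are made up, the last one
# mirrors the TRIPLE_FAULT exit from the real trace.
sample = [
    "[001] 770.850100: kvm:kvm_exit: vcpu 0 reason EXTERNAL_INTERRUPT rip 0x100100",
    "[001] 770.850150: kvm:kvm_exit: vcpu 0 reason MSR_WRITE rip 0x100120",
    "[001] 770.850200: kvm:kvm_exit: vcpu 0 reason EXTERNAL_INTERRUPT rip 0x100140",
    "[001] 770.850307: kvm:kvm_exit: vcpu 0 reason TRIPLE_FAULT rip 0x100146",
]
counts, rip = exits_before_triple_fault(sample)
print(f"triple fault at rip {rip}")
for reason, count in counts.most_common():
    print(f"{count:4d} {reason}")
```

Comparing this tally between a failing and a passing run might show whether the exits before the fault are unusual or just normal boot activity.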

I've tried to reproduce on other machines I have access to, without much
luck. I've also tried to make sense of rip 0x100146 on my own, but I
don't understand x86 / qemu boot well enough. Finally, I've looked at
commits to KVM between 5.13 and master that mention TRIPLE_FAULT, but
nothing stood out.

I've put traces from two failed executions + lscpu at
https://gist.github.com/lmb/c36479bb67f397ba08319b5e0f752386
For completeness' sake, you can see the failing CI runs at
https://ebpf.semaphoreci.com/branches/317c3f18-4de0-488b-af6d-2a1fa0967f87

I've tried to get help with this issue via k...@vger.kernel.org but had
no luck. See
https://lore.kernel.org/kvm/95c1dc01-4aa0-46a6-95b1-bbc62588a...@www.fastmail.com/

** Affects: linux-meta-hwe-5.13 (Ubuntu)
     Importance: Undecided
         Status: New

** Package changed: ubuntu => linux-meta-hwe-5.13 (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1970034

Title:
  Intel nested KVM exits L2 due to TRIPLE_FAULT

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.13/+bug/1970034/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
