FYI, the glibc bug is not
https://sourceware.org/bugzilla/show_bug.cgi?id=28784; instead, it is
Bug 30037, "glibc 2.34 and newer segfault if CPUID leaf 0x2 reports
zero" (https://sourceware.org/bugzilla/show_bug.cgi?id=30037).

** Bug watch added: Sourceware.org Bugzilla #28784
   https://sourceware.org/bugzilla/show_bug.cgi?id=28784

** Bug watch added: Sourceware.org Bugzilla #30037
   https://sourceware.org/bugzilla/show_bug.cgi?id=30037

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/2003714

Title:
  Azure: TDX-enabled hypervisors cause segfault

Status in linux-azure package in Ubuntu:
  In Progress

Bug description:
  SRU Justification

  [Impact]

  Microsoft TDX-enabled hypervisors cause a segfault due to an upstream
  glibc bug. This can be worked around with a kernel patch.

  Issue Description:

  When I start an Intel TDX Ubuntu 22.04 (or RHEL 9.0) guest on Hyper-V,
  the guest always hits segfaults and can’t boot up. The kernel running
  in the guest is either the upstream kernel + my TDX patchset, or the
  5.19.0-azure kernel + the same TDX patchset. The boot log with the
  segfaults is included further below.

  [Fix]

  We confirmed the segfault also happens to TDX guests on the KVM
  hypervisor. After I checked with more Intel folks, it turned out this
  is indeed a glibc bug
  (https://sourceware.org/bugzilla/show_bug.cgi?id=28784), which has
  been fixed in upstream glibc, but Ubuntu 22.04 and newer haven’t
  picked up the glibc fix yet.

  I got a temporary kernel-side workaround from Intel:
  https://github.com/dcui/tdx/commit/16218cf73491e867fd39c16c9e4b8aa926cbda68,
  which is on the same existing branch
  “decui/upstream-kinetic-22.10/master-next/1209”.

  [   21.081453] Run /init as init process
  [   21.086896]   with arguments:
  [   21.095790]     /init
  [   21.100982]   with environment:
  [   21.106611]     HOME=/
  [   21.112463]     TERM=linux
  [   21.119850]     BOOT_IMAGE=/boot/vmlinuz-6.1.0-rc7-decui+

  Loading, please wait...

  Starting version 249.11-0ubuntu3.6

  [   21.253908] udevadm[144]: segfault at 56538d61e0c0 ip 00007f8f5899efeb sp 00007ffd08fb7648 error 6 in libc.so.6[7f8f58820000+195000] likely on CPU 0 (core 0, socket 0)
  [   21.316549] Code: 07 62 e1 7d 48 e7 4f 01 62 e1 7d 48 e7 67 40 62 e1 7d 48 e7 6f 41 62 61 7d 48 e7 87 00 20 00 00 62 61 7d 48 e7 8f 40 20 00 00 <62> 61 7d 48 e7 a7 00 30 00 00 62 61 7d 48 e7 af 40 30 00 00 48 83

  Segmentation fault

  [   22.499317] setfont[153]: segfault at 55ef3b91b000 ip 00007f5899899fa4 sp 00007ffc8008f628 error 4 in libc.so.6[7f589971b000+195000] likely on CPU 0 (core 0, socket 0)
  [   22.602677] Code: 06 62 e1 fe 48 6f 4e 01 62 e1 fe 48 6f 66 40 62 e1 fe 48 6f 6e 41 62 61 fe 48 6f 86 00 20 00 00 62 61 fe 48 6f 8e 40 20 00 00 <62> 61 fe 48 6f a6 00 30 00 00 62 61 fe 48 6f ae 40 30 00 00 48 83
  [   22.732413] loadkeys[156]: segfault at 563ffe292000 ip 00007fbff957afa4 sp 00007ffe31453808 error 4 in libc.so.6[7fbff93fc000+195000] likely on CPU 0 (core 0, socket 0)
  [   22.833061] Code: 06 62 e1 fe 48 6f 4e 01 62 e1 fe 48 6f 66 40 62 e1 fe 48 6f 6e 41 62 61 fe 48 6f 86 00 20 00 00 62 61 fe 48 6f 8e 40 20 00 00 <62> 61 fe 48 6f a6 00 30 00 00 62 61 fe 48 6f ae 40 30 00 00 48 83
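
  The number after “error” in the segfault lines above is the x86
  page-fault error code. As a reading aid, here is a minimal Python
  sketch that decodes its low bits. The bit meanings are the
  architectural ones; the constant and function names are hypothetical
  and chosen only for illustration.

```python
# x86 page-fault error code bits (only the low five bits are decoded here).
PF_PROT = 1 << 0    # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1   # 0: read access, 1: write access
PF_USER = 1 << 2    # 0: kernel-mode access, 1: user-mode access
PF_RSVD = 1 << 3    # 1: reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4   # 1: fault was caused by an instruction fetch


def decode_pf_error(code: int) -> str:
    """Return a short human-readable description of a page-fault error code."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    parts = [
        "user-mode" if code & PF_USER else "kernel-mode",
        access,
        "protection violation" if code & PF_PROT else "page not present",
    ]
    if code & PF_RSVD:
        parts.append("reserved bit set")
    return ", ".join(parts)


# The two values seen in the boot log: 6 (udevadm) and 4 (setfont, loadkeys).
print(decode_pf_error(6))
print(decode_pf_error(4))
```

  Read this way, “error 6” is a user-mode write to a page that is not
  present, and “error 4” is a user-mode read of a page that is not
  present.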

  The segfault only happens with recent glibc versions (e.g. v2.35 in
  Ubuntu 22.04, and v2.34 in RHEL 9.0). It doesn’t happen with v2.31 in
  Ubuntu 20.04 or v2.32 in Ubuntu 20.10, so something in glibc must
  have changed between v2.32 (good) and v2.34+ (not working for TDX). The
  oddity is that when I run the same Ubuntu 22.04/RHEL 9.0 image as a
  regular non-TDX guest, the segfault never happens.

  If I boot up an Ubuntu 20.04 TDX guest (which works fine), mount an
  Ubuntu 22.04 VHD image (“mount /dev/sdd1 /mnt”) and try to run “chroot
  /mnt”, I hit the same segfault:

  [  109.478556] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Quota mode: none.
  [  129.224444] bash[2112]: segfault at 556987854000 ip 00007f88468c4ea4 sp 00007ffc22ecf158 error 6 in libc.so.6[7f8846828000+195000] likely on CPU 48 (core 0, socket 48)
  [  129.242434] Code: e7 bf 30 10 00 00 66 44 0f e7 87 00 20 00 00 66 44 0f e7 8f 10 20 00 00 66 44 0f e7 97 20 20 00 00 66 44 0f e7 9f 30 20 00 00 <66> 44 0f e7 a7 00 30 00 00 66 44 0f e7 af 10 30 00 00 66 44 0f e7

  It looks like the application is referencing a memory location that
  triggers a page fault, which the kernel converts to a SIGSEGV signal
  that terminates the application (I’m not sure where the “movntdq”
  instructions below come from):

  root@decui-u2004-u28:/opt/linus-0824# echo 'Code: e7 bf 30 10 00 00 66
  44 0f e7 87 00 20 00 00 66 44 0f e7 8f 10 20 00 00 66 44 0f e7 97 20
  20 00 00 66 44 0f e7 9f 30 20 00 00 <66> 44 0f e7 a7 00 30 00 00 66 44
  0f e7 af 10 30 00 00 66 44 0f e7' | scripts/decodecode

  Code: e7 bf 30 10 00 00 66 44 0f e7 87 00 20 00 00 66 44 0f e7 8f 10
  20 00 00 66 44 0f e7 97 20 20 00 00 66 44 0f e7 9f 30 20 00 00 <66> 44
  0f e7 a7 00 30 00 00 66 44 0f e7 af 10 30 00 00 66 44 0f e7

  All code
  ========
     0:   e7 bf                   out    %eax,$0xbf
     2:   30 10                   xor    %dl,(%rax)
     4:   00 00                   add    %al,(%rax)
     6:   66 44 0f e7 87 00 20    movntdq %xmm8,0x2000(%rdi)
     d:   00 00
     f:   66 44 0f e7 8f 10 20    movntdq %xmm9,0x2010(%rdi)
    16:   00 00
    18:   66 44 0f e7 97 20 20    movntdq %xmm10,0x2020(%rdi)
    1f:   00 00
    21:   66 44 0f e7 9f 30 20    movntdq %xmm11,0x2030(%rdi)
    28:   00 00
    2a:*  66 44 0f e7 a7 00 30    movntdq %xmm12,0x3000(%rdi)             <-- trapping instruction
    31:   00 00
    33:   66 44 0f e7 af 10 30    movntdq %xmm13,0x3010(%rdi)
    3a:   00 00
    3c:   66                      data16
    3d:   44                      rex.R
    3e:   0f                      .byte 0xf
    3f:   e7                      .byte 0xe7

  Code starting with the faulting instruction
  ===========================================
     0:   66 44 0f e7 a7 00 30    movntdq %xmm12,0x3000(%rdi)
     7:   00 00
     9:   66 44 0f e7 af 10 30    movntdq %xmm13,0x3010(%rdi)
    10:   00 00
    12:   66                      data16
    13:   44                      rex.R
    14:   0f                      .byte 0xf
    15:   e7                      .byte 0xe7
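
  The destination pointer of the faulting store can be worked out from
  the segfault line and the decoded instruction. Below is a small
  Python sketch, assuming the reported fault address is the first byte
  touched by “movntdq %xmm12,0x3000(%rdi)”. The regular expression and
  the helper name are illustrative only and are not part of any kernel
  tool.

```python
import re

# Matches the kernel's user-space segfault message (x86 format shown above).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    """Extract the numeric fields of a segfault line as integers."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    info = {"comm": m["comm"], "pid": int(m["pid"]), "obj": m["obj"]}
    for key in ("addr", "ip", "sp", "error", "base", "size"):
        info[key] = int(m[key], 16)
    return info


LINE = (
    "[  129.224444] bash[2112]: segfault at 556987854000 ip 00007f88468c4ea4 "
    "sp 00007ffc22ecf158 error 6 in libc.so.6[7f8846828000+195000] "
    "likely on CPU 48 (core 0, socket 48)"
)

info = parse_segfault(LINE)

# Offset of the faulting instruction from the start of the libc mapping
# printed in the brackets.
ip_offset = info["ip"] - info["base"]

# Displacement taken from the trapping instruction: 0x3000(%rdi).
DISPLACEMENT = 0x3000
rdi = info["addr"] - DISPLACEMENT

print(f"ip offset in {info['obj']} mapping: {ip_offset:#x}")
print(f"rdi at the time of the fault:     {rdi:#x}")
```

  Under that assumption, rdi is 0x556987851000, so the preceding stores
  at 0x2000(%rdi) through 0x2030(%rdi) target the page at
  0x556987853000, and the trapping store is the first one to touch the
  page at 0x556987854000.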

  After I added a 2-minute sleep to show_signal_msg() in the kernel’s
  arch/x86/mm/fault.c (which gives time to read /proc/2112/maps before
  the process exits), it turns out the application is trying to write
  at the end of the heap area: the fault address 556987854000 is
  exactly the end of the [heap] mapping below, so it is not mapped in
  the process’s address space, and the segfault is triggered:

  [  129.224444] bash[2112]: segfault at 556987854000 ip
  00007f88468c4ea4 sp 00007ffc22ecf158 error 6 in
  libc.so.6[7f8846828000+195000] likely on CPU 48 (core 0, socket 48)

  root@decui-u2004-u28:/proc/2112# cat maps

  5569874a9000-5569874d8000 r--p 00000000 08:31 1582                       /mnt/usr/bin/bash
  5569874d8000-5569875b7000 r-xp 0002f000 08:31 1582                       /mnt/usr/bin/bash
  5569875b7000-5569875f1000 r--p 0010e000 08:31 1582                       /mnt/usr/bin/bash
  5569875f2000-5569875f6000 r--p 00148000 08:31 1582                       /mnt/usr/bin/bash
  5569875f6000-5569875ff000 rw-p 0014c000 08:31 1582                       /mnt/usr/bin/bash
  5569875ff000-55698760a000 rw-p 00000000 00:00 0
  556987833000-556987854000 rw-p 00000000 00:00 0                          [heap]
  7f8846400000-7f88466e9000 r--p 00000000 08:31 6124                       /mnt/usr/lib/locale/locale-archive
  7f8846800000-7f8846828000 r--p 00000000 08:31 4966                       /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
  7f8846828000-7f88469bd000 r-xp 00028000 08:31 4966                       /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
  7f88469bd000-7f8846a15000 r--p 001bd000 08:31 4966                       /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
  7f8846a15000-7f8846a19000 r--p 00214000 08:31 4966                       /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
  7f8846a19000-7f8846a1b000 rw-p 00218000 08:31 4966                       /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
  7f8846a1b000-7f8846a28000 rw-p 00000000 00:00 0
  7f8846b09000-7f8846b10000 r--s 00000000 08:31 3841                       /mnt/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
  7f8846b10000-7f8846b13000 rw-p 00000000 00:00 0
  7f8846b13000-7f8846b21000 r--p 00000000 08:31 4729                       /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
  7f8846b21000-7f8846b32000 r-xp 0000e000 08:31 4729                       /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
  7f8846b32000-7f8846b40000 r--p 0001f000 08:31 4729                       /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
  7f8846b40000-7f8846b44000 r--p 0002c000 08:31 4729                       /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
  7f8846b44000-7f8846b45000 rw-p 00030000 08:31 4729                       /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
  7f8846b4b000-7f8846b4d000 rw-p 00000000 00:00 0
  7f8846b4d000-7f8846b4f000 r--p 00000000 08:31 4960                       /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
  7f8846b4f000-7f8846b79000 r-xp 00002000 08:31 4960                       /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
  7f8846b79000-7f8846b84000 r--p 0002c000 08:31 4960                       /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
  7f8846b85000-7f8846b87000 r--p 00037000 08:31 4960                       /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
  7f8846b87000-7f8846b89000 rw-p 00039000 08:31 4960                       /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
  7ffc22eb1000-7ffc22ed2000 rw-p 00000000 00:00 0                          [stack]
  7ffc22fcd000-7ffc22fd1000 r--p 00000000 00:00 0                          [vvar]
  7ffc22fd1000-7ffc22fd3000 r-xp 00000000 00:00 0                          [vdso]
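
  The observation that the fault address sits just past the heap can
  be checked mechanically. The following Python sketch parses a short
  excerpt of the maps output above and looks up the fault address. The
  Mapping type and the helper names are hypothetical; only the field
  layout of /proc/<pid>/maps is relied upon.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int   # first address of the mapping
    end: int     # first address past the mapping (exclusive)
    perms: str
    name: str    # path or pseudo-name such as "[heap]"; may be empty


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps text into a list of Mapping tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], name))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if addr is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# Three lines copied from the maps output of PID 2112.
MAPS_EXCERPT = """\
5569875ff000-55698760a000 rw-p 00000000 00:00 0
556987833000-556987854000 rw-p 00000000 00:00 0                          [heap]
7f8846400000-7f88466e9000 r--p 00000000 08:31 6124                       /mnt/usr/lib/locale/locale-archive
"""

FAULT_ADDR = 0x556987854000  # from the bash[2112] segfault line

mappings = parse_maps(MAPS_EXCERPT)
hit = find_mapping(mappings, FAULT_ADDR)
last_byte_before = find_mapping(mappings, FAULT_ADDR - 1)

print("fault address mapped:", hit is not None)
print("mapping just below it:", last_byte_before.name)
```

  With this excerpt the fault address is not covered by any mapping,
  while the byte immediately before it belongs to [heap], which agrees
  with a write running off the end of the heap.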

  [Test Plan]

  Microsoft tested

  [Where things could go wrong]

  TDX is a new feature and is unlikely to have regressions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/2003714/+subscriptions


