On Fri, Feb 07, 2025 at 04:08:08PM +0800, Coiby Xu wrote:
LUKS is the standard for Linux disk encryption, widely adopted by users,
and in some cases, such as Confidential VMs, it is a requirement. With
kdump enabled, when the first kernel crashes, the system can boot into
the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore)
to a specified target. However, there are two challenges when dumping
vmcore to a LUKS-encrypted device:
- Kdump kernel may not be able to decrypt the LUKS partition. For some
machines, a system administrator may not have a chance to enter the
password to decrypt the device in kdump initramfs after the 1st kernel
crashes; For cloud confidential VMs, depending on the policy the
kdump kernel may not be able to unseal the keys with TPM and the
console virtual keyboard is untrusted.
- LUKS2 by default use the memory-hard Argon2 key derivation function
which is quite memory-consuming compared to the limited memory reserved
for kdump. Take Fedora example, by default, only 256M is reserved for
systems having memory between 4G-64G. With LUKS enabled, ~1300M needs
to be reserved for kdump. Note if the memory reserved for kdump can't
be used by 1st kernel i.e. an user sees ~1300M memory missing in the
1st kernel.
Besides users (at least for Fedora) usually expect kdump to work out of
the box i.e. no manual password input or custom crashkernel value is
needed. And it doesn't make sense to derivate the keys again in kdump
kernel which seems to be redundant work.
This patch set addresses the above issues by making the LUKS volume keys
persistent for kdump kernel with the help of cryptsetup's new APIs
(--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of
the kdump copies of LUKS volume keys,
1. After the 1st kernel loads the initramfs during boot, systemd
use an user-input passphrase to de-crypt the LUKS volume keys
or TPM-sealed key and then save the volume keys to specified keyring
(using the --link-vk-to-keyring API) and the key will expire within
specified time.
2. A user space tool (kdump initramfs loader like kdump-utils) create
key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
the 1st kernel which keys are needed.
3. When the kdump initramfs is loaded by the kexec_file_load
syscall, the 1st kernel will iterate created key items, save the
keys to kdump reserved memory.
4. When the 1st kernel crashes and the kdump initramfs is booted, the
kdump initramfs asks the kdump kernel to create a user key using the
key stored in kdump reserved memory by writing yes to
/sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
device is unlocked with libcryptsetup's --volume-key-keyring API.
5. The system gets rebooted to the 1st kernel after dumping vmcore to
the LUKS encrypted device is finished
After libcryptsetup saving the LUKS volume keys to specified keyring,
whoever takes this should be responsible for the safety of these copies
of keys. The keys will be saved in the memory area exclusively reserved
for kdump where even the 1st kernel has no direct access. And further
more, two additional protections are added,
- save the copy randomly in kdump reserved memory as suggested by Jan
- clear the _PAGE_PRESENT flag of the page that stores the copy as
suggested by Pingfan
This patch set only supports x86. There will be patches to support other
architectures once this patch set gets merged.
I'm not sure what's the problem here but I can reliably trigger a kernel
panic on a qemu VM + custom kernel (compiled from
bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b + your patches).
When I configure the crash configfs and call kexec in a systemd service
using ExecStart=, the panic occurs when I start the service:
~ # cat /etc/systemd/system/my-kexec.service
[Unit]
Description=kexec loading for the crash capture kernel
[Service]
Type=oneshot
ExecStart=/usr/bin/mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
ExecStart=/usr/bin/echo cryptsetup:mykey
>/sys/kernel/config/crash_dm_crypt_keys/mykey/description
ExecStart=/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd
/crash-initrd
KeyringMode=shared
[Install]
WantedBy=default.target
Starting the service:
~ # systemctl start my-kexec.service
kexec_file: kernel: 00000000ace85dcc kernel_size: 0x16e3000
crash_core: Crash PT_LOAD ELF header. phdr=00000000d08940fa
vaddr=0xffff888000100000, paddr=0x100000, sz=0x700000 e_phnum=11
p_offset=0x100000
crash_core: Crash PT_LOAD ELF header. phdr=00000000304ef570
vaddr=0xffff888000808000, paddr=0x808000, sz=0x3000 e_phnum=12
p_offset=0x808000
crash_core: Crash PT_LOAD ELF header. phdr=000000000275e248
vaddr=0xffff88800080c000, paddr=0x80c000, sz=0x5000 e_phnum=13
p_offset=0x80c000
crash_core: Crash PT_LOAD ELF header. phdr=000000004e47ca09
vaddr=0xffff888000900000, paddr=0x900000, sz=0xa5700000 e_phnum=14
p_offset=0x900000
crash_core: Crash PT_LOAD ELF header. phdr=00000000e56c8350
vaddr=0xffff8880b6000000, paddr=0xb6000000, sz=0x7d51018 e_phnum=15
p_offset=0xb6000000
crash_core: Crash PT_LOAD ELF header. phdr=0000000099d67ff3
vaddr=0xffff8880bdd51018, paddr=0xbdd51018, sz=0x27440 e_phnum=16
p_offset=0xbdd51018
crash_core: Crash PT_LOAD ELF header. phdr=00000000461a2f21
vaddr=0xffff8880bdd78458, paddr=0xbdd78458, sz=0xbc0 e_phnum=17
p_offset=0xbdd78458
crash_core: Crash PT_LOAD ELF header. phdr=0000000058149b54
vaddr=0xffff8880bdd79018, paddr=0xbdd79018, sz=0x9a40 e_phnum=18
p_offset=0xbdd79018
crash_core: Crash PT_LOAD ELF header. phdr=000000001e30ff2c
vaddr=0xffff8880bdd82a58, paddr=0xbdd82a58, sz=0xdbc5a8 e_phnum=19
p_offset=0xbdd82a58
crash_core: Crash PT_LOAD ELF header. phdr=00000000e67a9768
vaddr=0xffff8880bec00000, paddr=0xbec00000, sz=0xaed000 e_phnum=20
p_offset=0xbec00000
crash_core: Crash PT_LOAD ELF header. phdr=000000005909c4c6
vaddr=0xffff8880bf9ff000, paddr=0xbf9ff000, sz=0x453000 e_phnum=21
p_offset=0xbf9ff000
crash_core: Crash PT_LOAD ELF header. phdr=00000000473d74ef
vaddr=0xffff8880bfe58000, paddr=0xbfe58000, sz=0x64000 e_phnum=22
p_offset=0xbfe58000
crash_core: Crash PT_LOAD ELF header. phdr=00000000abde8123
vaddr=0xffff888100000000, paddr=0x100000000, sz=0x23f000000 e_phnum=23
p_offset=0x100000000
crash_core: Crash PT_LOAD ELF header. phdr=00000000bda3e0bf
vaddr=0xffff88843f000000, paddr=0x43f000000, sz=0x1000000 e_phnum=24
p_offset=0x43f000000
kexec: Loaded ELF headers at 0x33f000000 bufsz=0x1000 memsz=0xe1000
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 5 UID: 0 PID: 3812 Comm: kexec Not tainted 6.14.0-rc1+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 2025.02-6
04/08/2025
RIP: 0010:sized_strscpy+0x71/0x150
Code: b9 80 80 80 80 80 80 80 80 48 c1 e8 03 48 8d 1c c5 08 00 00 00 31 c0 eb
11
48 89 34 07 48 83 c0 08 48 39 d8 0f 84 83 00 00 00 <49> 8b 34 00 4a 8d 14 1e 49
89 f2 49 f7 d2 4c 21 d2 4c 8d 14 07 4c
RSP: 0018:ffffc9000420fc68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000080
RDX: 0000000000000080 RSI: 0000000000000000 RDI: ffff8881030ec808
RBP: ffff888109724000 R08: 0000000000000000 R09: 8080808080808080
R10: ffffc9000420fc78 R11: fefefefefefefeff R12: ffffc90004219000
R13: ffff888104a80000 R14: 0000000000000008 R15: 0000000000000000
FS: 00007f09ea73f740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000120760002 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x23/0x60
? page_fault_oops+0x177/0x510
? _prb_read_valid+0x2e7/0x370
? exc_page_fault+0x6f/0x130
? asm_exc_page_fault+0x26/0x30
? sized_strscpy+0x71/0x150
crash_load_dm_crypt_keys+0x1bc/0x370
bzImage64_load+0x41b/0xa30
__do_sys_kexec_file_load+0x2af/0x8a0
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f09ea848d6d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d 6b 70 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fff8cf979e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000140
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f09ea848d6d
RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 0000000000000003 R08: 000000000000000a R09: 00007fff8cf97a10
R10: 000055de70eee9a0 R11: 0000000000000206 R12: 0000000000000003
R13: 00007fff8cf97d08 R14: 000055de4c336448 R15: 0000000000000004
</TASK>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:sized_strscpy+0x71/0x150
Code: b9 80 80 80 80 80 80 80 80 48 c1 e8 03 48 8d 1c c5 08 00 00 00 31 c0 eb
11 48 89 34 07 48 83 c0 08 48 39 d8 0f 84 83 00 00 00 <49> 8b 34 00 4a 8d 14 1e
49 89 f2 49 f7 d2 4c 21 d2 4c 8d 14 07 4c
RSP: 0018:ffffc9000420fc68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000080
RDX: 0000000000000080 RSI: 0000000000000000 RDI: ffff8881030ec808
RBP: ffff888109724000 R08: 0000000000000000 R09: 8080808080808080
R10: ffffc9000420fc78 R11: fefefefefefefeff R12: ffffc90004219000
R13: ffff888104a80000 R14: 0000000000000008 R15: 0000000000000000
FS: 00007f09ea73f740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000120760002 CR4: 0000000000772ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Calling a script that does the same thing works fine and loads the keys
correctly:
[Service]
ExecStart=/root/kexec.sh
~ # cat /root/kexec.sh
#!/bin/bash
mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
echo cryptsetup:mykey > /sys/kernel/config/crash_dm_crypt_keys/mykey/description
/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
If that's any help, my crypttab:
~ # cat /etc/crypttab
root UUID=8001fca4-2e54-48e9-9235-031c19fc6e36 none
luks,link-volume-key=@u::%logon:cryptsetup:mykey
If you can't reproduce, I can help track this. Just let me know if you need
any help.