Overview
========

This series implements Firmware Assisted Dump (fadump) on the PSeries
machine in QEMU.

Fadump is an alternative dump mechanism to kdump in which the firmware
does a memory-preserving boot and the second (crash) kernel is booted
fresh, like a normal system reset. With kdump, by contrast, the crashed
kernel itself loads the second (crash) kernel.

This requires implementing the "ibm,configure-kernel-dump" RTAS call in
QEMU.

While booting with fadump=on, Linux will register fadump memory regions.

Some memory regions, such as the Real Mode Memory regions and custom
memory regions declared by the OS, require copying the requested memory
range to a destination.

Other memory regions, such as CPU State Data and HPTE, are populated by
the firmware/platform (QEMU in this case).
We pass the sizes of these data segments to the kernel
(ibm,configure-kernel-dump-sizes), as it needs to know how much memory to
reserve for them.
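
For the copy-type regions described above, the work amounts to a
bounds-checked copy inside guest RAM. The following is a minimal sketch
under stated assumptions, not the code in this series: it models guest RAM
as a flat buffer, and `struct copy_region`, `range_in_ram()` and
`preserve_region()` are hypothetical names chosen for illustration. The
actual patches go through QEMU's guest memory access paths instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one copy-type region (illustrative names). */
struct copy_region {
    uint64_t src;   /* source address of the range to preserve */
    uint64_t dst;   /* destination address of the preserved copy */
    uint64_t len;   /* number of bytes to preserve */
};

/* True when [addr, addr + len) lies entirely inside RAM of ram_size bytes. */
static bool range_in_ram(uint64_t addr, uint64_t len, uint64_t ram_size)
{
    return addr <= ram_size && len <= ram_size - addr;
}

/*
 * Copy one region inside a flat RAM buffer (an assumption of this sketch).
 * Returns the number of bytes copied, or 0 if either range is out of RAM.
 */
static uint64_t preserve_region(uint8_t *ram, uint64_t ram_size,
                                const struct copy_region *r)
{
    if (!range_in_ram(r->src, r->len, ram_size) ||
        !range_in_ram(r->dst, r->len, ram_size)) {
        return 0;
    }
    /* memmove rather than memcpy: source and destination may overlap. */
    memmove(ram + r->dst, ram + r->src, (size_t)r->len);
    return r->len;
}
```

The return value plays the role of a "bytes dumped" count: a region whose
source or destination range falls outside RAM is reported as nothing
dumped rather than partially copied.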

Then, after a crash, once Linux makes an OS terminate call, we trigger
fadump if fadump was registered.

The fadump boot is implemented as:
    * pause all vcpus (will save registers later)
    * preserve memory regions specified by fadump
    * do a memory preserving reboot (using GUEST_RESET as it doesn't clear
      the memory)
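
The trigger condition and the ordering of the steps above can be sketched
as a small planning function. This is only an illustration of the control
flow, assuming a per-machine state with a "registered" flag; `struct
fadump_state`, `enum boot_step` and `plan_os_term()` are hypothetical
names and do not appear in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three steps of the fadump boot, in the order they are performed. */
enum boot_step {
    STEP_PAUSE_VCPUS,      /* stop all vcpus so register state is stable */
    STEP_PRESERVE_MEMORY,  /* preserve the regions the guest registered */
    STEP_GUEST_RESET,      /* memory-preserving reboot */
};

/* Hypothetical per-machine fadump state. */
struct fadump_state {
    bool registered;   /* guest registered via ibm,configure-kernel-dump */
    bool dump_active;  /* a fadump boot has been triggered */
};

/*
 * Decide what to do on an OS terminate call. Fills steps[] and returns the
 * number of steps, or returns 0 when fadump was never registered and the
 * usual OS terminate handling should apply.
 */
static size_t plan_os_term(struct fadump_state *s, enum boot_step steps[3])
{
    if (!s->registered) {
        return 0;
    }
    steps[0] = STEP_PAUSE_VCPUS;
    steps[1] = STEP_PRESERVE_MEMORY;
    steps[2] = STEP_GUEST_RESET;
    s->dump_active = true;
    return 3;
}
```

Pausing comes first so that neither memory nor register state changes
while the regions are being preserved, and the reset comes last because
it hands control to the freshly booted kernel.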

We then pass metadata (the firmware memory structure) to the kernel as
"ibm,kernel-dump" in the device tree, containing all details of the
preserved memory regions.
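
To give a feel for what such a per-region record carries, here is a
simplified sketch. It is deliberately NOT the PAPR layout of the firmware
memory structure: `struct region_desc`, `put_be64()` and `encode_region()`
are hypothetical, and the record holds only the source, size, bytes
dumped and destination values that a region is reported with. The one
firm fact it relies on is that device tree property data is big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative region record; a simplification, not the PAPR structure. */
struct region_desc {
    uint64_t src;           /* where the data was before the crash */
    uint64_t len;           /* size of the region in bytes */
    uint64_t bytes_dumped;  /* how much was actually preserved */
    uint64_t dst;           /* where the preserved copy now lives */
};

/* Store a 64-bit value in big-endian order, as device tree data expects. */
static void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Serialise one record into 32 bytes at out; returns the bytes written. */
static size_t encode_region(uint8_t *out, const struct region_desc *r)
{
    put_be64(out, r->src);
    put_be64(out + 8, r->len);
    put_be64(out + 16, r->bytes_dumped);
    put_be64(out + 24, r->dst);
    return 32;
}
```

Encoding each field explicitly, rather than copying a host struct, keeps
the property contents independent of the host's byte order.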

Testing
=======

The series has been tested with the following configurations:

* firmware: x-vof and SLOF
* tcg & kvm
* l1 guest and l2 guest
* with/without smp
* cma/nocma
* default crashkernel values and crashkernel=1G

Logs of a Linux boot with firmware assisted dump:

    ./build/qemu-system-ppc64 -M pseries,x-vof=on --cpu power10 --smp 4 -m 4G \
        -kernel some-vmlinux -initrd some-initrd \
        -append "debug fadump=on crashkernel=1G" -nographic
    [    0.000000] random: crng init done
    [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 4096MB)
    ...
    [    1.084686] rtas fadump: Registration is successful!
    ...
    # cat /sys/kernel/debug/powerpc/fadump_region
    CPU :[0x00000040000000-0x000000400013d3] 0x13d4 bytes, Dumped: 0x0
    HPTE:[0x000000400013d4-0x000000400013d3] 0x0 bytes, Dumped: 0x0
    DUMP: Src: 0x00000000000000, Dest: 0x00000040010000, Size: 0x40000000, Dumped: 0x0 bytes

    [0x000000fffff800-0x000000ffffffff]: cmdline append: ''
    # echo c > /proc/sysrq-trigger
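
The HPTE line in the log above looks odd at first, since its end address
is lower than its start. A small worked check, assuming the ranges are
printed as inclusive [start, start + size - 1] (an assumption inferred
from the numbers, with `region_last_byte()` being a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Last byte of a region printed as an inclusive range. For an empty
 * region (size 0) this is start - 1, one below the start address.
 */
static uint64_t region_last_byte(uint64_t start, uint64_t size)
{
    return start + size - 1;
}
```

With this reading, the CPU region at 0x40000000 with 0x13d4 bytes ends at
0x400013d3, and the empty HPTE region starting right after it at
0x400013d4 also "ends" at 0x400013d3, matching both lines of the log.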

The fadump boot after crash:

    [    0.000000] rtas fadump: Firmware-assisted dump is active.
    [    0.000000] fadump: Updated cmdline: debug fadump=on crashkernel=1G
    [    0.000000] fadump: Firmware-assisted dump is active.
    [    0.000000] fadump: Reserving 3072MB of memory at 0x00000040000000 for preserving crash data
    ....
    # file /proc/vmcore
    /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style

Analysing the vmcore with crash-utility:

          KERNEL: vmlinux-6.14-rc2
        DUMPFILE: vmcore-a64dcfb451e2-nocma
            CPUS: 4
            DATE: Thu Jan  1 05:30:00 IST 1970
          UPTIME: 00:00:30
    LOAD AVERAGE: 0.74, 0.21, 0.07
           TASKS: 94
        NODENAME: buildroot
         RELEASE: 6.14.0-rc2+
         VERSION: #1 SMP Wed Feb 12 06:49:59 CST 2025
         MACHINE: ppc64le  (1000 Mhz)
          MEMORY: 4 GB
           PANIC: "Kernel panic - not syncing: sysrq triggered crash"
             PID: 270
         COMMAND: "sh"
            TASK: c000000009e7cc00  [THREAD_INFO: c000000009e7cc00]
             CPU: 3
           STATE: TASK_RUNNING (PANIC)



Git Tree for Testing
====================

https://github.com/adi-g15-ibm/qemu/tree/fadump-pseries-v1

Known Issues
============

* In some cases the saved CPU state appears to show all registers with
  the same value.

* The implementation doesn't pass all the registers mentioned in PAPR since
  QEMU doesn't implement them/doesn't need them.
  The Linux kernel uses only 9 of the 45 registers we pass from QEMU.

Aditya Gupta (6):
  hw/ppc: Implement skeleton code for fadump in PSeries
  hw/ppc: Trigger Fadump boot if fadump is registered
  hw/ppc: Preserve memory regions registered for fadump
  hw/ppc: Implement saving CPU state in Fadump
  hw/ppc: Pass device tree properties for Fadump
  hw/ppc: Enable Fadump for PSeries

 hw/ppc/spapr.c         |  62 ++++++
 hw/ppc/spapr_rtas.c    | 456 +++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h | 172 +++++++++++++++-
 3 files changed, 689 insertions(+), 1 deletion(-)

-- 
2.48.1

