With FADump support now available on both pseries and OPAL platforms,
update the FADump documentation to cover both. Also, document the
backup area and why it is used.

Signed-off-by: Hari Bathini <hbath...@linux.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt | 102 ++++++++++++++--------
 1 file changed, 64 insertions(+), 38 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index 326f89c..eff9f38 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -70,7 +70,8 @@ as follows:
    normal.

 -- The freshly booted kernel will notice that there is a new
-   node (ibm,dump-kernel) in the device tree, indicating that
+   node (ibm,dump-kernel on PSeries or ibm,opal/dump/result-table
+   on OPAL platform) in the device tree, indicating that
    there is crash data available from a previous boot. During
    the early boot OS will reserve rest of the memory above
    boot memory size effectively booting with restricted memory
@@ -92,7 +93,20 @@ as follows:

 Please note that the firmware-assisted dump feature
 is only available on Power6 and above systems with recent
-firmware versions.
+firmware versions on PSeries (PowerVM) platform and Power9
+and above systems with recent firmware versions on PowerNV
+(OPAL) platform.
+
+To process dump on OPAL platform, additional meta data (PIR to
+Logical CPU map) from the crashing kernel is required. This info
+has to be backed up by the crashing kernel for capture kernel to
+use it in making sense of the register state data provided by the
+F/W. The start address of the area where this info is backed up
+is stored at the tail end of FADump crash info header. To indicate
+the presence of this additional meta data (backup info), the magic
+number field in FADump crash info header is overloaded as version
+identifier.
+

 Implementation details:
 ----------------------
@@ -108,56 +122,65 @@ that are run. If there is dump data, then the
 memory is held.

 If there is no waiting dump data, then only the memory required
-to hold CPU state, HPTE region, boot memory dump and elfcore
-header, is usually reserved at an offset greater than boot memory
-size (see Fig. 1). This area is *not* released: this region will
-be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state
-and HPTE region, in the case a crash does occur. Since this reserved
-memory area is used only after the system crash, there is no point in
-blocking this significant chunk of memory from production kernel.
-Hence, the implementation uses the Linux kernel's Contiguous Memory
-Allocator (CMA) for memory reservation if CMA is configured for kernel.
-With CMA reservation this memory will be available for applications to
-use it, while kernel is prevented from using it. With this FADump will
-still be able to capture all of the kernel memory and most of the user
-space memory except the user pages that were present in CMA region.
+to hold CPU state, HPTE region, boot memory dump, FADump header,
+elfcore header and backup area, is usually reserved at an offset
+greater than boot memory size (see Fig. 1). This area is *not*
+released: this region will be kept permanently reserved, so that
+it can act as a receptacle for a copy of the boot memory content in
+addition to CPU state and HPTE region, in the case a crash does occur.
+Since this reserved memory area is used only after the system crash,
+there is no point in blocking this significant chunk of memory from
+production kernel. Hence, the implementation uses the Linux kernel's
+Contiguous Memory Allocator (CMA) for memory reservation if CMA is
+configured for kernel. With CMA reservation this memory will be
+available for applications to use it, while kernel is prevented from
+using it. With this FADump will still be able to capture all of the
+kernel memory and most of the user space memory except the user pages
+that were present in CMA region.

   o Memory Reservation during first kernel

-  Low memory                                               Top of memory
-  0      boot memory size      |<--Reserved dump area --->|      |
-  |           |                |   Permanent Reservation  |      |
-  V           V                |   (Preserve area)        |      V
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-  |           |                |CPU|HPTE|  DUMP  |HDR|ELF |      |
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-        |                                   ^      ^
-        |                                   |      |
-        \                                   /      |
-         -----------------------------------     FADump Header
-          Boot memory content gets transferred   (meta area)
-          to reserved area by firmware at the
-          time of crash
-
+  Low memory                                                  Top of memory
+  0      boot memory size    |<---- Reserved dump area ---->|       |
+  |           |              |    Permanent Reservation     |       |
+  V           V              |       (Preserve area)        |       V
+  +-----------+--------/ /---+---+----+-------+-----+----+--+-------+
+  |           |              |///|////| DUMP  |HDR|/|ELF |//|       |
+  +-----------+--------/ /---+---+----+-------+-----+----+--+-------+
+        |                      ^   ^      ^     ^  |      ^^
+        |                      |   |      |     |  |      ||
+        \                     CPU HPTE    /     |   \    / Backup Info
+         ---------------------------------      |    ----
+         Boot memory content gets transferred   |   Start address of
+         to reserved area by firmware at the    |   Backup Info.
+         time of crash.                         |
+                                          FADump Header
+                                           (meta area)

                    Fig. 1

   o Memory Reservation during second kernel after crash

-  Low memory                                               Top of memory
-  0      boot memory size                                        |
-  |           |<------------- Reserved dump area --------------->|
-  V           V                |<---- Preserve area ----->|      V
-  +-----------+----------/ /---+---+----+--------+---+----+------+
-  |           |                |CPU|HPTE|  DUMP  |HDR|ELF |      |
-  +-----------+----------/ /---+---+----+--------+---+----+------+
+  Low memory                                                  Top of memory
+  0      boot memory size                                           |
+  |           |<--------------- Reserved dump area ---------------->|
+  V           V              |<----- Preserve area -------->|       |
+  +-----------+--------/ /---+---+----+-------+-----+----+--+-------+
+  |           |              |///|////| DUMP  |HDR|/|ELF |//|       |
+  +-----------+--------/ /---+---+----+-------+-----+----+--+-------+
         |                                              |
         V                                              V
    Used by second                                /proc/vmcore
    kernel to boot
                    Fig. 2

+  +---+
+  |///| -> Regions (CPU, HPTE, HDR extension & Backup area) marked
+  +---+    like this in the above figures are not always present
+           For example, OPAL platform does not have CPU & HPTE regions
+           while PSeries platform doesn't use Backup area currently.
+
+
 Currently the dump will be copied from /proc/vmcore to a new file upon
 user intervention. The dump data available through /proc/vmcore will be
 in ELF format. Hence the existing kdump infrastructure (kdump scripts)
@@ -289,7 +312,10 @@ TODO:
         2. Reserve the area of predefined size (say PAGE_SIZE) for this
         structure and have unused area as reserved (initialized to zero)
         for future field additions.
+
    The advantage of approach 1 over 2 is we don't need to reserve extra space.
+   Using approach 1 to provide additional meta data on OPAL platform while
+   overloading magic number field as version identifier for version tracking.
 ---
 Author: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>
 This document is based on the original documentation written for phyp