On 2023-09-11 21:23:42 Mon, Pingfan Liu wrote: > Hi Mahesh, > > I am not quite sure about fdt, so I skip that part, and leave some > comments from the kexec view. > > On Thu, Sep 7, 2023 at 1:59 AM Mahesh Salgaonkar <[email protected]> wrote: > > > > The kernel boot parameter 'nr_cpus=' allows one to specify number of > > possible cpus in the system. In the normal scenario the first cpu (cpu0) > > that shows up is the boot cpu and hence it gets covered under nr_cpus > > limit. > > > > But this assumption is broken in kdump scenario where kdump kernel after a > > crash can boot up on an non-zero boot cpu. The paca structure allocation > > depends on value of nr_cpus and is indexed using logical cpu ids. The cpu > > discovery code brings up the cpus as they appear sequentially on device > > tree and assigns logical cpu ids starting from 0. This definitely becomes > > an issue if boot cpu id > nr_cpus. When this occurs it results into > > > > In past there were proposals to fix this by making changes to cpu discovery > > code to identify non-zero boot cpu and map it to logical cpu 0. However, > > the changes were very invasive, making discovery code more complicated and > > risky. > > > > Considering that the non-zero boot cpu scenario is more specific to kdump > > kernel, limiting the changes in panic/crash kexec path would probably be a > > best approach to have. > > > > Hence proposed change is, in crash kexec path, move the crashing cpu's > > device node to the first position under '/cpus' node, which will make the > > crashing cpu to be discovered as part of the first core in kdump kernel. > > > > In order to accommodate boot cpu for the case where boot_cpuid > nr_cpu_ids, > > align up the nr_cpu_ids to SMT threads in early_init_dt_scan_cpus(). This > > will allow kdump kernel to work with nr_cpus=X where X will be aligned up > > in multiple of SMT threads per core. > > > > Signed-off-by: Mahesh Salgaonkar <[email protected]> > > --- > > arch/powerpc/include/asm/kexec.h | 1 > > arch/powerpc/kernel/prom.c | 13 ++++ > > arch/powerpc/kexec/core_64.c | 128 > > +++++++++++++++++++++++++++++++++++++ > > arch/powerpc/kexec/file_load_64.c | 2 - > > 4 files changed, 143 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/include/asm/kexec.h > > b/arch/powerpc/include/asm/kexec.h > > index a1ddba01e7d13..f5a6f4a1b8eb0 100644 > > --- a/arch/powerpc/include/asm/kexec.h > > +++ b/arch/powerpc/include/asm/kexec.h > > @@ -144,6 +144,7 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage > > *image); > > int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, > > unsigned long initrd_load_addr, > > unsigned long initrd_len, const char *cmdline); > > +int add_node_props(void *fdt, int node_offset, const struct device_node > > *dn); > > #endif /* CONFIG_PPC64 */ > > > > #endif /* CONFIG_KEXEC_FILE */ > > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > > index 0b5878c3125b1..c2d4f55042d72 100644 > > --- a/arch/powerpc/kernel/prom.c > > +++ b/arch/powerpc/kernel/prom.c > > @@ -322,6 +322,9 @@ static void __init > > check_cpu_feature_properties(unsigned long node) > > } > > } > > > > +/* align addr on a size boundary - adjust address up */ > > +#define _ALIGN_UP(addr, size) > > (((addr)+((size)-1))&(~((typeof(addr))(size)-1))) > > + > > static int __init early_init_dt_scan_cpus(unsigned long node, > > const char *uname, int depth, > > void *data) > > @@ -348,6 +351,16 @@ static int __init early_init_dt_scan_cpus(unsigned > > long node, > > > > nthreads = len / sizeof(int); > > > > + /* > > + * Align nr_cpu_ids to correct SMT value. This will help us to > > allocate > > + * pacas correctly to accomodate boot_cpu != 0 scenario e.g. in > > kdump > > + * kernel the boot cpu can be any cpu between 0 through nthreads. > > + */ > > + if (nr_cpu_ids % nthreads) { > > + nr_cpu_ids = _ALIGN_UP(nr_cpu_ids, nthreads); > > It is better to use set_nr_cpu_ids(), which can hide the difference of > nr_cpus_ids under different kernel configuration.
Yup, I have fixed that in RFC v2 at https://lore.kernel.org/linuxppc-dev/169426331903.1523857.16489366285648613217.stgit@jupiter/ > > > + pr_info("Aligned nr_cpus to SMT=%d, nr_cpu_ids = %d\n", > > nthreads, nr_cpu_ids); > > + } > > + > > /* > > * Now see if any of these threads match our boot cpu. > > * NOTE: This must match the parsing done in smp_setup_cpu_maps. > > + [...] > > /* too late to fail here */ > > void default_machine_kexec(struct kimage *image) > > { > > @@ -341,6 +455,20 @@ void default_machine_kexec(struct kimage *image) > > printk("kexec: Unshared all shared pages.\n"); > > } > > > > + /* > > + * Move the crashing cpus FDT node as the first node under /cpus > > node. > > + * This will make the core (where crashing cpu belongs) to > > + * automatically become first core to show up in kdump kernel and > > + * crashing cpu as boot cpu within first n threads of that core. > > + * > > + * Currently this will work with kexec_file_load only. > > + * > > + * XXX: For kexec_load, change is required in kexec tool to exclude > > FDT > > + * segment from purgatory checksum check. > > + */ > > + if (image->type == KEXEC_TYPE_CRASH && image->file_mode) > > + move_crashing_cpu(image); > > + > > Could "kexec -e" have the same logic? Then nr_cpus can work for both > "kexec -p" and "kexec -e". I see that for normal kexec path we always migrate to reboot_cpus (which is by default is logical cpu 0) before kexec'ing into new kernel. Hence, the 3rd hunk of this patch which aligns up nr_cpu_ids to nthreads is enough to support nr_cpus. However, if someone chooses to change reboot_cpu other than 0, either through reboot_cpu= kernel parameter or by "echo cpunum > /sys/kernel/reboot/cpu", then kexec -e will need to move reboot_cpus fdt node to first position for nr_cpus to work. Simple way to confirm this is: $ kexec -l <vmlinux> --initrd=<initrd> --command-line="... nr_cpus=8" $ echo 8 > /sys/kernel/reboot/cpu $ kexec -e Kexec kernel fails to boot. Will work on changes to make kexec -e to work for nr_cpus in v3. Thanks for your review. -Mahesh. > > Thanks, > > Pingfan > > > paca_ptrs[kexec_paca.paca_index] = &kexec_paca; > > > > setup_paca(&kexec_paca); > > diff --git a/arch/powerpc/kexec/file_load_64.c > > b/arch/powerpc/kexec/file_load_64.c > > index 110d28bede2a7..42d55a19454a7 100644 > > --- a/arch/powerpc/kexec/file_load_64.c > > +++ b/arch/powerpc/kexec/file_load_64.c > > @@ -1027,7 +1027,7 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage > > *image) > > * > > * Returns 0 on success, negative errno on error. > > */ > > -static int add_node_props(void *fdt, int node_offset, const struct > > device_node *dn) > > +int add_node_props(void *fdt, int node_offset, const struct device_node > > *dn) > > { > > int ret = 0; > > struct property *pp; > > > > -- Mahesh J Salgaonkar
