Re: [PATCH 2/2] powerpc/fadump: fix additional param memory reservation for HASH MMU
On 2025-01-31 20:44, Hari Bathini wrote:
> On 23/01/25 7:54 pm, Avnish Chouhan wrote:
> > On 2025-01-23 15:26, Hari Bathini wrote:
> > > On 20/01/25 11:05 pm, Sourabh Jain wrote:
> > > > Commit 683eab94da75bc ("powerpc/fadump: setup additional parameters
> > > > for dump capture kernel") introduced the additional parameter feature
> > > > in fadump for HASH MMU with the understanding that GRUB does not use
> > > > the memory area between 640MB and 768MB for its operation.
> > > >
> > > > However, the patch ("powerpc: increase MIN RMA size for CAS
> > > > negotiation") changes the MIN RMA size to 768MB, allowing GRUB to use
> > > > memory up to 768MB. This makes the fadump reservation for the
> > > > additional parameter feature for HASH MMU unreliable.
> > > >
> > > > To address this, adjust the memory range for the additional parameter
> > > > in fadump for HASH MMU. This will ensure that GRUB does not overwrite
> > > > the memory reserved for fadump's additional parameter in HASH MMU.
> > > >
> > > > The new policy for the memory range for the additional parameter in
> > > > HASH MMU is that the first memory block must be larger than the
> > > > MIN_RMA size, as the bootloader can use memory up to the MIN_RMA size.
> > > > The range should be between MIN_RMA and the RMA size (ppc64_rma_size),
> > > > and it must not overlap with the fadump reserved area.
> > >
> > > IIRC, even memory above MIN_RMA is used by the bootloader except for
> > > 640MB to 768MB (assuming RMA size is >768MB). So, how does this change
> > > guarantee that the bootloader is not using memory reserved for bootargs?
> > >
> > > Avnish, earlier, bootloader was using RUNTIME_MIN_SPACE (128MB) starting
> > > top-down at 768MB. With MIN_RMA changed to 768MB, is bootloader still
> > > using the concept of RUNTIME_MIN_SPACE to set aside some memory for
> > > kernel to use? If yes, where exactly is it allocating this space now?
> > > Also, rtas instantiates top-down at 768MB. Would that not have a
> > > conflict with grub allocations without RUNTIME_MIN_SPACE at 768MB?
> > >
> > > - Hari
> >
> > Hi Hari,
>
> Hi Avnish,
>
> > The RUNTIME_MIN_SPACE, the space left aside by Grub, is within the
> > MIN_RMA size. Grub won't use memory beyond the MIN_RMA.
> > With this change, we haven't changed the RUNTIME_MIN_SPACE behavior.
> > Grub will still keep the 128 MB space in MIN_RMA for loading stock
> > kernel and initrd.
>
> IIUC, you mean, 640MB to 768MB is not used by Grub even if MIN_RMA
> is at 768MB? If that is true, this change is not needed, as fadump could
> still use the memory between 640MB to 768MB, right?
> Am I missing something here..

Hari,
No. As we are changing MIN_RMA to 768 MB, GRUB can use memory till 768 MB
if required.

Regards,
Avnish Chouhan

> - Hari
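Editorial sketch: the range policy quoted in this thread (the first memory block must be larger than MIN_RMA, the area must lie between MIN_RMA and ppc64_rma_size, and it must not overlap the fadump reserved area) can be illustrated with a small userspace model. This is not the code from the patch. The helper name pick_param_area(), the struct mem_range type, the single reserved range, and the choice to also cap the upper bound at the first block size are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

/* Assumption: MIN_RMA is 768MB, the value discussed in the thread. */
#define MIN_RMA_BYTES MB(768)

/* Half-open byte range [start, end). Hypothetical type for this sketch. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: find `size` bytes inside [MIN_RMA, rma_size) that do
 * not overlap the single `reserved` range. Returns true and fills `out` on
 * success, false if the policy cannot be satisfied.
 */
static bool pick_param_area(uint64_t first_block_size, uint64_t rma_size,
			    struct mem_range reserved, uint64_t size,
			    struct mem_range *out)
{
	uint64_t lo = MIN_RMA_BYTES;
	/* Assumption: the usable window also ends at the first memory block. */
	uint64_t hi = rma_size < first_block_size ? rma_size : first_block_size;
	uint64_t start = lo;

	/* The first memory block must be larger than MIN_RMA. */
	if (first_block_size <= MIN_RMA_BYTES)
		return false;
	if (hi <= lo || size == 0 || size > hi - lo)
		return false;

	/* If the lowest candidate overlaps the reserved area, skip past it. */
	if (start < reserved.end && start + size > reserved.start)
		start = reserved.end;

	/* The (possibly moved) candidate must still fit below the upper bound. */
	if (start > hi || size > hi - start)
		return false;

	out->start = start;
	out->end = start + size;
	return true;
}
```

With a 1024MB RMA and a reserved area covering 700MB to 800MB, a 1MB request would land at 800MB in this model; with a first memory block of exactly 768MB the request is refused, matching the "must be larger than MIN_RMA" condition.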
Re: [PATCH v2 5/9] powerpc: Use preempt_model_str().
On 03/02/2025 at 15:16, Sebastian Andrzej Siewior wrote:
> Use preempt_model_str() instead of manually conducting the preemption
> model. Use pr_emerg() instead of printk() to pass a loglevel.

Why use pr_emerg() for that line and not all other ones ?

The purpose of using printk() is to get it at the level defined by
CONFIG_MESSAGE_LOGLEVEL_DEFAULT and I think it is important to have the
full Oops block at the same level.

> Cc: Madhavan Srinivasan
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: Christophe Leroy
> Cc: Naveen N Rao
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Sebastian Andrzej Siewior
> ---
>  arch/powerpc/kernel/traps.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index edf5cabe5dfdb..9eb383189cfb2 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -263,10 +263,10 @@ static int __die(const char *str, struct pt_regs *regs, long err)
>  {
>  	printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
>
> -	printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s %s\n",
> +	pr_emerg("%s PAGE_SIZE=%luK%s %s %s%s%s%s %s\n",
>  	       IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE",
>  	       PAGE_SIZE / 1024, get_mmu_str(),
> -	       IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
> +	       preempt_model_str(),
>  	       IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
>  	       IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "",
>  	       debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
Re: [PATCH v2 5/9] powerpc: Use preempt_model_str().
On 2025-02-03 16:19:06 [+0100], Christophe Leroy wrote:
> On 03/02/2025 at 15:16, Sebastian Andrzej Siewior wrote:
> > Use preempt_model_str() instead of manually conducting the preemption
> > model. Use pr_emerg() instead of printk() to pass a loglevel.
>
> Why use pr_emerg() for that line and not all other ones ?

checkpatch complained for the current printk() line and this looks like
an emergency coming from die().

> The purpose of using printk() is to get it at the level defined by
> CONFIG_MESSAGE_LOGLEVEL_DEFAULT and I think it is important to have the full
> Oops block at the same level.

Okay. So "printk(KERN_DEFAULT " then.

Sebastian
Re: [PATCH v6 0/2] Improve interrupt handling during machine kexec
Hello:

This series was applied to riscv/linux.git (fixes)
by Thomas Gleixner:

On Wed, 4 Dec 2024 14:20:01 + you wrote:
> This patch series focuses on improving the machine_kexec_mask_interrupts()
> function by consolidating its implementation and optimizing its behavior to
> avoid redundant interrupt masking.
>
> Patch Summary:
> [PATCH v6 1/2] Move machine_kexec_mask_interrupts() to kernel/irq/kexec.c,
>                removing duplicate architecture-specific implementations.
> [PATCH v6 2/2] Refine machine_kexec_mask_interrupts() to avoid re-masking
>                already-masked interrupts, resolving specific warnings
>                triggered in GPIO IRQ flows.
>
> [...]

Here is the summary with links:
  - [v6,1/2] kexec: Consolidate machine_kexec_mask_interrupts() implementation
    https://git.kernel.org/riscv/c/bad6722e478f
  - [v6,2/2] kexec: Prevent redundant IRQ masking by checking state before shutdown
    https://git.kernel.org/riscv/c/b4706d814921

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH v2 1/2] cxlflash: Remove driver
Andrew,

> Remove the cxlflash driver for IBM CAPI Flash devices.

Applied to 6.15/scsi-staging, thanks!

-- 
Martin K. Petersen
Oracle Linux Engineering
[PATCH V1.1 20/33] cpufreq: powernv: Stop setting common freq attributes
The cpufreq core handles this now, the driver can skip setting it.

Signed-off-by: Viresh Kumar
Acked-by: Rafael J. Wysocki
---
V1.1:
- Drop runtime updates to freq attr.

 drivers/cpufreq/powernv-cpufreq.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ae79d909943b..0631284c4cfb 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -386,12 +386,8 @@ static ssize_t cpuinfo_nominal_freq_show(struct cpufreq_policy *policy,
 static struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
 	__ATTR_RO(cpuinfo_nominal_freq);

-#define SCALING_BOOST_FREQS_ATTR_INDEX		2
-
 static struct freq_attr *powernv_cpu_freq_attr[] = {
-	&cpufreq_freq_attr_scaling_available_freqs,
 	&cpufreq_freq_attr_cpuinfo_nominal_freq,
-	&cpufreq_freq_attr_scaling_boost_freqs,
 	NULL,
 };

@@ -1129,8 +1125,6 @@ static int __init powernv_cpufreq_init(void)

 	if (powernv_pstate_info.wof_enabled)
 		powernv_cpufreq_driver.boost_enabled = true;
-	else
-		powernv_cpu_freq_attr[SCALING_BOOST_FREQS_ATTR_INDEX] = NULL;

 	rc = cpufreq_register_driver(&powernv_cpufreq_driver);
 	if (rc) {
-- 
2.31.1.272.g89b43f80a514
Re: [PATCH v4 0/7] ptrace: introduce PTRACE_SET_SYSCALL_INFO API
On Mon, Feb 03, 2025 at 10:29:37AM +0100, Alexander Gordeev wrote:
> On Mon, Feb 03, 2025 at 08:58:49AM +0200, Dmitry V. Levin wrote:
>
> Hi Dmitry,
>
> > PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
> > PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
> > system calls the tracee is blocked in.
> ...
>
> FWIW, I am getting these on s390:
>
> # ./tools/testing/selftests/ptrace/set_syscall_info
> TAP version 13
> 1..1
> # Starting 1 tests from 1 test cases.
> # RUN global.set_syscall_info ...
> # set_syscall_info.c:87:set_syscall_info:Expected exp_entry->nr (-1) == info->entry.nr (65535)
> # set_syscall_info.c:88:set_syscall_info:wait #3: PTRACE_GET_SYSCALL_INFO #2: syscall nr mismatch
> # set_syscall_info: Test terminated by assertion
> # FAIL global.set_syscall_info
> not ok 1 global.set_syscall_info
> # FAILED: 0 / 1 tests passed.
> # Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
>
> I remember one of the earlier versions (v1 or v2) was working for me.
>
> Thanks!

In v3, this test was extended to check whether PTRACE_GET_SYSCALL_INFO
called immediately after PTRACE_SET_SYSCALL_INFO returns the same syscall
number, and on s390 it apparently doesn't, thanks to its implementation
of syscall_get_nr() that returns 0x in this case.

To work around this, we could either change syscall_get_nr() to return -1
in this case, or add an #ifdef __s390x__ exception to the test.
What would you prefer?

-- 
ldv
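Editorial sketch: the mismatch in the test output above (expected -1, got 65535) is what one would see if the syscall number were read back through a 16-bit quantity. That 16-bit width is an assumption inferred only from the value 65535 in the log, not a statement about how s390 implements syscall_get_nr(). The two helpers below are hypothetical userspace models: the first reproduces the mismatch, the second shows the kind of "return -1 in this case" mapping suggested as one possible fix.

```c
#include <stdint.h>

/*
 * Hypothetical model: the syscall number survives only as 16 bits, so a
 * written value of -1 reads back as 65535.
 */
static long read_back_nr_16bit(long nr_written)
{
	uint16_t stored = (uint16_t)nr_written;

	return (long)stored;
}

/*
 * Hypothetical variant: treat the all-ones 16-bit pattern as "no syscall"
 * and report it as -1, so a get after a set of -1 round-trips.
 */
static long read_back_nr_fixed(long nr_written)
{
	uint16_t stored = (uint16_t)nr_written;

	return stored == UINT16_MAX ? -1L : (long)stored;
}
```

The alternative mentioned in the reply, an #ifdef __s390x__ exception in the selftest, would leave the read-back value alone and relax the comparison instead.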
[PATCH] powerpc/code-patching: Disable KASAN report during patching via temporary mm
Erhard reports the following KASAN hit on Talos II (power9) with kernel 6.13:

[ 12.028126] ==
[ 12.028198] BUG: KASAN: user-memory-access in copy_to_kernel_nofault+0x8c/0x1a0
[ 12.028260] Write of size 8 at addr 187e458f2000 by task systemd/1

[ 12.028346] CPU: 87 UID: 0 PID: 1 Comm: systemd Tainted: GT 6.13.0-P9-dirty #3
[ 12.028408] Tainted: [T]=RANDSTRUCT
[ 12.028446] Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
[ 12.028500] Call Trace:
[ 12.028536] [c8dbf3b0] [c1656a48] dump_stack_lvl+0xbc/0x110 (unreliable)
[ 12.028609] [c8dbf3f0] [c06e2fc8] print_report+0x6b0/0x708
[ 12.028666] [c8dbf4e0] [c06e2454] kasan_report+0x164/0x300
[ 12.028725] [c8dbf600] [c06e54d4] kasan_check_range+0x314/0x370
[ 12.028784] [c8dbf640] [c06e6310] __kasan_check_write+0x20/0x40
[ 12.028842] [c8dbf660] [c0578e8c] copy_to_kernel_nofault+0x8c/0x1a0
[ 12.028902] [c8dbf6a0] [c00acfe4] __patch_instructions+0x194/0x210
[ 12.028965] [c8dbf6e0] [c00ade80] patch_instructions+0x150/0x590
[ 12.029026] [c8dbf7c0] [c01159bc] bpf_arch_text_copy+0x6c/0xe0
[ 12.029085] [c8dbf800] [c0424250] bpf_jit_binary_pack_finalize+0x40/0xc0
[ 12.029147] [c8dbf830] [c0115dec] bpf_int_jit_compile+0x3bc/0x930
[ 12.029206] [c8dbf990] [c0423720] bpf_prog_select_runtime+0x1f0/0x280
[ 12.029266] [c8dbfa00] [c0434b18] bpf_prog_load+0xbb8/0x1370
[ 12.029324] [c8dbfb70] [c0436ebc] __sys_bpf+0x5ac/0x2e00
[ 12.029379] [c8dbfd00] [c043a228] sys_bpf+0x28/0x40
[ 12.029435] [c8dbfd20] [c0038eb4] system_call_exception+0x334/0x610
[ 12.029497] [c8dbfe50] [c000c270] system_call_vectored_common+0xf0/0x280
[ 12.029561] --- interrupt: 3000 at 0x3fff82f5cfa8
[ 12.029608] NIP: 3fff82f5cfa8 LR: 3fff82f5cfa8 CTR:
[ 12.029660] REGS: c8dbfe80 TRAP: 3000 Tainted: GT (6.13.0-P9-dirty)
[ 12.029735] MSR: 9280f032 CR: 42004848 XER:
[ 12.029855] IRQMASK: 0
GPR00: 0169 3fffdcf789a0 3fff83067100 0005
GPR04: 3fffdcf78a98 0090 0008
GPR08:
GPR12: 3fff836ff7e0 c0010678
GPR16: 3fffdcf78f28 3fffdcf78f90
GPR20: 3fffdcf78f80
GPR24: 3fffdcf78f70 3fffdcf78d10 3fff835c7239 3fffdcf78bd8
GPR28: 3fffdcf78a98 00011f547580
[ 12.030316] NIP [3fff82f5cfa8] 0x3fff82f5cfa8
[ 12.030361] LR [3fff82f5cfa8] 0x3fff82f5cfa8
[ 12.030405] --- interrupt: 3000
[ 12.030444] ==

Commit c28c15b6d28a ("powerpc/code-patching: Use temporary mm for
Radix MMU") is inspired from x86 but unlike x86 it doesn't disable
KASAN reports during patching. This wasn't a problem at the beginning
because __patch_mem() is not instrumented.

Commit 465cabc97b42 ("powerpc/code-patching: introduce
patch_instructions()") uses copy_to_kernel_nofault() to copy several
instructions at once. But when using temporary mm the destination is
not regular kernel memory but a kind of kernel-like memory located
in user address space. Because it is not in kernel address space it is
not covered by KASAN shadow memory. Since commit e4137f08816b ("mm,
kasan, kmsan: instrument copy_from/to_kernel_nofault") KASAN reports
bad accesses from copy_to_kernel_nofault(). Here a bad access to user
memory is reported because KASAN detects the lack of shadow memory and
the address is below TASK_SIZE.

Do like x86 in commit b3fd8e83ada0 ("x86/alternatives: Use temporary
mm for text poking") and disable KASAN reports during patching when
using temporary mm.

Reported-by: Erhard Furtner
Close: https://lore.kernel.org/all/20250201151435.48400261@yea/
Fixes: 465cabc97b42 ("powerpc/code-patching: introduce patch_instructions()")
Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/code-patching.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 8a378fc19074..f84e0337cc02 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -493,7 +493,9 @@ static int __do_patch_instructions_mm(u32 *addr, u32 *code, size_t len, bool rep
 	orig_mm = start_using_temp_mm(patching_mm);
+	kasan_di
Re: [PATCH v6 22/26] device/dax: Properly refcount device dax pages when mapping
On Mon, Jan 13, 2025 at 10:12:41PM -0800, Dan Williams wrote:
> Alistair Popple wrote:
> > Device DAX pages are currently not reference counted when mapped,
> > instead relying on the devmap PTE bit to ensure mapping code will not
> > get/put references. This requires special handling in various page
> > table walkers, particularly GUP, to manage references on the
> > underlying pgmap to ensure the pages remain valid.
> >
> > However there is no reason these pages can't be refcounted properly at
> > map time. Doing so eliminates the need for the devmap PTE bit,
> > freeing up a precious PTE bit. It also simplifies GUP as it no longer
> > needs to manage the special pgmap references and can instead just
> > treat the pages normally as defined by vm_normal_page().
> >
> > Signed-off-by: Alistair Popple
> > ---
> >  drivers/dax/device.c | 15 +--
> >  mm/memremap.c        | 13 ++---
> >  2 files changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> > index 6d74e62..fd22dbf 100644
> > --- a/drivers/dax/device.c
> > +++ b/drivers/dax/device.c
> > @@ -126,11 +126,12 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
> >  		return VM_FAULT_SIGBUS;
> >  	}
> >
> > -	pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
> > +	pfn = phys_to_pfn_t(phys, 0);
> >
> >  	dax_set_mapping(vmf, pfn, fault_size);
> >
> > -	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
> > +	return vmf_insert_page_mkwrite(vmf, pfn_t_to_page(pfn),
> > +					vmf->flags & FAULT_FLAG_WRITE);
> >  }
> >
> >  static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
> > @@ -169,11 +170,12 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
> >  		return VM_FAULT_SIGBUS;
> >  	}
> >
> > -	pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
> > +	pfn = phys_to_pfn_t(phys, 0);
> >
> >  	dax_set_mapping(vmf, pfn, fault_size);
> >
> > -	return vmf_insert_pfn_pmd(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE);
> > +	return vmf_insert_folio_pmd(vmf, page_folio(pfn_t_to_page(pfn)),
> > +					vmf->flags & FAULT_FLAG_WRITE);
>
> This looks suspect without initializing the compound page metadata.

I initially wondered about this too, however I think the compound page
metadata should be initialised by memmap_init_zone_device(). That said I
kind of get lost in all the namespace/CXL/PMEM/DAX drivers in the stack
so maybe I've overlooked something.

> This might be getting compound pages by default with
> CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP. The device-dax unit tests are ok
> so far, but that is not super comforting until I can think about this a
> bit more... but not tonight.

From my reading of the code I don't _think_
CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP would change whether or not we got
compound pages by default, just that if we did some of the (tail?) pages
may refer to the same physical struct page.

> Might as well fix up device-dax refcounts in this series too, but I
> won't ask you to do that, will send you something to include.

Eh. That should be relatively straight forward. But then I thought that
about FS DAX too :-)
Re: [PATCH] ASoC: fsl_micfil: Enable default case in micfil_set_quality()
On Thu, 16 Jan 2025 06:24:36 -0800, Nikita Zhandarovich wrote:
> If 'micfil->quality' received from micfil_quality_set() somehow ends
> up with an unpredictable value, switch() operator will fail to
> initialize local variable qsel before regmap_update_bits() tries
> to utilize it.
>
> While it is unlikely, play it safe and enable a default case that
> returns -EINVAL error.
>
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_micfil: Enable default case in micfil_set_quality()
      commit: a8c9a453387640dbe45761970f41301a6985e7fa

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
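Editorial sketch: the quoted commit message describes a switch that leaves a local variable uninitialized when no case matches, and fixes it with a default case returning -EINVAL. The userspace model below shows that pattern only. The enum, the register values, and the helper name quality_to_qsel() are hypothetical and are not taken from the fsl_micfil driver.

```c
#include <errno.h>

/* Hypothetical quality modes, standing in for the driver's own enum. */
enum quality {
	QUALITY_HIGH,
	QUALITY_MEDIUM,
	QUALITY_LOW,
};

/*
 * Map a quality setting to a selector value. The selector values here are
 * made up for illustration. The default case means *qsel is never consumed
 * uninitialized: an unexpected input returns -EINVAL and leaves *qsel alone.
 */
static int quality_to_qsel(int quality, unsigned int *qsel)
{
	switch (quality) {
	case QUALITY_HIGH:
		*qsel = 1;
		break;
	case QUALITY_MEDIUM:
		*qsel = 0;
		break;
	case QUALITY_LOW:
		*qsel = 7;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}
```

A caller would check the return value before using the selector, which is the step that would precede the regmap_update_bits() call mentioned in the commit message.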
[PATCH v2 5/9] powerpc: Use preempt_model_str().
Use preempt_model_str() instead of manually conducting the preemption
model. Use pr_emerg() instead of printk() to pass a loglevel.

Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: Naveen N Rao
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Sebastian Andrzej Siewior
---
 arch/powerpc/kernel/traps.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index edf5cabe5dfdb..9eb383189cfb2 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -263,10 +263,10 @@ static int __die(const char *str, struct pt_regs *regs, long err)
 {
 	printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);

-	printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s %s\n",
+	pr_emerg("%s PAGE_SIZE=%luK%s %s %s%s%s%s %s\n",
 	       IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE",
 	       PAGE_SIZE / 1024, get_mmu_str(),
-	       IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
+	       preempt_model_str(),
 	       IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
 	       IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "",
 	       debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
-- 
2.47.2
Re: [PATCH v2 5/9] powerpc: Use preempt_model_str().
On 03/02/2025 at 17:01, Sebastian Andrzej Siewior wrote:
> On 2025-02-03 16:19:06 [+0100], Christophe Leroy wrote:
> > On 03/02/2025 at 15:16, Sebastian Andrzej Siewior wrote:
> > > Use preempt_model_str() instead of manually conducting the preemption
> > > model. Use pr_emerg() instead of printk() to pass a loglevel.
> >
> > Why use pr_emerg() for that line and not all other ones ?
>
> checkpatch complained for the current printk() line and this looks like
> an emergency coming from die().

Right but checkpatch only looks at the line you modify with your patch,
it doesn't consider the global picture.

> > The purpose of using printk() is to get it at the level defined by
> > CONFIG_MESSAGE_LOGLEVEL_DEFAULT and I think it is important to have the
> > full Oops block at the same level.
>
> Okay. So "printk(KERN_DEFAULT " then.

Up to you, I'm fine with that but you should consistently update all
printk's in the function, not only that one, so is it really worth it ?

Christophe
Re: [PATCH 2/2] powerpc/fadump: fix additional param memory reservation for HASH MMU
On 04/02/25 10:58 am, Avnish Chouhan wrote:
> On 2025-01-31 20:44, Hari Bathini wrote:
> > On 23/01/25 7:54 pm, Avnish Chouhan wrote:
> > > On 2025-01-23 15:26, Hari Bathini wrote:
> > > > On 20/01/25 11:05 pm, Sourabh Jain wrote:
> > > > > Commit 683eab94da75bc ("powerpc/fadump: setup additional
> > > > > parameters for dump capture kernel") introduced the additional
> > > > > parameter feature in fadump for HASH MMU with the understanding
> > > > > that GRUB does not use the memory area between 640MB and 768MB
> > > > > for its operation.
> > > > >
> > > > > However, the patch ("powerpc: increase MIN RMA size for CAS
> > > > > negotiation") changes the MIN RMA size to 768MB, allowing GRUB to
> > > > > use memory up to 768MB. This makes the fadump reservation for the
> > > > > additional parameter feature for HASH MMU unreliable.
> > > > >
> > > > > To address this, adjust the memory range for the additional
> > > > > parameter in fadump for HASH MMU. This will ensure that GRUB does
> > > > > not overwrite the memory reserved for fadump's additional
> > > > > parameter in HASH MMU.
> > > > >
> > > > > The new policy for the memory range for the additional parameter
> > > > > in HASH MMU is that the first memory block must be larger than
> > > > > the MIN_RMA size, as the bootloader can use memory up to the
> > > > > MIN_RMA size. The range should be between MIN_RMA and the RMA
> > > > > size (ppc64_rma_size), and it must not overlap with the fadump
> > > > > reserved area.
> > > >
> > > > IIRC, even memory above MIN_RMA is used by the bootloader except
> > > > for 640MB to 768MB (assuming RMA size is >768MB). So, how does this
> > > > change guarantee that the bootloader is not using memory reserved
> > > > for bootargs?
> > > >
> > > > Avnish, earlier, bootloader was using RUNTIME_MIN_SPACE (128MB)
> > > > starting top-down at 768MB. With MIN_RMA changed to 768MB, is
> > > > bootloader still using the concept of RUNTIME_MIN_SPACE to set
> > > > aside some memory for kernel to use? If yes, where exactly is it
> > > > allocating this space now? Also, rtas instantiates top-down at
> > > > 768MB. Would that not have a conflict with grub allocations without
> > > > RUNTIME_MIN_SPACE at 768MB?
> > > >
> > > > - Hari
> > >
> > > Hi Hari,
> >
> > Hi Avnish,
> >
> > > The RUNTIME_MIN_SPACE, the space left aside by Grub, is within the
> > > MIN_RMA size. Grub won't use memory beyond the MIN_RMA.
> > > With this change, we haven't changed the RUNTIME_MIN_SPACE behavior.
> > > Grub will still keep the 128 MB space in MIN_RMA for loading stock
> > > kernel and initrd.
> >
> > IIUC, you mean, 640MB to 768MB is not used by Grub even if MIN_RMA
> > is at 768MB? If that is true, this change is not needed, as fadump
> > could still use the memory between 640MB to 768MB, right?
> > Am I missing something here..
>
> Hari,
> No. As we are changing MIN_RMA to 768 MB, GRUB can use memory till
> 768 MB if required.

Does that mean 'linux_rmo_save' related code in
grub-core/kern/ieee1275/init.c is going to be dead code after this
change? Also, does this imply, there isn't going to be any
RUNTIME_MIN_SPACE support for linux in grub?

- Hari
[PATCH v2 3/3] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
Details are added for the vpa_dtl pmu event and format
attributes in the ABI documentation.

Signed-off-by: Kajol Jain
---
 .../sysfs-bus-event_source-devices-vpa-dtl    | 25 +++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl b/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
new file mode 100644
index ..39882e0e852d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
@@ -0,0 +1,25 @@
+What:		/sys/bus/event_source/devices/vpa_dtl/format
+Date:		January 2025
+Contact:	Linux on PowerPC Developer List
+Description:	Read-only. Attribute group to describe the magic bits
+		that go into perf_event_attr.config for a particular pmu.
+		(See ABI/testing/sysfs-bus-event_source-devices-format).
+
+		Each attribute under this group defines a bit range of the
+		perf_event_attr.config. Supported attribute are listed
+		below::
+
+		  event = "config:0-7" - event ID
+
+		For example::
+
+		  dtl_cede = "event=0x1"
+
+What:		/sys/bus/event_source/devices/vpa_dtl/events
+Date:		January 2025
+Contact:	Linux on PowerPC Developer List
+Description:	(RO) Attribute group to describe performance monitoring events
+		for the Virtual Processor Dispatch Trace Log. Each attribute in
+		this group describes a single performance monitoring event
+		supported by vpa_dtl pmu. The name of the file is the name of
+		the event (See ABI/testing/sysfs-bus-event_source-devices-events).
-- 
2.43.0
[PATCH v2 2/3] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
The pseries Shared Processor Logical Partition(SPLPAR) machines
can retrieve a log of dispatch and preempt events from the
hypervisor using data from Dispatch Trace Log(DTL) buffer.
With this information, user can retrieve when and why each dispatch &
preempt has occurred. Added an interface to expose the Virtual Processor
Area(VPA) DTL counters via perf.

The following events are available and exposed in sysfs:

 vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits
 vpa_dtl/dtl_preempt/ - Trace time slice preempts
 vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults.
 vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault)

Added interface defines supported event list, config fields for the
event attributes and their corresponding bit values which are exported
via sysfs. User could use the standard perf tool to access perf events
exposed via vpa-dtl pmu.

The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, the kernel needs to poll the counters, added
hrtimer code to do that. The timer interval can be provided by user via
sample_period field in nano seconds.

Result on power10 SPLPAR system with 656 cpu threads.
In the below perf record command with vpa_dtl pmu, -c option is used
to provide sample_period which corresponding to 10ns i.e; 1sec
and the workload time is also 1 second, hence we are getting 656 samples:

[command] perf record -a -R -e vpa_dtl/dtl_all/ -c 10 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.828 MB perf.data (656 samples) ]

There is one hrtimer added per vpa-dtl pmu thread. Code added to handle
addition of dtl buffer data in the raw sample. Since DTL does not provide
IP address for a sample and it just have traces on reason of
dispatch/preempt, we directly saving DTL buffer data to perf.data file as
raw sample. For each hrtimer restart call, interface will dump all the
new dtl entries added to dtl buffer as a raw sample.

To ensure there are no other conflicting dtl users (example: debugfs dtl
or /proc/powerpc/vcpudispatch_stats), interface added code to use
"down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock
is defined in dtl.h file. Also added global reference count variable called
"dtl_global_refc", to ensure dtl data can be captured per-cpu. Code also
added global lock called "dtl_global_lock" to avoid race condition.

Signed-off-by: Kajol Jain
---
Changelog:
v1 -> v2
- Rebase patches on top latest upstream kernel
- Remove the cpu online/offline code and directly allocating and
  deallocating memory of dtl_cache in event_init/event_del function
  respectively.
- Also include boot_tb variable as part of raw sample to convert
  timebase value into relative system time.
- Also add check for CONFIG_PPC_SPLPAR

 arch/powerpc/perf/Makefile  |   2 +-
 arch/powerpc/perf/vpa-dtl.c | 421 
 2 files changed, 422 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/perf/vpa-dtl.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index ac2cf58d62db..623168572685 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_PPC_POWERNV) += imc-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o

-obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o vpa-dtl.o

 obj-$(CONFIG_VPA_PMU) += vpa-pmu.o

diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
new file mode 100644
index ..dc6a71ea6539
--- /dev/null
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Perf interface to expose Dispatch Trace Log counters.
+ *
+ * Copyright (C) 2024 Kajol Jain, IBM Corporation
+ */
+
+#ifdef CONFIG_PPC_SPLPAR
+#define pr_fmt(fmt) "vpa_dtl: " fmt
+
+#include
+#include
+#include
+
+#define EVENT(_name, _code) enum{_name = _code}
+
+/*
+ * Based on Power Architecture Platform Reference(PAPR) documentation,
+ * Table 14.14. Per Virtual Processor Area, below Dispatch Trace Log(DTL)
+ * Enable Mask used to get corresponding virtual processor dispatch
+ * to preempt traces:
+ * DTL_CEDE(0x1): Trace voluntary (OS initiated) virtual
+ * processor waits
+ * DTL_PREEMPT(0x2): Trace time slice preempts
+ * DTLFAULT(0x4): Trace virtual partition memory page
+ faults.
+ * DTL_ALL(0x7): Trace all (DTL_CEDE | DTL_PREEMPT | DTL_FAULT)
+ *
+ * Event codes based on Dispatch Trace Log Enable Mask.
+ */
+EVENT(DTL_CEDE, 0x1);
+EVENT(DTL_PREEMPT, 0x2);
+EVENT(DTL_FAULT, 0x4);
+EVENT(DTL_ALL, 0x7);
+
+GENERIC_EVENT_ATTR(dtl_cede, DTL_CEDE);
+GENERIC_EVENT_ATTR(dtl_preempt, DTL_PREEMPT);
+GENERIC_EVENT_ATTR(dtl_fault, DTL_FAULT);
+GENERIC_EVENT_ATTR(dtl_all, DTL_ALL);
+
+PMU_FORMAT_ATTR
[PATCH v2 1/3] powerpc/time: Export boot_tb and log initial timebase at boot
From: Aboorva Devarajan

- Export `boot_tb` for external use, this is useful in perf vpa-dtl
  interface, where `boot_tb` can be used to convert raw timebase values
  to its relative boot timestamp.

- Log the initial timebase at `time_init` as it is a useful information
  which can be referred to as needed.

Signed-off-by: Aboorva Devarajan
Signed-off-by: Kajol Jain
---
 arch/powerpc/include/asm/time.h | 1 +
 arch/powerpc/kernel/time.c      | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 9bdd8080299b..b6fc5df01d53 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -23,6 +23,7 @@ extern u64 decrementer_max;
 extern unsigned long tb_ticks_per_jiffy;
 extern unsigned long tb_ticks_per_usec;
 extern unsigned long tb_ticks_per_sec;
+extern u64 boot_tb;
 extern struct clock_event_device decrementer_clockevent;
 extern u64 decrementer_max;

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 0727332ad86f..6e8548f0e48f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -137,7 +137,8 @@ EXPORT_SYMBOL_GPL(rtc_lock);

 static u64 tb_to_ns_scale __read_mostly;
 static unsigned tb_to_ns_shift __read_mostly;
-static u64 boot_tb __read_mostly;
+u64 boot_tb __read_mostly;
+EXPORT_SYMBOL_GPL(boot_tb);

 extern struct timezone sys_tz;
 static long timezone_offset;
@@ -943,6 +944,7 @@ void __init time_init(void)
 	tb_to_ns_shift = shift;
 	/* Save the current timebase to pretty up CONFIG_PRINTK_TIME */
 	boot_tb = get_tb();
+	pr_debug("%s: timebase at boot: %llu\n", __func__, (unsigned long long)boot_tb);

 	/* If platform provided a timezone (pmac), we correct the time */
 	if (timezone_offset) {
-- 
2.43.0
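Editorial sketch: the commit message above says boot_tb can be used to convert raw timebase values into a boot-relative timestamp. A minimal userspace model of that conversion is shown below. It is not the kernel's implementation (the kernel uses the tb_to_ns_scale and tb_to_ns_shift values visible in the diff); the helper name tb_to_boot_relative_ns() is hypothetical, and the plain divide-based arithmetic is a simplification chosen for clarity.

```c
#include <stdint.h>

#define NSEC_PER_SEC_ULL 1000000000ULL

/*
 * Hypothetical helper: convert a raw timebase reading into nanoseconds
 * since boot, given the timebase value captured at boot and the timebase
 * frequency in ticks per second.
 *
 * Splitting into whole seconds and a remainder avoids overflowing 64 bits
 * for long uptimes; the remainder product stays in range for tick rates
 * up to roughly 18 GHz.
 */
static uint64_t tb_to_boot_relative_ns(uint64_t tb, uint64_t boot_tb,
				       uint64_t tb_ticks_per_sec)
{
	uint64_t ticks, secs, rem;

	/* Guard against a zero frequency or a reading taken before boot_tb. */
	if (tb_ticks_per_sec == 0 || tb < boot_tb)
		return 0;

	ticks = tb - boot_tb;
	secs = ticks / tb_ticks_per_sec;
	rem = ticks % tb_ticks_per_sec;

	return secs * NSEC_PER_SEC_ULL +
	       rem * NSEC_PER_SEC_ULL / tb_ticks_per_sec;
}
```

As an example input, a timebase running at 512000000 ticks per second that has advanced by 256000000 ticks since boot_tb corresponds to half a second after boot in this model.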
Re: [PATCH v4 0/7] ptrace: introduce PTRACE_SET_SYSCALL_INFO API
On Mon, Feb 03, 2025 at 08:58:49AM +0200, Dmitry V. Levin wrote:

Hi Dmitry,

> PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
> PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
> system calls the tracee is blocked in.
...

FWIW, I am getting these on s390:

# ./tools/testing/selftests/ptrace/set_syscall_info
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN global.set_syscall_info ...
# set_syscall_info.c:87:set_syscall_info:Expected exp_entry->nr (-1) == info->entry.nr (65535)
# set_syscall_info.c:88:set_syscall_info:wait #3: PTRACE_GET_SYSCALL_INFO #2: syscall nr mismatch
# set_syscall_info: Test terminated by assertion
# FAIL global.set_syscall_info
not ok 1 global.set_syscall_info
# FAILED: 0 / 1 tests passed.
# Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0

I remember one of the earlier versions (v1 or v2) was working for me.

Thanks!