[RESEND PATCH] powerpc/kprobes: don't save r13 register in kprobe context
When CONFIG_STACKPROTECTOR_STRONG is enabled and FTRACE is disabled on powerpc64, repeatedly triggering the kprobe process may cause stack check failures and panic. Case: There is a kprobe(do nothing in handler) attached to the "shmem_get_inode", and a process A is creating file on tmpfs. CPU0 A |r13 = paca_ptrs[0], paca_ptrs[0]->canary=A->stack_canary |touch a file on tmpfs |shmem_mknod(): |load A's canary through r13 and stored it in A's stack |shmem_get_inode(): |enter kprobe first |optinsn_slot(): |stored r13 (paca_ptrs[0]) in stack .. ==> schedule, B run on CPU0, A run on CPU1 CPU0 B |r13 = paca_ptrs[0], paca_ptrs[0]->canary=B->stack_canary |do something... CPU1 A |r13 = paca_ptrs[1], paca_ptrs[1]->canary=A->stack_canary |about to leave 'optinsn_slot', restore r13 from A's stack |r13 = paca_ptrs[0], paca_ptrs[0]->canary=B->stack_canary |leave optinsn_slot, back to shmem_get_inode |leave shmem_get_inode, back to shmem_mknod |do canary check in shmem_mknod, but canary stored in A's stack (A's canary) doesn't match the canary loaded through r13 (B's canary), so panic When A(on CPU0) entring optinsn_slot, it stored r13(paca_ptrs[0]) in stack, then A is scheduled to CPU1 and restore r13 from A's stack when leaving 'optinsn_slot'. Now A is running on CPU1 but r13 point to CPU0's paca_ptrs[0], at this time paca_ptrs[0]->__current points to another process B, which cause A use B's canary to do stack check and panic. This can be simply fixed by not saving and restoring the r13 register, because on powerpc64, r13 is a special register that reserved to point to the current process, no need to restore the outdated r13 if scheduled had happened. Fixes: 51c9c0843993 ("powerpc/kprobes: Implement Optprobes") Signed-off-by: pangliyuan --- arch/powerpc/kernel/optprobes_head.S | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S index 35932f45fb4e..bf0d77e62ba8 100644 --- a/arch/powerpc/kernel/optprobes_head.S +++ b/arch/powerpc/kernel/optprobes_head.S @@ -10,12 +10,12 @@ #include #ifdef CONFIG_PPC64 -#define SAVE_30GPRS(base) SAVE_GPRS(2, 31, base) -#define REST_30GPRS(base) REST_GPRS(2, 31, base) +#define SAVE_NEEDED_GPRS(base) SAVE_GPRS(2, 12, base); SAVE_GPRS(14, 31, base) +#define REST_NEEDED_GPRS(base) REST_GPRS(2, 12, base); REST_GPRS(14, 31, base) #define TEMPLATE_FOR_IMM_LOAD_INSNSnop; nop; nop; nop; nop #else -#define SAVE_30GPRS(base) stmw r2, GPR2(base) -#define REST_30GPRS(base) lmw r2, GPR2(base) +#define SAVE_NEEDED_GPRS(base) stmwr2, GPR2(base) +#define REST_NEEDED_GPRS(base) lmw r2, GPR2(base) #define TEMPLATE_FOR_IMM_LOAD_INSNSnop; nop; nop #endif @@ -45,7 +45,7 @@ optprobe_template_entry: /* Save the previous SP into stack */ addir0,r1,INT_FRAME_SIZE PPC_STL r0,GPR1(r1) - SAVE_30GPRS(r1) + SAVE_NEEDED_GPRS(r1) /* Save SPRS */ mfmsr r5 PPC_STL r5,_MSR(r1) @@ -123,7 +123,7 @@ optprobe_template_call_emulate: PPC_LL r5,_CCR(r1) mtcrr5 REST_GPR(0,r1) - REST_30GPRS(r1) + REST_NEEDED_GPRS(r1) /* Restore the previous SP */ addir1,r1,INT_FRAME_SIZE -- 2.37.7
Re: [PATCH v5 12/15] objtool: Add support for more complex UACCESS control
On Wed, Jan 15, 2025 at 11:42:52PM +0100, Christophe Leroy wrote: > diff --git a/tools/objtool/check.c b/tools/objtool/check.c > index 91436f4b3622..54625f09d831 100644 > --- a/tools/objtool/check.c > +++ b/tools/objtool/check.c > @@ -2422,6 +2422,14 @@ static int __annotate_late(struct objtool_file *file, > int type, struct instructi > insn->dead_end = false; > break; > > + case ANNOTYPE_UACCESS_BEGIN: > + insn->type = INSN_STAC; > + break; > + > + case ANNOTYPE_UACCESS_END: > + insn->type = INSN_CLAC; > + break; I would feel better if this had something like: if (insn->type != INSN_OTHER) WARN_INSN(insn, "over-riding instruction type: %d", insn->type); Adding these annotations to control flow instruction would be bad etc.
Re: [PATCH] selftests: livepatch: handle PRINTK_CALLER in check_result()
On Tue 2025-01-14 20:01:44, Madhavan Srinivasan wrote: > Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER, which > adds the caller id as part of the dmesg. Due to this, even though > the expected vs observed are same, end testcase results are failed. CONFIG_PRINTK_CALLER is not the only culprit. We (SUSE) have it enabled as well and the selftests pass without this patch. The difference might be in dmesg. It shows the caller only when the messages are read via the syslog syscall (-S) option. It should not show the caller when the messages are read via /dev/kmsg which should be the default. I wonder if you define an alias to dmesg which adds the "-S" option or if /dev/kmsg is not usable from some reason. That said, I am fine with the patch. But I would like to better understand and document why you need it. Also it would be nice to update the filter format as suggested by Joe. Best Regards, Petr
Re: [PATCH RFC v2 01/29] mm: asi: Make some utility functions noinstr compatible
On Thu, Jan 16, 2025 at 01:18:58AM +0100, Borislav Petkov wrote: > Long story short, lemme try to poke around tomorrow to try to figure out what > actually happens. It could be caused by the part of Rik's patches and this one > inlining things. We'll see... Looks transient... The very similar guest boots fine on another machine. Let's watch this... -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
[PATCH v4 RESEND] powerpc/pseries/eeh: Fix get PE state translation
The PE Reset State "0" returned by RTAS calls "ibm_read_slot_reset_[state|state2]" indicates that the reset is deactivated and the PE is in a state where MMIO and DMA are allowed. However, the current implementation of "pseries_eeh_get_state()" does not reflect this, causing drivers to incorrectly assume that MMIO and DMA operations cannot be resumed. The userspace drivers as a part of EEH recovery using VFIO ioctls fail to detect when the recovery process is complete. The VFIO_EEH_PE_GET_STATE ioctl does not report the expected EEH_PE_STATE_NORMAL state, preventing userspace drivers from functioning properly on pseries systems. The patch addresses this issue by updating 'pseries_eeh_get_state()' to include "EEH_STATE_MMIO_ENABLED" and "EEH_STATE_DMA_ENABLED" in the result mask for PE Reset State "0". This ensures correct state reporting to the callers, aligning the behavior with the PAPR specification and fixing the bug in EEH recovery for VFIO user workflows. Fixes: 00ba05a12b3c ("powerpc/pseries: Cleanup on pseries_eeh_get_state()") Cc: Reviewed-by: Ritesh Harjani (IBM) Signed-off-by: Narayana Murty N --- Changelog: V1:https://lore.kernel.org/all/20241107042027.338065-1-nnmli...@linux.ibm.com/ --added Fixes tag for "powerpc/pseries: Cleanup on pseries_eeh_get_state()". V2:https://lore.kernel.org/stable/20241212075044.10563-1-nnmlinux%40linux.ibm.com --Updated the patch description to include it in the stable kernel tree. V3:https://lore.kernel.org/all/87v7vm8pwz@gmail.com/ --Updated commit description. --- arch/powerpc/platforms/pseries/eeh_pseries.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c index 1893f66371fa..b12ef382fec7 100644 --- a/arch/powerpc/platforms/pseries/eeh_pseries.c +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c @@ -580,8 +580,10 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *delay) switch(rets[0]) { case 0: - result = EEH_STATE_MMIO_ACTIVE | -EEH_STATE_DMA_ACTIVE; + result = EEH_STATE_MMIO_ACTIVE | +EEH_STATE_DMA_ACTIVE | +EEH_STATE_MMIO_ENABLED | +EEH_STATE_DMA_ENABLED; break; case 1: result = EEH_STATE_RESET_ACTIVE | -- 2.47.1
Re: [PATCH] selftests: livepatch: handle PRINTK_CALLER in check_result()
On 1/16/25 04:29, Petr Mladek wrote: > On Tue 2025-01-14 20:01:44, Madhavan Srinivasan wrote: >> Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER, which >> adds the caller id as part of the dmesg. Due to this, even though >> the expected vs observed are same, end testcase results are failed. > > CONFIG_PRINTK_CALLER is not the only culprit. We (SUSE) have it enabled > as well and the selftests pass without this patch. > > The difference might be in dmesg. It shows the caller only when > the messages are read via the syslog syscall (-S) option. It should > not show the caller when the messages are read via /dev/kmsg > which should be the default. > > I wonder if you define an alias to dmesg which adds the "-S" option > or if /dev/kmsg is not usable from some reason. > Hi Petr, To see the thread markers on a RHEL-9.6 machine, I built and installed the latest dmesg from: https://github.com/util-linux/util-linux and ran Madhavan's tests. I don't think there was any alias involved: $ alias | grep dmesg (nothing) $ ~/util-linux/dmesg | tail -n1 [ 4361.322790] [ T98877] % rmmod test_klp_livepatch >From util-linux's 467a5b3192f1 ("dmesg: add caller_id support"): The dmesg -S using the old syslog interface supports printing the PRINTK_CALLER field but currently standard dmesg does not support printing the field if present. There are utilities that use dmesg and so it would be optimal if dmesg supported PRINTK_CALLER as well. does that imply that printing the thread IDs is now a (util-linux's) dmesg default? Regards, -- Joe
Re: [PATCH RFC v2 01/29] mm: asi: Make some utility functions noinstr compatible
On Thu, 16 Jan 2025 at 01:21, Borislav Petkov wrote: > > Unfortunately Thomas pointed out this will prevent the function from > > being inlined at call sites in .text. > > > > So far I haven't been able[1] to find a formulation that lets us : > > 1. avoid calls from .noinstr.text -> .text, > > 2. while also letting the compiler freely decide what to inline. > > > > 1 is a functional requirement so here I'm just giving up on 2. Existing > > callsites of this code are just forced inline. For the incoming code > > that needs to call it from noinstr, they will be out-of-line calls. > > I'm not sure some of that belongs in the commit message - if you want to have > it in the submission, you should put it under the --- line below, right above > the diffstat. Sure. I'm actually not even sure that for a [PATCH]-quality thing this cross-cutting commit even makes sense - once we've decided on the general way to solve this problem, perhaps the changes should just be part of the commit that needs them? It feels messy to have a patch that "does multiple things", but on the other hand it might be annoying to review a patch that says "make a load of random changes across the kernel, which are needed at various points in various upcoming patches, trust me". Do you have any opinion on that? (BTW, since a comment you made on another series (can't find it on Lore...), I've changed my writing style to avoid stuff like this in comments & commit messages in general, but this text all predates that. I'll do my best to sort all that stuff out before I send anything as a [PATCH].) On Thu, 16 Jan 2025 at 11:29, Borislav Petkov wrote: > > On Thu, Jan 16, 2025 at 01:18:58AM +0100, Borislav Petkov wrote: > > Long story short, lemme try to poke around tomorrow to try to figure out > > what > > actually happens. It could be caused by the part of Rik's patches and this > > one > > inlining things. We'll see... > > Looks transient... The very similar guest boots fine on another machine. Let's > watch this... Oh, I didn't notice your update until now. But yeah I also couldn't reproduce it on a Sapphire Rapids machine and on QEMU with this patch applied on top of tip/master (37bc915c6ad0f).
Re: [PATCH RFC v2 01/29] mm: asi: Make some utility functions noinstr compatible
On Thu, Jan 16, 2025 at 02:22:42PM +0100, Brendan Jackman wrote: > Sure. I'm actually not even sure that for a [PATCH]-quality thing this > cross-cutting commit even makes sense - once we've decided on the > general way to solve this problem, perhaps the changes should just be > part of the commit that needs them? Right, that sounds better. > It feels messy to have a patch that "does multiple things", but on the > other hand it might be annoying to review a patch that says "make a > load of random changes across the kernel, which are needed at various > points in various upcoming patches, trust me". > > Do you have any opinion on that? You're absolutely right - we do things when we need them and not before. Otherwise, often times things get done preemptively and then forgotten only for someone to notice way later and undo them again. > (BTW, since a comment you made on another series (can't find it on > Lore...), I've changed my writing style to avoid stuff like this in > comments & commit messages in general, but this text all predates > that. I'll do my best to sort all that stuff out before I send > anything as a [PATCH].) Thanks! Btw, good and funny way to use "[PATCH]-quality" to mean non-RFC. :-P > Oh, I didn't notice your update until now. But yeah I also couldn't > reproduce it on a Sapphire Rapids machine and on QEMU with this patch > applied on top of tip/master (37bc915c6ad0f). Yeah, it feels like toolchain-related but I can't put my finger on it yet. We'll see if and when this thing will re-surface its ugly head... :-) Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v5 00/15] powerpc/objtool: uaccess validation for PPC32 (v5)
On Wed, Jan 15, 2025 at 11:42:40PM +0100, Christophe Leroy wrote: > Christophe Leroy (15): > objtool: Fix generic annotation infrastructure cross build > objtool: Move back misplaced comment > objtool: Allow an architecture to disable objtool on ASM files > objtool: Fix JUMP_ENTRY_SIZE for bi-arch like powerpc > objtool: Add INSN_RETURN_CONDITIONAL > objtool: Add support for relative switch tables > objtool: Merge mark_func_jump_tables() and add_func_jump_tables() > objtool: Track general purpose register used for switch table base > objtool: Find end of switch table directly > objtool: When looking for switch tables also follow conditional and > dynamic jumps > objtool: .rodata.cst{2/4/8/16} are not switch tables > objtool: Add support for more complex UACCESS control > objtool: Prepare noreturns.h for more architectures > powerpc/bug: Annotate reachable after warning trap > powerpc: Implement UACCESS validation on PPC32 This mostly looks okay to me. You have a tendency to violate the reverse xmas tree for variables thing, but meh. I would like Josh to have a hard look at this though, jump tables are his thing :-)
Re: [RESEND PATCH] powerpc/kprobes: don't save r13 register in kprobe context
Le 16/01/2025 à 09:45, pangliyuan a écrit : [Vous ne recevez pas souvent de courriers de pangliyu...@huawei.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ] When CONFIG_STACKPROTECTOR_STRONG is enabled and FTRACE is disabled on powerpc64, repeatedly triggering the kprobe process may cause stack check failures and panic. Case: There is a kprobe(do nothing in handler) attached to the "shmem_get_inode", and a process A is creating file on tmpfs. CPU0 A |r13 = paca_ptrs[0], paca_ptrs[0]->canary=A->stack_canary |touch a file on tmpfs |shmem_mknod(): |load A's canary through r13 and stored it in A's stack |shmem_get_inode(): |enter kprobe first |optinsn_slot(): |stored r13 (paca_ptrs[0]) in stack .. ==> schedule, B run on CPU0, A run on CPU1 CPU0 B |r13 = paca_ptrs[0], paca_ptrs[0]->canary=B->stack_canary |do something... CPU1 A |r13 = paca_ptrs[1], paca_ptrs[1]->canary=A->stack_canary |about to leave 'optinsn_slot', restore r13 from A's stack |r13 = paca_ptrs[0], paca_ptrs[0]->canary=B->stack_canary |leave optinsn_slot, back to shmem_get_inode |leave shmem_get_inode, back to shmem_mknod |do canary check in shmem_mknod, but canary stored in A's stack (A's canary) doesn't match the canary loaded through r13 (B's canary), so panic When A(on CPU0) entring optinsn_slot, it stored r13(paca_ptrs[0]) in stack, then A is scheduled to CPU1 and restore r13 from A's stack when leaving 'optinsn_slot'. Now A is running on CPU1 but r13 point to CPU0's paca_ptrs[0], at this time paca_ptrs[0]->__current points to another process B, which cause A use B's canary to do stack check and panic. This can be simply fixed by not saving and restoring the r13 register, because on powerpc64, r13 is a special register that reserved to point to the current process, no need to restore the outdated r13 if scheduled had happened. Does r13 really points to current process ? I thought it was pointing to PACA which is a structure attached to a given CPU not a given process. By the way, don't we have the same problem on powerpc32 with register r2 ? Fixes: 51c9c0843993 ("powerpc/kprobes: Implement Optprobes") Signed-off-by: pangliyuan --- arch/powerpc/kernel/optprobes_head.S | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S index 35932f45fb4e..bf0d77e62ba8 100644 --- a/arch/powerpc/kernel/optprobes_head.S +++ b/arch/powerpc/kernel/optprobes_head.S @@ -10,12 +10,12 @@ #include #ifdef CONFIG_PPC64 -#define SAVE_30GPRS(base) SAVE_GPRS(2, 31, base) -#define REST_30GPRS(base) REST_GPRS(2, 31, base) +#define SAVE_NEEDED_GPRS(base) SAVE_GPRS(2, 12, base); SAVE_GPRS(14, 31, base) +#define REST_NEEDED_GPRS(base) REST_GPRS(2, 12, base); REST_GPRS(14, 31, base) This macro name seems a bit sketchy, as far as I understand r0 and r1 also need to be saved/restored allthough they are not handled by this macro. #define TEMPLATE_FOR_IMM_LOAD_INSNSnop; nop; nop; nop; nop #else -#define SAVE_30GPRS(base) stmw r2, GPR2(base) -#define REST_30GPRS(base) lmw r2, GPR2(base) +#define SAVE_NEEDED_GPRS(base) stmwr2, GPR2(base) +#define REST_NEEDED_GPRS(base) lmw r2, GPR2(base) #define TEMPLATE_FOR_IMM_LOAD_INSNSnop; nop; nop #endif @@ -45,7 +45,7 @@ optprobe_template_entry: /* Save the previous SP into stack */ addir0,r1,INT_FRAME_SIZE PPC_STL r0,GPR1(r1) - SAVE_30GPRS(r1) + SAVE_NEEDED_GPRS(r1) /* Save SPRS */ mfmsr r5 PPC_STL r5,_MSR(r1) @@ -123,7 +123,7 @@ optprobe_template_call_emulate: PPC_LL r5,_CCR(r1) mtcrr5 REST_GPR(0,r1) - REST_30GPRS(r1) + REST_NEEDED_GPRS(r1) /* Restore the previous SP */ addir1,r1,INT_FRAME_SIZE -- 2.37.7
Re: [PATCH] selftests: livepatch: handle PRINTK_CALLER in check_result()
On Thu 2025-01-16 08:10:44, Joe Lawrence wrote: > On 1/16/25 04:29, Petr Mladek wrote: > > On Tue 2025-01-14 20:01:44, Madhavan Srinivasan wrote: > >> Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER, which > >> adds the caller id as part of the dmesg. Due to this, even though > >> the expected vs observed are same, end testcase results are failed. > > > > CONFIG_PRINTK_CALLER is not the only culprit. We (SUSE) have it enabled > > as well and the selftests pass without this patch. > > > > The difference might be in dmesg. It shows the caller only when > > the messages are read via the syslog syscall (-S) option. It should > > not show the caller when the messages are read via /dev/kmsg > > which should be the default. > > > > I wonder if you define an alias to dmesg which adds the "-S" option > > or if /dev/kmsg is not usable from some reason. > > > > Hi Petr, > > To see the thread markers on a RHEL-9.6 machine, I built and installed > the latest dmesg from: > > https://github.com/util-linux/util-linux > > and ran Madhavan's tests. I don't think there was any alias involved: > > $ alias | grep dmesg > (nothing) > > $ ~/util-linux/dmesg | tail -n1 > [ 4361.322790] [ T98877] % rmmod test_klp_livepatch Good to know. I havn't seen this yet. > >From util-linux's 467a5b3192f1 ("dmesg: add caller_id support"): > > The dmesg -S using the old syslog interface supports printing the > PRINTK_CALLER field but currently standard dmesg does not support > printing the field if present. There are utilities that use dmesg and > so it would be optimal if dmesg supported PRINTK_CALLER as well. > > does that imply that printing the thread IDs is now a (util-linux's) > dmesg default? It looks like. The caller ID information is available also via /dev/kmsg but the older dmesg version did not show it. I guess that they just added support to parse and show it. It actually makes sense to show the same output independently on whether the messages are read via syslog or /dev/kmsg. So, we need this patch, definitely ;-) Best Regards, Petr
Re: [PATCH v2 23/25] EDAC/cell: Remove powerpc Cell driver
On Wed, Dec 18, 2024 at 09:55:11PM +1100, Michael Ellerman wrote: > This driver can no longer be built since support for IBM Cell Blades was > removed, in particular PPC_CELL_COMMON. > > Remove the driver. > > Signed-off-by: Michael Ellerman > --- > v2: Rebase. > > Cc: linux-e...@vger.kernel.org > > drivers/edac/Kconfig | 8 -- > drivers/edac/Makefile| 2 - > drivers/edac/cell_edac.c | 281 --- > 3 files changed, 291 deletions(-) > delete mode 100644 drivers/edac/cell_edac.c You forgot a spot: diff --git a/arch/powerpc/configs/cell_defconfig b/arch/powerpc/configs/cell_defconfig index 53f43a34e1a9..b33f0034990c 100644 --- a/arch/powerpc/configs/cell_defconfig +++ b/arch/powerpc/configs/cell_defconfig @@ -168,7 +168,6 @@ CONFIG_INFINIBAND_MTHCA=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y CONFIG_EDAC=y -CONFIG_EDAC_CELL=y CONFIG_UIO=m CONFIG_EXT2_FS=y CONFIG_EXT4_FS=y Queued with that hunk added. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH RFC v2 02/29] x86: Create CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
On Fri, Jan 10, 2025 at 06:40:28PM +, Brendan Jackman wrote: > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index > 7b9a7e8f39acc8e9aeb7d4213e87d71047865f5c..5a50582eb210e9d1309856a737d32b76fa1bfc85 > 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2519,6 +2519,20 @@ config MITIGATION_PAGE_TABLE_ISOLATION > > See Documentation/arch/x86/pti.rst for more details. > > +config MITIGATION_ADDRESS_SPACE_ISOLATION > + bool "Allow code to run with a reduced kernel address space" > + default n > + depends on X86_64 && !PARAVIRT && !UML > + help > + This feature provides the ability to run some kernel code s/This feature provide/Provide/ > + with a reduced kernel address space. This can be used to > + mitigate some speculative execution attacks. > + > + The !PARAVIRT dependency is only because of lack of testing; in theory > + the code is written to work under paravirtualization. In practice > + there are likely to be unhandled cases, in particular concerning TLB > + flushes. Right, this paragraph should be under the "---" line too until PARAVIRT gets tested, ofc. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
[PATCH] ASoC: fsl_micfil: Enable default case in micfil_set_quality()
If 'micfil->quality' received from micfil_quality_set() somehow ends up with an unpredictable value, switch() operator will fail to initialize local variable qsel before regmap_update_bits() tries to utilize it. While it is unlikely, play it safe and enable a default case that returns -EINVAL error. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: bea1d61d5892 ("ASoC: fsl_micfil: rework quality setting") Cc: sta...@vger.kernel.org Signed-off-by: Nikita Zhandarovich --- sound/soc/fsl/fsl_micfil.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c index 8c15389c9a04..5585f4c8f455 100644 --- a/sound/soc/fsl/fsl_micfil.c +++ b/sound/soc/fsl/fsl_micfil.c @@ -157,6 +157,8 @@ static int micfil_set_quality(struct fsl_micfil *micfil) case QUALITY_VLOW2: qsel = MICFIL_QSEL_VLOW2_QUALITY; break; + default: + return -EINVAL; } return regmap_update_bits(micfil->regmap, REG_MICFIL_CTRL2,
linux-next: duplicate patch in the powerpc tree
Hi all, The following commit is also in Linus Torvalds' tree as a different commit (but the same patch): 7fee0217538a ("MAINTAINERS: powerpc: Update my status") This is commit 77a903cd8e5a ("MAINTAINERS: powerpc: Update my status") in Linus' tree. -- Cheers, Stephen Rothwell pgpLyn8VsCJtv.pgp Description: OpenPGP digital signature
Re: [PATCH] of: address: Unify resource bounds overflow checking
>> >> Thomas Weißschuh writes: >> >> > The members "start" and "end" of struct resource are of type >> >> > "resource_size_t" which can be 32bit wide. >> >> > Values read from OF however are always 64bit wide. >> >> > >> >> > Refactor the diff overflow checks into a helper function. >> >> > Also extend the checks to validate each calculation step. >> >> > >> >> > Signed-off-by: Thomas Weißschuh >> >> > --- >> >> > drivers/of/address.c | 45 ++--- >> >> > 1 file changed, 26 insertions(+), 19 deletions(-) >> >> > >> >> > diff --git a/drivers/of/address.c b/drivers/of/address.c >> >> > index 7e59283a4472..df854bb427ce 100644 >> >> > --- a/drivers/of/address.c >> >> > +++ b/drivers/of/address.c >> >> > @@ -198,6 +198,25 @@ static u64 of_bus_pci_map(__be32 *addr, const >> >> > __be32 >> >> > *range, int na, int ns, >> >> > >> >> > #endif /* CONFIG_PCI */ >> >> > >> >> > +static int __of_address_resource_bounds(struct resource *r, u64 start, >> >> > u64 >> >> > size) >> >> > +{ >> >> > + u64 end = start; >> >> > + >> >> > + if (overflows_type(start, r->start)) >> >> > + return -EOVERFLOW; >> >> > + if (size == 0) >> >> > + return -EOVERFLOW; >> >> > + if (check_add_overflow(end, size - 1, &end)) >> >> > + return -EOVERFLOW; >> >> > + if (overflows_type(end, r->end)) >> >> > + return -EOVERFLOW; >> >> >> >> This breaks PCI on powerpc qemu. Part of the PCI probe reads a resource >> >> that's zero sized, which used to succeed but now fails due to the size >> >> check above. >> >> >> >> The diff below fixes it for me. >> > >> > I fixed it up with your change. >> >> >> This commit is breaking Ethernet functionality on the TI AM57xx platform due >> to >> zero byte SRAM block size allocation during initialization. Prior to this >> patch, zero byte block sizes were handled properly. > > What driver and where exactly? We found an issue while developing the driver [1] and more specifically in [2] (lines 313-327), but it looks like this is a generic issue which can block 1 byte of memory, when a zero size request has been initiated for the reserved region. static int __of_address_resource_bounds(struct resource *r, u64 start, u64 size) { u64 end = start; if (overflows_type(start, r->start)) return -EOVERFLOW; if (size && check_add_overflow(end, size - 1, &end)) return -EOVERFLOW; if (overflows_type(end, r->end)) return -EOVERFLOW; r->start = start; r->end = end; return 0; } Though we have the start address handling already in place above, we do see an issue with the end address, because there is an unconditional +1 afterwards in resource_size() API below which is responsible for reserving the extra byte static inline resource_size_t resource_size(const struct resource *res) { return res->end - res->start + 1; } We have 4 ways of fixing it. Option 1: Modify the function to handle the size zero case diff --git a/drivers/of/address.c b/drivers/of/address.c index c1f1c810e810..8db6ae9a12b8 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -204,6 +204,12 @@ static int __of_address_resource_bounds(struct resource *r, u64 start, u64 size) if (overflows_type(start, r->start)) return -EOVERFLOW; + if (!size) { + r->start = start; + r->end = end - 1; + + return 0; + } if (size && check_add_overflow(end, size - 1, &end)) return -EOVERFLOW; if (overflows_type(end, r->end)) This seems to be the simplest solution. Option 2: Handle in resource_size(). static inline resource_size_t resource_size(const struct resource *res) { return (res->end == res->start) ? 0 : (res->end - res->start + 1); } There are plenty of places where we are using this API and there is an assumption that the end address should always be start + size -1. We are a bit unsure about the side effects of this change. Option 3: Handle in sram_reserve_region(). We can avoid calling the resource_size() API and handle size zero as a special case. We are a bit unsure about the side effects of this change as well. Option 4: Handle this in dts [2] with non zero size. Estimate the approximate size and update that value in dts file with extra buffer. However, as indicated in [2] in lines 313-327, the size is not known apriori and the actual size is only known in runtime. So if we set some size for this buffer, then this will always be blocked and may or may not be used subsequently. [1] https://lore.kernel.org/all/20250109105600.41297-1-bashar...@couthit.com/ [2] https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/ti/omap/dra7.dtsi Thanks & Best Regards, Basharath
Re: [PATCH v6 20/26] mm/mlock: Skip ZONE_DEVICE PMDs during mlock
On Mon, Jan 13, 2025 at 06:42:46PM -0800, Dan Williams wrote: > Alistair Popple wrote: > > At present mlock skips ptes mapping ZONE_DEVICE pages. A future change > > to remove pmd_devmap will allow pmd_trans_huge_lock() to return > > ZONE_DEVICE folios so make sure we continue to skip those. > > > > Signed-off-by: Alistair Popple > > Acked-by: David Hildenbrand > > This looks like a fix in that mlock_pte_range() *does* call mlock_folio() > when pmd_trans_huge_lock() returns a non-NULL @ptl. > > So it is not in preparation for a future change it is making the pte and > pmd cases behave the same to drop mlock requests. > > The code change looks good, but do add a Fixes tag and reword the > changelog a bit before adding: Yeah, that changelog is a bit whacked. In fact it's not a fix - because mlock_fixup() (the only caller) already filters dax VMAs. So this is really about fixing a possible future bug when we start having PMDs for other types of ZONE_DEVICE pages (ie. private, coherent, etc). So probably I should just roll this into "mm: Allow compound zone device pages". > Reviewed-by: Dan Williams
Re: [PATCH v6 11/26] mm: Allow compound zone device pages
On Tue, Jan 14, 2025 at 03:59:31PM +0100, David Hildenbrand wrote: > On 10.01.25 07:00, Alistair Popple wrote: > > Zone device pages are used to represent various type of device memory > > managed by device drivers. Currently compound zone device pages are > > not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only > > user of higher order zone device pages and have their own page > > reference counting. > > > > A future change will unify FS DAX reference counting with normal page > > reference counting rules and remove the special FS DAX reference > > counting. Supporting that requires compound zone device pages. > > > > Supporting compound zone device pages requires compound_head() to > > distinguish between head and tail pages whilst still preserving the > > special struct page fields that are specific to zone device pages. > > > > A tail page is distinguished by having bit zero being set in > > page->compound_head, with the remaining bits pointing to the head > > page. For zone device pages page->compound_head is shared with > > page->pgmap. > > > > The page->pgmap field is common to all pages within a memory section. > > Therefore pgmap is the same for both head and tail pages and can be > > moved into the folio and we can use the standard scheme to find > > compound_head from a tail page. > > The more relevant thing is that the pgmap field must be common to all pages > in a folio, even if a folio exceeds memory sections (e.g., 128 MiB on x86_64 > where we have 1 GiB folios). Thanks for pointing that out. I had assumed folios couldn't cross a memory section. Obviously that is wrong so I've updated the commit message accordingly. - Alistair > > > Signed-off-by: Alistair Popple > > Reviewed-by: Jason Gunthorpe > > Reviewed-by: Dan Williams > > > > --- > > > > Changes for v4: > > - Fix build breakages reported by kernel test robot > > > > Changes since v2: > > > > - Indentation fix > > - Rename page_dev_pagemap() to page_pgmap() > > - Rename folio _unused field to _unused_pgmap_compound_head > > - s/WARN_ON/VM_WARN_ON_ONCE_PAGE/ > > > > Changes since v1: > > > > - Move pgmap to the folio as suggested by Matthew Wilcox > > --- > > [...] > > > static inline bool folio_is_device_coherent(const struct folio *folio) > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h > > index 29919fa..61899ec 100644 > > --- a/include/linux/migrate.h > > +++ b/include/linux/migrate.h > > @@ -205,8 +205,8 @@ struct migrate_vma { > > unsigned long end; > > /* > > -* Set to the owner value also stored in page->pgmap->owner for > > -* migrating out of device private memory. The flags also need to > > +* Set to the owner value also stored in page_pgmap(page)->owner > > +* for migrating out of device private memory. The flags also need to > > * be set to MIGRATE_VMA_SELECT_DEVICE_PRIVATE. > > * The caller should always set this field when using mmu notifier > > * callbacks to avoid device MMU invalidations for device private > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > > index df8f515..54b59b8 100644 > > --- a/include/linux/mm_types.h > > +++ b/include/linux/mm_types.h > > @@ -129,8 +129,11 @@ struct page { > > unsigned long compound_head;/* Bit zero is set */ > > }; > > struct {/* ZONE_DEVICE pages */ > > - /** @pgmap: Points to the hosting device page map. */ > > - struct dev_pagemap *pgmap; > > + /* > > +* The first word is used for compound_head or folio > > +* pgmap > > +*/ > > + void *_unused_pgmap_compound_head; > > void *zone_device_data; > > /* > > * ZONE_DEVICE private pages are counted as being > > @@ -299,6 +302,7 @@ typedef struct { > >* @_refcount: Do not access this member directly. Use folio_ref_count() > >*to find how many references there are to this folio. > >* @memcg_data: Memory Control Group data. > > + * @pgmap: Metadata for ZONE_DEVICE mappings > >* @virtual: Virtual address in the kernel direct map. > >* @_last_cpupid: IDs of last CPU and last process that accessed the > > folio. > >* @_entire_mapcount: Do not use directly, call folio_entire_mapcount(). > > @@ -337,6 +341,7 @@ struct folio { > > /* private: */ > > }; > > /* public: */ > > + struct dev_pagemap *pgmap; > > Agreed, that should work. > > Acked-by: David Hildenbrand > > -- > Cheers, > > David / dhildenb >
Re: [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()
I link the concept of this patchset, but *please* make it clear in the comments that this does not solve the issue of 64-bit kernel arguments on 32-bit systems being ABI specific. This isn't unique to this patch in any way; the only way to handle it is by keeping track of each ABI. On 1/15/25 18:20, Charlie Jenkins wrote: On Mon, Jan 13, 2025 at 07:11:40PM +0200, Dmitry V. Levin wrote: These functions are going to be needed on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. This partially reverts commit 7962c2eddbfe ("arch: remove unused function syscall_set_arguments()") by reusing some of old syscall_set_arguments() implementations. Signed-off-by: Dmitry V. Levin --- Note that I'm not a MIPS expert, I just added mips_set_syscall_arg() by looking at mips_get_syscall_arg() and the result passes tests in qemu on mips O32, mips64 O32, mips64 N32, and mips64 N64. arch/arc/include/asm/syscall.h| 14 +++ arch/arm/include/asm/syscall.h| 13 ++ arch/arm64/include/asm/syscall.h | 13 ++ arch/csky/include/asm/syscall.h | 13 ++ arch/hexagon/include/asm/syscall.h| 14 +++ arch/loongarch/include/asm/syscall.h | 8 ++ arch/mips/include/asm/syscall.h | 32 arch/nios2/include/asm/syscall.h | 11 arch/openrisc/include/asm/syscall.h | 7 ++ arch/parisc/include/asm/syscall.h | 12 + arch/powerpc/include/asm/syscall.h| 10 arch/riscv/include/asm/syscall.h | 9 +++ arch/s390/include/asm/syscall.h | 12 + arch/sh/include/asm/syscall_32.h | 12 + arch/sparc/include/asm/syscall.h | 10 arch/um/include/asm/syscall-generic.h | 14 +++ arch/x86/include/asm/syscall.h| 36 +++ arch/xtensa/include/asm/syscall.h | 11 include/asm-generic/syscall.h | 16 19 files changed, 267 insertions(+) diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h index 9709256e31c8..89c1e1736356 100644 --- a/arch/arc/include/asm/syscall.h +++ b/arch/arc/include/asm/syscall.h @@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs, } } +static inline void +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs, + unsigned long *args) +{ + unsigned long *inside_ptregs = ®s->r0; + unsigned int n = 6; + unsigned int i = 0; + + while (n--) { + *inside_ptregs = args[i++]; + inside_ptregs--; + } +} + static inline int syscall_get_arch(struct task_struct *task) { diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h index fe4326d938c1..21927fa0ae2b 100644 --- a/arch/arm/include/asm/syscall.h +++ b/arch/arm/include/asm/syscall.h @@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct task_struct *task, memcpy(args, ®s->ARM_r0 + 1, 5 * sizeof(args[0])); } +static inline void syscall_set_arguments(struct task_struct *task, +struct pt_regs *regs, +const unsigned long *args) +{ + memcpy(®s->ARM_r0, args, 6 * sizeof(args[0])); + /* +* Also copy the first argument into ARM_ORIG_r0 +* so that syscall_get_arguments() would return it +* instead of the previous value. +*/ + regs->ARM_ORIG_r0 = regs->ARM_r0; +} + static inline int syscall_get_arch(struct task_struct *task) { /* ARM tasks don't change audit architectures on the fly. */ diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h index ab8e14b96f68..76020b66286b 100644 --- a/arch/arm64/include/asm/syscall.h +++ b/arch/arm64/include/asm/syscall.h @@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct task_struct *task, memcpy(args, ®s->regs[1], 5 * sizeof(args[0])); } +static inline void syscall_set_arguments(struct task_struct *task, +struct pt_regs *regs, +const unsigned long *args) +{ + memcpy(®s->regs[0], args, 6 * sizeof(args[0])); + /* +* Also copy the first argument into orig_x0 +* so that syscall_get_arguments() would return it +* instead of the previous value. +*/ + regs->orig_x0 = regs->regs[0]; +} + /* * We don't care about endianness (__AUDIT_ARCH_LE bit) here because * AArch64 has the same system calls both on little- and big- endian. diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h index 0de5734950bf..30403f7a0487 100644 --- a/arch/csky/include/asm/syscall.h +++ b/arch/csky/include/asm/syscall.h @@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs, mem
Re: [PATCH v6 19/26] proc/task_mmu: Mark devdax and fsdax pages as always unpinned
On Tue, Jan 14, 2025 at 05:45:46PM +0100, David Hildenbrand wrote: > On 14.01.25 03:28, Dan Williams wrote: > > Alistair Popple wrote: > > > The procfs mmu files such as smaps and pagemap currently ignore devdax and > > > fsdax pages because these pages are considered special. A future change > > > will start treating these as normal pages, meaning they can be exposed via > > > smaps and pagemap. > > > > > > The only difference is that devdax and fsdax pages can never be pinned for > > > DMA via FOLL_LONGTERM, so add an explicit check in pte_is_pinned() to > > > reflect that. > > > > I don't understand this patch. > > > This whole pte_is_pinned() should likely be ripped out (and I have a patch > here to do that for a long time). Agreed. > But that's a different discussion. > > > > > pin_user_pages() is also used for Direct-I/O page pinning, so the > > comment about FOLL_LONGTERM is wrong, and I otherwise do not understand > > what goes wrong if the only pte_is_pinned() user correctly detects the > > pin state? > > Yes, this patch should likely just be dropped. Yeah, I think I was just being overly cautious about the change to vm_normal_page(). Agree this can be dropped. Looking at task_mmu.c there is one other user of vm_normal_page() - clear_refs_pte_range(). We will start clearing access/referenced bits on DAX PTEs there. But I think that's actually the right thing to do given we do currently clear them for PMD mapped DAX pages. > Even if folio_maybe_dma_pinned() == true because of "false positives", it > will behave just like other order-0 pages with false positives, and only > affect soft-dirty tracking ... which nobody should be caring about here at > all. > > We would always detect the PTE as soft-dirty because we we never > pte_wrprotect(old_pte) > > Yes, nobody should care. > > -- > Cheers, > > David / dhildenb >
Re: [PATCH v6 08/26] fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
On Tue, Jan 14, 2025 at 09:44:38PM -0800, Dan Williams wrote: > Alistair Popple wrote: > [..] > > > How does this case happen? I don't think any page would ever enter with > > > both ->mapping and ->share set, right? > > > > Sigh. You're right - it can't. This patch series is getting a litte bit > > large > > and unweildy with all the prerequisite bugfixes and cleanups. Obviously I > > fixed > > this when developing the main fs dax count fixup but forgot to rebase the > > fix > > further back in the series. > > I assumed as much when I got to that patch. > > > Anyway I have fixed that now, thanks. > > You deserve a large helping of grace for waking and then slaying this > old dragon. Heh, thanks. Lets hope this dragon doesn't have too many more heads :-)
Re: [PATCH v2 4/6] kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU
Hi Vaibhav, kernel test robot noticed the following build warnings: [auto build test WARNING on powerpc/topic/ppc-kvm] [also build test WARNING on powerpc/next powerpc/fixes kvm/queue kvm/next mst-vhost/linux-next linus/master v6.13-rc7 next-20250116] [cannot apply to kvm/linux-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Vaibhav-Jain/powerpc-Document-APIv2-KVM-hcall-spec-for-Hostwide-counters/20250116-024240 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm patch link: https://lore.kernel.org/r/20250115143948.369379-5-vaibhav%40linux.ibm.com patch subject: [PATCH v2 4/6] kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20250117/202501171030.3x0gqw8g-...@intel.com/config) compiler: powerpc-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250117/202501171030.3x0gqw8g-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202501171030.3x0gqw8g-...@intel.com/ All warnings (new ones prefixed by >>): In file included from arch/powerpc/include/asm/kvm_ppc.h:22, from arch/powerpc/include/asm/dbell.h:17, from arch/powerpc/kernel/asm-offsets.c:36: >> arch/powerpc/include/asm/kvm_book3s.h:357:13: warning: >> 'kvmppc_unregister_pmu' defined but not used [-Wunused-function] 357 | static void kvmppc_unregister_pmu(void) | ^ >> arch/powerpc/include/asm/kvm_book3s.h:352:12: warning: 'kvmppc_register_pmu' >> defined but not used [-Wunused-function] 352 | static int kvmppc_register_pmu(void) |^~~ -- In file included from arch/powerpc/include/asm/kvm_ppc.h:22, from arch/powerpc/include/asm/dbell.h:17, from arch/powerpc/kernel/asm-offsets.c:36: >> arch/powerpc/include/asm/kvm_book3s.h:357:13: warning: >> 'kvmppc_unregister_pmu' defined but not used [-Wunused-function] 357 | static void kvmppc_unregister_pmu(void) | ^ >> arch/powerpc/include/asm/kvm_book3s.h:352:12: warning: 'kvmppc_register_pmu' >> defined but not used [-Wunused-function] 352 | static int kvmppc_register_pmu(void) |^~~ vim +/kvmppc_unregister_pmu +357 arch/powerpc/include/asm/kvm_book3s.h 351 > 352 static int kvmppc_register_pmu(void) 353 { 354 return 0; 355 } 356 > 357 static void kvmppc_unregister_pmu(void) 358 { 359 } 360 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki