Christophe Leroy <christophe.le...@csgroup.eu> writes: > Le 06/09/2025 à 05:52, Ritesh Harjani a écrit : >> [Vous ne recevez pas souvent de courriers de rite...@linux.ibm.com. >> Découvrez pourquoi ceci est important à >> https://aka.ms/LearnAboutSenderIdentification ] >> >> Andrew Donnellan <a...@linux.ibm.com> writes: >> >>> If patch_branch() or patch_instruction() fails while updating a jump >>> label, we presently fail silently, leading to unpredictable behaviour >>> later on. >>> >>> Change arch_jump_label_transform() to panic on a code patching failure, >>> matching the existing behaviour of arch_static_call_transform(). >>> >>> Reported-by: Erhard Furtner <erhar...@mailbox.org> >>> Signed-off-by: Andrew Donnellan <a...@linux.ibm.com> >>> >>> --- >>> >>> Ran into this while debugging an issue that Erhard reported to me about my >>> PAGE_TABLE_CHECK series on a G4, where updating a static key failed >>> silently, but only for one call site, leading to an incorrect reference >>> count later on. This looks to be due to the issue fixed in [0]. A loud >>> failure would have saved us all considerable debugging time. >>> >>> Should I change the return type of arch_jump_label_transform() and handle >>> this in an arch-independent way? Are there other users of code patching >>> in powerpc that ought to be hardened? >>> >>> Or is this excessive? >>> >>> [0] >>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4b5e6eb281d7b1ea77619bee17095f905a125168.1757003584.git.christophe.le...@csgroup.eu/ >>> --- >>> arch/powerpc/kernel/jump_label.c | 9 +++++++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/powerpc/kernel/jump_label.c >>> b/arch/powerpc/kernel/jump_label.c >>> index 2659e1ac8604..80d41ed7ac50 100644 >>> --- a/arch/powerpc/kernel/jump_label.c >>> +++ b/arch/powerpc/kernel/jump_label.c >>> @@ -12,9 +12,14 @@ void arch_jump_label_transform(struct jump_entry *entry, >>> enum jump_label_type type) >>> { >>> u32 *addr = (u32 *)jump_entry_code(entry); >>> + int err; >>> >>> if (type == JUMP_LABEL_JMP) >>> - patch_branch(addr, jump_entry_target(entry), 0); >>> + err = patch_branch(addr, jump_entry_target(entry), 0); >>> else >>> - patch_instruction(addr, ppc_inst(PPC_RAW_NOP())); >>> + err = patch_instruction(addr, ppc_inst(PPC_RAW_NOP())); >>> + >>> + if (err) >>> + panic("%s: patching failed, err %d, type %d, addr %pS, target >>> %pS\n", >>> + __func__, err, type, addr, (void >>> *)jump_entry_target(entry)); >>> } >> >> arch_jump_label_transform() is mainly getting called from >> __jump_level_update() and it's used for enabling or updating static keys / >> branch. >> >> But static keys can also be used by drivers / module subsystem whose >> initialization happens late. Although I understand that if the above >> fails, it might fail much before, from the arch setup code itself, but >> panic() still feels like a big hammer. > > But not being able to patch the kernel as required means that you get a > kernel behaving differently from what is expected. > > Imagine a kernel running on a board that is controlling a saw. There is > a patch_instruction() to activate the safety feature which detects when > your hands are too close to the blade. Do you want the kernel to > continue running seamlessly when that patch_instruction() fails ? I'm > sure you don't ! >
;) Sure. Not a fan of playing with blades or saws. :) >> >> Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an >> err? > > No, that's not enough, you can't rely on a kernel that will no behave as > expected. > Sure, Christophe. Thanks for the clarification. My main concern was that during module load time we should not bring the system down. But as I see the behavior in arch_static_call_transform() is also the same. And as you said, maybe it's good to panic if an important functionality like patch_instruction() itself fails which means the kernel may not be doing what we are expecting it to. >> >> Also you said you ran into a problem at just one call site where above >> was silently failing. With the above change are you able to hit the >> panic() now? Because from what I see in patch_instruction(), it mainly >> will boil down to calling __patch_mem() which always returns 0. > > As far as I can see, __patch_mem() returns -EPERM when > __put_kernel_nofault() fails: > > static int __patch_mem(void *exec_addr, unsigned long val, void > *patch_addr, bool is_dword) > { > if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) { > /* For big endian correctness: plain address would use the > wrong half */ > u32 val32 = val; > > __put_kernel_nofault(patch_addr, &val32, u32, failed); > } else { > __put_kernel_nofault(patch_addr, &val, u64, failed); > } > > asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr), > "r" (exec_addr)); > > return 0; > > failed: > mb(); /* sync */ > return -EPERM; > } > Right, I somehow missed that "_nofault" part. > >> Although there are other places where there can be an error returned, >> so I was wondering if that is what you were hitting or something else? > > Andrew was hitting the -EPERM because the memory area was read-only. yes, that make sense. > > Christophe Thanks for the comments! -ritesh