Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc()
Hi Peter, On Mon, Nov 20, 2023 at 08:49:32PM -0800, Peter Collingbourne wrote: > Hi Alexandru, > > On Wed, Aug 23, 2023 at 6:16 AM Alexandru Elisei > wrote: > > > > If the source page being migrated has metadata associated with it, make > > sure to reserve the metadata storage when choosing a suitable destination > > page from the free list. > > > > Signed-off-by: Alexandru Elisei > > --- > > mm/compaction.c | 9 + > > mm/internal.h | 1 + > > 2 files changed, 10 insertions(+) > > > > diff --git a/mm/compaction.c b/mm/compaction.c > > index cc0139fa0cb0..af2ee3085623 100644 > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -570,6 +570,7 @@ static unsigned long isolate_freepages_block(struct > > compact_control *cc, > > bool locked = false; > > unsigned long blockpfn = *start_pfn; > > unsigned int order; > > + int ret; > > > > /* Strict mode is for isolation, speed is secondary */ > > if (strict) > > @@ -626,6 +627,11 @@ static unsigned long isolate_freepages_block(struct > > compact_control *cc, > > > > /* Found a free page, will break it into order-0 pages */ > > order = buddy_order(page); > > + if (metadata_storage_enabled() && cc->reserve_metadata) { > > + ret = reserve_metadata_storage(page, order, > > cc->gfp_mask); > > At this point the zone lock is held and preemption is disabled, which > makes it invalid to call reserve_metadata_storage. You are correct, I missed that. I dropped reserving tag storage during compaction in the next iteration, so fortunately I unintentionally fixed it. Thanks, Alex > > Peter > > > + if (ret) > > + goto isolate_fail; > > + } > > isolated = __isolate_free_page(page, order); > > if (!isolated) > > break; > > @@ -1757,6 +1763,9 @@ static struct folio *compaction_alloc(struct folio > > *src, unsigned long data) > > struct compact_control *cc = (struct compact_control *)data; > > struct folio *dst; > > > > + if (metadata_storage_enabled()) > > + cc->reserve_metadata = folio_has_metadata(src); > > + > > if (list_empty(&cc->freepages)) { > > isolate_freepages(cc); > > > > diff --git a/mm/internal.h b/mm/internal.h > > index d28ac0085f61..046cc264bfbe 100644 > > --- a/mm/internal.h > > +++ b/mm/internal.h > > @@ -492,6 +492,7 @@ struct compact_control { > > */ > > bool alloc_contig; /* alloc_contig_range allocation */ > > bool source_has_metadata; /* source pages have associated > > metadata */ > > + bool reserve_metadata; > > }; > > > > /* > > -- > > 2.41.0 > >
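To make the locking constraint concrete, here is a simplified sketch of the context in which the RFC called reserve_metadata_storage(); this is illustrative pseudo-kernel code, not the actual mm/compaction.c implementation, and for_each_free_page_in_block() is a made-up helper:

static unsigned long isolate_freepages_block_sketch(struct compact_control *cc)
{
	unsigned long flags, nr_isolated = 0;
	struct page *page;

	/* The zone lock is a spinlock: once taken, preemption is disabled. */
	spin_lock_irqsave(&cc->zone->lock, flags);

	for_each_free_page_in_block(page, cc) {
		/*
		 * Everything called from here must be non-blocking.  A helper
		 * that may allocate or sleep depending on cc->gfp_mask, such
		 * as reserve_metadata_storage() in this RFC, is therefore
		 * invalid in this context -- which is the objection above.
		 */
		nr_isolated += __isolate_free_page(page, buddy_order(page));
	}

	spin_unlock_irqrestore(&cc->zone->lock, flags);
	return nr_isolated;
}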
Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
On Fri, Nov 17, 2023 at 09:07:55PM +0800, Abel Wu wrote:
> On 11/17/23 8:37 PM, Peter Zijlstra Wrote:
[...]
> > Ah, so if this is a cgroup issue, it might be worth trying this patch
> > that we have in tip/sched/urgent.
>
> And please also apply this fix:
> https://lore.kernel.org/all/20231117080106.12890-1-s921975...@gmail.com/

We applied both suggested patch options and ran the test again, i.e.

  sched/eevdf: Fix vruntime adjustment on reweight
  sched/fair: Update min_vruntime for reweight_entity() correctly

and

  sched/eevdf: Delay dequeue

Unfortunately, both variants do NOT fix the problem. The regression remains unchanged. I will continue familiarizing myself with how cgroups are scheduled to dig deeper here. If there are any other ideas, I'd be happy to use them as a starting point for further analysis. Would additional traces still be of interest? If so, I would be glad to provide them. [...]
[PATCH v3 0/5] arch,locking/atomic: add arch_cmpxchg[64]_local
Architectures arc, openrisc and hexagon do not have arch_cmpxchg_local() defined, so the use of try_cmpxchg_local() in lib/objpool.c fails to build, as reported by the kernel test robot. Patch 1 improves the data size checking logic for arc; patches 2/3/4 implement arch_cmpxchg[64]_local for arc/openrisc/hexagon. Patch 5 defines arch_cmpxchg_local as the existing __cmpxchg_local rather than the generic variant.

wuqiang.matt (5):
  arch,locking/atomic: arc: arch_cmpxchg should check data size
  arch,locking/atomic: arc: add arch_cmpxchg[64]_local
  arch,locking/atomic: openrisc: add arch_cmpxchg[64]_local
  arch,locking/atomic: hexagon: add arch_cmpxchg[64]_local
  arch,locking/atomic: xtensa: define arch_cmpxchg_local as __cmpxchg_local

 arch/arc/include/asm/cmpxchg.h      | 40 ++
 arch/hexagon/include/asm/cmpxchg.h  | 51 -
 arch/openrisc/include/asm/cmpxchg.h |  6
 arch/xtensa/include/asm/cmpxchg.h   |  2 +-
 4 files changed, 91 insertions(+), 8 deletions(-)

-- 2.40.1
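For readers who have not seen the build failure, a hedged sketch of the kind of try_cmpxchg_local() user that triggers it; the structure and field names below are illustrative, loosely modelled on lib/objpool.c rather than copied from it:

struct objpool_slot_sketch {
	u32 head;
	u32 tail;
	u32 mask;
	void *entries[];
};

/*
 * try_cmpxchg_local() expands through <linux/atomic.h> to
 * arch_cmpxchg_local(), so an architecture that does not provide that
 * definition fails to build any user of this helper.
 */
static void *slot_pop_sketch(struct objpool_slot_sketch *slot)
{
	u32 head = READ_ONCE(slot->head);

	while (head != READ_ONCE(slot->tail)) {
		/* Only needs to be atomic against the local CPU. */
		if (try_cmpxchg_local(&slot->head, &head, head + 1))
			return slot->entries[head & slot->mask];
	}
	return NULL;
}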
[PATCH v3 1/5] arch,locking/atomic: arc: arch_cmpxchg should check data size
arch_cmpxchg() should check data size rather than pointer size in case CONFIG_ARC_HAS_LLSC is defined. So rename __cmpxchg to __cmpxchg_32 to emphasize its explicit support of 32-bit data size, with BUILD_BUG_ON() added to avoid any possible misuses with unsupported data types. In case CONFIG_ARC_HAS_LLSC is undefined, arch_cmpxchg() uses a spinlock to accomplish SMP-safety, so the BUILD_BUG_ON checking is unnecessary. v2 -> v3: - Patches regrouped and the improvement for xtensa included - Comments refined to address why these changes are needed v1 -> v2: - Try using native cmpxchg variants if available, as Arnd advised Signed-off-by: wuqiang.matt Reviewed-by: Masami Hiramatsu (Google) --- arch/arc/include/asm/cmpxchg.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h index e138fde067de..bf46514f6f12 100644 --- a/arch/arc/include/asm/cmpxchg.h +++ b/arch/arc/include/asm/cmpxchg.h @@ -18,14 +18,16 @@ * if (*ptr == @old) * *ptr = @new */ -#define __cmpxchg(ptr, old, new) \ +#define __cmpxchg_32(ptr, old, new)\ ({ \ __typeof__(*(ptr)) _prev; \ \ + BUILD_BUG_ON(sizeof(*(ptr)) != 4); \ + \ __asm__ __volatile__( \ - "1: llock %0, [%1] \n" \ + "1: llock %0, [%1] \n" \ " brne %0, %2, 2f \n" \ - " scond %3, [%1] \n" \ + " scond %3, [%1] \n" \ " bnz 1b \n" \ "2: \n" \ : "=&r"(_prev) /* Early clobber prevent reg reuse */ \ @@ -47,7 +49,7 @@ \ switch(sizeof((_p_))) { \ case 4: \ - _prev_ = __cmpxchg(_p_, _o_, _n_); \ + _prev_ = __cmpxchg_32(_p_, _o_, _n_); \ break; \ default:\ BUILD_BUG();\ @@ -65,8 +67,6 @@ __typeof__(*(ptr)) _prev_; \ unsigned long __flags; \ \ - BUILD_BUG_ON(sizeof(_p_) != 4); \ - \ /* \ * spin lock/unlock provide the needed smp_mb() before/after\ */ \ -- 2.40.1
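A short illustration (not taken from the patch) of why the check must be on the data size: on 32-bit arc every pointer is 4 bytes, so a BUILD_BUG_ON() on the pointer size can never fire, while a check on the pointed-to type rejects an unsupported 64-bit operand at build time:

static inline void cmpxchg_size_check_example(void)
{
	u64 wide;	/* 8-byte operand, not supported by the LLSC-based __cmpxchg_32() */

	BUILD_BUG_ON(sizeof(&wide) != 4);	/* pointer size: always 4 here, never triggers */
	BUILD_BUG_ON(sizeof(wide) != 4);	/* data size: fails the build, as intended */
}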
[PATCH v3 2/5] arch,locking/atomic: arc: add arch_cmpxchg[64]_local
arc doesn't have arch_cmpxchg_local implemented, which causes build failures for any reference to try_cmpxchg_local, as reported by the kernel test robot. This patch implements arch_cmpxchg[64]_local with the native cmpxchg variant if the corresponding data size is supported, otherwise generic_cmpxchg[64]_local is to be used. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202310272207.tlpflya4-...@intel.com/ Signed-off-by: wuqiang.matt Reviewed-by: Masami Hiramatsu (Google) --- arch/arc/include/asm/cmpxchg.h | 28 1 file changed, 28 insertions(+) diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h index bf46514f6f12..91429f2350df 100644 --- a/arch/arc/include/asm/cmpxchg.h +++ b/arch/arc/include/asm/cmpxchg.h @@ -80,6 +80,34 @@ #endif +/* + * always make arch_cmpxchg[64]_local available, native cmpxchg + * will be used if available, then generic_cmpxchg[64]_local + */ +#include +static inline unsigned long __cmpxchg_local(volatile void *ptr, + unsigned long old, + unsigned long new, int size) +{ + switch (size) { +#ifdef CONFIG_ARC_HAS_LLSC + case 4: + return __cmpxchg_32((int32_t *)ptr, old, new); +#endif + default: + return __generic_cmpxchg_local(ptr, old, new, size); + } + + return old; +} +#define arch_cmpxchg_local(ptr, o, n) ({ \ + (__typeof__(*ptr))__cmpxchg_local((ptr),\ + (unsigned long)(o), \ + (unsigned long)(n), \ + sizeof(*(ptr)));\ +}) +#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), (n)) + /* * xchg */ -- 2.40.1
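For readers unfamiliar with the fallback used above, here is a hedged paraphrase of what __generic_cmpxchg_local() from include/asm-generic/cmpxchg-local.h does for the remaining sizes; this is a simplified sketch of the idea, not the verbatim generic implementation: a cmpxchg that only has to be atomic with respect to the executing CPU, so masking interrupts around a plain compare-and-store is enough.

static inline unsigned long
cmpxchg_local_fallback_sketch(volatile u32 *ptr, unsigned long old,
			      unsigned long new)
{
	unsigned long flags, prev;

	local_irq_save(flags);		/* no other context on this CPU can interleave */
	prev = *ptr;
	if (prev == old)
		*ptr = new;
	local_irq_restore(flags);

	return prev;			/* caller compares against 'old' to detect success */
}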
[PATCH v3 3/5] arch,locking/atomic: openrisc: add arch_cmpxchg[64]_local
openrisc doesn't have arch_cmpxchg_local implemented, which causes build failures for any reference to try_cmpxchg_local, as reported by the kernel test robot. This patch implements arch_cmpxchg[64]_local with the native cmpxchg variant if the corresponding data size is supported, otherwise generic_cmpxchg[64]_local is to be used. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202310272207.tlpflya4-...@intel.com/ Signed-off-by: wuqiang.matt Reviewed-by: Masami Hiramatsu (Google) --- arch/openrisc/include/asm/cmpxchg.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h index 8ee151c072e4..f1ffe8b6f5ef 100644 --- a/arch/openrisc/include/asm/cmpxchg.h +++ b/arch/openrisc/include/asm/cmpxchg.h @@ -139,6 +139,12 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old, (unsigned long)(n), \ sizeof(*(ptr))); \ }) +#define arch_cmpxchg_local arch_cmpxchg + +/* always make arch_cmpxchg64_local available for openrisc */ +#include + +#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), (n)) /* * This function doesn't exist, so you'll get a linker error if -- 2.40.1
[PATCH v3 4/5] arch,locking/atomic: hexagon: add arch_cmpxchg[64]_local
hexagon doesn't have arch_cmpxchg_local implemented, which causes build failures for any reference to try_cmpxchg_local, as reported by the kernel test robot. This patch implements arch_cmpxchg[64]_local with the native cmpxchg variant if the corresponding data size is supported, otherwise generic_cmpxchg[64]_local is to be used. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202310272207.tlpflya4-...@intel.com/ Signed-off-by: wuqiang.matt Reviewed-by: Masami Hiramatsu (Google) --- arch/hexagon/include/asm/cmpxchg.h | 51 +- 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/arch/hexagon/include/asm/cmpxchg.h b/arch/hexagon/include/asm/cmpxchg.h index bf6cf5579cf4..302fa30f25aa 100644 --- a/arch/hexagon/include/asm/cmpxchg.h +++ b/arch/hexagon/include/asm/cmpxchg.h @@ -8,6 +8,8 @@ #ifndef _ASM_CMPXCHG_H #define _ASM_CMPXCHG_H +#include + /* * __arch_xchg - atomically exchange a register and a memory location * @x: value to swap @@ -51,13 +53,15 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size) * variable casting. */ -#define arch_cmpxchg(ptr, old, new)\ +#define __cmpxchg_32(ptr, old, new)\ ({ \ __typeof__(ptr) __ptr = (ptr); \ __typeof__(*(ptr)) __old = (old); \ __typeof__(*(ptr)) __new = (new); \ __typeof__(*(ptr)) __oldval = 0;\ \ + BUILD_BUG_ON(sizeof(*(ptr)) != 4); \ + \ asm volatile( \ "1: %0 = memw_locked(%1);\n"\ " { P0 = cmp.eq(%0,%2);\n"\ @@ -72,4 +76,49 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size) __oldval; \ }) +#define __cmpxchg(ptr, old, val, size) \ +({ \ + __typeof__(*(ptr)) oldval; \ + \ + switch (size) { \ + case 4: \ + oldval = __cmpxchg_32(ptr, old, val); \ + break; \ + default:\ + BUILD_BUG();\ + oldval = val; \ + break; \ + } \ + \ + oldval; \ +}) + +#define arch_cmpxchg(ptr, o, n)__cmpxchg((ptr), (o), (n), sizeof(*(ptr))) + +/* + * always make arch_cmpxchg[64]_local available, native cmpxchg + * will be used if available, then generic_cmpxchg[64]_local + */ +#include + +#define arch_cmpxchg_local(ptr, old, val) \ +({ \ + __typeof__(*(ptr)) __retval;\ + int __size = sizeof(*(ptr));\ + \ + switch (__size) { \ + case 4: \ + __retval = __cmpxchg_32(ptr, old, val); \ + break; \ + default:\ + __retval = __generic_cmpxchg_local(ptr, old,\ + val, __size); \ + break; \ + } \ + \ + __retval; \ +}) + +#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), (n)) + #endif /* _ASM_CMPXCHG_H */ -- 2.40.1
[PATCH v3 5/5] arch,locking/atomic: xtensa: define arch_cmpxchg_local as __cmpxchg_local
The xtensa architecture already has __cmpxchg_local() defined, but it is unused. Its sole purpose is to implement arch_cmpxchg_local(), just as is done in the arch_cmpxchg_local() definitions of other architectures such as x86, arm and powerpc. Signed-off-by: wuqiang.matt --- arch/xtensa/include/asm/cmpxchg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/xtensa/include/asm/cmpxchg.h b/arch/xtensa/include/asm/cmpxchg.h index 675a11ea8de7..956c9925df1c 100644 --- a/arch/xtensa/include/asm/cmpxchg.h +++ b/arch/xtensa/include/asm/cmpxchg.h @@ -108,7 +108,7 @@ static inline unsigned long __cmpxchg_local(volatile void *ptr, * them available. */ #define arch_cmpxchg_local(ptr, o, n) \ - ((__typeof__(*(ptr)))__generic_cmpxchg_local((ptr), (unsigned long)(o),\ + ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o),\ (unsigned long)(n), sizeof(*(ptr)))) #define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), (n)) #define arch_cmpxchg64(ptr, o, n)arch_cmpxchg64_local((ptr), (o), (n)) -- 2.40.1
[PATCH v6 06/13] x86/bugs: Rename SLS to CONFIG_MITIGATION_SLS
CPU mitigations config entries are inconsistent, and names are hard to related. There are concrete benefits for both users and developers of having all the mitigation config options living in the same config namespace. The mitigation options should have consistency and start with MITIGATION. Rename the Kconfig entry from SLS to MITIGATION_SLS. Suggested-by: Josh Poimboeuf Signed-off-by: Breno Leitao --- arch/x86/Kconfig | 2 +- arch/x86/Makefile | 2 +- arch/x86/include/asm/linkage.h | 4 ++-- arch/x86/kernel/alternative.c | 4 ++-- arch/x86/kernel/ftrace.c | 3 ++- arch/x86/net/bpf_jit_comp.c| 4 ++-- scripts/Makefile.lib | 2 +- 7 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 862be9b3b216..fa246de60cdb 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2580,7 +2580,7 @@ config CPU_SRSO help Enable the SRSO mitigation needed on AMD Zen1-4 machines. -config SLS +config MITIGATION_SLS bool "Mitigate Straight-Line-Speculation" depends on CC_HAS_SLS && X86_64 select OBJTOOL if HAVE_OBJTOOL diff --git a/arch/x86/Makefile b/arch/x86/Makefile index b8d23ed059fb..5ce8c30e7701 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -205,7 +205,7 @@ ifdef CONFIG_MITIGATION_RETPOLINE endif endif -ifdef CONFIG_SLS +ifdef CONFIG_MITIGATION_SLS KBUILD_CFLAGS += -mharden-sls=all endif diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h index c5165204c66f..09e2d026df33 100644 --- a/arch/x86/include/asm/linkage.h +++ b/arch/x86/include/asm/linkage.h @@ -43,7 +43,7 @@ #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO) #define RETjmp __x86_return_thunk #else /* CONFIG_MITIGATION_RETPOLINE */ -#ifdef CONFIG_SLS +#ifdef CONFIG_MITIGATION_SLS #define RETret; int3 #else #define RETret @@ -55,7 +55,7 @@ #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO) #define ASM_RET"jmp __x86_return_thunk\n\t" #else /* CONFIG_MITIGATION_RETPOLINE */ -#ifdef CONFIG_SLS +#ifdef CONFIG_MITIGATION_SLS #define ASM_RET"ret; int3\n\t" #else #define ASM_RET"ret\n\t" diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 5ec887d065ce..b01d49862497 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -637,8 +637,8 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes) /* * The compiler is supposed to EMIT an INT3 after every unconditional * JMP instruction due to AMD BTC. However, if the compiler is too old -* or SLS isn't enabled, we still need an INT3 after indirect JMPs -* even on Intel. +* or MITIGATION_SLS isn't enabled, we still need an INT3 after +* indirect JMPs even on Intel. */ if (op == JMP32_INSN_OPCODE && i < insn->length) bytes[i++] = INT3_INSN_OPCODE; diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 93bc52d4a472..70139d9d2e01 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -307,7 +307,8 @@ union ftrace_op_code_union { } __attribute__((packed)); }; -#define RET_SIZE (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) ? 5 : 1 + IS_ENABLED(CONFIG_SLS)) +#define RET_SIZE \ + (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) ? 
5 : 1 + IS_ENABLED(CONFIG_MITIGATION_SLS)) static unsigned long create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index ef732f323926..96a63c4386a9 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -469,7 +469,7 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip) emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip); } else { EMIT2(0xFF, 0xE0 + reg);/* jmp *%\reg */ - if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || IS_ENABLED(CONFIG_SLS)) + if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || IS_ENABLED(CONFIG_MITIGATION_SLS)) EMIT1(0xCC);/* int3 */ } @@ -484,7 +484,7 @@ static void emit_return(u8 **pprog, u8 *ip) emit_jump(&prog, x86_return_thunk, ip); } else { EMIT1(0xC3);/* ret */ - if (IS_ENABLED(CONFIG_SLS)) + if (IS_ENABLED(CONFIG_MITIGATION_SLS)) EMIT1(0xCC);/* int3 */ } diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index d6e157938b5f..0d5461276179 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -264,7 +264,7 @@ endif objtool-args-$(CONFIG_UNWINDER_ORC)+= --orc objtool-args-$(CONFIG_MITIGATION_RETPOLINE)+= --retpoline objtool-args-$(CONFIG_RETHUNK)
Re: [PATCH v4 4/5] x86/paravirt: switch mixed paravirt/alternative calls to alternative_2
On Mon, Oct 30, 2023 at 03:25:07PM +0100, Juergen Gross wrote: > Instead of stacking alternative and paravirt patching, use the new > ALT_FLAG_CALL flag to switch those mixed calls to pure alternative > handling. > > This eliminates the need to be careful regarding the sequence of > alternative and paravirt patching. > > For call depth tracking callthunks_setup() needs to be adapted to patch > calls at alternative patching sites instead of paravirt calls. Why is this important so that it is called out explicitly in the commit message? Is callthunks_setup() special somehow? > > Signed-off-by: Juergen Gross > Acked-by: Peter Zijlstra (Intel) > --- > arch/x86/include/asm/alternative.h| 5 +++-- > arch/x86/include/asm/paravirt.h | 9 ++--- > arch/x86/include/asm/paravirt_types.h | 26 +- > arch/x86/kernel/callthunks.c | 17 - > arch/x86/kernel/module.c | 20 +--- > 5 files changed, 31 insertions(+), 46 deletions(-) > > diff --git a/arch/x86/include/asm/alternative.h > b/arch/x86/include/asm/alternative.h > index 2a74a94bd569..07b17bc615a0 100644 > --- a/arch/x86/include/asm/alternative.h > +++ b/arch/x86/include/asm/alternative.h > @@ -89,6 +89,8 @@ struct alt_instr { > u8 replacementlen; /* length of new instruction */ > } __packed; > > +extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; > + arch/x86/include/asm/alternative.h:92:extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; arch/x86/kernel/alternative.c:163:extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; Zap the declaration from the .c file pls. > /* > * Debug flag that can be tested to see whether alternative > * instructions were patched in already: > @@ -104,11 +106,10 @@ extern void apply_fineibt(s32 *start_retpoline, s32 > *end_retpoine, > s32 *start_cfi, s32 *end_cfi); > > struct module; > -struct paravirt_patch_site; > > struct callthunk_sites { > s32 *call_start, *call_end; > - struct paravirt_patch_site *pv_start, *pv_end; > + struct alt_instr*alt_start, *alt_end; > }; > > #ifdef CONFIG_CALL_THUNKS > diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h > index 3749311d51c3..9c6c5cfa9fe2 100644 > --- a/arch/x86/include/asm/paravirt.h > +++ b/arch/x86/include/asm/paravirt.h > @@ -740,20 +740,23 @@ void native_pv_lock_init(void) __init; > > #ifdef CONFIG_X86_64 > #ifdef CONFIG_PARAVIRT_XXL > +#ifdef CONFIG_DEBUG_ENTRY > > #define PARA_PATCH(off) ((off) / 8) > #define PARA_SITE(ptype, ops)_PVSITE(ptype, ops, .quad, 8) > #define PARA_INDIRECT(addr) *addr(%rip) > > -#ifdef CONFIG_DEBUG_ENTRY > .macro PARA_IRQ_save_fl > PARA_SITE(PARA_PATCH(PV_IRQ_save_fl), > ANNOTATE_RETPOLINE_SAFE; > call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);) > + ANNOTATE_RETPOLINE_SAFE; > + call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl); > .endm > > -#define SAVE_FLAGS ALTERNATIVE "PARA_IRQ_save_fl;", "pushf; pop %rax;", \ > - ALT_NOT_XEN > +#define SAVE_FLAGS ALTERNATIVE_2 "PARA_IRQ_save_fl;", > \ > + ALT_CALL_INSTR, ALT_CALL_ALWAYS, \ > + "pushf; pop %rax;", ALT_NOT_XEN How is that supposed to work? At build time for a PARAVIRT_XXL build it'll have that PARA_IRQ_save_fl macro in there which issues a .parainstructions section and an indirect call to call *pv_ops+240(%rip); then it'll always patch in "call BUG_func" due to X86_FEATURE_ALWAYS. I guess this is your way of saying "this should always be patched, one way or the other, depending on X86_FEATURE_XENPV, and this is a way to catch unpatched locations... 
Then on a pv build which doesn't set X86_FEATURE_XENPV during boot, it'll replace the "call BUG_func" thing with the pushf;pop. And if it does set X86_FEATURE_XENPV, it'll patch in the direct call to /me greps tree ... pv_native_save_fl I guess. If anything, how those ALT_CALL_ALWAYS things are supposed to work, should be documented there, over the macro definition and what the intent is. Because otherwise we'll have to swap in the whole machinery back into our L1s each time we need to touch it. And btw, this whole patching stuff becomes insanely non-trivial slowly. :-\ > diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c > index faa9f2299848..200ea8087ddb 100644 > --- a/arch/x86/kernel/callthunks.c > +++ b/arch/x86/kernel/callthunks.c > @@ -238,14 +238,13 @@ patch_call_sites(s32 *start, s32 *end, const struct > core_text *ct) > } > > static __init_or_module void > -patch_paravirt_call_sites(struct paravirt_patch_site *start, > - struct paravirt_patch_site *end, > - const s
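To make the reading above a bit more concrete, here is a hedged pseudo-code rendering of the described patch-time decision for SAVE_FLAGS; patch_site_with() and direct_call_to() are made-up helpers, and this is only one way to read the ALTERNATIVE_2 construct, not a description of what apply_alternatives() literally does:

/* Pseudo-code of the patching decision as described in the discussion above. */
if (!boot_cpu_has(X86_FEATURE_XENPV)) {
	/* The ALT_NOT_XEN replacement wins: plain flag save. */
	patch_site_with("pushf; pop %rax");
} else {
	/*
	 * ALT_CALL_ALWAYS: the build-time indirect call through pv_ops is
	 * turned into a direct call to whatever function pointer sits in
	 * pv_ops for save_fl at patch time.
	 */
	patch_site_with(direct_call_to(pv_ops.irq.save_fl.func));
}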
[PATCH net v1] vsock/test: fix SEQPACKET message bounds test
Tune message length calculation to make this test work on machines where 'getpagesize()' returns >32KB. Now maximum message length is not hardcoded (on machines above it was smaller than 'getpagesize()' return value, thus we get negative value and test fails), but calculated at runtime and always bigger than 'getpagesize()' result. Reproduced on aarch64 with 64KB page size. Fixes: 5c338112e48a ("test/vsock: rework message bounds test") Signed-off-by: Arseniy Krasnov --- tools/testing/vsock/vsock_test.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index f5623b8d76b7..691e44c746bf 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -353,11 +353,12 @@ static void test_stream_msg_peek_server(const struct test_opts *opts) } #define SOCK_BUF_SIZE (2 * 1024 * 1024) -#define MAX_MSG_SIZE (32 * 1024) +#define MAX_MSG_PAGES 4 static void test_seqpacket_msg_bounds_client(const struct test_opts *opts) { unsigned long curr_hash; + size_t max_msg_size; int page_size; int msg_count; int fd; @@ -373,7 +374,8 @@ static void test_seqpacket_msg_bounds_client(const struct test_opts *opts) curr_hash = 0; page_size = getpagesize(); - msg_count = SOCK_BUF_SIZE / MAX_MSG_SIZE; + max_msg_size = MAX_MSG_PAGES * page_size; + msg_count = SOCK_BUF_SIZE / max_msg_size; for (int i = 0; i < msg_count; i++) { size_t buf_size; @@ -383,7 +385,7 @@ static void test_seqpacket_msg_bounds_client(const struct test_opts *opts) /* Use "small" buffers and "big" buffers. */ if (i & 1) buf_size = page_size + - (rand() % (MAX_MSG_SIZE - page_size)); + (rand() % (max_msg_size - page_size)); else buf_size = 1 + (rand() % page_size); @@ -429,7 +431,6 @@ static void test_seqpacket_msg_bounds_server(const struct test_opts *opts) unsigned long remote_hash; unsigned long curr_hash; int fd; - char buf[MAX_MSG_SIZE]; struct msghdr msg = {0}; struct iovec iov = {0}; @@ -457,8 +458,13 @@ static void test_seqpacket_msg_bounds_server(const struct test_opts *opts) control_writeln("SRVREADY"); /* Wait, until peer sends whole data. */ control_expectln("SENDDONE"); - iov.iov_base = buf; - iov.iov_len = sizeof(buf); + iov.iov_len = MAX_MSG_PAGES * getpagesize(); + iov.iov_base = malloc(iov.iov_len); + if (!iov.iov_base) { + perror("malloc"); + exit(EXIT_FAILURE); + } + msg.msg_iov = &iov; msg.msg_iovlen = 1; @@ -483,6 +489,7 @@ static void test_seqpacket_msg_bounds_server(const struct test_opts *opts) curr_hash += hash_djb2(msg.msg_iov[0].iov_base, recv_size); } + free(iov.iov_base); close(fd); remote_hash = control_readulong(); -- 2.25.1
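A small standalone illustration (not part of the patch) of the arithmetic being fixed, assuming a 64 KiB page size as on the reported aarch64 setup:

#include <stdio.h>
#include <stdlib.h>

#define MAX_MSG_SIZE (32 * 1024)	/* old hardcoded limit */

int main(void)
{
	int page_size = 64 * 1024;	/* e.g. getpagesize() with 64 KiB pages */

	/* Old "big buffer" computation: the modulus operand is negative. */
	int span = MAX_MSG_SIZE - page_size;		/* -32768 */
	size_t buf_size = page_size + (rand() % span);	/* always >= 64 KiB */

	/*
	 * Such messages no longer fit the receiver's MAX_MSG_SIZE-sized
	 * buffer, so the bounds test fails.  The patch instead derives the
	 * limit from MAX_MSG_PAGES * page_size at runtime.
	 */
	printf("span=%d buf_size=%zu\n", span, buf_size);
	return 0;
}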
[PATCH v2 01/33] ftrace: Unpoison ftrace_regs in ftrace_ops_list_func()
Architectures use assembly code to initialize ftrace_regs and call ftrace_ops_list_func(). Therefore, from KMSAN's point of view, ftrace_regs is poisoned on entry to ftrace_ops_list_func(). This causes KMSAN warnings when running the ftrace testsuite. Fix by trusting the architecture-specific assembly code and always unpoisoning ftrace_regs in ftrace_ops_list_func(). Signed-off-by: Ilya Leoshkevich --- kernel/trace/ftrace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 8de8bec5f366..dfb8b26966aa 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -7399,6 +7399,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op, struct ftrace_regs *fregs) { + kmsan_unpoison_memory(fregs, sizeof(*fregs)); __ftrace_ops_list_func(ip, parent_ip, NULL, fregs); } #else -- 2.41.0
[PATCH v2 08/33] kmsan: Remove an x86-specific #include from kmsan.h
Replace the x86-specific asm/pgtable_64_types.h #include with the linux/pgtable.h one, which all architectures have. While at it, sort the headers alphabetically for the sake of consistency with other KMSAN code. Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core") Suggested-by: Heiko Carstens Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/kmsan.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h index a14744205435..adf443bcffe8 100644 --- a/mm/kmsan/kmsan.h +++ b/mm/kmsan/kmsan.h @@ -10,14 +10,14 @@ #ifndef __MM_KMSAN_KMSAN_H #define __MM_KMSAN_KMSAN_H -#include #include +#include +#include +#include +#include #include #include #include -#include -#include -#include #define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100 #define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200 -- 2.41.0
[PATCH v2 07/33] kmsan: Remove a useless assignment from kmsan_vmap_pages_range_noflush()
The value assigned to prot is immediately overwritten on the next line with PAGE_KERNEL. The right hand side of the assignment has no side-effects. Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Suggested-by: Alexander Gordeev Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/shadow.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c index b9d05aff313e..2d57408c78ae 100644 --- a/mm/kmsan/shadow.c +++ b/mm/kmsan/shadow.c @@ -243,7 +243,6 @@ int kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, s_pages[i] = shadow_page_for(pages[i]); o_pages[i] = origin_page_for(pages[i]); } - prot = __pgprot(pgprot_val(prot) | _PAGE_NX); prot = PAGE_KERNEL; origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN); -- 2.41.0
[PATCH v2 10/33] kmsan: Expose kmsan_get_metadata()
Each s390 CPU has lowcore pages associated with it. Each CPU sees its own lowcore at virtual address 0 through a hardware mechanism called prefixing. Additionally, all lowcores are mapped to non-0 virtual addresses stored in the lowcore_ptr[] array. When lowcore is accessed through virtual address 0, one needs to resolve metadata for lowcore_ptr[raw_smp_processor_id()]. Expose kmsan_get_metadata() to make it possible to do this from the arch code. Signed-off-by: Ilya Leoshkevich --- include/linux/kmsan.h | 14 ++ mm/kmsan/instrumentation.c | 1 + mm/kmsan/kmsan.h | 1 - 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index e0c23a32cdf0..ff8fd95733fa 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -230,6 +230,15 @@ void kmsan_handle_urb(const struct urb *urb, bool is_out); */ void kmsan_unpoison_entry_regs(const struct pt_regs *regs); +/** + * kmsan_get_metadata() - Return a pointer to KMSAN shadow or origins. + * @addr: kernel address. + * @is_origin: whether to return origins or shadow. + * + * Return NULL if metadata cannot be found. + */ +void *kmsan_get_metadata(void *addr, bool is_origin); + #else static inline void kmsan_init_shadow(void) @@ -329,6 +338,11 @@ static inline void kmsan_unpoison_entry_regs(const struct pt_regs *regs) { } +static inline void *kmsan_get_metadata(void *addr, bool is_origin) +{ + return NULL; +} + #endif #endif /* _LINUX_KMSAN_H */ diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c index 8a1bbbc723ab..94b49fac9d8b 100644 --- a/mm/kmsan/instrumentation.c +++ b/mm/kmsan/instrumentation.c @@ -14,6 +14,7 @@ #include "kmsan.h" #include +#include #include #include #include diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h index adf443bcffe8..34b83c301d57 100644 --- a/mm/kmsan/kmsan.h +++ b/mm/kmsan/kmsan.h @@ -66,7 +66,6 @@ struct shadow_origin_ptr { struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size, bool store); -void *kmsan_get_metadata(void *addr, bool is_origin); void __init kmsan_init_alloc_meta_for_range(void *start, void *end); enum kmsan_bug_reason { -- 2.41.0
[PATCH v2 11/33] kmsan: Export panic_on_kmsan
When building the kmsan test as a module, modpost fails with the following error message: ERROR: modpost: "panic_on_kmsan" [mm/kmsan/kmsan_test.ko] undefined! Export panic_on_kmsan in order to improve the KMSAN usability for modules. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/report.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c index 02736ec757f2..c79d3b0d2d0d 100644 --- a/mm/kmsan/report.c +++ b/mm/kmsan/report.c @@ -20,6 +20,7 @@ static DEFINE_RAW_SPINLOCK(kmsan_report_lock); /* Protected by kmsan_report_lock */ static char report_local_descr[DESCR_SIZE]; int panic_on_kmsan __read_mostly; +EXPORT_SYMBOL_GPL(panic_on_kmsan); #ifdef MODULE_PARAM_PREFIX #undef MODULE_PARAM_PREFIX -- 2.41.0
[PATCH v2 18/33] lib/string: Add KMSAN support to strlcpy() and strlcat()
Currently KMSAN does not fully propagate metadata in strlcpy() and strlcat(), because they are built with -ffreestanding and call memcpy(). In this combination memcpy() calls are not instrumented. Fix by copying the metadata manually. Add the __STDC_HOSTED__ #ifdef in case the code is compiled with different flags in the future. Signed-off-by: Ilya Leoshkevich --- lib/string.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/lib/string.c b/lib/string.c index be26623953d2..e83c6dd77ec6 100644 --- a/lib/string.c +++ b/lib/string.c @@ -111,6 +111,9 @@ size_t strlcpy(char *dest, const char *src, size_t size) if (size) { size_t len = (ret >= size) ? size - 1 : ret; __builtin_memcpy(dest, src, len); +#if __STDC_HOSTED__ == 0 + kmsan_memmove_metadata(dest, src, len); +#endif dest[len] = '\0'; } return ret; @@ -261,6 +264,9 @@ size_t strlcat(char *dest, const char *src, size_t count) if (len >= count) len = count-1; __builtin_memcpy(dest, src, len); +#if __STDC_HOSTED__ == 0 + kmsan_memmove_metadata(dest, src, len); +#endif dest[len] = 0; return res; } -- 2.41.0
[PATCH v2 14/33] kmsan: Support SLAB_POISON
Avoid false KMSAN negatives with SLUB_DEBUG by allowing kmsan_slab_free() to poison the freed memory, and by preventing init_object() from unpoisoning new allocations. The usage of memset_no_sanitize_memory() does not degrade the generated code quality. There are two alternatives to this approach. First, init_object() can be marked with __no_sanitize_memory. This annotation should be used with great care, because it drops all instrumentation from the function, and any shadow writes will be lost. Even though this is not a concern with the current init_object() implementation, this may change in the future. Second, kmsan_poison_memory() calls may be added after memset() calls. The downside is that init_object() is called from free_debug_processing(), in which case poisoning will erase the distinction between simply uninitialized memory and UAF. Signed-off-by: Ilya Leoshkevich --- mm/kmsan/hooks.c | 2 +- mm/slub.c| 10 ++ 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c index 7b5814412e9f..7a30274b893c 100644 --- a/mm/kmsan/hooks.c +++ b/mm/kmsan/hooks.c @@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void *object) return; /* RCU slabs could be legally used after free within the RCU period */ - if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON))) + if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) return; /* * If there's a constructor, freed memory must remain in the same state diff --git a/mm/slub.c b/mm/slub.c index 63d281dfacdb..169e5f645ea8 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1030,7 +1030,8 @@ static void init_object(struct kmem_cache *s, void *object, u8 val) unsigned int poison_size = s->object_size; if (s->flags & SLAB_RED_ZONE) { - memset(p - s->red_left_pad, val, s->red_left_pad); + memset_no_sanitize_memory(p - s->red_left_pad, val, + s->red_left_pad); if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) { /* @@ -1043,12 +1044,13 @@ static void init_object(struct kmem_cache *s, void *object, u8 val) } if (s->flags & __OBJECT_POISON) { - memset(p, POISON_FREE, poison_size - 1); - p[poison_size - 1] = POISON_END; + memset_no_sanitize_memory(p, POISON_FREE, poison_size - 1); + memset_no_sanitize_memory(p + poison_size - 1, POISON_END, 1); } if (s->flags & SLAB_RED_ZONE) - memset(p + poison_size, val, s->inuse - poison_size); + memset_no_sanitize_memory(p + poison_size, val, + s->inuse - poison_size); } static void restore_bytes(struct kmem_cache *s, char *message, u8 data, -- 2.41.0
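For readers without patch 13 of this series at hand ("kmsan: Introduce memset_no_sanitize_memory()"), one plausible shape of the helper used above; this is a hedged sketch, not necessarily the exact definition from that patch:

/*
 * A memset wrapper compiled without KMSAN instrumentation: its stores
 * write the poison pattern into memory but do not mark the bytes as
 * initialized, which is what init_object() wants here.
 */
__no_sanitize_memory
static inline void *memset_no_sanitize_memory(void *s, int c, size_t n)
{
	return memset(s, c, n);
}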
[PATCH v2 02/33] kmsan: Make the tests compatible with kmsan.panic=1
It's useful to have both tests and kmsan.panic=1 during development, but right now the warnings that the tests cause lead to kernel panics. Temporarily set kmsan.panic=0 for the duration of the KMSAN testing. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/kmsan_test.c | 5 + 1 file changed, 5 insertions(+) diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c index 07d3a3a5a9c5..9bfd11674fe3 100644 --- a/mm/kmsan/kmsan_test.c +++ b/mm/kmsan/kmsan_test.c @@ -659,9 +659,13 @@ static void test_exit(struct kunit *test) { } +static int orig_panic_on_kmsan; + static int kmsan_suite_init(struct kunit_suite *suite) { register_trace_console(probe_console, NULL); + orig_panic_on_kmsan = panic_on_kmsan; + panic_on_kmsan = 0; return 0; } @@ -669,6 +673,7 @@ static void kmsan_suite_exit(struct kunit_suite *suite) { unregister_trace_console(probe_console, NULL); tracepoint_synchronize_unregister(); + panic_on_kmsan = orig_panic_on_kmsan; } static struct kunit_suite kmsan_test_suite = { -- 2.41.0
[PATCH v2 15/33] kmsan: Use ALIGN_DOWN() in kmsan_get_metadata()
Improve the readability by replacing the custom aligning logic with ALIGN_DOWN(). Unlike other places where a similar sequence is used, there is no size parameter that needs to be adjusted, so the standard macro fits. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/shadow.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c index 2d57408c78ae..9c58f081d84f 100644 --- a/mm/kmsan/shadow.c +++ b/mm/kmsan/shadow.c @@ -123,14 +123,12 @@ struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size, */ void *kmsan_get_metadata(void *address, bool is_origin) { - u64 addr = (u64)address, pad, off; + u64 addr = (u64)address, off; struct page *page; void *ret; - if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) { - pad = addr % KMSAN_ORIGIN_SIZE; - addr -= pad; - } + if (is_origin) + addr = ALIGN_DOWN(addr, KMSAN_ORIGIN_SIZE); address = (void *)addr; if (kmsan_internal_is_vmalloc_addr(address) || kmsan_internal_is_module_addr(address)) -- 2.41.0
[PATCH v2 16/33] mm: slub: Let KMSAN access metadata
Building the kernel with CONFIG_SLUB_DEBUG and CONFIG_KMSAN causes KMSAN to complain about touching redzones in kfree(). Fix by extending the existing KASAN-related metadata_access_enable() and metadata_access_disable() functions to KMSAN. Signed-off-by: Ilya Leoshkevich --- mm/slub.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index 169e5f645ea8..6e61c27951a4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -700,10 +700,12 @@ static int disable_higher_order_debug; static inline void metadata_access_enable(void) { kasan_disable_current(); + kmsan_disable_current(); } static inline void metadata_access_disable(void) { + kmsan_enable_current(); kasan_enable_current(); } -- 2.41.0
[PATCH v2 20/33] kmsan: Accept ranges starting with 0 on s390
On s390 the virtual address 0 is valid (current CPU's lowcore is mapped there), therefore KMSAN should not complain about it. Disable the respective check on s390. There doesn't seem to be a Kconfig option to describe this situation, so explicitly check for s390. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/init.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c index ffedf4dbc49d..7a3df4d359f8 100644 --- a/mm/kmsan/init.c +++ b/mm/kmsan/init.c @@ -33,7 +33,10 @@ static void __init kmsan_record_future_shadow_range(void *start, void *end) bool merged = false; KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES); - KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend); + KMSAN_WARN_ON((nstart >= nend) || + /* Virtual address 0 is valid on s390. */ + (!IS_ENABLED(CONFIG_S390) && !nstart) || + !nend); nstart = ALIGN_DOWN(nstart, PAGE_SIZE); nend = ALIGN(nend, PAGE_SIZE); -- 2.41.0
[PATCH v2 05/33] kmsan: Fix is_bad_asm_addr() on arches with overlapping address spaces
Comparing pointers with TASK_SIZE does not make sense when kernel and userspace overlap. Skip the comparison when this is the case. Signed-off-by: Ilya Leoshkevich --- mm/kmsan/instrumentation.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c index 470b0b4afcc4..8a1bbbc723ab 100644 --- a/mm/kmsan/instrumentation.c +++ b/mm/kmsan/instrumentation.c @@ -20,7 +20,8 @@ static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store) { - if ((u64)addr < TASK_SIZE) + if (IS_ENABLED(CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE) && + (u64)addr < TASK_SIZE) return true; if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW)) return true; -- 2.41.0
[PATCH v2 03/33] kmsan: Disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled
KMSAN relies on memblock returning all available pages to it (see kmsan_memblock_free_pages()). It partitions these pages into 3 categories: pages available to the buddy allocator, shadow pages and origin pages. This partitioning is static. If new pages appear after kmsan_init_runtime(), it is considered an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as incompatible with KMSAN. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/Kconfig b/mm/Kconfig index 89971a894b60..4f2f99339fc7 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -985,6 +985,7 @@ config DEFERRED_STRUCT_PAGE_INIT depends on SPARSEMEM depends on !NEED_PER_CPU_KM depends on 64BIT + depends on !KMSAN select PADATA help Ordinarily all struct pages are initialised during early boot in a -- 2.41.0
[PATCH v2 17/33] mm: kfence: Disable KMSAN when checking the canary
KMSAN warns about check_canary() accessing the canary. The reason is that, even though set_canary() is properly instrumented and sets shadow, slub explicitly poisons the canary's address range afterwards. Unpoisoning the canary is not the right thing to do: only check_canary() is supposed to ever touch it. Instead, disable KMSAN checks around canary read accesses. Signed-off-by: Ilya Leoshkevich --- mm/kfence/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 3872528d0963..a2ea8e5a1ad9 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -306,7 +306,7 @@ metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state nex } /* Check canary byte at @addr. */ -static inline bool check_canary_byte(u8 *addr) +__no_kmsan_checks static inline bool check_canary_byte(u8 *addr) { struct kfence_metadata *meta; unsigned long flags; @@ -341,7 +341,8 @@ static inline void set_canary(const struct kfence_metadata *meta) *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64; } -static inline void check_canary(const struct kfence_metadata *meta) +__no_kmsan_checks static inline void +check_canary(const struct kfence_metadata *meta) { const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE); unsigned long addr = pageaddr; -- 2.41.0
[PATCH v2 00/33] kmsan: Enable on s390
v1: https://lore.kernel.org/lkml/20231115203401.2495875-1-...@linux.ibm.com/ v1 -> v2: Add comments, sort #includes, introduce memset_no_sanitize_memory() and use it to avoid unpoisoning of redzones, change vmalloc alignment to _REGION3_SIZE, add R-bs (Alexander P.). Fix building [PATCH 28/33] s390/string: Add KMSAN support with FORTIFY_SOURCE. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202311170550.bsbo44ix-...@intel.com/ Hi, This series provides the minimal support for Kernel Memory Sanitizer on s390. Kernel Memory Sanitizer is clang-only instrumentation for finding accesses to uninitialized memory. The clang support for s390 has already been merged [1]. With this series, I can successfully boot s390 defconfig and debug_defconfig with kmsan.panic=1. The tool found one real s390-specific bug (fixed in master). Best regards, Ilya [1] https://reviews.llvm.org/D148596 Ilya Leoshkevich (33): ftrace: Unpoison ftrace_regs in ftrace_ops_list_func() kmsan: Make the tests compatible with kmsan.panic=1 kmsan: Disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled kmsan: Increase the maximum store size to 4096 kmsan: Fix is_bad_asm_addr() on arches with overlapping address spaces kmsan: Fix kmsan_copy_to_user() on arches with overlapping address spaces kmsan: Remove a useless assignment from kmsan_vmap_pages_range_noflush() kmsan: Remove an x86-specific #include from kmsan.h kmsan: Introduce kmsan_memmove_metadata() kmsan: Expose kmsan_get_metadata() kmsan: Export panic_on_kmsan kmsan: Allow disabling KMSAN checks for the current task kmsan: Introduce memset_no_sanitize_memory() kmsan: Support SLAB_POISON kmsan: Use ALIGN_DOWN() in kmsan_get_metadata() mm: slub: Let KMSAN access metadata mm: kfence: Disable KMSAN when checking the canary lib/string: Add KMSAN support to strlcpy() and strlcat() lib/zlib: Unpoison DFLTCC output buffers kmsan: Accept ranges starting with 0 on s390 s390: Turn off KMSAN for boot, vdso and purgatory s390: Use a larger stack for KMSAN s390/boot: Add the KMSAN runtime stub s390/checksum: Add a KMSAN check s390/cpacf: Unpoison the results of cpacf_trng() s390/ftrace: Unpoison ftrace_regs in kprobe_ftrace_handler() s390/mm: Define KMSAN metadata for vmalloc and modules s390/string: Add KMSAN support s390/traps: Unpoison the kernel_stack_overflow()'s pt_regs s390/uaccess: Add KMSAN support to put_user() and get_user() s390/unwind: Disable KMSAN checks s390: Implement the architecture-specific kmsan functions kmsan: Enable on s390 Documentation/dev-tools/kmsan.rst | 4 +- arch/s390/Kconfig | 1 + arch/s390/Makefile | 2 +- arch/s390/boot/Makefile | 3 + arch/s390/boot/kmsan.c | 6 ++ arch/s390/boot/startup.c| 8 ++ arch/s390/boot/string.c | 16 arch/s390/include/asm/checksum.h| 2 + arch/s390/include/asm/cpacf.h | 2 + arch/s390/include/asm/kmsan.h | 36 + arch/s390/include/asm/pgtable.h | 10 +++ arch/s390/include/asm/string.h | 20 +++-- arch/s390/include/asm/thread_info.h | 2 +- arch/s390/include/asm/uaccess.h | 110 arch/s390/kernel/ftrace.c | 1 + arch/s390/kernel/traps.c| 6 ++ arch/s390/kernel/unwind_bc.c| 4 + arch/s390/kernel/vdso32/Makefile| 3 +- arch/s390/kernel/vdso64/Makefile| 3 +- arch/s390/purgatory/Makefile| 2 + include/linux/kmsan-checks.h| 26 +++ include/linux/kmsan.h | 23 ++ include/linux/kmsan_types.h | 2 +- kernel/trace/ftrace.c | 1 + lib/string.c| 6 ++ lib/zlib_dfltcc/dfltcc.h| 1 + lib/zlib_dfltcc/dfltcc_util.h | 23 ++ mm/Kconfig | 1 + mm/kfence/core.c| 5 +- mm/kmsan/core.c | 2 +- mm/kmsan/hooks.c| 30 +++- mm/kmsan/init.c | 5 +- 
mm/kmsan/instrumentation.c | 11 +-- mm/kmsan/kmsan.h| 9 +-- mm/kmsan/kmsan_test.c | 5 ++ mm/kmsan/report.c | 7 +- mm/kmsan/shadow.c | 9 +-- mm/slub.c | 12 ++- 38 files changed, 345 insertions(+), 74 deletions(-) create mode 100644 arch/s390/boot/kmsan.c create mode 100644 arch/s390/include/asm/kmsan.h -- 2.41.0
[PATCH v2 22/33] s390: Use a larger stack for KMSAN
Adjust the stack size for the KMSAN-enabled kernel like it was done for the KASAN-enabled one in commit 7fef92ccadd7 ("s390/kasan: double the stack size"). Both tools have similar requirements. Reviewed-by: Alexander Gordeev Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- arch/s390/Makefile | 2 +- arch/s390/include/asm/thread_info.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/Makefile b/arch/s390/Makefile index 73873e451686..a7f5386d25ad 100644 --- a/arch/s390/Makefile +++ b/arch/s390/Makefile @@ -34,7 +34,7 @@ KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_DEBUG_INFO_DWARF4), $(call cc-option KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) UTS_MACHINE:= s390x -STACK_SIZE := $(if $(CONFIG_KASAN),65536,16384) +STACK_SIZE := $(if $(CONFIG_KASAN),65536,$(if $(CONFIG_KMSAN),65536,16384)) CHECKFLAGS += -D__s390__ -D__s390x__ export LD_BFD diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index a674c7d25da5..d02a709717b8 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -16,7 +16,7 @@ /* * General size of kernel stacks */ -#ifdef CONFIG_KASAN +#if defined(CONFIG_KASAN) || defined(CONFIG_KMSAN) #define THREAD_SIZE_ORDER 4 #else #define THREAD_SIZE_ORDER 2 -- 2.41.0
[PATCH v2 24/33] s390/checksum: Add a KMSAN check
Add a KMSAN check to the CKSM inline assembly, similar to how it was done for ASAN in commit e42ac7789df6 ("s390/checksum: always use cksm instruction"). Acked-by: Alexander Gordeev Signed-off-by: Ilya Leoshkevich --- arch/s390/include/asm/checksum.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/s390/include/asm/checksum.h b/arch/s390/include/asm/checksum.h index 69837eec2ff5..55ba0ddd8eab 100644 --- a/arch/s390/include/asm/checksum.h +++ b/arch/s390/include/asm/checksum.h @@ -13,6 +13,7 @@ #define _S390_CHECKSUM_H #include +#include #include /* @@ -35,6 +36,7 @@ static inline __wsum csum_partial(const void *buff, int len, __wsum sum) }; kasan_check_read(buff, len); + kmsan_check_memory(buff, len); asm volatile( "0: cksm%[sum],%[rp]\n" " jo 0b\n" -- 2.41.0
[PATCH v2 23/33] s390/boot: Add the KMSAN runtime stub
It should be possible to have inline functions in the s390 header files, which call kmsan_unpoison_memory(). The problem is that these header files might be included by the decompressor, which does not contain KMSAN runtime, causing linker errors. Not compiling these calls if __SANITIZE_MEMORY__ is not defined - either by changing kmsan-checks.h or at the call sites - may cause unintended side effects, since calling these functions from an uninstrumented code that is linked into the kernel is valid use case. One might want to explicitly distinguish between the kernel and the decompressor. Checking for a decompressor-specific #define is quite heavy-handed, and will have to be done at all call sites. A more generic approach is to provide a dummy kmsan_unpoison_memory() definition. This produces some runtime overhead, but only when building with CONFIG_KMSAN. The benefit is that it does not disturb the existing KMSAN build logic and call sites don't need to be changed. Signed-off-by: Ilya Leoshkevich --- arch/s390/boot/Makefile | 1 + arch/s390/boot/kmsan.c | 6 ++ 2 files changed, 7 insertions(+) create mode 100644 arch/s390/boot/kmsan.c diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile index fb10fcd21221..096216a72e98 100644 --- a/arch/s390/boot/Makefile +++ b/arch/s390/boot/Makefile @@ -44,6 +44,7 @@ obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE)) += obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o obj-y += $(if $(CONFIG_KERNEL_UNCOMPRESSED),,decompressor.o) info.o obj-$(CONFIG_KERNEL_ZSTD) += clz_ctz.o +obj-$(CONFIG_KMSAN) += kmsan.o obj-all := $(obj-y) piggy.o syms.o targets:= bzImage section_cmp.boot.data section_cmp.boot.preserved.data $(obj-y) diff --git a/arch/s390/boot/kmsan.c b/arch/s390/boot/kmsan.c new file mode 100644 index ..e7b3ac48143e --- /dev/null +++ b/arch/s390/boot/kmsan.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +void kmsan_unpoison_memory(const void *address, size_t size) +{ +} -- 2.41.0
[PATCH v2 26/33] s390/ftrace: Unpoison ftrace_regs in kprobe_ftrace_handler()
s390 uses assembly code to initialize ftrace_regs and call kprobe_ftrace_handler(). Therefore, from the KMSAN's point of view, ftrace_regs is poisoned on kprobe_ftrace_handler() entry. This causes KMSAN warnings when running the ftrace testsuite. Fix by trusting the assembly code and always unpoisoning ftrace_regs in kprobe_ftrace_handler(). Signed-off-by: Ilya Leoshkevich --- arch/s390/kernel/ftrace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c index c46381ea04ec..3bad34eaa51e 100644 --- a/arch/s390/kernel/ftrace.c +++ b/arch/s390/kernel/ftrace.c @@ -300,6 +300,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, if (bit < 0) return; + kmsan_unpoison_memory(fregs, sizeof(*fregs)); regs = ftrace_get_regs(fregs); p = get_kprobe((kprobe_opcode_t *)ip); if (!regs || unlikely(!p) || kprobe_disabled(p)) -- 2.41.0
[PATCH v2 28/33] s390/string: Add KMSAN support
Add KMSAN support for the s390 implementations of the string functions. Do this similar to how it's already done for KASAN, except that the optimized memset{16,32,64}() functions need to be disabled: it's important for KMSAN to know that they initialized something. The way boot code is built with regard to string functions is problematic, since most files think it's configured with sanitizers, but boot/string.c doesn't. This creates various problems with the memset64() definitions, depending on whether the code is built with sanitizers or fortify. This should probably be streamlined, but in the meantime resolve the issues by introducing the IN_BOOT_STRING_C macro, similar to the existing IN_ARCH_STRING_C macro. Signed-off-by: Ilya Leoshkevich --- arch/s390/boot/string.c| 16 arch/s390/include/asm/string.h | 20 +++- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/arch/s390/boot/string.c b/arch/s390/boot/string.c index faccb33b462c..f6b9b1df48a8 100644 --- a/arch/s390/boot/string.c +++ b/arch/s390/boot/string.c @@ -1,11 +1,18 @@ // SPDX-License-Identifier: GPL-2.0 +#define IN_BOOT_STRING_C 1 #include #include #include #undef CONFIG_KASAN #undef CONFIG_KASAN_GENERIC +#undef CONFIG_KMSAN #include "../lib/string.c" +/* + * Duplicate some functions from the common lib/string.c + * instead of fully including it. + */ + int strncmp(const char *cs, const char *ct, size_t count) { unsigned char c1, c2; @@ -22,6 +29,15 @@ int strncmp(const char *cs, const char *ct, size_t count) return 0; } +void *memset64(uint64_t *s, uint64_t v, size_t count) +{ + uint64_t *xs = s; + + while (count--) + *xs++ = v; + return s; +} + char *skip_spaces(const char *str) { while (isspace(*str)) diff --git a/arch/s390/include/asm/string.h b/arch/s390/include/asm/string.h index 351685de53d2..2ab868cbae6c 100644 --- a/arch/s390/include/asm/string.h +++ b/arch/s390/include/asm/string.h @@ -15,15 +15,12 @@ #define __HAVE_ARCH_MEMCPY /* gcc builtin & arch function */ #define __HAVE_ARCH_MEMMOVE/* gcc builtin & arch function */ #define __HAVE_ARCH_MEMSET /* gcc builtin & arch function */ -#define __HAVE_ARCH_MEMSET16 /* arch function */ -#define __HAVE_ARCH_MEMSET32 /* arch function */ -#define __HAVE_ARCH_MEMSET64 /* arch function */ void *memcpy(void *dest, const void *src, size_t n); void *memset(void *s, int c, size_t n); void *memmove(void *dest, const void *src, size_t n); -#ifndef CONFIG_KASAN +#if !defined(CONFIG_KASAN) && !defined(CONFIG_KMSAN) #define __HAVE_ARCH_MEMCHR /* inline & arch function */ #define __HAVE_ARCH_MEMCMP /* arch function */ #define __HAVE_ARCH_MEMSCAN/* inline & arch function */ @@ -36,6 +33,9 @@ void *memmove(void *dest, const void *src, size_t n); #define __HAVE_ARCH_STRNCPY/* arch function */ #define __HAVE_ARCH_STRNLEN/* inline & arch function */ #define __HAVE_ARCH_STRSTR /* arch function */ +#define __HAVE_ARCH_MEMSET16 /* arch function */ +#define __HAVE_ARCH_MEMSET32 /* arch function */ +#define __HAVE_ARCH_MEMSET64 /* arch function */ /* Prototypes for non-inlined arch strings functions. 
*/ int memcmp(const void *s1, const void *s2, size_t n); @@ -44,7 +44,7 @@ size_t strlcat(char *dest, const char *src, size_t n); char *strncat(char *dest, const char *src, size_t n); char *strncpy(char *dest, const char *src, size_t n); char *strstr(const char *s1, const char *s2); -#endif /* !CONFIG_KASAN */ +#endif /* !defined(CONFIG_KASAN) && !defined(CONFIG_KMSAN) */ #undef __HAVE_ARCH_STRCHR #undef __HAVE_ARCH_STRNCHR @@ -74,20 +74,30 @@ void *__memset16(uint16_t *s, uint16_t v, size_t count); void *__memset32(uint32_t *s, uint32_t v, size_t count); void *__memset64(uint64_t *s, uint64_t v, size_t count); +#ifdef __HAVE_ARCH_MEMSET16 static inline void *memset16(uint16_t *s, uint16_t v, size_t count) { return __memset16(s, v, count * sizeof(v)); } +#endif +#ifdef __HAVE_ARCH_MEMSET32 static inline void *memset32(uint32_t *s, uint32_t v, size_t count) { return __memset32(s, v, count * sizeof(v)); } +#endif +#ifdef __HAVE_ARCH_MEMSET64 +#ifdef IN_BOOT_STRING_C +void *memset64(uint64_t *s, uint64_t v, size_t count); +#else static inline void *memset64(uint64_t *s, uint64_t v, size_t count) { return __memset64(s, v, count * sizeof(v)); } +#endif +#endif #if !defined(IN_ARCH_STRING_C) && (!defined(CONFIG_FORTIFY_SOURCE) || defined(__NO_FORTIFY)) -- 2.41.0
[PATCH v2 25/33] s390/cpacf: Unpoison the results of cpacf_trng()
Prevent KMSAN from complaining about buffers filled by cpacf_trng() being uninitialized. Tested-by: Alexander Gordeev Signed-off-by: Ilya Leoshkevich --- arch/s390/include/asm/cpacf.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h index b378e2b57ad8..a72b92770c4b 100644 --- a/arch/s390/include/asm/cpacf.h +++ b/arch/s390/include/asm/cpacf.h @@ -473,6 +473,8 @@ static inline void cpacf_trng(u8 *ucbuf, unsigned long ucbuf_len, : [ucbuf] "+&d" (u.pair), [cbuf] "+&d" (c.pair) : [fc] "K" (CPACF_PRNO_TRNG), [opc] "i" (CPACF_PRNO) : "cc", "memory", "0"); + kmsan_unpoison_memory(ucbuf, ucbuf_len); + kmsan_unpoison_memory(cbuf, cbuf_len); } /** -- 2.41.0
[PATCH v2 31/33] s390/unwind: Disable KMSAN checks
The unwind code can read uninitialized frames. Furthermore, even in the good case, KMSAN does not emit shadow for backchains. Therefore disable it for the unwinding functions. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- arch/s390/kernel/unwind_bc.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c index 0ece156fdd7c..cd44be2b6ce8 100644 --- a/arch/s390/kernel/unwind_bc.c +++ b/arch/s390/kernel/unwind_bc.c @@ -49,6 +49,8 @@ static inline bool is_final_pt_regs(struct unwind_state *state, READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE; } +/* Avoid KMSAN false positives from touching uninitialized frames. */ +__no_kmsan_checks bool unwind_next_frame(struct unwind_state *state) { struct stack_info *info = &state->stack_info; @@ -118,6 +120,8 @@ bool unwind_next_frame(struct unwind_state *state) } EXPORT_SYMBOL_GPL(unwind_next_frame); +/* Avoid KMSAN false positives from touching uninitialized frames. */ +__no_kmsan_checks void __unwind_start(struct unwind_state *state, struct task_struct *task, struct pt_regs *regs, unsigned long first_frame) { -- 2.41.0
[PATCH v2 29/33] s390/traps: Unpoison the kernel_stack_overflow()'s pt_regs
This is normally done by the generic entry code, but the kernel_stack_overflow() flow bypasses it. Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- arch/s390/kernel/traps.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c index 1d2aa448d103..f299b1203a20 100644 --- a/arch/s390/kernel/traps.c +++ b/arch/s390/kernel/traps.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -260,6 +261,11 @@ static void monitor_event_exception(struct pt_regs *regs) void kernel_stack_overflow(struct pt_regs *regs) { + /* +* Normally regs are unpoisoned by the generic entry code, but +* kernel_stack_overflow() is a rare case that is called bypassing it. +*/ + kmsan_unpoison_entry_regs(regs); bust_spinlocks(1); printk("Kernel stack overflow.\n"); show_regs(regs); -- 2.41.0
[PATCH v2 32/33] s390: Implement the architecture-specific kmsan functions
arch_kmsan_get_meta_or_null() finds the lowcore shadow by querying the prefix and calling kmsan_get_metadata() again. kmsan_virt_addr_valid() delegates to virt_addr_valid(). Signed-off-by: Ilya Leoshkevich --- arch/s390/include/asm/kmsan.h | 36 +++ 1 file changed, 36 insertions(+) create mode 100644 arch/s390/include/asm/kmsan.h diff --git a/arch/s390/include/asm/kmsan.h b/arch/s390/include/asm/kmsan.h new file mode 100644 index ..afec71e9e9ac --- /dev/null +++ b/arch/s390/include/asm/kmsan.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_S390_KMSAN_H +#define _ASM_S390_KMSAN_H + +#include +#include +#include +#include +#include + +#ifndef MODULE + +static inline void *arch_kmsan_get_meta_or_null(void *addr, bool is_origin) +{ + if (addr >= (void *)&S390_lowcore && + addr < (void *)(&S390_lowcore + 1)) { + /* +* Different lowcores accessed via S390_lowcore are described +* by the same struct page. Resolve the prefix manually in +* order to get a distinct struct page. +*/ + addr += (void *)lowcore_ptr[raw_smp_processor_id()] - + (void *)&S390_lowcore; + return kmsan_get_metadata(addr, is_origin); + } + return NULL; +} + +static inline bool kmsan_virt_addr_valid(void *addr) +{ + return virt_addr_valid(addr); +} + +#endif /* !MODULE */ + +#endif /* _ASM_S390_KMSAN_H */ -- 2.41.0
[PATCH v2 33/33] kmsan: Enable on s390
Now that everything else is in place, enable KMSAN in Kconfig. Signed-off-by: Ilya Leoshkevich --- arch/s390/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 3bec98d20283..160ad2220c53 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -153,6 +153,7 @@ config S390 select HAVE_ARCH_KASAN select HAVE_ARCH_KASAN_VMALLOC select HAVE_ARCH_KCSAN + select HAVE_ARCH_KMSAN select HAVE_ARCH_KFENCE select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET select HAVE_ARCH_SECCOMP_FILTER -- 2.41.0
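A hedged usage sketch, not part of the patch: with HAVE_ARCH_KMSAN selected, a Clang-built s390 kernel can opt in via the generic option, and the kmsan.panic boot parameter (see the panic_on_kmsan handling in kmsan_report() elsewhere in this series) turns the first report into a panic.

# illustrative .config fragment, assuming a KMSAN-capable Clang toolchain
CONFIG_KMSAN=y
# optional kernel command line addition:
#   kmsan.panic=1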
[PATCH v2 30/33] s390/uaccess: Add KMSAN support to put_user() and get_user()
put_user() uses inline assembly with precise constraints, so Clang is in principle capable of instrumenting it automatically. Unfortunately, one of the constraints contains a dereferenced user pointer, and Clang does not currently distinguish user and kernel pointers. Therefore KMSAN attempts to access shadow for user pointers, which is not a right thing to do. An obvious fix to add __no_sanitize_memory to __put_user_fn() does not work, since it's __always_inline. And __always_inline cannot be removed due to the __put_user_bad() trick. A different obvious fix of using the "a" instead of the "+Q" constraint degrades the code quality, which is very important here, since it's a hot path. Instead, repurpose the __put_user_asm() macro to define __put_user_{char,short,int,long}_noinstr() functions and mark them with __no_sanitize_memory. For the non-KMSAN builds make them __always_inline in order to keep the generated code quality. Also define __put_user_{char,short,int,long}() functions, which call the aforementioned ones and which *are* instrumented, because they call KMSAN hooks, which may be implemented as macros. The same applies to get_user() as well. Acked-by: Heiko Carstens Signed-off-by: Ilya Leoshkevich --- arch/s390/include/asm/uaccess.h | 110 ++-- 1 file changed, 78 insertions(+), 32 deletions(-) diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h index 81ae8a98e7ec..b0715b88b55a 100644 --- a/arch/s390/include/asm/uaccess.h +++ b/arch/s390/include/asm/uaccess.h @@ -78,13 +78,23 @@ union oac { int __noreturn __put_user_bad(void); -#define __put_user_asm(to, from, size) \ -({ \ +#ifdef CONFIG_KMSAN +#define GET_PUT_USER_NOINSTR_ATTRIBUTES inline __no_sanitize_memory +#else +#define GET_PUT_USER_NOINSTR_ATTRIBUTES __always_inline +#endif + +#define DEFINE_PUT_USER(type) \ +static GET_PUT_USER_NOINSTR_ATTRIBUTES int \ +__put_user_##type##_noinstr(unsigned type __user *to, \ + unsigned type *from,\ + unsigned long size) \ +{ \ union oac __oac_spec = {\ .oac1.as = PSW_BITS_AS_SECONDARY, \ .oac1.a = 1,\ }; \ - int __rc; \ + int rc; \ \ asm volatile( \ " lr 0,%[spec]\n"\ @@ -93,12 +103,28 @@ int __noreturn __put_user_bad(void); "2:\n" \ EX_TABLE_UA_STORE(0b, 2b, %[rc])\ EX_TABLE_UA_STORE(1b, 2b, %[rc])\ - : [rc] "=&d" (__rc), [_to] "+Q" (*(to)) \ + : [rc] "=&d" (rc), [_to] "+Q" (*(to)) \ : [_size] "d" (size), [_from] "Q" (*(from)),\ [spec] "d" (__oac_spec.val) \ : "cc", "0"); \ - __rc; \ -}) + return rc; \ +} \ + \ +static __always_inline int \ +__put_user_##type(unsigned type __user *to, unsigned type *from, \ + unsigned long size) \ +{ \ + int rc; \ + \ + rc = __put_user_##type##_noinstr(to, from, size); \ + instrument_put_user(*from, to, size); \ + return rc; \ +} + +DEFINE_PUT_USER(char); +DEFINE_PUT_USER(short); +DEFINE_PUT_USER(int); +DEFINE_PUT_USER(long); static __always_inline int __put_user_fn(void *x, void __user *ptr, unsigned long size) { @@ -106,24 +132,24 @@ static __always_inline int __put_user_fn(void *x, void __user *ptr, unsigned lon switch (size) {
[PATCH v2 13/33] kmsan: Introduce memset_no_sanitize_memory()
Add a wrapper for memset() that prevents unpoisoning. This is useful for filling memory allocator redzones. Signed-off-by: Ilya Leoshkevich --- include/linux/kmsan.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index ff8fd95733fa..439df72c8dc6 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -345,4 +345,13 @@ static inline void *kmsan_get_metadata(void *addr, bool is_origin) #endif +/** + * memset_no_sanitize_memory() - memset() without the KMSAN instrumentation. + */ +__no_sanitize_memory +static inline void *memset_no_sanitize_memory(void *s, int c, size_t n) +{ + return memset(s, c, n); +} + #endif /* _LINUX_KMSAN_H */ -- 2.41.0
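A brief usage sketch (illustrative; the function name and the 0xaa pattern are made up): an ordinary memset() would be instrumented and would record the range as initialized as a side effect, whereas the wrapper lets an allocator fill a redzone while leaving its KMSAN shadow untouched.

static void example_fill_redzone(void *redzone, size_t size)
{
	/* fill with a recognizable pattern without unpoisoning the range */
	memset_no_sanitize_memory(redzone, 0xaa, size);
}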
[PATCH v2 06/33] kmsan: Fix kmsan_copy_to_user() on arches with overlapping address spaces
Comparing pointers with TASK_SIZE does not make sense when kernel and userspace overlap. Assume that we are handling user memory access in this case. Reported-by: Alexander Gordeev Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- mm/kmsan/hooks.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c index 5d6e2dee5692..eafc45f937eb 100644 --- a/mm/kmsan/hooks.c +++ b/mm/kmsan/hooks.c @@ -267,7 +267,8 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy, return; ua_flags = user_access_save(); - if ((u64)to < TASK_SIZE) { + if (!IS_ENABLED(CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE) || + (u64)to < TASK_SIZE) { /* This is a user memory access, check it. */ kmsan_internal_check_memory((void *)from, to_copy - left, to, REASON_COPY_TO_USER); -- 2.41.0
[PATCH v2 12/33] kmsan: Allow disabling KMSAN checks for the current task
Like for KASAN, it's useful to temporarily disable KMSAN checks around, e.g., redzone accesses. Introduce kmsan_disable_current() and kmsan_enable_current(), which are similar to their KASAN counterparts. Even though it's not strictly necessary, make them reentrant, in order to match the KASAN behavior. Repurpose the allow_reporting field for this. Signed-off-by: Ilya Leoshkevich --- Documentation/dev-tools/kmsan.rst | 4 ++-- include/linux/kmsan-checks.h | 12 include/linux/kmsan_types.h | 2 +- mm/kmsan/core.c | 2 +- mm/kmsan/hooks.c | 14 +- mm/kmsan/report.c | 6 +++--- 6 files changed, 32 insertions(+), 8 deletions(-) diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst index 323eedad53cd..022a823f5f1b 100644 --- a/Documentation/dev-tools/kmsan.rst +++ b/Documentation/dev-tools/kmsan.rst @@ -338,11 +338,11 @@ Per-task KMSAN state Every task_struct has an associated KMSAN task state that holds the KMSAN -context (see above) and a per-task flag disallowing KMSAN reports:: +context (see above) and a per-task counter disallowing KMSAN reports:: struct kmsan_context { ... -bool allow_reporting; +unsigned int depth; struct kmsan_context_state cstate; ... } diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h index 5218973f0ad0..bab2603685f7 100644 --- a/include/linux/kmsan-checks.h +++ b/include/linux/kmsan-checks.h @@ -72,6 +72,10 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy, */ void kmsan_memmove_metadata(void *dst, const void *src, size_t n); +void kmsan_enable_current(void); + +void kmsan_disable_current(void); + #else static inline void kmsan_poison_memory(const void *address, size_t size, @@ -92,6 +96,14 @@ static inline void kmsan_memmove_metadata(void *dst, const void *src, size_t n) { } +static inline void kmsan_enable_current(void) +{ +} + +static inline void kmsan_disable_current(void) +{ +} + #endif #endif /* _LINUX_KMSAN_CHECKS_H */ diff --git a/include/linux/kmsan_types.h b/include/linux/kmsan_types.h index 8bfa6c98176d..27bb146ece95 100644 --- a/include/linux/kmsan_types.h +++ b/include/linux/kmsan_types.h @@ -29,7 +29,7 @@ struct kmsan_context_state { struct kmsan_ctx { struct kmsan_context_state cstate; int kmsan_in_runtime; - bool allow_reporting; + unsigned int depth; }; #endif /* _LINUX_KMSAN_TYPES_H */ diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c index c19f47af0424..b8767378cf8a 100644 --- a/mm/kmsan/core.c +++ b/mm/kmsan/core.c @@ -43,7 +43,7 @@ void kmsan_internal_task_create(struct task_struct *task) struct thread_info *info = current_thread_info(); __memset(ctx, 0, sizeof(*ctx)); - ctx->allow_reporting = true; + ctx->depth = 0; kmsan_internal_unpoison_memory(info, sizeof(*info), false); } diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c index 4d477a0a356c..7b5814412e9f 100644 --- a/mm/kmsan/hooks.c +++ b/mm/kmsan/hooks.c @@ -44,7 +44,7 @@ void kmsan_task_exit(struct task_struct *task) if (!kmsan_enabled || kmsan_in_runtime()) return; - ctx->allow_reporting = false; + ctx->depth++; } void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags) @@ -434,3 +434,15 @@ void kmsan_check_memory(const void *addr, size_t size) REASON_ANY); } EXPORT_SYMBOL(kmsan_check_memory); + +void kmsan_enable_current(void) +{ + current->kmsan_ctx.depth--; +} +EXPORT_SYMBOL(kmsan_enable_current); + +void kmsan_disable_current(void) +{ + current->kmsan_ctx.depth++; +} +EXPORT_SYMBOL(kmsan_disable_current); diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c index c79d3b0d2d0d..edcf53ca428e 100644 --- 
a/mm/kmsan/report.c +++ b/mm/kmsan/report.c @@ -158,12 +158,12 @@ void kmsan_report(depot_stack_handle_t origin, void *address, int size, if (!kmsan_enabled) return; - if (!current->kmsan_ctx.allow_reporting) + if (current->kmsan_ctx.depth) return; if (!origin) return; - current->kmsan_ctx.allow_reporting = false; + current->kmsan_ctx.depth++; ua_flags = user_access_save(); raw_spin_lock(&kmsan_report_lock); pr_err("=\n"); @@ -216,5 +216,5 @@ void kmsan_report(depot_stack_handle_t origin, void *address, int size, if (panic_on_kmsan) panic("kmsan.panic set ...\n"); user_access_restore(ua_flags); - current->kmsan_ctx.allow_reporting = true; + current->kmsan_ctx.depth--; } -- 2.41.0
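A short usage sketch in the spirit of the KASAN counterparts (illustrative; the checker function and the 0xaa pattern are invented, not part of the patch):

static bool example_redzone_intact(const unsigned char *redzone, size_t size)
{
	bool ok = true;
	size_t i;

	kmsan_disable_current();	/* suppress reports for this deliberate access */
	for (i = 0; i < size; i++)
		if (redzone[i] != 0xaa)
			ok = false;
	kmsan_enable_current();

	return ok;
}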
[PATCH v2 04/33] kmsan: Increase the maximum store size to 4096
The inline assembly block in s390's chsc() stores that much. Signed-off-by: Ilya Leoshkevich --- mm/kmsan/instrumentation.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c index cc3907a9c33a..470b0b4afcc4 100644 --- a/mm/kmsan/instrumentation.c +++ b/mm/kmsan/instrumentation.c @@ -110,11 +110,10 @@ void __msan_instrument_asm_store(void *addr, uintptr_t size) ua_flags = user_access_save(); /* -* Most of the accesses are below 32 bytes. The two exceptions so far -* are clwb() (64 bytes) and FPU state (512 bytes). -* It's unlikely that the assembly will touch more than 512 bytes. +* Most of the accesses are below 32 bytes. The exceptions so far are +* clwb() (64 bytes), FPU state (512 bytes) and chsc() (4096 bytes). */ - if (size > 512) { + if (size > 4096) { WARN_ONCE(1, "assembly store size too big: %ld\n", size); size = 8; } -- 2.41.0
[PATCH v2 09/33] kmsan: Introduce kmsan_memmove_metadata()
It is useful to manually copy metadata in order to describe the effects of memmove()-like logic in uninstrumented code or inline asm. Introduce kmsan_memmove_metadata() for this purpose. Signed-off-by: Ilya Leoshkevich --- include/linux/kmsan-checks.h | 14 ++ mm/kmsan/hooks.c | 11 +++ 2 files changed, 25 insertions(+) diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h index c4cae333deec..5218973f0ad0 100644 --- a/include/linux/kmsan-checks.h +++ b/include/linux/kmsan-checks.h @@ -61,6 +61,17 @@ void kmsan_check_memory(const void *address, size_t size); void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy, size_t left); +/** + * kmsan_memmove_metadata() - Copy kernel memory range metadata. + * @dst: start of the destination kernel memory range. + * @src: start of the source kernel memory range. + * @n: size of the memory ranges. + * + * KMSAN will treat the destination range as if its contents were memmove()d + * from the source range. + */ +void kmsan_memmove_metadata(void *dst, const void *src, size_t n); + #else static inline void kmsan_poison_memory(const void *address, size_t size, @@ -77,6 +88,9 @@ static inline void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy, size_t left) { } +static inline void kmsan_memmove_metadata(void *dst, const void *src, size_t n) +{ +} #endif diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c index eafc45f937eb..4d477a0a356c 100644 --- a/mm/kmsan/hooks.c +++ b/mm/kmsan/hooks.c @@ -286,6 +286,17 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy, } EXPORT_SYMBOL(kmsan_copy_to_user); +void kmsan_memmove_metadata(void *dst, const void *src, size_t n) +{ + if (!kmsan_enabled || kmsan_in_runtime()) + return; + + kmsan_enter_runtime(); + kmsan_internal_memmove_metadata(dst, (void *)src, n); + kmsan_leave_runtime(); +} +EXPORT_SYMBOL(kmsan_memmove_metadata); + /* Helper function to check an URB. */ void kmsan_handle_urb(const struct urb *urb, bool is_out) { -- 2.41.0
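An illustrative caller (a sketch; arch_opaque_copy() is a made-up stand-in for any copy KMSAN cannot see, such as an inline-asm MVC loop):

static void example_opaque_copy(void *dst, const void *src, size_t n)
{
	arch_opaque_copy(dst, src, n);		/* hypothetical uninstrumented copy */
	/* mirror shadow and origins so dst inherits src's initialization state */
	kmsan_memmove_metadata(dst, src, n);
}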
[PATCH v2 19/33] lib/zlib: Unpoison DFLTCC output buffers
The constraints of the DFLTCC inline assembly are not precise: they do not communicate the size of the output buffers to the compiler, so it cannot automatically instrument it. Add the manual kmsan_unpoison_memory() calls for the output buffers. The logic is the same as in [1]. [1] https://github.com/zlib-ng/zlib-ng/commit/1f5ddcc009ac3511e99fc88736a9e1a6381168c5 Reported-by: Alexander Gordeev Signed-off-by: Ilya Leoshkevich --- lib/zlib_dfltcc/dfltcc.h | 1 + lib/zlib_dfltcc/dfltcc_util.h | 23 +++ 2 files changed, 24 insertions(+) diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h index b96232bdd44d..0f2a16d7a48a 100644 --- a/lib/zlib_dfltcc/dfltcc.h +++ b/lib/zlib_dfltcc/dfltcc.h @@ -80,6 +80,7 @@ struct dfltcc_param_v0 { uint8_t csb[1152]; }; +static_assert(offsetof(struct dfltcc_param_v0, csb) == 384); static_assert(sizeof(struct dfltcc_param_v0) == 1536); #define CVT_CRC32 0 diff --git a/lib/zlib_dfltcc/dfltcc_util.h b/lib/zlib_dfltcc/dfltcc_util.h index 4a46b5009f0d..ce2e039a55b5 100644 --- a/lib/zlib_dfltcc/dfltcc_util.h +++ b/lib/zlib_dfltcc/dfltcc_util.h @@ -2,6 +2,7 @@ #ifndef DFLTCC_UTIL_H #define DFLTCC_UTIL_H +#include "dfltcc.h" #include /* @@ -20,6 +21,7 @@ typedef enum { #define DFLTCC_CMPR 2 #define DFLTCC_XPND 4 #define HBT_CIRCULAR (1 << 7) +#define DFLTCC_FN_MASK ((1 << 7) - 1) #define HB_BITS 15 #define HB_SIZE (1 << HB_BITS) @@ -34,6 +36,7 @@ static inline dfltcc_cc dfltcc( ) { Byte *t2 = op1 ? *op1 : NULL; +unsigned char *orig_t2 = t2; size_t t3 = len1 ? *len1 : 0; const Byte *t4 = op2 ? *op2 : NULL; size_t t5 = len2 ? *len2 : 0; @@ -59,6 +62,26 @@ static inline dfltcc_cc dfltcc( : "cc", "memory"); t2 = r2; t3 = r3; t4 = r4; t5 = r5; +switch (fn & DFLTCC_FN_MASK) { +case DFLTCC_QAF: +kmsan_unpoison_memory(param, sizeof(struct dfltcc_qaf_param)); +break; +case DFLTCC_GDHT: +kmsan_unpoison_memory(param, offsetof(struct dfltcc_param_v0, csb)); +break; +case DFLTCC_CMPR: +kmsan_unpoison_memory(param, sizeof(struct dfltcc_param_v0)); +kmsan_unpoison_memory( +orig_t2, +t2 - orig_t2 + +(((struct dfltcc_param_v0 *)param)->sbb == 0 ? 0 : 1)); +break; +case DFLTCC_XPND: +kmsan_unpoison_memory(param, sizeof(struct dfltcc_param_v0)); +kmsan_unpoison_memory(orig_t2, t2 - orig_t2); +break; +} + if (op1) *op1 = t2; if (len1) -- 2.41.0
[PATCH v2 21/33] s390: Turn off KMSAN for boot, vdso and purgatory
All other sanitizers are disabled for these components as well. While at it, add a comment to boot and purgatory. Reviewed-by: Alexander Gordeev Reviewed-by: Alexander Potapenko Signed-off-by: Ilya Leoshkevich --- arch/s390/boot/Makefile | 2 ++ arch/s390/kernel/vdso32/Makefile | 3 ++- arch/s390/kernel/vdso64/Makefile | 3 ++- arch/s390/purgatory/Makefile | 2 ++ 4 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile index c7c81e5f9218..fb10fcd21221 100644 --- a/arch/s390/boot/Makefile +++ b/arch/s390/boot/Makefile @@ -3,11 +3,13 @@ # Makefile for the linux s390-specific parts of the memory manager. # +# Tooling runtimes are unavailable and cannot be linked for early boot code KCOV_INSTRUMENT := n GCOV_PROFILE := n UBSAN_SANITIZE := n KASAN_SANITIZE := n KCSAN_SANITIZE := n +KMSAN_SANITIZE := n KBUILD_AFLAGS := $(KBUILD_AFLAGS_DECOMPRESSOR) KBUILD_CFLAGS := $(KBUILD_CFLAGS_DECOMPRESSOR) diff --git a/arch/s390/kernel/vdso32/Makefile b/arch/s390/kernel/vdso32/Makefile index caec7db6f966..7cbec6b0b11f 100644 --- a/arch/s390/kernel/vdso32/Makefile +++ b/arch/s390/kernel/vdso32/Makefile @@ -32,11 +32,12 @@ obj-y += vdso32_wrapper.o targets += vdso32.lds CPPFLAGS_vdso32.lds += -P -C -U$(ARCH) -# Disable gcov profiling, ubsan and kasan for VDSO code +# Disable gcov profiling, ubsan, kasan and kmsan for VDSO code GCOV_PROFILE := n UBSAN_SANITIZE := n KASAN_SANITIZE := n KCSAN_SANITIZE := n +KMSAN_SANITIZE := n # Force dependency (incbin is bad) $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile index e3c9085f8fa7..6f3252712f64 100644 --- a/arch/s390/kernel/vdso64/Makefile +++ b/arch/s390/kernel/vdso64/Makefile @@ -36,11 +36,12 @@ obj-y += vdso64_wrapper.o targets += vdso64.lds CPPFLAGS_vdso64.lds += -P -C -U$(ARCH) -# Disable gcov profiling, ubsan and kasan for VDSO code +# Disable gcov profiling, ubsan, kasan and kmsan for VDSO code GCOV_PROFILE := n UBSAN_SANITIZE := n KASAN_SANITIZE := n KCSAN_SANITIZE := n +KMSAN_SANITIZE := n # Force dependency (incbin is bad) $(obj)/vdso64_wrapper.o : $(obj)/vdso64.so diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile index 4e930f566878..4e421914e50f 100644 --- a/arch/s390/purgatory/Makefile +++ b/arch/s390/purgatory/Makefile @@ -15,11 +15,13 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE $(call if_changed_rule,as_o_S) +# Tooling runtimes are unavailable and cannot be linked for purgatory code KCOV_INSTRUMENT := n GCOV_PROFILE := n UBSAN_SANITIZE := n KASAN_SANITIZE := n KCSAN_SANITIZE := n +KMSAN_SANITIZE := n KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes KBUILD_CFLAGS += -Wno-pointer-sign -Wno-sign-compare -- 2.41.0
[PATCH v2 27/33] s390/mm: Define KMSAN metadata for vmalloc and modules
The pages for the KMSAN metadata associated with most kernel mappings are taken from memblock by the common code. However, vmalloc and module metadata needs to be defined by the architectures. Be a little bit more careful than x86: allocate exactly MODULES_LEN for the module shadow and origins, and then take 2/3 of vmalloc for the vmalloc shadow and origins. This ensures that users passing small vmalloc= values on the command line do not cause module metadata collisions. Signed-off-by: Ilya Leoshkevich --- arch/s390/boot/startup.c| 8 arch/s390/include/asm/pgtable.h | 10 ++ 2 files changed, 18 insertions(+) diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c index 8104e0e3d188..e37e7ffda430 100644 --- a/arch/s390/boot/startup.c +++ b/arch/s390/boot/startup.c @@ -253,9 +253,17 @@ static unsigned long setup_kernel_memory_layout(void) MODULES_END = round_down(__abs_lowcore, _SEGMENT_SIZE); MODULES_VADDR = MODULES_END - MODULES_LEN; VMALLOC_END = MODULES_VADDR; +#ifdef CONFIG_KMSAN + VMALLOC_END -= MODULES_LEN * 2; +#endif /* allow vmalloc area to occupy up to about 1/2 of the rest virtual space left */ vmalloc_size = min(vmalloc_size, round_down(VMALLOC_END / 2, _REGION3_SIZE)); +#ifdef CONFIG_KMSAN + /* take 2/3 of vmalloc area for KMSAN shadow and origins */ + vmalloc_size = round_down(vmalloc_size / 3, _REGION3_SIZE); + VMALLOC_END -= vmalloc_size * 2; +#endif VMALLOC_START = VMALLOC_END - vmalloc_size; /* split remaining virtual space between 1:1 mapping & vmemmap array */ diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 601e87fa8a9a..d764abeb9e6d 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -107,6 +107,16 @@ static inline int is_module_addr(void *addr) return 1; } +#ifdef CONFIG_KMSAN +#define KMSAN_VMALLOC_SIZE (VMALLOC_END - VMALLOC_START) +#define KMSAN_VMALLOC_SHADOW_START VMALLOC_END +#define KMSAN_VMALLOC_ORIGIN_START (KMSAN_VMALLOC_SHADOW_START + \ + KMSAN_VMALLOC_SIZE) +#define KMSAN_MODULES_SHADOW_START (KMSAN_VMALLOC_ORIGIN_START + \ + KMSAN_VMALLOC_SIZE) +#define KMSAN_MODULES_ORIGIN_START (KMSAN_MODULES_SHADOW_START + MODULES_LEN) +#endif + /* * A 64 bit pagetable entry of S390 has following format: * |PFRA |0IPC| OS | -- 2.41.0
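Putting the two hunks together, the carve-up of the area below __abs_lowcore can be sketched roughly as follows (an illustrative reading of the code above, lowest address first; each metadata region sits directly above the one before it):

/*
 * VMALLOC_START .. VMALLOC_END    vmalloc area     (vmalloc_size)
 * KMSAN_VMALLOC_SHADOW_START      vmalloc shadow   (vmalloc_size)
 * KMSAN_VMALLOC_ORIGIN_START      vmalloc origins  (vmalloc_size)
 * KMSAN_MODULES_SHADOW_START      module shadow    (MODULES_LEN)
 * KMSAN_MODULES_ORIGIN_START      module origins   (MODULES_LEN)
 * MODULES_VADDR .. MODULES_END    modules          (MODULES_LEN)
 */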
[syzbot] [bpf?] [trace?] WARNING in format_decode (3)
Hello, syzbot found the following issue on: HEAD commit:76df934c6d5f MAINTAINERS: Add netdev subsystem profile link git tree: net console+strace: https://syzkaller.appspot.com/x/log.txt?x=10c2b66768 kernel config: https://syzkaller.appspot.com/x/.config?x=84217b7fc4acdc59 dashboard link: https://syzkaller.appspot.com/bug?extid=e2c932aec5c8a6e1d31c compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12b2f668e8 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=171ea200e8 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/e271179068c6/disk-76df934c.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/b9523b3749bb/vmlinux-76df934c.xz kernel image: https://storage.googleapis.com/syzbot-assets/6c1a888bade0/bzImage-76df934c.xz The issue was bisected to: commit 114039b342014680911c35bd6b72624180fd669a Author: Stanislav Fomichev Date: Mon Nov 21 18:03:39 2022 + bpf: Move skb->len == 0 checks into __bpf_redirect bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13d237b768 final oops: https://syzkaller.appspot.com/x/report.txt?x=103237b768 console output: https://syzkaller.appspot.com/x/log.txt?x=17d237b768 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+e2c932aec5c8a6e1d...@syzkaller.appspotmail.com Fixes: 114039b34201 ("bpf: Move skb->len == 0 checks into __bpf_redirect") [ cut here ] Please remove unsupported % in format string WARNING: CPU: 0 PID: 5068 at lib/vsprintf.c:2675 format_decode+0xa03/0xba0 lib/vsprintf.c:2675 Modules linked in: CPU: 0 PID: 5068 Comm: syz-executor288 Not tainted 6.7.0-rc1-syzkaller-00134-g76df934c6d5f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:format_decode+0xa03/0xba0 lib/vsprintf.c:2675 Code: f7 41 c6 44 24 05 08 e9 c4 fa ff ff e8 c6 f7 15 f7 c6 05 0b bd 91 04 01 90 48 c7 c7 60 5f 19 8c 40 0f b6 f5 e8 2e 17 dc f6 90 <0f> 0b 90 90 e9 17 fc ff ff 48 8b 3c 24 e8 4b 87 6c f7 e9 13 f7 ff RSP: 0018:c90003b6f798 EFLAGS: 00010286 RAX: RBX: c90003b6fa0c RCX: 814db209 RDX: 8880214b9dc0 RSI: 814db216 RDI: 0001 RBP: R08: 0001 R09: R10: R11: 0001 R12: c90003b6f898 R13: R14: R15: ffd0 FS: 5567c380() GS:8880b980() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00f6f398 CR3: 251e7000 CR4: 003506f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: bstr_printf+0x13b/0x1050 lib/vsprintf.c:3248 bpf_trace_printk kernel/trace/bpf_trace.c:386 [inline] bpf_trace_printk+0x10b/0x180 kernel/trace/bpf_trace.c:371 bpf_prog_12183cdb1cd51dab+0x36/0x3a bpf_dispatcher_nop_func include/linux/bpf.h:1196 [inline] __bpf_prog_run include/linux/filter.h:651 [inline] bpf_prog_run include/linux/filter.h:658 [inline] bpf_test_run+0x3e1/0x9e0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0xb75/0x1dd0 net/bpf/test_run.c:1045 bpf_prog_test_run kernel/bpf/syscall.c:4040 [inline] __sys_bpf+0x11bf/0x4920 kernel/bpf/syscall.c:5401 __do_sys_bpf kernel/bpf/syscall.c:5487 [inline] __se_sys_bpf kernel/bpf/syscall.c:5485 [inline] __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5485 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7fefcec014e9 Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:7ffca6179888 EFLAGS: 0246 
ORIG_RAX: 0141 RAX: ffda RBX: 7ffca6179a58 RCX: 7fefcec014e9 RDX: 0028 RSI: 2080 RDI: 000a RBP: 7fefcec74610 R08: R09: 7ffca6179a58 R10: R11: 0246 R12: 0001 R13: 7ffca6179a48 R14: 0001 R15: 0001 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach
[PATCH 05/14] tools headers UAPI: Update tools's copy of vhost.h header
tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf is not restricted to just including them to compile something. They are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So it's important not to touch the copies in tools/ when doing changes in the original kernel headers; that will be done later, when check-headers.sh informs the perf tools hackers about the change. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: k...@vger.kernel.org Cc: virtualizat...@lists.linux.dev Cc: net...@vger.kernel.org Signed-off-by: Namhyung Kim --- tools/include/uapi/linux/vhost.h | 8 1 file changed, 8 insertions(+) diff --git a/tools/include/uapi/linux/vhost.h b/tools/include/uapi/linux/vhost.h index f5c48b61ab62..649560c685f1 100644 --- a/tools/include/uapi/linux/vhost.h +++ b/tools/include/uapi/linux/vhost.h @@ -219,4 +219,12 @@ */ #define VHOST_VDPA_RESUME _IO(VHOST_VIRTIO, 0x7E) +/* Get the group for the descriptor table including driver & device areas + * of a virtqueue: read index, write group in num. + * The virtqueue index is stored in the index field of vhost_vring_state. + * The group ID of the descriptor table for this specific virtqueue + * is returned via num field of vhost_vring_state. + */ +#define VHOST_VDPA_GET_VRING_DESC_GROUP_IOWR(VHOST_VIRTIO, 0x7F, \ + struct vhost_vring_state) #endif -- 2.43.0.rc1.413.gea7ed67945-goog
Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support
Hi, On Fri, Nov 17, 2023 at 09:42:37AM -0800, Tanmay Shah wrote: > Use TCM pm domains extracted from device-tree > to power on/off TCM using general pm domain framework. > > Signed-off-by: Tanmay Shah > --- > > Changes in v7: > - %s/pm_dev1/pm_dev_core0/r > - %s/pm_dev_link1/pm_dev_core0_link/r > - %s/pm_dev2/pm_dev_core1/r > - %s/pm_dev_link2/pm_dev_core1_link/r > - remove pm_domain_id check to move next patch > - add comment about how 1st entry in pm domain list is used > - fix loop when jump to fail_add_pm_domains loop > > drivers/remoteproc/xlnx_r5_remoteproc.c | 215 +++- > 1 file changed, 212 insertions(+), 3 deletions(-) > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > b/drivers/remoteproc/xlnx_r5_remoteproc.c > index 4395edea9a64..22bccc5075a0 100644 > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > > #include "remoteproc_internal.h" > > @@ -102,6 +103,12 @@ static const struct mem_bank_data > zynqmp_tcm_banks_lockstep[] = { > * @rproc: rproc handle > * @pm_domain_id: RPU CPU power domain id > * @ipi: pointer to mailbox information > + * @num_pm_dev: number of tcm pm domain devices for this core > + * @pm_dev_core0: pm domain virtual devices for power domain framework > + * @pm_dev_core0_link: pm domain device links after registration > + * @pm_dev_core1: used only in lockstep mode. second core's pm domain > virtual devices > + * @pm_dev_core1_link: used only in lockstep mode. second core's pm device > links after > + * registration > */ > struct zynqmp_r5_core { > struct device *dev; > @@ -111,6 +118,11 @@ struct zynqmp_r5_core { > struct rproc *rproc; > u32 pm_domain_id; > struct mbox_info *ipi; > + int num_pm_dev; > + struct device **pm_dev_core0; > + struct device_link **pm_dev_core0_link; > + struct device **pm_dev_core1; > + struct device_link **pm_dev_core1_link; > }; > > /** > @@ -651,7 +663,8 @@ static int add_tcm_carveout_lockstep_mode(struct rproc > *rproc) >ZYNQMP_PM_CAPABILITY_ACCESS, 0, >ZYNQMP_PM_REQUEST_ACK_BLOCKING); > if (ret < 0) { > - dev_err(dev, "failed to turn on TCM 0x%x", > pm_domain_id); > + dev_err(dev, "failed to turn on TCM 0x%x", > + pm_domain_id); Spurious change, you should have caught that. 
> goto release_tcm_lockstep; > } > > @@ -758,6 +771,189 @@ static int zynqmp_r5_parse_fw(struct rproc *rproc, > const struct firmware *fw) > return ret; > } > > +static void zynqmp_r5_remove_pm_domains(struct rproc *rproc) > +{ > + struct zynqmp_r5_core *r5_core = rproc->priv; > + struct device *dev = r5_core->dev; > + struct zynqmp_r5_cluster *cluster; > + int i; > + > + cluster = platform_get_drvdata(to_platform_device(dev->parent)); > + > + for (i = 1; i < r5_core->num_pm_dev; i++) { > + device_link_del(r5_core->pm_dev_core0_link[i]); > + dev_pm_domain_detach(r5_core->pm_dev_core0[i], false); > + } > + > + kfree(r5_core->pm_dev_core0); > + r5_core->pm_dev_core0 = NULL; > + kfree(r5_core->pm_dev_core0_link); > + r5_core->pm_dev_core0_link = NULL; > + > + if (cluster->mode == SPLIT_MODE) { > + r5_core->num_pm_dev = 0; > + return; > + } > + > + for (i = 1; i < r5_core->num_pm_dev; i++) { > + device_link_del(r5_core->pm_dev_core1_link[i]); > + dev_pm_domain_detach(r5_core->pm_dev_core1[i], false); > + } > + > + kfree(r5_core->pm_dev_core1); > + r5_core->pm_dev_core1 = NULL; > + kfree(r5_core->pm_dev_core1_link); > + r5_core->pm_dev_core1_link = NULL; > + r5_core->num_pm_dev = 0; > +} > + > +static int zynqmp_r5_add_pm_domains(struct rproc *rproc) > +{ > + struct zynqmp_r5_core *r5_core = rproc->priv; > + struct device *dev = r5_core->dev, *dev2; > + struct zynqmp_r5_cluster *cluster; > + struct platform_device *pdev; > + struct device_node *np; > + int i, j, num_pm_dev, ret; > + > + cluster = dev_get_drvdata(dev->parent); > + > + /* get number of power-domains */ > + num_pm_dev = of_count_phandle_with_args(r5_core->np, "power-domains", > + "#power-domain-cells"); > + > + if (num_pm_dev <= 0) > + return -EINVAL; > + > + r5_core->pm_dev_core0 = kcalloc(num_pm_dev, > + sizeof(struct device *), > + GFP_KERNEL); > + if (!r5_core->pm_dev_core0) > + ret = -ENOMEM; > + > + r5_core->pm_dev_core0_link = kcalloc(num_pm_dev, > + sizeof(struct device_link *), > +
[PATCH 1/4] eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
From: "Steven Rostedt (Google)" If memory reclaim happens, it can reclaim file system pages. The file system pages from eventfs may take the eventfs_mutex on reclaim. This means that allocation while holding the eventfs_mutex must not call into filesystem reclaim. A lockdep splat uncovered this. Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode") Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Mark Rutland Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 3eb6c622a74d..56d192f0ead8 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -95,7 +95,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, if (!(dentry->d_inode->i_mode & S_IFDIR)) { if (!ei->entry_attrs) { ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries, - GFP_KERNEL); + GFP_NOFS); if (!ei->entry_attrs) { ret = -ENOMEM; goto out; @@ -627,7 +627,7 @@ static int add_dentries(struct dentry ***dentries, struct dentry *d, int cnt) { struct dentry **tmp; - tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_KERNEL); + tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_NOFS); if (!tmp) return -1; tmp[cnt] = d; -- 2.42.0
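For comparison, an alternative sometimes used for the same deadlock shape is to scope the constraint with memalloc_nofs_save()/memalloc_nofs_restore() (declared in linux/sched/mm.h) instead of tagging each call site; a hedged sketch, not what this patch does:

static void *example_alloc_under_eventfs_mutex(size_t size)
{
	unsigned int nofs_flags;
	void *p;

	/* caller holds eventfs_mutex; GFP_KERNEL behaves as GFP_NOFS in scope */
	nofs_flags = memalloc_nofs_save();
	p = kzalloc(size, GFP_KERNEL);
	memalloc_nofs_restore(nofs_flags);

	return p;
}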
[PATCH 3/4] eventfs: Do not allow NULL parent to eventfs_start_creating()
From: "Steven Rostedt (Google)" The eventfs directory is dynamically created via the metadata supplied by the existing trace events. All files and directories in eventfs have a parent. Do not allow NULL to be passed into eventfs_start_creating() as the parent, because that should never happen. Warn if it does. Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/inode.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 5b54948514fe..ae648deed019 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -509,20 +509,15 @@ struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) struct dentry *dentry; int error; + /* Must always have a parent. */ + if (WARN_ON_ONCE(!parent)) + return ERR_PTR(-EINVAL); + error = simple_pin_fs(&trace_fs_type, &tracefs_mount, &tracefs_mount_count); if (error) return ERR_PTR(error); - /* -* If the parent is not specified, we create it in the root. -* We need the root dentry to do this, which is in the super -* block. A pointer to that is in the struct vfsmount that we -* have around. -*/ - if (!parent) - parent = tracefs_mount->mnt_root; - if (unlikely(IS_DEADDIR(parent->d_inode))) dentry = ERR_PTR(-ENOENT); else -- 2.42.0
[PATCH 0/4] eventfs: Some more minor fixes
Mark Rutland reported some crashes from the latest eventfs updates. This fixes most of them. He still has one splat that he can trigger but I can not. Still looking into that. Steven Rostedt (Google) (4): eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held eventfs: Move taking of inode_lock into dcache_dir_open_wrapper() eventfs: Do not allow NULL parent to eventfs_start_creating() eventfs: Make sure that parent->d_inode is locked in creating files/dirs fs/tracefs/event_inode.c | 24 fs/tracefs/inode.c | 13 - 2 files changed, 12 insertions(+), 25 deletions(-)
[PATCH 2/4] eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
From: "Steven Rostedt (Google)" Both create_file_dentry() and create_dir_dentry() take a boolean parameter "lookup", as on lookup the inode_lock should already be taken, but for dcache_dir_open_wrapper() it is not taken. There's no reason that dcache_dir_open_wrapper() can't take the inode_lock before calling these functions. In fact, it's better if it does, as the lock can be held throughout both directory and file creations. This also simplifies the code, and possibly prevents unexpected race conditions when the lock is released. Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 16 ++-- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 56d192f0ead8..590e8176449b 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -347,15 +347,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx, mutex_unlock(&eventfs_mutex); - /* The lookup already has the parent->d_inode locked */ - if (!lookup) - inode_lock(parent->d_inode); - dentry = create_file(name, mode, attr, parent, data, fops); - if (!lookup) - inode_unlock(parent->d_inode); - mutex_lock(&eventfs_mutex); if (IS_ERR_OR_NULL(dentry)) { @@ -453,15 +446,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, } mutex_unlock(&eventfs_mutex); - /* The lookup already has the parent->d_inode locked */ - if (!lookup) - inode_lock(parent->d_inode); - dentry = create_dir(ei, parent); - if (!lookup) - inode_unlock(parent->d_inode); - mutex_lock(&eventfs_mutex); if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) { @@ -693,6 +679,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) return -ENOMEM; } + inode_lock(parent->d_inode); list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) { d = create_dir_dentry(ei, ei_child, parent, false); @@ -725,6 +712,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) cnt++; } } + inode_unlock(parent->d_inode); srcu_read_unlock(&eventfs_srcu, idx); ret = dcache_dir_open(inode, file); -- 2.42.0
[PATCH 4/4] eventfs: Make sure that parent->d_inode is locked in creating files/dirs
From: "Steven Rostedt (Google)" Since the locking of the parent->d_inode has been moved outside the creation of the files and directories (as it used to be locked via a conditional), add a WARN_ON_ONCE() for the case where it is not locked. Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 590e8176449b..0b90869fd805 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -327,6 +327,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx, struct dentry **e_dentry = &ei->d_children[idx]; struct dentry *dentry; + WARN_ON_ONCE(!inode_is_locked(parent->d_inode)); + mutex_lock(&eventfs_mutex); if (ei->is_freed) { mutex_unlock(&eventfs_mutex); @@ -430,6 +432,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, { struct dentry *dentry = NULL; + WARN_ON_ONCE(!inode_is_locked(parent->d_inode)); + mutex_lock(&eventfs_mutex); if (pei->is_freed || ei->is_freed) { mutex_unlock(&eventfs_mutex); -- 2.42.0
Re: [PATCH] MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
On Wed, 15 Nov 2023 10:50:18 -0500 Mathieu Desnoyers wrote: > In order to make sure I get CC'd on tracing changes for which my input > would be relevant, add my name as reviewer of the TRACING subsystem. > Yes, you should be a reviewer for tracing subsystem :) Acked-by: Masami Hiramatsu (Google) Thanks! > Signed-off-by: Mathieu Desnoyers > Cc: Steven Rostedt > Cc: Masami Hiramatsu > Cc: linux-trace-ker...@vger.kernel.org > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4cc6bf79fdd8..a7c2092d0063 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -21622,6 +21622,7 @@ F:drivers/hwmon/pmbus/tps546d24.c > TRACING > M: Steven Rostedt > M: Masami Hiramatsu > +R: Mathieu Desnoyers > L: linux-kernel@vger.kernel.org > L: linux-trace-ker...@vger.kernel.org > S: Maintained > -- > 2.39.2 > -- Masami Hiramatsu (Google)
[PATCH net] bpf: test_run: fix WARNING in format_decode
Confirm that skb->len is not 0 to ensure that skb length is valid. Fixes: 114039b34201 ("bpf: Move skb->len == 0 checks into __bpf_redirect") Reported-by: syzbot+e2c932aec5c8a6e1d...@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis --- net/bpf/test_run.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index c9fdcc5cdce1..78258a822a5c 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -845,6 +845,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb) { struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; + if (!skb->len) + return -EINVAL; + if (!__skb) return 0; -- 2.26.1
[ANNOUNCE] 5.10.201-rt98
Hello RT-list! I'm pleased to announce the 5.10.201-rt98 stable release. This release is just an update to the new stable 5.10.201 version and no RT changes have been made. You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git branch: v5.10-rt Head SHA1: 3a93f0a0d49dd0db4c6876ca9a7369350e64320e Or to build 5.10.201-rt98 directly, the following patches should be applied: https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.201.xz https://www.kernel.org/pub/linux/kernel/projects/rt/5.10/older/patch-5.10.201-rt98.patch.xz Signing key fingerprint: 9354 0649 9972 8D31 D464 D140 F394 A423 F8E6 7C26 All keys used for the above files and repositories can be found on the following git repository: git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git Enjoy! Luis
[ANNOUNCE] 4.14.330-rt157
Hello RT-list! I'm pleased to announce the 4.14.330-rt157 stable release. This release is just an update to the new stable 4.14.330 version and no RT changes have been made. You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git branch: v4.14-rt Head SHA1: 7c25b98670e2f172f2813e18a39c02bf13b9849e Or to build 4.14.330-rt157 directly, the following patches should be applied: https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.14.tar.xz https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.14.330.xz https://www.kernel.org/pub/linux/kernel/projects/rt/4.14/older/patch-4.14.330-rt157.patch.xz Signing key fingerprint: 9354 0649 9972 8D31 D464 D140 F394 A423 F8E6 7C26 All keys used for the above files and repositories can be found on the following git repository: git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git Enjoy! Luis
Re: [PATCH RFC v2 20/27] mm: hugepage: Handle huge page fault on access
On Sun, Nov 19, 2023 at 8:59 AM Alexandru Elisei wrote: > > Handle PAGE_FAULT_ON_ACCESS faults for huge pages in a similar way to > regular pages. > > Signed-off-by: Alexandru Elisei > --- > arch/arm64/include/asm/mte_tag_storage.h | 1 + > arch/arm64/include/asm/pgtable.h | 7 ++ > arch/arm64/mm/fault.c| 81 > include/linux/huge_mm.h | 2 + > include/linux/pgtable.h | 5 ++ > mm/huge_memory.c | 4 +- > mm/memory.c | 3 + > 7 files changed, 101 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/mte_tag_storage.h > b/arch/arm64/include/asm/mte_tag_storage.h > index c70ced60a0cd..b97406d369ce 100644 > --- a/arch/arm64/include/asm/mte_tag_storage.h > +++ b/arch/arm64/include/asm/mte_tag_storage.h > @@ -35,6 +35,7 @@ void free_tag_storage(struct page *page, int order); > bool page_tag_storage_reserved(struct page *page); > > vm_fault_t handle_page_missing_tag_storage(struct vm_fault *vmf); > +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf); > #else > static inline bool tag_storage_enabled(void) > { > diff --git a/arch/arm64/include/asm/pgtable.h > b/arch/arm64/include/asm/pgtable.h > index 8cc135f1c112..1704411c096d 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -477,6 +477,13 @@ static inline vm_fault_t > arch_do_page_fault_on_access(struct vm_fault *vmf) > return handle_page_missing_tag_storage(vmf); > return VM_FAULT_SIGBUS; > } > + > +static inline vm_fault_t arch_do_huge_page_fault_on_access(struct vm_fault > *vmf) > +{ > + if (tag_storage_enabled()) > + return handle_huge_page_missing_tag_storage(vmf); > + return VM_FAULT_SIGBUS; > +} > #endif /* CONFIG_ARCH_HAS_FAULT_ON_ACCESS */ > > #define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID)) > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index f5fa583acf18..6730a0812a24 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -1041,6 +1041,87 @@ vm_fault_t handle_page_missing_tag_storage(struct > vm_fault *vmf) > > return 0; > > +out_retry: > + put_page(page); > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) > + vma_end_read(vma); > + if (fault_flag_allow_retry_first(vmf->flags)) { > + err = VM_FAULT_RETRY; > + } else { > + /* Replay the fault. */ > + err = 0; > + } > + return err; > +} > + > +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf) > +{ > + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; > + struct vm_area_struct *vma = vmf->vma; > + pmd_t old_pmd, new_pmd; > + bool writable = false; > + struct page *page; > + vm_fault_t err; > + int ret; > + > + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); > + if (unlikely(!pmd_same(vmf->orig_pmd, *vmf->pmd))) { > + spin_unlock(vmf->ptl); > + return 0; > + } > + > + old_pmd = vmf->orig_pmd; > + new_pmd = pmd_modify(old_pmd, vma->vm_page_prot); > + > + /* > +* Detect now whether the PMD could be writable; this information > +* is only valid while holding the PT lock. 
> +*/ > + writable = pmd_write(new_pmd); > + if (!writable && vma_wants_manual_pte_write_upgrade(vma) && > + can_change_pmd_writable(vma, vmf->address, new_pmd)) > + writable = true; > + > + page = vm_normal_page_pmd(vma, haddr, new_pmd); > + if (!page) > + goto out_map; > + > + if (!(vma->vm_flags & VM_MTE)) > + goto out_map; > + > + get_page(page); > + vma_set_access_pid_bit(vma); > + > + spin_unlock(vmf->ptl); > + writable = false; > + > + if (unlikely(is_migrate_isolate_page(page))) > + goto out_retry; > + > + ret = reserve_tag_storage(page, HPAGE_PMD_ORDER, > GFP_HIGHUSER_MOVABLE); > + if (ret) > + goto out_retry; > + > + put_page(page); > + > + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); > + if (unlikely(!pmd_same(old_pmd, *vmf->pmd))) { > + spin_unlock(vmf->ptl); > + return 0; > + } > + > +out_map: > + /* Restore the PMD */ > + new_pmd = pmd_modify(old_pmd, vma->vm_page_prot); > + new_pmd = pmd_mkyoung(new_pmd); > + if (writable) > + new_pmd = pmd_mkwrite(new_pmd, vma); > + set_pmd_at(vma->vm_mm, haddr, vmf->pmd, new_pmd); > + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); > + spin_unlock(vmf->ptl); > + > + return 0; > + > out_retry: > put_page(page); > if (vmf->flags & FAULT_FLAG_VMA_LOCK) > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index fa0350b0812
Re: [PATCH net] bpf: test_run: fix WARNING in format_decode
On 11/21/23 7:50 PM, Edward Adam Davis wrote: Confirm that skb->len is not 0 to ensure that skb length is valid. Fixes: 114039b34201 ("bpf: Move skb->len == 0 checks into __bpf_redirect") Reported-by: syzbot+e2c932aec5c8a6e1d...@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis Stan, Could you take a look at this patch? --- net/bpf/test_run.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index c9fdcc5cdce1..78258a822a5c 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -845,6 +845,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb) { struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; + if (!skb->len) + return -EINVAL; + if (!__skb) return 0;