Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc()

2023-11-21 Thread Alexandru Elisei
Hi Peter,

On Mon, Nov 20, 2023 at 08:49:32PM -0800, Peter Collingbourne wrote:
> Hi Alexandru,
> 
> On Wed, Aug 23, 2023 at 6:16 AM Alexandru Elisei
>  wrote:
> >
> > If the source page being migrated has metadata associated with it, make
> > sure to reserve the metadata storage when choosing a suitable destination
> > page from the free list.
> >
> > Signed-off-by: Alexandru Elisei 
> > ---
> >  mm/compaction.c | 9 +
> >  mm/internal.h   | 1 +
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index cc0139fa0cb0..af2ee3085623 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -570,6 +570,7 @@ static unsigned long isolate_freepages_block(struct 
> > compact_control *cc,
> > bool locked = false;
> > unsigned long blockpfn = *start_pfn;
> > unsigned int order;
> > +   int ret;
> >
> > /* Strict mode is for isolation, speed is secondary */
> > if (strict)
> > @@ -626,6 +627,11 @@ static unsigned long isolate_freepages_block(struct 
> > compact_control *cc,
> >
> > /* Found a free page, will break it into order-0 pages */
> > order = buddy_order(page);
> > +   if (metadata_storage_enabled() && cc->reserve_metadata) {
> > +   ret = reserve_metadata_storage(page, order, 
> > cc->gfp_mask);
> 
> At this point the zone lock is held and preemption is disabled, which
> makes it invalid to call reserve_metadata_storage.

You are correct, I missed that. I dropped reserving tag storage during
compaction in the next iteration, so fortunately I unintentionally fixed
it.

Thanks,
Alex

> 
> Peter
> 
> > +   if (ret)
> > +   goto isolate_fail;
> > +   }
> > isolated = __isolate_free_page(page, order);
> > if (!isolated)
> > break;
> > @@ -1757,6 +1763,9 @@ static struct folio *compaction_alloc(struct folio 
> > *src, unsigned long data)
> > struct compact_control *cc = (struct compact_control *)data;
> > struct folio *dst;
> >
> > +   if (metadata_storage_enabled())
> > +   cc->reserve_metadata = folio_has_metadata(src);
> > +
> > if (list_empty(&cc->freepages)) {
> > isolate_freepages(cc);
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index d28ac0085f61..046cc264bfbe 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -492,6 +492,7 @@ struct compact_control {
> >  */
> > bool alloc_contig;  /* alloc_contig_range allocation */
> > bool source_has_metadata;   /* source pages have associated 
> > metadata */
> > +   bool reserve_metadata;
> >  };
> >
> >  /*
> > --
> > 2.41.0
> >



Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2023-11-21 Thread Tobias Huschle
On Fri, Nov 17, 2023 at 09:07:55PM +0800, Abel Wu wrote:
> On 11/17/23 8:37 PM, Peter Zijlstra Wrote:

[...]

> > Ah, so if this is a cgroup issue, it might be worth trying this patch
> > that we have in tip/sched/urgent.
> 
> And please also apply this fix:
> https://lore.kernel.org/all/20231117080106.12890-1-s921975...@gmail.com/
> 

We applied both suggested patch options and ran the test again: once with

sched/eevdf: Fix vruntime adjustment on reweight
sched/fair: Update min_vruntime for reweight_entity() correctly

and once with

sched/eevdf: Delay dequeue

Unfortunately, both variants do NOT fix the problem.
The regression remains unchanged.


I will continue familiarizing myself with how cgroups are scheduled in order to
dig deeper here. If there are any other ideas, I'd be happy to use them as a
starting point for further analysis.

Would additional traces still be of interest? If so, I would be glad to
provide them.

[...]



[PATCH v3 0/5] arch,locking/atomic: add arch_cmpxchg[64]_local

2023-11-21 Thread wuqiang.matt
The arc, openrisc and hexagon architectures do not have arch_cmpxchg_local()
defined, so the use of try_cmpxchg_local() in lib/objpool.c fails to build,
as reported by the kernel test robot.

Patch 1 improves the data size checking logic for arc; patches 2/3/4
implement arch_cmpxchg[64]_local for arc/openrisc/hexagon. Patch 5
defines arch_cmpxchg_local for xtensa as the existing __cmpxchg_local()
rather than the generic variant.
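
For reference, a minimal sketch (not part of this series; claim_slot() and its
semantics are made up for illustration, only try_cmpxchg_local() itself is the
real API) of the usage pattern that exposes the gap: try_cmpxchg_local() in
<linux/atomic.h> ultimately expands to arch_cmpxchg_local(), so an architecture
that does not define it fails to build as soon as such a call is compiled:

  #include <linux/atomic.h>

  /* Hypothetical caller, for illustration only. */
  static bool claim_slot(unsigned long *slot)
  {
          unsigned long old = 0;

          /* Expands to arch_cmpxchg_local(); 'old' is updated on failure. */
          return try_cmpxchg_local(slot, &old, 1UL);
  }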

wuqiang.matt (5):
  arch,locking/atomic: arc: arch_cmpxchg should check data size
  arch,locking/atomic: arc: add arch_cmpxchg[64]_local
  arch,locking/atomic: openrisc: add arch_cmpxchg[64]_local
  arch,locking/atomic: hexagon: add arch_cmpxchg[64]_local
  arch,locking/atomic: xtensa: define arch_cmpxchg_local as
__cmpxchg_local

 arch/arc/include/asm/cmpxchg.h  | 40 ++
 arch/hexagon/include/asm/cmpxchg.h  | 51 -
 arch/openrisc/include/asm/cmpxchg.h |  6 
 arch/xtensa/include/asm/cmpxchg.h   |  2 +-
 4 files changed, 91 insertions(+), 8 deletions(-)

-- 
2.40.1




[PATCH v3 1/5] arch,locking/atomic: arc: arch_cmpxchg should check data size

2023-11-21 Thread wuqiang.matt
arch_cmpxchg() should check the data size rather than the pointer size when
CONFIG_ARC_HAS_LLSC is defined. Rename __cmpxchg to __cmpxchg_32 to
emphasize its explicit support for a 32-bit data size, with a BUILD_BUG_ON()
added to avoid any possible misuse with unsupported data types.

When CONFIG_ARC_HAS_LLSC is undefined, arch_cmpxchg() uses a spinlock to
achieve SMP safety, so the BUILD_BUG_ON() check is unnecessary.
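
To illustrate the difference (a worked example, not taken from the patch): on
32-bit arc every pointer is 4 bytes, so a check on the pointer size can never
catch a 64-bit operand, whereas a check on the data size does:

  u64 v;  /* made-up variable for illustration */

  BUILD_BUG_ON(sizeof(&v) != 4);  /* pointer size: always 4 on arc, never fires */
  BUILD_BUG_ON(sizeof(v)  != 4);  /* data size: 8 here, correctly rejected      */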

v2 -> v3:
  - Patches regrouped, with the improvement for xtensa included
  - Comments refined to address why these changes are needed

v1 -> v2:
  - Try using native cmpxchg variants if available, as Arnd advised

Signed-off-by: wuqiang.matt 
Reviewed-by: Masami Hiramatsu (Google) 
---
 arch/arc/include/asm/cmpxchg.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e138fde067de..bf46514f6f12 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -18,14 +18,16 @@
  * if (*ptr == @old)
  *  *ptr = @new
  */
-#define __cmpxchg(ptr, old, new)   \
+#define __cmpxchg_32(ptr, old, new)\
 ({ \
__typeof__(*(ptr)) _prev;   \
\
+   BUILD_BUG_ON(sizeof(*(ptr)) != 4);  \
+   \
__asm__ __volatile__(   \
-   "1: llock  %0, [%1] \n" \
+   "1: llock  %0, [%1] \n" \
"   brne   %0, %2, 2f   \n" \
-   "   scond  %3, [%1] \n" \
+   "   scond  %3, [%1] \n" \
"   bnz 1b  \n" \
"2: \n" \
: "=&r"(_prev)  /* Early clobber prevent reg reuse */   \
@@ -47,7 +49,7 @@
\
switch(sizeof((_p_))) { \
case 4: \
-   _prev_ = __cmpxchg(_p_, _o_, _n_);  \
+   _prev_ = __cmpxchg_32(_p_, _o_, _n_);   \
break;  \
default:\
BUILD_BUG();\
@@ -65,8 +67,6 @@
__typeof__(*(ptr)) _prev_;  \
unsigned long __flags;  \
\
-   BUILD_BUG_ON(sizeof(_p_) != 4); \
-   \
/*  \
 * spin lock/unlock provide the needed smp_mb() before/after\
 */ \
-- 
2.40.1




[PATCH v3 2/5] arch,locking/atomic: arc: add arch_cmpxchg[64]_local

2023-11-21 Thread wuqiang.matt
arc doesn't have arch_cmpxchg_local() implemented, which causes
build failures for any reference to try_cmpxchg_local(),
as reported by the kernel test robot.

This patch implements arch_cmpxchg[64]_local() with the native
cmpxchg variant if the corresponding data size is supported;
otherwise generic_cmpxchg[64]_local() is used.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202310272207.tlpflya4-...@intel.com/

Signed-off-by: wuqiang.matt 
Reviewed-by: Masami Hiramatsu (Google) 
---
 arch/arc/include/asm/cmpxchg.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index bf46514f6f12..91429f2350df 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -80,6 +80,34 @@
 
 #endif
 
+/*
+ * always make arch_cmpxchg[64]_local available, native cmpxchg
+ * will be used if available, then generic_cmpxchg[64]_local
+ */
+#include 
+static inline unsigned long __cmpxchg_local(volatile void *ptr,
+ unsigned long old,
+ unsigned long new, int size)
+{
+   switch (size) {
+#ifdef CONFIG_ARC_HAS_LLSC
+   case 4:
+   return __cmpxchg_32((int32_t *)ptr, old, new);
+#endif
+   default:
+   return __generic_cmpxchg_local(ptr, old, new, size);
+   }
+
+   return old;
+}
+#define arch_cmpxchg_local(ptr, o, n) ({   \
+   (__typeof__(*ptr))__cmpxchg_local((ptr),\
+   (unsigned long)(o), \
+   (unsigned long)(n), \
+   sizeof(*(ptr)));\
+})
+#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), 
(n))
+
 /*
  * xchg
  */
-- 
2.40.1




[PATCH v3 3/5] arch,locking/atomic: openrisc: add arch_cmpxchg[64]_local

2023-11-21 Thread wuqiang.matt
openrisc doesn't have arch_cmpxchg_local() implemented, which causes
build failures for any reference to try_cmpxchg_local(),
as reported by the kernel test robot.

This patch implements arch_cmpxchg[64]_local() with the native
cmpxchg variant if the corresponding data size is supported;
otherwise generic_cmpxchg[64]_local() is used.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202310272207.tlpflya4-...@intel.com/

Signed-off-by: wuqiang.matt 
Reviewed-by: Masami Hiramatsu (Google) 
---
 arch/openrisc/include/asm/cmpxchg.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/openrisc/include/asm/cmpxchg.h 
b/arch/openrisc/include/asm/cmpxchg.h
index 8ee151c072e4..f1ffe8b6f5ef 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -139,6 +139,12 @@ static inline unsigned long __cmpxchg(volatile void *ptr, 
unsigned long old,
   (unsigned long)(n),  \
   sizeof(*(ptr))); \
})
+#define arch_cmpxchg_local arch_cmpxchg
+
+/* always make arch_cmpxchg64_local available for openrisc */
+#include 
+
+#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), 
(n))
 
 /*
  * This function doesn't exist, so you'll get a linker error if
-- 
2.40.1




[PATCH v3 4/5] arch,locking/atomic: hexagon: add arch_cmpxchg[64]_local

2023-11-21 Thread wuqiang.matt
hexagon doesn't have arch_cmpxchg_local() implemented, which causes
build failures for any reference to try_cmpxchg_local(),
as reported by the kernel test robot.

This patch implements arch_cmpxchg[64]_local() with the native
cmpxchg variant if the corresponding data size is supported;
otherwise generic_cmpxchg[64]_local() is used.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202310272207.tlpflya4-...@intel.com/

Signed-off-by: wuqiang.matt 
Reviewed-by: Masami Hiramatsu (Google) 
---
 arch/hexagon/include/asm/cmpxchg.h | 51 +-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/hexagon/include/asm/cmpxchg.h 
b/arch/hexagon/include/asm/cmpxchg.h
index bf6cf5579cf4..302fa30f25aa 100644
--- a/arch/hexagon/include/asm/cmpxchg.h
+++ b/arch/hexagon/include/asm/cmpxchg.h
@@ -8,6 +8,8 @@
 #ifndef _ASM_CMPXCHG_H
 #define _ASM_CMPXCHG_H
 
+#include 
+
 /*
  * __arch_xchg - atomically exchange a register and a memory location
  * @x: value to swap
@@ -51,13 +53,15 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
  *  variable casting.
  */
 
-#define arch_cmpxchg(ptr, old, new)\
+#define __cmpxchg_32(ptr, old, new)\
 ({ \
__typeof__(ptr) __ptr = (ptr);  \
__typeof__(*(ptr)) __old = (old);   \
__typeof__(*(ptr)) __new = (new);   \
__typeof__(*(ptr)) __oldval = 0;\
\
+   BUILD_BUG_ON(sizeof(*(ptr)) != 4);  \
+   \
asm volatile(   \
"1: %0 = memw_locked(%1);\n"\
"   { P0 = cmp.eq(%0,%2);\n"\
@@ -72,4 +76,49 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
__oldval;   \
 })
 
+#define __cmpxchg(ptr, old, val, size) \
+({ \
+   __typeof__(*(ptr)) oldval;  \
+   \
+   switch (size) { \
+   case 4: \
+   oldval = __cmpxchg_32(ptr, old, val);   \
+   break;  \
+   default:\
+   BUILD_BUG();\
+   oldval = val;   \
+   break;  \
+   }   \
+   \
+   oldval; \
+})
+
+#define arch_cmpxchg(ptr, o, n)__cmpxchg((ptr), (o), (n), 
sizeof(*(ptr)))
+
+/*
+ * always make arch_cmpxchg[64]_local available, native cmpxchg
+ * will be used if available, then generic_cmpxchg[64]_local
+ */
+#include 
+
+#define arch_cmpxchg_local(ptr, old, val)  \
+({ \
+   __typeof__(*(ptr)) __retval;\
+   int __size = sizeof(*(ptr));\
+   \
+   switch (__size) {   \
+   case 4: \
+   __retval = __cmpxchg_32(ptr, old, val); \
+   break;  \
+   default:\
+   __retval = __generic_cmpxchg_local(ptr, old,\
+   val, __size);   \
+   break;  \
+   }   \
+   \
+   __retval;   \
+})
+
+#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), 
(n))
+
 #endif /* _ASM_CMPXCHG_H */
-- 
2.40.1




[PATCH v3 5/5] arch,locking/atomic: xtensa: define arch_cmpxchg_local as __cmpxchg_local

2023-11-21 Thread wuqiang.matt
The xtensa architecture already has __cmpxchg_local() defined but unused.
The sole purpose of __cmpxchg_local() is to implement arch_cmpxchg_local(),
just as it does for other architectures such as x86, arm and powerpc.

Signed-off-by: wuqiang.matt 
---
 arch/xtensa/include/asm/cmpxchg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/xtensa/include/asm/cmpxchg.h 
b/arch/xtensa/include/asm/cmpxchg.h
index 675a11ea8de7..956c9925df1c 100644
--- a/arch/xtensa/include/asm/cmpxchg.h
+++ b/arch/xtensa/include/asm/cmpxchg.h
@@ -108,7 +108,7 @@ static inline unsigned long __cmpxchg_local(volatile void 
*ptr,
  * them available.
  */
 #define arch_cmpxchg_local(ptr, o, n) \
-   ((__typeof__(*(ptr)))__generic_cmpxchg_local((ptr), (unsigned long)(o),\
+   ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o),\
(unsigned long)(n), sizeof(*(ptr
 #define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), 
(n))
 #define arch_cmpxchg64(ptr, o, n)arch_cmpxchg64_local((ptr), (o), (n))
-- 
2.40.1




[PATCH v6 06/13] x86/bugs: Rename SLS to CONFIG_MITIGATION_SLS

2023-11-21 Thread Breno Leitao
CPU mitigation config entries are inconsistently named, which makes them
hard to relate to one another. There are concrete benefits for both users
and developers in having all the mitigation config options live in the same
config namespace.

The mitigation options should be consistent and start with the MITIGATION_
prefix.

Rename the Kconfig entry from SLS to MITIGATION_SLS.

Suggested-by: Josh Poimboeuf 
Signed-off-by: Breno Leitao 
---
 arch/x86/Kconfig   | 2 +-
 arch/x86/Makefile  | 2 +-
 arch/x86/include/asm/linkage.h | 4 ++--
 arch/x86/kernel/alternative.c  | 4 ++--
 arch/x86/kernel/ftrace.c   | 3 ++-
 arch/x86/net/bpf_jit_comp.c| 4 ++--
 scripts/Makefile.lib   | 2 +-
 7 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 862be9b3b216..fa246de60cdb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2580,7 +2580,7 @@ config CPU_SRSO
help
  Enable the SRSO mitigation needed on AMD Zen1-4 machines.
 
-config SLS
+config MITIGATION_SLS
bool "Mitigate Straight-Line-Speculation"
depends on CC_HAS_SLS && X86_64
select OBJTOOL if HAVE_OBJTOOL
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b8d23ed059fb..5ce8c30e7701 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -205,7 +205,7 @@ ifdef CONFIG_MITIGATION_RETPOLINE
   endif
 endif
 
-ifdef CONFIG_SLS
+ifdef CONFIG_MITIGATION_SLS
   KBUILD_CFLAGS += -mharden-sls=all
 endif
 
diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
index c5165204c66f..09e2d026df33 100644
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -43,7 +43,7 @@
 #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && 
!defined(BUILD_VDSO)
 #define RETjmp __x86_return_thunk
 #else /* CONFIG_MITIGATION_RETPOLINE */
-#ifdef CONFIG_SLS
+#ifdef CONFIG_MITIGATION_SLS
 #define RETret; int3
 #else
 #define RETret
@@ -55,7 +55,7 @@
 #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && 
!defined(BUILD_VDSO)
 #define ASM_RET"jmp __x86_return_thunk\n\t"
 #else /* CONFIG_MITIGATION_RETPOLINE */
-#ifdef CONFIG_SLS
+#ifdef CONFIG_MITIGATION_SLS
 #define ASM_RET"ret; int3\n\t"
 #else
 #define ASM_RET"ret\n\t"
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5ec887d065ce..b01d49862497 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -637,8 +637,8 @@ static int patch_retpoline(void *addr, struct insn *insn, 
u8 *bytes)
/*
 * The compiler is supposed to EMIT an INT3 after every unconditional
 * JMP instruction due to AMD BTC. However, if the compiler is too old
-* or SLS isn't enabled, we still need an INT3 after indirect JMPs
-* even on Intel.
+* or MITIGATION_SLS isn't enabled, we still need an INT3 after
+* indirect JMPs even on Intel.
 */
if (op == JMP32_INSN_OPCODE && i < insn->length)
bytes[i++] = INT3_INSN_OPCODE;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 93bc52d4a472..70139d9d2e01 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -307,7 +307,8 @@ union ftrace_op_code_union {
} __attribute__((packed));
 };
 
-#define RET_SIZE   (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) ? 5 : 1 + 
IS_ENABLED(CONFIG_SLS))
+#define RET_SIZE \
+   (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) ? 5 : 1 + 
IS_ENABLED(CONFIG_MITIGATION_SLS))
 
 static unsigned long
 create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index ef732f323926..96a63c4386a9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -469,7 +469,7 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
} else {
EMIT2(0xFF, 0xE0 + reg);/* jmp *%\reg */
-   if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || 
IS_ENABLED(CONFIG_SLS))
+   if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) || 
IS_ENABLED(CONFIG_MITIGATION_SLS))
EMIT1(0xCC);/* int3 */
}
 
@@ -484,7 +484,7 @@ static void emit_return(u8 **pprog, u8 *ip)
emit_jump(&prog, x86_return_thunk, ip);
} else {
EMIT1(0xC3);/* ret */
-   if (IS_ENABLED(CONFIG_SLS))
+   if (IS_ENABLED(CONFIG_MITIGATION_SLS))
EMIT1(0xCC);/* int3 */
}
 
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index d6e157938b5f..0d5461276179 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -264,7 +264,7 @@ endif
 objtool-args-$(CONFIG_UNWINDER_ORC)+= --orc
 objtool-args-$(CONFIG_MITIGATION_RETPOLINE)+= --retpoline
 objtool-args-$(CONFIG_RETHUNK)

Re: [PATCH v4 4/5] x86/paravirt: switch mixed paravirt/alternative calls to alternative_2

2023-11-21 Thread Borislav Petkov
On Mon, Oct 30, 2023 at 03:25:07PM +0100, Juergen Gross wrote:
> Instead of stacking alternative and paravirt patching, use the new
> ALT_FLAG_CALL flag to switch those mixed calls to pure alternative
> handling.
> 
> This eliminates the need to be careful regarding the sequence of
> alternative and paravirt patching.
> 
> For call depth tracking callthunks_setup() needs to be adapted to patch
> calls at alternative patching sites instead of paravirt calls.

Why is this important so that it is called out explicitly in the commit
message? Is callthunks_setup() special somehow?

> 
> Signed-off-by: Juergen Gross 
> Acked-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/include/asm/alternative.h|  5 +++--
>  arch/x86/include/asm/paravirt.h   |  9 ++---
>  arch/x86/include/asm/paravirt_types.h | 26 +-
>  arch/x86/kernel/callthunks.c  | 17 -
>  arch/x86/kernel/module.c  | 20 +---
>  5 files changed, 31 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/x86/include/asm/alternative.h 
> b/arch/x86/include/asm/alternative.h
> index 2a74a94bd569..07b17bc615a0 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -89,6 +89,8 @@ struct alt_instr {
>   u8  replacementlen; /* length of new instruction */
>  } __packed;
>  
> +extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
> +

arch/x86/include/asm/alternative.h:92:extern struct alt_instr 
__alt_instructions[], __alt_instructions_end[];
arch/x86/kernel/alternative.c:163:extern struct alt_instr __alt_instructions[], 
__alt_instructions_end[];

Zap the declaration from the .c file pls.

>  /*
>   * Debug flag that can be tested to see whether alternative
>   * instructions were patched in already:
> @@ -104,11 +106,10 @@ extern void apply_fineibt(s32 *start_retpoline, s32 
> *end_retpoine,
> s32 *start_cfi, s32 *end_cfi);
>  
>  struct module;
> -struct paravirt_patch_site;
>  
>  struct callthunk_sites {
>   s32 *call_start, *call_end;
> - struct paravirt_patch_site  *pv_start, *pv_end;
> + struct alt_instr*alt_start, *alt_end;
>  };
>  
>  #ifdef CONFIG_CALL_THUNKS
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 3749311d51c3..9c6c5cfa9fe2 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -740,20 +740,23 @@ void native_pv_lock_init(void) __init;
>  
>  #ifdef CONFIG_X86_64
>  #ifdef CONFIG_PARAVIRT_XXL
> +#ifdef CONFIG_DEBUG_ENTRY
>  
>  #define PARA_PATCH(off)  ((off) / 8)
>  #define PARA_SITE(ptype, ops)_PVSITE(ptype, ops, .quad, 8)
>  #define PARA_INDIRECT(addr)  *addr(%rip)
>  
> -#ifdef CONFIG_DEBUG_ENTRY
>  .macro PARA_IRQ_save_fl
>   PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),
> ANNOTATE_RETPOLINE_SAFE;
> call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);)
> + ANNOTATE_RETPOLINE_SAFE;
> + call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);
>  .endm
>  
> -#define SAVE_FLAGS   ALTERNATIVE "PARA_IRQ_save_fl;", "pushf; pop %rax;", \
> - ALT_NOT_XEN
> +#define SAVE_FLAGS ALTERNATIVE_2 "PARA_IRQ_save_fl;",
> \
> +  ALT_CALL_INSTR, ALT_CALL_ALWAYS,   \
> +  "pushf; pop %rax;", ALT_NOT_XEN

How is that supposed to work?

At build time for a PARAVIRT_XXL build it'll have that PARA_IRQ_save_fl
macro in there which issues a .parainstructions section and an indirect
call to

call *pv_ops+240(%rip);

then it'll always patch in "call BUG_func" due to X86_FEATURE_ALWAYS.

I guess this is your way of saying "this should always be patched, one
way or the other, depending on X86_FEATURE_XENPV, and this is a way to
catch unpatched locations...

Then on a pv build which doesn't set X86_FEATURE_XENPV during boot,
it'll replace the "call BUG_func" thing with the pushf;pop.

And if it does set X86_FEATURE_XENPV, it'll patch in the direct call to
 /me greps tree ... pv_native_save_fl I guess.

If anything, how those ALT_CALL_ALWAYS things are supposed to work,
should be documented there, over the macro definition and what the
intent is.

Because otherwise we'll have to swap in the whole machinery back into
our L1s each time we need to touch it.

And btw, this whole patching stuff becomes insanely non-trivial slowly.

:-\

> diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
> index faa9f2299848..200ea8087ddb 100644
> --- a/arch/x86/kernel/callthunks.c
> +++ b/arch/x86/kernel/callthunks.c
> @@ -238,14 +238,13 @@ patch_call_sites(s32 *start, s32 *end, const struct 
> core_text *ct)
>  }
>  
>  static __init_or_module void
> -patch_paravirt_call_sites(struct paravirt_patch_site *start,
> -   struct paravirt_patch_site *end,
> -   const s

[PATCH net v1] vsock/test: fix SEQPACKET message bounds test

2023-11-21 Thread Arseniy Krasnov
Tune the message length calculation to make this test work on machines
where 'getpagesize()' returns more than 32KB. The maximum message length is
no longer hardcoded (on such machines it was smaller than the 'getpagesize()'
return value, so the calculation went negative and the test failed); it is
now computed at runtime and is always larger than the 'getpagesize()' result.
Reproduced on aarch64 with a 64KB page size.
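
For illustration (the numbers are a worked example based on the hunks below,
not part of the commit message): with the old hardcoded 32KB limit the "big
buffer" size calculation underflows on a 64KB-page machine, while the new
page-based limit stays positive:

  /* old: MAX_MSG_SIZE = 32 * 1024, page_size = getpagesize() = 64 * 1024 */
  buf_size = page_size + (rand() % (MAX_MSG_SIZE - page_size));
  /*                              32768 - 65536 = -32768 -> broken        */

  /* new: max_msg_size = MAX_MSG_PAGES * page_size = 4 * 65536 = 262144   */
  buf_size = page_size + (rand() % (max_msg_size - page_size));
  /*                              262144 - 65536 = 196608 > 0             */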

Fixes: 5c338112e48a ("test/vsock: rework message bounds test")
Signed-off-by: Arseniy Krasnov 
---
 tools/testing/vsock/vsock_test.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index f5623b8d76b7..691e44c746bf 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -353,11 +353,12 @@ static void test_stream_msg_peek_server(const struct 
test_opts *opts)
 }
 
 #define SOCK_BUF_SIZE (2 * 1024 * 1024)
-#define MAX_MSG_SIZE (32 * 1024)
+#define MAX_MSG_PAGES 4
 
 static void test_seqpacket_msg_bounds_client(const struct test_opts *opts)
 {
unsigned long curr_hash;
+   size_t max_msg_size;
int page_size;
int msg_count;
int fd;
@@ -373,7 +374,8 @@ static void test_seqpacket_msg_bounds_client(const struct 
test_opts *opts)
 
curr_hash = 0;
page_size = getpagesize();
-   msg_count = SOCK_BUF_SIZE / MAX_MSG_SIZE;
+   max_msg_size = MAX_MSG_PAGES * page_size;
+   msg_count = SOCK_BUF_SIZE / max_msg_size;
 
for (int i = 0; i < msg_count; i++) {
size_t buf_size;
@@ -383,7 +385,7 @@ static void test_seqpacket_msg_bounds_client(const struct 
test_opts *opts)
/* Use "small" buffers and "big" buffers. */
if (i & 1)
buf_size = page_size +
-   (rand() % (MAX_MSG_SIZE - page_size));
+   (rand() % (max_msg_size - page_size));
else
buf_size = 1 + (rand() % page_size);
 
@@ -429,7 +431,6 @@ static void test_seqpacket_msg_bounds_server(const struct 
test_opts *opts)
unsigned long remote_hash;
unsigned long curr_hash;
int fd;
-   char buf[MAX_MSG_SIZE];
struct msghdr msg = {0};
struct iovec iov = {0};
 
@@ -457,8 +458,13 @@ static void test_seqpacket_msg_bounds_server(const struct 
test_opts *opts)
control_writeln("SRVREADY");
/* Wait, until peer sends whole data. */
control_expectln("SENDDONE");
-   iov.iov_base = buf;
-   iov.iov_len = sizeof(buf);
+   iov.iov_len = MAX_MSG_PAGES * getpagesize();
+   iov.iov_base = malloc(iov.iov_len);
+   if (!iov.iov_base) {
+   perror("malloc");
+   exit(EXIT_FAILURE);
+   }
+
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
 
@@ -483,6 +489,7 @@ static void test_seqpacket_msg_bounds_server(const struct 
test_opts *opts)
curr_hash += hash_djb2(msg.msg_iov[0].iov_base, recv_size);
}
 
+   free(iov.iov_base);
close(fd);
remote_hash = control_readulong();
 
-- 
2.25.1




[PATCH v2 01/33] ftrace: Unpoison ftrace_regs in ftrace_ops_list_func()

2023-11-21 Thread Ilya Leoshkevich
Architectures use assembly code to initialize ftrace_regs and call
ftrace_ops_list_func(). Therefore, from KMSAN's point of view,
ftrace_regs is poisoned on ftrace_ops_list_func() entry. This causes
KMSAN warnings when running the ftrace testsuite.

Fix by trusting the architecture-specific assembly code and always
unpoisoning ftrace_regs in ftrace_ops_list_func().

Signed-off-by: Ilya Leoshkevich 
---
 kernel/trace/ftrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8de8bec5f366..dfb8b26966aa 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7399,6 +7399,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long 
parent_ip,
 void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
+   kmsan_unpoison_memory(fregs, sizeof(*fregs));
__ftrace_ops_list_func(ip, parent_ip, NULL, fregs);
 }
 #else
-- 
2.41.0




[PATCH v2 08/33] kmsan: Remove an x86-specific #include from kmsan.h

2023-11-21 Thread Ilya Leoshkevich
Replace the x86-specific asm/pgtable_64_types.h #include with the
linux/pgtable.h one, which all architectures have.

While at it, sort the headers alphabetically for the sake of
consistency with other KMSAN code.

Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
Suggested-by: Heiko Carstens 
Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/kmsan.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index a14744205435..adf443bcffe8 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -10,14 +10,14 @@
 #ifndef __MM_KMSAN_KMSAN_H
 #define __MM_KMSAN_KMSAN_H
 
-#include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 
 #define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
 #define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
-- 
2.41.0




[PATCH v2 07/33] kmsan: Remove a useless assignment from kmsan_vmap_pages_range_noflush()

2023-11-21 Thread Ilya Leoshkevich
The value assigned to prot is immediately overwritten on the next line
with PAGE_KERNEL. The right-hand side of the assignment has no
side effects.

Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations")
Suggested-by: Alexander Gordeev 
Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/shadow.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index b9d05aff313e..2d57408c78ae 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -243,7 +243,6 @@ int kmsan_vmap_pages_range_noflush(unsigned long start, 
unsigned long end,
s_pages[i] = shadow_page_for(pages[i]);
o_pages[i] = origin_page_for(pages[i]);
}
-   prot = __pgprot(pgprot_val(prot) | _PAGE_NX);
prot = PAGE_KERNEL;
 
origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
-- 
2.41.0




[PATCH v2 10/33] kmsan: Expose kmsan_get_metadata()

2023-11-21 Thread Ilya Leoshkevich
Each s390 CPU has lowcore pages associated with it. Each CPU sees its
own lowcore at virtual address 0 through a hardware mechanism called
prefixing. Additionally, all lowcores are mapped to non-0 virtual
addresses stored in the lowcore_ptr[] array.

When lowcore is accessed through virtual address 0, one needs to
resolve metadata for lowcore_ptr[raw_smp_processor_id()].

Expose kmsan_get_metadata() to make it possible to do this from the
arch code.
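
As a rough sketch of the intended arch-side use (the helper name and checks
below are illustrative and not part of this patch; it assumes <asm/lowcore.h>
providing struct lowcore and lowcore_ptr[]), the s390 code can redirect a
lowcore access at virtual address 0 to its non-zero mapping before asking
KMSAN for metadata:

  static inline void *lowcore_meta(void *addr, bool is_origin)
  {
          unsigned long off = (unsigned long)addr;

          if (off >= sizeof(struct lowcore))
                  return NULL;            /* not a prefixed lowcore access */

          /* Resolve against this CPU's real (non-zero) lowcore mapping. */
          return kmsan_get_metadata((void *)lowcore_ptr[raw_smp_processor_id()] + off,
                                    is_origin);
  }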

Signed-off-by: Ilya Leoshkevich 
---
 include/linux/kmsan.h  | 14 ++
 mm/kmsan/instrumentation.c |  1 +
 mm/kmsan/kmsan.h   |  1 -
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index e0c23a32cdf0..ff8fd95733fa 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -230,6 +230,15 @@ void kmsan_handle_urb(const struct urb *urb, bool is_out);
  */
 void kmsan_unpoison_entry_regs(const struct pt_regs *regs);
 
+/**
+ * kmsan_get_metadata() - Return a pointer to KMSAN shadow or origins.
+ * @addr:  kernel address.
+ * @is_origin: whether to return origins or shadow.
+ *
+ * Return NULL if metadata cannot be found.
+ */
+void *kmsan_get_metadata(void *addr, bool is_origin);
+
 #else
 
 static inline void kmsan_init_shadow(void)
@@ -329,6 +338,11 @@ static inline void kmsan_unpoison_entry_regs(const struct 
pt_regs *regs)
 {
 }
 
+static inline void *kmsan_get_metadata(void *addr, bool is_origin)
+{
+   return NULL;
+}
+
 #endif
 
 #endif /* _LINUX_KMSAN_H */
diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
index 8a1bbbc723ab..94b49fac9d8b 100644
--- a/mm/kmsan/instrumentation.c
+++ b/mm/kmsan/instrumentation.c
@@ -14,6 +14,7 @@
 
 #include "kmsan.h"
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index adf443bcffe8..34b83c301d57 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -66,7 +66,6 @@ struct shadow_origin_ptr {
 
 struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
 bool store);
-void *kmsan_get_metadata(void *addr, bool is_origin);
 void __init kmsan_init_alloc_meta_for_range(void *start, void *end);
 
 enum kmsan_bug_reason {
-- 
2.41.0




[PATCH v2 11/33] kmsan: Export panic_on_kmsan

2023-11-21 Thread Ilya Leoshkevich
When building the kmsan test as a module, modpost fails with the
following error message:

ERROR: modpost: "panic_on_kmsan" [mm/kmsan/kmsan_test.ko] undefined!

Export panic_on_kmsan in order to improve the KMSAN usability for
modules.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/report.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
index 02736ec757f2..c79d3b0d2d0d 100644
--- a/mm/kmsan/report.c
+++ b/mm/kmsan/report.c
@@ -20,6 +20,7 @@ static DEFINE_RAW_SPINLOCK(kmsan_report_lock);
 /* Protected by kmsan_report_lock */
 static char report_local_descr[DESCR_SIZE];
 int panic_on_kmsan __read_mostly;
+EXPORT_SYMBOL_GPL(panic_on_kmsan);
 
 #ifdef MODULE_PARAM_PREFIX
 #undef MODULE_PARAM_PREFIX
-- 
2.41.0




[PATCH v2 18/33] lib/string: Add KMSAN support to strlcpy() and strlcat()

2023-11-21 Thread Ilya Leoshkevich
Currently KMSAN does not fully propagate metadata in strlcpy() and
strlcat(), because they are built with -ffreestanding and call
memcpy(). In this combination memcpy() calls are not instrumented.

Fix by copying the metadata manually. Add the __STDC_HOSTED__ #ifdef in
case the code is compiled with different flags in the future.

Signed-off-by: Ilya Leoshkevich 
---
 lib/string.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index be26623953d2..e83c6dd77ec6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -111,6 +111,9 @@ size_t strlcpy(char *dest, const char *src, size_t size)
if (size) {
size_t len = (ret >= size) ? size - 1 : ret;
__builtin_memcpy(dest, src, len);
+#if __STDC_HOSTED__ == 0
+   kmsan_memmove_metadata(dest, src, len);
+#endif
dest[len] = '\0';
}
return ret;
@@ -261,6 +264,9 @@ size_t strlcat(char *dest, const char *src, size_t count)
if (len >= count)
len = count-1;
__builtin_memcpy(dest, src, len);
+#if __STDC_HOSTED__ == 0
+   kmsan_memmove_metadata(dest, src, len);
+#endif
dest[len] = 0;
return res;
 }
-- 
2.41.0




[PATCH v2 14/33] kmsan: Support SLAB_POISON

2023-11-21 Thread Ilya Leoshkevich
Avoid false KMSAN negatives with SLUB_DEBUG by allowing
kmsan_slab_free() to poison the freed memory, and by preventing
init_object() from unpoisoning new allocations. The usage of
memset_no_sanitize_memory() does not degrade the generated code
quality.

There are two alternatives to this approach. First, init_object()
can be marked with __no_sanitize_memory. This annotation should be used
with great care, because it drops all instrumentation from the
function, and any shadow writes will be lost. Even though this is not a
concern with the current init_object() implementation, this may change
in the future.

Second, kmsan_poison_memory() calls may be added after memset() calls.
The downside is that init_object() is called from
free_debug_processing(), in which case poisoning will erase the
distinction between simply uninitialized memory and UAF.
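
For context, a minimal sketch of what memset_no_sanitize_memory() could look
like (the real helper is introduced elsewhere in this series; this sketch only
assumes that a function compiled with __no_sanitize_memory is left
uninstrumented, so its stores do not mark the KMSAN shadow as initialized):

  /* Illustrative only: fills the bytes without unpoisoning them for KMSAN. */
  __no_sanitize_memory
  static inline void *memset_no_sanitize_memory(void *s, int c, size_t n)
  {
          return memset(s, c, n);
  }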

Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/hooks.c |  2 +-
 mm/slub.c| 10 ++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 7b5814412e9f..7a30274b893c 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void *object)
return;
 
/* RCU slabs could be legally used after free within the RCU period */
-   if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
+   if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
return;
/*
 * If there's a constructor, freed memory must remain in the same state
diff --git a/mm/slub.c b/mm/slub.c
index 63d281dfacdb..169e5f645ea8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1030,7 +1030,8 @@ static void init_object(struct kmem_cache *s, void 
*object, u8 val)
unsigned int poison_size = s->object_size;
 
if (s->flags & SLAB_RED_ZONE) {
-   memset(p - s->red_left_pad, val, s->red_left_pad);
+   memset_no_sanitize_memory(p - s->red_left_pad, val,
+ s->red_left_pad);
 
if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
/*
@@ -1043,12 +1044,13 @@ static void init_object(struct kmem_cache *s, void 
*object, u8 val)
}
 
if (s->flags & __OBJECT_POISON) {
-   memset(p, POISON_FREE, poison_size - 1);
-   p[poison_size - 1] = POISON_END;
+   memset_no_sanitize_memory(p, POISON_FREE, poison_size - 1);
+   memset_no_sanitize_memory(p + poison_size - 1, POISON_END, 1);
}
 
if (s->flags & SLAB_RED_ZONE)
-   memset(p + poison_size, val, s->inuse - poison_size);
+   memset_no_sanitize_memory(p + poison_size, val,
+ s->inuse - poison_size);
 }
 
 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
-- 
2.41.0




[PATCH v2 02/33] kmsan: Make the tests compatible with kmsan.panic=1

2023-11-21 Thread Ilya Leoshkevich
It's useful to have both the tests and kmsan.panic=1 during development,
but right now the warnings that the tests cause lead to kernel
panics.

Temporarily set kmsan.panic=0 for the duration of the KMSAN testing.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/kmsan_test.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c
index 07d3a3a5a9c5..9bfd11674fe3 100644
--- a/mm/kmsan/kmsan_test.c
+++ b/mm/kmsan/kmsan_test.c
@@ -659,9 +659,13 @@ static void test_exit(struct kunit *test)
 {
 }
 
+static int orig_panic_on_kmsan;
+
 static int kmsan_suite_init(struct kunit_suite *suite)
 {
register_trace_console(probe_console, NULL);
+   orig_panic_on_kmsan = panic_on_kmsan;
+   panic_on_kmsan = 0;
return 0;
 }
 
@@ -669,6 +673,7 @@ static void kmsan_suite_exit(struct kunit_suite *suite)
 {
unregister_trace_console(probe_console, NULL);
tracepoint_synchronize_unregister();
+   panic_on_kmsan = orig_panic_on_kmsan;
 }
 
 static struct kunit_suite kmsan_test_suite = {
-- 
2.41.0




[PATCH v2 15/33] kmsan: Use ALIGN_DOWN() in kmsan_get_metadata()

2023-11-21 Thread Ilya Leoshkevich
Improve the readability by replacing the custom aligning logic with
ALIGN_DOWN(). Unlike other places where a similar sequence is used,
there is no size parameter that needs to be adjusted, so the standard
macro fits.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/shadow.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index 2d57408c78ae..9c58f081d84f 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -123,14 +123,12 @@ struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void 
*address, u64 size,
  */
 void *kmsan_get_metadata(void *address, bool is_origin)
 {
-   u64 addr = (u64)address, pad, off;
+   u64 addr = (u64)address, off;
struct page *page;
void *ret;
 
-   if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
-   pad = addr % KMSAN_ORIGIN_SIZE;
-   addr -= pad;
-   }
+   if (is_origin)
+   addr = ALIGN_DOWN(addr, KMSAN_ORIGIN_SIZE);
address = (void *)addr;
if (kmsan_internal_is_vmalloc_addr(address) ||
kmsan_internal_is_module_addr(address))
-- 
2.41.0




[PATCH v2 16/33] mm: slub: Let KMSAN access metadata

2023-11-21 Thread Ilya Leoshkevich
Building the kernel with CONFIG_SLUB_DEBUG and CONFIG_KMSAN causes
KMSAN to complain about touching redzones in kfree().

Fix by extending the existing KASAN-related metadata_access_enable()
and metadata_access_disable() functions to KMSAN.

Signed-off-by: Ilya Leoshkevich 
---
 mm/slub.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 169e5f645ea8..6e61c27951a4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -700,10 +700,12 @@ static int disable_higher_order_debug;
 static inline void metadata_access_enable(void)
 {
kasan_disable_current();
+   kmsan_disable_current();
 }
 
 static inline void metadata_access_disable(void)
 {
+   kmsan_enable_current();
kasan_enable_current();
 }
 
-- 
2.41.0




[PATCH v2 20/33] kmsan: Accept ranges starting with 0 on s390

2023-11-21 Thread Ilya Leoshkevich
On s390 the virtual address 0 is valid (current CPU's lowcore is mapped
there), therefore KMSAN should not complain about it.

Disable the respective check on s390. There doesn't seem to be a
Kconfig option to describe this situation, so explicitly check for
s390.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/init.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
index ffedf4dbc49d..7a3df4d359f8 100644
--- a/mm/kmsan/init.c
+++ b/mm/kmsan/init.c
@@ -33,7 +33,10 @@ static void __init kmsan_record_future_shadow_range(void 
*start, void *end)
bool merged = false;
 
KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
-   KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
+   KMSAN_WARN_ON((nstart >= nend) ||
+ /* Virtual address 0 is valid on s390. */
+ (!IS_ENABLED(CONFIG_S390) && !nstart) ||
+ !nend);
nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
nend = ALIGN(nend, PAGE_SIZE);
 
-- 
2.41.0




[PATCH v2 05/33] kmsan: Fix is_bad_asm_addr() on arches with overlapping address spaces

2023-11-21 Thread Ilya Leoshkevich
Comparing pointers with TASK_SIZE does not make sense when kernel and
userspace overlap. Skip the comparison when this is the case.

Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/instrumentation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
index 470b0b4afcc4..8a1bbbc723ab 100644
--- a/mm/kmsan/instrumentation.c
+++ b/mm/kmsan/instrumentation.c
@@ -20,7 +20,8 @@
 
 static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
 {
-   if ((u64)addr < TASK_SIZE)
+   if (IS_ENABLED(CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE) &&
+   (u64)addr < TASK_SIZE)
return true;
if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
return true;
-- 
2.41.0




[PATCH v2 03/33] kmsan: Disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled

2023-11-21 Thread Ilya Leoshkevich
KMSAN relies on memblock returning all available pages to it
(see kmsan_memblock_free_pages()). It partitions these pages into 3
categories: pages available to the buddy allocator, shadow pages and
origin pages. This partitioning is static.

If new pages appear after kmsan_init_runtime(), it is considered
an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as
incompatible with KMSAN.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 89971a894b60..4f2f99339fc7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -985,6 +985,7 @@ config DEFERRED_STRUCT_PAGE_INIT
depends on SPARSEMEM
depends on !NEED_PER_CPU_KM
depends on 64BIT
+   depends on !KMSAN
select PADATA
help
  Ordinarily all struct pages are initialised during early boot in a
-- 
2.41.0




[PATCH v2 17/33] mm: kfence: Disable KMSAN when checking the canary

2023-11-21 Thread Ilya Leoshkevich
KMSAN warns about check_canary() accessing the canary.

The reason is that, even though set_canary() is properly instrumented
and sets shadow, slub explicitly poisons the canary's address range
afterwards.

Unpoisoning the canary is not the right thing to do: only
check_canary() is supposed to ever touch it. Instead, disable KMSAN
checks around canary read accesses.

Signed-off-by: Ilya Leoshkevich 
---
 mm/kfence/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 3872528d0963..a2ea8e5a1ad9 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -306,7 +306,7 @@ metadata_update_state(struct kfence_metadata *meta, enum 
kfence_object_state nex
 }
 
 /* Check canary byte at @addr. */
-static inline bool check_canary_byte(u8 *addr)
+__no_kmsan_checks static inline bool check_canary_byte(u8 *addr)
 {
struct kfence_metadata *meta;
unsigned long flags;
@@ -341,7 +341,8 @@ static inline void set_canary(const struct kfence_metadata 
*meta)
*((u64 *)addr) = KFENCE_CANARY_PATTERN_U64;
 }
 
-static inline void check_canary(const struct kfence_metadata *meta)
+__no_kmsan_checks static inline void
+check_canary(const struct kfence_metadata *meta)
 {
const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
unsigned long addr = pageaddr;
-- 
2.41.0




[PATCH v2 00/33] kmsan: Enable on s390

2023-11-21 Thread Ilya Leoshkevich
v1: https://lore.kernel.org/lkml/20231115203401.2495875-1-...@linux.ibm.com/
v1 -> v2: Add comments, sort #includes, introduce
  memset_no_sanitize_memory() and use it to avoid unpoisoning
  of redzones, change vmalloc alignment to _REGION3_SIZE, add
  R-bs (Alexander P.).

  Fix building
  [PATCH 28/33] s390/string: Add KMSAN support
  with FORTIFY_SOURCE.
  Reported-by: kernel test robot 
  Closes: 
https://lore.kernel.org/oe-kbuild-all/202311170550.bsbo44ix-...@intel.com/

Hi,

This series provides the minimal support for Kernel Memory Sanitizer on
s390. Kernel Memory Sanitizer is clang-only instrumentation for finding
accesses to uninitialized memory. The clang support for s390 has already
been merged [1].

With this series, I can successfully boot s390 defconfig and
debug_defconfig with kmsan.panic=1. The tool found one real
s390-specific bug (fixed in master).

Best regards,
Ilya

[1] https://reviews.llvm.org/D148596

Ilya Leoshkevich (33):
  ftrace: Unpoison ftrace_regs in ftrace_ops_list_func()
  kmsan: Make the tests compatible with kmsan.panic=1
  kmsan: Disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled
  kmsan: Increase the maximum store size to 4096
  kmsan: Fix is_bad_asm_addr() on arches with overlapping address spaces
  kmsan: Fix kmsan_copy_to_user() on arches with overlapping address
spaces
  kmsan: Remove a useless assignment from
kmsan_vmap_pages_range_noflush()
  kmsan: Remove an x86-specific #include from kmsan.h
  kmsan: Introduce kmsan_memmove_metadata()
  kmsan: Expose kmsan_get_metadata()
  kmsan: Export panic_on_kmsan
  kmsan: Allow disabling KMSAN checks for the current task
  kmsan: Introduce memset_no_sanitize_memory()
  kmsan: Support SLAB_POISON
  kmsan: Use ALIGN_DOWN() in kmsan_get_metadata()
  mm: slub: Let KMSAN access metadata
  mm: kfence: Disable KMSAN when checking the canary
  lib/string: Add KMSAN support to strlcpy() and strlcat()
  lib/zlib: Unpoison DFLTCC output buffers
  kmsan: Accept ranges starting with 0 on s390
  s390: Turn off KMSAN for boot, vdso and purgatory
  s390: Use a larger stack for KMSAN
  s390/boot: Add the KMSAN runtime stub
  s390/checksum: Add a KMSAN check
  s390/cpacf: Unpoison the results of cpacf_trng()
  s390/ftrace: Unpoison ftrace_regs in kprobe_ftrace_handler()
  s390/mm: Define KMSAN metadata for vmalloc and modules
  s390/string: Add KMSAN support
  s390/traps: Unpoison the kernel_stack_overflow()'s pt_regs
  s390/uaccess: Add KMSAN support to put_user() and get_user()
  s390/unwind: Disable KMSAN checks
  s390: Implement the architecture-specific kmsan functions
  kmsan: Enable on s390

 Documentation/dev-tools/kmsan.rst   |   4 +-
 arch/s390/Kconfig   |   1 +
 arch/s390/Makefile  |   2 +-
 arch/s390/boot/Makefile |   3 +
 arch/s390/boot/kmsan.c  |   6 ++
 arch/s390/boot/startup.c|   8 ++
 arch/s390/boot/string.c |  16 
 arch/s390/include/asm/checksum.h|   2 +
 arch/s390/include/asm/cpacf.h   |   2 +
 arch/s390/include/asm/kmsan.h   |  36 +
 arch/s390/include/asm/pgtable.h |  10 +++
 arch/s390/include/asm/string.h  |  20 +++--
 arch/s390/include/asm/thread_info.h |   2 +-
 arch/s390/include/asm/uaccess.h | 110 
 arch/s390/kernel/ftrace.c   |   1 +
 arch/s390/kernel/traps.c|   6 ++
 arch/s390/kernel/unwind_bc.c|   4 +
 arch/s390/kernel/vdso32/Makefile|   3 +-
 arch/s390/kernel/vdso64/Makefile|   3 +-
 arch/s390/purgatory/Makefile|   2 +
 include/linux/kmsan-checks.h|  26 +++
 include/linux/kmsan.h   |  23 ++
 include/linux/kmsan_types.h |   2 +-
 kernel/trace/ftrace.c   |   1 +
 lib/string.c|   6 ++
 lib/zlib_dfltcc/dfltcc.h|   1 +
 lib/zlib_dfltcc/dfltcc_util.h   |  23 ++
 mm/Kconfig  |   1 +
 mm/kfence/core.c|   5 +-
 mm/kmsan/core.c |   2 +-
 mm/kmsan/hooks.c|  30 +++-
 mm/kmsan/init.c |   5 +-
 mm/kmsan/instrumentation.c  |  11 +--
 mm/kmsan/kmsan.h|   9 +--
 mm/kmsan/kmsan_test.c   |   5 ++
 mm/kmsan/report.c   |   7 +-
 mm/kmsan/shadow.c   |   9 +--
 mm/slub.c   |  12 ++-
 38 files changed, 345 insertions(+), 74 deletions(-)
 create mode 100644 arch/s390/boot/kmsan.c
 create mode 100644 arch/s390/include/asm/kmsan.h

-- 
2.41.0




[PATCH v2 22/33] s390: Use a larger stack for KMSAN

2023-11-21 Thread Ilya Leoshkevich
Adjust the stack size for the KMSAN-enabled kernel like it was done
for the KASAN-enabled one in commit 7fef92ccadd7 ("s390/kasan: double
the stack size"). Both tools have similar requirements.

Reviewed-by: Alexander Gordeev 
Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/Makefile  | 2 +-
 arch/s390/include/asm/thread_info.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index 73873e451686..a7f5386d25ad 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -34,7 +34,7 @@ KBUILD_CFLAGS_DECOMPRESSOR += $(if 
$(CONFIG_DEBUG_INFO_DWARF4), $(call cc-option
 KBUILD_CFLAGS_DECOMPRESSOR += $(if 
$(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)
 
 UTS_MACHINE:= s390x
-STACK_SIZE := $(if $(CONFIG_KASAN),65536,16384)
+STACK_SIZE := $(if $(CONFIG_KASAN),65536,$(if $(CONFIG_KMSAN),65536,16384))
 CHECKFLAGS += -D__s390__ -D__s390x__
 
 export LD_BFD
diff --git a/arch/s390/include/asm/thread_info.h 
b/arch/s390/include/asm/thread_info.h
index a674c7d25da5..d02a709717b8 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -16,7 +16,7 @@
 /*
  * General size of kernel stacks
  */
-#ifdef CONFIG_KASAN
+#if defined(CONFIG_KASAN) || defined(CONFIG_KMSAN)
 #define THREAD_SIZE_ORDER 4
 #else
 #define THREAD_SIZE_ORDER 2
-- 
2.41.0




[PATCH v2 24/33] s390/checksum: Add a KMSAN check

2023-11-21 Thread Ilya Leoshkevich
Add a KMSAN check to the CKSM inline assembly, similar to how it was
done for ASAN in commit e42ac7789df6 ("s390/checksum: always use cksm
instruction").

Acked-by: Alexander Gordeev 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/include/asm/checksum.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/checksum.h b/arch/s390/include/asm/checksum.h
index 69837eec2ff5..55ba0ddd8eab 100644
--- a/arch/s390/include/asm/checksum.h
+++ b/arch/s390/include/asm/checksum.h
@@ -13,6 +13,7 @@
 #define _S390_CHECKSUM_H
 
 #include 
+#include 
 #include 
 
 /*
@@ -35,6 +36,7 @@ static inline __wsum csum_partial(const void *buff, int len, 
__wsum sum)
};
 
kasan_check_read(buff, len);
+   kmsan_check_memory(buff, len);
asm volatile(
"0: cksm%[sum],%[rp]\n"
"   jo  0b\n"
-- 
2.41.0




[PATCH v2 23/33] s390/boot: Add the KMSAN runtime stub

2023-11-21 Thread Ilya Leoshkevich
It should be possible to have inline functions in the s390 header
files that call kmsan_unpoison_memory(). The problem is that these
header files might be included by the decompressor, which does not
contain the KMSAN runtime, causing linker errors.

Not compiling these calls if __SANITIZE_MEMORY__ is not defined -
either by changing kmsan-checks.h or at the call sites - may cause
unintended side effects, since calling these functions from
uninstrumented code that is linked into the kernel is a valid use case.

One might want to explicitly distinguish between the kernel and the
decompressor. Checking for a decompressor-specific #define is quite
heavy-handed, and will have to be done at all call sites.

A more generic approach is to provide a dummy kmsan_unpoison_memory()
definition. This produces some runtime overhead, but only when building
with CONFIG_KMSAN. The benefit is that it does not disturb the existing
KMSAN build logic and call sites don't need to be changed.
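
For example (illustrative; cpacf_trng() later in this series is a real
instance of this pattern, while get_hw_random() below is made up), an inline
helper in a shared header can call the KMSAN API unconditionally, and the
decompressor still links because the stub provides the symbol:

  static inline void fill_random(u8 *buf, unsigned long len)
  {
          get_hw_random(buf, len);          /* hypothetical hardware call      */
          kmsan_unpoison_memory(buf, len);  /* resolves to the no-op boot stub */
  }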

Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/boot/Makefile | 1 +
 arch/s390/boot/kmsan.c  | 6 ++
 2 files changed, 7 insertions(+)
 create mode 100644 arch/s390/boot/kmsan.c

diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index fb10fcd21221..096216a72e98 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -44,6 +44,7 @@ obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) 
$(CONFIG_PGSTE)) +=
 obj-$(CONFIG_RANDOMIZE_BASE)   += kaslr.o
 obj-y  += $(if $(CONFIG_KERNEL_UNCOMPRESSED),,decompressor.o) info.o
 obj-$(CONFIG_KERNEL_ZSTD) += clz_ctz.o
+obj-$(CONFIG_KMSAN) += kmsan.o
 obj-all := $(obj-y) piggy.o syms.o
 
 targets:= bzImage section_cmp.boot.data 
section_cmp.boot.preserved.data $(obj-y)
diff --git a/arch/s390/boot/kmsan.c b/arch/s390/boot/kmsan.c
new file mode 100644
index ..e7b3ac48143e
--- /dev/null
+++ b/arch/s390/boot/kmsan.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+
+void kmsan_unpoison_memory(const void *address, size_t size)
+{
+}
-- 
2.41.0




[PATCH v2 26/33] s390/ftrace: Unpoison ftrace_regs in kprobe_ftrace_handler()

2023-11-21 Thread Ilya Leoshkevich
s390 uses assembly code to initialize ftrace_regs and call
kprobe_ftrace_handler(). Therefore, from KMSAN's point of view,
ftrace_regs is poisoned on kprobe_ftrace_handler() entry. This causes
KMSAN warnings when running the ftrace testsuite.

Fix by trusting the assembly code and always unpoisoning ftrace_regs in
kprobe_ftrace_handler().

Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/kernel/ftrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..3bad34eaa51e 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -300,6 +300,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
return;
 
+   kmsan_unpoison_memory(fregs, sizeof(*fregs));
regs = ftrace_get_regs(fregs);
p = get_kprobe((kprobe_opcode_t *)ip);
if (!regs || unlikely(!p) || kprobe_disabled(p))
-- 
2.41.0




[PATCH v2 28/33] s390/string: Add KMSAN support

2023-11-21 Thread Ilya Leoshkevich
Add KMSAN support for the s390 implementations of the string functions.
Do this similarly to how it's already done for KASAN, except that the
optimized memset{16,32,64}() functions need to be disabled: it's
important for KMSAN to know that they initialized something.

The way boot code is built with regard to string functions is
problematic, since most files assume they are built with sanitizers,
but boot/string.c isn't. This creates various problems with the
memset64() definitions, depending on whether the code is built with
sanitizers or fortify. This should probably be streamlined, but in the
meantime resolve the issues by introducing the IN_BOOT_STRING_C macro,
similar to the existing IN_ARCH_STRING_C macro.

Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/boot/string.c| 16 
 arch/s390/include/asm/string.h | 20 +++-
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/s390/boot/string.c b/arch/s390/boot/string.c
index faccb33b462c..f6b9b1df48a8 100644
--- a/arch/s390/boot/string.c
+++ b/arch/s390/boot/string.c
@@ -1,11 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
+#define IN_BOOT_STRING_C 1
 #include 
 #include 
 #include 
 #undef CONFIG_KASAN
 #undef CONFIG_KASAN_GENERIC
+#undef CONFIG_KMSAN
 #include "../lib/string.c"
 
+/*
+ * Duplicate some functions from the common lib/string.c
+ * instead of fully including it.
+ */
+
 int strncmp(const char *cs, const char *ct, size_t count)
 {
unsigned char c1, c2;
@@ -22,6 +29,15 @@ int strncmp(const char *cs, const char *ct, size_t count)
return 0;
 }
 
+void *memset64(uint64_t *s, uint64_t v, size_t count)
+{
+   uint64_t *xs = s;
+
+   while (count--)
+   *xs++ = v;
+   return s;
+}
+
 char *skip_spaces(const char *str)
 {
while (isspace(*str))
diff --git a/arch/s390/include/asm/string.h b/arch/s390/include/asm/string.h
index 351685de53d2..2ab868cbae6c 100644
--- a/arch/s390/include/asm/string.h
+++ b/arch/s390/include/asm/string.h
@@ -15,15 +15,12 @@
 #define __HAVE_ARCH_MEMCPY /* gcc builtin & arch function */
 #define __HAVE_ARCH_MEMMOVE/* gcc builtin & arch function */
 #define __HAVE_ARCH_MEMSET /* gcc builtin & arch function */
-#define __HAVE_ARCH_MEMSET16   /* arch function */
-#define __HAVE_ARCH_MEMSET32   /* arch function */
-#define __HAVE_ARCH_MEMSET64   /* arch function */
 
 void *memcpy(void *dest, const void *src, size_t n);
 void *memset(void *s, int c, size_t n);
 void *memmove(void *dest, const void *src, size_t n);
 
-#ifndef CONFIG_KASAN
+#if !defined(CONFIG_KASAN) && !defined(CONFIG_KMSAN)
 #define __HAVE_ARCH_MEMCHR /* inline & arch function */
 #define __HAVE_ARCH_MEMCMP /* arch function */
 #define __HAVE_ARCH_MEMSCAN/* inline & arch function */
@@ -36,6 +33,9 @@ void *memmove(void *dest, const void *src, size_t n);
 #define __HAVE_ARCH_STRNCPY/* arch function */
 #define __HAVE_ARCH_STRNLEN/* inline & arch function */
 #define __HAVE_ARCH_STRSTR /* arch function */
+#define __HAVE_ARCH_MEMSET16   /* arch function */
+#define __HAVE_ARCH_MEMSET32   /* arch function */
+#define __HAVE_ARCH_MEMSET64   /* arch function */
 
 /* Prototypes for non-inlined arch strings functions. */
 int memcmp(const void *s1, const void *s2, size_t n);
@@ -44,7 +44,7 @@ size_t strlcat(char *dest, const char *src, size_t n);
 char *strncat(char *dest, const char *src, size_t n);
 char *strncpy(char *dest, const char *src, size_t n);
 char *strstr(const char *s1, const char *s2);
-#endif /* !CONFIG_KASAN */
+#endif /* !defined(CONFIG_KASAN) && !defined(CONFIG_KMSAN) */
 
 #undef __HAVE_ARCH_STRCHR
 #undef __HAVE_ARCH_STRNCHR
@@ -74,20 +74,30 @@ void *__memset16(uint16_t *s, uint16_t v, size_t count);
 void *__memset32(uint32_t *s, uint32_t v, size_t count);
 void *__memset64(uint64_t *s, uint64_t v, size_t count);
 
+#ifdef __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t count)
 {
return __memset16(s, v, count * sizeof(v));
 }
+#endif
 
+#ifdef __HAVE_ARCH_MEMSET32
 static inline void *memset32(uint32_t *s, uint32_t v, size_t count)
 {
return __memset32(s, v, count * sizeof(v));
 }
+#endif
 
+#ifdef __HAVE_ARCH_MEMSET64
+#ifdef IN_BOOT_STRING_C
+void *memset64(uint64_t *s, uint64_t v, size_t count);
+#else
 static inline void *memset64(uint64_t *s, uint64_t v, size_t count)
 {
return __memset64(s, v, count * sizeof(v));
 }
+#endif
+#endif
 
 #if !defined(IN_ARCH_STRING_C) && (!defined(CONFIG_FORTIFY_SOURCE) || 
defined(__NO_FORTIFY))
 
-- 
2.41.0




[PATCH v2 25/33] s390/cpacf: Unpoison the results of cpacf_trng()

2023-11-21 Thread Ilya Leoshkevich
Prevent KMSAN from complaining about buffers filled by cpacf_trng()
being uninitialized.

Tested-by: Alexander Gordeev 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/include/asm/cpacf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h
index b378e2b57ad8..a72b92770c4b 100644
--- a/arch/s390/include/asm/cpacf.h
+++ b/arch/s390/include/asm/cpacf.h
@@ -473,6 +473,8 @@ static inline void cpacf_trng(u8 *ucbuf, unsigned long 
ucbuf_len,
: [ucbuf] "+&d" (u.pair), [cbuf] "+&d" (c.pair)
: [fc] "K" (CPACF_PRNO_TRNG), [opc] "i" (CPACF_PRNO)
: "cc", "memory", "0");
+   kmsan_unpoison_memory(ucbuf, ucbuf_len);
+   kmsan_unpoison_memory(cbuf, cbuf_len);
 }
 
 /**
-- 
2.41.0




[PATCH v2 31/33] s390/unwind: Disable KMSAN checks

2023-11-21 Thread Ilya Leoshkevich
The unwind code can read uninitialized frames. Furthermore, even in
the good case, KMSAN does not emit shadow for backchains. Therefore
disable KMSAN checks for the unwinding functions.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/kernel/unwind_bc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 0ece156fdd7c..cd44be2b6ce8 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -49,6 +49,8 @@ static inline bool is_final_pt_regs(struct unwind_state 
*state,
   READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE;
 }
 
+/* Avoid KMSAN false positives from touching uninitialized frames. */
+__no_kmsan_checks
 bool unwind_next_frame(struct unwind_state *state)
 {
struct stack_info *info = &state->stack_info;
@@ -118,6 +120,8 @@ bool unwind_next_frame(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_next_frame);
 
+/* Avoid KMSAN false positives from touching uninitialized frames. */
+__no_kmsan_checks
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long first_frame)
 {
-- 
2.41.0




[PATCH v2 29/33] s390/traps: Unpoison the kernel_stack_overflow()'s pt_regs

2023-11-21 Thread Ilya Leoshkevich
This is normally done by the generic entry code, but the
kernel_stack_overflow() flow bypasses it.

Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/kernel/traps.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
index 1d2aa448d103..f299b1203a20 100644
--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -260,6 +261,11 @@ static void monitor_event_exception(struct pt_regs *regs)
 
 void kernel_stack_overflow(struct pt_regs *regs)
 {
+   /*
+* Normally regs are unpoisoned by the generic entry code, but
+* kernel_stack_overflow() is a rare case that is called bypassing it.
+*/
+   kmsan_unpoison_entry_regs(regs);
bust_spinlocks(1);
printk("Kernel stack overflow.\n");
show_regs(regs);
-- 
2.41.0




[PATCH v2 32/33] s390: Implement the architecture-specific kmsan functions

2023-11-21 Thread Ilya Leoshkevich
arch_kmsan_get_meta_or_null() finds the lowcore shadow by querying the
prefix and calling kmsan_get_metadata() again.

kmsan_virt_addr_valid() delegates to virt_addr_valid().

Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/include/asm/kmsan.h | 36 +++
 1 file changed, 36 insertions(+)
 create mode 100644 arch/s390/include/asm/kmsan.h

diff --git a/arch/s390/include/asm/kmsan.h b/arch/s390/include/asm/kmsan.h
new file mode 100644
index ..afec71e9e9ac
--- /dev/null
+++ b/arch/s390/include/asm/kmsan.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_KMSAN_H
+#define _ASM_S390_KMSAN_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef MODULE
+
+static inline void *arch_kmsan_get_meta_or_null(void *addr, bool is_origin)
+{
+   if (addr >= (void *)&S390_lowcore &&
+   addr < (void *)(&S390_lowcore + 1)) {
+   /*
+* Different lowcores accessed via S390_lowcore are described
+* by the same struct page. Resolve the prefix manually in
+* order to get a distinct struct page.
+*/
+   addr += (void *)lowcore_ptr[raw_smp_processor_id()] -
+   (void *)&S390_lowcore;
+   return kmsan_get_metadata(addr, is_origin);
+   }
+   return NULL;
+}
+
+static inline bool kmsan_virt_addr_valid(void *addr)
+{
+   return virt_addr_valid(addr);
+}
+
+#endif /* !MODULE */
+
+#endif /* _ASM_S390_KMSAN_H */
-- 
2.41.0




[PATCH v2 33/33] kmsan: Enable on s390

2023-11-21 Thread Ilya Leoshkevich
Now that everything else is in place, enable KMSAN in Kconfig.

Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3bec98d20283..160ad2220c53 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -153,6 +153,7 @@ config S390
select HAVE_ARCH_KASAN
select HAVE_ARCH_KASAN_VMALLOC
select HAVE_ARCH_KCSAN
+   select HAVE_ARCH_KMSAN
select HAVE_ARCH_KFENCE
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_SECCOMP_FILTER
-- 
2.41.0




[PATCH v2 30/33] s390/uaccess: Add KMSAN support to put_user() and get_user()

2023-11-21 Thread Ilya Leoshkevich
put_user() uses inline assembly with precise constraints, so Clang is
in principle capable of instrumenting it automatically. Unfortunately,
one of the constraints contains a dereferenced user pointer, and Clang
does not currently distinguish user and kernel pointers. Therefore
KMSAN attempts to access the shadow for user pointers, which is not the
right thing to do.

An obvious fix to add __no_sanitize_memory to __put_user_fn() does not
work, since it's __always_inline. And __always_inline cannot be removed
due to the __put_user_bad() trick.

A different obvious fix, using the "a" constraint instead of "+Q",
degrades the quality of the generated code, which is important here,
since this is a hot path.

Instead, repurpose the __put_user_asm() macro to define
__put_user_{char,short,int,long}_noinstr() functions and mark them with
__no_sanitize_memory. For the non-KMSAN builds make them
__always_inline in order to keep the generated code quality. Also
define __put_user_{char,short,int,long}() functions, which call the
aforementioned ones and which *are* instrumented, because they call
KMSAN hooks, which may be implemented as macros.

The same applies to get_user() as well.

Acked-by: Heiko Carstens 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/include/asm/uaccess.h | 110 ++--
 1 file changed, 78 insertions(+), 32 deletions(-)

diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index 81ae8a98e7ec..b0715b88b55a 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -78,13 +78,23 @@ union oac {
 
 int __noreturn __put_user_bad(void);
 
-#define __put_user_asm(to, from, size) \
-({ \
+#ifdef CONFIG_KMSAN
+#define GET_PUT_USER_NOINSTR_ATTRIBUTES inline __no_sanitize_memory
+#else
+#define GET_PUT_USER_NOINSTR_ATTRIBUTES __always_inline
+#endif
+
+#define DEFINE_PUT_USER(type)  \
+static GET_PUT_USER_NOINSTR_ATTRIBUTES int \
+__put_user_##type##_noinstr(unsigned type __user *to,  \
+   unsigned type *from,\
+   unsigned long size) \
+{  \
union oac __oac_spec = {\
.oac1.as = PSW_BITS_AS_SECONDARY,   \
.oac1.a = 1,\
};  \
-   int __rc;   \
+   int rc; \
\
asm volatile(   \
"   lr  0,%[spec]\n"\
@@ -93,12 +103,28 @@ int __noreturn __put_user_bad(void);
"2:\n"  \
EX_TABLE_UA_STORE(0b, 2b, %[rc])\
EX_TABLE_UA_STORE(1b, 2b, %[rc])\
-   : [rc] "=&d" (__rc), [_to] "+Q" (*(to)) \
+   : [rc] "=&d" (rc), [_to] "+Q" (*(to))   \
: [_size] "d" (size), [_from] "Q" (*(from)),\
  [spec] "d" (__oac_spec.val)   \
: "cc", "0");   \
-   __rc;   \
-})
+   return rc;  \
+}  \
+   \
+static __always_inline int \
+__put_user_##type(unsigned type __user *to, unsigned type *from,   \
+ unsigned long size)   \
+{  \
+   int rc; \
+   \
+   rc = __put_user_##type##_noinstr(to, from, size);   \
+   instrument_put_user(*from, to, size);   \
+   return rc;  \
+}
+
+DEFINE_PUT_USER(char);
+DEFINE_PUT_USER(short);
+DEFINE_PUT_USER(int);
+DEFINE_PUT_USER(long);
 
 static __always_inline int __put_user_fn(void *x, void __user *ptr, unsigned 
long size)
 {
@@ -106,24 +132,24 @@ static __always_inline int __put_user_fn(void *x, void 
__user *ptr, unsigned lon
 
switch (size) {

[PATCH v2 13/33] kmsan: Introduce memset_no_sanitize_memory()

2023-11-21 Thread Ilya Leoshkevich
Add a wrapper for memset() that prevents unpoisoning. This is useful
for filling memory allocator redzones.

Signed-off-by: Ilya Leoshkevich 
---
 include/linux/kmsan.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index ff8fd95733fa..439df72c8dc6 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -345,4 +345,13 @@ static inline void *kmsan_get_metadata(void *addr, bool 
is_origin)
 
 #endif
 
+/**
+ * memset_no_sanitize_memory() - memset() without the KMSAN instrumentation.
+ */
+__no_sanitize_memory
+static inline void *memset_no_sanitize_memory(void *s, int c, size_t n)
+{
+   return memset(s, c, n);
+}
+
 #endif /* _LINUX_KMSAN_H */
-- 
2.41.0
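
A hypothetical usage sketch (the helper and names below are made up for
illustration and are not part of this series): an allocator that wants
to fill a redzone without marking it as initialized could do

  static void fill_redzone(void *object, size_t object_size,
                           size_t redzone_size)
  {
          /* A plain memset() would be instrumented and would unpoison
           * the redzone; the wrapper leaves the KMSAN shadow untouched,
           * so the redzone stays poisoned.
           */
          memset_no_sanitize_memory(object + object_size, 0xaa,
                                    redzone_size);
  }
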




[PATCH v2 06/33] kmsan: Fix kmsan_copy_to_user() on arches with overlapping address spaces

2023-11-21 Thread Ilya Leoshkevich
Comparing pointers with TASK_SIZE does not make sense when kernel and
userspace overlap. Assume that we are handling user memory access in
this case.

Reported-by: Alexander Gordeev 
Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/hooks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 5d6e2dee5692..eafc45f937eb 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -267,7 +267,8 @@ void kmsan_copy_to_user(void __user *to, const void *from, 
size_t to_copy,
return;
 
ua_flags = user_access_save();
-   if ((u64)to < TASK_SIZE) {
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE) ||
+   (u64)to < TASK_SIZE) {
/* This is a user memory access, check it. */
kmsan_internal_check_memory((void *)from, to_copy - left, to,
REASON_COPY_TO_USER);
-- 
2.41.0




[PATCH v2 12/33] kmsan: Allow disabling KMSAN checks for the current task

2023-11-21 Thread Ilya Leoshkevich
Like for KASAN, it's useful to temporarily disable KMSAN checks around,
e.g., redzone accesses. Introduce kmsan_disable_current() and
kmsan_enable_current(), which are similar to their KASAN counterparts.

Even though it's not strictly necessary, make them reentrant, in order
to match the KASAN behavior. Repurpose the allow_reporting field for
this.

Signed-off-by: Ilya Leoshkevich 
---
 Documentation/dev-tools/kmsan.rst |  4 ++--
 include/linux/kmsan-checks.h  | 12 
 include/linux/kmsan_types.h   |  2 +-
 mm/kmsan/core.c   |  2 +-
 mm/kmsan/hooks.c  | 14 +-
 mm/kmsan/report.c |  6 +++---
 6 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/Documentation/dev-tools/kmsan.rst 
b/Documentation/dev-tools/kmsan.rst
index 323eedad53cd..022a823f5f1b 100644
--- a/Documentation/dev-tools/kmsan.rst
+++ b/Documentation/dev-tools/kmsan.rst
@@ -338,11 +338,11 @@ Per-task KMSAN state
 
 
 Every task_struct has an associated KMSAN task state that holds the KMSAN
-context (see above) and a per-task flag disallowing KMSAN reports::
+context (see above) and a per-task counter disallowing KMSAN reports::
 
   struct kmsan_context {
 ...
-bool allow_reporting;
+unsigned int depth;
 struct kmsan_context_state cstate;
 ...
   }
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index 5218973f0ad0..bab2603685f7 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -72,6 +72,10 @@ void kmsan_copy_to_user(void __user *to, const void *from, 
size_t to_copy,
  */
 void kmsan_memmove_metadata(void *dst, const void *src, size_t n);
 
+void kmsan_enable_current(void);
+
+void kmsan_disable_current(void);
+
 #else
 
 static inline void kmsan_poison_memory(const void *address, size_t size,
@@ -92,6 +96,14 @@ static inline void kmsan_memmove_metadata(void *dst, const 
void *src, size_t n)
 {
 }
 
+static inline void kmsan_enable_current(void)
+{
+}
+
+static inline void kmsan_disable_current(void)
+{
+}
+
 #endif
 
 #endif /* _LINUX_KMSAN_CHECKS_H */
diff --git a/include/linux/kmsan_types.h b/include/linux/kmsan_types.h
index 8bfa6c98176d..27bb146ece95 100644
--- a/include/linux/kmsan_types.h
+++ b/include/linux/kmsan_types.h
@@ -29,7 +29,7 @@ struct kmsan_context_state {
 struct kmsan_ctx {
struct kmsan_context_state cstate;
int kmsan_in_runtime;
-   bool allow_reporting;
+   unsigned int depth;
 };
 
 #endif /* _LINUX_KMSAN_TYPES_H */
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index c19f47af0424..b8767378cf8a 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -43,7 +43,7 @@ void kmsan_internal_task_create(struct task_struct *task)
struct thread_info *info = current_thread_info();
 
__memset(ctx, 0, sizeof(*ctx));
-   ctx->allow_reporting = true;
+   ctx->depth = 0;
kmsan_internal_unpoison_memory(info, sizeof(*info), false);
 }
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 4d477a0a356c..7b5814412e9f 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -44,7 +44,7 @@ void kmsan_task_exit(struct task_struct *task)
if (!kmsan_enabled || kmsan_in_runtime())
return;
 
-   ctx->allow_reporting = false;
+   ctx->depth++;
 }
 
 void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
@@ -434,3 +434,15 @@ void kmsan_check_memory(const void *addr, size_t size)
   REASON_ANY);
 }
 EXPORT_SYMBOL(kmsan_check_memory);
+
+void kmsan_enable_current(void)
+{
+   current->kmsan_ctx.depth--;
+}
+EXPORT_SYMBOL(kmsan_enable_current);
+
+void kmsan_disable_current(void)
+{
+   current->kmsan_ctx.depth++;
+}
+EXPORT_SYMBOL(kmsan_disable_current);
diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
index c79d3b0d2d0d..edcf53ca428e 100644
--- a/mm/kmsan/report.c
+++ b/mm/kmsan/report.c
@@ -158,12 +158,12 @@ void kmsan_report(depot_stack_handle_t origin, void 
*address, int size,
 
if (!kmsan_enabled)
return;
-   if (!current->kmsan_ctx.allow_reporting)
+   if (current->kmsan_ctx.depth)
return;
if (!origin)
return;
 
-   current->kmsan_ctx.allow_reporting = false;
+   current->kmsan_ctx.depth++;
ua_flags = user_access_save();
raw_spin_lock(&kmsan_report_lock);
pr_err("=\n");
@@ -216,5 +216,5 @@ void kmsan_report(depot_stack_handle_t origin, void 
*address, int size,
if (panic_on_kmsan)
panic("kmsan.panic set ...\n");
user_access_restore(ua_flags);
-   current->kmsan_ctx.allow_reporting = true;
+   current->kmsan_ctx.depth--;
 }
-- 
2.41.0
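
A hypothetical usage sketch (the helper below is made up for
illustration): because the calls are reentrant, code that deliberately
reads potentially poisoned bytes, such as a redzone check, can simply
bracket the access:

  static bool redzone_is_intact(const u8 *redzone, size_t size)
  {
          bool ok = true;
          size_t i;

          kmsan_disable_current();        /* suppress reports */
          for (i = 0; i < size; i++) {
                  if (redzone[i] != 0xaa) {       /* made-up pattern */
                          ok = false;
                          break;
                  }
          }
          kmsan_enable_current();         /* reports are back on */
          return ok;
  }
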




[PATCH v2 04/33] kmsan: Increase the maximum store size to 4096

2023-11-21 Thread Ilya Leoshkevich
The inline assembly block in s390's chsc() stores that much.

Signed-off-by: Ilya Leoshkevich 
---
 mm/kmsan/instrumentation.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
index cc3907a9c33a..470b0b4afcc4 100644
--- a/mm/kmsan/instrumentation.c
+++ b/mm/kmsan/instrumentation.c
@@ -110,11 +110,10 @@ void __msan_instrument_asm_store(void *addr, uintptr_t 
size)
 
ua_flags = user_access_save();
/*
-* Most of the accesses are below 32 bytes. The two exceptions so far
-* are clwb() (64 bytes) and FPU state (512 bytes).
-* It's unlikely that the assembly will touch more than 512 bytes.
+* Most of the accesses are below 32 bytes. The exceptions so far are
+* clwb() (64 bytes), FPU state (512 bytes) and chsc() (4096 bytes).
 */
-   if (size > 512) {
+   if (size > 4096) {
WARN_ONCE(1, "assembly store size too big: %ld\n", size);
size = 8;
}
-- 
2.41.0




[PATCH v2 09/33] kmsan: Introduce kmsan_memmove_metadata()

2023-11-21 Thread Ilya Leoshkevich
It is useful to manually copy metadata in order to describe the effects
of memmove()-like logic in uninstrumented code or inline asm. Introduce
kmsan_memmove_metadata() for this purpose.

Signed-off-by: Ilya Leoshkevich 
---
 include/linux/kmsan-checks.h | 14 ++
 mm/kmsan/hooks.c | 11 +++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index c4cae333deec..5218973f0ad0 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -61,6 +61,17 @@ void kmsan_check_memory(const void *address, size_t size);
 void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
size_t left);
 
+/**
+ * kmsan_memmove_metadata() - Copy kernel memory range metadata.
+ * @dst: start of the destination kernel memory range.
+ * @src: start of the source kernel memory range.
+ * @n:   size of the memory ranges.
+ *
+ * KMSAN will treat the destination range as if its contents were memmove()d
+ * from the source range.
+ */
+void kmsan_memmove_metadata(void *dst, const void *src, size_t n);
+
 #else
 
 static inline void kmsan_poison_memory(const void *address, size_t size,
@@ -77,6 +88,9 @@ static inline void kmsan_copy_to_user(void __user *to, const 
void *from,
  size_t to_copy, size_t left)
 {
 }
+static inline void kmsan_memmove_metadata(void *dst, const void *src, size_t n)
+{
+}
 
 #endif
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index eafc45f937eb..4d477a0a356c 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -286,6 +286,17 @@ void kmsan_copy_to_user(void __user *to, const void *from, 
size_t to_copy,
 }
 EXPORT_SYMBOL(kmsan_copy_to_user);
 
+void kmsan_memmove_metadata(void *dst, const void *src, size_t n)
+{
+   if (!kmsan_enabled || kmsan_in_runtime())
+   return;
+
+   kmsan_enter_runtime();
+   kmsan_internal_memmove_metadata(dst, (void *)src, n);
+   kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_memmove_metadata);
+
 /* Helper function to check an URB. */
 void kmsan_handle_urb(const struct urb *urb, bool is_out)
 {
-- 
2.41.0
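
A hypothetical usage sketch (the wrapper below is made up for
illustration): after a copy that KMSAN cannot see, the caller describes
the data movement so the destination inherits the source's shadow and
origins:

  static void copy_buf_uninstrumented(void *dst, const void *src,
                                      size_t n)
  {
          /* Stand-in for an inline-asm or firmware-driven copy that is
           * invisible to KMSAN.
           */
          memmove(dst, src, n);

          /* Tell KMSAN that dst now has whatever initialization state
           * src had.
           */
          kmsan_memmove_metadata(dst, src, n);
  }
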




[PATCH v2 19/33] lib/zlib: Unpoison DFLTCC output buffers

2023-11-21 Thread Ilya Leoshkevich
The constraints of the DFLTCC inline assembly are not precise: they
do not communicate the size of the output buffers to the compiler, so
KMSAN cannot instrument the stores automatically.

Add the manual kmsan_unpoison_memory() calls for the output buffers.
The logic is the same as in [1].

[1] 
https://github.com/zlib-ng/zlib-ng/commit/1f5ddcc009ac3511e99fc88736a9e1a6381168c5

Reported-by: Alexander Gordeev 
Signed-off-by: Ilya Leoshkevich 
---
 lib/zlib_dfltcc/dfltcc.h  |  1 +
 lib/zlib_dfltcc/dfltcc_util.h | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h
index b96232bdd44d..0f2a16d7a48a 100644
--- a/lib/zlib_dfltcc/dfltcc.h
+++ b/lib/zlib_dfltcc/dfltcc.h
@@ -80,6 +80,7 @@ struct dfltcc_param_v0 {
 uint8_t csb[1152];
 };
 
+static_assert(offsetof(struct dfltcc_param_v0, csb) == 384);
 static_assert(sizeof(struct dfltcc_param_v0) == 1536);
 
 #define CVT_CRC32 0
diff --git a/lib/zlib_dfltcc/dfltcc_util.h b/lib/zlib_dfltcc/dfltcc_util.h
index 4a46b5009f0d..ce2e039a55b5 100644
--- a/lib/zlib_dfltcc/dfltcc_util.h
+++ b/lib/zlib_dfltcc/dfltcc_util.h
@@ -2,6 +2,7 @@
 #ifndef DFLTCC_UTIL_H
 #define DFLTCC_UTIL_H
 
+#include "dfltcc.h"
 #include 
 
 /*
@@ -20,6 +21,7 @@ typedef enum {
 #define DFLTCC_CMPR 2
 #define DFLTCC_XPND 4
 #define HBT_CIRCULAR (1 << 7)
+#define DFLTCC_FN_MASK ((1 << 7) - 1)
 #define HB_BITS 15
 #define HB_SIZE (1 << HB_BITS)
 
@@ -34,6 +36,7 @@ static inline dfltcc_cc dfltcc(
 )
 {
 Byte *t2 = op1 ? *op1 : NULL;
+unsigned char *orig_t2 = t2;
 size_t t3 = len1 ? *len1 : 0;
 const Byte *t4 = op2 ? *op2 : NULL;
 size_t t5 = len2 ? *len2 : 0;
@@ -59,6 +62,26 @@ static inline dfltcc_cc dfltcc(
  : "cc", "memory");
 t2 = r2; t3 = r3; t4 = r4; t5 = r5;
 
+switch (fn & DFLTCC_FN_MASK) {
+case DFLTCC_QAF:
+kmsan_unpoison_memory(param, sizeof(struct dfltcc_qaf_param));
+break;
+case DFLTCC_GDHT:
+kmsan_unpoison_memory(param, offsetof(struct dfltcc_param_v0, csb));
+break;
+case DFLTCC_CMPR:
+kmsan_unpoison_memory(param, sizeof(struct dfltcc_param_v0));
+kmsan_unpoison_memory(
+orig_t2,
+t2 - orig_t2 +
+(((struct dfltcc_param_v0 *)param)->sbb == 0 ? 0 : 1));
+break;
+case DFLTCC_XPND:
+kmsan_unpoison_memory(param, sizeof(struct dfltcc_param_v0));
+kmsan_unpoison_memory(orig_t2, t2 - orig_t2);
+break;
+}
+
 if (op1)
 *op1 = t2;
 if (len1)
-- 
2.41.0




[PATCH v2 21/33] s390: Turn off KMSAN for boot, vdso and purgatory

2023-11-21 Thread Ilya Leoshkevich
All other sanitizers are disabled for these components as well.
While at it, add a comment to boot and purgatory.

Reviewed-by: Alexander Gordeev 
Reviewed-by: Alexander Potapenko 
Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/boot/Makefile  | 2 ++
 arch/s390/kernel/vdso32/Makefile | 3 ++-
 arch/s390/kernel/vdso64/Makefile | 3 ++-
 arch/s390/purgatory/Makefile | 2 ++
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index c7c81e5f9218..fb10fcd21221 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -3,11 +3,13 @@
 # Makefile for the linux s390-specific parts of the memory manager.
 #
 
+# Tooling runtimes are unavailable and cannot be linked for early boot code
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
 
 KBUILD_AFLAGS := $(KBUILD_AFLAGS_DECOMPRESSOR)
 KBUILD_CFLAGS := $(KBUILD_CFLAGS_DECOMPRESSOR)
diff --git a/arch/s390/kernel/vdso32/Makefile b/arch/s390/kernel/vdso32/Makefile
index caec7db6f966..7cbec6b0b11f 100644
--- a/arch/s390/kernel/vdso32/Makefile
+++ b/arch/s390/kernel/vdso32/Makefile
@@ -32,11 +32,12 @@ obj-y += vdso32_wrapper.o
 targets += vdso32.lds
 CPPFLAGS_vdso32.lds += -P -C -U$(ARCH)
 
-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov profiling, ubsan, kasan and kmsan for VDSO code
 GCOV_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
 
 # Force dependency (incbin is bad)
 $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index e3c9085f8fa7..6f3252712f64 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -36,11 +36,12 @@ obj-y += vdso64_wrapper.o
 targets += vdso64.lds
 CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
 
-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov profiling, ubsan, kasan and kmsan for VDSO code
 GCOV_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
 
 # Force dependency (incbin is bad)
 $(obj)/vdso64_wrapper.o : $(obj)/vdso64.so
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index 4e930f566878..4e421914e50f 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -15,11 +15,13 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
 $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
$(call if_changed_rule,as_o_S)
 
+# Tooling runtimes are unavailable and cannot be linked for purgatory code
 KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
 UBSAN_SANITIZE := n
 KASAN_SANITIZE := n
 KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
 
 KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes
 KBUILD_CFLAGS += -Wno-pointer-sign -Wno-sign-compare
-- 
2.41.0




[PATCH v2 27/33] s390/mm: Define KMSAN metadata for vmalloc and modules

2023-11-21 Thread Ilya Leoshkevich
The pages for the KMSAN metadata associated with most kernel mappings
are taken from memblock by the common code. However, vmalloc and module
metadata needs to be defined by the architectures.

Be a little bit more careful than x86: allocate exactly MODULES_LEN
for the module shadow and origins, and then take 2/3 of vmalloc for
the vmalloc shadow and origins. This ensures that users passing small
vmalloc= values on the command line do not cause module metadata
collisions.

Signed-off-by: Ilya Leoshkevich 
---
 arch/s390/boot/startup.c|  8 
 arch/s390/include/asm/pgtable.h | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
index 8104e0e3d188..e37e7ffda430 100644
--- a/arch/s390/boot/startup.c
+++ b/arch/s390/boot/startup.c
@@ -253,9 +253,17 @@ static unsigned long setup_kernel_memory_layout(void)
MODULES_END = round_down(__abs_lowcore, _SEGMENT_SIZE);
MODULES_VADDR = MODULES_END - MODULES_LEN;
VMALLOC_END = MODULES_VADDR;
+#ifdef CONFIG_KMSAN
+   VMALLOC_END -= MODULES_LEN * 2;
+#endif
 
/* allow vmalloc area to occupy up to about 1/2 of the rest virtual 
space left */
vmalloc_size = min(vmalloc_size, round_down(VMALLOC_END / 2, 
_REGION3_SIZE));
+#ifdef CONFIG_KMSAN
+   /* take 2/3 of vmalloc area for KMSAN shadow and origins */
+   vmalloc_size = round_down(vmalloc_size / 3, _REGION3_SIZE);
+   VMALLOC_END -= vmalloc_size * 2;
+#endif
VMALLOC_START = VMALLOC_END - vmalloc_size;
 
/* split remaining virtual space between 1:1 mapping & vmemmap array */
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 601e87fa8a9a..d764abeb9e6d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -107,6 +107,16 @@ static inline int is_module_addr(void *addr)
return 1;
 }
 
+#ifdef CONFIG_KMSAN
+#define KMSAN_VMALLOC_SIZE (VMALLOC_END - VMALLOC_START)
+#define KMSAN_VMALLOC_SHADOW_START VMALLOC_END
+#define KMSAN_VMALLOC_ORIGIN_START (KMSAN_VMALLOC_SHADOW_START + \
+   KMSAN_VMALLOC_SIZE)
+#define KMSAN_MODULES_SHADOW_START (KMSAN_VMALLOC_ORIGIN_START + \
+   KMSAN_VMALLOC_SIZE)
+#define KMSAN_MODULES_ORIGIN_START (KMSAN_MODULES_SHADOW_START + MODULES_LEN)
+#endif
+
 /*
  * A 64 bit pagetable entry of S390 has following format:
  * |PFRA |0IPC|  OS  |
-- 
2.41.0
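
To make the carve-out concrete, a worked sketch with made-up numbers
(not taken from the patch): assume MODULES_LEN is 512 MiB and that,
after the module metadata is reserved, the non-KMSAN logic would have
given vmalloc 96 GiB. The vmalloc area then keeps 96 / 3 = 32 GiB, and
the space above VMALLOC_END is laid out, in ascending order, as:

  [VMALLOC_START, VMALLOC_END)        32 GiB   usable vmalloc
  [VMALLOC_END, +32 GiB)              32 GiB   vmalloc shadow
  [+32 GiB, +64 GiB)                  32 GiB   vmalloc origins
  [+64 GiB, +64 GiB + 512 MiB)       512 MiB   module shadow
  [+64 GiB + 512 MiB, +65 GiB)       512 MiB   module origins
  [MODULES_VADDR, MODULES_END)       512 MiB   modules
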




[syzbot] [bpf?] [trace?] WARNING in format_decode (3)

2023-11-21 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:76df934c6d5f MAINTAINERS: Add netdev subsystem profile link
git tree:   net
console+strace: https://syzkaller.appspot.com/x/log.txt?x=10c2b66768
kernel config:  https://syzkaller.appspot.com/x/.config?x=84217b7fc4acdc59
dashboard link: https://syzkaller.appspot.com/bug?extid=e2c932aec5c8a6e1d31c
compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 
2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12b2f668e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=171ea200e8

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/e271179068c6/disk-76df934c.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/b9523b3749bb/vmlinux-76df934c.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/6c1a888bade0/bzImage-76df934c.xz

The issue was bisected to:

commit 114039b342014680911c35bd6b72624180fd669a
Author: Stanislav Fomichev 
Date:   Mon Nov 21 18:03:39 2022 +

bpf: Move skb->len == 0 checks into __bpf_redirect

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13d237b768
final oops: https://syzkaller.appspot.com/x/report.txt?x=103237b768
console output: https://syzkaller.appspot.com/x/log.txt?x=17d237b768

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e2c932aec5c8a6e1d...@syzkaller.appspotmail.com
Fixes: 114039b34201 ("bpf: Move skb->len == 0 checks into __bpf_redirect")

[ cut here ]
Please remove unsupported % in format string
WARNING: CPU: 0 PID: 5068 at lib/vsprintf.c:2675 format_decode+0xa03/0xba0 
lib/vsprintf.c:2675
Modules linked in:
CPU: 0 PID: 5068 Comm: syz-executor288 Not tainted 
6.7.0-rc1-syzkaller-00134-g76df934c6d5f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
11/10/2023
RIP: 0010:format_decode+0xa03/0xba0 lib/vsprintf.c:2675
Code: f7 41 c6 44 24 05 08 e9 c4 fa ff ff e8 c6 f7 15 f7 c6 05 0b bd 91 04 01 
90 48 c7 c7 60 5f 19 8c 40 0f b6 f5 e8 2e 17 dc f6 90 <0f> 0b 90 90 e9 17 fc ff 
ff 48 8b 3c 24 e8 4b 87 6c f7 e9 13 f7 ff
RSP: 0018:c90003b6f798 EFLAGS: 00010286
RAX:  RBX: c90003b6fa0c RCX: 814db209
RDX: 8880214b9dc0 RSI: 814db216 RDI: 0001
RBP:  R08: 0001 R09: 
R10:  R11: 0001 R12: c90003b6f898
R13:  R14:  R15: ffd0
FS:  5567c380() GS:8880b980() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 00f6f398 CR3: 251e7000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 bstr_printf+0x13b/0x1050 lib/vsprintf.c:3248
 bpf_trace_printk kernel/trace/bpf_trace.c:386 [inline]
 bpf_trace_printk+0x10b/0x180 kernel/trace/bpf_trace.c:371
 bpf_prog_12183cdb1cd51dab+0x36/0x3a
 bpf_dispatcher_nop_func include/linux/bpf.h:1196 [inline]
 __bpf_prog_run include/linux/filter.h:651 [inline]
 bpf_prog_run include/linux/filter.h:658 [inline]
 bpf_test_run+0x3e1/0x9e0 net/bpf/test_run.c:423
 bpf_prog_test_run_skb+0xb75/0x1dd0 net/bpf/test_run.c:1045
 bpf_prog_test_run kernel/bpf/syscall.c:4040 [inline]
 __sys_bpf+0x11bf/0x4920 kernel/bpf/syscall.c:5401
 __do_sys_bpf kernel/bpf/syscall.c:5487 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5485 [inline]
 __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5485
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fefcec014e9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffca6179888 EFLAGS: 0246 ORIG_RAX: 0141
RAX: ffda RBX: 7ffca6179a58 RCX: 7fefcec014e9
RDX: 0028 RSI: 2080 RDI: 000a
RBP: 7fefcec74610 R08:  R09: 7ffca6179a58
R10:  R11: 0246 R12: 0001
R13: 7ffca6179a48 R14: 0001 R15: 0001
 


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach

[PATCH 05/14] tools headers UAPI: Update tools's copy of vhost.h header

2023-11-21 Thread Namhyung Kim
tldr; Just FYI, I'm carrying this on the perf tools tree.

Full explanation:

There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.

The way these headers are used in perf is not restricted to just
including them to compile something.

They are sometimes used in scripts that convert defines into string
tables, etc., so a change may break one of these scripts, or new MSRs
may use a different #define pattern, etc.

E.g.:

  $ ls -1 tools/perf/trace/beauty/*.sh | head -5
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/fsmount.sh
  $
  $ tools/perf/trace/beauty/fadvise.sh
  static const char *fadvise_advices[] = {
[0] = "NORMAL",
[1] = "RANDOM",
[2] = "SEQUENTIAL",
[3] = "WILLNEED",
[4] = "DONTNEED",
[5] = "NOREUSE",
  };
  $

The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.

So it's important not to touch the copies in tools/ when changing the
original kernel headers; that will be done later, when
check-headers.sh informs the perf tools hackers about the change.

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: k...@vger.kernel.org
Cc: virtualizat...@lists.linux.dev
Cc: net...@vger.kernel.org
Signed-off-by: Namhyung Kim 
---
 tools/include/uapi/linux/vhost.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/include/uapi/linux/vhost.h b/tools/include/uapi/linux/vhost.h
index f5c48b61ab62..649560c685f1 100644
--- a/tools/include/uapi/linux/vhost.h
+++ b/tools/include/uapi/linux/vhost.h
@@ -219,4 +219,12 @@
  */
 #define VHOST_VDPA_RESUME  _IO(VHOST_VIRTIO, 0x7E)
 
+/* Get the group for the descriptor table including driver & device areas
+ * of a virtqueue: read index, write group in num.
+ * The virtqueue index is stored in the index field of vhost_vring_state.
+ * The group ID of the descriptor table for this specific virtqueue
+ * is returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_DESC_GROUP _IOWR(VHOST_VIRTIO, 0x7F, \
+                                              struct vhost_vring_state)
 #endif
-- 
2.43.0.rc1.413.gea7ed67945-goog




Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support

2023-11-21 Thread Mathieu Poirier
Hi,

On Fri, Nov 17, 2023 at 09:42:37AM -0800, Tanmay Shah wrote:
> Use TCM pm domains extracted from device-tree
> to power on/off TCM using general pm domain framework.
> 
> Signed-off-by: Tanmay Shah 
> ---
> 
> Changes in v7:
>   - %s/pm_dev1/pm_dev_core0/r
>   - %s/pm_dev_link1/pm_dev_core0_link/r
>   - %s/pm_dev2/pm_dev_core1/r
>   - %s/pm_dev_link2/pm_dev_core1_link/r
>   - remove pm_domain_id check to move next patch
>   - add comment about how 1st entry in pm domain list is used
>   - fix loop when jump to fail_add_pm_domains loop
> 
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 215 +++-
>  1 file changed, 212 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> b/drivers/remoteproc/xlnx_r5_remoteproc.c
> index 4395edea9a64..22bccc5075a0 100644
> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "remoteproc_internal.h"
>  
> @@ -102,6 +103,12 @@ static const struct mem_bank_data 
> zynqmp_tcm_banks_lockstep[] = {
>   * @rproc: rproc handle
>   * @pm_domain_id: RPU CPU power domain id
>   * @ipi: pointer to mailbox information
> + * @num_pm_dev: number of tcm pm domain devices for this core
> + * @pm_dev_core0: pm domain virtual devices for power domain framework
> + * @pm_dev_core0_link: pm domain device links after registration
> + * @pm_dev_core1: used only in lockstep mode. second core's pm domain 
> virtual devices
> + * @pm_dev_core1_link: used only in lockstep mode. second core's pm device 
> links after
> + * registration
>   */
>  struct zynqmp_r5_core {
>   struct device *dev;
> @@ -111,6 +118,11 @@ struct zynqmp_r5_core {
>   struct rproc *rproc;
>   u32 pm_domain_id;
>   struct mbox_info *ipi;
> + int num_pm_dev;
> + struct device **pm_dev_core0;
> + struct device_link **pm_dev_core0_link;
> + struct device **pm_dev_core1;
> + struct device_link **pm_dev_core1_link;
>  };
>  
>  /**
> @@ -651,7 +663,8 @@ static int add_tcm_carveout_lockstep_mode(struct rproc 
> *rproc)
>ZYNQMP_PM_CAPABILITY_ACCESS, 0,
>ZYNQMP_PM_REQUEST_ACK_BLOCKING);
>   if (ret < 0) {
> - dev_err(dev, "failed to turn on TCM 0x%x", 
> pm_domain_id);
> + dev_err(dev, "failed to turn on TCM 0x%x",
> + pm_domain_id);

Spurious change, you should have caught that.

>   goto release_tcm_lockstep;
>   }
>  
> @@ -758,6 +771,189 @@ static int zynqmp_r5_parse_fw(struct rproc *rproc, 
> const struct firmware *fw)
>   return ret;
>  }
>  
> +static void zynqmp_r5_remove_pm_domains(struct rproc *rproc)
> +{
> + struct zynqmp_r5_core *r5_core = rproc->priv;
> + struct device *dev = r5_core->dev;
> + struct zynqmp_r5_cluster *cluster;
> + int i;
> +
> + cluster = platform_get_drvdata(to_platform_device(dev->parent));
> +
> + for (i = 1; i < r5_core->num_pm_dev; i++) {
> + device_link_del(r5_core->pm_dev_core0_link[i]);
> + dev_pm_domain_detach(r5_core->pm_dev_core0[i], false);
> + }
> +
> + kfree(r5_core->pm_dev_core0);
> + r5_core->pm_dev_core0 = NULL;
> + kfree(r5_core->pm_dev_core0_link);
> + r5_core->pm_dev_core0_link = NULL;
> +
> + if (cluster->mode == SPLIT_MODE) {
> + r5_core->num_pm_dev = 0;
> + return;
> + }
> +
> + for (i = 1; i < r5_core->num_pm_dev; i++) {
> + device_link_del(r5_core->pm_dev_core1_link[i]);
> + dev_pm_domain_detach(r5_core->pm_dev_core1[i], false);
> + }
> +
> + kfree(r5_core->pm_dev_core1);
> + r5_core->pm_dev_core1 = NULL;
> + kfree(r5_core->pm_dev_core1_link);
> + r5_core->pm_dev_core1_link = NULL;
> + r5_core->num_pm_dev = 0;
> +}
> +
> +static int zynqmp_r5_add_pm_domains(struct rproc *rproc)
> +{
> + struct zynqmp_r5_core *r5_core = rproc->priv;
> + struct device *dev = r5_core->dev, *dev2;
> + struct zynqmp_r5_cluster *cluster;
> + struct platform_device *pdev;
> + struct device_node *np;
> + int i, j, num_pm_dev, ret;
> +
> + cluster = dev_get_drvdata(dev->parent);
> +
> + /* get number of power-domains */
> + num_pm_dev = of_count_phandle_with_args(r5_core->np, "power-domains",
> + "#power-domain-cells");
> +
> + if (num_pm_dev <= 0)
> + return -EINVAL;
> +
> + r5_core->pm_dev_core0 = kcalloc(num_pm_dev,
> + sizeof(struct device *),
> + GFP_KERNEL);
> + if (!r5_core->pm_dev_core0)
> + ret = -ENOMEM;
> +
> + r5_core->pm_dev_core0_link = kcalloc(num_pm_dev,
> +  sizeof(struct device_link *),
> +   

[PATCH 1/4] eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held

2023-11-21 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

If memory reclaim happens, it can reclaim file system pages, and
reclaiming eventfs pages may take the eventfs_mutex. This means that
allocations made while holding the eventfs_mutex must not call into
filesystem reclaim. A lockdep splat uncovered this.

Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode")
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Mark Rutland 
Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 3eb6c622a74d..56d192f0ead8 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -95,7 +95,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct 
dentry *dentry,
if (!(dentry->d_inode->i_mode & S_IFDIR)) {
if (!ei->entry_attrs) {
ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * 
ei->nr_entries,
- GFP_KERNEL);
+ GFP_NOFS);
if (!ei->entry_attrs) {
ret = -ENOMEM;
goto out;
@@ -627,7 +627,7 @@ static int add_dentries(struct dentry ***dentries, struct 
dentry *d, int cnt)
 {
struct dentry **tmp;
 
-   tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_KERNEL);
+   tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_NOFS);
if (!tmp)
return -1;
tmp[cnt] = d;
-- 
2.42.0
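
An illustrative sketch of the cycle described above (the call chain is
schematic, not the verbatim lockdep trace):

  eventfs_set_attr()
    mutex_lock(&eventfs_mutex)
    kzalloc(GFP_KERNEL)
      -> direct reclaim may enter filesystem code (__GFP_FS is set)
        -> evicting eventfs/tracefs objects takes eventfs_mutex
          -> deadlock on eventfs_mutex

GFP_NOFS clears __GFP_FS, so the allocation can still reclaim memory
but never re-enters filesystem code that might need eventfs_mutex.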





[PATCH 3/4] eventfs: Do not allow NULL parent to eventfs_start_creating()

2023-11-21 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The eventfs directory is dynamically created via the metadata supplied
by the existing trace events. All files and directories in eventfs have
a parent. Do not allow NULL to be passed into eventfs_start_creating()
as the parent, because that should never happen. Warn if it does.

Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/inode.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 5b54948514fe..ae648deed019 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -509,20 +509,15 @@ struct dentry *eventfs_start_creating(const char *name, 
struct dentry *parent)
struct dentry *dentry;
int error;
 
+   /* Must always have a parent. */
+   if (WARN_ON_ONCE(!parent))
+   return ERR_PTR(-EINVAL);
+
error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
  &tracefs_mount_count);
if (error)
return ERR_PTR(error);
 
-   /*
-* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent)
-   parent = tracefs_mount->mnt_root;
-
if (unlikely(IS_DEADDIR(parent->d_inode)))
dentry = ERR_PTR(-ENOENT);
else
-- 
2.42.0





[PATCH 0/4] eventfs: Some more minor fixes

2023-11-21 Thread Steven Rostedt
Mark Rutland reported some crashes from the latest eventfs updates.
This fixes most of them.

He still has one splat that he can trigger but I can not. Still looking
into that.


Steven Rostedt (Google) (4):
  eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
  eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
  eventfs: Do not allow NULL parent to eventfs_start_creating()
  eventfs: Make sure that parent->d_inode is locked in creating files/dirs


 fs/tracefs/event_inode.c | 24 
 fs/tracefs/inode.c   | 13 -
 2 files changed, 12 insertions(+), 25 deletions(-)



[PATCH 2/4] eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()

2023-11-21 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Both create_file_dentry() and create_dir_dentry() take a boolean
parameter "lookup": on lookup the inode_lock is already held, but when
they are called from dcache_dir_open_wrapper() it is not.

There's no reason dcache_dir_open_wrapper() can't take the inode_lock
before calling these functions. In fact, it's better if it does, as the
lock can then be held throughout both directory and file creation.

This also simplifies the code, and possibly prevents unexpected race
conditions when the lock is released.

Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 56d192f0ead8..590e8176449b 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -347,15 +347,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 
mutex_unlock(&eventfs_mutex);
 
-   /* The lookup already has the parent->d_inode locked */
-   if (!lookup)
-   inode_lock(parent->d_inode);
-
dentry = create_file(name, mode, attr, parent, data, fops);
 
-   if (!lookup)
-   inode_unlock(parent->d_inode);
-
mutex_lock(&eventfs_mutex);
 
if (IS_ERR_OR_NULL(dentry)) {
@@ -453,15 +446,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct 
eventfs_inode *ei,
}
mutex_unlock(&eventfs_mutex);
 
-   /* The lookup already has the parent->d_inode locked */
-   if (!lookup)
-   inode_lock(parent->d_inode);
-
dentry = create_dir(ei, parent);
 
-   if (!lookup)
-   inode_unlock(parent->d_inode);
-
mutex_lock(&eventfs_mutex);
 
if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) {
@@ -693,6 +679,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, 
struct file *file)
return -ENOMEM;
}
 
+   inode_lock(parent->d_inode);
list_for_each_entry_srcu(ei_child, &ei->children, list,
 srcu_read_lock_held(&eventfs_srcu)) {
d = create_dir_dentry(ei, ei_child, parent, false);
@@ -725,6 +712,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, 
struct file *file)
cnt++;
}
}
+   inode_unlock(parent->d_inode);
srcu_read_unlock(&eventfs_srcu, idx);
ret = dcache_dir_open(inode, file);
 
-- 
2.42.0





[PATCH 4/4] eventfs: Make sure that parent->d_inode is locked in creating files/dirs

2023-11-21 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Since the locking of the parent->d_inode has been moved outside the
creation of the files and directories (it used to be locked via a
conditional), add a WARN_ON_ONCE() to catch the case where it is not
locked.

Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 590e8176449b..0b90869fd805 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -327,6 +327,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
struct dentry **e_dentry = &ei->d_children[idx];
struct dentry *dentry;
 
+   WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
+
mutex_lock(&eventfs_mutex);
if (ei->is_freed) {
mutex_unlock(&eventfs_mutex);
@@ -430,6 +432,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct 
eventfs_inode *ei,
 {
struct dentry *dentry = NULL;
 
+   WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
+
mutex_lock(&eventfs_mutex);
if (pei->is_freed || ei->is_freed) {
mutex_unlock(&eventfs_mutex);
-- 
2.42.0





Re: [PATCH] MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer

2023-11-21 Thread Google
On Wed, 15 Nov 2023 10:50:18 -0500
Mathieu Desnoyers  wrote:

> In order to make sure I get CC'd on tracing changes for which my input
> would be relevant, add my name as reviewer of the TRACING subsystem.
> 

Yes, you should be a reviewer for the tracing subsystem :) 

Acked-by: Masami Hiramatsu (Google) 

Thanks!

> Signed-off-by: Mathieu Desnoyers 
> Cc: Steven Rostedt 
> Cc: Masami Hiramatsu 
> Cc: linux-trace-ker...@vger.kernel.org
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4cc6bf79fdd8..a7c2092d0063 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -21622,6 +21622,7 @@ F:drivers/hwmon/pmbus/tps546d24.c
>  TRACING
>  M:   Steven Rostedt 
>  M:   Masami Hiramatsu 
> +R:   Mathieu Desnoyers 
>  L:   linux-kernel@vger.kernel.org
>  L:   linux-trace-ker...@vger.kernel.org
>  S:   Maintained
> -- 
> 2.39.2
> 


-- 
Masami Hiramatsu (Google) 



[PATCH net] bpf: test_run: fix WARNING in format_decode

2023-11-21 Thread Edward Adam Davis
Confirm that skb->len is not 0 to ensure that the skb length is valid.

Fixes: 114039b34201 ("bpf: Move skb->len == 0 checks into __bpf_redirect")
Reported-by: syzbot+e2c932aec5c8a6e1d...@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis 
---
 net/bpf/test_run.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index c9fdcc5cdce1..78258a822a5c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -845,6 +845,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct 
__sk_buff *__skb)
 {
struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
 
+   if (!skb->len)
+   return -EINVAL;
+
if (!__skb)
return 0;
 
-- 
2.26.1




[ANNOUNCE] 5.10.201-rt98

2023-11-21 Thread Luis Claudio R. Goncalves
Hello RT-list!

I'm pleased to announce the 5.10.201-rt98 stable release.

This release is just an update to the new stable 5.10.201
version and no RT changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v5.10-rt
  Head SHA1: 3a93f0a0d49dd0db4c6876ca9a7369350e64320e

Or to build 5.10.201-rt98 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz

  https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.201.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/5.10/older/patch-5.10.201-rt98.patch.xz

Signing key fingerprint:

  9354 0649 9972 8D31 D464  D140 F394 A423 F8E6 7C26

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Luis



[ANNOUNCE] 4.14.330-rt157

2023-11-21 Thread Luis Claudio R. Goncalves
Hello RT-list!

I'm pleased to announce the 4.14.330-rt157 stable release.

This release is just an update to the new stable 4.14.330
version and no RT changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.14-rt
  Head SHA1: 7c25b98670e2f172f2813e18a39c02bf13b9849e

Or to build 4.14.330-rt157 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.14.tar.xz

  https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.14.330.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.14/older/patch-4.14.330-rt157.patch.xz

Signing key fingerprint:

  9354 0649 9972 8D31 D464  D140 F394 A423 F8E6 7C26

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Luis



Re: [PATCH RFC v2 20/27] mm: hugepage: Handle huge page fault on access

2023-11-21 Thread Peter Collingbourne
On Sun, Nov 19, 2023 at 8:59 AM Alexandru Elisei
 wrote:
>
> Handle PAGE_FAULT_ON_ACCESS faults for huge pages in a similar way to
> regular pages.
>
> Signed-off-by: Alexandru Elisei 
> ---
>  arch/arm64/include/asm/mte_tag_storage.h |  1 +
>  arch/arm64/include/asm/pgtable.h |  7 ++
>  arch/arm64/mm/fault.c| 81 
>  include/linux/huge_mm.h  |  2 +
>  include/linux/pgtable.h  |  5 ++
>  mm/huge_memory.c |  4 +-
>  mm/memory.c  |  3 +
>  7 files changed, 101 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h 
> b/arch/arm64/include/asm/mte_tag_storage.h
> index c70ced60a0cd..b97406d369ce 100644
> --- a/arch/arm64/include/asm/mte_tag_storage.h
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -35,6 +35,7 @@ void free_tag_storage(struct page *page, int order);
>  bool page_tag_storage_reserved(struct page *page);
>
>  vm_fault_t handle_page_missing_tag_storage(struct vm_fault *vmf);
> +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf);
>  #else
>  static inline bool tag_storage_enabled(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index 8cc135f1c112..1704411c096d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -477,6 +477,13 @@ static inline vm_fault_t 
> arch_do_page_fault_on_access(struct vm_fault *vmf)
> return handle_page_missing_tag_storage(vmf);
> return VM_FAULT_SIGBUS;
>  }
> +
> +static inline vm_fault_t arch_do_huge_page_fault_on_access(struct vm_fault 
> *vmf)
> +{
> +   if (tag_storage_enabled())
> +   return handle_huge_page_missing_tag_storage(vmf);
> +   return VM_FAULT_SIGBUS;
> +}
>  #endif /* CONFIG_ARCH_HAS_FAULT_ON_ACCESS */
>
>  #define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID))
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index f5fa583acf18..6730a0812a24 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -1041,6 +1041,87 @@ vm_fault_t handle_page_missing_tag_storage(struct 
> vm_fault *vmf)
>
> return 0;
>
> +out_retry:
> +   put_page(page);
> +   if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> +   vma_end_read(vma);
> +   if (fault_flag_allow_retry_first(vmf->flags)) {
> +   err = VM_FAULT_RETRY;
> +   } else {
> +   /* Replay the fault. */
> +   err = 0;
> +   }
> +   return err;
> +}
> +
> +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf)
> +{
> +   unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> +   struct vm_area_struct *vma = vmf->vma;
> +   pmd_t old_pmd, new_pmd;
> +   bool writable = false;
> +   struct page *page;
> +   vm_fault_t err;
> +   int ret;
> +
> +   vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> +   if (unlikely(!pmd_same(vmf->orig_pmd, *vmf->pmd))) {
> +   spin_unlock(vmf->ptl);
> +   return 0;
> +   }
> +
> +   old_pmd = vmf->orig_pmd;
> +   new_pmd = pmd_modify(old_pmd, vma->vm_page_prot);
> +
> +   /*
> +* Detect now whether the PMD could be writable; this information
> +* is only valid while holding the PT lock.
> +*/
> +   writable = pmd_write(new_pmd);
> +   if (!writable && vma_wants_manual_pte_write_upgrade(vma) &&
> +   can_change_pmd_writable(vma, vmf->address, new_pmd))
> +   writable = true;
> +
> +   page = vm_normal_page_pmd(vma, haddr, new_pmd);
> +   if (!page)
> +   goto out_map;
> +
> +   if (!(vma->vm_flags & VM_MTE))
> +   goto out_map;
> +
> +   get_page(page);
> +   vma_set_access_pid_bit(vma);
> +
> +   spin_unlock(vmf->ptl);
> +   writable = false;
> +
> +   if (unlikely(is_migrate_isolate_page(page)))
> +   goto out_retry;
> +
> +   ret = reserve_tag_storage(page, HPAGE_PMD_ORDER, 
> GFP_HIGHUSER_MOVABLE);
> +   if (ret)
> +   goto out_retry;
> +
> +   put_page(page);
> +
> +   vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> +   if (unlikely(!pmd_same(old_pmd, *vmf->pmd))) {
> +   spin_unlock(vmf->ptl);
> +   return 0;
> +   }
> +
> +out_map:
> +   /* Restore the PMD */
> +   new_pmd = pmd_modify(old_pmd, vma->vm_page_prot);
> +   new_pmd = pmd_mkyoung(new_pmd);
> +   if (writable)
> +   new_pmd = pmd_mkwrite(new_pmd, vma);
> +   set_pmd_at(vma->vm_mm, haddr, vmf->pmd, new_pmd);
> +   update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
> +   spin_unlock(vmf->ptl);
> +
> +   return 0;
> +
>  out_retry:
> put_page(page);
> if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index fa0350b0812

Re: [PATCH net] bpf: test_run: fix WARNING in format_decode

2023-11-21 Thread Yonghong Song



On 11/21/23 7:50 PM, Edward Adam Davis wrote:

Confirm that skb->len is not 0 to ensure that skb length is valid.

Fixes: 114039b34201 ("bpf: Move skb->len == 0 checks into __bpf_redirect")
Reported-by: syzbot+e2c932aec5c8a6e1d...@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis 


Stan, Could you take a look at this patch?



---
  net/bpf/test_run.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index c9fdcc5cdce1..78258a822a5c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -845,6 +845,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct 
__sk_buff *__skb)
  {
struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
  
+	if (!skb->len)
+   return -EINVAL;
+
if (!__skb)
return 0;