Re: [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available
On Tue, Sep 19, 2023 at 1:13 PM Leonardo Bras wrote:
>
> On Sun, Sep 17, 2023 at 10:34:36PM +0800, Guo Ren wrote:
> > On Sat, Sep 16, 2023 at 9:25 AM Leonardo Bras wrote:
> > >
> > > On Fri, Sep 15, 2023 at 08:36:31PM +0800, Guo Ren wrote:
> > > > On Wed, Sep 13, 2023 at 4:50 PM Leonardo Bras wrote:
> > > > >
> > > > > On Sun, Sep 10, 2023 at 04:28:57AM -0400, guo...@kernel.org wrote:
> > > > > > From: Guo Ren
> > > > > >
> > > > > > Cache-block prefetch instructions are HINTs to the hardware to
> > > > > > indicate that software intends to perform a particular type of
> > > > > > memory access in the near future. Enable ARCH_HAS_PREFETCHW and
> > > > > > improve the arch_xchg for qspinlock xchg_tail.
> > > > > >
> > > > > > Signed-off-by: Guo Ren
> > > > > > Signed-off-by: Guo Ren
> > > > > > ---
> > > > > >  arch/riscv/Kconfig                 | 15 +++
> > > > > >  arch/riscv/include/asm/cmpxchg.h   |  4 +++-
> > > > > >  arch/riscv/include/asm/hwcap.h     |  1 +
> > > > > >  arch/riscv/include/asm/insn-def.h  |  5 +
> > > > > >  arch/riscv/include/asm/processor.h | 13 +
> > > > > >  arch/riscv/kernel/cpufeature.c     |  1 +
> > > > > >  6 files changed, 38 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > index e9ae6fa232c3..2c346fe169c1 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -617,6 +617,21 @@ config RISCV_ISA_ZICBOZ
> > > > > >
> > > > > >           If you don't know what to do here, say Y.
> > > > > >
> > > > > > +config RISCV_ISA_ZICBOP
> > > > > > +     bool "Zicbop extension support for cache block prefetch"
> > > > > > +     depends on MMU
> > > > > > +     depends on RISCV_ALTERNATIVE
> > > > > > +     default y
> > > > > > +     help
> > > > > > +        Adds support to dynamically detect the presence of the ZICBOP
> > > > > > +        extension (Cache Block Prefetch Operations) and enable its
> > > > > > +        usage.
> > > > > > +
> > > > > > +        The Zicbop extension can be used to prefetch cache block for
> > > > > > +        read/write/instruction fetch.
> > > > > > +
> > > > > > +        If you don't know what to do here, say Y.
> > > > > > +
> > > > > >  config TOOLCHAIN_HAS_ZIHINTPAUSE
> > > > > >       bool
> > > > > >       default y
> > > > > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > > > > index 702725727671..56eff7a9d2d2 100644
> > > > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > > > @@ -11,6 +11,7 @@
> > > > > >
> > > > > >  #include
> > > > > >  #include
> > > > > > +#include
> > > > > >
> > > > > >  #define __arch_xchg_masked(prepend, append, r, p, n)            \
> > > > > >  ({                                                              \
> > > > > > @@ -25,6 +26,7 @@
> > > > > >                                                                  \
> > > > > >       __asm__ __volatile__ (                                     \
> > > > > >              prepend                                             \
> > > > > > +            PREFETCHW_ASM(%5)                                   \
> > > > > >       "0:    lr.w %0, %2\n"                                      \
> > > > > >       "      and  %1, %0, %z4\n"                                 \
> > > > > >       "      or   %1, %1, %z3\n"                                 \
> > > > > > @@ -32,7 +34,7 @@
> > > > > >       "      bnez %1, 0b\n"                                      \
> > > > > >              append                                              \
> > > > > >       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))         \
> > > > > > -     : "rJ" (__newx), "rJ" (~__mask)                            \
> > > > > > +     : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b)           \
> > > > > >       : "memory");                                               \
> > > > > >                                                                  \
> > > > > >       r = (__typeof__(*(p)))((__retx & __mask) >> __s);          \
> > > > > > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > > > > > index b7b58258f6c7..78b7b8b53778 100644
> > > > > > --- a/arch/riscv/include/asm/hwcap.h
> > > > > > +++ b/arch/riscv/include/asm/hwcap.h
> > > > > > @@ -58,6 +58,7 @@
> > > > > >  #define RISCV_ISA_EXT_ZICSR        40
> > > > > >  #define RISCV_ISA_EXT_ZIFENCEI     41
> > > > > >  #define RISCV_ISA_EXT_ZIHPM        42
> > > > > > +#define RISCV_ISA_EXT_ZICBOP       43
Re: [PATCH V11 08/17] riscv: qspinlock: Add virt_spin_lock() support for KVM guest
On Tue, Sep 19, 2023 at 1:30 PM Leonardo Bras wrote:
>
> On Sun, Sep 17, 2023 at 11:12:31PM +0800, Guo Ren wrote:
> > On Thu, Sep 14, 2023 at 4:02 PM Leonardo Bras wrote:
> > >
> > > On Sun, Sep 10, 2023 at 04:29:02AM -0400, guo...@kernel.org wrote:
> > > > From: Guo Ren
> > > >
> > > > Add a static key controlling whether virt_spin_lock() should be
> > > > called or not. When running on bare metal set the new key to
> > > > false.
> > > >
> > > > The KVM guests fall back to a Test-and-Set spinlock, because fair
> > > > locks have horrible lock 'holder' preemption issues. The
> > > > virt_spin_lock_key would shortcut for the
> > > > queued_spin_lock_slowpath() function that allow virt_spin_lock to
> > > > hijack it.
> > > >
> > > > Signed-off-by: Guo Ren
> > > > Signed-off-by: Guo Ren
> > > > ---
> > > >  .../admin-guide/kernel-parameters.txt |  4 +++
> > > >  arch/riscv/include/asm/sbi.h          |  8 +
> > > >  arch/riscv/include/asm/spinlock.h     | 22 ++
> > > >  arch/riscv/kernel/sbi.c               |  2 +-
> > > >  arch/riscv/kernel/setup.c             | 30 ++-
> > > >  5 files changed, 64 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > index 61cacb8dfd0e..f75bedc50e00 100644
> > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > @@ -3927,6 +3927,10 @@
> > > >       no_uaccess_flush
> > > >                       [PPC] Don't flush the L1-D cache after accessing user data.
> > > >
> > > > +     no_virt_spin    [RISC-V] Disable virt_spin_lock in KVM guest to use
> > > > +                     native_queued_spinlock when the nopvspin option is enabled.
> > > > +                     This would help vcpu=pcpu scenarios.
> > > > +
> > > >       novmcoredd      [KNL,KDUMP]
> > > >                       Disable device dump. Device dump allows drivers to
> > > >                       append dump data to vmcore so you can collect driver
> > > > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > > > index 501e06e52078..e0233b3d7a5f 100644
> > > > --- a/arch/riscv/include/asm/sbi.h
> > > > +++ b/arch/riscv/include/asm/sbi.h
> > > > @@ -50,6 +50,13 @@ enum sbi_ext_base_fid {
> > > >       SBI_EXT_BASE_GET_MIMPID,
> > > >  };
> > > >
> > > > +enum sbi_ext_base_impl_id {
> > > > +     SBI_EXT_BASE_IMPL_ID_BBL = 0,
> > > > +     SBI_EXT_BASE_IMPL_ID_OPENSBI,
> > > > +     SBI_EXT_BASE_IMPL_ID_XVISOR,
> > > > +     SBI_EXT_BASE_IMPL_ID_KVM,
> > > > +};
> > > > +
> > > >  enum sbi_ext_time_fid {
> > > >       SBI_EXT_TIME_SET_TIMER = 0,
> > > >  };
> > > > @@ -269,6 +276,7 @@ int sbi_console_getchar(void);
> > > >  long sbi_get_mvendorid(void);
> > > >  long sbi_get_marchid(void);
> > > >  long sbi_get_mimpid(void);
> > > > +long sbi_get_firmware_id(void);
> > > >  void sbi_set_timer(uint64_t stime_value);
> > > >  void sbi_shutdown(void);
> > > >  void sbi_send_ipi(unsigned int cpu);
> > > > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > > > index 8ea0fee80652..6b38d6616f14 100644
> > > > --- a/arch/riscv/include/asm/spinlock.h
> > > > +++ b/arch/riscv/include/asm/spinlock.h
> > > > @@ -4,6 +4,28 @@
> > > >  #define __ASM_RISCV_SPINLOCK_H
> > > >
> > > >  #ifdef CONFIG_QUEUED_SPINLOCKS
> > > > +/*
> > > > + * The KVM guests fall back to a Test-and-Set spinlock, because fair locks
> > > > + * have horrible lock 'holder' preemption issues. The virt_spin_lock_key
> > > > + * would shortcut for the queued_spin_lock_slowpath() function that allow
> > > > + * virt_spin_lock to hijack it.
> > > > + */
> > > > +DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > +
> > > > +#define virt_spin_lock virt_spin_lock
> > > > +static inline bool virt_spin_lock(struct qspinlock *lock)
> > > > +{
> > > > +     if (!static_branch_likely(&virt_spin_lock_key))
> > > > +             return false;
> > > > +
> > > > +     do {
> > > > +             while (atomic_read(&lock->val) != 0)
> > > > +                     cpu_relax();
> > > > +     } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> > > > +
> > > > +     return true;
> > > > +}
> > > > +
> > > >  #define _Q_PENDING_LOOPS     (1 << 9)
> > > >  #endif
> > > >
> > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > index 88eea3a99ee0..cdd45edc8db4 100644
> > > > --- a/arch/riscv/kernel/sbi.c
> > > > +++ b/arch/riscv/kernel/sbi.c
> > > > @@ -555,7 +555,7 @@ static inline long sbi_get_spec_version(void)
> > > >       return __sbi_base_ecall(SBI_EXT_BASE_GET_SPEC_VERSION);
> > > >  }
> > > >
> > > > -static inline long sbi_get_firmware_id(void)
> > > > +long sbi_get_firmware_id(void)
> > > >  {
> > > >       return __sbi_base_ecall(SBI_EXT_BASE_
Re: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()
On Wed, Sep 13 2023 at 21:48, Xin Li wrote:
> +static inline void fred_syscall_init(void)
> +{
> +     /*
> +      * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
> +      * and SYSENTER, and ERETU is the only legit instruction to return to
> +      * ring 3, as a result there is _no_ need to setup the SYSCALL and
> +      * SYSENTER MSRs.
> +      *
> +      * Note, both sysexit and sysret cause #UD when FRED is enabled.
> +      */
> +     wrmsrl(MSR_LSTAR, 0ULL);
> +     wrmsrl_cstar(0ULL);

That write is pointless. See the comment in wrmsrl_cstar().

Thanks,

        tglx
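For context, the comment tglx points at lives in the wrmsrl_cstar() helper in
arch/x86/kernel/cpu/common.c, which already filters out writes that would be
meaningless. A rough paraphrase from memory (illustrative, not a verbatim copy
of the kernel source):

    static void wrmsrl_cstar(unsigned long val)
    {
            /*
             * Intel CPUs do not use MSR_CSTAR (no 32-bit SYSCALL); the write
             * is ignored by the CPU, and in a TDX guest an unemulated MSR
             * write would only raise a #VE. So skip the write on Intel and
             * only program CSTAR on CPUs that actually use it (e.g. AMD).
             */
            if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
                    return;

            wrmsrl(MSR_CSTAR, val);
    }

That is why the CSTAR write in fred_syscall_init() above has no effect on
Intel parts even before FRED enters the picture.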
[PATCH] rtla: fix an example in rtla-timerlat-hist.rst
From: Xie XiuQi

The following error message is reported when running the example in the document:

  # timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
  Failed to set timerlat period
  Could not apply config

The unit of the period is microseconds, so '1ms' cannot be accepted:

  usage: [rtla] timerlat hist [-h] [-q] [-d s] [-D] [-n] [-a us] [-p us] [-i us] [-T us] [-s us] ...
  ...
  -p/--period us: timerlat period in us
  ...

Also fix another minor misleading comment.

Signed-off-by: Xie XiuQi
---
 Documentation/tools/rtla/rtla-timerlat-hist.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/tools/rtla/rtla-timerlat-hist.rst b/Documentation/tools/rtla/rtla-timerlat-hist.rst
index 057db78d4095..03b7f3deb069 100644
--- a/Documentation/tools/rtla/rtla-timerlat-hist.rst
+++ b/Documentation/tools/rtla/rtla-timerlat-hist.rst
@@ -36,11 +36,11 @@ EXAMPLE
 In the example below, **rtla timerlat hist** is set to run for *10* minutes,
 in the cpus *0-4*, *skipping zero* only lines. Moreover, **rtla timerlat
 hist** will change the priority of the *timerlat* threads to run under
-*SCHED_DEADLINE* priority, with a *10us* runtime every *1ms* period. The
+*SCHED_DEADLINE* priority, with a *100us* runtime every *1ms* period. The
 *1ms* period is also passed to the *timerlat* tracer. Auto-analysis is
 disabled to reduce overhead ::

-    [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
+    [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1000 --no-aa
   # RTLA timerlat histogram
   # Time unit is microseconds (us)
   # Duration:   0 00:10:00
--
2.25.1
Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator
On Mon, 18 Sept 2023 at 19:39, Ingo Molnar wrote:
>
> What's the split of the increase in overhead due to SLAB_VIRTUAL=y, between
> user-space execution and kernel-space execution?
>

Same benchmark as before (compiling a kernel on a system running the patched
kernel):

Intel Skylake:

     LABEL      | COUNT |    MIN    |    MAX    |   MEAN    |  MEDIAN   | STDDEV
----------------+-------+-----------+-----------+-----------+-----------+--------
wall clock      |       |           |           |           |           |
 SLAB_VIRTUAL=n |   150 |    49.700 |    51.320 |    50.449 |    50.430 | 0.29959
 SLAB_VIRTUAL=y |   150 |    50.020 |    51.660 |    50.880 |    50.880 | 0.30495
                |       |    +0.64% |    +0.66% |    +0.85% |    +0.89% |  +1.79%
system time     |       |           |           |           |           |
 SLAB_VIRTUAL=n |   150 |   358.560 |   362.900 |   360.922 |   360.985 | 0.91761
 SLAB_VIRTUAL=y |   150 |   362.970 |   367.970 |   366.062 |   366.115 |   1.015
                |       |    +1.23% |    +1.40% |    +1.42% |    +1.42% | +10.60%
user time       |       |           |           |           |           |
 SLAB_VIRTUAL=n |   150 |  3110.000 |  3124.520 |  3118.143 |  3118.120 |   2.466
 SLAB_VIRTUAL=y |   150 |  3115.070 |  3127.070 |  3120.762 |  3120.925 |   2.654
                |       |    +0.16% |    +0.08% |    +0.08% |    +0.09% |  +7.63%

AMD Milan:

     LABEL      | COUNT |    MIN    |    MAX    |   MEAN    |  MEDIAN   | STDDEV
----------------+-------+-----------+-----------+-----------+-----------+--------
wall clock      |       |           |           |           |           |
 SLAB_VIRTUAL=n |   150 |    25.480 |    26.550 |    26.065 |    26.055 | 0.23495
 SLAB_VIRTUAL=y |   150 |    25.820 |    27.080 |    26.531 |    26.540 | 0.25974
                |       |    +1.33% |    +2.00% |    +1.79% |    +1.86% | +10.55%
system time     |       |           |           |           |           |
 SLAB_VIRTUAL=n |   150 |   478.530 |   540.420 |   520.803 |   521.485 |   9.166
 SLAB_VIRTUAL=y |   150 |   530.520 |   572.460 |   552.825 |   552.985 |   7.161
                |       |   +10.86% |    +5.93% |    +6.15% |    +6.04% | -21.88%
user time       |       |           |           |           |           |
 SLAB_VIRTUAL=n |   150 |  2373.540 |  2403.800 |  2386.343 |  2385.840 |   5.325
 SLAB_VIRTUAL=y |   150 |  2388.690 |  2426.290 |  2408.325 |  2408.895 |   6.667
                |       |    +0.64% |    +0.94% |    +0.92% |    +0.97% | +25.20%

I'm not exactly sure why user time increases by almost 1% on Milan, it could
be TLB contention.

--
Matteo
Re: [PATCH] rtla: fix an example in rtla-timerlat-hist.rst
On 9/19/23 15:30, Xie XiuQi wrote:
> From: Xie XiuQi
>
> The following error message is reported when running the example in the document:
>
>   # timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
>   Failed to set timerlat period
>   Could not apply config
>
> The unit of the period is microseconds, so '1ms' cannot be accepted.

Right, but we could also parse the -p accordingly, no? As well as the -r for
osnoise and hwnoise... and the other us-only parameters.

>   usage: [rtla] timerlat hist [-h] [-q] [-d s] [-D] [-n] [-a us] [-p us] [-i us] [-T us] [-s us] ...
>   ...
>   -p/--period us: timerlat period in us
>   ...
>
> Also fix another minor misleading comment.
>
> Signed-off-by: Xie XiuQi
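For what it's worth, a minimal sketch of the kind of unit-aware parsing being
suggested here (the helper name and exact behavior are made up for
illustration; this is not current rtla code):

    #include <stdlib.h>
    #include <string.h>

    /*
     * Accept "1000", "1000us", "1ms" or "1s" and return the value in
     * microseconds, or -1 on error. Something along these lines would let
     * -p/--period (and the other us-only options) take unit suffixes.
     */
    static long long parse_us_duration(const char *str)
    {
            char *end;
            long long val = strtoll(str, &end, 10);

            if (end == str || val < 0)
                    return -1;

            if (*end == '\0' || !strcmp(end, "us"))
                    return val;                  /* already in microseconds */
            if (!strcmp(end, "ms"))
                    return val * 1000LL;         /* milliseconds to us */
            if (!strcmp(end, "s"))
                    return val * 1000000LL;      /* seconds to us */

            return -1;                           /* unknown suffix */
    }

With that in place, both '-p 1000' and '-p 1ms' would configure the same
period.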
Re: [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available
On Tue, Sep 19, 2023 at 03:53:22PM +0800, Guo Ren wrote:
> On Tue, Sep 19, 2023 at 1:13 PM Leonardo Bras wrote:
> >
> > On Sun, Sep 17, 2023 at 10:34:36PM +0800, Guo Ren wrote:
> > > On Sat, Sep 16, 2023 at 9:25 AM Leonardo Bras wrote:
> > > >
> > > > On Fri, Sep 15, 2023 at 08:36:31PM +0800, Guo Ren wrote:
> > > > > On Wed, Sep 13, 2023 at 4:50 PM Leonardo Bras wrote:
> > > > > >
> > > > > > On Sun, Sep 10, 2023 at 04:28:57AM -0400, guo...@kernel.org wrote:
> > > > > > > From: Guo Ren
> > > > > > >
> > > > > > > Cache-block prefetch instructions are HINTs to the hardware to
> > > > > > > indicate that software intends to perform a particular type of
> > > > > > > memory access in the near future. Enable ARCH_HAS_PREFETCHW and
> > > > > > > improve the arch_xchg for qspinlock xchg_tail.
> > > > > > >
> > > > > > > Signed-off-by: Guo Ren
> > > > > > > Signed-off-by: Guo Ren
> > > > > > > ---
> > > > > > >  arch/riscv/Kconfig                 | 15 +++
> > > > > > >  arch/riscv/include/asm/cmpxchg.h   |  4 +++-
> > > > > > >  arch/riscv/include/asm/hwcap.h     |  1 +
> > > > > > >  arch/riscv/include/asm/insn-def.h  |  5 +
> > > > > > >  arch/riscv/include/asm/processor.h | 13 +
> > > > > > >  arch/riscv/kernel/cpufeature.c     |  1 +
> > > > > > >  6 files changed, 38 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > index e9ae6fa232c3..2c346fe169c1 100644
> > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > @@ -617,6 +617,21 @@ config RISCV_ISA_ZICBOZ
> > > > > > >
> > > > > > >           If you don't know what to do here, say Y.
> > > > > > >
> > > > > > > +config RISCV_ISA_ZICBOP
> > > > > > > +     bool "Zicbop extension support for cache block prefetch"
> > > > > > > +     depends on MMU
> > > > > > > +     depends on RISCV_ALTERNATIVE
> > > > > > > +     default y
> > > > > > > +     help
> > > > > > > +        Adds support to dynamically detect the presence of the ZICBOP
> > > > > > > +        extension (Cache Block Prefetch Operations) and enable its
> > > > > > > +        usage.
> > > > > > > +
> > > > > > > +        The Zicbop extension can be used to prefetch cache block for
> > > > > > > +        read/write/instruction fetch.
> > > > > > > +
> > > > > > > +        If you don't know what to do here, say Y.
> > > > > > > +
> > > > > > >  config TOOLCHAIN_HAS_ZIHINTPAUSE
> > > > > > >       bool
> > > > > > >       default y
> > > > > > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > > > > > index 702725727671..56eff7a9d2d2 100644
> > > > > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > > > > @@ -11,6 +11,7 @@
> > > > > > >
> > > > > > >  #include
> > > > > > >  #include
> > > > > > > +#include
> > > > > > >
> > > > > > >  #define __arch_xchg_masked(prepend, append, r, p, n)            \
> > > > > > >  ({                                                              \
> > > > > > > @@ -25,6 +26,7 @@
> > > > > > >                                                                  \
> > > > > > >       __asm__ __volatile__ (                                     \
> > > > > > >              prepend                                             \
> > > > > > > +            PREFETCHW_ASM(%5)                                   \
> > > > > > >       "0:    lr.w %0, %2\n"                                      \
> > > > > > >       "      and  %1, %0, %z4\n"                                 \
> > > > > > >       "      or   %1, %1, %z3\n"                                 \
> > > > > > > @@ -32,7 +34,7 @@
> > > > > > >       "      bnez %1, 0b\n"                                      \
> > > > > > >              append                                              \
> > > > > > >       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))         \
> > > > > > > -     : "rJ" (__newx), "rJ" (~__mask)                            \
> > > > > > > +     : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b)           \
> > > > > > >       : "memory");                                               \
> > > > > > >                                                                  \
> > > > > > >       r = (__typeof__(*(p)))((__retx & __mask) >> __s);          \
> > > > > > > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > > > > > > index b7b58258f6c7..78b7b8b53778 100644
> > > > > > > --- a/arch/riscv/include/asm/hwcap.h
> > > > > > > +++ b/arch/ri
Re: [PATCH V11 08/17] riscv: qspinlock: Add virt_spin_lock() support for KVM guest
On Tue, Sep 19, 2023 at 04:04:48PM +0800, Guo Ren wrote:
> On Tue, Sep 19, 2023 at 1:30 PM Leonardo Bras wrote:
> >
> > On Sun, Sep 17, 2023 at 11:12:31PM +0800, Guo Ren wrote:
> > > On Thu, Sep 14, 2023 at 4:02 PM Leonardo Bras wrote:
> > > >
> > > > On Sun, Sep 10, 2023 at 04:29:02AM -0400, guo...@kernel.org wrote:
> > > > > From: Guo Ren
> > > > >
> > > > > Add a static key controlling whether virt_spin_lock() should be
> > > > > called or not. When running on bare metal set the new key to
> > > > > false.
> > > > >
> > > > > The KVM guests fall back to a Test-and-Set spinlock, because fair
> > > > > locks have horrible lock 'holder' preemption issues. The
> > > > > virt_spin_lock_key would shortcut for the
> > > > > queued_spin_lock_slowpath() function that allow virt_spin_lock to
> > > > > hijack it.
> > > > >
> > > > > Signed-off-by: Guo Ren
> > > > > Signed-off-by: Guo Ren
> > > > > ---
> > > > >  .../admin-guide/kernel-parameters.txt |  4 +++
> > > > >  arch/riscv/include/asm/sbi.h          |  8 +
> > > > >  arch/riscv/include/asm/spinlock.h     | 22 ++
> > > > >  arch/riscv/kernel/sbi.c               |  2 +-
> > > > >  arch/riscv/kernel/setup.c             | 30 ++-
> > > > >  5 files changed, 64 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > index 61cacb8dfd0e..f75bedc50e00 100644
> > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > @@ -3927,6 +3927,10 @@
> > > > >       no_uaccess_flush
> > > > >                       [PPC] Don't flush the L1-D cache after accessing user data.
> > > > >
> > > > > +     no_virt_spin    [RISC-V] Disable virt_spin_lock in KVM guest to use
> > > > > +                     native_queued_spinlock when the nopvspin option is enabled.
> > > > > +                     This would help vcpu=pcpu scenarios.
> > > > > +
> > > > >       novmcoredd      [KNL,KDUMP]
> > > > >                       Disable device dump. Device dump allows drivers to
> > > > >                       append dump data to vmcore so you can collect driver
> > > > > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > > > > index 501e06e52078..e0233b3d7a5f 100644
> > > > > --- a/arch/riscv/include/asm/sbi.h
> > > > > +++ b/arch/riscv/include/asm/sbi.h
> > > > > @@ -50,6 +50,13 @@ enum sbi_ext_base_fid {
> > > > >       SBI_EXT_BASE_GET_MIMPID,
> > > > >  };
> > > > >
> > > > > +enum sbi_ext_base_impl_id {
> > > > > +     SBI_EXT_BASE_IMPL_ID_BBL = 0,
> > > > > +     SBI_EXT_BASE_IMPL_ID_OPENSBI,
> > > > > +     SBI_EXT_BASE_IMPL_ID_XVISOR,
> > > > > +     SBI_EXT_BASE_IMPL_ID_KVM,
> > > > > +};
> > > > > +
> > > > >  enum sbi_ext_time_fid {
> > > > >       SBI_EXT_TIME_SET_TIMER = 0,
> > > > >  };
> > > > > @@ -269,6 +276,7 @@ int sbi_console_getchar(void);
> > > > >  long sbi_get_mvendorid(void);
> > > > >  long sbi_get_marchid(void);
> > > > >  long sbi_get_mimpid(void);
> > > > > +long sbi_get_firmware_id(void);
> > > > >  void sbi_set_timer(uint64_t stime_value);
> > > > >  void sbi_shutdown(void);
> > > > >  void sbi_send_ipi(unsigned int cpu);
> > > > > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > > > > index 8ea0fee80652..6b38d6616f14 100644
> > > > > --- a/arch/riscv/include/asm/spinlock.h
> > > > > +++ b/arch/riscv/include/asm/spinlock.h
> > > > > @@ -4,6 +4,28 @@
> > > > >  #define __ASM_RISCV_SPINLOCK_H
> > > > >
> > > > >  #ifdef CONFIG_QUEUED_SPINLOCKS
> > > > > +/*
> > > > > + * The KVM guests fall back to a Test-and-Set spinlock, because fair locks
> > > > > + * have horrible lock 'holder' preemption issues. The virt_spin_lock_key
> > > > > + * would shortcut for the queued_spin_lock_slowpath() function that allow
> > > > > + * virt_spin_lock to hijack it.
> > > > > + */
> > > > > +DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > > +
> > > > > +#define virt_spin_lock virt_spin_lock
> > > > > +static inline bool virt_spin_lock(struct qspinlock *lock)
> > > > > +{
> > > > > +     if (!static_branch_likely(&virt_spin_lock_key))
> > > > > +             return false;
> > > > > +
> > > > > +     do {
> > > > > +             while (atomic_read(&lock->val) != 0)
> > > > > +                     cpu_relax();
> > > > > +     } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> > > > > +
> > > > > +     return true;
> > > > > +}
> > > > > +
> > > > >  #define _Q_PENDING_LOOPS     (1 << 9)
> > > > >  #endif
> > > > >
> > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > > index 88eea3a99ee0..cdd45edc8db4 100644
> > > > > --- a/arch/riscv/kernel/sbi.c
> > > > > +++ b/arch/riscv/kernel/sbi.c
Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator
On Mon, 18 Sept 2023 at 20:05, Linus Torvalds wrote:
>
> ... and equally importantly, what about DMA?

I'm not exactly sure what you mean by this, I don't think this should
affect the performance of DMA.

> Or what about the fixed-size slabs (aka kmalloc?) What's the point of
> "never re-use the same address for a different slab", when the *same*
> slab will contain different kinds of allocations anyway?

There are a number of patches out there (for example the random_kmalloc
series which recently got merged into v6.6) which attempt to segregate
kmalloc'd objects into different caches to make exploitation harder.
Another thing that we would like to have in the future is to segregate
objects by type (like XNU's kalloc_type,
https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/)
which makes exploiting use-after-free by type confusion much harder or
impossible.

All of these mitigations can be bypassed very easily if the attacker can
mount a cross-cache attack, which is what this series attempts to prevent.
This is not only theoretical, we've seen attackers use this all the time
in kCTF/kernelCTF submissions (for example
https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/).

> I think the whole "make it one single compile-time option" model is
> completely and fundamentally broken.

Wouldn't making this toggleable at boot time or runtime make performance
even worse?

--
Matteo
Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator
On 9/19/23 06:42, Matteo Rizzo wrote:
> On Mon, 18 Sept 2023 at 19:39, Ingo Molnar wrote:
>> What's the split of the increase in overhead due to SLAB_VIRTUAL=y, between
>> user-space execution and kernel-space execution?
>>
> Same benchmark as before (compiling a kernel on a system running the patched
> kernel):

Thanks for running those.  One more situation that comes to mind is how
this will act under memory pressure.  Will some memory pressure make
contention on 'slub_kworker_lock' visible or make the global TLB flushes
less bearable?

In any case, none of this looks _catastrophic_.  It's surely a cost that
some folks will pay.

But I really do think it needs to be more dynamic.  There are a _couple_
of reasons for this.  If it's only a compile-time option, it's never
going to get turned on except for maybe ChromeOS and the datacenter folks
that are paranoid.  I suspect the distros will never turn it on.

A lot of questions get easier if you can disable/enable it at runtime.
For instance, what do you do if the virtual area fills up?  You _could_
just go back to handing out direct map addresses.  Less secure?  Yep.
But better than crashing (for some folks).

It also opens up the door to do this per-slab.  That alone would be a
handy debugging option.
Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator
On 9/19/23 08:48, Matteo Rizzo wrote:
>> I think the whole "make it one single compile-time option" model is
>> completely and fundamentally broken.
> Wouldn't making this toggleable at boot time or runtime make performance
> even worse?

Maybe.

But you can tolerate even more of a performance impact from a feature if
the people that don't care can actually disable it.

There are also plenty of ways to minimize the overhead of switching it on
and off at runtime.  Static branches are your best friend here.
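For readers who have not used them, a minimal sketch of the static-branch
pattern being suggested, including the "fall back to direct map addresses"
idea from the previous mail (the key name, the boot parameter and the
slab_virtual_map() hook are made up for illustration; none of this is code
from the series):

    #include <linux/jump_label.h>
    #include <linux/init.h>

    void *slab_virtual_map(void *addr);     /* hypothetical, provided elsewhere */

    /* Default off; the check is patched to a NOP on the fast path. */
    static DEFINE_STATIC_KEY_FALSE(slab_virtual_enabled);

    static int __init setup_slab_virtual(char *str)
    {
            static_branch_enable(&slab_virtual_enabled);
            return 1;
    }
    __setup("slab_virtual", setup_slab_virtual);

    static void *slab_obj_address(void *direct_addr)
    {
            /* Hand out a virtual alias only when the mitigation is on. */
            if (static_branch_unlikely(&slab_virtual_enabled))
                    return slab_virtual_map(direct_addr);

            return direct_addr;     /* default: direct-map address, as today */
    }

The point is that with the key disabled the runtime check costs a patched-in
NOP rather than a memory load plus a conditional branch.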
Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator
On September 19, 2023 9:02:07 AM PDT, Dave Hansen wrote:
> On 9/19/23 08:48, Matteo Rizzo wrote:
>>> I think the whole "make it one single compile-time option" model is
>>> completely and fundamentally broken.
>> Wouldn't making this toggleable at boot time or runtime make performance
>> even worse?
>
> Maybe.
>
> But you can tolerate even more of a performance impact from a feature if
> the people that don't care can actually disable it.
>
> There are also plenty of ways to minimize the overhead of switching it on
> and off at runtime.  Static branches are your best friend here.

Let's start with a boot time on/off toggle (no per-slab, no switch on
out-of-space, etc). That should be sufficient for initial ease of use
for testing, etc. But yes, using static_branch will nicely DTRT here.

--
Kees Cook
Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator
On Tue, 19 Sept 2023 at 08:48, Matteo Rizzo wrote:
>
> On Mon, 18 Sept 2023 at 20:05, Linus Torvalds wrote:
> >
> > ... and equally importantly, what about DMA?
>
> I'm not exactly sure what you mean by this, I don't think this should
> affect the performance of DMA.

I was more worried about just basic correctness.

We've traditionally had a lot of issues with using virtual addresses for
dma, simply because we've got random drivers, and I'm not entirely
convinced that your "virt_to_phys()" update will catch it all.

IOW, even on x86-64 - which is hopefully better than most architectures
because it already has that double mapping issue - we have things like

        unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;

in other places than just the __phys_addr() code.

The one place I grepped for looks to be just boot-time AMD memory
encryption, so wouldn't be any slab allocation, but ...

                Linus
RE: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()
> > +static inline void fred_syscall_init(void)
> > +{
> > +     /*
> > +      * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
> > +      * and SYSENTER, and ERETU is the only legit instruction to return to
> > +      * ring 3, as a result there is _no_ need to setup the SYSCALL and
> > +      * SYSENTER MSRs.
> > +      *
> > +      * Note, both sysexit and sysret cause #UD when FRED is enabled.
> > +      */
> > +     wrmsrl(MSR_LSTAR, 0ULL);
> > +     wrmsrl_cstar(0ULL);
>
> That write is pointless. See the comment in wrmsrl_cstar().

What I heard is that AMD is going to support FRED.

Both LSTAR and CSTAR have no function when FRED is enabled, so maybe just
do NOT write to them?

Thanks!
    Xin
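In concrete terms, the simplification being floated would reduce the function
to little more than a comment (illustrative only; whether every FRED-capable
part really ignores LSTAR/CSTAR is exactly the open question in this
exchange):

    static inline void fred_syscall_init(void)
    {
            /*
             * With FRED enabled, SYSCALL/SYSENTER deliver through the ring-3
             * FRED entrypoint and SYSRET/SYSEXIT raise #UD, so neither
             * MSR_LSTAR nor MSR_CSTAR is consulted. The proposal here is to
             * leave both MSRs untouched instead of zeroing them.
             */
    }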