Re: [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available

2023-09-19 Thread Guo Ren
On Tue, Sep 19, 2023 at 1:13 PM Leonardo Bras  wrote:
>
> On Sun, Sep 17, 2023 at 10:34:36PM +0800, Guo Ren wrote:
> > On Sat, Sep 16, 2023 at 9:25 AM Leonardo Bras  wrote:
> > >
> > > On Fri, Sep 15, 2023 at 08:36:31PM +0800, Guo Ren wrote:
> > > > On Wed, Sep 13, 2023 at 4:50 PM Leonardo Bras  
> > > > wrote:
> > > > >
> > > > > On Sun, Sep 10, 2023 at 04:28:57AM -0400, guo...@kernel.org wrote:
> > > > > > From: Guo Ren 
> > > > > >
> > > > > > Cache-block prefetch instructions are HINTs to the hardware to
> > > > > > indicate that software intends to perform a particular type of
> > > > > > memory access in the near future. Enable ARCH_HAS_PREFETCHW and
> > > > > > improve the arch_xchg for qspinlock xchg_tail.
> > > > > >
> > > > > > Signed-off-by: Guo Ren 
> > > > > > Signed-off-by: Guo Ren 
> > > > > > ---
> > > > > >  arch/riscv/Kconfig | 15 +++
> > > > > >  arch/riscv/include/asm/cmpxchg.h   |  4 +++-
> > > > > >  arch/riscv/include/asm/hwcap.h |  1 +
> > > > > >  arch/riscv/include/asm/insn-def.h  |  5 +
> > > > > >  arch/riscv/include/asm/processor.h | 13 +
> > > > > >  arch/riscv/kernel/cpufeature.c |  1 +
> > > > > >  6 files changed, 38 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > index e9ae6fa232c3..2c346fe169c1 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -617,6 +617,21 @@ config RISCV_ISA_ZICBOZ
> > > > > >
> > > > > >  If you don't know what to do here, say Y.
> > > > > >
> > > > > > +config RISCV_ISA_ZICBOP
> > > > > > + bool "Zicbop extension support for cache block prefetch"
> > > > > > + depends on MMU
> > > > > > + depends on RISCV_ALTERNATIVE
> > > > > > + default y
> > > > > > + help
> > > > > > +Adds support to dynamically detect the presence of the 
> > > > > > ZICBOP
> > > > > > +extension (Cache Block Prefetch Operations) and enable its
> > > > > > +usage.
> > > > > > +
> > > > > > +The Zicbop extension can be used to prefetch cache block 
> > > > > > for
> > > > > > +read/write/instruction fetch.
> > > > > > +
> > > > > > +If you don't know what to do here, say Y.
> > > > > > +
> > > > > >  config TOOLCHAIN_HAS_ZIHINTPAUSE
> > > > > >   bool
> > > > > >   default y
> > > > > > diff --git a/arch/riscv/include/asm/cmpxchg.h 
> > > > > > b/arch/riscv/include/asm/cmpxchg.h
> > > > > > index 702725727671..56eff7a9d2d2 100644
> > > > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > > > @@ -11,6 +11,7 @@
> > > > > >
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > >
> > > > > >  #define __arch_xchg_masked(prepend, append, r, p, n)   
> > > > > >   \
> > > > > >  ({ 
> > > > > >   \
> > > > > > @@ -25,6 +26,7 @@
> > > > > > 
> > > > > >   \
> > > > > >   __asm__ __volatile__ (
> > > > > >   \
> > > > > >  prepend
> > > > > >   \
> > > > > > +PREFETCHW_ASM(%5)  
> > > > > >   \
> > > > > >  "0:  lr.w %0, %2\n"
> > > > > >   \
> > > > > >  "and  %1, %0, %z4\n"   
> > > > > >   \
> > > > > >  "or   %1, %1, %z3\n"   
> > > > > >   \
> > > > > > @@ -32,7 +34,7 @@
> > > > > >  "bnez %1, 0b\n"
> > > > > >   \
> > > > > >  append 
> > > > > >   \
> > > > > >  : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) 
> > > > > >   \
> > > > > > -: "rJ" (__newx), "rJ" (~__mask)
> > > > > >   \
> > > > > > +: "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b)   
> > > > > >   \
> > > > > >  : "memory");   
> > > > > >   \
> > > > > > 
> > > > > >   \
> > > > > >   r = (__typeof__(*(p)))((__retx & __mask) >> __s); 
> > > > > >   \
> > > > > > diff --git a/arch/riscv/include/asm/hwcap.h 
> > > > > > b/arch/riscv/include/asm/hwcap.h
> > > > > > index b7b58258f6c7..78b7b8b53778 100644
> > > > > > --- a/arch/riscv/include/asm/hwcap.h
> > > > > > +++ b/arch/riscv/include/asm/hwcap.h
> > > > > > @@ -58,6 +58,7 @@
> > > > > >  #define RISCV_ISA_EXT_ZICSR  40
> > > > > >  #define RISCV_ISA_EXT_ZIFENCEI   41
> > > > > >  #define RISCV_ISA_EXT_ZIHPM  42
> > > > > > +#define RISCV_ISA_EXT_ZICBOP 43
> > > > > >
> 
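The insn-def.h and processor.h hunks are cut off above. For context, a minimal
sketch of what a Zicbop-backed prefetchw() can look like is below. This is an
assumption for illustration only, not the patch's actual code: it relies on the
assembler knowing the "prefetch.w" mnemonic, whereas the series presumably adds
the raw encoding to insn-def.h (and patches it in via the alternatives
framework, per the RISCV_ALTERNATIVE dependency) so older toolchains still
build and non-Zicbop CPUs execute a nop.

/*
 * Illustrative sketch only: prefetch the cache block for writing before the
 * LR/SC sequence touches it, so the line arrives in an exclusive state.
 * Assumes assembler support for the Zicbop "prefetch.w" mnemonic.
 */
static inline void prefetchw(const void *ptr)
{
	__asm__ __volatile__("prefetch.w 0(%0)" : : "r" (ptr));
}
#define ARCH_HAS_PREFETCHW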

Re: [PATCH V11 08/17] riscv: qspinlock: Add virt_spin_lock() support for KVM guest

2023-09-19 Thread Guo Ren
On Tue, Sep 19, 2023 at 1:30 PM Leonardo Bras  wrote:
>
> On Sun, Sep 17, 2023 at 11:12:31PM +0800, Guo Ren wrote:
> > On Thu, Sep 14, 2023 at 4:02 PM Leonardo Bras  wrote:
> > >
> > > On Sun, Sep 10, 2023 at 04:29:02AM -0400, guo...@kernel.org wrote:
> > > > From: Guo Ren 
> > > >
> > > > Add a static key controlling whether virt_spin_lock() should be
> > > > called or not. When running on bare metal set the new key to
> > > > false.
> > > >
> > > > The KVM guests fall back to a Test-and-Set spinlock, because fair
> > > > locks have horrible lock 'holder' preemption issues. The
> > > > virt_spin_lock_key would shortcut for the
> > > > queued_spin_lock_slowpath() function that allow virt_spin_lock to
> > > > hijack it.
> > > >
> > > > Signed-off-by: Guo Ren 
> > > > Signed-off-by: Guo Ren 
> > > > ---
> > > >  .../admin-guide/kernel-parameters.txt |  4 +++
> > > >  arch/riscv/include/asm/sbi.h  |  8 +
> > > >  arch/riscv/include/asm/spinlock.h | 22 ++
> > > >  arch/riscv/kernel/sbi.c   |  2 +-
> > > >  arch/riscv/kernel/setup.c | 30 ++-
> > > >  5 files changed, 64 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > > > b/Documentation/admin-guide/kernel-parameters.txt
> > > > index 61cacb8dfd0e..f75bedc50e00 100644
> > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > @@ -3927,6 +3927,10 @@
> > > >   no_uaccess_flush
> > > >   [PPC] Don't flush the L1-D cache after accessing 
> > > > user data.
> > > >
> > > > + no_virt_spin[RISC-V] Disable virt_spin_lock in KVM guest to 
> > > > use
> > > > + native_queued_spinlock when the nopvspin option 
> > > > is enabled.
> > > > + This would help vcpu=pcpu scenarios.
> > > > +
> > > >   novmcoredd  [KNL,KDUMP]
> > > >   Disable device dump. Device dump allows drivers to
> > > >   append dump data to vmcore so you can collect 
> > > > driver
> > > > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > > > index 501e06e52078..e0233b3d7a5f 100644
> > > > --- a/arch/riscv/include/asm/sbi.h
> > > > +++ b/arch/riscv/include/asm/sbi.h
> > > > @@ -50,6 +50,13 @@ enum sbi_ext_base_fid {
> > > >   SBI_EXT_BASE_GET_MIMPID,
> > > >  };
> > > >
> > > > +enum sbi_ext_base_impl_id {
> > > > + SBI_EXT_BASE_IMPL_ID_BBL = 0,
> > > > + SBI_EXT_BASE_IMPL_ID_OPENSBI,
> > > > + SBI_EXT_BASE_IMPL_ID_XVISOR,
> > > > + SBI_EXT_BASE_IMPL_ID_KVM,
> > > > +};
> > > > +
> > > >  enum sbi_ext_time_fid {
> > > >   SBI_EXT_TIME_SET_TIMER = 0,
> > > >  };
> > > > @@ -269,6 +276,7 @@ int sbi_console_getchar(void);
> > > >  long sbi_get_mvendorid(void);
> > > >  long sbi_get_marchid(void);
> > > >  long sbi_get_mimpid(void);
> > > > +long sbi_get_firmware_id(void);
> > > >  void sbi_set_timer(uint64_t stime_value);
> > > >  void sbi_shutdown(void);
> > > >  void sbi_send_ipi(unsigned int cpu);
> > > > diff --git a/arch/riscv/include/asm/spinlock.h 
> > > > b/arch/riscv/include/asm/spinlock.h
> > > > index 8ea0fee80652..6b38d6616f14 100644
> > > > --- a/arch/riscv/include/asm/spinlock.h
> > > > +++ b/arch/riscv/include/asm/spinlock.h
> > > > @@ -4,6 +4,28 @@
> > > >  #define __ASM_RISCV_SPINLOCK_H
> > > >
> > > >  #ifdef CONFIG_QUEUED_SPINLOCKS
> > > > +/*
> > > > + * The KVM guests fall back to a Test-and-Set spinlock, because fair 
> > > > locks
> > > > + * have horrible lock 'holder' preemption issues. The 
> > > > virt_spin_lock_key
> > > > + * would shortcut for the queued_spin_lock_slowpath() function that 
> > > > allow
> > > > + * virt_spin_lock to hijack it.
> > > > + */
> > > > +DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > +
> > > > +#define virt_spin_lock virt_spin_lock
> > > > +static inline bool virt_spin_lock(struct qspinlock *lock)
> > > > +{
> > > > + if (!static_branch_likely(&virt_spin_lock_key))
> > > > + return false;
> > > > +
> > > > + do {
> > > > + while (atomic_read(&lock->val) != 0)
> > > > + cpu_relax();
> > > > + } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> > > > +
> > > > + return true;
> > > > +}
> > > > +
> > > >  #define _Q_PENDING_LOOPS (1 << 9)
> > > >  #endif
> > > >
> > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > index 88eea3a99ee0..cdd45edc8db4 100644
> > > > --- a/arch/riscv/kernel/sbi.c
> > > > +++ b/arch/riscv/kernel/sbi.c
> > > > @@ -555,7 +555,7 @@ static inline long sbi_get_spec_version(void)
> > > >   return __sbi_base_ecall(SBI_EXT_BASE_GET_SPEC_VERSION);
> > > >  }
> > > >
> > > > -static inline long sbi_get_firmware_id(void)
> > > > +long sbi_get_firmware_id(void)
> > > >  {
> > > >   return __sbi_base_ecall(SBI_EXT_BASE_

Re: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

2023-09-19 Thread Thomas Gleixner
On Wed, Sep 13 2023 at 21:48, Xin Li wrote:
> +static inline void fred_syscall_init(void)
> +{
> + /*
> +  * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
> +  * and SYSENTER, and ERETU is the only legit instruction to return to
> +  * ring 3, as a result there is _no_ need to setup the SYSCALL and
> +  * SYSENTER MSRs.
> +  *
> +  * Note, both sysexit and sysret cause #UD when FRED is enabled.
> +  */
> + wrmsrl(MSR_LSTAR, 0ULL);
> + wrmsrl_cstar(0ULL);

That write is pointless. See the comment in wrmsrl_cstar().

Thanks,

tglx



[PATCH] rtla: fix a example in rtla-timerlat-hist.rst

2023-09-19 Thread Xie XiuQi
From: Xie XiuQi 

The following error message is reported when running the example in the document.

  # timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
  Failed to set timerlat period
  Could not apply config

The unit of the period is microseconds, so '1ms' is not accepted.

  usage: [rtla] timerlat hist [-h] [-q] [-d s] [-D] [-n] [-a us] [-p us] [-i 
us] [-T us] [-s us] ...
 ...
  -p/--period us: timerlat period in us
 ...

Also fix another minor misleading comment.

Signed-off-by: Xie XiuQi 
---
 Documentation/tools/rtla/rtla-timerlat-hist.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/tools/rtla/rtla-timerlat-hist.rst 
b/Documentation/tools/rtla/rtla-timerlat-hist.rst
index 057db78d4095..03b7f3deb069 100644
--- a/Documentation/tools/rtla/rtla-timerlat-hist.rst
+++ b/Documentation/tools/rtla/rtla-timerlat-hist.rst
@@ -36,11 +36,11 @@ EXAMPLE
 In the example below, **rtla timerlat hist** is set to run for *10* minutes,
 in the cpus *0-4*, *skipping zero* only lines. Moreover, **rtla timerlat
 hist** will change the priority of the *timerlat* threads to run under
-*SCHED_DEADLINE* priority, with a *10us* runtime every *1ms* period. The
+*SCHED_DEADLINE* priority, with a *100us* runtime every *1ms* period. The
 *1ms* period is also passed to the *timerlat* tracer. Auto-analysis is disabled
 to reduce overhead ::
 
-  [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
+  [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1000 --no-aa
   # RTLA timerlat histogram
   # Time unit is microseconds (us)
   # Duration:   0 00:10:00
-- 
2.25.1



Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

2023-09-19 Thread Matteo Rizzo
On Mon, 18 Sept 2023 at 19:39, Ingo Molnar  wrote:
>
> What's the split of the increase in overhead due to SLAB_VIRTUAL=y, between
> user-space execution and kernel-space execution?
>

Same benchmark as before (compiling a kernel on a system running the patched
kernel):

Intel Skylake:

     LABEL     | COUNT |   MIN    |   MAX    |   MEAN   |  MEDIAN  |  STDDEV
---------------+-------+----------+----------+----------+----------+----------
wall clock     |       |          |          |          |          |
SLAB_VIRTUAL=n | 150   | 49.700   | 51.320   | 50.449   | 50.430   | 0.29959
SLAB_VIRTUAL=y | 150   | 50.020   | 51.660   | 50.880   | 50.880   | 0.30495
               |       | +0.64%   | +0.66%   | +0.85%   | +0.89%   | +1.79%
system time    |       |          |          |          |          |
SLAB_VIRTUAL=n | 150   | 358.560  | 362.900  | 360.922  | 360.985  | 0.91761
SLAB_VIRTUAL=y | 150   | 362.970  | 367.970  | 366.062  | 366.115  | 1.015
               |       | +1.23%   | +1.40%   | +1.42%   | +1.42%   | +10.60%
user time      |       |          |          |          |          |
SLAB_VIRTUAL=n | 150   | 3110.000 | 3124.520 | 3118.143 | 3118.120 | 2.466
SLAB_VIRTUAL=y | 150   | 3115.070 | 3127.070 | 3120.762 | 3120.925 | 2.654
               |       | +0.16%   | +0.08%   | +0.08%   | +0.09%   | +7.63%

AMD Milan:

     LABEL     | COUNT |   MIN    |   MAX    |   MEAN   |  MEDIAN  |  STDDEV
---------------+-------+----------+----------+----------+----------+----------
wall clock     |       |          |          |          |          |
SLAB_VIRTUAL=n | 150   | 25.480   | 26.550   | 26.065   | 26.055   | 0.23495
SLAB_VIRTUAL=y | 150   | 25.820   | 27.080   | 26.531   | 26.540   | 0.25974
               |       | +1.33%   | +2.00%   | +1.79%   | +1.86%   | +10.55%
system time    |       |          |          |          |          |
SLAB_VIRTUAL=n | 150   | 478.530  | 540.420  | 520.803  | 521.485  | 9.166
SLAB_VIRTUAL=y | 150   | 530.520  | 572.460  | 552.825  | 552.985  | 7.161
               |       | +10.86%  | +5.93%   | +6.15%   | +6.04%   | -21.88%
user time      |       |          |          |          |          |
SLAB_VIRTUAL=n | 150   | 2373.540 | 2403.800 | 2386.343 | 2385.840 | 5.325
SLAB_VIRTUAL=y | 150   | 2388.690 | 2426.290 | 2408.325 | 2408.895 | 6.667
               |       | +0.64%   | +0.94%   | +0.92%   | +0.97%   | +25.20%


I'm not exactly sure why user time increases by almost 1% on Milan; it could be
TLB contention.

--
Matteo


Re: [PATCH] rtla: fix a example in rtla-timerlat-hist.rst

2023-09-19 Thread Daniel Bristot de Oliveira
On 9/19/23 15:30, Xie XiuQi wrote:
> From: Xie XiuQi 
> 
> The following error message is reported when running the example in the document.
> 
>   # timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
>   Failed to set timerlat period
>   Could not apply config
> 
> The unit of the period is microseconds, so '1ms' is not accepted.

right, but we could also parse the -p accordingly, no?

as well as the -r for osnoise and hwnoise... and other us-only parameters
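
A possible shape for such a helper, purely illustrative (this is not rtla's
existing option parsing and the function name is made up), normalizing "1ms",
"1000us" or a bare number to microseconds:

#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a "<n>[us|ms|s]" argument to microseconds. */
static long parse_period_us(const char *arg)
{
	char *end;
	long val = strtol(arg, &end, 10);

	if (end == arg || val < 0)
		return -EINVAL;
	if (*end == '\0' || !strcmp(end, "us"))
		return val;
	if (!strcmp(end, "ms"))
		return val * 1000L;
	if (!strcmp(end, "s"))
		return val * 1000000L;
	return -EINVAL;		/* unknown suffix */
}

With something like that wired into -p/-r, both "-p 1ms" and "-p 1000" would be
accepted.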

>   usage: [rtla] timerlat hist [-h] [-q] [-d s] [-D] [-n] [-a us] [-p us] [-i 
> us] [-T us] [-s us] ...
>  ...
> -p/--period us: timerlat period in us
>  ...
> 
> Also fix another minor misleading comment.
> 
> Signed-off-by: Xie XiuQi 



Re: [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available

2023-09-19 Thread Leonardo Bras
On Tue, Sep 19, 2023 at 03:53:22PM +0800, Guo Ren wrote:
> On Tue, Sep 19, 2023 at 1:13 PM Leonardo Bras  wrote:
> >
> > On Sun, Sep 17, 2023 at 10:34:36PM +0800, Guo Ren wrote:
> > > On Sat, Sep 16, 2023 at 9:25 AM Leonardo Bras  wrote:
> > > >
> > > > On Fri, Sep 15, 2023 at 08:36:31PM +0800, Guo Ren wrote:
> > > > > On Wed, Sep 13, 2023 at 4:50 PM Leonardo Bras  
> > > > > wrote:
> > > > > >
> > > > > > On Sun, Sep 10, 2023 at 04:28:57AM -0400, guo...@kernel.org wrote:
> > > > > > > From: Guo Ren 
> > > > > > >
> > > > > > > Cache-block prefetch instructions are HINTs to the hardware to
> > > > > > > indicate that software intends to perform a particular type of
> > > > > > > memory access in the near future. Enable ARCH_HAS_PREFETCHW and
> > > > > > > improve the arch_xchg for qspinlock xchg_tail.
> > > > > > >
> > > > > > > Signed-off-by: Guo Ren 
> > > > > > > Signed-off-by: Guo Ren 
> > > > > > > ---
> > > > > > >  arch/riscv/Kconfig | 15 +++
> > > > > > >  arch/riscv/include/asm/cmpxchg.h   |  4 +++-
> > > > > > >  arch/riscv/include/asm/hwcap.h |  1 +
> > > > > > >  arch/riscv/include/asm/insn-def.h  |  5 +
> > > > > > >  arch/riscv/include/asm/processor.h | 13 +
> > > > > > >  arch/riscv/kernel/cpufeature.c |  1 +
> > > > > > >  6 files changed, 38 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > index e9ae6fa232c3..2c346fe169c1 100644
> > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > @@ -617,6 +617,21 @@ config RISCV_ISA_ZICBOZ
> > > > > > >
> > > > > > >  If you don't know what to do here, say Y.
> > > > > > >
> > > > > > > +config RISCV_ISA_ZICBOP
> > > > > > > + bool "Zicbop extension support for cache block prefetch"
> > > > > > > + depends on MMU
> > > > > > > + depends on RISCV_ALTERNATIVE
> > > > > > > + default y
> > > > > > > + help
> > > > > > > +Adds support to dynamically detect the presence of the 
> > > > > > > ZICBOP
> > > > > > > +extension (Cache Block Prefetch Operations) and enable 
> > > > > > > its
> > > > > > > +usage.
> > > > > > > +
> > > > > > > +The Zicbop extension can be used to prefetch cache block 
> > > > > > > for
> > > > > > > +read/write/instruction fetch.
> > > > > > > +
> > > > > > > +If you don't know what to do here, say Y.
> > > > > > > +
> > > > > > >  config TOOLCHAIN_HAS_ZIHINTPAUSE
> > > > > > >   bool
> > > > > > >   default y
> > > > > > > diff --git a/arch/riscv/include/asm/cmpxchg.h 
> > > > > > > b/arch/riscv/include/asm/cmpxchg.h
> > > > > > > index 702725727671..56eff7a9d2d2 100644
> > > > > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > > > > @@ -11,6 +11,7 @@
> > > > > > >
> > > > > > >  #include 
> > > > > > >  #include 
> > > > > > > +#include 
> > > > > > >
> > > > > > >  #define __arch_xchg_masked(prepend, append, r, p, n) 
> > > > > > > \
> > > > > > >  ({   
> > > > > > > \
> > > > > > > @@ -25,6 +26,7 @@
> > > > > > >   
> > > > > > > \
> > > > > > >   __asm__ __volatile__ (  
> > > > > > > \
> > > > > > >  prepend  
> > > > > > > \
> > > > > > > +PREFETCHW_ASM(%5)
> > > > > > > \
> > > > > > >  "0:  lr.w %0, %2\n"  
> > > > > > > \
> > > > > > >  "and  %1, %0, %z4\n" 
> > > > > > > \
> > > > > > >  "or   %1, %1, %z3\n" 
> > > > > > > \
> > > > > > > @@ -32,7 +34,7 @@
> > > > > > >  "bnez %1, 0b\n"  
> > > > > > > \
> > > > > > >  append   
> > > > > > > \
> > > > > > >  : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))   
> > > > > > > \
> > > > > > > -: "rJ" (__newx), "rJ" (~__mask)  
> > > > > > > \
> > > > > > > +: "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b) 
> > > > > > > \
> > > > > > >  : "memory"); 
> > > > > > > \
> > > > > > >   
> > > > > > > \
> > > > > > >   r = (__typeof__(*(p)))((__retx & __mask) >> __s);   
> > > > > > > \
> > > > > > > diff --git a/arch/riscv/include/asm/hwcap.h 
> > > > > > > b/arch/riscv/include/asm/hwcap.h
> > > > > > > index b7b58258f6c7..78b7b8b53778 100644
> > > > > > > --- a/arch/riscv/include/asm/hwcap.h
> > > > > > > +++ b/arch/ri

Re: [PATCH V11 08/17] riscv: qspinlock: Add virt_spin_lock() support for KVM guest

2023-09-19 Thread Leonardo Bras
On Tue, Sep 19, 2023 at 04:04:48PM +0800, Guo Ren wrote:
> On Tue, Sep 19, 2023 at 1:30 PM Leonardo Bras  wrote:
> >
> > On Sun, Sep 17, 2023 at 11:12:31PM +0800, Guo Ren wrote:
> > > On Thu, Sep 14, 2023 at 4:02 PM Leonardo Bras  wrote:
> > > >
> > > > On Sun, Sep 10, 2023 at 04:29:02AM -0400, guo...@kernel.org wrote:
> > > > > From: Guo Ren 
> > > > >
> > > > > Add a static key controlling whether virt_spin_lock() should be
> > > > > called or not. When running on bare metal set the new key to
> > > > > false.
> > > > >
> > > > > The KVM guests fall back to a Test-and-Set spinlock, because fair
> > > > > locks have horrible lock 'holder' preemption issues. The
> > > > > virt_spin_lock_key would shortcut for the
> > > > > queued_spin_lock_slowpath() function that allow virt_spin_lock to
> > > > > hijack it.
> > > > >
> > > > > Signed-off-by: Guo Ren 
> > > > > Signed-off-by: Guo Ren 
> > > > > ---
> > > > >  .../admin-guide/kernel-parameters.txt |  4 +++
> > > > >  arch/riscv/include/asm/sbi.h  |  8 +
> > > > >  arch/riscv/include/asm/spinlock.h | 22 ++
> > > > >  arch/riscv/kernel/sbi.c   |  2 +-
> > > > >  arch/riscv/kernel/setup.c | 30 
> > > > > ++-
> > > > >  5 files changed, 64 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > > > > b/Documentation/admin-guide/kernel-parameters.txt
> > > > > index 61cacb8dfd0e..f75bedc50e00 100644
> > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > @@ -3927,6 +3927,10 @@
> > > > >   no_uaccess_flush
> > > > >   [PPC] Don't flush the L1-D cache after 
> > > > > accessing user data.
> > > > >
> > > > > + no_virt_spin[RISC-V] Disable virt_spin_lock in KVM guest to 
> > > > > use
> > > > > + native_queued_spinlock when the nopvspin option 
> > > > > is enabled.
> > > > > + This would help vcpu=pcpu scenarios.
> > > > > +
> > > > >   novmcoredd  [KNL,KDUMP]
> > > > >   Disable device dump. Device dump allows drivers 
> > > > > to
> > > > >   append dump data to vmcore so you can collect 
> > > > > driver
> > > > > diff --git a/arch/riscv/include/asm/sbi.h 
> > > > > b/arch/riscv/include/asm/sbi.h
> > > > > index 501e06e52078..e0233b3d7a5f 100644
> > > > > --- a/arch/riscv/include/asm/sbi.h
> > > > > +++ b/arch/riscv/include/asm/sbi.h
> > > > > @@ -50,6 +50,13 @@ enum sbi_ext_base_fid {
> > > > >   SBI_EXT_BASE_GET_MIMPID,
> > > > >  };
> > > > >
> > > > > +enum sbi_ext_base_impl_id {
> > > > > + SBI_EXT_BASE_IMPL_ID_BBL = 0,
> > > > > + SBI_EXT_BASE_IMPL_ID_OPENSBI,
> > > > > + SBI_EXT_BASE_IMPL_ID_XVISOR,
> > > > > + SBI_EXT_BASE_IMPL_ID_KVM,
> > > > > +};
> > > > > +
> > > > >  enum sbi_ext_time_fid {
> > > > >   SBI_EXT_TIME_SET_TIMER = 0,
> > > > >  };
> > > > > @@ -269,6 +276,7 @@ int sbi_console_getchar(void);
> > > > >  long sbi_get_mvendorid(void);
> > > > >  long sbi_get_marchid(void);
> > > > >  long sbi_get_mimpid(void);
> > > > > +long sbi_get_firmware_id(void);
> > > > >  void sbi_set_timer(uint64_t stime_value);
> > > > >  void sbi_shutdown(void);
> > > > >  void sbi_send_ipi(unsigned int cpu);
> > > > > diff --git a/arch/riscv/include/asm/spinlock.h 
> > > > > b/arch/riscv/include/asm/spinlock.h
> > > > > index 8ea0fee80652..6b38d6616f14 100644
> > > > > --- a/arch/riscv/include/asm/spinlock.h
> > > > > +++ b/arch/riscv/include/asm/spinlock.h
> > > > > @@ -4,6 +4,28 @@
> > > > >  #define __ASM_RISCV_SPINLOCK_H
> > > > >
> > > > >  #ifdef CONFIG_QUEUED_SPINLOCKS
> > > > > +/*
> > > > > + * The KVM guests fall back to a Test-and-Set spinlock, because fair 
> > > > > locks
> > > > > + * have horrible lock 'holder' preemption issues. The 
> > > > > virt_spin_lock_key
> > > > > + * would shortcut for the queued_spin_lock_slowpath() function that 
> > > > > allow
> > > > > + * virt_spin_lock to hijack it.
> > > > > + */
> > > > > +DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > > +
> > > > > +#define virt_spin_lock virt_spin_lock
> > > > > +static inline bool virt_spin_lock(struct qspinlock *lock)
> > > > > +{
> > > > > + if (!static_branch_likely(&virt_spin_lock_key))
> > > > > + return false;
> > > > > +
> > > > > + do {
> > > > > + while (atomic_read(&lock->val) != 0)
> > > > > + cpu_relax();
> > > > > + } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> > > > > +
> > > > > + return true;
> > > > > +}
> > > > > +
> > > > >  #define _Q_PENDING_LOOPS (1 << 9)
> > > > >  #endif
> > > > >
> > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > > index 88eea3a99ee0..cdd45edc8db4 100644
> > > > > --- a/arch/riscv/kernel/sbi.c
> > > > > +++ b/arch/riscv/kernel/sbi.c

Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

2023-09-19 Thread Matteo Rizzo
On Mon, 18 Sept 2023 at 20:05, Linus Torvalds
 wrote:
>
> ... and equally importantly, what about DMA?

I'm not exactly sure what you mean by this, I don't think this should
affect the performance of DMA.

> Or what about the fixed-size slabs (aka kmalloc?) What's the point of
> "never re-use the same address for a different slab", when the *same*
> slab will contain different kinds of allocations anyway?

There are a number of patches out there (for example the random_kmalloc
series which recently got merged into v6.6) which attempt to segregate
kmalloc'd objects into different caches to make exploitation harder.
Another thing that we would like to have in the future is to segregate
objects by type (like XNU's kalloc_type
https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/)
which makes exploiting use-after-free by type confusion much harder or
impossible.

All of these mitigations can be bypassed very easily if the attacker can
mount a cross-cache attack, which is what this series attempts to prevent.
This is not only theoretical, we've seen attackers use this all the time in
kCTF/kernelCTF submissions (for example
https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/).

> I think the whole "make it one single compile-time option" model is
> completely and fundamentally broken.

Wouldn't making this toggleable at boot time or runtime make performance
even worse?

--
Matteo


Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

2023-09-19 Thread Dave Hansen
On 9/19/23 06:42, Matteo Rizzo wrote:
> On Mon, 18 Sept 2023 at 19:39, Ingo Molnar  wrote:
>> What's the split of the increase in overhead due to SLAB_VIRTUAL=y, between
>> user-space execution and kernel-space execution?
>>
> Same benchmark as before (compiling a kernel on a system running the patched
> kernel):

Thanks for running those.  One more situation that comes to mind is how
this will act under memory pressure.  Will some memory pressure make
contention on 'slub_kworker_lock' visible or make the global TLB flushes
less bearable?

In any case, none of this looks _catastrophic_.  It's surely a cost that
some folks will pay.

But I really do think it needs to be more dynamic.  There are a _couple_
of reasons for this.  If it's only a compile-time option, it's never
going to get turned on except for maybe ChromeOS and the datacenter
folks that are paranoid.  I suspect the distros will never turn it on.

A lot of questions get easier if you can disable/enable it at runtime.
For instance, what do you do if the virtual area fills up?  You _could_
just go back to handing out direct map addresses.  Less secure?  Yep.
But better than crashing (for some folks).

It also opens up the door to do this per-slab.  That alone would be a
handy debugging option.


Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

2023-09-19 Thread Dave Hansen
On 9/19/23 08:48, Matteo Rizzo wrote:
>> I think the whole "make it one single compile-time option" model is
>> completely and fundamentally broken.
> Wouldn't making this toggleable at boot time or runtime make performance
> even worse?

Maybe.

But you can tolerate even more of a performance impact from a feature if
the people that don't care can actually disable it.

There are also plenty of ways to minimize the overhead of switching it
on and off at runtime.  Static branches are your best friend here.
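
For reference, the kind of construct being suggested looks roughly like the
sketch below. The names are made up for illustration and are not from the
SLAB_VIRTUAL series:

#include <linux/jump_label.h>
#include <linux/init.h>

static DEFINE_STATIC_KEY_FALSE(slab_virtual_enabled);

static int __init slab_virtual_param(char *str)
{
	static_branch_enable(&slab_virtual_enabled);
	return 1;
}
__setup("slab_virtual", slab_virtual_param);

static inline void *slab_object_address(void *direct_map_addr)
{
	/* A single patched NOP/JMP at each call site when the key is off. */
	if (static_branch_unlikely(&slab_virtual_enabled))
		return slab_virtual_map(direct_map_addr);  /* assumed helper */
	return direct_map_addr;
}

When the key is disabled the branch costs essentially nothing, so a boot-time
(or even runtime) toggle doesn't have to mean a blanket slowdown for people who
leave it off.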


Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

2023-09-19 Thread Kees Cook
On September 19, 2023 9:02:07 AM PDT, Dave Hansen  wrote:
>On 9/19/23 08:48, Matteo Rizzo wrote:
>>> I think the whole "make it one single compile-time option" model is
>>> completely and fundamentally broken.
>> Wouldn't making this toggleable at boot time or runtime make performance
>> even worse?
>
>Maybe.
>
>But you can tolerate even more of a performance impact from a feature if
>the people that don't care can actually disable it.
>
>There are also plenty of ways to minimize the overhead of switching it
>on and off at runtime.  Static branches are your best friend here.

Let's start with a boot time on/off toggle (no per-slab, no switch on 
out-of-space, etc). That should be sufficient for initial ease of use for 
testing, etc. But yes, using static_branch will nicely DTRT here.



-- 
Kees Cook


Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

2023-09-19 Thread Linus Torvalds
On Tue, 19 Sept 2023 at 08:48, Matteo Rizzo  wrote:
>
> On Mon, 18 Sept 2023 at 20:05, Linus Torvalds
>  wrote:
> >
> > ... and equally importantly, what about DMA?
>
> I'm not exactly sure what you mean by this, I don't think this should
> affect the performance of DMA.

I was more worried about just basic correctness.

We've traditionally had a lot of issues with using virtual addresses
for dma, simply because we've got random drivers, and I'm not entirely
convinced that your "virt_to_phys()" update will catch it all.

IOW, even on x86-64 - which is hopefully better than most
architectures because it already has that double mapping issue - we
have things like

unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;

in other places than just the __phys_addr() code.

The one place I grepped for looks to be just boot-time AMD memory
encryption, so wouldn't be any slab allocation, but ...

   Linus


RE: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

2023-09-19 Thread Li, Xin3
> > +static inline void fred_syscall_init(void) {
> > +   /*
> > +* Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
> > +* and SYSENTER, and ERETU is the only legit instruction to return to
> > +* ring 3, as a result there is _no_ need to setup the SYSCALL and
> > +* SYSENTER MSRs.
> > +*
> > +* Note, both sysexit and sysret cause #UD when FRED is enabled.
> > +*/
> > +   wrmsrl(MSR_LSTAR, 0ULL);
> > +   wrmsrl_cstar(0ULL);
> 
> That write is pointless. See the comment in wrmsrl_cstar().

What I heard is that AMD is going to support FRED.

Both LSTAR and CSTAR have no function when FRED is enabled, so maybe
just do NOT write to them?
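
Roughly, the alternative being suggested (sketch only, not necessarily what the
series will end up doing):

static inline void fred_syscall_init(void)
{
	/*
	 * With FRED enabled, SYSCALL/SYSENTER go through the ring-3 FRED
	 * entrypoint and sysret/sysexit #UD, so LSTAR and CSTAR are never
	 * consulted; leave them untouched.
	 */
}

If nothing else needs programming here, the helper could arguably go away
entirely.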

Thanks!
Xin