Re: [PATCH] powerpc/xmon: add read-only mode
> On March 29, 2019 at 3:41 AM Christophe Leroy wrote: > > > > > Le 29/03/2019 à 05:21, cmr a écrit : > > Operations which write to memory should be restricted on secure systems > > and optionally to avoid self-destructive behaviors. > > > > Add a config option, XMON_RO, to control default xmon behavior along > > with kernel cmdline options xmon=ro and xmon=rw for explicit control. > > The default is to enable read-only mode. > > > > The following xmon operations are affected: > > memops: > > disable memmove > > disable memset > > memex: > > no-op'd mwrite > > super_regs: > > no-op'd write_spr > > bpt_cmds: > > disable > > proc_call: > > disable > > > > Signed-off-by: cmr > > A Fully qualified name should be used. What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY? > > > --- > > arch/powerpc/Kconfig.debug | 7 +++ > > arch/powerpc/xmon/xmon.c | 24 > > 2 files changed, 31 insertions(+) > > > > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > > index 4e00cb0a5464..33cc01adf4cb 100644 > > --- a/arch/powerpc/Kconfig.debug > > +++ b/arch/powerpc/Kconfig.debug > > @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY > > to say Y here, unless you're building for a memory-constrained > > system. > > > > +config XMON_RO > > + bool "Set xmon read-only mode" > > + depends on XMON > > + default y > > Should it really be always default y ? > I would set default 'y' only when some security options are also set. > This is a good point, I based this on an internal Slack suggestion but giving this more thought, disabling read-only mode by default makes more sense. I'm not sure what security options could be set though?
Re: [PATCH] powerpc/xmon: add read-only mode
> On March 29, 2019 at 12:49 AM Andrew Donnellan > wrote: > > > On 29/3/19 3:21 pm, cmr wrote: > > Operations which write to memory should be restricted on secure systems > > and optionally to avoid self-destructive behaviors. > > For reference: > - https://github.com/linuxppc/issues/issues/219 > - https://github.com/linuxppc/issues/issues/232 > > Perhaps clarify what is meant here by "secure systems". > > Otherwise commit message looks good. > I will reword this for the next patch to reflect the verbiage in the referenced github issue -- ie. Secure Boot and not violating secure boot integrity by using xmon. > > > --- > > arch/powerpc/Kconfig.debug | 7 +++ > > arch/powerpc/xmon/xmon.c | 24 > > 2 files changed, 31 insertions(+) > > > > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > > index 4e00cb0a5464..33cc01adf4cb 100644 > > --- a/arch/powerpc/Kconfig.debug > > +++ b/arch/powerpc/Kconfig.debug > > @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY > > to say Y here, unless you're building for a memory-constrained > > system. > > > > +config XMON_RO > > + bool "Set xmon read-only mode" > > + depends on XMON > > + default y > > + help > > + Disable state- and memory-altering write operations in xmon. > > The meaning of this option is a bit unclear. > > From the code - it looks like what this option actually does is enable > RO mode *by default*. In which case it should probably be called > XMON_RO_DEFAULT and the description should note that RW mode can still > be enabled via a cmdline option. > Based on Christophe's feedback the default will change for this option in the next patch. I will also add the cmdline options to the description for clarity. > > > + > > config DEBUGGER > > bool > > depends on KGDB || XMON > > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > > index a0f44f992360..c13ee73cdfd4 100644 > > --- a/arch/powerpc/xmon/xmon.c > > +++ b/arch/powerpc/xmon/xmon.c > > @@ -80,6 +80,7 @@ static int set_indicator_token = RTAS_UNKNOWN_SERVICE; > > #endif > > static unsigned long in_xmon __read_mostly = 0; > > static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT); > > +static int xmon_ro = IS_ENABLED(CONFIG_XMON_RO); > > > > static unsigned long adrs; > > static int size = 1; > > @@ -1042,6 +1043,8 @@ cmds(struct pt_regs *excp) > > set_lpp_cmd(); > > break; > > case 'b': > > + if (xmon_ro == 1) > > + break; > > For all these cases - it would be much better to print an error message > somewhere when we abort due to read-only mode. > I included print messages initially but then thought about how xmon is intended for "power" users. I can add print statements to avoid confusion and frustration since the operations are just "silently" dropped -- *if* that aligns with xmon's "philosophy".
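The mechanism being discussed is small enough to show in one piece. From v2 onward the series settles on a bool flag plus a short notice rather than a silent drop; condensed from the v3/v4 diffs further down in this thread, and assuming it sits inside arch/powerpc/xmon/xmon.c, the logic is:

static bool xmon_is_ro = IS_ENABLED(CONFIG_XMON_DEFAULT_RO_MODE);

static const char *xmon_ro_msg = "Operation disabled: xmon in read-only mode\n";

/*
 * The patches open-code this check in front of each write path (memops,
 * memex, SPR writes, breakpoints, proccall); the helper below is only a
 * compact way of showing the pattern and is not part of the series.
 */
static bool xmon_refuse_write(void)
{
	if (!xmon_is_ro)
		return false;
	printf(xmon_ro_msg);	/* warn instead of silently dropping the op */
	return true;
}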
Re: [PATCH] powerpc/xmon: add read-only mode
> On April 3, 2019 at 12:15 AM Christophe Leroy wrote: > > > > > Le 03/04/2019 à 05:38, Christopher M Riedl a écrit : > >> On March 29, 2019 at 3:41 AM Christophe Leroy > >> wrote: > >> > >> > >> > >> > >> Le 29/03/2019 à 05:21, cmr a écrit : > >>> Operations which write to memory should be restricted on secure systems > >>> and optionally to avoid self-destructive behaviors. > >>> > >>> Add a config option, XMON_RO, to control default xmon behavior along > >>> with kernel cmdline options xmon=ro and xmon=rw for explicit control. > >>> The default is to enable read-only mode. > >>> > >>> The following xmon operations are affected: > >>> memops: > >>> disable memmove > >>> disable memset > >>> memex: > >>> no-op'd mwrite > >>> super_regs: > >>> no-op'd write_spr > >>> bpt_cmds: > >>> disable > >>> proc_call: > >>> disable > >>> > >>> Signed-off-by: cmr > >> > >> A Fully qualified name should be used. > > > > What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY? > > I mean it should be > > Signed-off-by: Christopher M Riedl > > instead of > > Signed-off-by: cmr > Hehe, thanks :) > > > >> > >>> --- > >>>arch/powerpc/Kconfig.debug | 7 +++ > >>>arch/powerpc/xmon/xmon.c | 24 > >>>2 files changed, 31 insertions(+) > >>> > >>> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > >>> index 4e00cb0a5464..33cc01adf4cb 100644 > >>> --- a/arch/powerpc/Kconfig.debug > >>> +++ b/arch/powerpc/Kconfig.debug > >>> @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY > >>> to say Y here, unless you're building for a memory-constrained > >>> system. > >>> > >>> +config XMON_RO > >>> + bool "Set xmon read-only mode" > >>> + depends on XMON > >>> + default y > >> > >> Should it really be always default y ? > >> I would set default 'y' only when some security options are also set. > >> > > > > This is a good point, I based this on an internal Slack suggestion but > > giving this more thought, disabling read-only mode by default makes more > > sense. I'm not sure what security options could be set though? > > > > Maybe starting with CONFIG_STRICT_KERNEL_RWX > > Another point that may also be addressed by your patch is the definition > of PAGE_KERNEL_TEXT: > > #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || > defined(CONFIG_BDI_SWITCH) ||\ > defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) > #define PAGE_KERNEL_TEXT PAGE_KERNEL_X > #else > #define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX > #endif > > The above let me think that it would be better if you add a config > XMON_RW instead of XMON_RO, with default !STRICT_KERNEL_RWX > > Christophe Thanks! I like that a lot better, this, along with your other suggestions in the initial review, will be in the next version.
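For readers who have not looked at the pgtable headers, the effect of Christophe's suggestion is easiest to see on the macro itself; v2 below makes exactly this change in all three pgtable.h variants:

#if defined(CONFIG_KGDB) || defined(CONFIG_XMON_RW) || \
    defined(CONFIG_BDI_SWITCH) || defined(CONFIG_KPROBES) || \
    defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X	/* something built in may patch kernel text */
#else
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX	/* kernel text can stay write-protected */
#endif

With XMON_RW (rather than XMON) in the condition, a kernel that carries a read-only xmon no longer forces writable kernel text.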
[PATCH v2] powerpc/xmon: add read-only mode
Operations which write to memory and special purpose registers should be restricted on systems with integrity guarantees (such as Secure Boot) and, optionally, to avoid self-destructive behaviors. Add a config option, XMON_RW, to control default xmon behavior along with kernel cmdline options xmon=ro and xmon=rw for explicit control. Use XMON_RW instead of XMON in the condition to set PAGE_KERNEL_TEXT to allow xmon in read-only mode alongside write-protected kernel text. XMON_RW defaults to !STRICT_KERNEL_RWX. The following xmon operations are affected: memops: disable memmove disable memset disable memzcan memex: no-op'd mwrite super_regs: no-op'd write_spr bpt_cmds: disable proc_call: disable Signed-off-by: Christopher M. Riedl --- v1->v2: Use bool type for xmon_is_ro flag Replace XMON_RO with XMON_RW config option Make XMON_RW dependent on STRICT_KERNEL_RWX Use XMON_RW to control PAGE_KERNEL_TEXT Add printf in xmon read-only mode when dropping/skipping writes Disable memzcan (zero-fill memop) in xmon read-only mode arch/powerpc/Kconfig.debug | 10 + arch/powerpc/include/asm/book3s/32/pgtable.h | 5 ++- arch/powerpc/include/asm/book3s/64/pgtable.h | 5 ++- arch/powerpc/include/asm/nohash/pgtable.h| 5 ++- arch/powerpc/xmon/xmon.c | 42 5 files changed, 61 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 4e00cb0a5464..0c7f21476018 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -117,6 +117,16 @@ config XMON_DISASSEMBLY to say Y here, unless you're building for a memory-constrained system. +config XMON_RW + bool "Allow xmon read and write operations" + depends on XMON + default !STRICT_KERNEL_RWX + help + Allow xmon to read and write to memory and special-purpose registers. + Conversely, prevent xmon write access when set to N. Read and write + access can also be explicitly controlled with 'xmon=rw' or 'xmon=ro' + (read-only) cmdline options. Default is !STRICT_KERNEL_RWX. + config DEBUGGER bool depends on KGDB || XMON diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index aa8406b8f7ba..615144ad667d 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -86,8 +86,9 @@ static inline bool pte_user(pte_t pte) * set breakpoints anywhere, so don't write protect the kernel text * on platforms where such control is possible. */ -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\ - defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) +#if defined(CONFIG_KGDB) || defined(CONFIG_XMON_RW) || \ + defined(CONFIG_BDI_SWITCH) || defined(CONFIG_KPROBES) || \ + defined(CONFIG_DYNAMIC_FTRACE) #define PAGE_KERNEL_TEXT PAGE_KERNEL_X #else #define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 581f91be9dd4..bc4655122f6b 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -168,8 +168,9 @@ * set breakpoints anywhere, so don't write protect the kernel text * on platforms where such control is possible. 
*/ -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \ - defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) +#if defined(CONFIG_KGDB) || defined(CONFIG_XMON_RW) || \ + defined(CONFIG_BDI_SWITCH) || defined(CONFIG_KPROBES) || \ + defined(CONFIG_DYNAMIC_FTRACE) #define PAGE_KERNEL_TEXT PAGE_KERNEL_X #else #define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h index 1ca1c1864b32..c052931bd243 100644 --- a/arch/powerpc/include/asm/nohash/pgtable.h +++ b/arch/powerpc/include/asm/nohash/pgtable.h @@ -22,8 +22,9 @@ * set breakpoints anywhere, so don't write protect the kernel text * on platforms where such control is possible. */ -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\ - defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) +#if defined(CONFIG_KGDB) || defined(CONFIG_XMON_RW) || \ + defined(CONFIG_BDI_SWITCH) || defined(CONFIG_KPROBES) || \ + defined(CONFIG_DYNAMIC_FTRACE) #define PAGE_KERNEL_TEXT PAGE_KERNEL_X #else #define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a0f44f992360..224ca0b3506b 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/power
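The xmon.c hunks of the v2 diff are cut off above; their shape matches the v3/v4 postings below. As one representative example, the special-purpose-register write path simply refuses the operation when the read-only flag is set (body condensed):

static void write_spr(int n, unsigned long val)
{
	if (xmon_is_ro) {
		printf(xmon_ro_msg);
		return;
	}

	/* ... existing setjmp()/catch_spr_faults body, unchanged ... */
}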
Re: [PATCH v2] powerpc/xmon: add read-only mode
> On April 8, 2019 at 1:34 AM Oliver wrote: > > > On Mon, Apr 8, 2019 at 1:06 PM Christopher M. Riedl > wrote: > > > > Operations which write to memory and special purpose registers should be > > restricted on systems with integrity guarantees (such as Secure Boot) > > and, optionally, to avoid self-destructive behaviors. > > > > Add a config option, XMON_RW, to control default xmon behavior along > > with kernel cmdline options xmon=ro and xmon=rw for explicit control. > > Use XMON_RW instead of XMON in the condition to set PAGE_KERNEL_TEXT to > > allow xmon in read-only mode alongside write-protected kernel text. > > XMON_RW defaults to !STRICT_KERNEL_RWX. > > > > The following xmon operations are affected: > > memops: > > disable memmove > > disable memset > > disable memzcan > > memex: > > no-op'd mwrite > > super_regs: > > no-op'd write_spr > > bpt_cmds: > > disable > > proc_call: > > disable > > > > Signed-off-by: Christopher M. Riedl > > --- > > v1->v2: > > Use bool type for xmon_is_ro flag > > Replace XMON_RO with XMON_RW config option > > Make XMON_RW dependent on STRICT_KERNEL_RWX > Do you mean make it dependent on XMON? > Yeah that's really not clear at all -- XMON_RW is set based on the value of STRICT_KERNEL_RWX. > > > Use XMON_RW to control PAGE_KERNEL_TEXT > > Add printf in xmon read-only mode when dropping/skipping writes > > Disable memzcan (zero-fill memop) in xmon read-only mode > > > > arch/powerpc/Kconfig.debug | 10 + > > arch/powerpc/include/asm/book3s/32/pgtable.h | 5 ++- > > arch/powerpc/include/asm/book3s/64/pgtable.h | 5 ++- > > arch/powerpc/include/asm/nohash/pgtable.h| 5 ++- > > arch/powerpc/xmon/xmon.c | 42 > > 5 files changed, 61 insertions(+), 6 deletions(-) > > > > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > > index 4e00cb0a5464..0c7f21476018 100644 > > --- a/arch/powerpc/Kconfig.debug > > +++ b/arch/powerpc/Kconfig.debug > > @@ -117,6 +117,16 @@ config XMON_DISASSEMBLY > > to say Y here, unless you're building for a memory-constrained > > system. > > > > > +config XMON_RW > > + bool "Allow xmon read and write operations" > > + depends on XMON > > + default !STRICT_KERNEL_RWX > > + help > > + Allow xmon to read and write to memory and special-purpose > > registers. > > + Conversely, prevent xmon write access when set to N. Read and > > write > > + access can also be explicitly controlled with 'xmon=rw' or > > 'xmon=ro' > > + (read-only) cmdline options. Default is !STRICT_KERNEL_RWX. > > Maybe I am a dumb, but I found this *extremely* confusing. > Conventionally Kconfig options will control what code is and is not > included in the kernel (see XMON_DISASSEMBLY) rather than changing the > default behaviour of code. It's not wrong to do so and I'm going to > assume that you were following the pattern of XMON_DEFAULT, but I > think you need to be a little more clear about what option actually > does. Renaming it to XMON_DEFAULT_RO_MODE and re-wording the > description to indicate it's a only a mode change would help a lot. > > Sorry if this comes across as pointless bikeshedding since it's the > opposite of what Christophe said in the last patch, but this was a bit > of a head scratcher. > If anyone is dumb here it's me for making this confusing :) I chatted with Michael Ellerman about this, so let me try to explain this more clearly. 
There are two things I am trying to address with XMON_RW: 1) provide a default access mode for xmon based on system "security" 2) replace XMON in the decision to write-protect kernel text at compile-time I think a single Kconfig for both of those things is sensible as ultimately the point is to allow xmon to operate in read-only mode on "secure" systems -- without violating any integrity/security guarantees (such as write-protected kernel text). Christophe suggested looking at STRICT_KERNEL_RWX and I think that option makes the most sense to base XMON_RW on since the description for STRICT_KERNEL_RWX states: > If this is set, kernel text and rodata memory will be made read-only, > and non-text memory will be made non-executable. This provides > protection against certain security exploi
Re: [PATCH v2] powerpc/xmon: add read-only mode
> On April 8, 2019 at 2:37 AM Andrew Donnellan > wrote: > > > On 8/4/19 1:08 pm, Christopher M. Riedl wrote: > > Operations which write to memory and special purpose registers should be > > restricted on systems with integrity guarantees (such as Secure Boot) > > and, optionally, to avoid self-destructive behaviors. > > > > Add a config option, XMON_RW, to control default xmon behavior along > > with kernel cmdline options xmon=ro and xmon=rw for explicit control. > > Use XMON_RW instead of XMON in the condition to set PAGE_KERNEL_TEXT to > > allow xmon in read-only mode alongside write-protected kernel text. > > XMON_RW defaults to !STRICT_KERNEL_RWX. > > > > The following xmon operations are affected: > > memops: > > disable memmove > > disable memset > > disable memzcan > > memex: > > no-op'd mwrite > > super_regs: > > no-op'd write_spr > > bpt_cmds: > > disable > > proc_call: > > disable > > > > Signed-off-by: Christopher M. Riedl > > --- > > v1->v2: > > Use bool type for xmon_is_ro flag > > Replace XMON_RO with XMON_RW config option > > Make XMON_RW dependent on STRICT_KERNEL_RWX > > Use XMON_RW to control PAGE_KERNEL_TEXT > > Add printf in xmon read-only mode when dropping/skipping writes > > Disable memzcan (zero-fill memop) in xmon read-only mode > > > > arch/powerpc/Kconfig.debug | 10 + > > arch/powerpc/include/asm/book3s/32/pgtable.h | 5 ++- > > arch/powerpc/include/asm/book3s/64/pgtable.h | 5 ++- > > arch/powerpc/include/asm/nohash/pgtable.h| 5 ++- > > arch/powerpc/xmon/xmon.c | 42 > > 5 files changed, 61 insertions(+), 6 deletions(-) > > > > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > > index 4e00cb0a5464..0c7f21476018 100644 > > --- a/arch/powerpc/Kconfig.debug > > +++ b/arch/powerpc/Kconfig.debug > > @@ -117,6 +117,16 @@ config XMON_DISASSEMBLY > > to say Y here, unless you're building for a memory-constrained > > system. > > > > +config XMON_RW > > + bool "Allow xmon read and write operations" > > "Allow xmon write operations" would be clearer. This option has no > impact on read operations. > Agreed, if the option isn't renamed again I will fix this in the next version :) > > > + depends on XMON > > + default !STRICT_KERNEL_RWX > > + help > > + Allow xmon to read and write to memory and special-purpose registers. > > + Conversely, prevent xmon write access when set to N. Read and > > write > > + access can also be explicitly controlled with 'xmon=rw' or > > 'xmon=ro' > > + (read-only) cmdline options. Default is !STRICT_KERNEL_RWX. > > This is an improvement but still doesn't clearly explain the > relationship between selecting this option and using the cmdline options. > I will reword this in the next version. > > > + > > config DEBUGGER > > bool > > depends on KGDB || XMON > > diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h > > b/arch/powerpc/include/asm/book3s/32/pgtable.h > > index aa8406b8f7ba..615144ad667d 100644 > > --- a/arch/powerpc/include/asm/book3s/32/pgtable.h > > +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h > > @@ -86,8 +86,9 @@ static inline bool pte_user(pte_t pte) > >* set breakpoints anywhere, so don't write protect the kernel text > >* on platforms where such control is possible. 
> >*/ > > -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || > > defined(CONFIG_BDI_SWITCH) ||\ > > - defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) > > +#if defined(CONFIG_KGDB) || defined(CONFIG_XMON_RW) || \ > > + defined(CONFIG_BDI_SWITCH) || defined(CONFIG_KPROBES) || \ > > + defined(CONFIG_DYNAMIC_FTRACE) > > #define PAGE_KERNEL_TEXT PAGE_KERNEL_X > > #else > > #define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX > > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > > b/arch/powerpc/include/asm/book3s/64/pgtable.h > > index 581f91be9dd4..bc4655122f6b 100644 > > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > > @@ -168,8 +168,9 @@ > >* set breakpoints anywhere, so don't write protect the kernel text > >* on platforms where such control is possible. > >*/ >
Re: [PATCH v2] powerpc/xmon: add read-only mode
> On April 11, 2019 at 8:37 AM Michael Ellerman wrote: > > > Christopher M Riedl writes: > >> On April 8, 2019 at 1:34 AM Oliver wrote: > >> On Mon, Apr 8, 2019 at 1:06 PM Christopher M. Riedl > >> wrote: > ... > >> > > >> > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > >> > index 4e00cb0a5464..0c7f21476018 100644 > >> > --- a/arch/powerpc/Kconfig.debug > >> > +++ b/arch/powerpc/Kconfig.debug > >> > @@ -117,6 +117,16 @@ config XMON_DISASSEMBLY > >> > to say Y here, unless you're building for a memory-constrained > >> > system. > >> > > >> > >> > +config XMON_RW > >> > + bool "Allow xmon read and write operations" > >> > + depends on XMON > >> > + default !STRICT_KERNEL_RWX > >> > + help > >> > + Allow xmon to read and write to memory and special-purpose > >> > registers. > >> > + Conversely, prevent xmon write access when set to N. Read and > >> > write > >> > + access can also be explicitly controlled with 'xmon=rw' or > >> > 'xmon=ro' > >> > + (read-only) cmdline options. Default is !STRICT_KERNEL_RWX. > >> > >> Maybe I am a dumb, but I found this *extremely* confusing. > >> Conventionally Kconfig options will control what code is and is not > >> included in the kernel (see XMON_DISASSEMBLY) rather than changing the > >> default behaviour of code. It's not wrong to do so and I'm going to > >> assume that you were following the pattern of XMON_DEFAULT, but I > >> think you need to be a little more clear about what option actually > >> does. Renaming it to XMON_DEFAULT_RO_MODE and re-wording the > >> description to indicate it's a only a mode change would help a lot. > >> > >> Sorry if this comes across as pointless bikeshedding since it's the > >> opposite of what Christophe said in the last patch, but this was a bit > >> of a head scratcher. > > > > If anyone is dumb here it's me for making this confusing :) > > I chatted with Michael Ellerman about this, so let me try to explain this > > more clearly. > > Yeah it's my fault :) > "Signed-off-by: Christopher M. Riedl" -- I take full responsibility hah. > > > There are two things I am trying to address with XMON_RW: > > 1) provide a default access mode for xmon based on system "security" > > I think I've gone off this idea. Tying them together is just enforcing a > linkage that people may not want. > > I think XMON_RW should just be an option that stands on its own. It > should probably be default n, to give people a safe default. > Next version includes this along with making it clear that this option provides the default mode for XMON. > > > 2) replace XMON in the decision to write-protect kernel text at compile-time > > We should do that as a separate patch. That's actually a bug in the > current STRICT_KERNEL_RWX support. > > ie. STRICT_KERNEL_RWX should always give you PAGE_KERNEL_ROX, regardless > of XMON or anything else. > > > I think a single Kconfig for both of those things is sensible as ultimately > > the > > point is to allow xmon to operate in read-only mode on "secure" systems -- > > without > > violating any integrity/security guarantees (such as write-protected kernel > > text). > > > > Christophe suggested looking at STRICT_KERNEL_RWX and I think that option > > makes the > > most sense to base XMON_RW on since the description for STRICT_KERNEL_RWX > > states: > > Once we fix the bugs in STRICT_KERNEL_RWX people are going to enable > that by default, so it will essentially be always on in future. 
> > > > With that said, I will remove the 'xmon=rw' cmdline option as it really > > doesn't work > > since kernel text is write-protected at compile time. > > I think 'xmon=rw' still makes sense. Only some of the RW functionality > relies on being able to patch kernel text. > > And once you have proccall() you can just call a function to make it > read/write anyway, or use memex to manually frob the page tables. > > cheers Great, adding this back in the next version.
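Michael's point that the PAGE_KERNEL_TEXT selection is a separate STRICT_KERNEL_RWX bug can be made concrete. A hypothetical shape for that separate fix (not part of this series, written here only to illustrate "STRICT_KERNEL_RWX always wins"):

#ifdef CONFIG_STRICT_KERNEL_RWX
/* Strict RWX requested: kernel text stays read-only regardless of debug facilities. */
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
#elif defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \
      defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
#else
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
#endif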
[PATCH v3] powerpc/xmon: add read-only mode
Operations which write to memory and special purpose registers should be restricted on systems with integrity guarantees (such as Secure Boot) and, optionally, to avoid self-destructive behaviors. Add a config option, XMON_DEFAULT_RO_MODE, to set default xmon behavior. The kernel cmdline options xmon=ro and xmon=rw override this default. The following xmon operations are affected: memops: disable memmove disable memset disable memzcan memex: no-op'd mwrite super_regs: no-op'd write_spr bpt_cmds: disable proc_call: disable Signed-off-by: Christopher M. Riedl --- v2->v3: Use XMON_DEFAULT_RO_MODE to set xmon read-only mode Untangle read-only mode from STRICT_KERNEL_RWX and PAGE_KERNEL_ROX Update printed msg string for write ops in read-only mode arch/powerpc/Kconfig.debug | 8 arch/powerpc/xmon/xmon.c | 42 ++ 2 files changed, 50 insertions(+) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 4e00cb0a5464..8de4823dfb86 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -117,6 +117,14 @@ config XMON_DISASSEMBLY to say Y here, unless you're building for a memory-constrained system. +config XMON_DEFAULT_RO_MODE + bool "Restrict xmon to read-only operations" + depends on XMON + default y + help + Operate xmon in read-only mode. The cmdline options 'xmon=rw' and + 'xmon=ro' override this default. + config DEBUGGER bool depends on KGDB || XMON diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a0f44f992360..ce98c8049eb6 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -80,6 +80,7 @@ static int set_indicator_token = RTAS_UNKNOWN_SERVICE; #endif static unsigned long in_xmon __read_mostly = 0; static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT); +static bool xmon_is_ro = IS_ENABLED(CONFIG_XMON_DEFAULT_RO_MODE); static unsigned long adrs; static int size = 1; @@ -202,6 +203,8 @@ static void dump_tlb_book3e(void); #define GETWORD(v) (((v)[0] << 24) + ((v)[1] << 16) + ((v)[2] << 8) + (v)[3]) #endif +static const char *xmon_ro_msg = "Operation disabled: xmon in read-only mode\n"; + static char *help_string = "\ Commands:\n\ bshow breakpoints\n\ @@ -989,6 +992,10 @@ cmds(struct pt_regs *excp) memlocate(); break; case 'z': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } memzcan(); break; case 'i': @@ -1042,6 +1049,10 @@ cmds(struct pt_regs *excp) set_lpp_cmd(); break; case 'b': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } bpt_cmds(); break; case 'C': @@ -1055,6 +1066,10 @@ cmds(struct pt_regs *excp) bootcmds(); break; case 'p': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } proccall(); break; case 'P': @@ -1777,6 +1792,11 @@ read_spr(int n, unsigned long *vp) static void write_spr(int n, unsigned long val) { + if (xmon_is_ro) { + printf(xmon_ro_msg); + return; + } + if (setjmp(bus_error_jmp) == 0) { catch_spr_faults = 1; sync(); @@ -2016,6 +2036,12 @@ mwrite(unsigned long adrs, void *buf, int size) char *p, *q; n = 0; + + if (xmon_is_ro) { + printf(xmon_ro_msg); + return n; + } + if (setjmp(bus_error_jmp) == 0) { catch_memory_errors = 1; sync(); @@ -2884,9 +2910,17 @@ memops(int cmd) scanhex((void *)&mcount); switch( cmd ){ case 'm': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } memmove((void *)mdest, (void *)msrc, mcount); break; case 's': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } memset((void *)mdest, m
[PATCH v4] powerpc/xmon: add read-only mode
Operations which write to memory and special purpose registers should be restricted on systems with integrity guarantees (such as Secure Boot) and, optionally, to avoid self-destructive behaviors. Add a config option, XMON_DEFAULT_RO_MODE, to set default xmon behavior. The kernel cmdline options xmon=ro and xmon=rw override this default. The following xmon operations are affected: memops: disable memmove disable memset disable memzcan memex: no-op'd mwrite super_regs: no-op'd write_spr bpt_cmds: disable proc_call: disable Signed-off-by: Christopher M. Riedl Reviewed-by: Oliver O'Halloran --- v3->v4: Address Andrew's nitpick. arch/powerpc/Kconfig.debug | 8 arch/powerpc/xmon/xmon.c | 42 ++ 2 files changed, 50 insertions(+) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 4e00cb0a5464..326ac5ea3f72 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -117,6 +117,14 @@ config XMON_DISASSEMBLY to say Y here, unless you're building for a memory-constrained system. +config XMON_DEFAULT_RO_MODE + bool "Restrict xmon to read-only operations by default" + depends on XMON + default y + help + Operate xmon in read-only mode. The cmdline options 'xmon=rw' and + 'xmon=ro' override this default. + config DEBUGGER bool depends on KGDB || XMON diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a0f44f992360..ce98c8049eb6 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -80,6 +80,7 @@ static int set_indicator_token = RTAS_UNKNOWN_SERVICE; #endif static unsigned long in_xmon __read_mostly = 0; static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT); +static bool xmon_is_ro = IS_ENABLED(CONFIG_XMON_DEFAULT_RO_MODE); static unsigned long adrs; static int size = 1; @@ -202,6 +203,8 @@ static void dump_tlb_book3e(void); #define GETWORD(v) (((v)[0] << 24) + ((v)[1] << 16) + ((v)[2] << 8) + (v)[3]) #endif +static const char *xmon_ro_msg = "Operation disabled: xmon in read-only mode\n"; + static char *help_string = "\ Commands:\n\ bshow breakpoints\n\ @@ -989,6 +992,10 @@ cmds(struct pt_regs *excp) memlocate(); break; case 'z': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } memzcan(); break; case 'i': @@ -1042,6 +1049,10 @@ cmds(struct pt_regs *excp) set_lpp_cmd(); break; case 'b': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } bpt_cmds(); break; case 'C': @@ -1055,6 +1066,10 @@ cmds(struct pt_regs *excp) bootcmds(); break; case 'p': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } proccall(); break; case 'P': @@ -1777,6 +1792,11 @@ read_spr(int n, unsigned long *vp) static void write_spr(int n, unsigned long val) { + if (xmon_is_ro) { + printf(xmon_ro_msg); + return; + } + if (setjmp(bus_error_jmp) == 0) { catch_spr_faults = 1; sync(); @@ -2016,6 +2036,12 @@ mwrite(unsigned long adrs, void *buf, int size) char *p, *q; n = 0; + + if (xmon_is_ro) { + printf(xmon_ro_msg); + return n; + } + if (setjmp(bus_error_jmp) == 0) { catch_memory_errors = 1; sync(); @@ -2884,9 +2910,17 @@ memops(int cmd) scanhex((void *)&mcount); switch( cmd ){ case 'm': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } memmove((void *)mdest, (void *)msrc, mcount); break; case 's': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } memset((void *)mdest, mval, mcount); break; case 'd': @@ -3796,6 +3830,14 @@ static int __init early_parse_xmon(char
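The last hunk above is truncated right at early_parse_xmon(). Based on the cmdline options described in the commit message, the addition presumably extends the existing string matching with two more cases; the sketch below reconstructs the surrounding function from memory and is simplified, so details of the pre-existing branches may differ:

static int __init early_parse_xmon(char *p)
{
	if (!p || strncmp(p, "early", 5) == 0) {
		/* just "xmon" is equivalent to "xmon=early" */
		xmon_init(1);
		xmon_on = 1;
	} else if (strncmp(p, "ro", 2) == 0) {	/* new: start in read-only mode */
		xmon_init(1);
		xmon_on = 1;
		xmon_is_ro = true;
	} else if (strncmp(p, "rw", 2) == 0) {	/* new: explicitly allow writes */
		xmon_init(1);
		xmon_on = 1;
		xmon_is_ro = false;
	} else if (strncmp(p, "on", 2) == 0) {
		xmon_init(1);
		xmon_on = 1;
	} else if (strncmp(p, "off", 3) == 0)
		xmon_on = 0;
	else
		return 1;

	return 0;
}
early_param("xmon", early_parse_xmon);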
[RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped with permissive memory protections. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of the patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address portion. The next patch uses the temporary mm and patching address for code patching. Based on x86 implementation: commit 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 3345f039a876..18b88ecfc5a8 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -11,6 +11,8 @@ #include #include #include +#include +#include #include #include @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr) } #ifdef CONFIG_STRICT_KERNEL_RWX + +__ro_after_init struct mm_struct *patching_mm; +__ro_after_init unsigned long patching_addr; + +void __init poking_init(void) +{ + spinlock_t *ptl; /* for protecting pte table */ + pte_t *ptep; + + patching_mm = copy_init_mm(); + BUG_ON(!patching_mm); + + /* +* In hash we cannot go above DEFAULT_MAP_WINDOW easily. +* XXX: Do we want additional bits of entropy for radix? +*/ + patching_addr = (get_random_long() & PAGE_MASK) % + (DEFAULT_MAP_WINDOW - PAGE_SIZE); + + ptep = get_locked_pte(patching_mm, patching_addr, &ptl); + BUG_ON(!ptep); + pte_unmap_unlock(ptep, ptl); +} + static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); static int text_area_cpu_up(unsigned int cpu) -- 2.25.1
[RFC PATCH 0/3] Use per-CPU temporary mappings for patching
When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create temporary
mappings when patching itself. These mappings temporarily override the strict
RWX text protections to permit a write. Currently, powerpc allocates a per-CPU
VM area for patching. Patching occurs as follows:

1. Map page of text to be patched to per-CPU VM area w/ PAGE_KERNEL protection
2. Patch text
3. Remove the temporary mapping

While the VM area is per-CPU, the mapping is actually inserted into the kernel
page tables. Presumably, this could allow another CPU to access the normally
write-protected text - either maliciously or accidentally - via this same
mapping if the address of the VM area is known. Ideally, the mapping should be
kept local to the CPU doing the patching (or any other sensitive operations
requiring temporarily overriding memory protections) [0].

x86 introduced "temporary mm" structs which allow the creation of mappings
local to a particular CPU [1]. This series intends to bring the notion of a
temporary mm to powerpc and harden powerpc by using such a mapping for
patching a kernel with strict RWX permissions.

The first patch introduces the temporary mm struct and API for powerpc along
with a new function to retrieve a current hw breakpoint.

The second patch uses the `poking_init` init hook added by the x86 patches to
initialize a temporary mm and patching address. The patching address is
randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is
necessary due to how the hash MMU operates - by default the space above
DEFAULT_MAP_WINDOW is not available. For now, both hash and radix randomize
inside this range. The number of possible random addresses is dependent on
PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

	bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB

	bits-o-entropy = log2(128TB / 64K)
	bits-o-entropy = 31

Currently, randomization occurs only once during initialization at boot.

The third patch replaces the VM area with the temporary mm in the patching
code. The page for patching has to be mapped PAGE_SHARED with the hash MMU
since hash prevents the kernel from accessing userspace pages with
PAGE_PRIVILEGED bit set. There is ongoing work on my side to explore if this
is actually necessary in the hash codepath.

Testing so far is limited to booting on QEMU (power8 and power9 targets) and a
POWER8 VM along with setting some simple xmon breakpoints (which makes use of
code-patching). A POC lkdtm test is in progress to actually exploit the
existing vulnerability (i.e. the mapping during patching is exposed in kernel
page tables and accessible by other CPUs) - this will accompany a future v1 of
this series.

[0]: https://github.com/linuxppc/issues/issues/224
[1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.a...@gmail.com/

Christopher M. Riedl (3):
  powerpc/mm: Introduce temporary mm
  powerpc/lib: Initialize a temporary mm for code patching
  powerpc/lib: Use a temporary mm for code patching

 arch/powerpc/include/asm/debug.h       |   1 +
 arch/powerpc/include/asm/mmu_context.h |  56 +-
 arch/powerpc/kernel/process.c          |   5 +
 arch/powerpc/lib/code-patching.c       | 140 ++---
 4 files changed, 137 insertions(+), 65 deletions(-)

-- 
2.25.1
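The entropy figure in the cover letter is easy to double-check outside the kernel. A throwaway userspace program (not part of the series; build with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
	/* values quoted above: DEFAULT_MAP_WINDOW_USER64 = 128TB, PAGE_SIZE = 64K */
	const double map_window = 128.0 * (double)(1ULL << 40);
	const double page_size  = 64.0 * 1024;

	printf("bits of entropy: %.0f\n", log2(map_window / page_size));	/* prints 31 */
	return 0;
}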
[RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
Currently, code patching a STRICT_KERNEL_RWX exposes the temporary mappings to other CPUs. These mappings should be kept local to the CPU doing the patching. Use the pre-initialized temporary mm and patching address for this purpose. Also add a check after patching to ensure the patch succeeded. Based on x86 implementation: commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 128 ++- 1 file changed, 57 insertions(+), 71 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 18b88ecfc5a8..f156132e8975 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -19,6 +19,7 @@ #include #include #include +#include static int __patch_instruction(unsigned int *exec_addr, unsigned int instr, unsigned int *patch_addr) @@ -65,99 +66,79 @@ void __init poking_init(void) pte_unmap_unlock(ptep, ptl); } -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); - -static int text_area_cpu_up(unsigned int cpu) -{ - struct vm_struct *area; - - area = get_vm_area(PAGE_SIZE, VM_ALLOC); - if (!area) { - WARN_ONCE(1, "Failed to create text area for cpu %d\n", - cpu); - return -1; - } - this_cpu_write(text_poke_area, area); - - return 0; -} - -static int text_area_cpu_down(unsigned int cpu) -{ - free_vm_area(this_cpu_read(text_poke_area)); - return 0; -} - -/* - * Run as a late init call. This allows all the boot time patching to be done - * simply by patching the code, and then we're called here prior to - * mark_rodata_ro(), which happens after all init calls are run. Although - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge - * it as being preferable to a kernel that will crash later when someone tries - * to use patch_instruction(). - */ -static int __init setup_text_poke_area(void) -{ - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, - "powerpc/text_poke:online", text_area_cpu_up, - text_area_cpu_down)); - - return 0; -} -late_initcall(setup_text_poke_area); +struct patch_mapping { + spinlock_t *ptl; /* for protecting pte table */ + struct temp_mm temp_mm; +}; /* * This can be called for kernel text or a module. 
*/ -static int map_patch_area(void *addr, unsigned long text_poke_addr) +static int map_patch(const void *addr, struct patch_mapping *patch_mapping) { - unsigned long pfn; - int err; + struct page *page; + pte_t pte, *ptep; + pgprot_t pgprot; if (is_vmalloc_addr(addr)) - pfn = vmalloc_to_pfn(addr); + page = vmalloc_to_page(addr); else - pfn = __pa_symbol(addr) >> PAGE_SHIFT; + page = virt_to_page(addr); - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); + if (radix_enabled()) + pgprot = __pgprot(pgprot_val(PAGE_KERNEL)); + else + pgprot = PAGE_SHARED; - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err); - if (err) + ptep = get_locked_pte(patching_mm, patching_addr, &patch_mapping->ptl); + if (unlikely(!ptep)) { + pr_warn("map patch: failed to allocate pte for patching\n"); return -1; + } + + pte = mk_pte(page, pgprot); + set_pte_at(patching_mm, patching_addr, ptep, pte); + + init_temp_mm(&patch_mapping->temp_mm, patching_mm); + use_temporary_mm(&patch_mapping->temp_mm); return 0; } -static inline int unmap_patch_area(unsigned long addr) +static int unmap_patch(struct patch_mapping *patch_mapping) { pte_t *ptep; pmd_t *pmdp; pud_t *pudp; pgd_t *pgdp; - pgdp = pgd_offset_k(addr); + pgdp = pgd_offset(patching_mm, patching_addr); if (unlikely(!pgdp)) return -EINVAL; - pudp = pud_offset(pgdp, addr); + pudp = pud_offset(pgdp, patching_addr); if (unlikely(!pudp)) return -EINVAL; - pmdp = pmd_offset(pudp, addr); + pmdp = pmd_offset(pudp, patching_addr); if (unlikely(!pmdp)) return -EINVAL; - ptep = pte_offset_kernel(pmdp, addr); + ptep = pte_offset_kernel(pmdp, patching_addr); if (unlikely(!ptep)) return -EINVAL; - pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr); + /* +* In hash, pte_clear flushes the tlb +*/ + pte_clear(patching_mm, patching_addr, ptep); + unuse_temporary_mm(&patch_mapping->temp_mm); /* -* In hash, pte_c
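The tail of the diff is truncated above. For orientation, the pieces are meant to be used roughly as follows; this is a condensed sketch rather than the literal remainder of the patch, and it omits the error handling and the post-patch verification mentioned in the commit message:

static int do_patch_instruction(unsigned int *addr, unsigned int instr)
{
	struct patch_mapping patch_mapping;
	unsigned long flags;
	int err;

	/*
	 * Interrupts stay off for the whole map/patch/unmap window so the
	 * temporary mm is never left active where something else could use it.
	 */
	local_irq_save(flags);

	err = map_patch(addr, &patch_mapping);
	if (!err) {
		unsigned int *patch_addr =
			(unsigned int *)(patching_addr + offset_in_page(addr));

		err = __patch_instruction(addr, instr, patch_addr);
		unmap_patch(&patch_mapping);
	}

	local_irq_restore(flags);
	return err;
}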
[RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
x86 supports the notion of a temporary mm which restricts access to temporary PTEs to a single CPU. A temporary mm is useful for situations where a CPU needs to perform sensitive operations (such as patching a STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing said mappings to other CPUs. A side benefit is that other CPU TLBs do not need to be flushed when the temporary mm is torn down. Mappings in the temporary mm can be set in the userspace portion of the address-space. Interrupts must be disabled while the temporary mm is in use. HW breakpoints, which may have been set by userspace as watchpoints on addresses now within the temporary mm, are saved and disabled when loading the temporary mm. The HW breakpoints are restored when unloading the temporary mm. All HW breakpoints are indiscriminately disabled while the temporary mm is in use. Based on x86 implementation: commit cefa929c034e ("x86/mm: Introduce temporary mm structs") Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/debug.h | 1 + arch/powerpc/include/asm/mmu_context.h | 56 +- arch/powerpc/kernel/process.c | 5 +++ 3 files changed, 61 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h index 7756026b95ca..b945bc16c932 100644 --- a/arch/powerpc/include/asm/debug.h +++ b/arch/powerpc/include/asm/debug.h @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; } static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; } #endif +void __get_breakpoint(struct arch_hw_breakpoint *brk); void __set_breakpoint(struct arch_hw_breakpoint *brk); bool ppc_breakpoint_available(void); #ifdef CONFIG_PPC_ADV_DEBUG_REGS diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 360367c579de..3e6381d04c28 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -7,9 +7,10 @@ #include #include #include -#include +#include #include #include +#include /* * Most if the context management is out of line @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, return 0; } +struct temp_mm { + struct mm_struct *temp; + struct mm_struct *prev; + bool is_kernel_thread; + struct arch_hw_breakpoint brk; +}; + +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm) +{ + temp_mm->temp = mm; + temp_mm->prev = NULL; + temp_mm->is_kernel_thread = false; + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); +} + +static inline void use_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + temp_mm->is_kernel_thread = current->mm == NULL; + if (temp_mm->is_kernel_thread) + temp_mm->prev = current->active_mm; + else + temp_mm->prev = current->mm; + + /* +* Hash requires a non-NULL current->mm to allocate a userspace address +* when handling a page fault. Does not appear to hurt in Radix either. 
+*/ + current->mm = temp_mm->temp; + switch_mm_irqs_off(NULL, temp_mm->temp, current); + + if (ppc_breakpoint_available()) { + __get_breakpoint(&temp_mm->brk); + if (temp_mm->brk.type != 0) + hw_breakpoint_disable(); + } +} + +static inline void unuse_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + if (temp_mm->is_kernel_thread) + current->mm = NULL; + else + current->mm = temp_mm->prev; + switch_mm_irqs_off(NULL, temp_mm->prev, current); + + if (ppc_breakpoint_available() && temp_mm->brk.type != 0) + __set_breakpoint(&temp_mm->brk); +} + #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index fad50db9dcf2..5e5cf33fc358 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -793,6 +793,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk) return 0; } +void __get_breakpoint(struct arch_hw_breakpoint *brk) +{ + memcpy(brk, this_cpu_ptr(¤t_brk), sizeof(*brk)); +} + void __set_breakpoint(struct arch_hw_breakpoint *brk) { memcpy(this_cpu_ptr(¤t_brk), brk, sizeof(*brk)); -- 2.25.1
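The intended calling pattern for the helpers above, shown as a standalone illustration (not code from the series); interrupts must already be disabled, as the lockdep assertions require:

static void patch_something(struct mm_struct *patching_mm)
{
	struct temp_mm temp_mm;
	unsigned long flags;

	init_temp_mm(&temp_mm, patching_mm);

	local_irq_save(flags);
	use_temporary_mm(&temp_mm);

	/* ... touch mappings that exist only in patching_mm ... */

	unuse_temporary_mm(&temp_mm);
	local_irq_restore(flags);
}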
Re: [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
> On March 24, 2020 11:07 AM Christophe Leroy wrote: > > > Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit : > > x86 supports the notion of a temporary mm which restricts access to > > temporary PTEs to a single CPU. A temporary mm is useful for situations > > where a CPU needs to perform sensitive operations (such as patching a > > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing > > said mappings to other CPUs. A side benefit is that other CPU TLBs do > > not need to be flushed when the temporary mm is torn down. > > > > Mappings in the temporary mm can be set in the userspace portion of the > > address-space. > > > > Interrupts must be disabled while the temporary mm is in use. HW > > breakpoints, which may have been set by userspace as watchpoints on > > addresses now within the temporary mm, are saved and disabled when > > loading the temporary mm. The HW breakpoints are restored when unloading > > the temporary mm. All HW breakpoints are indiscriminately disabled while > > the temporary mm is in use. > > > > Based on x86 implementation: > > > > commit cefa929c034e > > ("x86/mm: Introduce temporary mm structs") > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/include/asm/debug.h | 1 + > > arch/powerpc/include/asm/mmu_context.h | 56 +- > > arch/powerpc/kernel/process.c | 5 +++ > > 3 files changed, 61 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/include/asm/debug.h > > b/arch/powerpc/include/asm/debug.h > > index 7756026b95ca..b945bc16c932 100644 > > --- a/arch/powerpc/include/asm/debug.h > > +++ b/arch/powerpc/include/asm/debug.h > > @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs > > *regs) { return 0; } > > static inline int debugger_fault_handler(struct pt_regs *regs) { return > > 0; } > > #endif > > > > +void __get_breakpoint(struct arch_hw_breakpoint *brk); > > void __set_breakpoint(struct arch_hw_breakpoint *brk); > > bool ppc_breakpoint_available(void); > > #ifdef CONFIG_PPC_ADV_DEBUG_REGS > > diff --git a/arch/powerpc/include/asm/mmu_context.h > > b/arch/powerpc/include/asm/mmu_context.h > > index 360367c579de..3e6381d04c28 100644 > > --- a/arch/powerpc/include/asm/mmu_context.h > > +++ b/arch/powerpc/include/asm/mmu_context.h > > @@ -7,9 +7,10 @@ > > #include > > #include > > #include > > -#include > > +#include > > What's this change ? > I see you are removing a space at the end of the line, but it shouldn't > be part of this patch. > Overly aggressive "helpful" editor setting apparently. Removed this change in the next version. 
> > #include > > #include > > +#include > > > > /* > >* Most if the context management is out of line > > @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct > > *oldmm, > > return 0; > > } > > > > +struct temp_mm { > > + struct mm_struct *temp; > > + struct mm_struct *prev; > > + bool is_kernel_thread; > > + struct arch_hw_breakpoint brk; > > +}; > > + > > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct > > *mm) > > +{ > > + temp_mm->temp = mm; > > + temp_mm->prev = NULL; > > + temp_mm->is_kernel_thread = false; > > + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); > > +} > > + > > +static inline void use_temporary_mm(struct temp_mm *temp_mm) > > +{ > > + lockdep_assert_irqs_disabled(); > > + > > + temp_mm->is_kernel_thread = current->mm == NULL; > > + if (temp_mm->is_kernel_thread) > > + temp_mm->prev = current->active_mm; > > + else > > + temp_mm->prev = current->mm; > > + > > + /* > > +* Hash requires a non-NULL current->mm to allocate a userspace address > > +* when handling a page fault. Does not appear to hurt in Radix either. > > +*/ > > + current->mm = temp_mm->temp; > > + switch_mm_irqs_off(NULL, temp_mm->temp, current); > > + > > + if (ppc_breakpoint_available()) { > > + __get_breakpoint(&temp_mm->brk); > > + if (temp_mm->brk.type != 0) > > + hw_breakpoint_disable(); > > + } > > +} > > + > > +static inline void unuse_temporary_mm(struct temp_mm *tem
Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
> On March 24, 2020 11:10 AM Christophe Leroy wrote: > > > Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit : > > When code patching a STRICT_KERNEL_RWX kernel the page containing the > > address to be patched is temporarily mapped with permissive memory > > protections. Currently, a per-cpu vmalloc patch area is used for this > > purpose. While the patch area is per-cpu, the temporary page mapping is > > inserted into the kernel page tables for the duration of the patching. > > The mapping is exposed to CPUs other than the patching CPU - this is > > undesirable from a hardening perspective. > > > > Use the `poking_init` init hook to prepare a temporary mm and patching > > address. Initialize the temporary mm by copying the init mm. Choose a > > randomized patching address inside the temporary mm userspace address > > portion. The next patch uses the temporary mm and patching address for > > code patching. > > > > Based on x86 implementation: > > > > commit 4fc19708b165 > > ("x86/alternatives: Initialize temporary mm for patching") > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/lib/code-patching.c | 26 ++ > > 1 file changed, 26 insertions(+) > > > > diff --git a/arch/powerpc/lib/code-patching.c > > b/arch/powerpc/lib/code-patching.c > > index 3345f039a876..18b88ecfc5a8 100644 > > --- a/arch/powerpc/lib/code-patching.c > > +++ b/arch/powerpc/lib/code-patching.c > > @@ -11,6 +11,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > #include > > #include > > @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned > > int instr) > > } > > > > #ifdef CONFIG_STRICT_KERNEL_RWX > > + > > +__ro_after_init struct mm_struct *patching_mm; > > +__ro_after_init unsigned long patching_addr; > > Can we make those those static ? > Yes, makes sense to me. > > + > > +void __init poking_init(void) > > +{ > > + spinlock_t *ptl; /* for protecting pte table */ > > + pte_t *ptep; > > + > > + patching_mm = copy_init_mm(); > > + BUG_ON(!patching_mm); > > Does it needs to be a BUG_ON() ? Can't we fail gracefully with just a > WARN_ON ? > I'm not sure what failing gracefully means here? The main reason this could fail is if there is not enough memory to allocate the patching_mm. The previous implementation had this justification for BUG_ON(): /* * Run as a late init call. This allows all the boot time patching to be done * simply by patching the code, and then we're called here prior to * mark_rodata_ro(), which happens after all init calls are run. Although * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge * it as being preferable to a kernel that will crash later when someone tries * to use patch_instruction(). */ static int __init setup_text_poke_area(void) { BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/text_poke:online", text_area_cpu_up, text_area_cpu_down)); return 0; } late_initcall(setup_text_poke_area); I think the BUG_ON() is appropriate even if only to adhere to the previous judgement call. I can add a similar comment explaining the reasoning if that helps. > > + > > + /* > > +* In hash we cannot go above DEFAULT_MAP_WINDOW easily. > > +* XXX: Do we want additional bits of entropy for radix? > > +*/ > > + patching_addr = (get_random_long() & PAGE_MASK) % > > + (DEFAULT_MAP_WINDOW - PAGE_SIZE); > > + > > + ptep = get_locked_pte(patching_mm, patching_addr, &ptl); > > + BUG_ON(!ptep); > > Same here, can we fail gracefully instead ? > Same reasoning as above. 
> > + pte_unmap_unlock(ptep, ptl); > > +} > > + > > static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > > > > static int text_area_cpu_up(unsigned int cpu) > > > > Christophe
Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
> On April 8, 2020 6:01 AM Christophe Leroy wrote: > > > Le 31/03/2020 à 05:19, Christopher M Riedl a écrit : > >> On March 24, 2020 11:10 AM Christophe Leroy > >> wrote: > >> > >> > >> Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit : > >>> When code patching a STRICT_KERNEL_RWX kernel the page containing the > >>> address to be patched is temporarily mapped with permissive memory > >>> protections. Currently, a per-cpu vmalloc patch area is used for this > >>> purpose. While the patch area is per-cpu, the temporary page mapping is > >>> inserted into the kernel page tables for the duration of the patching. > >>> The mapping is exposed to CPUs other than the patching CPU - this is > >>> undesirable from a hardening perspective. > >>> > >>> Use the `poking_init` init hook to prepare a temporary mm and patching > >>> address. Initialize the temporary mm by copying the init mm. Choose a > >>> randomized patching address inside the temporary mm userspace address > >>> portion. The next patch uses the temporary mm and patching address for > >>> code patching. > >>> > >>> Based on x86 implementation: > >>> > >>> commit 4fc19708b165 > >>> ("x86/alternatives: Initialize temporary mm for patching") > >>> > >>> Signed-off-by: Christopher M. Riedl > >>> --- > >>>arch/powerpc/lib/code-patching.c | 26 ++ > >>>1 file changed, 26 insertions(+) > >>> > >>> diff --git a/arch/powerpc/lib/code-patching.c > >>> b/arch/powerpc/lib/code-patching.c > >>> index 3345f039a876..18b88ecfc5a8 100644 > >>> --- a/arch/powerpc/lib/code-patching.c > >>> +++ b/arch/powerpc/lib/code-patching.c > >>> @@ -11,6 +11,8 @@ > >>>#include > >>>#include > >>>#include > >>> +#include > >>> +#include > >>> > >>>#include > >>>#include > >>> @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned > >>> int instr) > >>>} > >>> > >>>#ifdef CONFIG_STRICT_KERNEL_RWX > >>> + > >>> +__ro_after_init struct mm_struct *patching_mm; > >>> +__ro_after_init unsigned long patching_addr; > >> > >> Can we make those those static ? > >> > > > > Yes, makes sense to me. > > > >>> + > >>> +void __init poking_init(void) > >>> +{ > >>> + spinlock_t *ptl; /* for protecting pte table */ > >>> + pte_t *ptep; > >>> + > >>> + patching_mm = copy_init_mm(); > >>> + BUG_ON(!patching_mm); > >> > >> Does it needs to be a BUG_ON() ? Can't we fail gracefully with just a > >> WARN_ON ? > >> > > > > I'm not sure what failing gracefully means here? The main reason this could > > fail is if there is not enough memory to allocate the patching_mm. The > > previous implementation had this justification for BUG_ON(): > > But the system can continue running just fine after this failure. > Only the things that make use of code patching will fail (ftrace, kgdb, ...) > > Checkpatch tells: "Avoid crashing the kernel - try using WARN_ON & > recovery code rather than BUG() or BUG_ON()" > > All vital code patching has already been done previously, so I think a > WARN_ON() should be enough, plus returning non 0 to indicate that the > late_initcall failed. > > Got it, makes sense to me. I will make these changes in the next version. Thanks! > > > > /* > > * Run as a late init call. This allows all the boot time patching to be > > done > > * simply by patching the code, and then we're called here prior to > > * mark_rodata_ro(), which happens after all init calls are run. 
Although > > * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we > > judge > > * it as being preferable to a kernel that will crash later when someone > > tries > > * to use patch_instruction(). > > */ > > static int __init setup_text_poke_area(void) > > { > > BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > > "powerpc/text_poke:online", text_area_cpu_up, > > text_area_cpu_down)); > > > > return 0; > > } > > late_initcall(setup_text_poke_area); > > > > I think the BUG_ON() is appropriate even if only to adhere to the previous > > judgement call. I can add a similar comment explaining the reasoning if > > that helps. > > > >>> + > >>> + /* > >>> + * In hash we cannot go above DEFAULT_MAP_WINDOW easily. > >>> + * XXX: Do we want additional bits of entropy for radix? > >>> + */ > >>> + patching_addr = (get_random_long() & PAGE_MASK) % > >>> + (DEFAULT_MAP_WINDOW - PAGE_SIZE); > >>> + > >>> + ptep = get_locked_pte(patching_mm, patching_addr, &ptl); > >>> + BUG_ON(!ptep); > >> > >> Same here, can we fail gracefully instead ? > >> > > > > Same reasoning as above. > > Here as well, a WARN_ON() should be enough, the system will continue > running after that. > > > > >>> + pte_unmap_unlock(ptep, ptl); > >>> +} > >>> + > >>>static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > >>> > >>>static int text_area_cpu_up(unsigned int cpu) > >>> > >> > >> Christophe > > Christophe
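For completeness, the agreed-on change gives poking_init() roughly the following shape: warn and bail out instead of BUG_ON(), so the system keeps booting and only later users of patch_instruction() are affected. This is a sketch of the discussion's outcome, not the posted next version; poking_init() itself returns void, so the "return non-zero" part applies to the late_initcall variant quoted above:

void __init poking_init(void)
{
	spinlock_t *ptl; /* for protecting pte table */
	pte_t *ptep;

	patching_mm = copy_init_mm();
	if (WARN_ON(!patching_mm))
		return;

	patching_addr = (get_random_long() & PAGE_MASK) %
			(DEFAULT_MAP_WINDOW - PAGE_SIZE);

	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
	if (WARN_ON(!ptep))
		return;
	pte_unmap_unlock(ptep, ptl);
}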
Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
> On March 24, 2020 11:25 AM Christophe Leroy wrote: > > > Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit : > > Currently, code patching a STRICT_KERNEL_RWX exposes the temporary > > mappings to other CPUs. These mappings should be kept local to the CPU > > doing the patching. Use the pre-initialized temporary mm and patching > > address for this purpose. Also add a check after patching to ensure the > > patch succeeded. > > > > Based on x86 implementation: > > > > commit b3fd8e83ada0 > > ("x86/alternatives: Use temporary mm for text poking") > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/lib/code-patching.c | 128 ++- > > 1 file changed, 57 insertions(+), 71 deletions(-) > > > > diff --git a/arch/powerpc/lib/code-patching.c > > b/arch/powerpc/lib/code-patching.c > > index 18b88ecfc5a8..f156132e8975 100644 > > --- a/arch/powerpc/lib/code-patching.c > > +++ b/arch/powerpc/lib/code-patching.c > > @@ -19,6 +19,7 @@ > > #include > > #include > > #include > > +#include > > > > static int __patch_instruction(unsigned int *exec_addr, unsigned int > > instr, > >unsigned int *patch_addr) > > @@ -65,99 +66,79 @@ void __init poking_init(void) > > pte_unmap_unlock(ptep, ptl); > > } > > > > -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > > - > > -static int text_area_cpu_up(unsigned int cpu) > > -{ > > - struct vm_struct *area; > > - > > - area = get_vm_area(PAGE_SIZE, VM_ALLOC); > > - if (!area) { > > - WARN_ONCE(1, "Failed to create text area for cpu %d\n", > > - cpu); > > - return -1; > > - } > > - this_cpu_write(text_poke_area, area); > > - > > - return 0; > > -} > > - > > -static int text_area_cpu_down(unsigned int cpu) > > -{ > > - free_vm_area(this_cpu_read(text_poke_area)); > > - return 0; > > -} > > - > > -/* > > - * Run as a late init call. This allows all the boot time patching to be > > done > > - * simply by patching the code, and then we're called here prior to > > - * mark_rodata_ro(), which happens after all init calls are run. Although > > - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we > > judge > > - * it as being preferable to a kernel that will crash later when someone > > tries > > - * to use patch_instruction(). > > - */ > > -static int __init setup_text_poke_area(void) > > -{ > > - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > > - "powerpc/text_poke:online", text_area_cpu_up, > > - text_area_cpu_down)); > > - > > - return 0; > > -} > > -late_initcall(setup_text_poke_area); > > +struct patch_mapping { > > + spinlock_t *ptl; /* for protecting pte table */ > > + struct temp_mm temp_mm; > > +}; > > > > /* > >* This can be called for kernel text or a module. > >*/ > > -static int map_patch_area(void *addr, unsigned long text_poke_addr) > > +static int map_patch(const void *addr, struct patch_mapping *patch_mapping) > > Why change the name ? > It's not really an "area" anymore. > > { > > - unsigned long pfn; > > - int err; > > + struct page *page; > > + pte_t pte, *ptep; > > + pgprot_t pgprot; > > > > if (is_vmalloc_addr(addr)) > > - pfn = vmalloc_to_pfn(addr); > > + page = vmalloc_to_page(addr); > > else > > - pfn = __pa_symbol(addr) >> PAGE_SHIFT; > > + page = virt_to_page(addr); > > > > - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); > > + if (radix_enabled()) > > + pgprot = __pgprot(pgprot_val(PAGE_KERNEL)); > > + else > > + pgprot = PAGE_SHARED; > > Can you explain the difference between radix and non radix ? > > Why PAGE_KERNEL for a page that is mapped in userspace ? 
> > Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just > using PAGE_KERNEL ? > On hash there is a manual check which prevents setting _PAGE_PRIVILEGED for kernel to userspace access in __hash_page - hence we cannot access the mapping if the page is mapped PAGE_KERNEL on hash. However, I would like to use PAGE_KERNEL here as well and am working on understanding why this check is done in hash and i
Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching
> On March 26, 2020 9:42 AM Christophe Leroy wrote: > > > This patch fixes the RFC series identified below. > It fixes three points: > - Failure with CONFIG_PPC_KUAP > - Failure to write do to lack of DIRTY bit set on the 8xx > - Inadequaly complex WARN post verification > > However, it has an impact on the CPU load. Here is the time > needed on an 8xx to run the ftrace selftests without and > with this series: > - Without CONFIG_STRICT_KERNEL_RWX==> 38 seconds > - With CONFIG_STRICT_KERNEL_RWX ==> 40 seconds > - With CONFIG_STRICT_KERNEL_RWX + this series ==> 43 seconds > > Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003 > Signed-off-by: Christophe Leroy > --- > arch/powerpc/lib/code-patching.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/lib/code-patching.c > b/arch/powerpc/lib/code-patching.c > index f156132e8975..4ccff427592e 100644 > --- a/arch/powerpc/lib/code-patching.c > +++ b/arch/powerpc/lib/code-patching.c > @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping > *patch_mapping) > } > > pte = mk_pte(page, pgprot); > + pte = pte_mkdirty(pte); > set_pte_at(patching_mm, patching_addr, ptep, pte); > > init_temp_mm(&patch_mapping->temp_mm, patching_mm); > @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, > unsigned int instr) > (offset_in_page((unsigned long)addr) / > sizeof(unsigned int)); > > + allow_write_to_user(patch_addr, sizeof(instr)); > __patch_instruction(addr, instr, patch_addr); > + prevent_write_to_user(patch_addr, sizeof(instr)); > On radix we can map the page with PAGE_KERNEL protection which ends up setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0. Can we employ a similar approach on the 8xx? I would prefer *not* to wrap the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things because this is a temporary kernel mapping which really isn't userspace in the usual sense. > err = unmap_patch(&patch_mapping); > if (err) > @@ -179,7 +182,7 @@ static int do_patch_instruction(unsigned int *addr, > unsigned int instr) >* think we just wrote. >* XXX: BUG_ON() instead? >*/ > - WARN_ON(memcmp(addr, &instr, sizeof(instr))); > + WARN_ON(*addr != instr); > > out: > local_irq_restore(flags); > -- > 2.25.0
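To make that preference concrete, a hypothetical sketch (not a posted patch) of limiting the KUAP wrap to the case where the temporary mapping really is a user-style mapping, i.e. everywhere except radix:

	/*
	 * Hypothetical: only pay for the KUAP open/close when the patch
	 * page is mapped as a user page (non-radix); on radix, PAGE_KERNEL
	 * already bypasses the AMR since we access it with MSR[PR]=0.
	 */
	if (!radix_enabled())
		allow_write_to_user(patch_addr, sizeof(instr));

	__patch_instruction(addr, instr, patch_addr);

	if (!radix_enabled())
		prevent_write_to_user(patch_addr, sizeof(instr));

Whether the extra branches are worth it versus calling the helpers unconditionally (they compile away entirely when KUAP is not configured) is an open question.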
Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching
> On April 15, 2020 4:12 AM Christophe Leroy wrote: > > > Le 15/04/2020 à 07:16, Christopher M Riedl a écrit : > >> On March 26, 2020 9:42 AM Christophe Leroy wrote: > >> > >> > >> This patch fixes the RFC series identified below. > >> It fixes three points: > >> - Failure with CONFIG_PPC_KUAP > >> - Failure to write do to lack of DIRTY bit set on the 8xx > >> - Inadequaly complex WARN post verification > >> > >> However, it has an impact on the CPU load. Here is the time > >> needed on an 8xx to run the ftrace selftests without and > >> with this series: > >> - Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds > >> - With CONFIG_STRICT_KERNEL_RWX==> 40 seconds > >> - With CONFIG_STRICT_KERNEL_RWX + this series ==> 43 seconds > >> > >> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003 > >> Signed-off-by: Christophe Leroy > >> --- > >> arch/powerpc/lib/code-patching.c | 5 - > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/powerpc/lib/code-patching.c > >> b/arch/powerpc/lib/code-patching.c > >> index f156132e8975..4ccff427592e 100644 > >> --- a/arch/powerpc/lib/code-patching.c > >> +++ b/arch/powerpc/lib/code-patching.c > >> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct > >> patch_mapping *patch_mapping) > >>} > >> > >>pte = mk_pte(page, pgprot); > >> + pte = pte_mkdirty(pte); > >>set_pte_at(patching_mm, patching_addr, ptep, pte); > >> > >>init_temp_mm(&patch_mapping->temp_mm, patching_mm); > >> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, > >> unsigned int instr) > >>(offset_in_page((unsigned long)addr) / > >>sizeof(unsigned int)); > >> > >> + allow_write_to_user(patch_addr, sizeof(instr)); > >>__patch_instruction(addr, instr, patch_addr); > >> + prevent_write_to_user(patch_addr, sizeof(instr)); > >> > > > > On radix we can map the page with PAGE_KERNEL protection which ends up > > setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is > > ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0. > > > > Can we employ a similar approach on the 8xx? I would prefer *not* to wrap > > the __patch_instruction() with the allow_/prevent_write_to_user() KUAP > > things > > because this is a temporary kernel mapping which really isn't userspace in > > the usual sense. > > On the 8xx, that's pretty different. > > The PTE doesn't control whether a page is user page or a kernel page. > The only thing that is set in the PTE is whether a page is linked to a > given PID or not. > PAGE_KERNEL tells that the page can be addressed with any PID. > > The user access right is given by a kind of zone, which is in the PGD > entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. > Every pages below PAGE_OFFSET are defined as belonging to zone 1. > > By default, zone 0 can only be accessed by kernel, and zone 1 can only > be accessed by user. When kernel wants to access zone 1, it temporarily > changes properties of zone 1 to allow both kernel and user accesses. > > So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel > must unlock it to access it. > > > And this is more or less the same on hash/32. This is managed by segment > registers. One segment register corresponds to a 256Mbytes area. Every > pages below PAGE_OFFSET can only be read by default by kernel. Only user > can write if the PTE allows it. When the kernel needs to write at an > address below PAGE_OFFSET, it must change the segment properties in the > corresponding segment register. 
> > So, for both cases, if we want to have it local to a task while still > allowing kernel access, it means we have to define a new special area > between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone. > > That looks complex to me for a small benefit, especially as 8xx is not > SMP and neither are most of the hash/32 targets. > Agreed. So I guess the solution is to differentiate between radix/non-radix and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP is enabled. Hmm, I need to think about this some more, especially if it's acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED? I don't necessarily want to drop the local mm patching idea for non-radix platforms since that means we would have to maintain two implementations. > Christophe
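Roughly what I have in mind for the next version of map_patch() (an untested fragment, reusing the names from this RFC):

	if (radix_enabled())
		pgprot = PAGE_KERNEL;
	else
		pgprot = PAGE_SHARED;

	pte = mk_pte(page, pgprot);

	/* Manually dirty the PTE where the hardware will not do it for us */
	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
		pte = pte_mkdirty(pte);

	set_pte_at(patching_mm, patching_addr, ptep, pte);

__patch_instruction() would then be wrapped in allow_/prevent_write_to_user() on the platforms where the mapping is PAGE_SHARED and KUAP applies.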
Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
> On April 15, 2020 3:45 AM Christophe Leroy wrote: > > > Le 15/04/2020 à 07:11, Christopher M Riedl a écrit : > >> On March 24, 2020 11:25 AM Christophe Leroy > >> wrote: > >> > >> > >> Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit : > >>> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary > >>> mappings to other CPUs. These mappings should be kept local to the CPU > >>> doing the patching. Use the pre-initialized temporary mm and patching > >>> address for this purpose. Also add a check after patching to ensure the > >>> patch succeeded. > >>> > >>> Based on x86 implementation: > >>> > >>> commit b3fd8e83ada0 > >>> ("x86/alternatives: Use temporary mm for text poking") > >>> > >>> Signed-off-by: Christopher M. Riedl > >>> --- > >>>arch/powerpc/lib/code-patching.c | 128 ++- > >>>1 file changed, 57 insertions(+), 71 deletions(-) > >>> > >>> diff --git a/arch/powerpc/lib/code-patching.c > >>> b/arch/powerpc/lib/code-patching.c > >>> index 18b88ecfc5a8..f156132e8975 100644 > >>> --- a/arch/powerpc/lib/code-patching.c > >>> +++ b/arch/powerpc/lib/code-patching.c > >>> @@ -19,6 +19,7 @@ > >>>#include > >>>#include > >>>#include > >>> +#include > >>> > >>>static int __patch_instruction(unsigned int *exec_addr, unsigned int > >>> instr, > >>> unsigned int *patch_addr) > >>> @@ -65,99 +66,79 @@ void __init poking_init(void) > >>> pte_unmap_unlock(ptep, ptl); > >>>} > >>> > >>> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > >>> - > >>> -static int text_area_cpu_up(unsigned int cpu) > >>> -{ > >>> - struct vm_struct *area; > >>> - > >>> - area = get_vm_area(PAGE_SIZE, VM_ALLOC); > >>> - if (!area) { > >>> - WARN_ONCE(1, "Failed to create text area for cpu %d\n", > >>> - cpu); > >>> - return -1; > >>> - } > >>> - this_cpu_write(text_poke_area, area); > >>> - > >>> - return 0; > >>> -} > >>> - > >>> -static int text_area_cpu_down(unsigned int cpu) > >>> -{ > >>> - free_vm_area(this_cpu_read(text_poke_area)); > >>> - return 0; > >>> -} > >>> - > >>> -/* > >>> - * Run as a late init call. This allows all the boot time patching to be > >>> done > >>> - * simply by patching the code, and then we're called here prior to > >>> - * mark_rodata_ro(), which happens after all init calls are run. Although > >>> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and > >>> we judge > >>> - * it as being preferable to a kernel that will crash later when someone > >>> tries > >>> - * to use patch_instruction(). > >>> - */ > >>> -static int __init setup_text_poke_area(void) > >>> -{ > >>> - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > >>> - "powerpc/text_poke:online", text_area_cpu_up, > >>> - text_area_cpu_down)); > >>> - > >>> - return 0; > >>> -} > >>> -late_initcall(setup_text_poke_area); > >>> +struct patch_mapping { > >>> + spinlock_t *ptl; /* for protecting pte table */ > >>> + struct temp_mm temp_mm; > >>> +}; > >>> > >>>/* > >>> * This can be called for kernel text or a module. > >>> */ > >>> -static int map_patch_area(void *addr, unsigned long text_poke_addr) > >>> +static int map_patch(const void *addr, struct patch_mapping > >>> *patch_mapping) > >> > >> Why change the name ? > >> > > > > It's not really an "area" anymore. > > > >>>{ > >>> - unsigned long pfn; > >>> - int err; > >>> + struct page *page; > >>> + pte_t pte, *ptep; > >>> + pgprot_t pgprot; > >>> > >>> if (is_vmalloc_addr(addr)) > >>> - pfn = vmalloc_to_pfn(addr); > >>> +
Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching
On Sat Apr 18, 2020 at 12:27 PM, Christophe Leroy wrote: > > > > > Le 15/04/2020 à 18:22, Christopher M Riedl a écrit : > >> On April 15, 2020 4:12 AM Christophe Leroy wrote: > >> > >> > >> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit : > >>>> On March 26, 2020 9:42 AM Christophe Leroy > >>>> wrote: > >>>> > >>>> > >>>> This patch fixes the RFC series identified below. > >>>> It fixes three points: > >>>> - Failure with CONFIG_PPC_KUAP > >>>> - Failure to write do to lack of DIRTY bit set on the 8xx > >>>> - Inadequaly complex WARN post verification > >>>> > >>>> However, it has an impact on the CPU load. Here is the time > >>>> needed on an 8xx to run the ftrace selftests without and > >>>> with this series: > >>>> - Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds > >>>> - With CONFIG_STRICT_KERNEL_RWX ==> 40 seconds > >>>> - With CONFIG_STRICT_KERNEL_RWX + this series==> 43 seconds > >>>> > >>>> Link: > >>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003 > >>>> Signed-off-by: Christophe Leroy > >>>> --- > >>>>arch/powerpc/lib/code-patching.c | 5 - > >>>>1 file changed, 4 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/arch/powerpc/lib/code-patching.c > >>>> b/arch/powerpc/lib/code-patching.c > >>>> index f156132e8975..4ccff427592e 100644 > >>>> --- a/arch/powerpc/lib/code-patching.c > >>>> +++ b/arch/powerpc/lib/code-patching.c > >>>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct > >>>> patch_mapping *patch_mapping) > >>>> } > >>>> > >>>> pte = mk_pte(page, pgprot); > >>>> +pte = pte_mkdirty(pte); > >>>> set_pte_at(patching_mm, patching_addr, ptep, pte); > >>>> > >>>> init_temp_mm(&patch_mapping->temp_mm, patching_mm); > >>>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, > >>>> unsigned int instr) > >>>> (offset_in_page((unsigned long)addr) / > >>>> sizeof(unsigned int)); > >>>> > >>>> +allow_write_to_user(patch_addr, sizeof(instr)); > >>>> __patch_instruction(addr, instr, patch_addr); > >>>> +prevent_write_to_user(patch_addr, sizeof(instr)); > >>>> > >>> > >>> On radix we can map the page with PAGE_KERNEL protection which ends up > >>> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is > >>> ignored (ISA v3.0b Fig. 35) since we are accessing the page from > >>> MSR[PR]=0. > >>> > >>> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap > >>> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP > >>> things > >>> because this is a temporary kernel mapping which really isn't userspace in > >>> the usual sense. > >> > >> On the 8xx, that's pretty different. > >> > >> The PTE doesn't control whether a page is user page or a kernel page. > >> The only thing that is set in the PTE is whether a page is linked to a > >> given PID or not. > >> PAGE_KERNEL tells that the page can be addressed with any PID. > >> > >> The user access right is given by a kind of zone, which is in the PGD > >> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. > >> Every pages below PAGE_OFFSET are defined as belonging to zone 1. > >> > >> By default, zone 0 can only be accessed by kernel, and zone 1 can only > >> be accessed by user. When kernel wants to access zone 1, it temporarily > >> changes properties of zone 1 to allow both kernel and user accesses. > >> > >> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel > >> must unlock it to access it. > >> > >> > >> And this is more or less the same on hash/32. 
This is managed by segment > >> registers. One segment register corresponds to a 256Mbytes area. Every > >> pages below PAGE_OFFSET can only
Re: [PATCH 1/3] powerpc: Properly return error code from do_patch_instruction()
On Fri Apr 24, 2020 at 9:15 AM, Steven Rostedt wrote: > On Thu, 23 Apr 2020 18:21:14 +0200 > Christophe Leroy wrote: > > > > Le 23/04/2020 à 17:09, Naveen N. Rao a écrit : > > > With STRICT_KERNEL_RWX, we are currently ignoring return value from > > > __patch_instruction() in do_patch_instruction(), resulting in the error > > > not being propagated back. Fix the same. > > > > Good patch. > > > > Be aware that there is ongoing work which tend to wanting to replace > > error reporting by BUG_ON() . See > > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003 > > > Thanks for the reference. I still believe that WARN_ON() should be used > in > 99% of the cases, including here. And only do a BUG_ON() when you know > there's no recovering from it. > > > In fact, there's still BUG_ON()s in my code that I need to convert to > WARN_ON() (it was written when BUG_ON() was still acceptable ;-) > Figured I'd chime in since I am working on that other series :) The BUG_ON()s are _only_ in the init code to set things up to allow a temporary mapping for patching a STRICT_RWX kernel later. There's no ongoing work to "replace error reporting by BUG_ON()". If that initial setup fails we cannot patch under STRICT_KERNEL_RWX at all which imo warrants a BUG_ON(). I am still working on v2 of my RFC which does return any __patch_instruction() error back to the caller of patch_instruction() similar to this patch. > > -- Steve > > > >
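To sketch what the error propagation looks like with the temporary-mm RFC (assumptions, not the final code: the map_patch()/unmap_patch() helpers from the RFC, and a fallback to raw_patch_instruction() until poking_init() has run):

static int do_patch_instruction(unsigned int *addr, unsigned int instr)
{
	int err;
	unsigned int *patch_addr;
	unsigned long flags;
	struct patch_mapping patch_mapping;

	/* Before poking_init() there is no temporary mm to borrow */
	if (!patching_mm)
		return raw_patch_instruction(addr, instr);

	local_irq_save(flags);

	err = map_patch(addr, &patch_mapping);
	if (err)
		goto out;

	patch_addr = (unsigned int *)(patching_addr | offset_in_page(addr));

	/* Hand any failure straight back to the caller */
	err = __patch_instruction(addr, instr, patch_addr);

	unmap_patch(&patch_mapping);

	if (!err)
		WARN_ON(*addr != instr);
out:
	local_irq_restore(flags);
	return err;
}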
[RFC PATCH v2 0/5] Use per-CPU temporary mappings for patching
When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create temporary mappings when patching itself. These mappings temporarily override the strict RWX text protections to permit a write. Currently, powerpc allocates a per-CPU VM area for patching. Patching occurs as follows:

	1. Map page of text to be patched to per-CPU VM area w/ PAGE_KERNEL
	   protection
	2. Patch text
	3. Remove the temporary mapping

While the VM area is per-CPU, the mapping is actually inserted into the kernel page tables. Presumably, this could allow another CPU to access the normally write-protected text - either maliciously or accidentally - via this same mapping if the address of the VM area is known. Ideally, the mapping should be kept local to the CPU doing the patching (or any other sensitive operations requiring temporarily overriding memory protections) [0].

x86 introduced "temporary mm" structs which allow the creation of mappings local to a particular CPU [1]. This series intends to bring the notion of a temporary mm to powerpc and to harden powerpc by using such a mapping for patching a kernel with strict RWX permissions.

The first patch introduces the temporary mm struct and API for powerpc along with a new function to retrieve a current hw breakpoint.

The second patch uses the `poking_init` init hook added by the x86 patches to initialize a temporary mm and patching address. The patching address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, both hash and radix randomize inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

	bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
	bits of entropy = log2(128TB / 64K)
	bits of entropy = 31

Randomization occurs only once during initialization at boot.

The third patch replaces the VM area with the temporary mm in the patching code. The page for patching has to be mapped PAGE_SHARED with the hash MMU since hash prevents the kernel from accessing userspace pages with the _PAGE_PRIVILEGED bit set. On the radix MMU the page is mapped with PAGE_KERNEL which has the added benefit that we can skip KUAP.

The fourth and fifth patches implement an LKDTM test "proof-of-concept" which exploits the previous vulnerability (ie. the mapping during patching is exposed in kernel page tables and accessible by other CPUs). The LKDTM test is somewhat "rough" in that it uses a brute-force approach - I am open to any suggestions and/or ideas to improve this. Currently, the LKDTM test passes with this series on POWER8 (hash) and POWER9 (radix, hash) and fails without this series (ie. the temporary mapping for patching is exposed to CPUs other than the patching CPU).
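(As a quick sanity check of the arithmetic above: 128TB = 2^47 bytes and 64K = 2^16 bytes, so log2(2^47 / 2^16) = 47 - 16 = 31 bits.)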
The test can be applied to a tree without this new series by first adding this in /arch/powerpc/lib/code-patching.c:

@@ -41,6 +41,13 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
 #ifdef CONFIG_STRICT_KERNEL_RWX
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+}
+#endif
+
 static int text_area_cpu_up(unsigned int cpu)
 {
 	struct vm_struct *area;

And then applying the last patch of this series which adds the LKDTM test (powerpc: Add LKDTM test to hijack a patch mapping).

Tested on QEMU (POWER8, POWER9), POWER8 VM, and a Blackbird (8-core POWER9).

v2: Many fixes and improvements mostly based on extensive feedback and testing by Christophe Leroy (thanks!).

	* Make patching_mm and patching_addr static and move '__ro_after_init'
	  to after the variable name (more common in other parts of the kernel)
	* Use 'asm/debug.h' header instead of 'asm/hw_breakpoint.h' to fix
	  PPC64e compile
	* Add comment explaining why we use BUG_ON() during the init call to
	  set up for patching later
	* Move ptep into patch_mapping to avoid walking page tables a second
	  time when unmapping the temporary mapping
	* Use KUAP under non-radix, also manually dirty the PTE for patch
	  mapping on non-BOOK3S_64 platforms
	* Properly return any error from __patch_instruction
	* Do not use 'memcmp' where a simple comparison is appropriate
	* Simplify expression for patch address by removing pointer maths
	* Add LKDTM test

[0]: https://github.com/linuxppc/issues/issues/224
[1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.a...@gmail.com/

Christopher M. Riedl (5):
  powerpc/mm: Introduce temporary mm
[RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching
Currently, code patching a STRICT_KERNEL_RWX exposes the temporary mappings to other CPUs. These mappings should be kept local to the CPU doing the patching. Use the pre-initialized temporary mm and patching address for this purpose. Also add a check after patching to ensure the patch succeeded. Use the KUAP functions on non-BOOKS3_64 platforms since the temporary mapping for patching uses a userspace address (to keep the mapping local). On BOOKS3_64 platforms hash does not implement KUAP and on radix the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR (KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35). Based on x86 implementation: commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 149 --- 1 file changed, 55 insertions(+), 94 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 259c19480a85..26f06cdb5d7e 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -19,6 +19,7 @@ #include #include #include +#include static int __patch_instruction(unsigned int *exec_addr, unsigned int instr, unsigned int *patch_addr) @@ -72,101 +73,58 @@ void __init poking_init(void) pte_unmap_unlock(ptep, ptl); } -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); - -static int text_area_cpu_up(unsigned int cpu) -{ - struct vm_struct *area; - - area = get_vm_area(PAGE_SIZE, VM_ALLOC); - if (!area) { - WARN_ONCE(1, "Failed to create text area for cpu %d\n", - cpu); - return -1; - } - this_cpu_write(text_poke_area, area); - - return 0; -} - -static int text_area_cpu_down(unsigned int cpu) -{ - free_vm_area(this_cpu_read(text_poke_area)); - return 0; -} - -/* - * Run as a late init call. This allows all the boot time patching to be done - * simply by patching the code, and then we're called here prior to - * mark_rodata_ro(), which happens after all init calls are run. Although - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge - * it as being preferable to a kernel that will crash later when someone tries - * to use patch_instruction(). - */ -static int __init setup_text_poke_area(void) -{ - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, - "powerpc/text_poke:online", text_area_cpu_up, - text_area_cpu_down)); - - return 0; -} -late_initcall(setup_text_poke_area); +struct patch_mapping { + spinlock_t *ptl; /* for protecting pte table */ + pte_t *ptep; + struct temp_mm temp_mm; +}; /* * This can be called for kernel text or a module. 
*/ -static int map_patch_area(void *addr, unsigned long text_poke_addr) +static int map_patch(const void *addr, struct patch_mapping *patch_mapping) { - unsigned long pfn; - int err; + struct page *page; + pte_t pte; + pgprot_t pgprot; if (is_vmalloc_addr(addr)) - pfn = vmalloc_to_pfn(addr); + page = vmalloc_to_page(addr); else - pfn = __pa_symbol(addr) >> PAGE_SHIFT; + page = virt_to_page(addr); - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); + if (radix_enabled()) + pgprot = PAGE_KERNEL; + else + pgprot = PAGE_SHARED; - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err); - if (err) + patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr, +&patch_mapping->ptl); + if (unlikely(!patch_mapping->ptep)) { + pr_warn("map patch: failed to allocate pte for patching\n"); return -1; + } + + pte = mk_pte(page, pgprot); + if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + pte = pte_mkdirty(pte); + set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte); + + init_temp_mm(&patch_mapping->temp_mm, patching_mm); + use_temporary_mm(&patch_mapping->temp_mm); return 0; } -static inline int unmap_patch_area(unsigned long addr) +static void unmap_patch(struct patch_mapping *patch_mapping) { - pte_t *ptep; - pmd_t *pmdp; - pud_t *pudp; - pgd_t *pgdp; - - pgdp = pgd_offset_k(addr); - if (unlikely(!pgdp)) - return -EINVAL; - - pudp = pud_offset(pgdp, addr); - if (unlikely(!pudp)) - return -EINVAL; - - pmdp = pmd_offset(pudp, addr); - if (unlikely(!pmdp)) - return -EINVAL; - - ptep = pte_offset_kernel(pmdp, addr); - if (unlikely(!ptep)) - return -EINV
[RFC PATCH v2 5/5] powerpc: Add LKDTM test to hijack a patch mapping
When live patching with STRICT_KERNEL_RWX, the CPU doing the patching must use a temporary mapping which allows for writing to kernel text. During the entire window of time when this temporary mapping is in use, another CPU could write to the same mapping and maliciously alter kernel text. Implement a LKDTM test to attempt to exploit such a openings when a CPU is patching under STRICT_KERNEL_RWX. The test is only implemented on powerpc for now. The LKDTM "hijack" test works as follows: 1. A CPU executes an infinite loop to patch an instruction. This is the "patching" CPU. 2. Another CPU attempts to write to the address of the temporary mapping used by the "patching" CPU. This other CPU is the "hijacker" CPU. The hijack either fails with a segfault or succeeds, in which case some kernel text is now overwritten. How to run the test: mount -t debugfs none /sys/kernel/debug (echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT) Signed-off-by: Christopher M. Riedl --- drivers/misc/lkdtm/core.c | 1 + drivers/misc/lkdtm/lkdtm.h | 1 + drivers/misc/lkdtm/perms.c | 99 ++ 3 files changed, 101 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index a5e344df9166..482e72f6a1e1 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), CRASHTYPE(WRITE_KERN), + CRASHTYPE(HIJACK_PATCH), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 601a2156a0d4..bfcf3542370d 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -62,6 +62,7 @@ void lkdtm_EXEC_USERSPACE(void); void lkdtm_EXEC_NULL(void); void lkdtm_ACCESS_USERSPACE(void); void lkdtm_ACCESS_NULL(void); +void lkdtm_HIJACK_PATCH(void); /* lkdtm_refcount.c */ void lkdtm_REFCOUNT_INC_OVERFLOW(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 62f76d506f04..547ce16e03e5 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). 
*/ @@ -213,6 +214,104 @@ void lkdtm_ACCESS_NULL(void) *ptr = tmp; } +#if defined(CONFIG_PPC) && defined(CONFIG_STRICT_KERNEL_RWX) +#include + +extern unsigned long read_cpu_patching_addr(unsigned int cpu); + +static unsigned int * const patch_site = (unsigned int * const)&do_nothing; + +static int lkdtm_patching_cpu(void *data) +{ + int err = 0; + + pr_info("starting patching_cpu=%d\n", smp_processor_id()); + do { + err = patch_instruction(patch_site, 0xdeadbeef); + } while (*READ_ONCE(patch_site) == 0xdeadbeef && + !err && !kthread_should_stop()); + + if (err) + pr_warn("patch_instruction returned error: %d\n", err); + + set_current_state(TASK_INTERRUPTIBLE); + while (!kthread_should_stop()) { + schedule(); + set_current_state(TASK_INTERRUPTIBLE); + } + + return err; +} + +void lkdtm_HIJACK_PATCH(void) +{ + struct task_struct *patching_kthrd; + int patching_cpu, hijacker_cpu, original_insn, attempts; + unsigned long addr; + bool hijacked; + + if (num_online_cpus() < 2) { + pr_warn("need at least two cpus\n"); + return; + } + + original_insn = *READ_ONCE(patch_site); + + hijacker_cpu = smp_processor_id(); + patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu); + + patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL, + cpu_to_node(patching_cpu), + "lkdtm_patching_cpu"); + kthread_bind(patching_kthrd, patching_cpu); + wake_up_process(patching_kthrd); + + addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu); + + pr_info("starting hijacker_cpu=%d\n", hijacker_cpu); + for (attempts = 0; attempts < 10; ++attempts) { + /* Use __put_user to catch faults without an Oops */ + hijacked = !__put_user(0xbad00bad, (unsigned int *)addr); + + if (hijacked) { + if (kthread_stop(patching_kthrd)) + goto out; + break; + } + } + pr_info("hijack attempts: %d\n", attempts)
[RFC PATCH v2 2/5] powerpc/lib: Initialize a temporary mm for code patching
When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped with permissive memory protections. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of the patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address portion. The next patch uses the temporary mm and patching address for code patching. Based on x86 implementation: commit 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 33 1 file changed, 33 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 3345f039a876..259c19480a85 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -11,6 +11,8 @@ #include #include #include +#include +#include #include #include @@ -39,6 +41,37 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr) } #ifdef CONFIG_STRICT_KERNEL_RWX + +static struct mm_struct *patching_mm __ro_after_init; +static unsigned long patching_addr __ro_after_init; + +void __init poking_init(void) +{ + spinlock_t *ptl; /* for protecting pte table */ + pte_t *ptep; + + /* +* Some parts of the kernel (static keys for example) depend on +* successful code patching. Code patching under STRICT_KERNEL_RWX +* requires this setup - otherwise we cannot patch at all. We use +* BUG_ON() here and later since an early failure is preferred to +* buggy behavior and/or strange crashes later. +*/ + patching_mm = copy_init_mm(); + BUG_ON(!patching_mm); + + /* +* In hash we cannot go above DEFAULT_MAP_WINDOW easily. +* XXX: Do we want additional bits of entropy for radix? +*/ + patching_addr = (get_random_long() & PAGE_MASK) % + (DEFAULT_MAP_WINDOW - PAGE_SIZE); + + ptep = get_locked_pte(patching_mm, patching_addr, &ptl); + BUG_ON(!ptep); + pte_unmap_unlock(ptep, ptl); +} + static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); static int text_area_cpu_up(unsigned int cpu) -- 2.26.1
[RFC PATCH v2 4/5] powerpc/lib: Add LKDTM accessor for patching addr
When live patching a STRICT_RWX kernel, a mapping is installed at a "patching address" with temporary write permissions. Provide an LKDTM-only accessor function for this address in preparation for an LKDTM test which attempts to "hijack" this mapping by writing to it from another CPU.

Signed-off-by: Christopher M. Riedl
---
 arch/powerpc/lib/code-patching.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 26f06cdb5d7e..cfbdef90384e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -46,6 +46,13 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
 static struct mm_struct *patching_mm __ro_after_init;
 static unsigned long patching_addr __ro_after_init;
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+	return patching_addr;
+}
+#endif
+
 void __init poking_init(void)
 {
 	spinlock_t *ptl; /* for protecting pte table */
 	pte_t *ptep;
-- 
2.26.1
[RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
x86 supports the notion of a temporary mm which restricts access to temporary PTEs to a single CPU. A temporary mm is useful for situations where a CPU needs to perform sensitive operations (such as patching a STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing said mappings to other CPUs. A side benefit is that other CPU TLBs do not need to be flushed when the temporary mm is torn down. Mappings in the temporary mm can be set in the userspace portion of the address-space. Interrupts must be disabled while the temporary mm is in use. HW breakpoints, which may have been set by userspace as watchpoints on addresses now within the temporary mm, are saved and disabled when loading the temporary mm. The HW breakpoints are restored when unloading the temporary mm. All HW breakpoints are indiscriminately disabled while the temporary mm is in use. Based on x86 implementation: commit cefa929c034e ("x86/mm: Introduce temporary mm structs") Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/debug.h | 1 + arch/powerpc/include/asm/mmu_context.h | 54 ++ arch/powerpc/kernel/process.c | 5 +++ 3 files changed, 60 insertions(+) diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h index 7756026b95ca..b945bc16c932 100644 --- a/arch/powerpc/include/asm/debug.h +++ b/arch/powerpc/include/asm/debug.h @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; } static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; } #endif +void __get_breakpoint(struct arch_hw_breakpoint *brk); void __set_breakpoint(struct arch_hw_breakpoint *brk); bool ppc_breakpoint_available(void); #ifdef CONFIG_PPC_ADV_DEBUG_REGS diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 360367c579de..57a8695fe63f 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -10,6 +10,7 @@ #include #include #include +#include /* * Most if the context management is out of line @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, return 0; } +struct temp_mm { + struct mm_struct *temp; + struct mm_struct *prev; + bool is_kernel_thread; + struct arch_hw_breakpoint brk; +}; + +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm) +{ + temp_mm->temp = mm; + temp_mm->prev = NULL; + temp_mm->is_kernel_thread = false; + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); +} + +static inline void use_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + temp_mm->is_kernel_thread = current->mm == NULL; + if (temp_mm->is_kernel_thread) + temp_mm->prev = current->active_mm; + else + temp_mm->prev = current->mm; + + /* +* Hash requires a non-NULL current->mm to allocate a userspace address +* when handling a page fault. Does not appear to hurt in Radix either. 
+*/ + current->mm = temp_mm->temp; + switch_mm_irqs_off(NULL, temp_mm->temp, current); + + if (ppc_breakpoint_available()) { + __get_breakpoint(&temp_mm->brk); + if (temp_mm->brk.type != 0) + hw_breakpoint_disable(); + } +} + +static inline void unuse_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + if (temp_mm->is_kernel_thread) + current->mm = NULL; + else + current->mm = temp_mm->prev; + switch_mm_irqs_off(NULL, temp_mm->prev, current); + + if (ppc_breakpoint_available() && temp_mm->brk.type != 0) + __set_breakpoint(&temp_mm->brk); +} + #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 9c21288f8645..ec4cf890d92c 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -800,6 +800,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk) return 0; } +void __get_breakpoint(struct arch_hw_breakpoint *brk) +{ + memcpy(brk, this_cpu_ptr(¤t_brk), sizeof(*brk)); +} + void __set_breakpoint(struct arch_hw_breakpoint *brk) { memcpy(this_cpu_ptr(¤t_brk), brk, sizeof(*brk)); -- 2.26.1
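A minimal usage sketch of this API (assuming a previously prepared mm such as the patching_mm set up later in this series, and that the caller handles the interrupt disabling):

	struct temp_mm temp_mm;
	unsigned long flags;

	local_irq_save(flags);

	init_temp_mm(&temp_mm, patching_mm);
	use_temporary_mm(&temp_mm);

	/*
	 * PTEs installed in patching_mm are now visible, and only on this
	 * CPU; HW breakpoints stay disabled until the switch back.
	 */

	unuse_temporary_mm(&temp_mm);

	local_irq_restore(flags);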
Re: [RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching
On Wed Apr 29, 2020 at 7:52 AM, Christophe Leroy wrote: > > > > > Le 29/04/2020 à 04:05, Christopher M. Riedl a écrit : > > Currently, code patching a STRICT_KERNEL_RWX exposes the temporary > > mappings to other CPUs. These mappings should be kept local to the CPU > > doing the patching. Use the pre-initialized temporary mm and patching > > address for this purpose. Also add a check after patching to ensure the > > patch succeeded. > > > > Use the KUAP functions on non-BOOKS3_64 platforms since the temporary > > mapping for patching uses a userspace address (to keep the mapping > > local). On BOOKS3_64 platforms hash does not implement KUAP and on radix > > the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR > > (KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35). > > > > Based on x86 implementation: > > > > commit b3fd8e83ada0 > > ("x86/alternatives: Use temporary mm for text poking") > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/lib/code-patching.c | 149 --- > > 1 file changed, 55 insertions(+), 94 deletions(-) > > > > diff --git a/arch/powerpc/lib/code-patching.c > > b/arch/powerpc/lib/code-patching.c > > index 259c19480a85..26f06cdb5d7e 100644 > > --- a/arch/powerpc/lib/code-patching.c > > +++ b/arch/powerpc/lib/code-patching.c > > @@ -19,6 +19,7 @@ > > #include > > #include > > #include > > +#include > > > > static int __patch_instruction(unsigned int *exec_addr, unsigned int > > instr, > >unsigned int *patch_addr) > > @@ -72,101 +73,58 @@ void __init poking_init(void) > > pte_unmap_unlock(ptep, ptl); > > } > > > > -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > > - > > -static int text_area_cpu_up(unsigned int cpu) > > -{ > > - struct vm_struct *area; > > - > > - area = get_vm_area(PAGE_SIZE, VM_ALLOC); > > - if (!area) { > > - WARN_ONCE(1, "Failed to create text area for cpu %d\n", > > - cpu); > > - return -1; > > - } > > - this_cpu_write(text_poke_area, area); > > - > > - return 0; > > -} > > - > > -static int text_area_cpu_down(unsigned int cpu) > > -{ > > - free_vm_area(this_cpu_read(text_poke_area)); > > - return 0; > > -} > > - > > -/* > > - * Run as a late init call. This allows all the boot time patching to be > > done > > - * simply by patching the code, and then we're called here prior to > > - * mark_rodata_ro(), which happens after all init calls are run. Although > > - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we > > judge > > - * it as being preferable to a kernel that will crash later when someone > > tries > > - * to use patch_instruction(). > > - */ > > -static int __init setup_text_poke_area(void) > > -{ > > - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > > - "powerpc/text_poke:online", text_area_cpu_up, > > - text_area_cpu_down)); > > - > > - return 0; > > -} > > -late_initcall(setup_text_poke_area); > > +struct patch_mapping { > > + spinlock_t *ptl; /* for protecting pte table */ > > + pte_t *ptep; > > + struct temp_mm temp_mm; > > +}; > > > > /* > >* This can be called for kernel text or a module. 
> >*/ > > -static int map_patch_area(void *addr, unsigned long text_poke_addr) > > +static int map_patch(const void *addr, struct patch_mapping *patch_mapping) > > { > > - unsigned long pfn; > > - int err; > > + struct page *page; > > + pte_t pte; > > + pgprot_t pgprot; > > > > if (is_vmalloc_addr(addr)) > > - pfn = vmalloc_to_pfn(addr); > > + page = vmalloc_to_page(addr); > > else > > - pfn = __pa_symbol(addr) >> PAGE_SHIFT; > > + page = virt_to_page(addr); > > > > - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); > > + if (radix_enabled()) > > + pgprot = PAGE_KERNEL; > > + else > > + pgprot = PAGE_SHARED; > > > > - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err); > > - if (err) > > + patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr, > > +
Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
On Wed Apr 29, 2020 at 7:39 AM, Christophe Leroy wrote: > > > > > Le 29/04/2020 à 04:05, Christopher M. Riedl a écrit : > > x86 supports the notion of a temporary mm which restricts access to > > temporary PTEs to a single CPU. A temporary mm is useful for situations > > where a CPU needs to perform sensitive operations (such as patching a > > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing > > said mappings to other CPUs. A side benefit is that other CPU TLBs do > > not need to be flushed when the temporary mm is torn down. > > > > Mappings in the temporary mm can be set in the userspace portion of the > > address-space. > > > > Interrupts must be disabled while the temporary mm is in use. HW > > breakpoints, which may have been set by userspace as watchpoints on > > addresses now within the temporary mm, are saved and disabled when > > loading the temporary mm. The HW breakpoints are restored when unloading > > the temporary mm. All HW breakpoints are indiscriminately disabled while > > the temporary mm is in use. > > > > Based on x86 implementation: > > > > commit cefa929c034e > > ("x86/mm: Introduce temporary mm structs") > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/include/asm/debug.h | 1 + > > arch/powerpc/include/asm/mmu_context.h | 54 ++ > > arch/powerpc/kernel/process.c | 5 +++ > > 3 files changed, 60 insertions(+) > > > > diff --git a/arch/powerpc/include/asm/debug.h > > b/arch/powerpc/include/asm/debug.h > > index 7756026b95ca..b945bc16c932 100644 > > --- a/arch/powerpc/include/asm/debug.h > > +++ b/arch/powerpc/include/asm/debug.h > > @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs > > *regs) { return 0; } > > static inline int debugger_fault_handler(struct pt_regs *regs) { return > > 0; } > > #endif > > > > +void __get_breakpoint(struct arch_hw_breakpoint *brk); > > void __set_breakpoint(struct arch_hw_breakpoint *brk); > > bool ppc_breakpoint_available(void); > > #ifdef CONFIG_PPC_ADV_DEBUG_REGS > > diff --git a/arch/powerpc/include/asm/mmu_context.h > > b/arch/powerpc/include/asm/mmu_context.h > > index 360367c579de..57a8695fe63f 100644 > > --- a/arch/powerpc/include/asm/mmu_context.h > > +++ b/arch/powerpc/include/asm/mmu_context.h > > @@ -10,6 +10,7 @@ > > #include > > #include > > #include > > +#include > > > > /* > >* Most if the context management is out of line > > @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct > > *oldmm, > > return 0; > > } > > > > +struct temp_mm { > > + struct mm_struct *temp; > > + struct mm_struct *prev; > > + bool is_kernel_thread; > > + struct arch_hw_breakpoint brk; > > +}; > > + > > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct > > *mm) > > +{ > > + temp_mm->temp = mm; > > + temp_mm->prev = NULL; > > + temp_mm->is_kernel_thread = false; > > + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); > > +} > > + > > +static inline void use_temporary_mm(struct temp_mm *temp_mm) > > +{ > > + lockdep_assert_irqs_disabled(); > > + > > + temp_mm->is_kernel_thread = current->mm == NULL; > > + if (temp_mm->is_kernel_thread) > > + temp_mm->prev = current->active_mm; > > + else > > + temp_mm->prev = current->mm; > > + > > + /* > > +* Hash requires a non-NULL current->mm to allocate a userspace address > > +* when handling a page fault. Does not appear to hurt in Radix either. 
> > +*/ > > + current->mm = temp_mm->temp; > > + switch_mm_irqs_off(NULL, temp_mm->temp, current); > > + > > + if (ppc_breakpoint_available()) { > > + __get_breakpoint(&temp_mm->brk); > > + if (temp_mm->brk.type != 0) > > + hw_breakpoint_disable(); > > + } > > +} > > + > > +static inline void unuse_temporary_mm(struct temp_mm *temp_mm) > > > Not sure "unuse" is a best naming, allthought I don't have a better > suggestion a the moment. If not using temporary_mm anymore, what are we > using now ? > > I'm not too fond of 'unuse' either, but it's what x86 uses and I couldn't come up with anything better on the spot
Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
On Wed Apr 29, 2020 at 7:48 AM, Christophe Leroy wrote: > > > > > Le 29/04/2020 à 04:05, Christopher M. Riedl a écrit : > > x86 supports the notion of a temporary mm which restricts access to > > temporary PTEs to a single CPU. A temporary mm is useful for situations > > where a CPU needs to perform sensitive operations (such as patching a > > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing > > said mappings to other CPUs. A side benefit is that other CPU TLBs do > > not need to be flushed when the temporary mm is torn down. > > > > Mappings in the temporary mm can be set in the userspace portion of the > > address-space. > > > > Interrupts must be disabled while the temporary mm is in use. HW > > breakpoints, which may have been set by userspace as watchpoints on > > addresses now within the temporary mm, are saved and disabled when > > loading the temporary mm. The HW breakpoints are restored when unloading > > the temporary mm. All HW breakpoints are indiscriminately disabled while > > the temporary mm is in use. > > > Why do we need to use a temporary mm all the time ? > Not sure I understand, the temporary mm is only in use for kernel patching in this series. We could have other uses in the future maybe where it's beneficial to keep mappings local. > > Doesn't each CPU have its own mm already ? Only the upper address space > is shared between all mm's but each mm has its own lower address space, > at least when it is running a user process. Why not just use that mm ? > As we are mapping then unmapping with interrupts disabled, there is no > risk at all that the user starts running while the patch page is mapped, > so I'm not sure why switching to a temporary mm is needed. > > I suppose that's an option, but then we have to save and restore the mapping which we temporarily "steal" from userspace. I admit I didn't consider that as an option when I started this series based on the x86 patches. I think it's cleaner to switch mm, but that's a rather weak argument. Are you concerned about performance with the temporary mm? > > > > > > Based on x86 implementation: > > > > commit cefa929c034e > > ("x86/mm: Introduce temporary mm structs") > > > > Signed-off-by: Christopher M. Riedl > > > Christophe > > > >
[PATCH 2/3] powerpc/spinlocks: Rename SPLPAR-only spinlocks
The __rw_yield and __spin_yield locks only pertain to SPLPAR mode. Rename them to make this relationship obvious. Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/spinlock.h | 6 -- arch/powerpc/lib/locks.c| 6 +++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 8631b0b4e109..1e7721176f39 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,8 +101,10 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -extern void __spin_yield(arch_spinlock_t *lock); -extern void __rw_yield(arch_rwlock_t *lock); +void splpar_spin_yield(arch_spinlock_t *lock); +void splpar_rw_yield(arch_rwlock_t *lock); +#define __spin_yield(x) splpar_spin_yield(x) +#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 6550b9e5ce5f..6440d5943c00 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -18,7 +18,7 @@ #include #include -void __spin_yield(arch_spinlock_t *lock) +void splpar_spin_yield(arch_spinlock_t *lock) { unsigned int lock_value, holder_cpu, yield_count; @@ -36,14 +36,14 @@ void __spin_yield(arch_spinlock_t *lock) plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(holder_cpu), yield_count); } -EXPORT_SYMBOL_GPL(__spin_yield); +EXPORT_SYMBOL_GPL(splpar_spin_yield); /* * Waiting for a read lock or a write lock on a rwlock... * This turns out to be the same for read and write locks, since * we only know the holder if it is write-locked. */ -void __rw_yield(arch_rwlock_t *rw) +void splpar_rw_yield(arch_rwlock_t *rw) { int lock_value; unsigned int holder_cpu, yield_count; -- 2.22.0
[PATCH 0/3] Fix oops in shared-processor spinlocks
Fixes an oops when calling the shared-processor spinlock implementation from a non-SP LPAR. Also take this opportunity to refactor SHARED_PROCESSOR a bit. Reference: https://github.com/linuxppc/issues/issues/229 Christopher M. Riedl (3): powerpc/spinlocks: Refactor SHARED_PROCESSOR powerpc/spinlocks: Rename SPLPAR-only spinlocks powerpc/spinlocks: Fix oops in shared-processor spinlocks arch/powerpc/include/asm/spinlock.h | 59 - arch/powerpc/lib/locks.c| 6 +-- 2 files changed, 45 insertions(+), 20 deletions(-) -- 2.22.0
[PATCH 1/3] powerpc/spinlocks: Refactor SHARED_PROCESSOR
Determining if a processor is in shared processor mode is not a constant so don't hide it behind a #define. Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/spinlock.h | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index a47f827bc5f1..8631b0b4e109 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,15 +101,24 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr)) extern void __spin_yield(arch_spinlock_t *lock); extern void __rw_yield(arch_rwlock_t *lock); #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() -#define SHARED_PROCESSOR 0 #endif +static inline bool is_shared_processor(void) +{ +/* Only server processors have an lppaca struct */ +#ifdef CONFIG_PPC_BOOK3S + return (IS_ENABLED(CONFIG_PPC_SPLPAR) && + lppaca_shared_proc(local_paca->lppaca_ptr)); +#else + return false; +#endif +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -117,7 +126,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -136,7 +145,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) local_irq_restore(flags); do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -226,7 +235,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock < 0)); HMT_medium(); @@ -240,7 +249,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock != 0)); HMT_medium(); -- 2.22.0
[PATCH 3/3] powerpc/spinlock: Fix oops in shared-processor spinlocks
Booting w/ ppc64le_defconfig + CONFIG_PREEMPT results in the attached kernel trace due to calling shared-processor spinlocks while not running in an SPLPAR. Previously, the out-of-line spinlocks implementations were selected based on CONFIG_PPC_SPLPAR at compile time without a runtime shared-processor LPAR check. To fix, call the actual spinlock implementations from a set of common functions, spin_yield() and rw_yield(), which check for shared-processor LPAR during runtime and select the appropriate lock implementation. [0.430878] BUG: Kernel NULL pointer dereference at 0x0100 [0.431991] Faulting instruction address: 0xc0097f88 [0.432934] Oops: Kernel access of bad area, sig: 7 [#1] [0.433448] LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV [0.434479] Modules linked in: [0.435055] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28 [0.435730] NIP: c0097f88 LR: c0c07a88 CTR: c015ca10 [0.436383] REGS: c000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b) [0.437004] MSR: 92009033 CR: 84000424 XER: 2004 [0.437874] CFAR: c0c07a84 DAR: 0100 DSISR: 0008 IRQMASK: 1 [0.437874] GPR00: c0c07a88 c00072707c80 c1546300 c0007be38a80 [0.437874] GPR04: c000726f0c00 0002 c0007279c980 0100 [0.437874] GPR08: c1581b78 8001 0008 c0007279c9b0 [0.437874] GPR12: c173 c0142558 [0.437874] GPR16: [0.437874] GPR20: [0.437874] GPR24: c0007be38a80 c0c002f4 [0.437874] GPR28: c00072221a00 c000726c2600 c0007be38a80 c0007be38a80 [0.443992] NIP [c0097f88] __spin_yield+0x48/0xa0 [0.444523] LR [c0c07a88] __raw_spin_lock+0xb8/0xc0 [0.445080] Call Trace: [0.445670] [c00072707c80] [c00072221a00] 0xc00072221a00 (unreliable) [0.446425] [c00072707cb0] [c0bffb0c] __schedule+0xbc/0x850 [0.447078] [c00072707d70] [c0c002f4] schedule+0x54/0x130 [0.447694] [c00072707da0] [c01427dc] kthreadd+0x28c/0x2b0 [0.448389] [c00072707e20] [c000c1cc] ret_from_kernel_thread+0x5c/0x70 [0.449143] Instruction dump: [0.449821] 4d9e0020 552a043e 210a07ff 79080fe0 0b08 3d020004 3908b878 794a1f24 [0.450587] e8e8 7ce7502a e8e7 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020 [0.452808] ---[ end trace 474d6b2b8fc5cb7e ]--- Signed-off-by: Christopher M. 
Riedl --- arch/powerpc/include/asm/spinlock.h | 36 - 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 1e7721176f39..8161809c6be1 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -103,11 +103,9 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) /* We only yield to the hypervisor if we are in shared processor mode */ void splpar_spin_yield(arch_spinlock_t *lock); void splpar_rw_yield(arch_rwlock_t *lock); -#define __spin_yield(x) splpar_spin_yield(x) -#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ -#define __spin_yield(x)barrier() -#define __rw_yield(x) barrier() +#define splpar_spin_yield(lock) +#define splpar_rw_yield(lock) #endif static inline bool is_shared_processor(void) @@ -121,6 +119,22 @@ static inline bool is_shared_processor(void) #endif } +static inline void spin_yield(arch_spinlock_t *lock) +{ + if (is_shared_processor()) + splpar_spin_yield(lock); + else + barrier(); +} + +static inline void rw_yield(arch_rwlock_t *lock) +{ + if (is_shared_processor()) + splpar_rw_yield(lock); + else + barrier(); +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -129,7 +143,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); } @@ -148,7 +162,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + spin_yield(lock); } while (unlikely(lock-&g
Re: [PATCH 1/3] powerpc/spinlocks: Refactor SHARED_PROCESSOR
> On July 30, 2019 at 4:31 PM Thiago Jung Bauermann > wrote: > > > > Christopher M. Riedl writes: > > > Determining if a processor is in shared processor mode is not a constant > > so don't hide it behind a #define. > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/include/asm/spinlock.h | 21 +++-- > > 1 file changed, 15 insertions(+), 6 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/spinlock.h > > b/arch/powerpc/include/asm/spinlock.h > > index a47f827bc5f1..8631b0b4e109 100644 > > --- a/arch/powerpc/include/asm/spinlock.h > > +++ b/arch/powerpc/include/asm/spinlock.h > > @@ -101,15 +101,24 @@ static inline int arch_spin_trylock(arch_spinlock_t > > *lock) > > > > #if defined(CONFIG_PPC_SPLPAR) > > /* We only yield to the hypervisor if we are in shared processor mode */ > > -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr)) > > extern void __spin_yield(arch_spinlock_t *lock); > > extern void __rw_yield(arch_rwlock_t *lock); > > #else /* SPLPAR */ > > #define __spin_yield(x)barrier() > > #define __rw_yield(x) barrier() > > -#define SHARED_PROCESSOR 0 > > #endif > > > > +static inline bool is_shared_processor(void) > > +{ > > +/* Only server processors have an lppaca struct */ > > +#ifdef CONFIG_PPC_BOOK3S > > + return (IS_ENABLED(CONFIG_PPC_SPLPAR) && > > + lppaca_shared_proc(local_paca->lppaca_ptr)); > > +#else > > + return false; > > +#endif > > +} > > + > > CONFIG_PPC_SPLPAR depends on CONFIG_PPC_PSERIES, which depends on > CONFIG_PPC_BOOK3S so the #ifdef above is unnecessary: > > if CONFIG_PPC_BOOK3S is unset then CONFIG_PPC_SPLPAR will be unset as > well and the return expression should short-circuit to false. > Agreed, but the #ifdef is necessary to compile platforms which include this header but do not implement lppaca_shared_proc(...) and friends. I can reword the comment if that helps. > -- > Thiago Jung Bauermann > IBM Linux Technology Center
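As a side note on the IS_ENABLED() point above: the short-circuit only helps at the optimizer level, the compiler still has to parse every identifier in the expression. A toy, non-kernel illustration of that constraint (all names here are made up):

    /*
     * Toy example, not kernel code. FEATURE_ENABLED stands in for
     * IS_ENABLED(CONFIG_PPC_SPLPAR) and helper() for lppaca_shared_proc().
     * The call below can never execute and the optimizer drops it, but the
     * file still fails to build if helper() is not declared at all -- which
     * is the situation on platforms that have no lppaca, and why the
     * preprocessor guard stays.
     */
    #define FEATURE_ENABLED 0

    int helper(void);               /* must at least be declared to compile */

    static inline int feature_active(void)
    {
            return FEATURE_ENABLED && helper();     /* dead, but must parse */
    }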
Re: [PATCH 1/3] powerpc/spinlocks: Refactor SHARED_PROCESSOR
> On July 30, 2019 at 7:11 PM Thiago Jung Bauermann > wrote: > > > > Christopher M Riedl writes: > > >> On July 30, 2019 at 4:31 PM Thiago Jung Bauermann > >> wrote: > >> > >> > >> > >> Christopher M. Riedl writes: > >> > >> > Determining if a processor is in shared processor mode is not a constant > >> > so don't hide it behind a #define. > >> > > >> > Signed-off-by: Christopher M. Riedl > >> > --- > >> > arch/powerpc/include/asm/spinlock.h | 21 +++-- > >> > 1 file changed, 15 insertions(+), 6 deletions(-) > >> > > >> > diff --git a/arch/powerpc/include/asm/spinlock.h > >> > b/arch/powerpc/include/asm/spinlock.h > >> > index a47f827bc5f1..8631b0b4e109 100644 > >> > --- a/arch/powerpc/include/asm/spinlock.h > >> > +++ b/arch/powerpc/include/asm/spinlock.h > >> > @@ -101,15 +101,24 @@ static inline int > >> > arch_spin_trylock(arch_spinlock_t *lock) > >> > > >> > #if defined(CONFIG_PPC_SPLPAR) > >> > /* We only yield to the hypervisor if we are in shared processor mode */ > >> > -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr)) > >> > extern void __spin_yield(arch_spinlock_t *lock); > >> > extern void __rw_yield(arch_rwlock_t *lock); > >> > #else /* SPLPAR */ > >> > #define __spin_yield(x) barrier() > >> > #define __rw_yield(x) barrier() > >> > -#define SHARED_PROCESSOR0 > >> > #endif > >> > > >> > +static inline bool is_shared_processor(void) > >> > +{ > >> > +/* Only server processors have an lppaca struct */ > >> > +#ifdef CONFIG_PPC_BOOK3S > >> > +return (IS_ENABLED(CONFIG_PPC_SPLPAR) && > >> > +lppaca_shared_proc(local_paca->lppaca_ptr)); > >> > +#else > >> > +return false; > >> > +#endif > >> > +} > >> > + > >> > >> CONFIG_PPC_SPLPAR depends on CONFIG_PPC_PSERIES, which depends on > >> CONFIG_PPC_BOOK3S so the #ifdef above is unnecessary: > >> > >> if CONFIG_PPC_BOOK3S is unset then CONFIG_PPC_SPLPAR will be unset as > >> well and the return expression should short-circuit to false. > >> > > > > Agreed, but the #ifdef is necessary to compile platforms which include > > this header but do not implement lppaca_shared_proc(...) and friends. > > I can reword the comment if that helps. > > Ah, indeed. Yes, if you could mention that in the commit I think it > would help. These #ifdefs are becoming démodé so it's good to know why > they're there. > > Another alternative is to provide a dummy lppaca_shared_proc() which > always returns false when CONFIG_PPC_BOOK3S isn't set (just mentioning > it, I don't have a preference). > Yeah, I tried that first, but the declaration and definition for lppaca_shared_proc() and arguments are nested within several includes and arch/platform #ifdefs that I decided the #ifdef in is_shared_processor() is simpler. I am not sure if unraveling all that makes sense for implementing this fix, maybe someone can convince me hah. In any case, next version will have an improved commit message and comment. > -- > Thiago Jung Bauermann > IBM Linux Technology Center
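For reference, a rough sketch of the fallback stub Thiago suggests -- hypothetical, not part of this series, and where it would live (lppaca.h or elsewhere) is exactly the open question above:

    /*
     * Hypothetical fallback, not from this series: a no-op
     * lppaca_shared_proc() for configs that have no lppaca, so callers
     * could drop their own preprocessor guards.
     */
    #ifndef CONFIG_PPC_BOOK3S
    struct lppaca;                          /* opaque, never dereferenced here */

    static inline bool lppaca_shared_proc(struct lppaca *l)
    {
            return false;
    }
    #endif

Note that on its own this would still not be enough for is_shared_processor(): the argument expression local_paca->lppaca_ptr also has to parse on platforms without an lppaca pointer (or without a paca at all on 32-bit), which is the "unraveling" referred to above.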
[PATCH v2 0/3] Fix oops in shared-processor spinlocks
Fixes an oops when calling the shared-processor spinlock implementation
from a non-SP LPAR. Also take this opportunity to refactor
SHARED_PROCESSOR a bit.

Reference: https://github.com/linuxppc/issues/issues/229

Changes since v1:
 - Improve comment wording to make it clear why the BOOK3S #ifdef is
   required in is_shared_processor() in spinlock.h
 - Replace empty #define of splpar_*_yield() with actual functions with
   empty bodies.

Christopher M. Riedl (3):
  powerpc/spinlocks: Refactor SHARED_PROCESSOR
  powerpc/spinlocks: Rename SPLPAR-only spinlocks
  powerpc/spinlocks: Fix oops in shared-processor spinlocks

 arch/powerpc/include/asm/spinlock.h | 62 +
 arch/powerpc/lib/locks.c            |  6 +--
 2 files changed, 48 insertions(+), 20 deletions(-)

--
2.22.0
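As context for the second bullet above, the shape of that change in the SPLPAR=n case is roughly the following before/after sketch (see patch 3/3 of this series for the actual hunk); the remarks on type-checking are my reading, not from the patch:

    /* Old fallbacks: expand to nothing, so the argument is never even
     * looked at and a typo or wrong type slips through silently. */
    #define splpar_spin_yield(lock)
    #define splpar_rw_yield(lock)

    /* New fallbacks: still compile away to nothing, but the compiler now
     * type-checks the argument against arch_spinlock_t / arch_rwlock_t. */
    static inline void splpar_spin_yield(arch_spinlock_t *lock) {}
    static inline void splpar_rw_yield(arch_rwlock_t *lock) {}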
[PATCH v2 2/3] powerpc/spinlocks: Rename SPLPAR-only spinlocks
The __rw_yield and __spin_yield locks only pertain to SPLPAR mode. Rename them to make this relationship obvious. Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan --- arch/powerpc/include/asm/spinlock.h | 6 -- arch/powerpc/lib/locks.c| 6 +++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index dc5fcea1f006..0a8270183770 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,8 +101,10 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -extern void __spin_yield(arch_spinlock_t *lock); -extern void __rw_yield(arch_rwlock_t *lock); +void splpar_spin_yield(arch_spinlock_t *lock); +void splpar_rw_yield(arch_rwlock_t *lock); +#define __spin_yield(x) splpar_spin_yield(x) +#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 6550b9e5ce5f..6440d5943c00 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -18,7 +18,7 @@ #include #include -void __spin_yield(arch_spinlock_t *lock) +void splpar_spin_yield(arch_spinlock_t *lock) { unsigned int lock_value, holder_cpu, yield_count; @@ -36,14 +36,14 @@ void __spin_yield(arch_spinlock_t *lock) plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(holder_cpu), yield_count); } -EXPORT_SYMBOL_GPL(__spin_yield); +EXPORT_SYMBOL_GPL(splpar_spin_yield); /* * Waiting for a read lock or a write lock on a rwlock... * This turns out to be the same for read and write locks, since * we only know the holder if it is write-locked. */ -void __rw_yield(arch_rwlock_t *rw) +void splpar_rw_yield(arch_rwlock_t *rw) { int lock_value; unsigned int holder_cpu, yield_count; -- 2.22.0
[PATCH v2 1/3] powerpc/spinlocks: Refactor SHARED_PROCESSOR
Determining if a processor is in shared processor mode is not a constant so don't hide it behind a #define. Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan --- arch/powerpc/include/asm/spinlock.h | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index a47f827bc5f1..dc5fcea1f006 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,15 +101,27 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr)) extern void __spin_yield(arch_spinlock_t *lock); extern void __rw_yield(arch_rwlock_t *lock); #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() -#define SHARED_PROCESSOR 0 #endif +static inline bool is_shared_processor(void) +{ +/* + * LPPACA is only available on BOOK3S so guard anything LPPACA related to + * allow other platforms (which include this common header) to compile. + */ +#ifdef CONFIG_PPC_BOOK3S + return (IS_ENABLED(CONFIG_PPC_SPLPAR) && + lppaca_shared_proc(local_paca->lppaca_ptr)); +#else + return false; +#endif +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -117,7 +129,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -136,7 +148,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) local_irq_restore(flags); do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -226,7 +238,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock < 0)); HMT_medium(); @@ -240,7 +252,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock != 0)); HMT_medium(); -- 2.22.0
[PATCH v2 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks
Booting w/ ppc64le_defconfig + CONFIG_PREEMPT results in the attached kernel trace due to calling shared-processor spinlocks while not running in an SPLPAR. Previously, the out-of-line spinlocks implementations were selected based on CONFIG_PPC_SPLPAR at compile time without a runtime shared-processor LPAR check. To fix, call the actual spinlock implementations from a set of common functions, spin_yield() and rw_yield(), which check for shared-processor LPAR during runtime and select the appropriate lock implementation. [0.430878] BUG: Kernel NULL pointer dereference at 0x0100 [0.431991] Faulting instruction address: 0xc0097f88 [0.432934] Oops: Kernel access of bad area, sig: 7 [#1] [0.433448] LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV [0.434479] Modules linked in: [0.435055] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28 [0.435730] NIP: c0097f88 LR: c0c07a88 CTR: c015ca10 [0.436383] REGS: c000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b) [0.437004] MSR: 92009033 CR: 84000424 XER: 2004 [0.437874] CFAR: c0c07a84 DAR: 0100 DSISR: 0008 IRQMASK: 1 [0.437874] GPR00: c0c07a88 c00072707c80 c1546300 c0007be38a80 [0.437874] GPR04: c000726f0c00 0002 c0007279c980 0100 [0.437874] GPR08: c1581b78 8001 0008 c0007279c9b0 [0.437874] GPR12: c173 c0142558 [0.437874] GPR16: [0.437874] GPR20: [0.437874] GPR24: c0007be38a80 c0c002f4 [0.437874] GPR28: c00072221a00 c000726c2600 c0007be38a80 c0007be38a80 [0.443992] NIP [c0097f88] __spin_yield+0x48/0xa0 [0.444523] LR [c0c07a88] __raw_spin_lock+0xb8/0xc0 [0.445080] Call Trace: [0.445670] [c00072707c80] [c00072221a00] 0xc00072221a00 (unreliable) [0.446425] [c00072707cb0] [c0bffb0c] __schedule+0xbc/0x850 [0.447078] [c00072707d70] [c0c002f4] schedule+0x54/0x130 [0.447694] [c00072707da0] [c01427dc] kthreadd+0x28c/0x2b0 [0.448389] [c00072707e20] [c000c1cc] ret_from_kernel_thread+0x5c/0x70 [0.449143] Instruction dump: [0.449821] 4d9e0020 552a043e 210a07ff 79080fe0 0b08 3d020004 3908b878 794a1f24 [0.450587] e8e8 7ce7502a e8e7 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020 [0.452808] ---[ end trace 474d6b2b8fc5cb7e ]--- Signed-off-by: Christopher M. 
Riedl --- arch/powerpc/include/asm/spinlock.h | 36 - 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 0a8270183770..6aed8a83b180 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -103,11 +103,9 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) /* We only yield to the hypervisor if we are in shared processor mode */ void splpar_spin_yield(arch_spinlock_t *lock); void splpar_rw_yield(arch_rwlock_t *lock); -#define __spin_yield(x) splpar_spin_yield(x) -#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ -#define __spin_yield(x)barrier() -#define __rw_yield(x) barrier() +static inline void splpar_spin_yield(arch_spinlock_t *lock) {}; +static inline void splpar_rw_yield(arch_rwlock_t *lock) {}; #endif static inline bool is_shared_processor(void) @@ -124,6 +122,22 @@ static inline bool is_shared_processor(void) #endif } +static inline void spin_yield(arch_spinlock_t *lock) +{ + if (is_shared_processor()) + splpar_spin_yield(lock); + else + barrier(); +} + +static inline void rw_yield(arch_rwlock_t *lock) +{ + if (is_shared_processor()) + splpar_rw_yield(lock); + else + barrier(); +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -132,7 +146,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); } @@ -151,7 +165,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + spin_y
Re: [PATCH v2 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks
> On August 2, 2019 at 6:38 AM Michael Ellerman wrote:
>
> "Christopher M. Riedl" writes:
> > diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> > index 0a8270183770..6aed8a83b180 100644
> > --- a/arch/powerpc/include/asm/spinlock.h
> > +++ b/arch/powerpc/include/asm/spinlock.h
> > @@ -124,6 +122,22 @@ static inline bool is_shared_processor(void)
> >  #endif
> >  }
> >
> > +static inline void spin_yield(arch_spinlock_t *lock)
> > +{
> > +	if (is_shared_processor())
> > +		splpar_spin_yield(lock);
> > +	else
> > +		barrier();
> > +}
> ...
> >  static inline void arch_spin_lock(arch_spinlock_t *lock)
> >  {
> >  	while (1) {
> > @@ -132,7 +146,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> >  		do {
> >  			HMT_low();
> >  			if (is_shared_processor())
> > -				__spin_yield(lock);
> > +				spin_yield(lock);
>
> This leaves us with a double test of is_shared_processor() doesn't it?

Yep, and that's no good. Hmm, executing the barrier() in the
non-shared-processor case probably hurts performance here?
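Spelled out, the two hunks quoted above combine into a loop that tests is_shared_processor() twice per iteration; the eventual v3 change calls the SPLPAR helper directly from the locking loops. A sketch of the two loop bodies only, mirroring the hunks in this thread rather than new code:

    /* v2 shape: spin_yield() repeats the is_shared_processor() check the
     * caller just made. */
    do {
            HMT_low();
            if (is_shared_processor())      /* test #1 */
                    spin_yield(lock);       /* test #2 inside spin_yield() */
    } while (unlikely(lock->slock != 0));

    /* v3 shape: the loop calls splpar_spin_yield() directly, so the runtime
     * check happens once and no extra barrier() appears on this path;
     * spin_yield()/rw_yield() remain as the general-purpose wrappers
     * (e.g. for arch_spin_relax()). */
    do {
            HMT_low();
            if (is_shared_processor())
                    splpar_spin_yield(lock);
    } while (unlikely(lock->slock != 0));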
[RFC PATCH v3] powerpc/xmon: Restrict when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and completely disable xmon when lockdown=confidentiality. Xmon checks the lockdown state and takes appropriate action: (1) during xmon_setup to prevent early xmon'ing (2) when triggered via sysrq (3) when toggled via debugfs (4) when triggered via a previously enabled breakpoint The following lockdown state transitions are handled: (1) lockdown=none -> lockdown=integrity set xmon read-only mode (2) lockdown=none -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent re-entry into xmon (3) lockdown=integrity -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent re-entry into xmon Suggested-by: Andrew Donnellan Signed-off-by: Christopher M. Riedl --- Changes since v1: - Rebased onto v36 of https://patchwork.kernel.org/cover/11049461/ (based on: f632a8170a6b667ee4e3f552087588f0fe13c4bb) - Do not clear existing breakpoints when transitioning from lockdown=none to lockdown=integrity - Remove line continuation and dangling quote (confuses checkpatch.pl) from the xmon command help/usage string arch/powerpc/xmon/xmon.c | 59 ++-- include/linux/security.h | 2 ++ security/lockdown/lockdown.c | 2 ++ 3 files changed, 60 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index d0620d762a5a..1a5e43d664ca 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -187,6 +188,9 @@ static void dump_tlb_44x(void); static void dump_tlb_book3e(void); #endif +static void clear_all_bpt(void); +static void xmon_init(int); + #ifdef CONFIG_PPC64 #define REG"%.16lx" #else @@ -283,10 +287,41 @@ Commands:\n\ " U show uptime information\n" " ? 
help\n" " # n limit output to n lines per page (for dp, dpa, dl)\n" -" zr reboot\n\ - zh halt\n" +" zr reboot\n" +" zh halt\n" ; +#ifdef CONFIG_SECURITY +static bool xmon_is_locked_down(void) +{ + static bool lockdown; + + if (!lockdown) { + lockdown = !!security_locked_down(LOCKDOWN_XMON_RW); + if (lockdown) { + printf("xmon: Disabled due to kernel lockdown\n"); + xmon_is_ro = true; + xmon_on = 0; + xmon_init(0); + clear_all_bpt(); + } + } + + if (!xmon_is_ro) { + xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR); + if (xmon_is_ro) + printf("xmon: Read-only due to kernel lockdown\n"); + } + + return lockdown; +} +#else /* CONFIG_SECURITY */ +static inline bool xmon_is_locked_down(void) +{ + return false; +} +#endif + static struct pt_regs *xmon_regs; static inline void sync(void) @@ -704,6 +739,9 @@ static int xmon_bpt(struct pt_regs *regs) struct bpt *bp; unsigned long offset; + if (xmon_is_locked_down()) + return 0; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; @@ -735,6 +773,9 @@ static int xmon_sstep(struct pt_regs *regs) static int xmon_break_match(struct pt_regs *regs) { + if (xmon_is_locked_down()) + return 0; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; if (dabr.enabled == 0) @@ -745,6 +786,9 @@ static int xmon_break_match(struct pt_regs *regs) static int xmon_iabr_match(struct pt_regs *regs) { + if (xmon_is_locked_down()) + return 0; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; if (iabr == NULL) @@ -3741,6 +3785,9 @@ static void xmon_init(int enable) #ifdef CONFIG_MAGIC_SYSRQ static void sysrq_handle_xmon(int key) { + if (xmon_is_locked_down()) + return; + /* ensure xmon is enabled */ xmon_init(1); debugger(get_irq_regs()); @@ -3762,7 +3809,6 @@ static int __init setup_xmon_sysrq(void) device_initcall(setup_xmon_sysrq); #endif /* CONFIG_MAGIC_SYSRQ */ -#ifdef CONFIG_DEBUG_FS static void clear_all_bpt(void) { int i; @@ -3784,8 +3830,12 @@ static void clear_all_bpt(void) printf("xmon: All breakpoints cleared\n"); } +#ifdef CONFIG_DEBUG_FS static int xmon_dbgfs_set(void *data, u64 val) { + if (xmon_is_locked_down()) + return 0; + xmon_on = !!val; xmon_init(xmon_on); @@ -3844,6 +3894,9 @@ early_param("xmon", early_parse_xmon);
Re: [RFC PATCH v2] powerpc/xmon: restrict when kernel is locked down
> On July 29, 2019 at 2:00 AM Daniel Axtens wrote:
>
> Would you be able to send a v2 with these changes? (that is, not purging
> breakpoints when entering integrity mode)

Just sent out a v3 with that change among a few others and a rebase.

Thanks,
Chris R.
[PATCH v3 1/3] powerpc/spinlocks: Refactor SHARED_PROCESSOR
Determining if a processor is in shared processor mode is not a constant so don't hide it behind a #define. Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan --- arch/powerpc/include/asm/spinlock.h | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index a47f827bc5f1..dc5fcea1f006 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,15 +101,27 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr)) extern void __spin_yield(arch_spinlock_t *lock); extern void __rw_yield(arch_rwlock_t *lock); #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() -#define SHARED_PROCESSOR 0 #endif +static inline bool is_shared_processor(void) +{ +/* + * LPPACA is only available on BOOK3S so guard anything LPPACA related to + * allow other platforms (which include this common header) to compile. + */ +#ifdef CONFIG_PPC_BOOK3S + return (IS_ENABLED(CONFIG_PPC_SPLPAR) && + lppaca_shared_proc(local_paca->lppaca_ptr)); +#else + return false; +#endif +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -117,7 +129,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -136,7 +148,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) local_irq_restore(flags); do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -226,7 +238,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock < 0)); HMT_medium(); @@ -240,7 +252,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock != 0)); HMT_medium(); -- 2.22.0
[PATCH v3 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks
Booting w/ ppc64le_defconfig + CONFIG_PREEMPT results in the attached kernel trace due to calling shared-processor spinlocks while not running in an SPLPAR. Previously, the out-of-line spinlocks implementations were selected based on CONFIG_PPC_SPLPAR at compile time without a runtime shared-processor LPAR check. To fix, call the actual spinlock implementations from a set of common functions, spin_yield() and rw_yield(), which check for shared-processor LPAR during runtime and select the appropriate lock implementation. [0.430878] BUG: Kernel NULL pointer dereference at 0x0100 [0.431991] Faulting instruction address: 0xc0097f88 [0.432934] Oops: Kernel access of bad area, sig: 7 [#1] [0.433448] LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV [0.434479] Modules linked in: [0.435055] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28 [0.435730] NIP: c0097f88 LR: c0c07a88 CTR: c015ca10 [0.436383] REGS: c000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b) [0.437004] MSR: 92009033 CR: 84000424 XER: 2004 [0.437874] CFAR: c0c07a84 DAR: 0100 DSISR: 0008 IRQMASK: 1 [0.437874] GPR00: c0c07a88 c00072707c80 c1546300 c0007be38a80 [0.437874] GPR04: c000726f0c00 0002 c0007279c980 0100 [0.437874] GPR08: c1581b78 8001 0008 c0007279c9b0 [0.437874] GPR12: c173 c0142558 [0.437874] GPR16: [0.437874] GPR20: [0.437874] GPR24: c0007be38a80 c0c002f4 [0.437874] GPR28: c00072221a00 c000726c2600 c0007be38a80 c0007be38a80 [0.443992] NIP [c0097f88] __spin_yield+0x48/0xa0 [0.444523] LR [c0c07a88] __raw_spin_lock+0xb8/0xc0 [0.445080] Call Trace: [0.445670] [c00072707c80] [c00072221a00] 0xc00072221a00 (unreliable) [0.446425] [c00072707cb0] [c0bffb0c] __schedule+0xbc/0x850 [0.447078] [c00072707d70] [c0c002f4] schedule+0x54/0x130 [0.447694] [c00072707da0] [c01427dc] kthreadd+0x28c/0x2b0 [0.448389] [c00072707e20] [c000c1cc] ret_from_kernel_thread+0x5c/0x70 [0.449143] Instruction dump: [0.449821] 4d9e0020 552a043e 210a07ff 79080fe0 0b08 3d020004 3908b878 794a1f24 [0.450587] e8e8 7ce7502a e8e7 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020 [0.452808] ---[ end trace 474d6b2b8fc5cb7e ]--- Signed-off-by: Christopher M. 
Riedl --- Changes since v2: - Directly call splpar_*_yield() to avoid duplicate call to is_shared_processor() in some cases arch/powerpc/include/asm/spinlock.h | 36 - 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 0a8270183770..8935315c80ff 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -103,11 +103,9 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) /* We only yield to the hypervisor if we are in shared processor mode */ void splpar_spin_yield(arch_spinlock_t *lock); void splpar_rw_yield(arch_rwlock_t *lock); -#define __spin_yield(x) splpar_spin_yield(x) -#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ -#define __spin_yield(x)barrier() -#define __rw_yield(x) barrier() +static inline void splpar_spin_yield(arch_spinlock_t *lock) {}; +static inline void splpar_rw_yield(arch_rwlock_t *lock) {}; #endif static inline bool is_shared_processor(void) @@ -124,6 +122,22 @@ static inline bool is_shared_processor(void) #endif } +static inline void spin_yield(arch_spinlock_t *lock) +{ + if (is_shared_processor()) + splpar_spin_yield(lock); + else + barrier(); +} + +static inline void rw_yield(arch_rwlock_t *lock) +{ + if (is_shared_processor()) + splpar_rw_yield(lock); + else + barrier(); +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -132,7 +146,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + splpar_spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); } @@ -151,7 +165,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) do { HMT_low();
[PATCH v3 2/3] powerpc/spinlocks: Rename SPLPAR-only spinlocks
The __rw_yield and __spin_yield locks only pertain to SPLPAR mode. Rename them to make this relationship obvious. Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan --- arch/powerpc/include/asm/spinlock.h | 6 -- arch/powerpc/lib/locks.c| 6 +++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index dc5fcea1f006..0a8270183770 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,8 +101,10 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -extern void __spin_yield(arch_spinlock_t *lock); -extern void __rw_yield(arch_rwlock_t *lock); +void splpar_spin_yield(arch_spinlock_t *lock); +void splpar_rw_yield(arch_rwlock_t *lock); +#define __spin_yield(x) splpar_spin_yield(x) +#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 6550b9e5ce5f..6440d5943c00 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -18,7 +18,7 @@ #include #include -void __spin_yield(arch_spinlock_t *lock) +void splpar_spin_yield(arch_spinlock_t *lock) { unsigned int lock_value, holder_cpu, yield_count; @@ -36,14 +36,14 @@ void __spin_yield(arch_spinlock_t *lock) plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(holder_cpu), yield_count); } -EXPORT_SYMBOL_GPL(__spin_yield); +EXPORT_SYMBOL_GPL(splpar_spin_yield); /* * Waiting for a read lock or a write lock on a rwlock... * This turns out to be the same for read and write locks, since * we only know the holder if it is write-locked. */ -void __rw_yield(arch_rwlock_t *rw) +void splpar_rw_yield(arch_rwlock_t *rw) { int lock_value; unsigned int holder_cpu, yield_count; -- 2.22.0
[PATCH v3 0/3] Fix oops in shared-processor spinlocks
Fixes an oops when calling the shared-processor spinlock implementation
from a non-SP LPAR. Also take this opportunity to refactor
SHARED_PROCESSOR a bit.

Reference: https://github.com/linuxppc/issues/issues/229

Changes since v2:
 - Directly call splpar_*_yield() to avoid duplicate call to
   is_shared_processor() in some cases

Changes since v1:
 - Improve comment wording to make it clear why the BOOK3S #ifdef is
   required in is_shared_processor() in spinlock.h
 - Replace empty #define of splpar_*_yield() with actual functions with
   empty bodies

Christopher M. Riedl (3):
  powerpc/spinlocks: Refactor SHARED_PROCESSOR
  powerpc/spinlocks: Rename SPLPAR-only spinlocks
  powerpc/spinlocks: Fix oops in shared-processor spinlocks

 arch/powerpc/include/asm/spinlock.h | 62 +
 arch/powerpc/lib/locks.c            |  6 +--
 2 files changed, 48 insertions(+), 20 deletions(-)

--
2.22.0
Re: [PATCH v2 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks
> On August 6, 2019 at 7:14 AM Michael Ellerman wrote:
>
> Christopher M Riedl writes:
> >> On August 2, 2019 at 6:38 AM Michael Ellerman wrote:
> >> "Christopher M. Riedl" writes:
> >>
> >> This leaves us with a double test of is_shared_processor() doesn't it?
> >
> > Yep, and that's no good. Hmm, executing the barrier() in the
> > non-shared-processor case probably hurts performance here?
>
> It's only a "compiler barrier", so it shouldn't generate any code.
>
> But it does have the effect of telling the compiler it can't optimise
> across that barrier, which can be important.
>
> In those spin loops all we're doing is checking lock->slock which is
> already marked volatile in the definition of arch_spinlock_t, so the
> extra barrier shouldn't really make any difference.
>
> But still the current code doesn't have a barrier() there, so we should
> make sure we don't introduce one as part of this refactor.

Thank you for taking the time to explain this. I have some more reading
to do about compiler barriers, it seems :)

> So I think you just want to change the call to spin_yield() above to
> splpar_spin_yield(), which avoids the double check, and also avoids the
> barrier() in the SPLPAR=n case.
>
> And then arch_spin_relax() calls spin_yield() etc.

I submitted a v3 before your reply with this change already - figured
this is the best way to avoid the double check and maintain the existing
behavior.

> cheers
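For anyone following along, the "compiler barrier" being discussed is, to the best of my understanding, essentially the definition below (paraphrasing include/linux/compiler*.h); the wait_for_flag() example is mine and only illustrates the effect:

    /* An empty asm statement with a "memory" clobber: it emits no machine
     * instructions, but tells the compiler that memory may have changed,
     * so values cached in registers must be reloaded and memory accesses
     * cannot be moved across it. */
    #define barrier() __asm__ __volatile__("" : : : "memory")

    static int flag;        /* deliberately not volatile */

    static void wait_for_flag(void)
    {
            while (!flag)   /* without the barrier the compiler may hoist
                             * this load out of the loop; with it, 'flag'
                             * is re-read on every iteration */
                    barrier();
    }

Since lock->slock is already volatile, the reload happens there regardless, which is why the extra barrier() makes no difference in the spin loops but is still best left out, as noted above.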
[PATCH v4 0/3] Fix oops in shared-processor spinlocks
Fixes an oops when calling the shared-processor spinlock implementation
from a non-SP LPAR. Also take this opportunity to refactor
SHARED_PROCESSOR a bit.

Reference: https://github.com/linuxppc/issues/issues/229

Changes since v3:
 - Replace CONFIG_BOOK3S #ifdef with CONFIG_PPC_PSERIES in
   is_shared_processor() to fix compile error reported by 0day-ci

Changes since v2:
 - Directly call splpar_*_yield() to avoid duplicate call to
   is_shared_processor() in some cases

Changes since v1:
 - Improve comment wording to make it clear why the BOOK3S #ifdef is
   required in is_shared_processor() in spinlock.h
 - Replace empty #define of splpar_*_yield() with actual functions with
   empty bodies

Christopher M. Riedl (3):
  powerpc/spinlocks: Refactor SHARED_PROCESSOR
  powerpc/spinlocks: Rename SPLPAR-only spinlocks
  powerpc/spinlocks: Fix oops in shared-processor spinlocks

 arch/powerpc/include/asm/spinlock.h | 62 +
 arch/powerpc/lib/locks.c            |  6 +--
 2 files changed, 48 insertions(+), 20 deletions(-)

--
2.22.0
[PATCH v4 1/3] powerpc/spinlocks: Refactor SHARED_PROCESSOR
Determining if a processor is in shared processor mode is not a constant so don't hide it behind a #define. Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan --- arch/powerpc/include/asm/spinlock.h | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index a47f827bc5f1..e9c60fbcc8fe 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,15 +101,27 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr)) extern void __spin_yield(arch_spinlock_t *lock); extern void __rw_yield(arch_rwlock_t *lock); #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() -#define SHARED_PROCESSOR 0 #endif +static inline bool is_shared_processor(void) +{ +/* + * LPPACA is only available on Pseries so guard anything LPPACA related to + * allow other platforms (which include this common header) to compile. + */ +#ifdef CONFIG_PPC_PSERIES + return (IS_ENABLED(CONFIG_PPC_SPLPAR) && + lppaca_shared_proc(local_paca->lppaca_ptr)); +#else + return false; +#endif +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -117,7 +129,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -136,7 +148,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) local_irq_restore(flags); do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); @@ -226,7 +238,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock < 0)); HMT_medium(); @@ -240,7 +252,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw) break; do { HMT_low(); - if (SHARED_PROCESSOR) + if (is_shared_processor()) __rw_yield(rw); } while (unlikely(rw->lock != 0)); HMT_medium(); -- 2.22.0
[PATCH v4 3/3] powerpc/spinlocks: Fix oops in shared-processor spinlocks
Booting w/ ppc64le_defconfig + CONFIG_PREEMPT results in the attached kernel trace due to calling shared-processor spinlocks while not running in an SPLPAR. Previously, the out-of-line spinlocks implementations were selected based on CONFIG_PPC_SPLPAR at compile time without a runtime shared-processor LPAR check. To fix, call the actual spinlock implementations from a set of common functions, spin_yield() and rw_yield(), which check for shared-processor LPAR during runtime and select the appropriate lock implementation. [0.430878] BUG: Kernel NULL pointer dereference at 0x0100 [0.431991] Faulting instruction address: 0xc0097f88 [0.432934] Oops: Kernel access of bad area, sig: 7 [#1] [0.433448] LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV [0.434479] Modules linked in: [0.435055] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28 [0.435730] NIP: c0097f88 LR: c0c07a88 CTR: c015ca10 [0.436383] REGS: c000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b) [0.437004] MSR: 92009033 CR: 84000424 XER: 2004 [0.437874] CFAR: c0c07a84 DAR: 0100 DSISR: 0008 IRQMASK: 1 [0.437874] GPR00: c0c07a88 c00072707c80 c1546300 c0007be38a80 [0.437874] GPR04: c000726f0c00 0002 c0007279c980 0100 [0.437874] GPR08: c1581b78 8001 0008 c0007279c9b0 [0.437874] GPR12: c173 c0142558 [0.437874] GPR16: [0.437874] GPR20: [0.437874] GPR24: c0007be38a80 c0c002f4 [0.437874] GPR28: c00072221a00 c000726c2600 c0007be38a80 c0007be38a80 [0.443992] NIP [c0097f88] __spin_yield+0x48/0xa0 [0.444523] LR [c0c07a88] __raw_spin_lock+0xb8/0xc0 [0.445080] Call Trace: [0.445670] [c00072707c80] [c00072221a00] 0xc00072221a00 (unreliable) [0.446425] [c00072707cb0] [c0bffb0c] __schedule+0xbc/0x850 [0.447078] [c00072707d70] [c0c002f4] schedule+0x54/0x130 [0.447694] [c00072707da0] [c01427dc] kthreadd+0x28c/0x2b0 [0.448389] [c00072707e20] [c000c1cc] ret_from_kernel_thread+0x5c/0x70 [0.449143] Instruction dump: [0.449821] 4d9e0020 552a043e 210a07ff 79080fe0 0b08 3d020004 3908b878 794a1f24 [0.450587] e8e8 7ce7502a e8e7 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020 [0.452808] ---[ end trace 474d6b2b8fc5cb7e ]--- Signed-off-by: Christopher M. 
Riedl --- arch/powerpc/include/asm/spinlock.h | 36 - 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 0d04d468f660..e9a960e28f3c 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -103,11 +103,9 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) /* We only yield to the hypervisor if we are in shared processor mode */ void splpar_spin_yield(arch_spinlock_t *lock); void splpar_rw_yield(arch_rwlock_t *lock); -#define __spin_yield(x) splpar_spin_yield(x) -#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ -#define __spin_yield(x)barrier() -#define __rw_yield(x) barrier() +static inline void splpar_spin_yield(arch_spinlock_t *lock) {}; +static inline void splpar_rw_yield(arch_rwlock_t *lock) {}; #endif static inline bool is_shared_processor(void) @@ -124,6 +122,22 @@ static inline bool is_shared_processor(void) #endif } +static inline void spin_yield(arch_spinlock_t *lock) +{ + if (is_shared_processor()) + splpar_spin_yield(lock); + else + barrier(); +} + +static inline void rw_yield(arch_rwlock_t *lock) +{ + if (is_shared_processor()) + splpar_rw_yield(lock); + else + barrier(); +} + static inline void arch_spin_lock(arch_spinlock_t *lock) { while (1) { @@ -132,7 +146,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + splpar_spin_yield(lock); } while (unlikely(lock->slock != 0)); HMT_medium(); } @@ -151,7 +165,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) do { HMT_low(); if (is_shared_processor()) - __spin_yield(lock); + splpar_spin_y
[PATCH v4 2/3] powerpc/spinlocks: Rename SPLPAR-only spinlocks
The __rw_yield and __spin_yield locks only pertain to SPLPAR mode. Rename them to make this relationship obvious. Signed-off-by: Christopher M. Riedl Reviewed-by: Andrew Donnellan --- arch/powerpc/include/asm/spinlock.h | 6 -- arch/powerpc/lib/locks.c| 6 +++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index e9c60fbcc8fe..0d04d468f660 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -101,8 +101,10 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) #if defined(CONFIG_PPC_SPLPAR) /* We only yield to the hypervisor if we are in shared processor mode */ -extern void __spin_yield(arch_spinlock_t *lock); -extern void __rw_yield(arch_rwlock_t *lock); +void splpar_spin_yield(arch_spinlock_t *lock); +void splpar_rw_yield(arch_rwlock_t *lock); +#define __spin_yield(x) splpar_spin_yield(x) +#define __rw_yield(x) splpar_rw_yield(x) #else /* SPLPAR */ #define __spin_yield(x)barrier() #define __rw_yield(x) barrier() diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 6550b9e5ce5f..6440d5943c00 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -18,7 +18,7 @@ #include #include -void __spin_yield(arch_spinlock_t *lock) +void splpar_spin_yield(arch_spinlock_t *lock) { unsigned int lock_value, holder_cpu, yield_count; @@ -36,14 +36,14 @@ void __spin_yield(arch_spinlock_t *lock) plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(holder_cpu), yield_count); } -EXPORT_SYMBOL_GPL(__spin_yield); +EXPORT_SYMBOL_GPL(splpar_spin_yield); /* * Waiting for a read lock or a write lock on a rwlock... * This turns out to be the same for read and write locks, since * we only know the holder if it is write-locked. */ -void __rw_yield(arch_rwlock_t *rw) +void splpar_rw_yield(arch_rwlock_t *rw) { int lock_value; unsigned int holder_cpu, yield_count; -- 2.22.0
[RFC PATCH v4 2/2] powerpc/xmon: Restrict when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and completely disable xmon when lockdown=confidentiality. Xmon checks the lockdown state and takes appropriate action: (1) during xmon_setup to prevent early xmon'ing (2) when triggered via sysrq (3) when toggled via debugfs (4) when triggered via a previously enabled breakpoint The following lockdown state transitions are handled: (1) lockdown=none -> lockdown=integrity set xmon read-only mode (2) lockdown=none -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent re-entry into xmon (3) lockdown=integrity -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent re-entry into xmon Suggested-by: Andrew Donnellan Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 59 ++-- include/linux/security.h | 2 ++ security/lockdown/lockdown.c | 2 ++ 3 files changed, 60 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index bb63ecc599fd..8fd79369974e 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -187,6 +188,9 @@ static void dump_tlb_44x(void); static void dump_tlb_book3e(void); #endif +static void clear_all_bpt(void); +static void xmon_init(int); + #ifdef CONFIG_PPC64 #define REG"%.16lx" #else @@ -283,10 +287,41 @@ Commands:\n\ " U show uptime information\n" " ? help\n" " # n limit output to n lines per page (for dp, dpa, dl)\n" -" zr reboot\n\ - zh halt\n" +" zr reboot\n" +" zh halt\n" ; +#ifdef CONFIG_SECURITY +static bool xmon_is_locked_down(void) +{ + static bool lockdown; + + if (!lockdown) { + lockdown = !!security_locked_down(LOCKDOWN_XMON_RW); + if (lockdown) { + printf("xmon: Disabled due to kernel lockdown\n"); + xmon_is_ro = true; + xmon_on = 0; + xmon_init(0); + clear_all_bpt(); + } + } + + if (!xmon_is_ro) { + xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR); + if (xmon_is_ro) + printf("xmon: Read-only due to kernel lockdown\n"); + } + + return lockdown; +} +#else /* CONFIG_SECURITY */ +static inline bool xmon_is_locked_down(void) +{ + return false; +} +#endif + static struct pt_regs *xmon_regs; static inline void sync(void) @@ -704,6 +739,9 @@ static int xmon_bpt(struct pt_regs *regs) struct bpt *bp; unsigned long offset; + if (xmon_is_locked_down()) + return 0; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; @@ -735,6 +773,9 @@ static int xmon_sstep(struct pt_regs *regs) static int xmon_break_match(struct pt_regs *regs) { + if (xmon_is_locked_down()) + return 0; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; if (dabr.enabled == 0) @@ -745,6 +786,9 @@ static int xmon_break_match(struct pt_regs *regs) static int xmon_iabr_match(struct pt_regs *regs) { + if (xmon_is_locked_down()) + return 0; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; if (iabr == NULL) @@ -3750,6 +3794,9 @@ static void xmon_init(int enable) #ifdef CONFIG_MAGIC_SYSRQ static void sysrq_handle_xmon(int key) { + if (xmon_is_locked_down()) + return; + /* ensure xmon is enabled */ xmon_init(1); debugger(get_irq_regs()); @@ -3771,7 +3818,6 @@ static int __init setup_xmon_sysrq(void) device_initcall(setup_xmon_sysrq); #endif /* CONFIG_MAGIC_SYSRQ */ -#ifdef CONFIG_DEBUG_FS static void clear_all_bpt(void) { int i; @@ -3793,8 +3839,12 @@ static void clear_all_bpt(void) 
printf("xmon: All breakpoints cleared\n"); } +#ifdef CONFIG_DEBUG_FS static int xmon_dbgfs_set(void *data, u64 val) { + if (xmon_is_locked_down()) + return 0; + xmon_on = !!val; xmon_init(xmon_on); @@ -3853,6 +3903,9 @@ early_param("xmon", early_parse_xmon); void __init xmon_setup(void) { + if (xmon_is_locked_down()) + return; + if (xmon_on) xmon_init(1); if (xmon_early) diff --git a/include/linux/security.h b/include/linux/security.h index 807dc0d24982..379b74b5d545 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -116,12 +11
[RFC PATCH v4 0/2] Restrict xmon when kernel is locked down
Xmon should be either fully or partially disabled depending on the
kernel lockdown state.

Put xmon into read-only mode for lockdown=integrity and completely
disable xmon when lockdown=confidentiality. Since this can occur
dynamically, there may be pre-existing, active breakpoints in xmon when
transitioning into read-only mode. These breakpoints will still trigger,
so allow them to be listed, but not cleared or altered, using xmon.

Changes since v3:
 - Allow active breakpoints to be shown/listed in read-only mode

Changes since v2:
 - Rebased onto v36 of https://patchwork.kernel.org/cover/11049461/
   (based on: f632a8170a6b667ee4e3f552087588f0fe13c4bb)
 - Do not clear existing breakpoints when transitioning from
   lockdown=none to lockdown=integrity
 - Remove line continuation and dangling quote (confuses checkpatch.pl)
   from the xmon command help/usage string

Christopher M. Riedl (2):
  powerpc/xmon: Allow listing active breakpoints in read-only mode
  powerpc/xmon: Restrict when kernel is locked down

 arch/powerpc/xmon/xmon.c     | 78
 include/linux/security.h     |  2 +
 security/lockdown/lockdown.c |  2 +
 3 files changed, 74 insertions(+), 8 deletions(-)

--
2.22.0
[RFC PATCH v4 1/2] powerpc/xmon: Allow listing active breakpoints in read-only mode
Xmon can enter read-only mode dynamically due to changes in kernel lockdown state. This transition does not clear active breakpoints and any these breakpoints should remain visible to the xmon'er. Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index d0620d762a5a..bb63ecc599fd 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1045,10 +1045,6 @@ cmds(struct pt_regs *excp) set_lpp_cmd(); break; case 'b': - if (xmon_is_ro) { - printf(xmon_ro_msg); - break; - } bpt_cmds(); break; case 'C': @@ -1317,11 +1313,16 @@ bpt_cmds(void) struct bpt *bp; cmd = inchar(); + switch (cmd) { #ifndef CONFIG_PPC_8xx static const char badaddr[] = "Only kernel addresses are permitted for breakpoints\n"; int mode; case 'd': /* bd - hardware data breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!ppc_breakpoint_available()) { printf("Hardware data breakpoint not supported on this cpu\n"); break; @@ -1349,6 +1350,10 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!cpu_has_feature(CPU_FTR_ARCH_207S)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); @@ -1372,6 +1377,10 @@ bpt_cmds(void) #endif case 'c': + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!scanhex(&a)) { /* clear all breakpoints */ for (i = 0; i < NBPTS; ++i) @@ -1407,7 +1416,7 @@ bpt_cmds(void) break; } termch = cmd; - if (!scanhex(&a)) { + if (xmon_is_ro || !scanhex(&a)) { /* print all breakpoints */ printf(" typeaddress\n"); if (dabr.enabled) { -- 2.22.0
[PATCH v5 1/2] powerpc/xmon: Allow listing and clearing breakpoints in read-only mode
Read-only mode should not prevent listing and clearing any active breakpoints. Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index d0620d762a5a..a98a354d46ac 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1045,10 +1045,6 @@ cmds(struct pt_regs *excp) set_lpp_cmd(); break; case 'b': - if (xmon_is_ro) { - printf(xmon_ro_msg); - break; - } bpt_cmds(); break; case 'C': @@ -1317,11 +1313,16 @@ bpt_cmds(void) struct bpt *bp; cmd = inchar(); + switch (cmd) { #ifndef CONFIG_PPC_8xx static const char badaddr[] = "Only kernel addresses are permitted for breakpoints\n"; int mode; case 'd': /* bd - hardware data breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!ppc_breakpoint_available()) { printf("Hardware data breakpoint not supported on this cpu\n"); break; @@ -1349,6 +1350,10 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!cpu_has_feature(CPU_FTR_ARCH_207S)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); @@ -1407,7 +1412,7 @@ bpt_cmds(void) break; } termch = cmd; - if (!scanhex(&a)) { + if (xmon_is_ro || !scanhex(&a)) { /* print all breakpoints */ printf(" typeaddress\n"); if (dabr.enabled) { -- 2.23.0
[PATCH v5 2/2] powerpc/xmon: Restrict when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and prevent user entry into xmon when lockdown=confidentiality. Xmon checks the lockdown state on every attempted entry: (1) during early xmon'ing (2) when triggered via sysrq (3) when toggled via debugfs (4) when triggered via a previously enabled breakpoint The following lockdown state transitions are handled: (1) lockdown=none -> lockdown=integrity set xmon read-only mode (2) lockdown=none -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon (3) lockdown=integrity -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon Suggested-by: Andrew Donnellan Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 85 include/linux/security.h | 2 + security/lockdown/lockdown.c | 2 + 3 files changed, 72 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a98a354d46ac..94a5fada3034 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -187,6 +188,8 @@ static void dump_tlb_44x(void); static void dump_tlb_book3e(void); #endif +static void clear_all_bpt(void); + #ifdef CONFIG_PPC64 #define REG"%.16lx" #else @@ -283,10 +286,38 @@ Commands:\n\ " U show uptime information\n" " ? help\n" " # n limit output to n lines per page (for dp, dpa, dl)\n" -" zr reboot\n\ - zh halt\n" +" zr reboot\n" +" zh halt\n" ; +#ifdef CONFIG_SECURITY +static bool xmon_is_locked_down(void) +{ + static bool lockdown; + + if (!lockdown) { + lockdown = !!security_locked_down(LOCKDOWN_XMON_RW); + if (lockdown) { + printf("xmon: Disabled due to kernel lockdown\n"); + xmon_is_ro = true; + } + } + + if (!xmon_is_ro) { + xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR); + if (xmon_is_ro) + printf("xmon: Read-only due to kernel lockdown\n"); + } + + return lockdown; +} +#else /* CONFIG_SECURITY */ +static inline bool xmon_is_locked_down(void) +{ + return false; +} +#endif + static struct pt_regs *xmon_regs; static inline void sync(void) @@ -438,7 +469,10 @@ static bool wait_for_other_cpus(int ncpus) return false; } -#endif /* CONFIG_SMP */ +#else /* CONFIG_SMP */ +static inline void get_output_lock(void) {} +static inline void release_output_lock(void) {} +#endif static inline int unrecoverable_excp(struct pt_regs *regs) { @@ -455,6 +489,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi) int cmd = 0; struct bpt *bp; long recurse_jmp[JMP_BUF_LEN]; + bool locked_down; unsigned long offset; unsigned long flags; #ifdef CONFIG_SMP @@ -465,6 +500,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi) local_irq_save(flags); hard_irq_disable(); + locked_down = xmon_is_locked_down(); + tracing_enabled = tracing_is_on(); tracing_off(); @@ -516,7 +553,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi) if (!fromipi) { get_output_lock(); - excprint(regs); + if (!locked_down) + excprint(regs); if (bp) { printf("cpu 0x%x stopped at breakpoint 0x%tx (", cpu, BP_NUM(bp)); @@ -568,10 +606,14 @@ static int xmon_core(struct pt_regs *regs, int fromipi) } remove_bpts(); disable_surveillance(); - /* for breakpoint or single step, print the current instr. */ - if (bp || TRAP(regs) == 0xd00) - ppc_inst_dump(regs->nip, 1, 0); - printf("enter ? 
for help\n"); + + if (!locked_down) { + /* for breakpoint or single step, print curr insn */ + if (bp || TRAP(regs) == 0xd00) + ppc_inst_dump(regs->nip, 1, 0); + printf("enter ? for help\n"); + } + mb(); xmon_gate = 1; barrier(); @@ -595,8 +637,9 @@ static int xmon_core(struct pt_regs *regs, int fromipi) spin_cpu_relax(); touch_nmi_watchdog(); } else { - cmd = cmds(regs); - if (cmd != 0) { +
[PATCH v5 0/2] Restrict xmon when kernel is locked down
Xmon should be either fully or partially disabled depending on the
kernel lockdown state.

Put xmon into read-only mode for lockdown=integrity and completely
disable xmon when lockdown=confidentiality. Since this can occur
dynamically, there may be pre-existing, active breakpoints in xmon when
transitioning into read-only mode. These breakpoints will still trigger,
so allow them to be listed, but not cleared or altered, using xmon.

Changes since v4:
 - Move lockdown state checks into xmon_core
 - Allow clearing of breakpoints in xmon read-only mode
 - Test simple scenarios (combinations of xmon and lockdown cmdline
   options, setting breakpoints and changing lockdown state, etc) in
   QEMU and on an actual POWER8 VM
 - Rebase onto security/next-lockdown
   b602614a81078bf29c82b2671bb96a63488f68d6

Changes since v3:
 - Allow active breakpoints to be shown/listed in read-only mode

Changes since v2:
 - Rebased onto v36 of https://patchwork.kernel.org/cover/11049461/
   (based on: f632a8170a6b667ee4e3f552087588f0fe13c4bb)
 - Do not clear existing breakpoints when transitioning from
   lockdown=none to lockdown=integrity
 - Remove line continuation and dangling quote (confuses checkpatch.pl)
   from the xmon command help/usage string

Christopher M. Riedl (2):
  powerpc/xmon: Allow listing active breakpoints in read-only mode
  powerpc/xmon: Restrict when kernel is locked down

 arch/powerpc/xmon/xmon.c     | 104 +++
 include/linux/security.h     |   2 +
 security/lockdown/lockdown.c |   2 +
 3 files changed, 86 insertions(+), 22 deletions(-)

--
2.23.0
Re: [PATCH v5 2/2] powerpc/xmon: Restrict when kernel is locked down
> On August 29, 2019 at 2:43 AM Daniel Axtens wrote: > > > Hi, > > > Xmon should be either fully or partially disabled depending on the > > kernel lockdown state. > > I've been kicking the tyres of this, and it seems to work well: > > Tested-by: Daniel Axtens > Thank you for taking the time to test this! > > I have one small nit: if I enter confidentiality mode and then try to > enter xmon, I get 32 messages about clearing the breakpoints each time I > try to enter xmon: > Ugh, that's annoying. I tested this on a vm w/ 2 vcpus but should have considered the case of more vcpus :( > > root@dja-guest:~# echo confidentiality > /sys/kernel/security/lockdown > root@dja-guest:~# echo x >/proc/sysrq-trigger > [ 489.585400] sysrq: Entering xmon > xmon: Disabled due to kernel lockdown > xmon: All breakpoints cleared > xmon: All breakpoints cleared > xmon: All breakpoints cleared > xmon: All breakpoints cleared > xmon: All breakpoints cleared > ... > > Investigating, I see that this is because my vm has 32 vcpus, and I'm > getting one per CPU. > > Looking at the call sites, there's only one other caller, so I think you > might be better served with this: > > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > index 94a5fada3034..fcaf1d568162 100644 > --- a/arch/powerpc/xmon/xmon.c > +++ b/arch/powerpc/xmon/xmon.c > @@ -3833,10 +3833,6 @@ static void clear_all_bpt(void) > iabr = NULL; > dabr.enabled = 0; > } > - > - get_output_lock(); > - printf("xmon: All breakpoints cleared\n"); > - release_output_lock(); > } > > #ifdef CONFIG_DEBUG_FS > @@ -3846,8 +3842,13 @@ static int xmon_dbgfs_set(void *data, u64 val) > xmon_init(xmon_on); > > /* make sure all breakpoints removed when disabling */ > - if (!xmon_on) > + if (!xmon_on) { > clear_all_bpt(); > + get_output_lock(); > + printf("xmon: All breakpoints cleared\n"); > + release_output_lock(); > + } > + > return 0; > } > Good point, I will add this to the next version, thanks! > > Apart from that: > Reviewed-by: Daniel Axtens > > Regards, > Daniel >
Re: [PATCH v5 1/2] powerpc/xmon: Allow listing and clearing breakpoints in read-only mode
> On August 29, 2019 at 1:40 AM Daniel Axtens wrote: > > > Hi Chris, > > > Read-only mode should not prevent listing and clearing any active > > breakpoints. > > I tested this and it works for me: > > Tested-by: Daniel Axtens > > > + if (xmon_is_ro || !scanhex(&a)) { > > It took me a while to figure out what this line does: as I understand > it, the 'b' command can also be used to install a breakpoint (as well as > bi/bd). If we are in ro mode or if the input after 'b' doesn't scan as a > hex string, print the list of breakpoints instead. Anyway, I'm now > happy with it, so: > I can add a comment to that effect in the next version. That entire section of code could probably be cleaned up a bit - but that's for another patch. Thanks for testing! > > Reviewed-by: Daniel Axtens > > Regards, > Daniel > > > /* print all breakpoints */ > > printf(" typeaddress\n"); > > if (dabr.enabled) { > > -- > > 2.23.0
[PATCH v6 1/2] powerpc/xmon: Allow listing and clearing breakpoints in read-only mode
Read-only mode should not prevent listing and clearing any active breakpoints. Tested-by: Daniel Axtens Reviewed-by: Daniel Axtens Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index d0620d762a5a..ed94de614938 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1045,10 +1045,6 @@ cmds(struct pt_regs *excp) set_lpp_cmd(); break; case 'b': - if (xmon_is_ro) { - printf(xmon_ro_msg); - break; - } bpt_cmds(); break; case 'C': @@ -1317,11 +1313,16 @@ bpt_cmds(void) struct bpt *bp; cmd = inchar(); + switch (cmd) { #ifndef CONFIG_PPC_8xx static const char badaddr[] = "Only kernel addresses are permitted for breakpoints\n"; int mode; case 'd': /* bd - hardware data breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!ppc_breakpoint_available()) { printf("Hardware data breakpoint not supported on this cpu\n"); break; @@ -1349,6 +1350,10 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!cpu_has_feature(CPU_FTR_ARCH_207S)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); @@ -1407,7 +1412,8 @@ bpt_cmds(void) break; } termch = cmd; - if (!scanhex(&a)) { + + if (xmon_is_ro || !scanhex(&a)) { /* print all breakpoints */ printf(" typeaddress\n"); if (dabr.enabled) { -- 2.23.0
[PATCH v6 0/2] Restrict xmon when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and completely disable xmon when lockdown=confidentiality. Since this can occur dynamically, there may be pre-existing, active breakpoints in xmon when transitioning into read-only mode. These breakpoints will still trigger, so allow them to be listed and cleared using xmon. Changes since v5: - Do not spam print messages when attempting to enter xmon when lockdown=confidentiality Changes since v4: - Move lockdown state checks into xmon_core - Allow clearing of breakpoints in xmon read-only mode - Test simple scenarios (combinations of xmon and lockdown cmdline options, setting breakpoints and changing lockdown state, etc) in QEMU and on an actual POWER8 VM - Rebase onto security/next-lockdown b602614a81078bf29c82b2671bb96a63488f68d6 Changes since v3: - Allow active breakpoints to be shown/listed in read-only mode Changes since v2: - Rebased onto v36 of https://patchwork.kernel.org/cover/11049461/ (based on: f632a8170a6b667ee4e3f552087588f0fe13c4bb) - Do not clear existing breakpoints when transitioning from lockdown=none to lockdown=integrity - Remove line continuation and dangling quote (confuses checkpatch.pl) from the xmon command help/usage string Christopher M. Riedl (2): powerpc/xmon: Allow listing and clearing breakpoints in read-only mode powerpc/xmon: Restrict when kernel is locked down arch/powerpc/xmon/xmon.c | 108 +++ include/linux/security.h | 2 + security/lockdown/lockdown.c | 2 + 3 files changed, 87 insertions(+), 25 deletions(-) -- 2.23.0
[PATCH v6 2/2] powerpc/xmon: Restrict when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and prevent user entry into xmon when lockdown=confidentiality. Xmon checks the lockdown state on every attempted entry: (1) during early xmon'ing (2) when triggered via sysrq (3) when toggled via debugfs (4) when triggered via a previously enabled breakpoint The following lockdown state transitions are handled: (1) lockdown=none -> lockdown=integrity set xmon read-only mode (2) lockdown=none -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon (3) lockdown=integrity -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon Suggested-by: Andrew Donnellan Tested-by: Daniel Axtens Reviewed-by: Daniel Axtens Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 92 include/linux/security.h | 2 + security/lockdown/lockdown.c | 2 + 3 files changed, 76 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index ed94de614938..335718d0b777 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -187,6 +188,8 @@ static void dump_tlb_44x(void); static void dump_tlb_book3e(void); #endif +static void clear_all_bpt(void); + #ifdef CONFIG_PPC64 #define REG"%.16lx" #else @@ -283,10 +286,38 @@ Commands:\n\ " U show uptime information\n" " ? help\n" " # n limit output to n lines per page (for dp, dpa, dl)\n" -" zr reboot\n\ - zh halt\n" +" zr reboot\n" +" zh halt\n" ; +#ifdef CONFIG_SECURITY +static bool xmon_is_locked_down(void) +{ + static bool lockdown; + + if (!lockdown) { + lockdown = !!security_locked_down(LOCKDOWN_XMON_RW); + if (lockdown) { + printf("xmon: Disabled due to kernel lockdown\n"); + xmon_is_ro = true; + } + } + + if (!xmon_is_ro) { + xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR); + if (xmon_is_ro) + printf("xmon: Read-only due to kernel lockdown\n"); + } + + return lockdown; +} +#else /* CONFIG_SECURITY */ +static inline bool xmon_is_locked_down(void) +{ + return false; +} +#endif + static struct pt_regs *xmon_regs; static inline void sync(void) @@ -438,7 +469,10 @@ static bool wait_for_other_cpus(int ncpus) return false; } -#endif /* CONFIG_SMP */ +#else /* CONFIG_SMP */ +static inline void get_output_lock(void) {} +static inline void release_output_lock(void) {} +#endif static inline int unrecoverable_excp(struct pt_regs *regs) { @@ -455,6 +489,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi) int cmd = 0; struct bpt *bp; long recurse_jmp[JMP_BUF_LEN]; + bool locked_down; unsigned long offset; unsigned long flags; #ifdef CONFIG_SMP @@ -465,6 +500,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi) local_irq_save(flags); hard_irq_disable(); + locked_down = xmon_is_locked_down(); + tracing_enabled = tracing_is_on(); tracing_off(); @@ -516,7 +553,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi) if (!fromipi) { get_output_lock(); - excprint(regs); + if (!locked_down) + excprint(regs); if (bp) { printf("cpu 0x%x stopped at breakpoint 0x%tx (", cpu, BP_NUM(bp)); @@ -568,10 +606,14 @@ static int xmon_core(struct pt_regs *regs, int fromipi) } remove_bpts(); disable_surveillance(); - /* for breakpoint or single step, print the current instr. */ - if (bp || TRAP(regs) == 0xd00) - ppc_inst_dump(regs->nip, 1, 0); - printf("enter ? 
for help\n"); + + if (!locked_down) { + /* for breakpoint or single step, print curr insn */ + if (bp || TRAP(regs) == 0xd00) + ppc_inst_dump(regs->nip, 1, 0); + printf("enter ? for help\n"); + } + mb(); xmon_gate = 1; barrier(); @@ -595,8 +637,9 @@ static int xmon_core(struct pt_regs *regs, int fromipi) spin_cpu_relax(); touch_nmi_watchdog(); } else { - cmd = cmds(regs); -
[PATCH v7 1/2] powerpc/xmon: Allow listing and clearing breakpoints in read-only mode
Read-only mode should not prevent listing and clearing any active breakpoints. Tested-by: Daniel Axtens Reviewed-by: Daniel Axtens Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index d0620d762a5a..ed94de614938 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1045,10 +1045,6 @@ cmds(struct pt_regs *excp) set_lpp_cmd(); break; case 'b': - if (xmon_is_ro) { - printf(xmon_ro_msg); - break; - } bpt_cmds(); break; case 'C': @@ -1317,11 +1313,16 @@ bpt_cmds(void) struct bpt *bp; cmd = inchar(); + switch (cmd) { #ifndef CONFIG_PPC_8xx static const char badaddr[] = "Only kernel addresses are permitted for breakpoints\n"; int mode; case 'd': /* bd - hardware data breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!ppc_breakpoint_available()) { printf("Hardware data breakpoint not supported on this cpu\n"); break; @@ -1349,6 +1350,10 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ + if (xmon_is_ro) { + printf(xmon_ro_msg); + break; + } if (!cpu_has_feature(CPU_FTR_ARCH_207S)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); @@ -1407,7 +1412,8 @@ bpt_cmds(void) break; } termch = cmd; - if (!scanhex(&a)) { + + if (xmon_is_ro || !scanhex(&a)) { /* print all breakpoints */ printf(" typeaddress\n"); if (dabr.enabled) { -- 2.23.0
[PATCH v7 2/2] powerpc/xmon: Restrict when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and prevent user entry into xmon when lockdown=confidentiality. Xmon checks the lockdown state on every attempted entry: (1) during early xmon'ing (2) when triggered via sysrq (3) when toggled via debugfs (4) when triggered via a previously enabled breakpoint The following lockdown state transitions are handled: (1) lockdown=none -> lockdown=integrity set xmon read-only mode (2) lockdown=none -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon (3) lockdown=integrity -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon Suggested-by: Andrew Donnellan Signed-off-by: Christopher M. Riedl --- arch/powerpc/xmon/xmon.c | 103 --- include/linux/security.h | 2 + security/lockdown/lockdown.c | 2 + 3 files changed, 86 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index ed94de614938..6eaf8ab532f6 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -187,6 +188,8 @@ static void dump_tlb_44x(void); static void dump_tlb_book3e(void); #endif +static void clear_all_bpt(void); + #ifdef CONFIG_PPC64 #define REG"%.16lx" #else @@ -283,10 +286,38 @@ Commands:\n\ " U show uptime information\n" " ? help\n" " # n limit output to n lines per page (for dp, dpa, dl)\n" -" zr reboot\n\ - zh halt\n" +" zr reboot\n" +" zh halt\n" ; +#ifdef CONFIG_SECURITY +static bool xmon_is_locked_down(void) +{ + static bool lockdown; + + if (!lockdown) { + lockdown = !!security_locked_down(LOCKDOWN_XMON_RW); + if (lockdown) { + printf("xmon: Disabled due to kernel lockdown\n"); + xmon_is_ro = true; + } + } + + if (!xmon_is_ro) { + xmon_is_ro = !!security_locked_down(LOCKDOWN_XMON_WR); + if (xmon_is_ro) + printf("xmon: Read-only due to kernel lockdown\n"); + } + + return lockdown; +} +#else /* CONFIG_SECURITY */ +static inline bool xmon_is_locked_down(void) +{ + return false; +} +#endif + static struct pt_regs *xmon_regs; static inline void sync(void) @@ -438,7 +469,10 @@ static bool wait_for_other_cpus(int ncpus) return false; } -#endif /* CONFIG_SMP */ +#else /* CONFIG_SMP */ +static inline void get_output_lock(void) {} +static inline void release_output_lock(void) {} +#endif static inline int unrecoverable_excp(struct pt_regs *regs) { @@ -455,6 +489,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi) int cmd = 0; struct bpt *bp; long recurse_jmp[JMP_BUF_LEN]; + bool locked_down; unsigned long offset; unsigned long flags; #ifdef CONFIG_SMP @@ -465,6 +500,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi) local_irq_save(flags); hard_irq_disable(); + locked_down = xmon_is_locked_down(); + tracing_enabled = tracing_is_on(); tracing_off(); @@ -516,7 +553,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi) if (!fromipi) { get_output_lock(); - excprint(regs); + if (!locked_down) + excprint(regs); if (bp) { printf("cpu 0x%x stopped at breakpoint 0x%tx (", cpu, BP_NUM(bp)); @@ -568,10 +606,14 @@ static int xmon_core(struct pt_regs *regs, int fromipi) } remove_bpts(); disable_surveillance(); - /* for breakpoint or single step, print the current instr. */ - if (bp || TRAP(regs) == 0xd00) - ppc_inst_dump(regs->nip, 1, 0); - printf("enter ? 
for help\n"); + + if (!locked_down) { + /* for breakpoint or single step, print curr insn */ + if (bp || TRAP(regs) == 0xd00) + ppc_inst_dump(regs->nip, 1, 0); + printf("enter ? for help\n"); + } + mb(); xmon_gate = 1; barrier(); @@ -595,8 +637,9 @@ static int xmon_core(struct pt_regs *regs, int fromipi) spin_cpu_relax(); touch_nmi_watchdog(); } else { - cmd = cmds(regs); - if (cmd != 0) { +
[PATCH v7 0/2] Restrict xmon when kernel is locked down
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and completely disable xmon when lockdown=confidentiality. Since this can occur dynamically, there may be pre-existing, active breakpoints in xmon when transitioning into read-only mode. These breakpoints will still trigger, so allow them to be listed and cleared using xmon. Changes since v6: - Add lockdown check in sysrq-trigger to prevent entry into xmon_core - Add lockdown check during init xmon setup for the case when booting with compile-time or cmdline lockdown=confidentialiaty Changes since v5: - Do not spam print messages when attempting to enter xmon when lockdown=confidentiality Changes since v4: - Move lockdown state checks into xmon_core - Allow clearing of breakpoints in xmon read-only mode - Test simple scenarios (combinations of xmon and lockdown cmdline options, setting breakpoints and changing lockdown state, etc) in QEMU and on an actual POWER8 VM - Rebase onto security/next-lockdown b602614a81078bf29c82b2671bb96a63488f68d6 Changes since v3: - Allow active breakpoints to be shown/listed in read-only mode Changes since v2: - Rebased onto v36 of https://patchwork.kernel.org/cover/11049461/ (based on: f632a8170a6b667ee4e3f552087588f0fe13c4bb) - Do not clear existing breakpoints when transitioning from lockdown=none to lockdown=integrity - Remove line continuation and dangling quote (confuses checkpatch.pl) from the xmon command help/usage string Christopher M. Riedl (2): powerpc/xmon: Allow listing and clearing breakpoints in read-only mode powerpc/xmon: Restrict when kernel is locked down arch/powerpc/xmon/xmon.c | 119 +++ include/linux/security.h | 2 + security/lockdown/lockdown.c | 2 + 3 files changed, 97 insertions(+), 26 deletions(-) -- 2.23.0
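The sysrq check added in v7 presumably amounts to bailing out of the sysrq handler before xmon_core() is ever reached. A rough sketch, reusing xmon_is_locked_down() and clear_all_bpt() from patch 2/2 around the existing handler body (not necessarily the exact hunk):

	static void sysrq_handle_xmon(int key)
	{
		if (xmon_is_locked_down()) {
			clear_all_bpt();
			xmon_init(0);
			return;
		}

		/* ensure xmon is enabled */
		xmon_init(1);
		debugger(get_irq_regs());
		if (!xmon_on)
			xmon_init(0);
	}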
Re: [PATCH v8 5/6] powerpc/code-patching: Use temporary mm for Radix MMU
On Mon Oct 24, 2022 at 12:17 AM CDT, Benjamin Gray wrote: > On Mon, 2022-10-24 at 14:45 +1100, Russell Currey wrote: > > On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote: > > > From: "Christopher M. Riedl" > > > -%<-- > > > > > > --- > > > > Is the section following the --- your addendum to Chris' patch? That > > cuts it off from git, including your signoff. It'd be better to have > > it together as one commit message and note the bits you contributed > > below the --- after your signoff. > > > > Commits where you're modifying someone else's previous work should > > include their signoff above yours, as well. > > Addendum to his wording, to break it off from the "From..." section > (which is me splicing together his comments from previous patches with > some minor changes to account for the patch changes). I found out > earlier today that Git will treat it as a comment :( > > I'll add the signed off by back, I wasn't sure whether to leave it > there after making changes (same in patch 2). > This commit has lots of my words so should probably keep the sign-off - if only to guarantee that blame is properly directed at me for any nonsense therein ^^. Patch 2 probably doesn't need my sign-off any more - iirc, I actually defended the BUG_ON()s (which are WARN_ON()s now) at some point.
Re: [PATCH v3 4/6] powerpc: Introduce temporary mm
On Thu Aug 27, 2020 at 11:15 AM CDT, Jann Horn wrote: > On Thu, Aug 27, 2020 at 7:24 AM Christopher M. Riedl > wrote: > > x86 supports the notion of a temporary mm which restricts access to > > temporary PTEs to a single CPU. A temporary mm is useful for situations > > where a CPU needs to perform sensitive operations (such as patching a > > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing > > said mappings to other CPUs. A side benefit is that other CPU TLBs do > > not need to be flushed when the temporary mm is torn down. > > > > Mappings in the temporary mm can be set in the userspace portion of the > > address-space. > [...] > > diff --git a/arch/powerpc/lib/code-patching.c > > b/arch/powerpc/lib/code-patching.c > [...] > > @@ -44,6 +45,70 @@ int raw_patch_instruction(struct ppc_inst *addr, struct > > ppc_inst instr) > > } > > > > #ifdef CONFIG_STRICT_KERNEL_RWX > > + > > +struct temp_mm { > > + struct mm_struct *temp; > > + struct mm_struct *prev; > > + bool is_kernel_thread; > > + struct arch_hw_breakpoint brk[HBP_NUM_MAX]; > > +}; > > + > > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct > > *mm) > > +{ > > + temp_mm->temp = mm; > > + temp_mm->prev = NULL; > > + temp_mm->is_kernel_thread = false; > > + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); > > +} > > + > > +static inline void use_temporary_mm(struct temp_mm *temp_mm) > > +{ > > + lockdep_assert_irqs_disabled(); > > + > > + temp_mm->is_kernel_thread = current->mm == NULL; > > (That's a somewhat misleading variable name - kernel threads can have > a non-NULL ->mm, too.) > Oh I didn't know that, in that case yes this is not a good name. I am considering some changes (based on your comments about current->mm below) which would make this variable superfluous. > > + if (temp_mm->is_kernel_thread) > > + temp_mm->prev = current->active_mm; > > + else > > + temp_mm->prev = current->mm; > > Why the branch? Shouldn't current->active_mm work in both cases? > > Yes you are correct. > > + /* > > +* Hash requires a non-NULL current->mm to allocate a userspace > > address > > +* when handling a page fault. Does not appear to hurt in Radix > > either. > > +*/ > > + current->mm = temp_mm->temp; > > This looks dangerous to me. There are various places that attempt to > find all userspace tasks that use a given mm by iterating through all > tasks on the system and comparing each task's ->mm pointer to > current's. Things like current_is_single_threaded() as part of various > security checks, mm_update_next_owner(), zap_threads(), and so on. So > if this is reachable from userspace task context (which I think it > is?), I don't think we're allowed to switch out the ->mm pointer here. > > Thanks for pointing this out! I took a step back and looked at this again in more detail. The only reason for reassigning the ->mm pointer is that when patching we need to hash the page and allocate an SLB entry w/ the hash MMU. That codepath includes a check to ensure that ->mm is not NULL. Overwriting ->mm temporarily and restoring it is pretty crappy in retrospect. I _think_ a better approach is to just call the hashing and allocate SLB functions from `map_patch` directly - this both removes the need to overwrite ->mm (since the functions take an mm parameter) and it avoids taking two exceptions when doing the actual patching. This works fine on Power9 and a Power8 at least but needs some testing on PPC32 before I can send a v4. 
> > + switch_mm_irqs_off(NULL, temp_mm->temp, current); > > switch_mm_irqs_off() calls switch_mmu_context(), which in the nohash > implementation increments next->context.active and decrements > prev->context.active if prev is non-NULL, right? So this would > increase temp_mm->temp->context.active... > > > + if (ppc_breakpoint_available()) { > > + struct arch_hw_breakpoint null_brk = {0}; > > + int i = 0; > > + > > + for (; i < nr_wp_slots(); ++i) { > > + __get_breakpoint(i, &temp_mm->brk[i]); > > + if (temp_mm->brk[i].type != 0) > > + __set_breakpoint(i, &null_brk); > > + } > > + } > > +} > > + > > +static in
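To make the direction discussed above concrete, a minimal sketch of use_temporary_mm() with the kernel-thread branch and the current->mm override removed; the switch_mm_irqs_off()/context.active question raised by Jann is deliberately left untouched here:

	static inline void use_temporary_mm(struct temp_mm *temp_mm)
	{
		lockdep_assert_irqs_disabled();

		/* current->active_mm is valid for kernel and userspace threads alike */
		temp_mm->prev = current->active_mm;
		switch_mm_irqs_off(NULL, temp_mm->temp, current);

		/* breakpoint save/clear loop unchanged from the patch above */
	}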
Re: [PATCH v2 25/25] powerpc/signal32: Transform save_user_regs() and save_tm_user_regs() in 'unsafe' version
On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote: > Change those two functions to be used within a user access block. > > For that, change save_general_regs() to and unsafe_save_general_regs(), > then replace all user accesses by unsafe_ versions. > > This series leads to a reduction from 2.55s to 1.73s of > the system CPU time with the following microbench app > on an mpc832x with KUAP (approx 32%) > > Without KUAP, the difference is in the noise. > > void sigusr1(int sig) { } > > int main(int argc, char **argv) > { > int i = 10; > > signal(SIGUSR1, sigusr1); > for (;i--;) > raise(SIGUSR1); > exit(0); > } > > An additional 0.10s reduction is achieved by removing > CONFIG_PPC_FPU, as the mpc832x has no FPU. > > A bit less spectacular on an 8xx as KUAP is less heavy, prior to > the series (with KUAP) it ran in 8.10 ms. Once applies the removal > of FPU regs handling, we get 7.05s. With the full series, we get 6.9s. > If artificially re-activating FPU regs handling with the full series, > we get 7.6s. > > So for the 8xx, the removal of the FPU regs copy is what makes the > difference, but the rework of handle_signal also have a benefit. > > Same as above, without KUAP the difference is in the noise. > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/signal_32.c | 224 > 1 file changed, 111 insertions(+), 113 deletions(-) > > diff --git a/arch/powerpc/kernel/signal_32.c > b/arch/powerpc/kernel/signal_32.c > index 86539a4e0514..f795fe0240a1 100644 > --- a/arch/powerpc/kernel/signal_32.c > +++ b/arch/powerpc/kernel/signal_32.c > @@ -93,8 +93,8 @@ static inline int get_sigset_t(sigset_t *set, > #define to_user_ptr(p) ptr_to_compat(p) > #define from_user_ptr(p) compat_ptr(p) > > -static inline int save_general_regs(struct pt_regs *regs, > - struct mcontext __user *frame) > +static __always_inline int > +save_general_regs_unsafe(struct pt_regs *regs, struct mcontext __user > *frame) > { > elf_greg_t64 *gregs = (elf_greg_t64 *)regs; > int val, i; > @@ -108,10 +108,12 @@ static inline int save_general_regs(struct pt_regs > *regs, > else > val = gregs[i]; > > - if (__put_user(val, &frame->mc_gregs[i])) > - return -EFAULT; > + unsafe_put_user(val, &frame->mc_gregs[i], failed); > } > return 0; > + > +failed: > + return 1; > } > > static inline int restore_general_regs(struct pt_regs *regs, > @@ -148,11 +150,15 @@ static inline int get_sigset_t(sigset_t *set, > const sigset_t __user *uset) > #define to_user_ptr(p) ((unsigned long)(p)) > #define from_user_ptr(p) ((void __user *)(p)) > > -static inline int save_general_regs(struct pt_regs *regs, > - struct mcontext __user *frame) > +static __always_inline int > +save_general_regs_unsafe(struct pt_regs *regs, struct mcontext __user > *frame) > { > WARN_ON(!FULL_REGS(regs)); > - return __copy_to_user(&frame->mc_gregs, regs, GP_REGS_SIZE); > + unsafe_copy_to_user(&frame->mc_gregs, regs, GP_REGS_SIZE, failed); > + return 0; > + > +failed: > + return 1; > } > > static inline int restore_general_regs(struct pt_regs *regs, > @@ -170,6 +176,11 @@ static inline int restore_general_regs(struct > pt_regs *regs, > } > #endif > > +#define unsafe_save_general_regs(regs, frame, label) do { \ > + if (save_general_regs_unsafe(regs, frame)) \ Minor nitpick (sorry); this naming seems a bit strange to me, in x86 it is "__unsafe_" as a prefix instead of "_unsafe" as a suffix. That sounds a bit better to me, what do you think? Unless there is some convention I am not aware of here apart from "unsafe_" using a goto label for errors. 
> + goto label; \ > +} while (0) > + > /* > * When we have signals to deliver, we set up on the > * user stack, going down from the original stack pointer: > @@ -249,21 +260,19 @@ static void prepare_save_user_regs(int > ctx_has_vsx_region) > #endif > } > > -static int save_user_regs(struct pt_regs *regs, struct mcontext __user > *frame, > - struct mcontext __user *tm_frame, int ctx_has_vsx_region) > +static int save_user_regs_unsafe(struct pt_regs *regs, struct mcontext > __user *frame, > + struct mcontext __user *tm_frame, int ctx_has_vsx_region) > { > unsigned long msr = regs->msr; > > /* save general registers */ > - if (save_general_regs(regs, frame)) > - return 1; > + unsafe_save_general_regs(regs, frame, failed); > > #ifdef CONFIG_ALTIVEC > /* save altivec registers */ > if (current->thread.used_vr) { > - if (__copy_to_user(&frame->mc_vregs, ¤t->thread.vr_state, > - ELF_NVRREG * sizeof(vector128))) > - return 1; > + unsafe_copy_to_user(&frame->mc_vregs, ¤t->thread.vr_state, > + ELF_NVRREG * sizeof(vector128), failed); > /* set MSR_VEC in the saved MSR value to indicate that > frame->mc_vregs contains valid data */ > msr |= MSR_VEC; > @@ -276,11 +285,10 @@ static int save_user_regs(struct pt_regs *regs, > struct mcontext __user *frame, > * most significant bits of that same vector. --BenH > * Note that the current VRSAVE value is in the SPR at this point. > */ > - if (__put_user(c
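Spelled out, the x86-flavoured naming suggested in the nitpick would look something like this (illustrative rename only, keeping the same open-coded goto wrapper as in the patch):

	static __always_inline int
	__unsafe_save_general_regs(struct pt_regs *regs, struct mcontext __user *frame)
	{
		/* body unchanged */
	}

	#define unsafe_save_general_regs(regs, frame, label) do {	\
		if (__unsafe_save_general_regs(regs, frame))		\
			goto label;					\
	} while (0)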
Re: [PATCH v2 23/25] powerpc/signal: Create 'unsafe' versions of copy_[ck][fpr/vsx]_to_user()
On Tue Aug 18, 2020 at 12:19 PM CDT, Christophe Leroy wrote: > For the non VSX version, that's trivial. Just use unsafe_copy_to_user() > instead of __copy_to_user(). > > For the VSX version, remove the intermediate step through a buffer and > use unsafe_put_user() directly. This generates a far smaller code which > is acceptable to inline, see below: > > Standard VSX version: > > <.copy_fpr_to_user>: > 0: 7c 08 02 a6 mflr r0 > 4: fb e1 ff f8 std r31,-8(r1) > 8: 39 00 00 20 li r8,32 > c: 39 24 0b 80 addi r9,r4,2944 > 10: 7d 09 03 a6 mtctr r8 > 14: f8 01 00 10 std r0,16(r1) > 18: f8 21 fe 71 stdu r1,-400(r1) > 1c: 39 41 00 68 addi r10,r1,104 > 20: e9 09 00 00 ld r8,0(r9) > 24: 39 4a 00 08 addi r10,r10,8 > 28: 39 29 00 10 addi r9,r9,16 > 2c: f9 0a 00 00 std r8,0(r10) > 30: 42 00 ff f0 bdnz 20 <.copy_fpr_to_user+0x20> > 34: e9 24 0d 80 ld r9,3456(r4) > 38: 3d 42 00 00 addis r10,r2,0 > 3a: R_PPC64_TOC16_HA .toc > 3c: eb ea 00 00 ld r31,0(r10) > 3e: R_PPC64_TOC16_LO_DS .toc > 40: f9 21 01 70 std r9,368(r1) > 44: e9 3f 00 00 ld r9,0(r31) > 48: 81 29 00 20 lwz r9,32(r9) > 4c: 2f 89 00 00 cmpwi cr7,r9,0 > 50: 40 9c 00 18 bge cr7,68 <.copy_fpr_to_user+0x68> > 54: 4c 00 01 2c isync > 58: 3d 20 40 00 lis r9,16384 > 5c: 79 29 07 c6 rldicr r9,r9,32,31 > 60: 7d 3d 03 a6 mtspr 29,r9 > 64: 4c 00 01 2c isync > 68: 38 a0 01 08 li r5,264 > 6c: 38 81 00 70 addi r4,r1,112 > 70: 48 00 00 01 bl 70 <.copy_fpr_to_user+0x70> > 70: R_PPC64_REL24 .__copy_tofrom_user > 74: 60 00 00 00 nop > 78: e9 3f 00 00 ld r9,0(r31) > 7c: 81 29 00 20 lwz r9,32(r9) > 80: 2f 89 00 00 cmpwi cr7,r9,0 > 84: 40 9c 00 18 bge cr7,9c <.copy_fpr_to_user+0x9c> > 88: 4c 00 01 2c isync > 8c: 39 20 ff ff li r9,-1 > 90: 79 29 00 44 rldicr r9,r9,0,1 > 94: 7d 3d 03 a6 mtspr 29,r9 > 98: 4c 00 01 2c isync > 9c: 38 21 01 90 addi r1,r1,400 > a0: e8 01 00 10 ld r0,16(r1) > a4: eb e1 ff f8 ld r31,-8(r1) > a8: 7c 08 03 a6 mtlr r0 > ac: 4e 80 00 20 blr > > 'unsafe' simulated VSX version (The ... are only nops) using > unsafe_copy_fpr_to_user() macro: > > unsigned long copy_fpr_to_user(void __user *to, > struct task_struct *task) > { > unsafe_copy_fpr_to_user(to, task, failed); > return 0; > failed: > return 1; > } > > <.copy_fpr_to_user>: > 0: 39 00 00 20 li r8,32 > 4: 39 44 0b 80 addi r10,r4,2944 > 8: 7d 09 03 a6 mtctr r8 > c: 7c 69 1b 78 mr r9,r3 > ... > 20: e9 0a 00 00 ld r8,0(r10) > 24: f9 09 00 00 std r8,0(r9) > 28: 39 4a 00 10 addi r10,r10,16 > 2c: 39 29 00 08 addi r9,r9,8 > 30: 42 00 ff f0 bdnz 20 <.copy_fpr_to_user+0x20> > 34: e9 24 0d 80 ld r9,3456(r4) > 38: f9 23 01 00 std r9,256(r3) > 3c: 38 60 00 00 li r3,0 > 40: 4e 80 00 20 blr > ... 
> 50: 38 60 00 01 li r3,1 > 54: 4e 80 00 20 blr > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/signal.h | 53 > 1 file changed, 53 insertions(+) > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > index f610cfafa478..2559a681536e 100644 > --- a/arch/powerpc/kernel/signal.h > +++ b/arch/powerpc/kernel/signal.h > @@ -32,7 +32,54 @@ unsigned long copy_fpr_to_user(void __user *to, > struct task_struct *task); > unsigned long copy_ckfpr_to_user(void __user *to, struct task_struct > *task); > unsigned long copy_fpr_from_user(struct task_struct *task, void __user > *from); > unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user > *from); > + > +#define unsafe_copy_fpr_to_user(to, task, label) do { \ > + struct task_struct *__t = task; \ > + u64 __user *buf = (u64 __user *)to; \ > + int i; \ > + \ > + for (i = 0; i < ELF_NFPREG - 1 ; i++) \ > + unsafe_put_user(__t->thread.TS_FPR(i), &buf[i], label); \ > + unsafe_put_user(__t->thread.fp_state.fpscr, &buf[i], label); \ > +} while (0) > + I've been working on the PPC64 side of this "unsafe" rework using this series as a basis. One question here - I don't really understand what the benefit of re-implementing this logic in macros (similarly for the other copy_* functions below) is? I am considering a "__unsafe_copy_*" implementation in signal.c for each (just the original implementation w/ using the "unsafe_" variants of the uaccess stuff) which gets called by the "safe" functions w/ the appropriate "user_*_access_begin/user_*_access_end". Something like (pseudo-ish code): /* signal.c */ unsigned long __unsafe_copy_fpr_to_user(...) { ... unsafe_copy_to_user(..., bad); return 0; bad: return 1; /* -EFAULT? */ } unsigned long copy_fpr_to_user(...) { unsigned long err; if (!user_write_access_begin(...)) return 1; /* -EFAULT? */ err = __unsafe_copy_fpr_to_user(...); user_write_access_end(); return err; } /* signal.h */ unsigned long __unsafe_copy_fpr_to_user(...); #define unsafe_copy_fpr_to_user(..., label) \
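Fleshing out that pseudo-code a little (hypothetical code, just to illustrate the proposed split between the bare "unsafe" helper and the self-contained "safe" wrapper; the eventual patches may differ):

	/* signal.c */
	unsigned long __unsafe_copy_fpr_to_user(void __user *to, struct task_struct *task)
	{
		u64 __user *buf = (u64 __user *)to;
		int i;

		for (i = 0; i < ELF_NFPREG - 1; i++)
			unsafe_put_user(task->thread.TS_FPR(i), &buf[i], failed);
		unsafe_put_user(task->thread.fp_state.fpscr, &buf[i], failed);
		return 0;
	failed:
		return 1;	/* -EFAULT? */
	}

	unsigned long copy_fpr_to_user(void __user *to, struct task_struct *task)
	{
		unsigned long err;

		if (!user_write_access_begin(to, ELF_NFPREG * sizeof(double)))
			return 1;	/* -EFAULT? */
		err = __unsafe_copy_fpr_to_user(to, task);
		user_write_access_end();

		return err;
	}

	/* signal.h */
	unsigned long __unsafe_copy_fpr_to_user(void __user *to, struct task_struct *task);

	#define unsafe_copy_fpr_to_user(to, task, label) do {	\
		if (__unsafe_copy_fpr_to_user(to, task))	\
			goto label;				\
	} while (0)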
[PATCH 0/8] Improve signal performance on PPC64 with KUAP
As reported by Anton, there is a large penalty to signal handling performance on radix systems using KUAP. The signal handling code performs many user access operations, each of which needs to switch the KUAP permissions bit to open and then close user access. This involves a costly 'mtspr' operation [0].

There is existing work done on x86 and by Christophe Leroy for PPC32 to instead open up user access in "blocks" using user_*_access_{begin,end}. We can do the same in PPC64 to bring performance back up on KUAP-enabled radix systems.

This series applies on top of Christophe Leroy's work for PPC32 [1] (I'm sure patchwork won't be too happy about that).

The first two patches add some needed 'unsafe' versions of copy-from functions. While these do not make use of asm-goto they still allow for avoiding the repeated uaccess switches.

The third patch adds 'notrace' to any functions expected to be called in a uaccess block context. Normally functions called in such a context should be inlined, but this is not feasible everywhere. Marking them 'notrace' should provide _some_ protection against leaving the user access window open.

The next three patches rewrite some of the signal64 helper functions to be 'unsafe'. Finally, the last two patches update the main signal handling functions to make use of the new 'unsafe' helpers and eliminate some additional uaccess switching.

I used the will-it-scale signal1 benchmark to measure and compare performance [2]. The below results are from a P9 Blackbird system. Note that currently hash does not support KUAP and is therefore used as the "baseline" comparison. Bigger numbers are better:

signal1_threads -t1 -s10

|                 | hash   | radix  |
| ---             | ---    | ---    |
| linuxppc/next   | 289014 | 158408 |
| unsafe-signal64 | 298506 | 253053 |

[0]: https://github.com/linuxppc/issues/issues/277
[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278
[2]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/signal1.c

Christopher M. Riedl (5):
  powerpc/uaccess: Add unsafe_copy_from_user
  powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
  powerpc: Mark functions called inside uaccess blocks w/ 'notrace'
  powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
  powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()

Daniel Axtens (3):
  powerpc/signal64: Replace setup_trampoline() w/ unsafe_setup_trampoline()
  powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
  powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches

 arch/powerpc/include/asm/uaccess.h |  28 ++--
 arch/powerpc/kernel/process.c      |  20 +--
 arch/powerpc/kernel/signal.h       |  33 +
 arch/powerpc/kernel/signal_64.c    | 216 +
 arch/powerpc/mm/mem.c              |   4 +-
 5 files changed, 194 insertions(+), 107 deletions(-)

--
2.28.0
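The basic transformation applied throughout the series looks like this (an illustrative fragment with placeholder field names, not lifted from any single patch):

	/* before: every access flips the KUAP bit via mtspr */
	err |= __put_user(val, &frame->field);
	err |= __copy_to_user(&frame->buf, src, len);

	/* after: one open/close around a block of 'unsafe' accesses */
	if (!user_write_access_begin(frame, sizeof(*frame)))
		return -EFAULT;
	unsafe_put_user(val, &frame->field, failed);
	unsafe_copy_to_user(&frame->buf, src, len, failed);
	user_write_access_end();
	return 0;

failed:
	user_write_access_end();
	return -EFAULT;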
[PATCH 6/8] powerpc/signal64: Replace setup_trampoline() w/ unsafe_setup_trampoline()
From: Daniel Axtens Previously setup_trampoline() performed a costly KUAP switch on every uaccess operation. These repeated uaccess switches cause a significant drop in signal handling performance. Rewrite setup_trampoline() to assume that a userspace write access window is open. Replace all uaccess functions with their 'unsafe' versions to avoid the repeated uaccess switches. Signed-off-by: Daniel Axtens Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 32 +++- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index bd92064e5576..6d4f7a5c4fbf 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -600,30 +600,33 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, /* * Setup the trampoline code on the stack */ -static long setup_trampoline(unsigned int syscall, unsigned int __user *tramp) +#define unsafe_setup_trampoline(syscall, tramp, e) \ + unsafe_op_wrap(__unsafe_setup_trampoline(syscall, tramp), e) +static long notrace __unsafe_setup_trampoline(unsigned int syscall, + unsigned int __user *tramp) { int i; - long err = 0; /* bctrl # call the handler */ - err |= __put_user(PPC_INST_BCTRL, &tramp[0]); + unsafe_put_user(PPC_INST_BCTRL, &tramp[0], err); /* addi r1, r1, __SIGNAL_FRAMESIZE # Pop the dummy stackframe */ - err |= __put_user(PPC_INST_ADDI | __PPC_RT(R1) | __PPC_RA(R1) | - (__SIGNAL_FRAMESIZE & 0x), &tramp[1]); + unsafe_put_user(PPC_INST_ADDI | __PPC_RT(R1) | __PPC_RA(R1) | + (__SIGNAL_FRAMESIZE & 0x), &tramp[1], err); /* li r0, __NR_[rt_]sigreturn| */ - err |= __put_user(PPC_INST_ADDI | (syscall & 0x), &tramp[2]); + unsafe_put_user(PPC_INST_ADDI | (syscall & 0x), &tramp[2], err); /* sc */ - err |= __put_user(PPC_INST_SC, &tramp[3]); + unsafe_put_user(PPC_INST_SC, &tramp[3], err); /* Minimal traceback info */ for (i=TRAMP_TRACEBACK; i < TRAMP_SIZE ;i++) - err |= __put_user(0, &tramp[i]); + unsafe_put_user(0, &tramp[i], err); - if (!err) - flush_icache_range((unsigned long) &tramp[0], - (unsigned long) &tramp[TRAMP_SIZE]); + flush_icache_range((unsigned long)&tramp[0], + (unsigned long)&tramp[TRAMP_SIZE]); - return err; + return 0; +err: + return 1; } /* @@ -888,7 +891,10 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, if (vdso64_rt_sigtramp && tsk->mm->context.vdso_base) { regs->nip = tsk->mm->context.vdso_base + vdso64_rt_sigtramp; } else { - err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); + if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) + return -EFAULT; + err |= __unsafe_setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); + user_write_access_end(); if (err) goto badframe; regs->nip = (unsigned long) &frame->tramp[0]; -- 2.28.0
[PATCH 5/8] powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()
Previously restore_sigcontext() performed a costly KUAP switch on every uaccess operation. These repeated uaccess switches cause a significant drop in signal handling performance. Rewrite restore_sigcontext() to assume that a userspace read access window is open. Replace all uaccess functions with their 'unsafe' versions which avoid the repeated uaccess switches. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 68 - 1 file changed, 41 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 26934ceeb925..bd92064e5576 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -318,14 +318,14 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc, /* * Restore the sigcontext from the signal frame. */ - -static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig, - struct sigcontext __user *sc) +#define unsafe_restore_sigcontext(tsk, set, sig, sc, e) \ + unsafe_op_wrap(__unsafe_restore_sigcontext(tsk, set, sig, sc), e) +static long notrace __unsafe_restore_sigcontext(struct task_struct *tsk, sigset_t *set, + int sig, struct sigcontext __user *sc) { #ifdef CONFIG_ALTIVEC elf_vrreg_t __user *v_regs; #endif - unsigned long err = 0; unsigned long save_r13 = 0; unsigned long msr; struct pt_regs *regs = tsk->thread.regs; @@ -340,27 +340,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig, save_r13 = regs->gpr[13]; /* copy the GPRs */ - err |= __copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr)); - err |= __get_user(regs->nip, &sc->gp_regs[PT_NIP]); + unsafe_copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr), + efault_out); + unsafe_get_user(regs->nip, &sc->gp_regs[PT_NIP], efault_out); /* get MSR separately, transfer the LE bit if doing signal return */ - err |= __get_user(msr, &sc->gp_regs[PT_MSR]); + unsafe_get_user(msr, &sc->gp_regs[PT_MSR], efault_out); if (sig) regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); - err |= __get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3]); - err |= __get_user(regs->ctr, &sc->gp_regs[PT_CTR]); - err |= __get_user(regs->link, &sc->gp_regs[PT_LNK]); - err |= __get_user(regs->xer, &sc->gp_regs[PT_XER]); - err |= __get_user(regs->ccr, &sc->gp_regs[PT_CCR]); + unsafe_get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3], efault_out); + unsafe_get_user(regs->ctr, &sc->gp_regs[PT_CTR], efault_out); + unsafe_get_user(regs->link, &sc->gp_regs[PT_LNK], efault_out); + unsafe_get_user(regs->xer, &sc->gp_regs[PT_XER], efault_out); + unsafe_get_user(regs->ccr, &sc->gp_regs[PT_CCR], efault_out); /* Don't allow userspace to set SOFTE */ set_trap_norestart(regs); - err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]); - err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]); - err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]); + unsafe_get_user(regs->dar, &sc->gp_regs[PT_DAR], efault_out); + unsafe_get_user(regs->dsisr, &sc->gp_regs[PT_DSISR], efault_out); + unsafe_get_user(regs->result, &sc->gp_regs[PT_RESULT], efault_out); if (!sig) regs->gpr[13] = save_r13; if (set != NULL) - err |= __get_user(set->sig[0], &sc->oldmask); + unsafe_get_user(set->sig[0], &sc->oldmask, efault_out); /* * Force reload of FP/VEC. 
@@ -370,29 +371,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig, regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX); #ifdef CONFIG_ALTIVEC - err |= __get_user(v_regs, &sc->v_regs); - if (err) - return err; + unsafe_get_user(v_regs, &sc->v_regs, efault_out); if (v_regs && !access_ok(v_regs, 34 * sizeof(vector128))) return -EFAULT; /* Copy 33 vec registers (vr0..31 and vscr) from the stack */ if (v_regs != NULL && (msr & MSR_VEC) != 0) { - err |= __copy_from_user(&tsk->thread.vr_state, v_regs, - 33 * sizeof(vector128)); + unsafe_copy_from_user(&tsk->thread.vr_state, v_regs, + 33 * sizeof(vector128), efault_out); tsk->thread.used_vr = true; } else if (
[PATCH 4/8] powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
Previously setup_sigcontext() performed a costly KUAP switch on every uaccess operation. These repeated uaccess switches cause a significant drop in signal handling performance. Rewrite setup_sigcontext() to assume that a userspace write access window is open. Replace all uaccess functions with their 'unsafe' versions which avoid the repeated uaccess switches. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 71 - 1 file changed, 44 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 7df088b9ad0f..26934ceeb925 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -83,9 +83,13 @@ static elf_vrreg_t __user *sigcontext_vmx_regs(struct sigcontext __user *sc) * Set up the sigcontext for the signal frame. */ -static long setup_sigcontext(struct sigcontext __user *sc, - struct task_struct *tsk, int signr, sigset_t *set, - unsigned long handler, int ctx_has_vsx_region) +#define unsafe_setup_sigcontext(sc, tsk, signr, set, handler, \ + ctx_has_vsx_region, e) \ + unsafe_op_wrap(__unsafe_setup_sigcontext(sc, tsk, signr, set, \ + handler, ctx_has_vsx_region), e) +static long notrace __unsafe_setup_sigcontext(struct sigcontext __user *sc, + struct task_struct *tsk, int signr, sigset_t *set, + unsigned long handler, int ctx_has_vsx_region) { /* When CONFIG_ALTIVEC is set, we _always_ setup v_regs even if the * process never used altivec yet (MSR_VEC is zero in pt_regs of @@ -101,21 +105,20 @@ static long setup_sigcontext(struct sigcontext __user *sc, #endif struct pt_regs *regs = tsk->thread.regs; unsigned long msr = regs->msr; - long err = 0; /* Force usr to alway see softe as 1 (interrupts enabled) */ unsigned long softe = 0x1; BUG_ON(tsk != current); #ifdef CONFIG_ALTIVEC - err |= __put_user(v_regs, &sc->v_regs); + unsafe_put_user(v_regs, &sc->v_regs, efault_out); /* save altivec registers */ if (tsk->thread.used_vr) { flush_altivec_to_thread(tsk); /* Copy 33 vec registers (vr0..31 and vscr) to the stack */ - err |= __copy_to_user(v_regs, &tsk->thread.vr_state, - 33 * sizeof(vector128)); + unsafe_copy_to_user(v_regs, &tsk->thread.vr_state, + 33 * sizeof(vector128), efault_out); /* set MSR_VEC in the MSR value in the frame to indicate that sc->v_reg) * contains valid data. */ @@ -130,13 +133,13 @@ static long setup_sigcontext(struct sigcontext __user *sc, tsk->thread.vrsave = vrsave; } - err |= __put_user(vrsave, (u32 __user *)&v_regs[33]); + unsafe_put_user(vrsave, (u32 __user *)&v_regs[33], efault_out); #else /* CONFIG_ALTIVEC */ - err |= __put_user(0, &sc->v_regs); + unsafe_put_user(0, &sc->v_regs, efault_out); #endif /* CONFIG_ALTIVEC */ flush_fp_to_thread(tsk); /* copy fpr regs and fpscr */ - err |= copy_fpr_to_user(&sc->fp_regs, tsk); + unsafe_copy_fpr_to_user(&sc->fp_regs, tsk, efault_out); /* * Clear the MSR VSX bit to indicate there is no valid state attached @@ -152,24 +155,27 @@ static long setup_sigcontext(struct sigcontext __user *sc, if (tsk->thread.used_vsr && ctx_has_vsx_region) { flush_vsx_to_thread(tsk); v_regs += ELF_NVRREG; - err |= copy_vsx_to_user(v_regs, tsk); + unsafe_copy_vsx_to_user(v_regs, tsk, efault_out); /* set MSR_VSX in the MSR value in the frame to * indicate that sc->vs_reg) contains valid data. 
*/ msr |= MSR_VSX; } #endif /* CONFIG_VSX */ - err |= __put_user(&sc->gp_regs, &sc->regs); + unsafe_put_user(&sc->gp_regs, &sc->regs, efault_out); WARN_ON(!FULL_REGS(regs)); - err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE); - err |= __put_user(msr, &sc->gp_regs[PT_MSR]); - err |= __put_user(softe, &sc->gp_regs[PT_SOFTE]); - err |= __put_user(signr, &sc->signal); - err |= __put_user(handler, &sc->handler); + unsafe_copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE, efault_out); + unsafe_put_user(msr, &sc->gp_regs[PT_MSR], efault_out); + unsafe_put_user(softe, &sc->gp_regs[PT_SOFTE], efault_out); + unsafe_put_user(signr, &sc->signal, efault_out); + uns
[PATCH 2/8] powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
Reuse the "safe" implementation from signal.c except for calling unsafe_copy_from_user() to copy into a local buffer. Unlike the unsafe_copy_{vsx,fpr}_to_user() functions the "copy from" functions cannot use unsafe_get_user() directly to bypass the local buffer since doing so significantly reduces signal handling performance. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal.h | 33 + 1 file changed, 33 insertions(+) diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h index 2559a681536e..e9aaeac0da37 100644 --- a/arch/powerpc/kernel/signal.h +++ b/arch/powerpc/kernel/signal.h @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from); &buf[i], label);\ } while (0) +#define unsafe_copy_fpr_from_user(task, from, label) do {\ + struct task_struct *__t = task; \ + u64 __user *__f = (u64 __user *)from; \ + u64 buf[ELF_NFPREG];\ + int i; \ + \ + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),\ + label); \ + for (i = 0; i < ELF_NFPREG - 1; i++)\ + __t->thread.TS_FPR(i) = buf[i]; \ + __t->thread.fp_state.fpscr = buf[i];\ +} while (0) + +#define unsafe_copy_vsx_from_user(task, from, label) do {\ + struct task_struct *__t = task; \ + u64 __user *__f = (u64 __user *)from; \ + u64 buf[ELF_NVSRHALFREG]; \ + int i; \ + \ + unsafe_copy_from_user(buf, __f, \ + ELF_NVSRHALFREG * sizeof(double), \ + label); \ + for (i = 0; i < ELF_NVSRHALFREG ; i++) \ + __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; \ +} while (0) + + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM #define unsafe_copy_ckfpr_to_user(to, task, label) do {\ struct task_struct *__t = task; \ @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from); unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,\ ELF_NFPREG * sizeof(double), label) +#define unsafe_copy_fpr_from_user(task, from, label) \ + unsafe_copy_from_user((task)->thread.fp_state.fpr, from \ + ELF_NFPREG * sizeof(double), label) + static inline unsigned long copy_fpr_to_user(void __user *to, struct task_struct *task) { @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from) #else #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0) +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0) + static inline unsigned long copy_fpr_to_user(void __user *to, struct task_struct *task) { -- 2.28.0
[PATCH 8/8] powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
From: Daniel Axtens Add uaccess blocks and use the 'unsafe' versions of functions doing user access where possible to reduce the number of times uaccess has to be opened/closed. Signed-off-by: Daniel Axtens Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 3b97e3681a8f..0f4ff7a5bfc1 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -779,18 +779,22 @@ SYSCALL_DEFINE0(rt_sigreturn) */ regs->msr &= ~MSR_TS_MASK; - if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) + if (!user_read_access_begin(uc, sizeof(*uc))) goto badframe; + + unsafe_get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR], badframe_block); + if (MSR_TM_ACTIVE(msr)) { /* We recheckpoint on return. */ struct ucontext __user *uc_transact; /* Trying to start TM on non TM system */ if (!cpu_has_feature(CPU_FTR_TM)) - goto badframe; + goto badframe_block; + + unsafe_get_user(uc_transact, &uc->uc_link, badframe_block); + user_read_access_end(); - if (__get_user(uc_transact, &uc->uc_link)) - goto badframe; if (restore_tm_sigcontexts(current, &uc->uc_mcontext, &uc_transact->uc_mcontext)) goto badframe; @@ -810,12 +814,13 @@ SYSCALL_DEFINE0(rt_sigreturn) * causing a TM bad thing. */ current->thread.regs->msr &= ~MSR_TS_MASK; + +#ifndef CONFIG_PPC_TRANSACTIONAL_MEM if (!user_read_access_begin(uc, sizeof(*uc))) - return -EFAULT; - if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) { - user_read_access_end(); goto badframe; - } +#endif + unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext, + badframe_block); user_read_access_end(); } @@ -825,6 +830,8 @@ SYSCALL_DEFINE0(rt_sigreturn) set_thread_flag(TIF_RESTOREALL); return 0; +badframe_block: + user_read_access_end(); badframe: signal_fault(current, regs, "rt_sigreturn", uc); -- 2.28.0
[PATCH 7/8] powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
From: Daniel Axtens Add uaccess blocks and use the 'unsafe' versions of functions doing user access where possible to reduce the number of times uaccess has to be opened/closed. There is no 'unsafe' version of copy_siginfo_to_user, so move it slightly to allow for a "longer" uaccess block. Signed-off-by: Daniel Axtens Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 54 - 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 6d4f7a5c4fbf..3b97e3681a8f 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -843,46 +843,42 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, /* Save the thread's msr before get_tm_stackpointer() changes it */ unsigned long msr = regs->msr; #endif - frame = get_sigframe(ksig, tsk, sizeof(*frame), 0); - if (!access_ok(frame, sizeof(*frame))) + if (!user_write_access_begin(frame, sizeof(*frame))) goto badframe; - err |= __put_user(&frame->info, &frame->pinfo); - err |= __put_user(&frame->uc, &frame->puc); - err |= copy_siginfo_to_user(&frame->info, &ksig->info); - if (err) - goto badframe; + unsafe_put_user(&frame->info, &frame->pinfo, badframe_block); + unsafe_put_user(&frame->uc, &frame->puc, badframe_block); /* Create the ucontext. */ - err |= __put_user(0, &frame->uc.uc_flags); - err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]); + unsafe_put_user(0, &frame->uc.uc_flags, badframe_block); + unsafe_save_altstack(&frame->uc.uc_stack, regs->gpr[1], badframe_block); + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM if (MSR_TM_ACTIVE(msr)) { /* The ucontext_t passed to userland points to the second * ucontext_t (for transactional state) with its uc_link ptr. */ - err |= __put_user(&frame->uc_transact, &frame->uc.uc_link); + unsafe_put_user(&frame->uc_transact, &frame->uc.uc_link, badframe_block); + user_write_access_end(); err |= setup_tm_sigcontexts(&frame->uc.uc_mcontext, &frame->uc_transact.uc_mcontext, tsk, ksig->sig, NULL, (unsigned long)ksig->ka.sa.sa_handler, msr); + if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) + goto badframe; + } else #endif { - err |= __put_user(0, &frame->uc.uc_link); - - if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) - return -EFAULT; - err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, - ksig->sig, NULL, - (unsigned long)ksig->ka.sa.sa_handler, 1); - user_write_access_end(); + unsafe_put_user(0, &frame->uc.uc_link, badframe_block); + unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig, + NULL, (unsigned long)ksig->ka.sa.sa_handler, + 1, badframe_block); } - err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)); - if (err) - goto badframe; + + unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block); /* Make sure signal handler doesn't get spurious FP exceptions */ tsk->thread.fp_state.fpscr = 0; @@ -891,15 +887,17 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, if (vdso64_rt_sigtramp && tsk->mm->context.vdso_base) { regs->nip = tsk->mm->context.vdso_base + vdso64_rt_sigtramp; } else { - if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) - return -EFAULT; - err |= __unsafe_setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); - user_write_access_end(); - if (err) - goto badframe; + unsafe_setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0], + badframe_block); regs->nip = (unsigned long) &frame->tramp[0]; } + user_write_access_end(); + + /* Save the siginfo outside of the 
unsafe block. */ + if (copy_siginfo_to_user(&frame
[PATCH 1/8] powerpc/uaccess: Add unsafe_copy_from_user
Implement raw_copy_from_user_allowed() which assumes that userspace read access is open. Use this new function to implement raw_copy_from_user(). Finally, wrap the new function to follow the usual "unsafe_" convention of taking a label argument. The new raw_copy_from_user_allowed() calls __copy_tofrom_user() internally, but this is still safe to call in user access blocks formed with user_*_access_begin()/user_*_access_end() since asm functions are not instrumented for tracing. Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/uaccess.h | 28 +++- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 26781b044932..66940b4eb692 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -418,38 +418,45 @@ raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) } #endif /* __powerpc64__ */ -static inline unsigned long raw_copy_from_user(void *to, - const void __user *from, unsigned long n) +static inline unsigned long +raw_copy_from_user_allowed(void *to, const void __user *from, unsigned long n) { - unsigned long ret; if (__builtin_constant_p(n) && (n <= 8)) { - ret = 1; + unsigned long ret = 1; switch (n) { case 1: barrier_nospec(); - __get_user_size(*(u8 *)to, from, 1, ret); + __get_user_size_allowed(*(u8 *)to, from, 1, ret); break; case 2: barrier_nospec(); - __get_user_size(*(u16 *)to, from, 2, ret); + __get_user_size_allowed(*(u16 *)to, from, 2, ret); break; case 4: barrier_nospec(); - __get_user_size(*(u32 *)to, from, 4, ret); + __get_user_size_allowed(*(u32 *)to, from, 4, ret); break; case 8: barrier_nospec(); - __get_user_size(*(u64 *)to, from, 8, ret); + __get_user_size_allowed(*(u64 *)to, from, 8, ret); break; } if (ret == 0) return 0; } + return __copy_tofrom_user((__force void __user *)to, from, n); +} + +static inline unsigned long +raw_copy_from_user(void *to, const void __user *from, unsigned long n) +{ + unsigned long ret; + barrier_nospec(); allow_read_from_user(from, n); - ret = __copy_tofrom_user((__force void __user *)to, from, n); + ret = raw_copy_from_user_allowed(to, from, n); prevent_read_from_user(from, n); return ret; } @@ -571,6 +578,9 @@ user_write_access_begin(const void __user *ptr, size_t len) #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e) #define unsafe_put_user(x, p, e) __put_user_goto(x, p, e) +#define unsafe_copy_from_user(d, s, l, e) \ + unsafe_op_wrap(raw_copy_from_user_allowed(d, s, l), e) + #define unsafe_copy_to_user(d, s, l, e) \ do { \ u8 __user *_dst = (u8 __user *)(d); \ -- 2.28.0
[PATCH 3/8] powerpc: Mark functions called inside uaccess blocks w/ 'notrace'
Functions called between user_*_access_begin() and user_*_access_end() should be either inlined or marked 'notrace' to prevent leaving userspace access exposed. Mark any such functions relevant to signal handling so that subsequent patches can call them inside uaccess blocks. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/process.c | 20 ++-- arch/powerpc/mm/mem.c | 4 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ba2c987b8403..bf5d9654bd2c 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -84,7 +84,7 @@ extern unsigned long _get_SP(void); */ bool tm_suspend_disabled __ro_after_init = false; -static void check_if_tm_restore_required(struct task_struct *tsk) +static void notrace check_if_tm_restore_required(struct task_struct *tsk) { /* * If we are saving the current thread's registers, and the @@ -151,7 +151,7 @@ void notrace __msr_check_and_clear(unsigned long bits) EXPORT_SYMBOL(__msr_check_and_clear); #ifdef CONFIG_PPC_FPU -static void __giveup_fpu(struct task_struct *tsk) +static void notrace __giveup_fpu(struct task_struct *tsk) { unsigned long msr; @@ -163,7 +163,7 @@ static void __giveup_fpu(struct task_struct *tsk) tsk->thread.regs->msr = msr; } -void giveup_fpu(struct task_struct *tsk) +void notrace giveup_fpu(struct task_struct *tsk) { check_if_tm_restore_required(tsk); @@ -177,7 +177,7 @@ EXPORT_SYMBOL(giveup_fpu); * Make sure the floating-point register state in the * the thread_struct is up to date for task tsk. */ -void flush_fp_to_thread(struct task_struct *tsk) +void notrace flush_fp_to_thread(struct task_struct *tsk) { if (tsk->thread.regs) { /* @@ -234,7 +234,7 @@ static inline void __giveup_fpu(struct task_struct *tsk) { } #endif /* CONFIG_PPC_FPU */ #ifdef CONFIG_ALTIVEC -static void __giveup_altivec(struct task_struct *tsk) +static void notrace __giveup_altivec(struct task_struct *tsk) { unsigned long msr; @@ -246,7 +246,7 @@ static void __giveup_altivec(struct task_struct *tsk) tsk->thread.regs->msr = msr; } -void giveup_altivec(struct task_struct *tsk) +void notrace giveup_altivec(struct task_struct *tsk) { check_if_tm_restore_required(tsk); @@ -285,7 +285,7 @@ EXPORT_SYMBOL(enable_kernel_altivec); * Make sure the VMX/Altivec register state in the * the thread_struct is up to date for task tsk. 
*/ -void flush_altivec_to_thread(struct task_struct *tsk) +void notrace flush_altivec_to_thread(struct task_struct *tsk) { if (tsk->thread.regs) { preempt_disable(); @@ -300,7 +300,7 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread); #endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX -static void __giveup_vsx(struct task_struct *tsk) +static void notrace __giveup_vsx(struct task_struct *tsk) { unsigned long msr = tsk->thread.regs->msr; @@ -317,7 +317,7 @@ static void __giveup_vsx(struct task_struct *tsk) __giveup_altivec(tsk); } -static void giveup_vsx(struct task_struct *tsk) +static void notrace giveup_vsx(struct task_struct *tsk) { check_if_tm_restore_required(tsk); @@ -352,7 +352,7 @@ void enable_kernel_vsx(void) } EXPORT_SYMBOL(enable_kernel_vsx); -void flush_vsx_to_thread(struct task_struct *tsk) +void notrace flush_vsx_to_thread(struct task_struct *tsk) { if (tsk->thread.regs) { preempt_disable(); diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index ddc32cc1b6cf..da2345a2abc6 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -378,7 +378,7 @@ static inline bool flush_coherent_icache(unsigned long addr) * @start: the start address * @stop: the stop address (exclusive) */ -static void invalidate_icache_range(unsigned long start, unsigned long stop) +static void notrace invalidate_icache_range(unsigned long start, unsigned long stop) { unsigned long shift = l1_icache_shift(); unsigned long bytes = l1_icache_bytes(); @@ -402,7 +402,7 @@ static void invalidate_icache_range(unsigned long start, unsigned long stop) * @start: the start address * @stop: the stop address (exclusive) */ -void flush_icache_range(unsigned long start, unsigned long stop) +void notrace flush_icache_range(unsigned long start, unsigned long stop) { if (flush_coherent_icache(start)) return; -- 2.28.0
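To make the hazard concrete, a hypothetical pattern (not code from this patch): any out-of-line callee inside the begin/end pair runs with the user access window open, so a traced callee would execute its ftrace entry code with KUAP disabled.

/* Hypothetical sketch: why callees inside a uaccess block must be inline
 * or notrace. If flush_fp_to_thread() were traced, the tracing hook would
 * run while userspace access is still open.
 */
static int example_save_fp(struct rt_sigframe __user *frame, struct task_struct *tsk)
{
        if (!user_write_access_begin(frame, sizeof(*frame)))
                return -EFAULT;

        flush_fp_to_thread(tsk);                        /* must be notrace (this patch) */
        unsafe_put_user(0, &frame->uc.uc_flags, fail);

        user_write_access_end();
        return 0;
fail:
        user_write_access_end();
        return -EFAULT;
}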
Re: [PATCH 3/8] powerpc: Mark functions called inside uaccess blocks w/ 'notrace'
On Fri Oct 16, 2020 at 4:02 AM CDT, Christophe Leroy wrote: > > > Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > > Functions called between user_*_access_begin() and user_*_access_end() > > should be either inlined or marked 'notrace' to prevent leaving > > userspace access exposed. Mark any such functions relevant to signal > > handling so that subsequent patches can call them inside uaccess blocks. > > Is it enough to mark it "notrace" ? I see that when I activate KASAN, > there are still KASAN calls in > those functions. > Maybe not enough after all :( > In my series for 32 bits, I re-ordered stuff in order to do all those > calls before doing the > _access_begin(), can't you do the same on PPC64 ? (See > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/f6eac65781b4a57220477c8864bca2b57f29a5d5.1597770847.git.christophe.le...@csgroup.eu/) > Yes, I will give this another shot in the next spin. > Christophe > > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/kernel/process.c | 20 ++-- > > arch/powerpc/mm/mem.c | 4 ++-- > > 2 files changed, 12 insertions(+), 12 deletions(-) > > > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > > index ba2c987b8403..bf5d9654bd2c 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c > > @@ -84,7 +84,7 @@ extern unsigned long _get_SP(void); > >*/ > > bool tm_suspend_disabled __ro_after_init = false; > > > > -static void check_if_tm_restore_required(struct task_struct *tsk) > > +static void notrace check_if_tm_restore_required(struct task_struct *tsk) > > { > > /* > > * If we are saving the current thread's registers, and the > > @@ -151,7 +151,7 @@ void notrace __msr_check_and_clear(unsigned long bits) > > EXPORT_SYMBOL(__msr_check_and_clear); > > > > #ifdef CONFIG_PPC_FPU > > -static void __giveup_fpu(struct task_struct *tsk) > > +static void notrace __giveup_fpu(struct task_struct *tsk) > > { > > unsigned long msr; > > > > @@ -163,7 +163,7 @@ static void __giveup_fpu(struct task_struct *tsk) > > tsk->thread.regs->msr = msr; > > } > > > > -void giveup_fpu(struct task_struct *tsk) > > +void notrace giveup_fpu(struct task_struct *tsk) > > { > > check_if_tm_restore_required(tsk); > > > > @@ -177,7 +177,7 @@ EXPORT_SYMBOL(giveup_fpu); > >* Make sure the floating-point register state in the > >* the thread_struct is up to date for task tsk. > >*/ > > -void flush_fp_to_thread(struct task_struct *tsk) > > +void notrace flush_fp_to_thread(struct task_struct *tsk) > > { > > if (tsk->thread.regs) { > > /* > > @@ -234,7 +234,7 @@ static inline void __giveup_fpu(struct task_struct > > *tsk) { } > > #endif /* CONFIG_PPC_FPU */ > > > > #ifdef CONFIG_ALTIVEC > > -static void __giveup_altivec(struct task_struct *tsk) > > +static void notrace __giveup_altivec(struct task_struct *tsk) > > { > > unsigned long msr; > > > > @@ -246,7 +246,7 @@ static void __giveup_altivec(struct task_struct *tsk) > > tsk->thread.regs->msr = msr; > > } > > > > -void giveup_altivec(struct task_struct *tsk) > > +void notrace giveup_altivec(struct task_struct *tsk) > > { > > check_if_tm_restore_required(tsk); > > > > @@ -285,7 +285,7 @@ EXPORT_SYMBOL(enable_kernel_altivec); > >* Make sure the VMX/Altivec register state in the > >* the thread_struct is up to date for task tsk. 
> >*/ > > -void flush_altivec_to_thread(struct task_struct *tsk) > > +void notrace flush_altivec_to_thread(struct task_struct *tsk) > > { > > if (tsk->thread.regs) { > > preempt_disable(); > > @@ -300,7 +300,7 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread); > > #endif /* CONFIG_ALTIVEC */ > > > > #ifdef CONFIG_VSX > > -static void __giveup_vsx(struct task_struct *tsk) > > +static void notrace __giveup_vsx(struct task_struct *tsk) > > { > > unsigned long msr = tsk->thread.regs->msr; > > > > @@ -317,7 +317,7 @@ static void __giveup_vsx(struct task_struct *tsk) > > __giveup_altivec(tsk); > > } > > > > -static void giveup_vsx(struct task_struct *tsk) > > +static void notrace giveup_vsx(struct task_struct *tsk) > > { > > check_if_tm_restore_required(tsk);
Re: [PATCH 2/8] powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
On Fri Oct 16, 2020 at 10:48 AM CDT, Christophe Leroy wrote: > > > Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > > Reuse the "safe" implementation from signal.c except for calling > > unsafe_copy_from_user() to copy into a local buffer. Unlike the > > unsafe_copy_{vsx,fpr}_to_user() functions the "copy from" functions > > cannot use unsafe_get_user() directly to bypass the local buffer since > > doing so significantly reduces signal handling performance. > > Why can't the functions use unsafe_get_user(), why does it significantly > reduces signal handling > performance ? How much significant ? I would expect that not going > through an intermediate memory > area would be more efficient > Here is a comparison, 'unsafe-signal64-regs' avoids the intermediate buffer: | | hash | radix | | | -- | -- | | linuxppc/next| 289014 | 158408 | | unsafe-signal64 | 298506 | 253053 | | unsafe-signal64-regs | 254898 | 220831 | I have not figured out the 'why' yet. As you mentioned in your series, technically calling __copy_tofrom_user() is overkill for these operations. The only obvious difference between unsafe_put_user() and unsafe_get_user() is that we don't have asm-goto for the 'get' variant. Instead we wrap with unsafe_op_wrap() which inserts a conditional and then goto to the label. Implemenations: #define unsafe_copy_fpr_from_user(task, from, label) do {\ struct task_struct *__t = task; \ u64 __user *buf = (u64 __user *)from; \ int i; \ \ for (i = 0; i < ELF_NFPREG - 1; i++)\ unsafe_get_user(__t->thread.TS_FPR(i), &buf[i], label); \ unsafe_get_user(__t->thread.fp_state.fpscr, &buf[i], label);\ } while (0) #define unsafe_copy_vsx_from_user(task, from, label) do {\ struct task_struct *__t = task; \ u64 __user *buf = (u64 __user *)from; \ int i; \ \ for (i = 0; i < ELF_NVSRHALFREG ; i++) \ unsafe_get_user(__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET], \ &buf[i], label);\ } while (0) > Christophe > > > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/kernel/signal.h | 33 + > > 1 file changed, 33 insertions(+) > > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > > index 2559a681536e..e9aaeac0da37 100644 > > --- a/arch/powerpc/kernel/signal.h > > +++ b/arch/powerpc/kernel/signal.h > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct > > *task, void __user *from); > > &buf[i], label);\ > > } while (0) > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { > > \ > > + struct task_struct *__t = task; \ > > + u64 __user *__f = (u64 __user *)from; \ > > + u64 buf[ELF_NFPREG];\ > > + int i; \ > > + \ > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),\ > > + label); \ > > + for (i = 0; i < ELF_NFPREG - 1; i++)\ > > + __t->thread.TS_FPR(i) = buf[i]; \ > > + __t->thread.fp_state.fpscr = buf[i];\ > > +} while (0) > > + > > +#define unsafe_copy_vsx_from_user(task, from, label) do { > > \ > > + struct task_struct *__t = task; \ > > + u64 __user *__f = (u64 __user *)from; \ > > + u64 buf[ELF_NVSRHALFREG]; \ > > + int i; \ > > +
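For reference, the wrapper mentioned above has roughly this shape (the unsafe_op_wrap() body is recalled from the existing header, so treat it as approximate); the 'put' side uses asm-goto via __put_user_goto() while every 'get' pays an extra test-and-branch:

/* Approximate shape of the existing powerpc definitions at this point in
 * the series (unsafe_op_wrap() body recalled, not quoted from the patch):
 */
#define unsafe_op_wrap(op, label)       do { if (unlikely(op)) goto label; } while (0)
#define unsafe_get_user(x, p, e)        unsafe_op_wrap(__get_user_allowed(x, p), e)
#define unsafe_put_user(x, p, e)        __put_user_goto(x, p, e)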
Re: [PATCH 6/8] powerpc/signal64: Replace setup_trampoline() w/ unsafe_setup_trampoline()
On Fri Oct 16, 2020 at 10:56 AM CDT, Christophe Leroy wrote: > > > Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > > From: Daniel Axtens > > > > Previously setup_trampoline() performed a costly KUAP switch on every > > uaccess operation. These repeated uaccess switches cause a significant > > drop in signal handling performance. > > > > Rewrite setup_trampoline() to assume that a userspace write access > > window is open. Replace all uaccess functions with their 'unsafe' > > versions to avoid the repeated uaccess switches. > > > > Signed-off-by: Daniel Axtens > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/kernel/signal_64.c | 32 +++- > > 1 file changed, 19 insertions(+), 13 deletions(-) > > > > diff --git a/arch/powerpc/kernel/signal_64.c > > b/arch/powerpc/kernel/signal_64.c > > index bd92064e5576..6d4f7a5c4fbf 100644 > > --- a/arch/powerpc/kernel/signal_64.c > > +++ b/arch/powerpc/kernel/signal_64.c > > @@ -600,30 +600,33 @@ static long restore_tm_sigcontexts(struct task_struct > > *tsk, > > /* > >* Setup the trampoline code on the stack > >*/ > > -static long setup_trampoline(unsigned int syscall, unsigned int __user > > *tramp) > > +#define unsafe_setup_trampoline(syscall, tramp, e) \ > > + unsafe_op_wrap(__unsafe_setup_trampoline(syscall, tramp), e) > > +static long notrace __unsafe_setup_trampoline(unsigned int syscall, > > + unsigned int __user *tramp) > > { > > int i; > > - long err = 0; > > > > /* bctrl # call the handler */ > > - err |= __put_user(PPC_INST_BCTRL, &tramp[0]); > > + unsafe_put_user(PPC_INST_BCTRL, &tramp[0], err); > > /* addi r1, r1, __SIGNAL_FRAMESIZE # Pop the dummy stackframe */ > > - err |= __put_user(PPC_INST_ADDI | __PPC_RT(R1) | __PPC_RA(R1) | > > - (__SIGNAL_FRAMESIZE & 0x), &tramp[1]); > > + unsafe_put_user(PPC_INST_ADDI | __PPC_RT(R1) | __PPC_RA(R1) | > > + (__SIGNAL_FRAMESIZE & 0x), &tramp[1], err); > > /* li r0, __NR_[rt_]sigreturn| */ > > - err |= __put_user(PPC_INST_ADDI | (syscall & 0x), &tramp[2]); > > + unsafe_put_user(PPC_INST_ADDI | (syscall & 0x), &tramp[2], err); > > /* sc */ > > - err |= __put_user(PPC_INST_SC, &tramp[3]); > > + unsafe_put_user(PPC_INST_SC, &tramp[3], err); > > > > /* Minimal traceback info */ > > for (i=TRAMP_TRACEBACK; i < TRAMP_SIZE ;i++) > > - err |= __put_user(0, &tramp[i]); > > + unsafe_put_user(0, &tramp[i], err); > > > > - if (!err) > > - flush_icache_range((unsigned long) &tramp[0], > > - (unsigned long) &tramp[TRAMP_SIZE]); > > + flush_icache_range((unsigned long)&tramp[0], > > + (unsigned long)&tramp[TRAMP_SIZE]); > > This flush should be done outside the user_write_access block. > Hmm, I suppose that means setup_trampoline() cannot be completely "unsafe". I'll see if I can re-arrange the code which calls this function to avoid an additional uaccess block instead and push the start()/end() into setup_trampoline() directly. > > > > - return err; > > + return 0; > > +err: > > + return 1; > > } > > > > /* > > @@ -888,7 +891,10 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t > > *set, > > if (vdso64_rt_sigtramp && tsk->mm->context.vdso_base) { > > regs->nip = tsk->mm->context.vdso_base + vdso64_rt_sigtramp; > > } else { > > - err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); > > + if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) > > + return -EFAULT; > > + err |= __unsafe_setup_trampoline(__NR_rt_sigreturn, > > &frame->tramp[0]); > > + user_write_access_end(); > > if (err) > > goto badframe; > > regs->nip = (unsigned long) &frame->tramp[0]; > > > > Christophe
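One way the caller could be rearranged, sketched under the assumption that the unsafe_setup_trampoline() macro from this patch is kept: do the trampoline stores inside the existing uaccess block and only flush the icache once user access is closed again.

/* Sketch only -- names taken from this thread, error label invented. */
static int example_write_tramp(struct rt_sigframe __user *frame)
{
        if (!user_write_access_begin(frame, sizeof(*frame)))
                return -EFAULT;
        unsafe_setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0], fail);
        user_write_access_end();

        /* Safe to call here: the user access window is closed again. */
        flush_icache_range((unsigned long)&frame->tramp[0],
                           (unsigned long)&frame->tramp[TRAMP_SIZE]);
        return 0;
fail:
        user_write_access_end();
        return -EFAULT;
}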
Re: [PATCH 7/8] powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
On Fri Oct 16, 2020 at 11:00 AM CDT, Christophe Leroy wrote: > > > Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > > From: Daniel Axtens > > > > Add uaccess blocks and use the 'unsafe' versions of functions doing user > > access where possible to reduce the number of times uaccess has to be > > opened/closed. > > > > There is no 'unsafe' version of copy_siginfo_to_user, so move it > > slightly to allow for a "longer" uaccess block. > > > > Signed-off-by: Daniel Axtens > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/kernel/signal_64.c | 54 - > > 1 file changed, 27 insertions(+), 27 deletions(-) > > > > diff --git a/arch/powerpc/kernel/signal_64.c > > b/arch/powerpc/kernel/signal_64.c > > index 6d4f7a5c4fbf..3b97e3681a8f 100644 > > --- a/arch/powerpc/kernel/signal_64.c > > +++ b/arch/powerpc/kernel/signal_64.c > > @@ -843,46 +843,42 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t > > *set, > > /* Save the thread's msr before get_tm_stackpointer() changes it */ > > unsigned long msr = regs->msr; > > #endif > > - > > frame = get_sigframe(ksig, tsk, sizeof(*frame), 0); > > - if (!access_ok(frame, sizeof(*frame))) > > + if (!user_write_access_begin(frame, sizeof(*frame))) > > goto badframe; > > > > - err |= __put_user(&frame->info, &frame->pinfo); > > - err |= __put_user(&frame->uc, &frame->puc); > > - err |= copy_siginfo_to_user(&frame->info, &ksig->info); > > - if (err) > > - goto badframe; > > + unsafe_put_user(&frame->info, &frame->pinfo, badframe_block); > > + unsafe_put_user(&frame->uc, &frame->puc, badframe_block); > > > > /* Create the ucontext. */ > > - err |= __put_user(0, &frame->uc.uc_flags); > > - err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]); > > + unsafe_put_user(0, &frame->uc.uc_flags, badframe_block); > > + unsafe_save_altstack(&frame->uc.uc_stack, regs->gpr[1], badframe_block); > > + > > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > > if (MSR_TM_ACTIVE(msr)) { > > /* The ucontext_t passed to userland points to the second > > * ucontext_t (for transactional state) with its uc_link ptr. > > */ > > - err |= __put_user(&frame->uc_transact, &frame->uc.uc_link); > > + unsafe_put_user(&frame->uc_transact, &frame->uc.uc_link, > > badframe_block); > > + user_write_access_end(); > > Whaou. Doing this inside an #ifdef sequence is dirty. > Can you reorganise code to avoid that and to avoid nesting #ifdef/#endif > and the if/else as I did in > signal32 ? Hopefully yes - next spin! > > > err |= setup_tm_sigcontexts(&frame->uc.uc_mcontext, > > &frame->uc_transact.uc_mcontext, > > tsk, ksig->sig, NULL, > > (unsigned > > long)ksig->ka.sa.sa_handler, > > msr); > > + if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) > > + goto badframe; > > + > > } else > > #endif > > { > > - err |= __put_user(0, &frame->uc.uc_link); > > - > > - if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) > > - return -EFAULT; > > - err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, > > - ksig->sig, NULL, > > - (unsigned > > long)ksig->ka.sa.sa_handler, 1); > > - user_write_access_end(); > > + unsafe_put_user(0, &frame->uc.uc_link, badframe_block); > > + unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig, > > + NULL, (unsigned > > long)ksig->ka.sa.sa_handler, > > + 1, badframe_block); > > } > > - err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)); > > - if (err) > > - goto badframe; > > + > > + unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), > > badframe_block); > > > >
Re: [PATCH 8/8] powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
On Fri Oct 16, 2020 at 11:07 AM CDT, Christophe Leroy wrote: > > > Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > > From: Daniel Axtens > > > > Add uaccess blocks and use the 'unsafe' versions of functions doing user > > access where possible to reduce the number of times uaccess has to be > > opened/closed. > > > > Signed-off-by: Daniel Axtens > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/kernel/signal_64.c | 23 +++ > > 1 file changed, 15 insertions(+), 8 deletions(-) > > > > diff --git a/arch/powerpc/kernel/signal_64.c > > b/arch/powerpc/kernel/signal_64.c > > index 3b97e3681a8f..0f4ff7a5bfc1 100644 > > --- a/arch/powerpc/kernel/signal_64.c > > +++ b/arch/powerpc/kernel/signal_64.c > > @@ -779,18 +779,22 @@ SYSCALL_DEFINE0(rt_sigreturn) > > */ > > regs->msr &= ~MSR_TS_MASK; > > > > - if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) > > + if (!user_read_access_begin(uc, sizeof(*uc))) > > goto badframe; > > + > > + unsafe_get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR], badframe_block); > > + > > if (MSR_TM_ACTIVE(msr)) { > > /* We recheckpoint on return. */ > > struct ucontext __user *uc_transact; > > > > /* Trying to start TM on non TM system */ > > if (!cpu_has_feature(CPU_FTR_TM)) > > - goto badframe; > > + goto badframe_block; > > + > > + unsafe_get_user(uc_transact, &uc->uc_link, badframe_block); > > + user_read_access_end(); > > user_access_end() only in the if branch ? > > > > > - if (__get_user(uc_transact, &uc->uc_link)) > > - goto badframe; > > if (restore_tm_sigcontexts(current, &uc->uc_mcontext, > >&uc_transact->uc_mcontext)) > > goto badframe; > > @@ -810,12 +814,13 @@ SYSCALL_DEFINE0(rt_sigreturn) > > * causing a TM bad thing. > > */ > > current->thread.regs->msr &= ~MSR_TS_MASK; > > + > > +#ifndef CONFIG_PPC_TRANSACTIONAL_MEM > > if (!user_read_access_begin(uc, sizeof(*uc))) > > The matching user_read_access_end() is not in the same #ifndef ? That's > dirty and hard to follow. > Can you re-organise the code to avoid all those nesting ? Yes, thanks for pointing this out. I really wanted to avoid changing too much of the logic inside these functions. But I suppose I ended up creating a mess - I will fix this in the next spin. > > > - return -EFAULT; > > - if (__unsafe_restore_sigcontext(current, NULL, 1, > > &uc->uc_mcontext)) { > > - user_read_access_end(); > > goto badframe; > > - } > > +#endif > > + unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext, > > + badframe_block); > > user_read_access_end(); > > } > > > > @@ -825,6 +830,8 @@ SYSCALL_DEFINE0(rt_sigreturn) > > set_thread_flag(TIF_RESTOREALL); > > return 0; > > > > +badframe_block: > > + user_read_access_end(); > > badframe: > > signal_fault(current, regs, "rt_sigreturn", uc); > > > > > > Christophe
Re: [PATCH 1/8] powerpc/uaccess: Add unsafe_copy_from_user
On Fri Oct 16, 2020 at 10:17 AM CDT, Christophe Leroy wrote: > > > Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > > Implement raw_copy_from_user_allowed() which assumes that userspace read > > access is open. Use this new function to implement raw_copy_from_user(). > > Finally, wrap the new function to follow the usual "unsafe_" convention > > of taking a label argument. The new raw_copy_from_user_allowed() calls > > __copy_tofrom_user() internally, but this is still safe to call in user > > access blocks formed with user_*_access_begin()/user_*_access_end() > > since asm functions are not instrumented for tracing. > > Would objtool accept that if it was implemented on powerpc ? > > __copy_tofrom_user() is a function which is optimised for larger memory > copies (using dcbz, etc ...) > Do we need such an optimisation for unsafe_copy_from_user() ? Or can we > do a simple loop as done for > unsafe_copy_to_user() instead ? I tried using a simple loop based on your unsafe_copy_to_user() implementation. Similar to the copy_{vsx,fpr}_from_user() results there is a hit to signal handling performance. The results with the loop are in the 'unsafe-signal64-copy' column: | | hash | radix | | | -- | -- | | linuxppc/next| 289014 | 158408 | | unsafe-signal64 | 298506 | 253053 | | unsafe-signal64-copy | 197029 | 177002 | Similar to the copy_{vsx,fpr}_from_user() patch I don't fully understand why this performs so badly yet. Implementation: unsafe_copy_from_user(d, s, l, e) \ do { \ u8 *_dst = (u8 *)(d); \ const u8 __user *_src = (u8 __user*)(s); \ size_t _len = (l); \ int _i; \ \ for (_i = 0; _i < (_len & ~(sizeof(long) - 1)); _i += sizeof(long)) \ unsafe_get_user(*(long*)(_dst + _i), (long __user *)(_src + _i), e);\ if (IS_ENABLED(CONFIG_PPC64) && (_len & 4)) { \ unsafe_get_user(*(u32*)(_dst + _i), (u32 __user *)(_src + _i), e); \ _i += 4;\ } \ if (_len & 2) { \ unsafe_get_user(*(u16*)(_dst + _i), (u16 __user *)(_src + _i), e); \ _i += 2;\ } \ if (_len & 1) \ unsafe_get_user(*(u8*)(_dst + _i), (u8 __user *)(_src + _i), e);\ } while (0) > > Christophe > > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/include/asm/uaccess.h | 28 +++- > > 1 file changed, 19 insertions(+), 9 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/uaccess.h > > b/arch/powerpc/include/asm/uaccess.h > > index 26781b044932..66940b4eb692 100644 > > --- a/arch/powerpc/include/asm/uaccess.h > > +++ b/arch/powerpc/include/asm/uaccess.h > > @@ -418,38 +418,45 @@ raw_copy_in_user(void __user *to, const void __user > > *from, unsigned long n) > > } > > #endif /* __powerpc64__ */ > > > > -static inline unsigned long raw_copy_from_user(void *to, > > - const void __user *from, unsigned long n) > > +static inline unsigned long > > +raw_copy_from_user_allowed(void *to, const void __user *from, unsigned > > long n) > > { > > - unsigned long ret; > > if (__builtin_constant_p(n) && (n <= 8)) { > > - ret = 1; > > + unsigned long ret = 1; > > > > switch (n) { > > case 1: > > barrier_nospec(); > > - __get_user_size(*(u8 *)to, from, 1, ret); > > + __get_user_size_allowed(*(u8 *)to, from, 1, ret); > > break; > > case 2: > > barrier_nospec(); > > - __get_user_size(*(u16 *)to, from, 2, ret); > > + __get_user_siz
[PATCH v2 2/8] powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
Reuse the "safe" implementation from signal.c except for calling unsafe_copy_from_user() to copy into a local buffer. Unlike the unsafe_copy_{vsx,fpr}_to_user() functions the "copy from" functions cannot use unsafe_get_user() directly to bypass the local buffer since doing so significantly reduces signal handling performance. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal.h | 33 + 1 file changed, 33 insertions(+) diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h index 2559a681536e..e9aaeac0da37 100644 --- a/arch/powerpc/kernel/signal.h +++ b/arch/powerpc/kernel/signal.h @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from); &buf[i], label);\ } while (0) +#define unsafe_copy_fpr_from_user(task, from, label) do {\ + struct task_struct *__t = task; \ + u64 __user *__f = (u64 __user *)from; \ + u64 buf[ELF_NFPREG];\ + int i; \ + \ + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),\ + label); \ + for (i = 0; i < ELF_NFPREG - 1; i++)\ + __t->thread.TS_FPR(i) = buf[i]; \ + __t->thread.fp_state.fpscr = buf[i];\ +} while (0) + +#define unsafe_copy_vsx_from_user(task, from, label) do {\ + struct task_struct *__t = task; \ + u64 __user *__f = (u64 __user *)from; \ + u64 buf[ELF_NVSRHALFREG]; \ + int i; \ + \ + unsafe_copy_from_user(buf, __f, \ + ELF_NVSRHALFREG * sizeof(double), \ + label); \ + for (i = 0; i < ELF_NVSRHALFREG ; i++) \ + __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; \ +} while (0) + + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM #define unsafe_copy_ckfpr_to_user(to, task, label) do {\ struct task_struct *__t = task; \ @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from); unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,\ ELF_NFPREG * sizeof(double), label) +#define unsafe_copy_fpr_from_user(task, from, label) \ + unsafe_copy_from_user((task)->thread.fp_state.fpr, from \ + ELF_NFPREG * sizeof(double), label) + static inline unsigned long copy_fpr_to_user(void __user *to, struct task_struct *task) { @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from) #else #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0) +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0) + static inline unsigned long copy_fpr_to_user(void __user *to, struct task_struct *task) { -- 2.29.0
[PATCH v2 3/8] powerpc/signal64: Move non-inline functions out of setup_sigcontext()
There are non-inline functions which get called in setup_sigcontext() to save register state to the thread struct. Move these functions into a separate prepare_setup_sigcontext() function so that setup_sigcontext() can be refactored later into an "unsafe" version which assumes an open uaccess window. Non-inline functions should be avoided when uaccess is open. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 7df088b9ad0f..ece1f982dd05 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -79,6 +79,24 @@ static elf_vrreg_t __user *sigcontext_vmx_regs(struct sigcontext __user *sc) } #endif +static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_region) +{ +#ifdef CONFIG_ALTIVEC + /* save altivec registers */ + if (tsk->thread.used_vr) + flush_altivec_to_thread(tsk); + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + tsk->thread.vrsave = mfspr(SPRN_VRSAVE); +#endif /* CONFIG_ALTIVEC */ + + flush_fp_to_thread(tsk); + +#ifdef CONFIG_VSX + if (tsk->thread.used_vsr && ctx_has_vsx_region) + flush_vsx_to_thread(tsk); +#endif /* CONFIG_VSX */ +} + /* * Set up the sigcontext for the signal frame. */ @@ -97,7 +115,6 @@ static long setup_sigcontext(struct sigcontext __user *sc, */ #ifdef CONFIG_ALTIVEC elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc); - unsigned long vrsave; #endif struct pt_regs *regs = tsk->thread.regs; unsigned long msr = regs->msr; @@ -112,7 +129,6 @@ static long setup_sigcontext(struct sigcontext __user *sc, /* save altivec registers */ if (tsk->thread.used_vr) { - flush_altivec_to_thread(tsk); /* Copy 33 vec registers (vr0..31 and vscr) to the stack */ err |= __copy_to_user(v_regs, &tsk->thread.vr_state, 33 * sizeof(vector128)); @@ -124,17 +140,10 @@ static long setup_sigcontext(struct sigcontext __user *sc, /* We always copy to/from vrsave, it's 0 if we don't have or don't * use altivec. */ - vrsave = 0; - if (cpu_has_feature(CPU_FTR_ALTIVEC)) { - vrsave = mfspr(SPRN_VRSAVE); - tsk->thread.vrsave = vrsave; - } - - err |= __put_user(vrsave, (u32 __user *)&v_regs[33]); + err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]); #else /* CONFIG_ALTIVEC */ err |= __put_user(0, &sc->v_regs); #endif /* CONFIG_ALTIVEC */ - flush_fp_to_thread(tsk); /* copy fpr regs and fpscr */ err |= copy_fpr_to_user(&sc->fp_regs, tsk); @@ -150,7 +159,6 @@ static long setup_sigcontext(struct sigcontext __user *sc, * VMX data. */ if (tsk->thread.used_vsr && ctx_has_vsx_region) { - flush_vsx_to_thread(tsk); v_regs += ELF_NVRREG; err |= copy_vsx_to_user(v_regs, tsk); /* set MSR_VSX in the MSR value in the frame to @@ -655,6 +663,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, ctx_has_vsx_region = 1; if (old_ctx != NULL) { + prepare_setup_sigcontext(current, ctx_has_vsx_region); if (!access_ok(old_ctx, ctx_size) || setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0, ctx_has_vsx_region) @@ -842,6 +851,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, #endif { err |= __put_user(0, &frame->uc.uc_link); + prepare_setup_sigcontext(tsk, 1); err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig, NULL, (unsigned long)ksig->ka.sa.sa_handler, 1); -- 2.29.0
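The calling pattern this enables, as used later in the series (a condensed sketch, with the error label name assumed):

        /* Flush live register state to the thread struct *before* opening
         * the window -- flush_*_to_thread() are non-inline and may trace.
         */
        prepare_setup_sigcontext(tsk, 1);

        if (!user_write_access_begin(frame, sizeof(*frame)))
                return -EFAULT;
        unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig, NULL,
                                (unsigned long)ksig->ka.sa.sa_handler, 1,
                                badframe_block);
        user_write_access_end();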
[PATCH v2 4/8] powerpc/signal64: Remove TM ifdefery in middle of if/else block
Similar to commit 1c32940f5220 ("powerpc/signal32: Remove ifdefery in middle of if/else") for PPC32, remove the messy ifdef. Unlike PPC32, the ifdef cannot be removed entirely since the uc_transact member of the sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM=y. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index ece1f982dd05..d3e9519b2e62 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -710,9 +710,7 @@ SYSCALL_DEFINE0(rt_sigreturn) struct pt_regs *regs = current_pt_regs(); struct ucontext __user *uc = (struct ucontext __user *)regs->gpr[1]; sigset_t set; -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM unsigned long msr; -#endif /* Always make any pending restarted system calls return -EINTR */ current->restart_block.fn = do_no_restart_syscall; @@ -762,10 +760,12 @@ SYSCALL_DEFINE0(rt_sigreturn) * restore_tm_sigcontexts. */ regs->msr &= ~MSR_TS_MASK; +#endif if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) goto badframe; if (MSR_TM_ACTIVE(msr)) { +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* We recheckpoint on return. */ struct ucontext __user *uc_transact; @@ -778,9 +778,8 @@ SYSCALL_DEFINE0(rt_sigreturn) if (restore_tm_sigcontexts(current, &uc->uc_mcontext, &uc_transact->uc_mcontext)) goto badframe; - } else #endif - { + } else { /* * Fall through, for non-TM restore * @@ -818,10 +817,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, unsigned long newsp = 0; long err = 0; struct pt_regs *regs = tsk->thread.regs; -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* Save the thread's msr before get_tm_stackpointer() changes it */ - unsigned long msr = regs->msr; -#endif + unsigned long msr __maybe_unused = regs->msr; frame = get_sigframe(ksig, tsk, sizeof(*frame), 0); if (!access_ok(frame, sizeof(*frame))) @@ -836,8 +833,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, /* Create the ucontext. */ err |= __put_user(0, &frame->uc.uc_flags); err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]); -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + if (MSR_TM_ACTIVE(msr)) { +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* The ucontext_t passed to userland points to the second * ucontext_t (for transactional state) with its uc_link ptr. */ @@ -847,9 +845,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, tsk, ksig->sig, NULL, (unsigned long)ksig->ka.sa.sa_handler, msr); - } else #endif - { + } else { err |= __put_user(0, &frame->uc.uc_link); prepare_setup_sigcontext(tsk, 1); err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig, -- 2.29.0
[PATCH v2 0/8] Improve signal performance on PPC64 with KUAP
As reported by Anton, there is a large penalty to signal handling performance on radix systems using KUAP. The signal handling code performs many user access operations, each of which needs to switch the KUAP permissions bit to open and then close user access. This involves a costly 'mtspr' operation [0]. There is existing work done on x86 and by Christopher Leroy for PPC32 to instead open up user access in "blocks" using user_*_access_{begin,end}. We can do the same in PPC64 to bring performance back up on KUAP-enabled radix systems. This series applies on top of Christophe Leroy's work for PPC32 [1] (I'm sure patchwork won't be too happy about that). The first two patches add some needed 'unsafe' versions of copy-from functions. While these do not make use of asm-goto they still allow for avoiding the repeated uaccess switches. The third patch moves functions called by setup_sigcontext() into a new prepare_setup_sigcontext() to simplify converting setup_sigcontext() into an 'unsafe' version which assumes an open uaccess window later. The fourth patch cleans-up some of the Transactional Memory ifdef stuff to simplify using uaccess blocks later. The next two patches rewrite some of the signal64 helper functions to be 'unsafe'. Finally, the last two patches update the main signal handling functions to make use of the new 'unsafe' helpers and eliminate some additional uaccess switching. I used the will-it-scale signal1 benchmark to measure and compare performance [2]. The below results are from a P9 Blackbird system. Note that currently hash does not support KUAP and is therefore used as the "baseline" comparison. Bigger numbers are better: signal1_threads -t1 -s10 | | hash | radix | | --- | -- | -- | | linuxppc/next | 289014 | 158408 | | unsafe-signal64 | 298506 | 253053 | [0]: https://github.com/linuxppc/issues/issues/277 [1]: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278 [2]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/signal1.c v2: * Rebase on latest linuxppc/next + Christophe Leroy's PPC32 signal series * Simplify/remove TM ifdefery similar to PPC32 series and clean up the uaccess begin/end calls * Isolate non-inline functions so they are not called when uaccess window is open Christopher M. Riedl (6): powerpc/uaccess: Add unsafe_copy_from_user powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user() powerpc/signal64: Move non-inline functions out of setup_sigcontext() powerpc/signal64: Remove TM ifdefery in middle of if/else block powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext() powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext() Daniel Axtens (2): powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches arch/powerpc/include/asm/uaccess.h | 28 ++-- arch/powerpc/kernel/signal.h | 33 arch/powerpc/kernel/signal_64.c| 239 ++--- 3 files changed, 201 insertions(+), 99 deletions(-) -- 2.29.0
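For readers unfamiliar with the benchmark, the measured workload boils down to delivering and returning from a signal in a tight loop, which exercises the kernel's signal frame setup and rt_sigreturn paths on every iteration. A minimal standalone sketch of that idea (this is not the actual will-it-scale signal1.c, just an approximation of what it exercises):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static volatile unsigned long count;

static void handler(int sig)
{
        (void)sig;
        count++;
}

int main(void)
{
        struct sigaction sa;
        time_t end;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigaction(SIGUSR1, &sa, NULL);

        end = time(NULL) + 10;
        while (time(NULL) < end)
                raise(SIGUSR1);         /* each delivery builds and tears down a signal frame */

        printf("%lu signal deliveries in ~10s\n", count);
        return 0;
}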
[PATCH v2 8/8] powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
From: Daniel Axtens Add uaccess blocks and use the 'unsafe' versions of functions doing user access where possible to reduce the number of times uaccess has to be opened/closed. Signed-off-by: Daniel Axtens Co-developed-by: Christopher M. Riedl Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index d17f2d5436d2..82e68a508e5c 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -784,8 +784,11 @@ SYSCALL_DEFINE0(rt_sigreturn) regs->msr &= ~MSR_TS_MASK; #endif - if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) + if (!user_read_access_begin(uc, sizeof(*uc))) goto badframe; + + unsafe_get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR], badframe_block); + if (MSR_TM_ACTIVE(msr)) { #ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* We recheckpoint on return. */ @@ -793,10 +796,12 @@ SYSCALL_DEFINE0(rt_sigreturn) /* Trying to start TM on non TM system */ if (!cpu_has_feature(CPU_FTR_TM)) - goto badframe; + goto badframe_block; + + unsafe_get_user(uc_transact, &uc->uc_link, badframe_block); + + user_read_access_end(); - if (__get_user(uc_transact, &uc->uc_link)) - goto badframe; if (restore_tm_sigcontexts(current, &uc->uc_mcontext, &uc_transact->uc_mcontext)) goto badframe; @@ -815,12 +820,9 @@ SYSCALL_DEFINE0(rt_sigreturn) * causing a TM bad thing. */ current->thread.regs->msr &= ~MSR_TS_MASK; - if (!user_read_access_begin(uc, sizeof(*uc))) - return -EFAULT; - if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) { - user_read_access_end(); - goto badframe; - } + unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext, + badframe_block); + user_read_access_end(); } @@ -830,6 +832,8 @@ SYSCALL_DEFINE0(rt_sigreturn) set_thread_flag(TIF_RESTOREALL); return 0; +badframe_block: + user_read_access_end(); badframe: signal_fault(current, regs, "rt_sigreturn", uc); -- 2.29.0
[PATCH v2 6/8] powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()
Previously restore_sigcontext() performed a costly KUAP switch on every uaccess operation. These repeated uaccess switches cause a significant drop in signal handling performance. Rewrite restore_sigcontext() to assume that a userspace read access window is open. Replace all uaccess functions with their 'unsafe' versions which avoid the repeated uaccess switches. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 68 - 1 file changed, 41 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 3f25309826b6..d72153825719 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -326,14 +326,14 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc, /* * Restore the sigcontext from the signal frame. */ - -static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig, - struct sigcontext __user *sc) +#define unsafe_restore_sigcontext(tsk, set, sig, sc, e) \ + unsafe_op_wrap(__unsafe_restore_sigcontext(tsk, set, sig, sc), e) +static long notrace __unsafe_restore_sigcontext(struct task_struct *tsk, sigset_t *set, + int sig, struct sigcontext __user *sc) { #ifdef CONFIG_ALTIVEC elf_vrreg_t __user *v_regs; #endif - unsigned long err = 0; unsigned long save_r13 = 0; unsigned long msr; struct pt_regs *regs = tsk->thread.regs; @@ -348,27 +348,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig, save_r13 = regs->gpr[13]; /* copy the GPRs */ - err |= __copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr)); - err |= __get_user(regs->nip, &sc->gp_regs[PT_NIP]); + unsafe_copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr), + efault_out); + unsafe_get_user(regs->nip, &sc->gp_regs[PT_NIP], efault_out); /* get MSR separately, transfer the LE bit if doing signal return */ - err |= __get_user(msr, &sc->gp_regs[PT_MSR]); + unsafe_get_user(msr, &sc->gp_regs[PT_MSR], efault_out); if (sig) regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); - err |= __get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3]); - err |= __get_user(regs->ctr, &sc->gp_regs[PT_CTR]); - err |= __get_user(regs->link, &sc->gp_regs[PT_LNK]); - err |= __get_user(regs->xer, &sc->gp_regs[PT_XER]); - err |= __get_user(regs->ccr, &sc->gp_regs[PT_CCR]); + unsafe_get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3], efault_out); + unsafe_get_user(regs->ctr, &sc->gp_regs[PT_CTR], efault_out); + unsafe_get_user(regs->link, &sc->gp_regs[PT_LNK], efault_out); + unsafe_get_user(regs->xer, &sc->gp_regs[PT_XER], efault_out); + unsafe_get_user(regs->ccr, &sc->gp_regs[PT_CCR], efault_out); /* Don't allow userspace to set SOFTE */ set_trap_norestart(regs); - err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]); - err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]); - err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]); + unsafe_get_user(regs->dar, &sc->gp_regs[PT_DAR], efault_out); + unsafe_get_user(regs->dsisr, &sc->gp_regs[PT_DSISR], efault_out); + unsafe_get_user(regs->result, &sc->gp_regs[PT_RESULT], efault_out); if (!sig) regs->gpr[13] = save_r13; if (set != NULL) - err |= __get_user(set->sig[0], &sc->oldmask); + unsafe_get_user(set->sig[0], &sc->oldmask, efault_out); /* * Force reload of FP/VEC. 
@@ -378,29 +379,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig, regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX); #ifdef CONFIG_ALTIVEC - err |= __get_user(v_regs, &sc->v_regs); - if (err) - return err; + unsafe_get_user(v_regs, &sc->v_regs, efault_out); if (v_regs && !access_ok(v_regs, 34 * sizeof(vector128))) return -EFAULT; /* Copy 33 vec registers (vr0..31 and vscr) from the stack */ if (v_regs != NULL && (msr & MSR_VEC) != 0) { - err |= __copy_from_user(&tsk->thread.vr_state, v_regs, - 33 * sizeof(vector128)); + unsafe_copy_from_user(&tsk->thread.vr_state, v_regs, + 33 * sizeof(vector128), efault_out); tsk->thread.used_vr = true; } else if (
[PATCH v2 1/8] powerpc/uaccess: Add unsafe_copy_from_user
Implement raw_copy_from_user_allowed() which assumes that userspace read access is open. Use this new function to implement raw_copy_from_user(). Finally, wrap the new function to follow the usual "unsafe_" convention of taking a label argument. The new raw_copy_from_user_allowed() calls __copy_tofrom_user() internally, but this is still safe to call in user access blocks formed with user_*_access_begin()/user_*_access_end() since asm functions are not instrumented for tracing. Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/uaccess.h | 28 +++- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index ef5bbb705c08..96b4abab4f5a 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -403,38 +403,45 @@ raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) } #endif /* __powerpc64__ */ -static inline unsigned long raw_copy_from_user(void *to, - const void __user *from, unsigned long n) +static inline unsigned long +raw_copy_from_user_allowed(void *to, const void __user *from, unsigned long n) { - unsigned long ret; if (__builtin_constant_p(n) && (n <= 8)) { - ret = 1; + unsigned long ret = 1; switch (n) { case 1: barrier_nospec(); - __get_user_size(*(u8 *)to, from, 1, ret); + __get_user_size_allowed(*(u8 *)to, from, 1, ret); break; case 2: barrier_nospec(); - __get_user_size(*(u16 *)to, from, 2, ret); + __get_user_size_allowed(*(u16 *)to, from, 2, ret); break; case 4: barrier_nospec(); - __get_user_size(*(u32 *)to, from, 4, ret); + __get_user_size_allowed(*(u32 *)to, from, 4, ret); break; case 8: barrier_nospec(); - __get_user_size(*(u64 *)to, from, 8, ret); + __get_user_size_allowed(*(u64 *)to, from, 8, ret); break; } if (ret == 0) return 0; } + return __copy_tofrom_user((__force void __user *)to, from, n); +} + +static inline unsigned long +raw_copy_from_user(void *to, const void __user *from, unsigned long n) +{ + unsigned long ret; + barrier_nospec(); allow_read_from_user(from, n); - ret = __copy_tofrom_user((__force void __user *)to, from, n); + ret = raw_copy_from_user_allowed(to, from, n); prevent_read_from_user(from, n); return ret; } @@ -542,6 +549,9 @@ user_write_access_begin(const void __user *ptr, size_t len) #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e) #define unsafe_put_user(x, p, e) __put_user_goto(x, p, e) +#define unsafe_copy_from_user(d, s, l, e) \ + unsafe_op_wrap(raw_copy_from_user_allowed(d, s, l), e) + #define unsafe_copy_to_user(d, s, l, e) \ do { \ u8 __user *_dst = (u8 __user *)(d); \ -- 2.29.0
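One consequence worth noting (an observation, not part of the patch): since the constant-size fast path is preserved, a fixed-size unsafe copy of 8 bytes or less still inlines to a single __get_user_size_allowed() access instead of calling __copy_tofrom_user(). The variable names below are illustrative:

        u64 val;

        /* n is a compile-time constant <= 8, so this takes the fast path. */
        unsafe_copy_from_user(&val, user_ptr, sizeof(val), fault);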
[PATCH v2 5/8] powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
Previously setup_sigcontext() performed a costly KUAP switch on every uaccess operation. These repeated uaccess switches cause a significant drop in signal handling performance. Rewrite setup_sigcontext() to assume that a userspace write access window is open. Replace all uaccess functions with their 'unsafe' versions which avoid the repeated uaccess switches. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 70 - 1 file changed, 43 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index d3e9519b2e62..3f25309826b6 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -101,9 +101,13 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re * Set up the sigcontext for the signal frame. */ -static long setup_sigcontext(struct sigcontext __user *sc, - struct task_struct *tsk, int signr, sigset_t *set, - unsigned long handler, int ctx_has_vsx_region) +#define unsafe_setup_sigcontext(sc, tsk, signr, set, handler, \ + ctx_has_vsx_region, e) \ + unsafe_op_wrap(__unsafe_setup_sigcontext(sc, tsk, signr, set, \ + handler, ctx_has_vsx_region), e) +static long notrace __unsafe_setup_sigcontext(struct sigcontext __user *sc, + struct task_struct *tsk, int signr, sigset_t *set, + unsigned long handler, int ctx_has_vsx_region) { /* When CONFIG_ALTIVEC is set, we _always_ setup v_regs even if the * process never used altivec yet (MSR_VEC is zero in pt_regs of @@ -118,20 +122,19 @@ static long setup_sigcontext(struct sigcontext __user *sc, #endif struct pt_regs *regs = tsk->thread.regs; unsigned long msr = regs->msr; - long err = 0; /* Force usr to alway see softe as 1 (interrupts enabled) */ unsigned long softe = 0x1; BUG_ON(tsk != current); #ifdef CONFIG_ALTIVEC - err |= __put_user(v_regs, &sc->v_regs); + unsafe_put_user(v_regs, &sc->v_regs, efault_out); /* save altivec registers */ if (tsk->thread.used_vr) { /* Copy 33 vec registers (vr0..31 and vscr) to the stack */ - err |= __copy_to_user(v_regs, &tsk->thread.vr_state, - 33 * sizeof(vector128)); + unsafe_copy_to_user(v_regs, &tsk->thread.vr_state, + 33 * sizeof(vector128), efault_out); /* set MSR_VEC in the MSR value in the frame to indicate that sc->v_reg) * contains valid data. */ @@ -140,12 +143,12 @@ static long setup_sigcontext(struct sigcontext __user *sc, /* We always copy to/from vrsave, it's 0 if we don't have or don't * use altivec. */ - err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]); + unsafe_put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33], efault_out); #else /* CONFIG_ALTIVEC */ - err |= __put_user(0, &sc->v_regs); + unsafe_put_user(0, &sc->v_regs, efault_out); #endif /* CONFIG_ALTIVEC */ /* copy fpr regs and fpscr */ - err |= copy_fpr_to_user(&sc->fp_regs, tsk); + unsafe_copy_fpr_to_user(&sc->fp_regs, tsk, efault_out); /* * Clear the MSR VSX bit to indicate there is no valid state attached @@ -160,24 +163,27 @@ static long setup_sigcontext(struct sigcontext __user *sc, */ if (tsk->thread.used_vsr && ctx_has_vsx_region) { v_regs += ELF_NVRREG; - err |= copy_vsx_to_user(v_regs, tsk); + unsafe_copy_vsx_to_user(v_regs, tsk, efault_out); /* set MSR_VSX in the MSR value in the frame to * indicate that sc->vs_reg) contains valid data. 
*/ msr |= MSR_VSX; } #endif /* CONFIG_VSX */ - err |= __put_user(&sc->gp_regs, &sc->regs); + unsafe_put_user(&sc->gp_regs, &sc->regs, efault_out); WARN_ON(!FULL_REGS(regs)); - err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE); - err |= __put_user(msr, &sc->gp_regs[PT_MSR]); - err |= __put_user(softe, &sc->gp_regs[PT_SOFTE]); - err |= __put_user(signr, &sc->signal); - err |= __put_user(handler, &sc->handler); + unsafe_copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE, efault_out); + unsafe_put_user(msr, &sc->gp_regs[PT_MSR], efault_out); + unsafe_put_user(softe, &sc->gp_regs[PT_SOFTE], efault_out); + unsafe_put_user(signr, &sc->signal, efault_out); +
[PATCH v2 7/8] powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
From: Daniel Axtens Add uaccess blocks and use the 'unsafe' versions of functions doing user access where possible to reduce the number of times uaccess has to be opened/closed. There is no 'unsafe' version of copy_siginfo_to_user, so move it slightly to allow for a "longer" uaccess block. Signed-off-by: Daniel Axtens Co-developed-by: Christopher M. Riedl Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/signal_64.c | 54 + 1 file changed, 34 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index d72153825719..d17f2d5436d2 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -848,44 +848,51 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, unsigned long msr __maybe_unused = regs->msr; frame = get_sigframe(ksig, tsk, sizeof(*frame), 0); - if (!access_ok(frame, sizeof(*frame))) - goto badframe; - err |= __put_user(&frame->info, &frame->pinfo); - err |= __put_user(&frame->uc, &frame->puc); - err |= copy_siginfo_to_user(&frame->info, &ksig->info); - if (err) + /* This only applies when calling unsafe_setup_sigcontext() and must be +* called before opening the uaccess window. +*/ + if (!MSR_TM_ACTIVE(msr)) + prepare_setup_sigcontext(tsk, 1); + + if (!user_write_access_begin(frame, sizeof(*frame))) goto badframe; + unsafe_put_user(&frame->info, &frame->pinfo, badframe_block); + unsafe_put_user(&frame->uc, &frame->puc, badframe_block); + /* Create the ucontext. */ - err |= __put_user(0, &frame->uc.uc_flags); - err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]); + unsafe_put_user(0, &frame->uc.uc_flags, badframe_block); + unsafe_save_altstack(&frame->uc.uc_stack, regs->gpr[1], badframe_block); if (MSR_TM_ACTIVE(msr)) { #ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* The ucontext_t passed to userland points to the second * ucontext_t (for transactional state) with its uc_link ptr. */ - err |= __put_user(&frame->uc_transact, &frame->uc.uc_link); + unsafe_put_user(&frame->uc_transact, &frame->uc.uc_link, badframe_block); + + user_write_access_end(); + err |= setup_tm_sigcontexts(&frame->uc.uc_mcontext, &frame->uc_transact.uc_mcontext, tsk, ksig->sig, NULL, (unsigned long)ksig->ka.sa.sa_handler, msr); + + if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) + goto badframe; + #endif } else { - err |= __put_user(0, &frame->uc.uc_link); - prepare_setup_sigcontext(tsk, 1); - if (!user_write_access_begin(frame, sizeof(struct rt_sigframe))) - return -EFAULT; - err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, - ksig->sig, NULL, - (unsigned long)ksig->ka.sa.sa_handler, 1); - user_write_access_end(); + unsafe_put_user(0, &frame->uc.uc_link, badframe_block); + unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig, + NULL, (unsigned long)ksig->ka.sa.sa_handler, + 1, badframe_block); } - err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)); - if (err) - goto badframe; + + unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block); + user_write_access_end(); /* Make sure signal handler doesn't get spurious FP exceptions */ tsk->thread.fp_state.fpscr = 0; @@ -900,6 +907,11 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, regs->nip = (unsigned long) &frame->tramp[0]; } + + /* Save the siginfo outside of the unsafe block. */ + if (copy_siginfo_to_user(&frame->info, &ksig->info)) + goto badframe; + /* Allocate a dummy caller frame for the signal handler. 
*/ newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE; err |= put_user(regs->gpr[1], (unsigned long __user *)newsp); @@ -939,6 +951,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set, return 0; +badframe_block: + user_write_access_end(); badframe: signal_fault(current, regs, "handle_rt_signal64", frame); -- 2.29.0
Re: [PATCH v5 10/10] powerpc/signal64: Use __get_user() to copy sigset_t
On Wed Feb 3, 2021 at 12:43 PM CST, Christopher M. Riedl wrote:
> Usually sigset_t is exactly 8B which is a "trivial" size and does not
> warrant using __copy_from_user(). Use __get_user() directly in
> anticipation of future work to remove the trivial size optimizations
> from __copy_from_user(). Calling __get_user() also results in a small
> boost to signal handling throughput here.
>
> Signed-off-by: Christopher M. Riedl

This patch triggered sparse warnings about 'different address spaces'.
This minor fixup cleans that up:

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 42fdc4a7ff72..1dfda6403e14 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -97,7 +97,7 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re
 #endif /* CONFIG_VSX */
 }

-static inline int get_user_sigset(sigset_t *dst, const sigset_t *src)
+static inline int get_user_sigset(sigset_t *dst, const sigset_t __user *src)
 {
 	if (sizeof(sigset_t) <= 8)
 		return __get_user(dst->sig[0], &src->sig[0]);
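For readers without the full v5 patch at hand, the helper being fixed presumably looks something like this after the fixup; only the signature change and the first branch appear in the quoted hunk, and the __copy_from_user() fallback is an assumption based on the commit message:

/* Assumed complete shape of the helper -- the else branch is inferred, not
 * quoted from the patch.
 */
static inline int get_user_sigset(sigset_t *dst, const sigset_t __user *src)
{
        if (sizeof(sigset_t) <= 8)
                return __get_user(dst->sig[0], &src->sig[0]);
        else
                return __copy_from_user(dst, src, sizeof(sigset_t));
}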
[PATCH v2] powerpc64/idle: Fix SP offsets when saving GPRs
The idle entry/exit code saves/restores GPRs in the stack "red zone" (Protected Zone according to PowerPC64 ELF ABI v2). However, the offset used for the first GPR is incorrect and overwrites the back chain - the Protected Zone actually starts below the current SP. In practice this is probably not an issue, but it's still incorrect so fix it. Also expand the comments to explain why using the stack "red zone" instead of creating a new stackframe is appropriate here. Signed-off-by: Christopher M. Riedl --- arch/powerpc/kernel/idle_book3s.S | 138 -- 1 file changed, 73 insertions(+), 65 deletions(-) diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S index 22f249b6f58d..f9e6d83e6720 100644 --- a/arch/powerpc/kernel/idle_book3s.S +++ b/arch/powerpc/kernel/idle_book3s.S @@ -52,28 +52,32 @@ _GLOBAL(isa300_idle_stop_mayloss) std r1,PACAR1(r13) mflrr4 mfcrr5 - /* use stack red zone rather than a new frame for saving regs */ - std r2,-8*0(r1) - std r14,-8*1(r1) - std r15,-8*2(r1) - std r16,-8*3(r1) - std r17,-8*4(r1) - std r18,-8*5(r1) - std r19,-8*6(r1) - std r20,-8*7(r1) - std r21,-8*8(r1) - std r22,-8*9(r1) - std r23,-8*10(r1) - std r24,-8*11(r1) - std r25,-8*12(r1) - std r26,-8*13(r1) - std r27,-8*14(r1) - std r28,-8*15(r1) - std r29,-8*16(r1) - std r30,-8*17(r1) - std r31,-8*18(r1) - std r4,-8*19(r1) - std r5,-8*20(r1) + /* +* Use the stack red zone rather than a new frame for saving regs since +* in the case of no GPR loss the wakeup code branches directly back to +* the caller without deallocating the stack frame first. +*/ + std r2,-8*1(r1) + std r14,-8*2(r1) + std r15,-8*3(r1) + std r16,-8*4(r1) + std r17,-8*5(r1) + std r18,-8*6(r1) + std r19,-8*7(r1) + std r20,-8*8(r1) + std r21,-8*9(r1) + std r22,-8*10(r1) + std r23,-8*11(r1) + std r24,-8*12(r1) + std r25,-8*13(r1) + std r26,-8*14(r1) + std r27,-8*15(r1) + std r28,-8*16(r1) + std r29,-8*17(r1) + std r30,-8*18(r1) + std r31,-8*19(r1) + std r4,-8*20(r1) + std r5,-8*21(r1) /* 168 bytes */ PPC_STOP b . /* catch bugs */ @@ -89,8 +93,8 @@ _GLOBAL(isa300_idle_stop_mayloss) */ _GLOBAL(idle_return_gpr_loss) ld r1,PACAR1(r13) - ld r4,-8*19(r1) - ld r5,-8*20(r1) + ld r4,-8*20(r1) + ld r5,-8*21(r1) mtlrr4 mtcrr5 /* @@ -98,25 +102,25 @@ _GLOBAL(idle_return_gpr_loss) * from PACATOC. This could be avoided for that less common case * if KVM saved its r2. 
*/ - ld r2,-8*0(r1) - ld r14,-8*1(r1) - ld r15,-8*2(r1) - ld r16,-8*3(r1) - ld r17,-8*4(r1) - ld r18,-8*5(r1) - ld r19,-8*6(r1) - ld r20,-8*7(r1) - ld r21,-8*8(r1) - ld r22,-8*9(r1) - ld r23,-8*10(r1) - ld r24,-8*11(r1) - ld r25,-8*12(r1) - ld r26,-8*13(r1) - ld r27,-8*14(r1) - ld r28,-8*15(r1) - ld r29,-8*16(r1) - ld r30,-8*17(r1) - ld r31,-8*18(r1) + ld r2,-8*1(r1) + ld r14,-8*2(r1) + ld r15,-8*3(r1) + ld r16,-8*4(r1) + ld r17,-8*5(r1) + ld r18,-8*6(r1) + ld r19,-8*7(r1) + ld r20,-8*8(r1) + ld r21,-8*9(r1) + ld r22,-8*10(r1) + ld r23,-8*11(r1) + ld r24,-8*12(r1) + ld r25,-8*13(r1) + ld r26,-8*14(r1) + ld r27,-8*15(r1) + ld r28,-8*16(r1) + ld r29,-8*17(r1) + ld r30,-8*18(r1) + ld r31,-8*19(r1) blr /* @@ -154,28 +158,32 @@ _GLOBAL(isa206_idle_insn_mayloss) std r1,PACAR1(r13) mflrr4 mfcrr5 - /* use stack red zone rather than a new frame for saving regs */ - std r2,-8*0(r1) - std r14,-8*1(r1) - std r15,-8*2(r1) - std r16,-8*3(r1) - std r17,-8*4(r1) - std r18,-8*5(r1) - std r19,-8*6(r1) - std r20,-8*7(r1) - std r21,-8*8(r1) - std r22,-8*9(r1) - std r23,-8*10(r1) - std r24,-8*11(r1) - std r25,-8*12(r1) - std r26,-8*13(r1) - std r27,-8*14(r1) - std r28,-8*15(r1) - std r29,-8*16(r1) - std r30,-8*17(r1) - std r31,-8*18(r1) - std r4,-8*
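A quick sanity check of the offsets (my reading of the ELF v2 ABI, stated as an assumption rather than quoted from the spec): the protected zone is the 288 bytes strictly below the stack pointer, so 0(r1) holds the back chain and the first usable slot is -8(r1).

        /* Assumed stack layout around the idle entry SP:
         *
         *    0(r1)    back chain          <- clobbered by the old -8*0(r1) store
         *   -8(r1)    first red-zone slot <- new -8*1(r1) store
         *    ...
         * -288(r1)    bottom of the protected zone
         *
         * 21 slots (r2, r14-r31, r4, r5) * 8 bytes = 168 bytes, well within 288.
         */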
Re: [PATCH 2/8] powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
On Sat Feb 6, 2021 at 10:32 AM CST, Christophe Leroy wrote: > > > Le 20/10/2020 à 04:01, Christopher M. Riedl a écrit : > > On Fri Oct 16, 2020 at 10:48 AM CDT, Christophe Leroy wrote: > >> > >> > >> Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit : > >>> Reuse the "safe" implementation from signal.c except for calling > >>> unsafe_copy_from_user() to copy into a local buffer. Unlike the > >>> unsafe_copy_{vsx,fpr}_to_user() functions the "copy from" functions > >>> cannot use unsafe_get_user() directly to bypass the local buffer since > >>> doing so significantly reduces signal handling performance. > >> > >> Why can't the functions use unsafe_get_user(), why does it significantly > >> reduces signal handling > >> performance ? How much significant ? I would expect that not going > >> through an intermediate memory > >> area would be more efficient > >> > > > > Here is a comparison, 'unsafe-signal64-regs' avoids the intermediate buffer: > > > > | | hash | radix | > > | | -- | -- | > > | linuxppc/next| 289014 | 158408 | > > | unsafe-signal64 | 298506 | 253053 | > > | unsafe-signal64-regs | 254898 | 220831 | > > > > I have not figured out the 'why' yet. As you mentioned in your series, > > technically calling __copy_tofrom_user() is overkill for these > > operations. The only obvious difference between unsafe_put_user() and > > unsafe_get_user() is that we don't have asm-goto for the 'get' variant. > > Instead we wrap with unsafe_op_wrap() which inserts a conditional and > > then goto to the label. > > > > Implemenations: > > > > #define unsafe_copy_fpr_from_user(task, from, label) do {\ > >struct task_struct *__t = task; \ > >u64 __user *buf = (u64 __user *)from; \ > >int i; \ > >\ > >for (i = 0; i < ELF_NFPREG - 1; i++)\ > >unsafe_get_user(__t->thread.TS_FPR(i), &buf[i], label); \ > >unsafe_get_user(__t->thread.fp_state.fpscr, &buf[i], label);\ > > } while (0) > > > > #define unsafe_copy_vsx_from_user(task, from, label) do {\ > >struct task_struct *__t = task; \ > >u64 __user *buf = (u64 __user *)from; \ > >int i; \ > >\ > >for (i = 0; i < ELF_NVSRHALFREG ; i++) \ > > > > unsafe_get_user(__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET], \ > >&buf[i], label);\ > > } while (0) > > > > Do you have CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP enabled in > your config ? I don't have these set in my config (ppc64le_defconfig). I think I figured this out - the reason for the lower signal throughput is the barrier_nospec() in __get_user_nocheck(). When looping we incur that cost on every iteration. Commenting it out results in signal performance of ~316K w/ hash on the unsafe-signal64-regs branch. Obviously the barrier is there for a reason but it is quite costly. This also explains why the copy_{fpr,vsx}_to_user() direction does not suffer from the slowdown because there is no need for barrier_nospec(). > > If yes, could you try together with the patch from Alexey > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210204121612.32721-1-...@ozlabs.ru/ > ? > > Thanks > Christophe
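To make the cost concrete, the two shapes being compared are roughly the ones below; this is only a sketch of the idea behind the referenced barrier patch, and unsafe_get_user_nobarrier() is a hypothetical name, not an existing macro:

        /* Current shape: barrier_nospec() executes on every iteration. */
        for (i = 0; i < ELF_NFPREG - 1; i++)
                unsafe_get_user(__t->thread.TS_FPR(i), &buf[i], label);

        /* Idea behind the fix: pay the speculation barrier once up front. */
        barrier_nospec();
        for (i = 0; i < ELF_NFPREG - 1; i++)
                unsafe_get_user_nobarrier(__t->thread.TS_FPR(i), &buf[i], label);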
Re: [PATCH 2/8] powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
On Sun Feb 7, 2021 at 4:12 AM CST, Christophe Leroy wrote:
>
> Le 06/02/2021 à 18:39, Christopher M. Riedl a écrit :
> > On Sat Feb 6, 2021 at 10:32 AM CST, Christophe Leroy wrote:
> >>
> >> Le 20/10/2020 à 04:01, Christopher M. Riedl a écrit :
> >>> On Fri Oct 16, 2020 at 10:48 AM CDT, Christophe Leroy wrote:
> >>>>
> >>>> Le 15/10/2020 à 17:01, Christopher M. Riedl a écrit :
> >>>>> Reuse the "safe" implementation from signal.c except for calling
> >>>>> unsafe_copy_from_user() to copy into a local buffer. Unlike the
> >>>>> unsafe_copy_{vsx,fpr}_to_user() functions the "copy from" functions
> >>>>> cannot use unsafe_get_user() directly to bypass the local buffer since
> >>>>> doing so significantly reduces signal handling performance.
> >>>>
> >>>> Why can't the functions use unsafe_get_user(), why does it significantly
> >>>> reduces signal handling performance ? How much significant ? I would
> >>>> expect that not going through an intermediate memory area would be more
> >>>> efficient
> >>>>
> >>>
> >>> Here is a comparison, 'unsafe-signal64-regs' avoids the intermediate
> >>> buffer:
> >>>
> >>> |                      | hash   | radix  |
> >>> |                      | ------ | ------ |
> >>> | linuxppc/next        | 289014 | 158408 |
> >>> | unsafe-signal64      | 298506 | 253053 |
> >>> | unsafe-signal64-regs | 254898 | 220831 |
> >>>
> >>> I have not figured out the 'why' yet. As you mentioned in your series,
> >>> technically calling __copy_tofrom_user() is overkill for these
> >>> operations. The only obvious difference between unsafe_put_user() and
> >>> unsafe_get_user() is that we don't have asm-goto for the 'get' variant.
> >>> Instead we wrap with unsafe_op_wrap() which inserts a conditional and
> >>> then goto to the label.
> >>>
> >>> Implementations:
> >>>
> >>> #define unsafe_copy_fpr_from_user(task, from, label)   do {             \
> >>>         struct task_struct *__t = task;                                 \
> >>>         u64 __user *buf = (u64 __user *)from;                           \
> >>>         int i;                                                          \
> >>>                                                                         \
> >>>         for (i = 0; i < ELF_NFPREG - 1; i++)                            \
> >>>                 unsafe_get_user(__t->thread.TS_FPR(i), &buf[i], label); \
> >>>         unsafe_get_user(__t->thread.fp_state.fpscr, &buf[i], label);    \
> >>> } while (0)
> >>>
> >>> #define unsafe_copy_vsx_from_user(task, from, label)   do {             \
> >>>         struct task_struct *__t = task;                                 \
> >>>         u64 __user *buf = (u64 __user *)from;                           \
> >>>         int i;                                                          \
> >>>                                                                         \
> >>>         for (i = 0; i < ELF_NVSRHALFREG ; i++)                          \
> >>>                 unsafe_get_user(__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET], \
> >>>                                 &buf[i], label);                        \
> >>> } while (0)
> >>>
> >>
> >> Do you have CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP enabled in
> >> your config ?
> >
> > I don't have these set in my config (ppc64le_defconfig). I think I
> > figured this out - the reason for the lower signal throughput is the
> > barrier_nospec() in __get_user_nocheck(). When looping we incur that
> > cost on every iteration. Commenting it out results in signal performance
> > of ~316K w/ hash on the unsafe-signal64-regs branch. Obviously the
> > barrier is there for a reason but it is quite costly.
>
> Interesting.
>
> Can you try with the patch I just sent out
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/c72f014730823b413528e90ab6c4d3bcb79f8497.1612692067.git.christophe.le...@csgroup.eu/

Yeah, that patch solves the problem. Using unsafe_get_user() in a loop is
actually faster on radix than going through the intermediary buffer.
A summary of results below (unsafe-signal64-v6 uses unsafe_get_user() and
avoids the local buffer):

|                                  | hash   | radix  |
|                                  | ------ | ------ |
| unsafe-signal64-v5               | 194533 | 230089 |
| unsafe-signal64-v6               | 176739 | 202840 |
| unsafe-signal64-v5+barrier patch | 203037 | 234936 |
| unsafe-signal64-v6+barrier patch | 205484 | 241030 |

I am still expecting some comments/feedback on my v5 before sending out v6.
Should I include your patch in my series as well?

>
> Thanks
> Christophe
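[Editorial note] For illustration only, here is a minimal sketch of the
"no intermediate buffer" FPR copy discussed above, assuming (as the linked
patch appears to do) that the speculation barrier is paid once when the
unsafe user-access region is opened rather than inside every
__get_user_nocheck(). The loop body is taken from the macro quoted in the
thread; the wrapper function, its name and its error-return convention are
invented for the example and are not the actual v6 code.

static int copy_fprs_from_frame(struct task_struct *tsk, u64 __user *buf)
{
	int i;

	/*
	 * Open a single unsafe region. With barrier_nospec() performed by
	 * the access-begin helper, its cost is incurred once here instead
	 * of on every unsafe_get_user() in the loop below.
	 */
	if (!user_read_access_begin(buf, ELF_NFPREG * sizeof(u64)))
		return 1;

	for (i = 0; i < ELF_NFPREG - 1; i++)
		unsafe_get_user(tsk->thread.TS_FPR(i), &buf[i], efault);
	unsafe_get_user(tsk->thread.fp_state.fpscr, &buf[i], efault);

	user_read_access_end();
	return 0;

efault:
	user_read_access_end();
	return 1;
}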
Re: [PATCH v5 10/10] powerpc/signal64: Use __get_user() to copy sigset_t
On Tue Feb 9, 2021 at 3:45 PM CST, Christophe Leroy wrote:
> "Christopher M. Riedl" a écrit :
>
> > Usually sigset_t is exactly 8B which is a "trivial" size and does not
> > warrant using __copy_from_user(). Use __get_user() directly in
> > anticipation of future work to remove the trivial size optimizations
> > from __copy_from_user(). Calling __get_user() also results in a small
> > boost to signal handling throughput here.
> >
> > Signed-off-by: Christopher M. Riedl
> > ---
> >  arch/powerpc/kernel/signal_64.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/signal_64.c
> > b/arch/powerpc/kernel/signal_64.c
> > index 817b64e1e409..42fdc4a7ff72 100644
> > --- a/arch/powerpc/kernel/signal_64.c
> > +++ b/arch/powerpc/kernel/signal_64.c
> > @@ -97,6 +97,14 @@ static void prepare_setup_sigcontext(struct
> > task_struct *tsk, int ctx_has_vsx_re
> >  #endif /* CONFIG_VSX */
> >  }
> >
> > +static inline int get_user_sigset(sigset_t *dst, const sigset_t *src)
>
> Should be called __get_user_sigset() as it is a helper for __get_user()

Ok, makes sense.

>
> > +{
> > +	if (sizeof(sigset_t) <= 8)
>
> We should always use __get_user(), see below.
>
> > +		return __get_user(dst->sig[0], &src->sig[0]);
>
> I think the above will not work on ppc32, it will only copy 4 bytes.
> You must cast the source to u64*

Well this is signal_64.c :) Looks like ppc32 needs the same thing, so
I'll just move this into signal.h and use it for both. The only
exception would be the COMPAT case in signal_32.c which ends up calling
the common get_compat_sigset(). Updating that is probably outside the
scope of this series.

>
> > +	else
> > +		return __copy_from_user(dst, src, sizeof(sigset_t));
>
> I see no point in keeping this alternative. Today sigset_t is fixed.
> If you fear one day someone might change it to something different
> than a u64, just add a BUILD_BUG_ON(sizeof(sigset_t) != sizeof(u64));

Ah yes, that is much better - thanks for the suggestion.

>
> > +}
> > +
> >  /*
> >   * Set up the sigcontext for the signal frame.
> >   */
> > @@ -701,8 +709,9 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext
> > __user *, old_ctx,
> >  	 * We kill the task with a SIGSEGV in this situation.
> >  	 */
> >
> > -	if (__copy_from_user(&set, &new_ctx->uc_sigmask, sizeof(set)))
> > +	if (get_user_sigset(&set, &new_ctx->uc_sigmask))
> >  		do_exit(SIGSEGV);
> > +
>
> This white space is not part of the change, keep patches to the
> minimum, avoid cosmetic changes.

Just a (bad?) habit on my part that I missed - I'll remove this one and
the one further below.

>
> >  	set_current_blocked(&set);
> >
> >  	if (!user_read_access_begin(new_ctx, ctx_size))
> > @@ -740,8 +749,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
> >  	if (!access_ok(uc, sizeof(*uc)))
> >  		goto badframe;
> >
> > -	if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
> > +	if (get_user_sigset(&set, &uc->uc_sigmask))
> >  		goto badframe;
> > +
>
> Same
>
> >  	set_current_blocked(&set);
> >
> >  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> > --
> > 2.26.1
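[Editorial note] For reference, a sketch of the helper with the review
feedback above folded in: renamed to __get_user_sigset(), BUILD_BUG_ON()
instead of the runtime size check, and the source cast to u64 __user * so
the same helper can be shared with the 32-bit path. This is only an
illustration and may not match the exact code in the next revision of the
series.

static inline int __get_user_sigset(sigset_t *dst, const sigset_t __user *src)
{
	/*
	 * sigset_t is a single u64 today; catch any future change at build
	 * time rather than keeping a __copy_from_user() fallback.
	 */
	BUILD_BUG_ON(sizeof(sigset_t) != sizeof(u64));

	return __get_user(dst->sig[0], (u64 __user *)&src->sig[0]);
}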