from:"Josh Triplett"

Re: linux-next: manual merge of the tty tree with the input tree

2013-01-28 Thread Josh Triplett

On Mon, Jan 28, 2013 at 02:44:43PM -0800, Dmitry Torokhov wrote:
> On Mon, Jan 28, 2013 at 02:09:31PM -0800, Joe Millenbach wrote:
> > On Mon, Jan 28, 2013 at 9:33 AM, Dmitry Torokhov
> >  wrote:
> > > On Mon, Jan 28, 2013 at 06:46:15AM -0800, Greg KH wrote:
> > >> On Mon, Jan 28, 2013 at 08:44:24PM +1100, Stephen Rothwell wrote:
> > >> > Hi Greg,
> > >> >
> > >> > Today's linux-next merge of the tty tree got a conflict in
> > >> > drivers/input/keyboard/Kconfig between commit 6f2ac009f29b ("Input:
> > >> > goldfish - virtual input event driver") from the input tree and commit
> > >> > 4f73bc4dd3e8 ("tty: Added a CONFIG_TTY option to allow removal of TTY")
> > >> > from the tty tree.
> > >> >
> > >> > I fixed it up (see below - I am not sure if GOLDFISH_EVENTS needs TTY or
> > >> > not) and can carry the fix as necessary (no action is required).
> > >> >
> > >> > --
> > >> > Cheers,
> > >> > Stephen Rothwell    s...@canb.auug.org.au
> > >> >
> > >> > diff --cc drivers/input/keyboard/Kconfig
> > >> > index 078305e,008f96a..000
> > >> > --- a/drivers/input/keyboard/Kconfig
> > >> > +++ b/drivers/input/keyboard/Kconfig
> > >> > @@@ -479,16 -482,8 +482,18 @@@ config KEYBOARD_SAMSUN
> > >> >   To compile this driver as a module, choose M here: the
> > >> >   module will be called samsung-keypad.
> > >> >
> > >> > + if TTY
> > >> > +
> > >> >  +config KEYBOARD_GOLDFISH_EVENTS
> > >> >  +  depends on GOLDFISH
> > >> >  +  tristate "Generic Input Event device for Goldfish"
> > >> >  +  help
> > >> >  +Say Y here to get an input event device for the Goldfish virtual
> > >>
> > >> Looks good, thanks.
> > >
> > > Greg,
> > >
> > > Please drop 4f73bc4dd3e8563ef4109f293a092820dff66d92, at least the parts
> > > related to input. As far as I know nothing except serport driver
> > > depends on tty and we do not need to introduce this kind of
> > > dependency. Anyone needing slim config can simply try disabling
> > > input (or parts of it) without needing artificial dependencies.
> > >
> > > Thanks.
> > >
> > > --
> > > Dmitry
> > 
> > Dmitry and Greg,
> > 
> > SERIO needs TTY, 
> 
> No it does not. There is only one (1) serio driver that needs the tty
> layer and that is serport.
> 
> > and the majority of the input changes are adding "depends on TTY" to
> > things that "select SERIO" as they break the dependency chain.  In
> > other words, enabling a component that selects SERIO will turn SERIO
> > on even when SERIO depends on TTY and TTY is disabled.
> 
> Except that it does not. Are you confusing SERIO with SERIAL by any
> chance?

A few serial drivers don't actually need the TTY layer.  However, most
do, including many that don't obviously appear to at first glance.  For
instance, MOUSE_PS2 doesn't *appear* to need TTY, but without the
dependency, having MOUSE_PS2 enabled and TTY disabled produces a kernel
that doesn't build.  (Also keep in mind that many other headers include
.)  Many of the drivers that don't actually need TTY
nonetheless won't typically appear on systems small enough to want to
compile out TTY.

In any case, it seems simple enough to whittle down dependencies on TTY
later on, but for a first pass, the conservative approach seems
preferable.
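
To make the select-versus-depends point concrete, here is a minimal sketch in C of how Kconfig resolves a boolean symbol. It is a toy model, not the real scripts/kconfig code: struct kconfig_sym and kconfig_value() are hypothetical names, and the "SERIO depends on TTY" relationship is taken from Joe's description in this thread rather than checked against the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one boolean Kconfig symbol (hypothetical, for illustration). */
struct kconfig_sym {
	bool user_choice;   /* what the user answered at the prompt */
	bool deps_met;      /* result of evaluating its "depends on" */
	bool selected;      /* some enabled symbol has "select" on it */
};

/*
 * "select" forces the symbol on without consulting its dependencies;
 * otherwise the user's choice only counts when the dependencies hold.
 */
bool kconfig_value(const struct kconfig_sym *sym)
{
	assert(sym != NULL);
	return sym->selected || (sym->user_choice && sym->deps_met);
}
```

With TTY=n, a symbol that depends on TTY has deps_met = false, yet any enabled driver that selects it sets selected = true and kconfig_value() returns true. That is the broken chain being described; adding "depends on TTY" to the selecting symbol is the conservative way to avoid it.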

> In the future it would be nice if you CCed people involved in the
> subsystem you are changing.

Such as the maintainers of the TTY and serial layers?  Alan Cox OKed
this conservative addition of dependencies in the original version of
this change, and Greg had no complaints at the time.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tty tree with the input tree

2013-01-28 Thread Josh Triplett

On Mon, Jan 28, 2013 at 04:23:57PM -0800, Dmitry Torokhov wrote:
> On Tue, Jan 29, 2013 at 10:59:17AM +1100, Josh Triplett wrote:
> > On Mon, Jan 28, 2013 at 02:44:43PM -0800, Dmitry Torokhov wrote:
> > > On Mon, Jan 28, 2013 at 02:09:31PM -0800, Joe Millenbach wrote:
> > > > On Mon, Jan 28, 2013 at 9:33 AM, Dmitry Torokhov
> > > >  wrote:
> > > > > On Mon, Jan 28, 2013 at 06:46:15AM -0800, Greg KH wrote:
> > > > >> On Mon, Jan 28, 2013 at 08:44:24PM +1100, Stephen Rothwell wrote:
> > > > >> > Hi Greg,
> > > > >> >
> > > > >> > Today's linux-next merge of the tty tree got a conflict in
> > > > >> > drivers/input/keyboard/Kconfig between commit 6f2ac009f29b ("Input:
> > > > >> > goldfish - virtual input event driver") from the input tree and commit
> > > > >> > 4f73bc4dd3e8 ("tty: Added a CONFIG_TTY option to allow removal of TTY")
> > > > >> > from the tty tree.
> > > > >> >
> > > > >> > I fixed it up (see below - I am not sure if GOLDFISH_EVENTS needs TTY or
> > > > >> > not) and can carry the fix as necessary (no action is required).
> > > > >> >
> > > > >> > --
> > > > >> > Cheers,
> > > > >> > Stephen Rothwell    s...@canb.auug.org.au
> > > > >> >
> > > > >> > diff --cc drivers/input/keyboard/Kconfig
> > > > >> > index 078305e,008f96a..000
> > > > >> > --- a/drivers/input/keyboard/Kconfig
> > > > >> > +++ b/drivers/input/keyboard/Kconfig
> > > > >> > @@@ -479,16 -482,8 +482,18 @@@ config KEYBOARD_SAMSUN
> > > > >> >   To compile this driver as a module, choose M here: the
> > > > >> >   module will be called samsung-keypad.
> > > > >> >
> > > > >> > + if TTY
> > > > >> > +
> > > > >> >  +config KEYBOARD_GOLDFISH_EVENTS
> > > > >> >  +  depends on GOLDFISH
> > > > >> >  +  tristate "Generic Input Event device for Goldfish"
> > > > >> >  +  help
> > > > >> >  +Say Y here to get an input event device for the Goldfish virtual
> > > > >>
> > > > >> Looks good, thanks.
> > > > >
> > > > > Greg,
> > > > >
> > > > > Please drop 4f73bc4dd3e8563ef4109f293a092820dff66d92, at least the parts
> > > > > related to input. As far as I know nothing except serport driver
> > > > > depends on tty and we do not need to introduce this kind of
> > > > > dependency. Anyone needing slim config can simply try disabling
> > > > > input (or parts of it) without needing artificial dependencies.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > --
> > > > > Dmitry
> > > > 
> > > > Dmitry and Greg,
> > > > 
> > > > SERIO needs TTY, 
> > > 
> > > No it does not. There is only one (1) serio driver that needs the tty
> > > layer and that is serport.
> > > 
> > > > and the majority of the input changes are adding "depends on TTY" to
> > > > things that "select SERIO" as they break the dependency chain.  In
> > > > other words, enabling a component that selects SERIO will turn SERIO
> > > > on even when SERIO depends on TTY and TTY is disabled.
> > > 
> > > Except that it does not. Are you confusing SERIO with SERIAL by any
> > > chance?
> > 
> > A few serial drivers don't actually need the TTY layer.  However, most
> 
> Can you please tell me why you are talking about serial layer here?

Probably terminology sloppiness.  Insert noun for "things that depend on
SERIO" here.

> > do, including many that don't obviously appear to at first glance.  For
> > instance, MOUSE_PS2 doesn't *appear* to need TTY, but without the
> > dependency, having MOUSE_PS2 enabled and TTY disabled produces a kernel
> > that doesn't build.
> 
> Compile log please.

http://marc.info/?l=linux-kernel&m=134555498507747&w=1

IIRC, produced by dropping the

Re: [tip:x86/debug] x86/EFI: Properly init-annotate BGRT code

2013-01-24 Thread Josh Triplett

On Thu, Jan 24, 2013 at 12:34:21PM -0800, tip-bot for Jan Beulich wrote:
> Commit-ID:  13f0e4d2b9e2209f13d5a4122478eb79e6136870
> Gitweb: http://git.kernel.org/tip/13f0e4d2b9e2209f13d5a4122478eb79e6136870
> Author: Jan Beulich 
> AuthorDate: Fri, 23 Nov 2012 16:30:07 +
> Committer:  Ingo Molnar 
> CommitDate: Thu, 24 Jan 2013 17:12:18 +0100
> 
> x86/EFI: Properly init-annotate BGRT code
> 
> These items are only ever referenced from initialization code.

Not true, and this patch will break the BGRT code.  bgrt_init, which
does indeed have an __init annotation, stores bgrt_image and
bgrt_image_size into the .private and .size fields of a sysfs
bin_attribute, which does *not* have an __initdata annotation, and which
will get read whenever the user reads the corresponding sysfs attribute.

> Signed-off-by: Jan Beulich 
> Cc: 
> Link: http://lkml.kernel.org/r/50afb29f0278000aa...@nat28.tlf.novell.com
> Signed-off-by: Ingo Molnar 
> ---
>  arch/x86/platform/efi/efi-bgrt.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/platform/efi/efi-bgrt.c b/arch/x86/platform/efi/efi-bgrt.c
> index d9c1b95..7145ec6 100644
> --- a/arch/x86/platform/efi/efi-bgrt.c
> +++ b/arch/x86/platform/efi/efi-bgrt.c
> @@ -11,20 +11,21 @@
>   * published by the Free Software Foundation.
>   */
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
>  struct acpi_table_bgrt *bgrt_tab;
> -void *bgrt_image;
> -size_t bgrt_image_size;
> +void *__initdata bgrt_image;
> +size_t __initdata bgrt_image_size;
>  
>  struct bmp_header {
>   u16 id;
>   u32 size;
>  } __packed;
>  
> -void efi_bgrt_init(void)
> +void __init efi_bgrt_init(void)
>  {
>   acpi_status status;
>   void __iomem *image;

Re: [tip:x86/debug] x86/EFI: Properly init-annotate BGRT code

2013-01-25 Thread Josh Triplett

On Fri, Jan 25, 2013 at 07:45:42AM +, Jan Beulich wrote:
> >>> On 24.01.13 at 23:28, Josh Triplett  wrote:
> > On Thu, Jan 24, 2013 at 12:34:21PM -0800, tip-bot for Jan Beulich wrote:
> >> Commit-ID:  13f0e4d2b9e2209f13d5a4122478eb79e6136870
> >> Gitweb: http://git.kernel.org/tip/13f0e4d2b9e2209f13d5a4122478eb79e6136870
> >> Author: Jan Beulich 
> >> AuthorDate: Fri, 23 Nov 2012 16:30:07 +
> >> Committer:  Ingo Molnar 
> >> CommitDate: Thu, 24 Jan 2013 17:12:18 +0100
> >> 
> >> x86/EFI: Properly init-annotate BGRT code
> >> 
> >> These items are only ever referenced from initialization code.
> > 
> > Not true, and this patch will break the BGRT code.  bgrt_init, which
> > does indeed have an __init annotation, stores bgrt_image and
> > bgrt_image_size into the .private and .size fields of a sysfs
> > bin_attribute, which does *not* have an __initdata annotation, and which
> > will get read whenever the user reads the corresponding sysfs attribute.
> 
> Copying init-only data into a sysfs structure is no problem at all
> - that structure obviously is non-__initdata and hence can be
> read at any time. It was a different thing if .private and/or .size
> stored _pointers_ to one of the two variables in question.

Ah, I see; the data itself gets kmalloc'd, and you just want to discard
the original pointer and size.  Fair enough.  Sorry for the false alarm.
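
A user-space sketch of the distinction Jan draws above, for anyone who trips over the same thing. It assumes the image lives in a separately allocated buffer, as described in this reply; malloc stands in for kmalloc, and fake_attr, init_image, init_image_size, fake_bgrt_init(), and fake_free_initmem() are hypothetical stand-ins, not the efi-bgrt.c code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the long-lived sysfs bin_attribute (hypothetical). */
struct fake_attr {
	void *private;
	size_t size;
};

/* Stand-ins for the __initdata variables; discarded after init. */
void *init_image;
size_t init_image_size;

/* Init-time code: allocate the image, then copy the VALUES into attr. */
void fake_bgrt_init(struct fake_attr *attr, const void *src, size_t len)
{
	assert(attr != NULL && src != NULL);
	init_image = malloc(len);
	assert(init_image != NULL);
	memcpy(init_image, src, len);
	init_image_size = len;

	attr->private = init_image;     /* copies the pointer value */
	attr->size = init_image_size;   /* copies the size value */
}

/* Models freeing init memory: the variables themselves become unusable. */
void fake_free_initmem(void)
{
	init_image = NULL;
	init_image_size = 0;
}
```

After fake_free_initmem(), attr.private and attr.size are still valid because they hold copies of the values. Had the attribute stored &init_image instead, a later read would go through discarded memory, which is the case Jan says would have been a problem.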

- Josh Triplett

Re: [tip:x86/debug] x86/EFI: Properly init-annotate BGRT code

2013-01-25 Thread Josh Triplett

On Fri, Jan 25, 2013 at 07:36:57PM +0100, Ingo Molnar wrote:
> 
> * Josh Triplett  wrote:
> 
> > On Fri, Jan 25, 2013 at 07:45:42AM +, Jan Beulich wrote:
> > > >>> On 24.01.13 at 23:28, Josh Triplett  wrote:
> > > > On Thu, Jan 24, 2013 at 12:34:21PM -0800, tip-bot for Jan Beulich wrote:
> > > >> Commit-ID:  13f0e4d2b9e2209f13d5a4122478eb79e6136870
> > > >> Gitweb: http://git.kernel.org/tip/13f0e4d2b9e2209f13d5a4122478eb79e6136870
> > > >> Author: Jan Beulich 
> > > >> AuthorDate: Fri, 23 Nov 2012 16:30:07 +
> > > >> Committer:  Ingo Molnar 
> > > >> CommitDate: Thu, 24 Jan 2013 17:12:18 +0100
> > > >> 
> > > >> x86/EFI: Properly init-annotate BGRT code
> > > >> 
> > > >> These items are only ever referenced from initialization code.
> > > > 
> > > > Not true, and this patch will break the BGRT code.  bgrt_init, which
> > > > does indeed have an __init annotation, stores bgrt_image and
> > > > bgrt_image_size into the .private and .size fields of a sysfs
> > > > bin_attribute, which does *not* have an __initdata annotation, and which
> > > > will get read whenever the user reads the corresponding sysfs attribute.
> > > 
> > > Copying init-only data into a sysfs structure is no problem at all
> > > - that structure obviously is non-__initdata and hence can be
> > > read at any time. It was a different thing if .private and/or .size
> > > stored _pointers_ to one of the two variables in question.
> > 
> > Ah, I see; the data itself gets kmalloc'd, and you just want 
> > to discard the original pointer and size.  Fair enough.  Sorry 
> > for the false alarm.
> 
> Ok - thanks for the clarification - I'll keep the commit as-is, 
> agreed?
> 
> Thanks,

Yeah.  In fact:

Reviewed-by: Josh Triplett 

Re: [PATCH tip/core/rcu 4/4] rcu: Make rcutorture's shuffler task shuffle recently added tasks

2013-01-27 Thread Josh Triplett

On Sat, Jan 26, 2013 at 04:05:20PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> A number of kthreads have been added to rcutorture, but the shuffler
> task was not informed of them, and thus did not shuffle them.  This
> commit therefore adds the requisite shuffling.
> 
> Signed-off-by: Paul E. McKenney 

This also makes an unrelated semantic change, and several unrelated
whitespace changes.

> ---
>  kernel/rcutorture.c |   24 
>  1 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> index a583f1c..3ebc8bf 100644
> --- a/kernel/rcutorture.c
> +++ b/kernel/rcutorture.c
> @@ -846,7 +846,7 @@ static int rcu_torture_boost(void *arg)
>   /* Wait for the next test interval. */
>   oldstarttime = boost_starttime;
>   while (ULONG_CMP_LT(jiffies, oldstarttime)) {
> - schedule_timeout_uninterruptible(1);
> + schedule_timeout_interruptible(oldstarttime - jiffies);

This change doesn't seem related, and the commit message doesn't explain
it either.  Could you split it out into a separate commit and document
the rationale, please?
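
For context on what the changed line computes, a hedged user-space sketch. ULONG_CMP_LT below mirrors the wraparound-safe definition in include/linux/rcupdate.h (quoted from memory, so treat it as an assumption); ticks_until() is a hypothetical helper, and the counter values are plain arguments rather than the real jiffies.

```c
#include <assert.h>
#include <limits.h>

/* Wraparound-safe "a is before b" for free-running unsigned long counters. */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/*
 * Hypothetical helper: ticks left until 'start', or 0 if it has passed.
 * The old code slept 1 tick per loop iteration; the new code sleeps for
 * this whole remainder in a single call.
 */
unsigned long ticks_until(unsigned long now, unsigned long start)
{
	unsigned long remaining;

	if (!ULONG_CMP_LT(now, start))
		return 0;
	remaining = start - now;   /* unsigned subtraction handles counter wrap */
	assert(remaining != 0);    /* "before" implies now != start */
	return remaining;
}
```

The other half of the change is the switch to the interruptible variant, which can return before the timeout expires if the task is woken; that is presumably why the surrounding while loop re-checks the condition. Neither point is stated in the commit message.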

>   rcu_stutter_wait("rcu_torture_boost");
>   if (kthread_should_stop() ||
>   fullstop != FULLSTOP_DONTSTOP)
> @@ -1318,19 +1318,35 @@ static void rcu_torture_shuffle_tasks(void)
>   set_cpus_allowed_ptr(reader_tasks[i],
>shuffle_tmp_mask);
>   }
> -
>   if (fakewriter_tasks) {
>   for (i = 0; i < nfakewriters; i++)
>   if (fakewriter_tasks[i])
>   set_cpus_allowed_ptr(fakewriter_tasks[i],
>shuffle_tmp_mask);
>   }
> -
>   if (writer_task)
>   set_cpus_allowed_ptr(writer_task, shuffle_tmp_mask);
> -

These three whitespace changes seem unrelated as well.

>   if (stats_task)
>   set_cpus_allowed_ptr(stats_task, shuffle_tmp_mask);
> + if (stutter_task)
> + set_cpus_allowed_ptr(stutter_task, shuffle_tmp_mask);
> + if (fqs_task)
> + set_cpus_allowed_ptr(fqs_task, shuffle_tmp_mask);
> + if (shutdown_task)
> + set_cpus_allowed_ptr(shutdown_task, shuffle_tmp_mask);
> +#ifdef CONFIG_HOTPLUG_CPU
> + if (onoff_task)
> + set_cpus_allowed_ptr(onoff_task, shuffle_tmp_mask);
> +#endif /* #ifdef CONFIG_HOTPLUG_CPU */
> + if (stall_task)
> + set_cpus_allowed_ptr(stall_task, shuffle_tmp_mask);
> + if (barrier_cbs_tasks)
> + for (i = 0; i < n_barrier_cbs; i++)
> + if (barrier_cbs_tasks[i])
> + set_cpus_allowed_ptr(barrier_cbs_tasks[i],
> +  shuffle_tmp_mask);
> + if (barrier_task)
> + set_cpus_allowed_ptr(barrier_task, shuffle_tmp_mask);

The rest of this seems fine.

- Josh Triplett

Re: [PATCH tip/core/rcu 2/2] rcu: Trace callback acceleration

2013-01-27 Thread Josh Triplett

You probably don't want to use --chain-reply-to; that makes patch N a
reply to patch N-1 rather than to the cover letter, which creates much
deeper and harder to follow threads.
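
A small sketch of the threading difference, assuming message 0 is the cover letter and messages 1..N are the patches. reply_parent() and thread_depth() are hypothetical helpers that model the In-Reply-To choice; they are not anything from git itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Index of the message that patch n (n >= 1) replies to. */
int reply_parent(int n, bool chain_reply_to)
{
	assert(n >= 1);
	return chain_reply_to ? n - 1 : 0;
}

/* Nesting depth of patch n below the cover letter. */
int thread_depth(int n, bool chain_reply_to)
{
	int depth = 0;

	assert(n >= 1);
	while (n != 0) {
		n = reply_parent(n, chain_reply_to);
		depth++;
	}
	return depth;
}
```

With chaining, patch 4 of a series sits four levels deep; without it, every patch sits one level below the cover letter.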

- Josh Triplett

Re: [PATCH tip/core/rcu 1/2] rcu: Provide RCU CPU stall warnings for tiny RCU

2013-01-27 Thread Josh Triplett

On Sat, Jan 26, 2013 at 04:23:46PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Tiny RCU has historically omitted RCU CPU stall warnings in order to
> reduce memory requirements, however, lack of these warnings caused
> Thomas Gleixner some debugging pain recently.  Therefore, this commit
> adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y.  This keeps
> the memory footprint small, while still enabling CPU stall warnings
> in kernels built to enable them.
> 
> Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON
> config variable to simplify #if expressions.
> 
> Reported-by: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

One suggestion below; with that change,
Reviewed-by: Josh Triplett 

> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -486,6 +486,14 @@ config PREEMPT_RCU
> This option enables preemptible-RCU code that is common between
> the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
>  
> +config RCU_STALL_COMMON
> + def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
> + help
> +   This option enables RCU CPU stall code that is common between
> +   the TINY and TREE variants of RCU.  The purpose is to allow
> +   the tiny variants to disable RCU CPU stall warnings, while
> +   making these warnings mandatory for the tree variants.
> +
[...]
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -970,7 +970,7 @@ config RCU_TORTURE_TEST_RUNNABLE
>  
>  config RCU_CPU_STALL_TIMEOUT
>   int "RCU CPU stall timeout in seconds"
> - depends on TREE_RCU || TREE_PREEMPT_RCU
> + depends on TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE

depends on RCU_STALL_COMMON
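
The substitution is safe because the two expressions in the quoted hunks are the same. A sketch that checks this exhaustively; rcu_stall_common(), old_depends(), and count_mismatches() are hypothetical C renderings of the Kconfig expressions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE ) */
bool rcu_stall_common(bool tree, bool tree_preempt, bool trace)
{
	return tree || tree_preempt || trace;
}

/* depends on TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE */
bool old_depends(bool tree, bool tree_preempt, bool trace)
{
	return tree || tree_preempt || trace;
}

/* Returns the number of input combinations where the two disagree. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool tree = bits & 1, preempt = bits & 2, trace = bits & 4;

		if (rcu_stall_common(tree, preempt, trace) !=
		    old_depends(tree, preempt, trace))
			mismatches++;
	}
	assert(mismatches >= 0 && mismatches <= 8);
	return mismatches;
}
```

Depending on the single symbol also means that any future change to the condition only has to be made in one place.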

Re: [PATCH tip/core/rcu 2/2] rcu: Allow TREE_PREEMPT_RCU on UP systems

2013-01-27 Thread Josh Triplett

On Sat, Jan 26, 2013 at 04:23:47PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The TINY_PREEMPT_RCU is complex, does not provide that much memory
> savings, and therefore TREE_PREEMPT_RCU should be used instead.  The
> systems where the difference between TINY_PREEMPT_RCU and TREE_PREEMPT_RCU
> are quite small compared to the memory footprint of CONFIG_PREEMPT.
> 
> This commit therefore takes a first step towards eliminating
> TINY_PREEMPT_RCU by allowing TREE_PREEMPT_RCU to be configured on !SMP
> systems.
> 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  init/Kconfig |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index a5e90e1..fb19b46 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -453,7 +453,7 @@ config TREE_RCU
>  
>  config TREE_PREEMPT_RCU
>   bool "Preemptible tree-based hierarchical RCU"
> - depends on PREEMPT && SMP
> + depends on PREEMPT
>   help
> This option selects the RCU implementation that is
> designed for very large SMP systems with hundreds or
> @@ -461,6 +461,8 @@ config TREE_PREEMPT_RCU
> is also required.  It also scales down nicely to
> smaller systems.
>  
> +   Select this option if you are unsure.
> +
>  config TINY_RCU
>   bool "UP-only small-memory-footprint RCU"
>   depends on !PREEMPT && !SMP
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 0/2] v2 Tiny RCU changes for 3.9

2013-01-27 Thread Josh Triplett

On Sat, Jan 26, 2013 at 04:23:28PM -0800, Paul E. McKenney wrote:
> Hello!
> 
> This series provides a couple of tiny-RCU changes:
> 
> 1. Make Tiny RCU emit RCU CPU stall warnings when RCU_TRACE=y.
> 2. Allow TREE_PREEMPT_RCU to be used on UP systems.

I replied to patch 1 with a suggestion; with that change,
Reviewed-by: Josh Triplett 
for the whole series.

- Josh Triplett

Re: very strange dependencies on CONFIG_EXPERT=n in kernel 3.8

2013-03-12 Thread Josh Triplett

[Please don't top-post.]

On Tue, Mar 12, 2013 at 03:52:24PM +0100, Konrad Vrba wrote:
> On 3/11/13, Josh Triplett  wrote:
> > On Sun, Mar 10, 2013 at 04:14:27PM +0100, Konrad Vrba wrote:
> >> I have noticed that CONFIG_EXPERT=n makes the following options in the
> >> kernel required (unremovable):
> >>
> >> CONFIG_FW_LOADER=y
> >> CONFIG_EXTRA_FIRMWARE=""
> >> CONFIG_MOUSE_PS2_ALPS=y
> >> CONFIG_MOUSE_PS2_LOGIPS2PP=y
> >> CONFIG_MOUSE_PS2_SYNAPTICS=y
> >> CONFIG_MOUSE_PS2_LIFEBOOK=y
> >> CONFIG_MOUSE_PS2_TRACKPOINT=y
> >> CONFIG_VGA_ARB=y
> >> CONFIG_VGA_ARB_MAX_GPUS=16
> >> CONFIG_I8253_LOCK=y
> >> CONFIG_DEBUG_MEMORY_INIT=y
> >> CONFIG_PCSPKR_PLATFORM=y
> >>
> >> If I select CONFIG_EXPERT=y then I can remove those, but that creates
> >> a new problem by making CONFIG_DEBUG_KERNEL=y unremovable.
> >>
> >> To make a specific example, this makes it impossible to compile a kernel
> >> with
> >> CONFIG_FW_LOADER=n
> >> CONFIG_DEBUG_KERNEL=n
> >> at the same time
> >
> > CONFIG_DEBUG_KERNEL, like CONFIG_EXPERT, should not directly affect the
> > code included in the kernel; it should just avoid asking about a pile of
> > other debugging options.  In practice, a small amount of
> > architecture-specific code (for powerpc, parisc, and blackfin) did use
> > it as a generic debug option, but that needs fixing.  So, for now, turn
> > on CONFIG_EXPERT and live with having CONFIG_DEBUG_KERNEL turned on.
> >
> > That aside, several of the above options should not depend on EXPERT;
> > why would PCSPKR_PLATFORM or DEBUG_MEMORY_INIT need to depend on EXPERT?
> >
> > - Josh Triplett
> >
> Thanks Josh,
> unfortunately, I cannot live with CONFIG_DEBUG_KERNEL. I need a
> minimalistic kernel, and every unnecessary KB is expensive. Besides, I
> have ideological objections to having unneeded features compiled in my
> kernel.

What architecture will your kernel target?  If not powerpc, parisc, or
blackfin, CONFIG_DEBUG_KERNEL will not add a single byte to your kernel,
as long as you turn off all the other debugging options under it.

> Is there any way of editing it directly in the .config file manually?

No, you can't make the .config file violate the dependencies specified
in Kconfig.  However, you *could* write and submit a patch fixing the
underlying bug: add one or more new Kconfig symbols under
CONFIG_DEBUG_KERNEL, and change all the instances of CONFIG_DEBUG_KERNEL
in a .c, .h, or .S file to one of those.
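
A sketch of what such a change might look like at a use site. CONFIG_ARCH_EXTRA_DEBUG is a made-up symbol name standing in for whatever the new option would be called, and arch_debug_checks_enabled() is a hypothetical function; neither exists in the tree.

```c
#include <assert.h>   /* only needed by a caller that checks the result */

/*
 * Pretend .config: DEBUG_KERNEL is on (EXPERT enabled it), but the
 * hypothetical dedicated option was left off.
 */
#define CONFIG_DEBUG_KERNEL 1
/* #undef CONFIG_ARCH_EXTRA_DEBUG */

/*
 * Before the fix this would have been "#ifdef CONFIG_DEBUG_KERNEL",
 * so the menu-visibility option dragged debug code into the build.
 */
int arch_debug_checks_enabled(void)
{
#ifdef CONFIG_ARCH_EXTRA_DEBUG
	return 1;   /* extra debug code is compiled in */
#else
	return 0;   /* nothing added to the kernel image */
#endif
}
```

With the guard moved to its own symbol, a caller can confirm with assert(arch_debug_checks_enabled() == 0) that turning on CONFIG_DEBUG_KERNEL alone no longer adds code; the new symbol would sit under DEBUG_KERNEL in Kconfig so that it can be seen and switched off.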

> Nobody asked for my opinion, but I feel I have to say something:
> The linux kernel project is already a complicated endeavor as it is.
> Why does somebody complicate it needlessly, by creating these
> artificial problems.

CONFIG_EXPERT and CONFIG_DEBUG_KERNEL actually exist for the opposite
reason: to simplify kernel configuration for most people.  With
CONFIG_EXPERT=n, you don't get asked about a pile of kernel options that
almost everyone will want and that can unexpectedly break your kernel
when turned off.  Likewise, CONFIG_DEBUG_KERNEL=n lets you avoid
configuring a pile of individual debugging options.  (That matters more
if you use "make config" and have to answer questions one by one in
series, rather than "make menuconfig" where you can just ignore whole
sections of configuration that you don't care about.)

CONFIG_EXPERT=y means "I know what I'm doing and want to configure out
features that almost every kernel needs", and you usually only turn it
on when trying to make the smallest possible kernel.  In turn,
CONFIG_EXPERT enables CONFIG_DEBUG_KERNEL so that you can see and turn
*off* some of the debugging options that default to y, to make an even
smaller kernel.  So, in practice, if you want to make the smallest
possible kernel, you want CONFIG_EXPERT=y and CONFIG_DEBUG_KERNEL=y.
That CONFIG_DEBUG_KERNEL controls a bit of miscellaneous debugging on a
few architectures represents a bug, nothing more.
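
The "unremovable" behavior reported at the top of this thread falls out of how prompts work. A toy model in C, assuming the options in question use the common pattern of a prompt that is only visible "if EXPERT" together with "default y"; struct expert_opt and expert_opt_value() are hypothetical names, not kconfig internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of:  bool "Some feature" if EXPERT  /  default y */
struct expert_opt {
	bool def;           /* the "default" value */
	bool user_choice;   /* only consulted when the prompt is visible */
};

/* With EXPERT=n the prompt is hidden, so the default always wins. */
bool expert_opt_value(const struct expert_opt *opt, bool expert)
{
	assert(opt != NULL);
	return expert ? opt->user_choice : opt->def;
}
```

So under this model an entry like CONFIG_FW_LOADER=n in .config is simply overridden while EXPERT=n, and takes effect once EXPERT=y.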

- Josh Triplett

NULL pointer dereference in ext4_superblock_csum_set with mounted filesystem

2013-03-13 Thread Josh Triplett

I frequently test kernel changes by booting them with kvm's -kernel
option, with -hda pointing to my host system's root filesystem, and
-snapshot to prevent writing to (and likely corrupting) that root
filesystem.  I tried this with a kernel built from git commit
7c6baa304b841673d3a55ea4fcf9a5cbf7a1674b, with a stock x86-64 "make
defconfig", and got a kernel panic:

[0.908898] EXT4-fs (sda): couldn't mount as ext3 due to feature incompatibilities
[0.911608] EXT4-fs (sda): couldn't mount as ext2 due to feature incompatibilities
[0.917997] EXT4-fs (sda): INFO: recovery required on readonly filesystem
[0.919575] EXT4-fs (sda): write access will be enabled during recovery
[1.004234] BUG: unable to handle kernel NULL pointer dereference at (null)
[1.005050] IP: [] ext4_superblock_csum_set+0x2f/0x70
[1.005050] PGD 0 
[1.005050] Oops:  [#1] SMP 
[1.005050] Modules linked in:
[1.005050] CPU 0 
[1.005050] Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc2+ #5 Bochs Bochs
[1.005050] RIP: 0010:[]  [] ext4_superblock_csum_set+0x2f/0x70
[1.005050] RSP: :88003e1f5578  EFLAGS: 00010202
[1.005050] RAX:  RBX: 880001da8400 RCX: 0001
[1.005050] RDX: 0040 RSI: 0040 RDI: 88003d93d400
[1.005050] RBP: 88003e1f55a8 R08: 81cb4238 R09: 0040
[1.005050] R10: 01270030 R11:  R12: 88003de0f1a0
[1.005050] R13: 880001da8400 R14:  R15: 88003d93d400
[1.005050] FS:  () GS:88003fc0() knlGS:
[1.005050] CS:  0010 DS:  ES:  CR0: 8005003b
[1.005050] CR2:  CR3: 01c0b000 CR4: 06f0
[1.005050] DR0:  DR1:  DR2: 
[1.005050] DR3:  DR6: 0ff0 DR7: 0400
[1.005050] Process swapper/0 (pid: 1, threadinfo 88003e1f4000, task 88003e1f)
[1.005050] Stack:
[1.005050]  88003e1f55a8 812c8ffa 810fd729
[1.005050]  88003de0f1a0 0105a4e8 88003e1f55f8 811cae3c
[1.005050]  000104d8 307ea8c1 88003e1f55f8 88003d93d400
[1.005050] Call Trace:
[1.005050]  [] ? __percpu_counter_sum+0x5a/0x80
[1.005050]  [] ? __inc_zone_state+0x59/0x60
[1.005050]  [] ext4_commit_super+0x15c/0x240
[1.005050]  [] save_error_info+0x1e/0x30
[1.005050]  [] ext4_error_inode+0x5e/0x120
[1.005050]  [] ? mempool_alloc_slab+0x10/0x20
[1.005050]  [] __check_block_validity.constprop.57+0x78/0x80
[1.005050]  [] ? ext4_es_lookup_extent+0x91/0x180
[1.005050]  [] ext4_map_blocks+0x250/0x3f0
[1.005050]  [] _ext4_get_block+0x82/0x190
[1.005050]  [] ext4_get_block+0x11/0x20
[1.005050]  [] generic_block_bmap+0x3a/0x40
[1.005050]  [] ? find_get_page+0x19/0xa0
[1.005050]  [] ? __find_get_block_slow+0xb8/0x160
[1.005050]  [] ? mapping_tagged+0xd/0x10
[1.005050]  [] ext4_bmap+0x89/0xf0
[1.005050]  [] bmap+0x19/0x20
[1.005050]  [] jbd2_journal_bmap+0x2e/0xb0
[1.005050]  [] jread+0x3b/0x270
[1.005050]  [] ? __getblk+0x28/0x2d0
[1.005050]  [] ? find_revoke_record+0x5a/0xb0
[1.005050]  [] do_one_pass+0x8e/0xad0
[1.005050]  [] jbd2_journal_recover+0xd9/0x110
[1.005050]  [] jbd2_journal_load+0xd7/0x390
[1.005050]  [] ? kmem_cache_alloc_trace+0x30/0x110
[1.005050]  [] ext4_fill_super+0x1e9b/0x2dc0
[1.005050]  [] mount_bdev+0x1a1/0x1e0
[1.005050]  [] ? ext4_calculate_overhead+0x3c0/0x3c0
[1.005050]  [] ext4_mount+0x10/0x20
[1.005050]  [] mount_fs+0x3e/0x1b0
[1.005050]  [] ? __alloc_percpu+0xb/0x10
[1.005050]  [] vfs_kern_mount+0x6f/0x110
[1.005050]  [] do_mount+0x209/0xa10
[1.005050]  [] ? strndup_user+0x53/0x70
[1.005050]  [] sys_mount+0x89/0xd0
[1.005050]  [] mount_block_root+0xf6/0x221
[1.005050]  [] mount_root+0xfa/0x105
[1.005050]  [] prepare_namespace+0x13d/0x16a
[1.005050]  [] kernel_init_freeable+0x1b4/0x1c4
[1.005050]  [] ? do_early_param+0x8c/0x8c
[1.005050]  [] ? rest_init+0x70/0x70
[1.005050]  [] kernel_init+0x9/0xf0
[1.005050]  [] ret_from_fork+0x7c/0xb0
[1.005050]  [] ? rest_init+0x70/0x70
[1.005050] Code: 53 48 83 ec 28 48 8b 87 40 03 00 00 48 8b 58 68 f6 43 65 04 75 0e 48 83 c4 28 5b 5d c3 0f 1f 80 00 00 00 00 48 8b 80 b8 03 00 00 <83> 38 04 75 37 48 8d 7d d8 ba fc 03 00 00 48 89 de 48 89 45 d8
[1.005050] RIP  [] ext4_superblock_csum_set+0x2f/0x70
[1.005050]  RSP 
[1.005050] CR2: 
[1.066804] ---[ end trace cba8b53354947677 ]---

Re: NULL pointer dereference in ext4_superblock_csum_set with mounted filesystem

2013-03-13 Thread Josh Triplett

On Wed, Mar 13, 2013 at 03:01:41PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 13, 2013 at 11:59:13AM -0700, Josh Triplett wrote:
> > I frequently test kernel changes by booting them with kvm's -kernel
> > option, with -hda pointing to my host system's root filesystem, and
> > -snapshot to prevent writing to (and likely corrupting) that root
> > filesystem.  I tried this with a kernel built from git commit
> > 7c6baa304b841673d3a55ea4fcf9a5cbf7a1674b, with a stock x86-64 "make
> > defconfig", and got a kernel panic:
> 
> Can you send me the output of "dumpe2fs -h" on your host system's root
> file system?

Attached.

- Josh Triplett
dumpe2fs 1.42.5 (29-Jul-2012)
Filesystem volume name:   
Last mounted on:  /
Filesystem UUID:  e23a62e0-8a4a-48d0-b781-e11ae069ab06
Filesystem magic number:  0xEF53
Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal ext_attr resize_inode dir_index filetype 
needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg 
dir_nlink extra_isize
Filesystem flags: signed_directory_hash 
Default mount options:(none)
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
Inode count:  17711104
Block count:  70817792
Reserved block count: 708177
Free blocks:  50350508
Free inodes:  17149541
First block:  0
Block size:   4096
Fragment size:4096
Reserved GDT blocks:  1007
Blocks per group: 32768
Fragments per group:  32768
Inodes per group: 8192
Inode blocks per group:   512
Flex block group size:16
Filesystem created:   Tue Jul 10 22:09:47 2012
Last mount time:  Wed Mar 13 10:19:40 2013
Last write time:  Wed Mar 13 10:19:40 2013
Mount count:  6
Maximum mount count:  27
Last checked: Mon Mar 11 21:27:56 2013
Check interval:   15552000 (6 months)
Next check after: Sat Sep  7 21:27:56 2013
Lifetime writes:  776 GB
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   256
Required extra isize: 28
Desired extra isize:  28
Journal inode:  8
First orphan inode:   12845162
Default directory hash:   half_md4
Directory Hash Seed:  22edf7ec-c22c-43aa-a7ea-c3349da9a00c
Journal backup:   inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length:   32768
Journal sequence: 0x0023bba0
Journal start:  10041

[PATCH] fs: Make binfmt support for #! scripts modular and removable

2013-03-13 Thread Josh Triplett

Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support
for interpreted scripts starting with "#!"; allow compiling out that
support, or building it as a module.  Embedded systems running
exclusively compiled binaries could leave this support out, and systems
that don't need scripts before mounting the root filesystem can build
this as a module.

Signed-off-by: Josh Triplett 
---

Note when testing this that many shells implement support for shell
scripts themselves, so try it with something like #!/bin/cat instead.

 fs/Kconfig.binfmt |   14 ++
 fs/Makefile   |5 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index 0efd152..370b24c 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -65,6 +65,20 @@ config CORE_DUMP_DEFAULT_ELF_HEADERS
  This config option changes the default setting of coredump_filter
  seen at boot time.  If unsure, say Y.
 
+config BINFMT_SCRIPT
+   tristate "Kernel support for scripts starting with #!"
+   default y
+   help
+ Say Y here if you want to execute interpreted scripts starting with
+ #! followed by the path to an interpreter.
+
+ You can build this support as a module; however, until that module
+ gets loaded, you cannot run scripts.  Thus, if you want to load this
+ module from an initramfs, the portion of the initramfs before loading
+ this module must consist of compiled binaries only.
+
+ Most systems will not boot if you say M or N here.  If unsure, say Y.
+
 config BINFMT_FLAT
bool "Kernel support for flat binaries"
depends on !MMU && (!FRV || BROKEN)
diff --git a/fs/Makefile b/fs/Makefile
index 9d53192..2ef3298 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -34,10 +34,7 @@ obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
 obj-$(CONFIG_BINFMT_AOUT)  += binfmt_aout.o
 obj-$(CONFIG_BINFMT_EM86)  += binfmt_em86.o
 obj-$(CONFIG_BINFMT_MISC)  += binfmt_misc.o
-
-# binfmt_script is always there
-obj-y  += binfmt_script.o
-
+obj-$(CONFIG_BINFMT_SCRIPT)+= binfmt_script.o
 obj-$(CONFIG_BINFMT_ELF)   += binfmt_elf.o
 obj-$(CONFIG_COMPAT_BINFMT_ELF)+= compat_binfmt_elf.o
 obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o
-- 
1.7.10.4
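
For readers wondering what the new option actually gates, here is a minimal userspace sketch of "#!" line parsing. It is not the kernel's binfmt_script code: the function name parse_shebang and the buffer-based interface are invented for illustration, and the sketch assumes the Linux convention that everything after the interpreter path is handed over as a single optional argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for the blanks allowed around the interpreter. */
static int is_blank(char c)
{
	return c == ' ' || c == '\t';
}

/*
 * Hypothetical sketch: split the first line of a script into interpreter
 * path and one optional argument.  Returns 0 on success, -1 if the buffer
 * does not start with "#!" or something does not fit.
 */
int parse_shebang(const char *line, char *interp, size_t interp_len,
		  char *arg, size_t arg_len)
{
	const char *p, *start, *end;
	size_t n;

	if (interp_len == 0 || arg_len == 0)
		return -1;
	if (strncmp(line, "#!", 2) != 0)
		return -1;

	p = line + 2;
	end = p + strcspn(p, "\n");		/* only the first line matters */
	while (end > p && is_blank(end[-1]))	/* trim trailing blanks */
		end--;
	while (p < end && is_blank(*p))		/* skip blanks after "#!" */
		p++;

	start = p;
	while (p < end && !is_blank(*p))	/* interpreter path */
		p++;
	n = (size_t)(p - start);
	if (n == 0 || n >= interp_len)
		return -1;
	memcpy(interp, start, n);
	interp[n] = '\0';

	while (p < end && is_blank(*p))		/* blanks before the argument */
		p++;
	n = (size_t)(end - p);			/* rest of line: one argument */
	if (n >= arg_len)
		return -1;
	memcpy(arg, p, n);
	arg[n] = '\0';
	return 0;
}
```

A script starting with #!/bin/cat, as the testing note suggests, exercises the kernel path because cat will not interpret the file on its own the way many shells do when exec fails.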


[PATCH] fs: Don't compile in drop_caches.c when CONFIG_SYSCTL=n

2013-03-13 Thread Josh Triplett

drop_caches.c provides code only invokable via sysctl, so don't compile
it in when CONFIG_SYSCTL=n.

Signed-off-by: Josh Triplett 
---
 fs/Makefile|3 ++-
 include/linux/mm.h |4 
 kernel/sysctl.c|1 -
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/Makefile b/fs/Makefile
index 9d53192..3b2c767 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -10,7 +10,7 @@ obj-y :=  open.o read_write.o file_table.o super.o \
ioctl.o readdir.o select.o fifo.o dcache.o inode.o \
attr.o bad_inode.o file.o filesystems.o namespace.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
-   pnode.o drop_caches.o splice.o sync.o utimes.o \
+   pnode.o splice.o sync.o utimes.o \
stack.o fs_struct.o statfs.o
 
 ifeq ($(CONFIG_BLOCK),y)
@@ -49,6 +49,7 @@ obj-$(CONFIG_FS_POSIX_ACL)+= posix_acl.o xattr_acl.o
 obj-$(CONFIG_NFS_COMMON)   += nfs_common/
 obj-$(CONFIG_GENERIC_ACL)  += generic_acl.o
 obj-$(CONFIG_COREDUMP) += coredump.o
+obj-$(CONFIG_SYSCTL)   += drop_caches.o
 
 obj-$(CONFIG_FHANDLE)  += fhandle.o
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..1bb400f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1674,8 +1674,12 @@ int in_gate_area_no_mm(unsigned long addr);
 #define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_mm(addr);})
 #endif /* __HAVE_ARCH_GATE_AREA */
 
+#ifdef CONFIG_SYSCTL
+extern int sysctl_drop_caches;
 int drop_caches_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+#endif
+
 unsigned long shrink_slab(struct shrink_control *shrink,
  unsigned long nr_pages_scanned,
  unsigned long lru_pages);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afc1dc6..3dadde5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -106,7 +106,6 @@ extern unsigned int core_pipe_limit;
 #endif
 extern int pid_max;
 extern int pid_max_min, pid_max_max;
-extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
 extern int compat_log;
 extern int latencytop_enabled;
-- 
1.7.10.4


Re: NULL pointer dereference in ext4_superblock_csum_set with mounted filesystem

2013-03-14 Thread Josh Triplett

On Thu, Mar 14, 2013 at 12:08:35AM -0400, Theodore Ts'o wrote:
> Huh.  This is very, very weird.  Is this a repeatable crash?

I could reliably replicate it for that particular session, but now that
I've rebooted the host, no.

- Josh Triplett

[PATCH] fs: Make binfmt support for #! scripts modular and removable

2013-03-15 Thread Josh Triplett

Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support
for interpreted scripts starting with "#!"; allow compiling out that
support, or building it as a module.  Embedded systems running
exclusively compiled binaries could leave this support out, and systems
that don't need scripts before mounting the root filesystem can build
this as a module.

Signed-off-by: Josh Triplett 
---

Resending this because I received a bounce notification from
v...@zeniv.linux.org.uk.  Hopefully this one will go through.

Note when testing this that many shells implement support for shell
scripts themselves, so try it with something like #!/bin/cat instead.

 fs/Kconfig.binfmt |   14 ++
 fs/Makefile   |5 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index 0efd152..370b24c 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -65,6 +65,20 @@ config CORE_DUMP_DEFAULT_ELF_HEADERS
  This config option changes the default setting of coredump_filter
  seen at boot time.  If unsure, say Y.
 
+config BINFMT_SCRIPT
+   tristate "Kernel support for scripts starting with #!"
+   default y
+   help
+ Say Y here if you want to execute interpreted scripts starting with
+ #! followed by the path to an interpreter.
+
+ You can build this support as a module; however, until that module
+ gets loaded, you cannot run scripts.  Thus, if you want to load this
+ module from an initramfs, the portion of the initramfs before loading
+ this module must consist of compiled binaries only.
+
+ Most systems will not boot if you say M or N here.  If unsure, say Y.
+
 config BINFMT_FLAT
bool "Kernel support for flat binaries"
depends on !MMU && (!FRV || BROKEN)
diff --git a/fs/Makefile b/fs/Makefile
index 9d53192..2ef3298 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -34,10 +34,7 @@ obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
 obj-$(CONFIG_BINFMT_AOUT)  += binfmt_aout.o
 obj-$(CONFIG_BINFMT_EM86)  += binfmt_em86.o
 obj-$(CONFIG_BINFMT_MISC)  += binfmt_misc.o
-
-# binfmt_script is always there
-obj-y  += binfmt_script.o
-
+obj-$(CONFIG_BINFMT_SCRIPT)+= binfmt_script.o
 obj-$(CONFIG_BINFMT_ELF)   += binfmt_elf.o
 obj-$(CONFIG_COMPAT_BINFMT_ELF)+= compat_binfmt_elf.o
 obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o
-- 
1.7.10.4


Re: [PATCH tip/core/rcu 0/5] Documentation and rcutorture changes

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:44:48AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This series covers changes to rcutorture and documentation updates.
> The individual patches in this series are as follows:
> 
> 1. Update rcutorture default values so that casual rcutorture
>    users will do more aggressive testing.
> 2. Make rcutorture track CPU-hotplug latency statistics.
> 3. Document SRCU's new-found ability to be used by offline and
>    idle CPUs, and also emphasize SRCU's limitations.
> 4. Use the new pr_*() interfaces in rcutorture.
> 5. Prevent kthread-initialization races in rcutorture.
> 
>   Thanx, Paul
> 
> 
> 
>  b/Documentation/RCU/checklist.txt |6 +
>  b/Documentation/RCU/whatisRCU.txt |9 +-
>  b/kernel/rcutorture.c |4 -
>  kernel/rcutorture.c   |  152 +++---
>  4 files changed, 108 insertions(+), 63 deletions(-)

Something seems wrong with this diffstat; how'd the b/ prefixes get
there, and why does it list kernel/rcutorture.c twice, once with and
once without?

- Josh Triplett

Re: [PATCH tip/core/rcu 1/5] rcu: Update rcutorture defaults

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:45:08AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> A number of new features have been added to rcutorture over the years, but
> the defaults have not been updated to include them.  This commit therefore
> turns on a couple of them that have proven helpful and trustworthy, namely
> periodic progress reports and testing of NO_HZ.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutorture.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> index 25b1503..86315d3 100644
> --- a/kernel/rcutorture.c
> +++ b/kernel/rcutorture.c
> @@ -53,10 +53,10 @@ MODULE_AUTHOR("Paul E. McKenney  and 
> Josh Triplett   
>  static int nreaders = -1;/* # reader threads, defaults to 2*ncpus */
>  static int nfakewriters = 4; /* # fake writer threads */
> -static int stat_interval;/* Interval between stats, in seconds. */
> +static int stat_interval = 60;   /* Interval between stats, in seconds. */
>   /*  Defaults to "only at end of test". */

Need to remove this comment about the default.

>  static bool verbose; /* Print more debug info. */
> -static bool test_no_idle_hz; /* Test RCU's support for tickless idle CPUs. */
> +static bool test_no_idle_hz = 1; /* Test RCU support for tickless idle CPUs. */

s/1/true/

>  static int shuffle_interval = 3; /* Interval between shuffles (in sec)*/
>  static int stutter = 5;  /* Start/stop testing interval (in sec) */
>  static int irqreader = 1;/* RCU readers from irq (timers). */
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 2/5] rcu: Track CPU-hotplug duration statistics

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:45:09AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Many rcutorture runs include CPU-hotplug operations in their stress
> testing.  This commit accumulates statistics on the durations of these
> operations in deference to the recent concern about the overhead and
> latency of these operations.

How many jiffies, on average, do these operations take?  Measuring these
using jiffies seems highly prone to repeated rounding error.
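
To make the rounding concern concrete, here is a small userspace sketch, not taken from the patch. It models a jiffies-style counter as a timestamp divided by the tick length and measures a duration as the difference of two counter reads. The helper names and the 4 ms tick (HZ=250) used in the usage note are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a tick counter: whole ticks elapsed at time t_ns. */
uint64_t ticks_at(uint64_t t_ns, uint64_t tick_ns)
{
	return t_ns / tick_ns;
}

/*
 * Duration as the patch would see it: end tick minus start tick.
 * Anything shorter than one tick reads as 0 or 1 depending only on
 * whether a tick boundary happened to fall inside the interval.
 */
uint64_t measured_ticks(uint64_t start_ns, uint64_t end_ns, uint64_t tick_ns)
{
	return ticks_at(end_ns, tick_ns) - ticks_at(start_ns, tick_ns);
}
```

With a 4 ms tick, a 2 ms operation running from 1 ms to 3 ms measures as 0 ticks, while an identical 2 ms operation running from 3 ms to 5 ms measures as 1 tick (4 ms). Summing many such samples carries that per-sample error into the totals, and min/max end up reflecting tick alignment as much as the operation itself.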

> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutorture.c |   42 +-
>  1 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> index 86315d3..c6cf6ff 100644
> --- a/kernel/rcutorture.c
> +++ b/kernel/rcutorture.c
> @@ -176,8 +176,14 @@ static long n_rcu_torture_boosts;
>  static long n_rcu_torture_timers;
>  static long n_offline_attempts;
>  static long n_offline_successes;
> +static unsigned long sum_offline;
> +static int min_offline = -1;
> +static int max_offline;
>  static long n_online_attempts;
>  static long n_online_successes;
> +static unsigned long sum_online;
> +static int min_online = -1;
> +static int max_online;
>  static long n_barrier_attempts;
>  static long n_barrier_successes;
>  static struct list_head rcu_torture_removed;
> @@ -1214,11 +1220,13 @@ rcu_torture_printk(char *page)
>  n_rcu_torture_boost_failure,
>  n_rcu_torture_boosts,
>  n_rcu_torture_timers);
> - cnt += sprintf(&page[cnt], "onoff: %ld/%ld:%ld/%ld ",
> -n_online_successes,
> -n_online_attempts,
> -n_offline_successes,
> -n_offline_attempts);
> + cnt += sprintf(&page[cnt],
> +"onoff: %ld/%ld:%ld/%ld %d,%d:%d,%d %lu:%lu (HZ=%d) ",
> +n_online_successes, n_online_attempts,
> +n_offline_successes, n_offline_attempts,
> +min_online, max_online,
> +min_offline, max_offline,
> +sum_online, sum_offline, HZ);
>   cnt += sprintf(&page[cnt], "barrier: %ld/%ld:%ld",
>  n_barrier_successes,
>  n_barrier_attempts,
> @@ -1490,8 +1498,10 @@ static int __cpuinit
>  rcu_torture_onoff(void *arg)
>  {
>   int cpu;
> + unsigned long delta;
>   int maxcpu = -1;
>   DEFINE_RCU_RANDOM(rand);
> + unsigned long starttime;
>  
>   VERBOSE_PRINTK_STRING("rcu_torture_onoff task started");
>   for_each_online_cpu(cpu)
> @@ -1509,6 +1519,7 @@ rcu_torture_onoff(void *arg)
>   printk(KERN_ALERT "%s" TORTURE_FLAG
>  "rcu_torture_onoff task: offlining %d\n",
>  torture_type, cpu);
> + starttime = jiffies;
>   n_offline_attempts++;
>   if (cpu_down(cpu) == 0) {
>   if (verbose)
> @@ -1516,12 +1527,23 @@ rcu_torture_onoff(void *arg)
>  "rcu_torture_onoff task: offlined %d\n",
>  torture_type, cpu);
>   n_offline_successes++;
> + delta = jiffies - starttime;
> + sum_offline += delta;
> + if (min_offline < 0) {
> + min_offline = delta;
> + max_offline = delta;
> + }
> + if (min_offline > delta)
> + min_offline = delta;
> + if (max_offline < delta)
> + max_offline = delta;
>   }
>   } else if (cpu_is_hotpluggable(cpu)) {
>   if (verbose)
>   printk(KERN_ALERT "%s" TORTURE_FLAG
>  "rcu_torture_onoff task: onlining %d\n",
>  torture_type, cpu);
> + starttime = jiffies;
>   n_online_attempts++;
>   if (cpu_up(cpu) == 0) {
>   if (verbose)
> @@ -1529,6 +1551,16 @@ rcu_torture_onoff(void *arg)
>  "rcu_torture_onoff task: onlined %d\n",
>  torture_type, cpu);
>   n_online_successes++;
> + delta = jiffies - starttime;
> + sum_online += delta;
> + if (min_online < 0) {
> + min_online = delta;
> + max_online = delta;

Re: [PATCH tip/core/rcu 3/5] rcu: Document SRCU dead-CPU capabilities, emphasize read-side limits

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:45:10AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The current documentation did not help someone grepping for SRCU to
> learn that disabling preemption is not a replacement for srcu_read_lock(),
> so upgrade the documentation to bring this out, not just for SRCU,
> but also for RCU-bh.  Also document the fact that SRCU readers are
> respected on CPUs executing in user mode, idle CPUs, and even on
> offline CPUs.
> 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  Documentation/RCU/checklist.txt |6 ++
>  Documentation/RCU/whatisRCU.txt |9 +++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> index fc103d7..cdb20d4 100644
> --- a/Documentation/RCU/checklist.txt
> +++ b/Documentation/RCU/checklist.txt
> @@ -310,6 +310,12 @@ over a rather long period of time, but improvements are always welcome!
>   code under the influence of preempt_disable(), you instead
>   need to use synchronize_irq() or synchronize_sched().
>  
> + This same limitation also applies to synchronize_rcu_bh()
> + and synchronize_srcu(), as well as to the asynchronous and
> + expedited forms of the three primitives, namely call_rcu(),
> + call_rcu_bh(), call_srcu(), synchronize_rcu_expedited(),
> + synchronize_rcu_bh_expedited(), and synchronize_srcu_expedited().
> +
>  12.  Any lock acquired by an RCU callback must be acquired elsewhere
>   with softirq disabled, e.g., via spin_lock_irqsave(),
>   spin_lock_bh(), etc.  Failing to disable irq on a given
> diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
> index 69ee188..bf0f6de 100644
> --- a/Documentation/RCU/whatisRCU.txt
> +++ b/Documentation/RCU/whatisRCU.txt
> @@ -873,7 +873,7 @@ d.Do you need to treat NMI handlers, hardirq handlers,
>   and code segments with preemption disabled (whether
>   via preempt_disable(), local_irq_save(), local_bh_disable(),
>   or some other mechanism) as if they were explicit RCU readers?
> - If so, you need RCU-sched.
> + If so, RCU-sched is the only choice that will work for you.
>  
>  e.   Do you need RCU grace periods to complete even in the face
>   of softirq monopolization of one or more of the CPUs?  For
> @@ -884,7 +884,12 @@ f.   Is your workload too update-intensive for normal use of
>   RCU, but inappropriate for other synchronization mechanisms?
>   If so, consider SLAB_DESTROY_BY_RCU.  But please be careful!
>  
> -g.   Otherwise, use RCU.
> +g.   Do you need read-side critical sections that are respected
> + even though they are in the middle of the idle loop, during
> + user-mode execution, or on an offlined CPU?  If so, SRCU is the
> + only choice that will work for you.
> +
> +h.   Otherwise, use RCU.
>  
>  Of course, this all assumes that you have determined that RCU is in fact
>  the right tool for your job.
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 4/5] rcu: Switch rcutorture to pr_alert() and friends

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:45:11AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Drop a few characters by switching kernel/rcutorture.c from
> "printk(KERN_ALERT" to "pr_alert(".
> 
> Signed-off-by: Paul E. McKenney 

How about setting pr_fmt as well, and dropping the various "rcutorture:"
prefixes?  You'd still potentially want to add the torture type, though
you could do that with pr_fmt as well.

In any case:

Reviewed-by: Josh Triplett 
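
As a rough illustration of the pr_fmt suggestion, here is a userspace mock, not kernel code. In the kernel, pr_fmt is defined before the printk headers are included and the pr_*() macros wrap every format string with it. The pr_alert_buf macro and format_offline_msg function below are stand-ins invented for this sketch, and they write into a buffer so the result can be inspected.

```c
#include <stdio.h>

/* Same shape as a kernel pr_fmt definition: prepend a fixed prefix. */
#define pr_fmt(fmt) "rcutorture: " fmt

/* Hypothetical stand-in for pr_alert() that formats into a buffer. */
#define pr_alert_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

/* Call sites no longer spell out the prefix; only the variable parts remain. */
int format_offline_msg(char *buf, size_t len, const char *torture_type, int cpu)
{
	return pr_alert_buf(buf, len, "%s: offlining %d\n", torture_type, cpu);
}
```

Because the prefix lives in one macro, changing or dropping it touches a single line instead of every message in the file.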

Re: [PATCH tip/core/rcu 5/5] rcu: Prevent initialization race in rcutorture kthreads

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:45:12AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> When you do something like "t = kthread_run(...)", it is possible that
> the kthread will start running before the assignment to "t" happens.
> If the child kthread expects to find a pointer to its task_struct in "t",
> it will then be fatally disappointed.  This commit therefore switches
> such cases to kthread_create() followed by wake_up_process(), guaranteeing
> that the assignment happens before the child kthread starts running.
> 
> Reported-by: Fengguang Wu 
> Signed-off-by: Paul E. McKenney 

Seems like you should go ahead and make this change for all the threads,
not just two of them.  A simple wrapper around kthread_run, taking a
struct task_struct ** to write to, would make this much simpler.  Such a
wrapper could also return an error code directly (for use in firsterr),
write NULL to the pointer on error, and perhaps print an error message,
which would remove most of the boilerplate currently duplicated for
every thread creation.

Arguably, all of those except the error message printing would make
sense as changes to kthread_run itself, but that's another patch. :)

- Josh Triplett
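
The create-then-wake ordering discussed above can be sketched in userspace with pthreads. This is only an analogue, assuming a gate built from a mutex and condition variable in place of kthread_create() followed by wake_up_process(). The struct worker type and worker_start() are hypothetical names, not an existing API.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-thread state; "handle" plays the role of "t". */
struct worker {
	pthread_t handle;
	pthread_mutex_t lock;
	pthread_cond_t wake;
	bool released;		/* set once the creator has published handle */
	bool saw_own_handle;	/* what the child observed after waking */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	/* Park until the creator says the handle is safe to read. */
	pthread_mutex_lock(&w->lock);
	while (!w->released)
		pthread_cond_wait(&w->wake, &w->lock);
	w->saw_own_handle = pthread_equal(w->handle, pthread_self()) != 0;
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Returns 0 on success or the pthread_create() error code. */
int worker_start(struct worker *w)
{
	pthread_t tid;
	int err;

	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->released = false;
	w->saw_own_handle = false;

	err = pthread_create(&tid, NULL, worker_fn, w);
	if (err)
		return err;

	/* Publish the handle first, then open the gate. */
	pthread_mutex_lock(&w->lock);
	w->handle = tid;
	w->released = true;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);
	return 0;
}
```

Returning the error code directly and keeping all setup in one function mirrors the wrapper idea: each call site shrinks to a single call plus an error check.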

[PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-08-30 Thread Josh Triplett

The ACPI BGRT lets the OS access the BIOS logo image and its position on the
screen at boot time, allowing it to maintain that image on the screen until
ready to display something else, making boot more seamless.  This series fixes
support for accessing the boot logo image via the BGRT when the BIOS stores it
in EFI boot services memory, as recommended by the ACPI 5.0 spec.  Linux needs
to copy the image out of boot services memory before reclaiming boot services
memory.

The first patch refactors EFI initialization to defer freeing boot services
memory until later in the boot process, after we have ACPI available.  The
second patch adds a helper function to look up existing EFI boot services
mappings, to avoid re-mapping them.  The third patch moves BGRT initialization
to before the reclamation of boot services memory, copies the logo at that
point, and reworks the existing BGRT driver to use that existing copy.

Josh Triplett (3):
  efi: Defer freeing boot services memory until after ACPI init
  efi: Add a function to look up existing IO memory mappings
  efi: Fix the ACPI BGRT driver for images located in EFI boot services memory

 arch/x86/platform/efi/Makefile   |1 +
 arch/x86/platform/efi/efi-bgrt.c |   76 ++
 arch/x86/platform/efi/efi.c  |   65 +---
 drivers/acpi/Kconfig |4 +-
 drivers/acpi/bgrt.c  |   76 +-
 include/linux/efi-bgrt.h |   21 +++
 include/linux/efi.h  |3 ++
 init/main.c  |7 
 8 files changed, 171 insertions(+), 82 deletions(-)
 create mode 100644 arch/x86/platform/efi/efi-bgrt.c
 create mode 100644 include/linux/efi-bgrt.h

-- 
1.7.10.4
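
The third patch sizes its copy of the logo from the size field of the BMP file header (its struct bmp_header holds a 16-bit id followed by a 32-bit size). A minimal userspace sketch of reading that field follows. The function name bmp_file_size is hypothetical, and the "BM" signature check is an addition for the sketch; the patch as posted does not check the id.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read the total file size from the first six bytes
 * of a BMP image.  Returns 0 on success, -1 if the buffer is too short
 * or does not start with the "BM" signature.
 */
int bmp_file_size(const unsigned char *buf, size_t len, uint32_t *size)
{
	if (len < 6)
		return -1;
	if (buf[0] != 'B' || buf[1] != 'M')
		return -1;

	/* The size field is little-endian and starts at byte offset 2. */
	*size = (uint32_t)buf[2] |
		(uint32_t)buf[3] << 8 |
		(uint32_t)buf[4] << 16 |
		(uint32_t)buf[5] << 24;
	return 0;
}
```

Decoding byte by byte avoids depending on struct packing or host endianness, which a packed struct read would otherwise rely on.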

[PATCH 1/3] efi: Defer freeing boot services memory until after ACPI init

2012-08-30 Thread Josh Triplett

Some new ACPI 5.0 tables reference resources stored in boot services
memory, so keep that memory around until we have ACPI and can extract
data from it.

Signed-off-by: Josh Triplett 
---
 arch/x86/platform/efi/efi.c |   31 ++-
 include/linux/efi.h |1 +
 init/main.c |5 +
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 92660eda..8af329f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -419,10 +419,21 @@ void __init efi_reserve_boot_services(void)
}
 }
 
-static void __init efi_free_boot_services(void)
+void __init efi_unmap_memmap(void)
+{
+   if (memmap.map) {
+   early_iounmap(memmap.map, memmap.nr_map * memmap.desc_size);
+   memmap.map = NULL;
+   }
+}
+
+void __init efi_free_boot_services(void)
 {
void *p;
 
+   if (!efi_native)
+   return;
+
for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
efi_memory_desc_t *md = p;
unsigned long long start = md->phys_addr;
@@ -438,6 +449,8 @@ static void __init efi_free_boot_services(void)
 
free_bootmem_late(start, size);
}
+
+   efi_unmap_memmap();
 }
 
 static int __init efi_systab_init(void *phys)
@@ -787,8 +800,10 @@ void __init efi_enter_virtual_mode(void)
 * non-native EFI
 */
 
-   if (!efi_native)
-   goto out;
+   if (!efi_native) {
+   efi_unmap_memmap();
+   return;
+   }
 
/* Merge contiguous regions of the same type and attribute */
for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
@@ -878,13 +893,6 @@ void __init efi_enter_virtual_mode(void)
}
 
/*
-* Thankfully, it does seem that no runtime services other than
-* SetVirtualAddressMap() will touch boot services code, so we can
-* get rid of it all at this point
-*/
-   efi_free_boot_services();
-
-   /*
 * Now that EFI is in virtual mode, update the function
 * pointers in the runtime service table to the new virtual addresses.
 *
@@ -906,9 +914,6 @@ void __init efi_enter_virtual_mode(void)
if (__supported_pte_mask & _PAGE_NX)
runtime_code_page_mkexec();
 
-out:
-   early_iounmap(memmap.map, memmap.nr_map * memmap.desc_size);
-   memmap.map = NULL;
kfree(new_memmap);
 }
 
diff --git a/include/linux/efi.h b/include/linux/efi.h
index ec45ccd..00ec70f 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -496,6 +496,7 @@ extern void efi_map_pal_code (void);
 extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
+extern void efi_free_boot_services(void);
 extern u64 efi_get_iobase (void);
 extern u32 efi_mem_type (unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
diff --git a/init/main.c b/init/main.c
index b286730..391a291 100644
--- a/init/main.c
+++ b/init/main.c
@@ -631,6 +631,11 @@ asmlinkage void __init start_kernel(void)
acpi_early_init(); /* before LAPIC and SMP init */
sfi_init_late();
 
+#ifdef CONFIG_X86
+   if (efi_enabled)
+   efi_free_boot_services();
+#endif
+
ftrace_init();
 
/* Do the rest non-__init'ed, we're now alive */
-- 
1.7.10.4


[PATCH 2/3] efi: Add a function to look up existing IO memory mappings

2012-08-30 Thread Josh Triplett

The EFI initialization creates virtual mappings for EFI boot services
memory, so if a driver wants to access EFI boot services memory, it
cannot call ioremap itself; doing so will trip the WARN about mapping
RAM twice.  Thus, a driver accessing EFI boot services memory must do so
via the existing mapping already created during EFI initialization.
Since the EFI code already maintains a memory map for that memory, add a
function efi_lookup_mapped_addr to look up mappings in that memory map.

Signed-off-by: Josh Triplett 
---
 arch/x86/platform/efi/efi.c |   28 
 include/linux/efi.h |1 +
 2 files changed, 29 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 8af329f..ae35cc8 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -777,6 +777,34 @@ static void __init runtime_code_page_mkexec(void)
 }
 
 /*
+ * We can't ioremap data in EFI boot services RAM, because we've already mapped
+ * it as RAM.  So, look it up in the existing EFI memory map instead.  Only
+ * callable after efi_enter_virtual_mode and before efi_free_boot_services.
+ */
+void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
+{
+   void *p;
+   if (WARN_ON(!memmap.map))
+   return NULL;
+   for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+   efi_memory_desc_t *md = p;
+   u64 size = md->num_pages << EFI_PAGE_SHIFT;
+   u64 end = md->phys_addr + size;
+   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
+   md->type != EFI_BOOT_SERVICES_CODE &&
+   md->type != EFI_BOOT_SERVICES_DATA)
+   continue;
+   if (!md->virt_addr)
+   continue;
+   if (phys_addr >= md->phys_addr && phys_addr < end) {
+   phys_addr += md->virt_addr - md->phys_addr;
+   return (__force void __iomem *)phys_addr;
+   }
+   }
+   return NULL;
+}
+
+/*
  * This function will switch the EFI runtime services to virtual mode.
  * Essentially, look through the EFI memmap and map every region that
  * has the runtime attribute bit set in its memory descriptor and update
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 00ec70f..0c11a58 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -496,6 +496,7 @@ extern void efi_map_pal_code (void);
 extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
+extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 extern void efi_free_boot_services(void);
 extern u64 efi_get_iobase (void);
 extern u32 efi_mem_type (unsigned long phys_addr);
-- 
1.7.10.4
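
The address translation at the heart of efi_lookup_mapped_addr can be tried out in userspace with a simplified stand-in. This sketch is not the kernel function: struct region, lookup_mapped_addr, and the 4 KiB page shift are assumptions for illustration, and the type and attribute filtering done in the patch is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical, simplified memory-map entry. */
struct region {
	uint64_t phys_addr;
	uint64_t virt_addr;	/* 0 means "not mapped" */
	uint64_t num_pages;
};

/*
 * Find the region containing phys and return the matching virtual
 * address, or 0 if no mapped region covers it.
 */
uint64_t lookup_mapped_addr(const struct region *map, size_t n, uint64_t phys)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t size = map[i].num_pages << SKETCH_PAGE_SHIFT;
		uint64_t end = map[i].phys_addr + size;	/* exclusive */

		if (!map[i].virt_addr)
			continue;
		if (phys >= map[i].phys_addr && phys < end)
			return phys - map[i].phys_addr + map[i].virt_addr;
	}
	return 0;
}
```

The translation preserves the offset within the region, so a caller gets a pointer into the existing mapping rather than a second mapping of the same RAM.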


[PATCH 3/3] efi: Fix the ACPI BGRT driver for images located in EFI boot services memory

2012-08-30 Thread Josh Triplett

The ACPI BGRT driver accesses the BIOS logo image when it initializes.
However, ACPI 5.0 (which introduces the BGRT) recommends putting the
logo image in EFI boot services memory, so that the OS can reclaim that
memory.  Production systems follow this recommendation, breaking the
ACPI BGRT driver.

Move the bulk of the BGRT code to run during a new EFI late
initialization phase, which occurs after switching EFI to virtual mode,
and after initializing ACPI, but before freeing boot services memory.
Copy the BIOS logo image to kernel memory at that point, and make it
accessible to the BGRT driver.  Rework the existing ACPI BGRT driver to
act as a simple wrapper exposing that image (and the properties from the
BGRT) via sysfs.

Signed-off-by: Josh Triplett 
---
 arch/x86/platform/efi/Makefile   |1 +
 arch/x86/platform/efi/efi-bgrt.c |   76 ++
 arch/x86/platform/efi/efi.c  |6 +++
 drivers/acpi/Kconfig |4 +-
 drivers/acpi/bgrt.c  |   76 +-
 include/linux/efi-bgrt.h |   21 +++
 include/linux/efi.h  |1 +
 init/main.c  |4 +-
 8 files changed, 119 insertions(+), 70 deletions(-)
 create mode 100644 arch/x86/platform/efi/efi-bgrt.c
 create mode 100644 include/linux/efi-bgrt.h

diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 73b8be0..6db1cc4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_EFI)  += efi.o efi_$(BITS).o efi_stub_$(BITS).o
+obj-$(CONFIG_ACPI_BGRT) += efi-bgrt.o
diff --git a/arch/x86/platform/efi/efi-bgrt.c b/arch/x86/platform/efi/efi-bgrt.c
new file mode 100644
index 000..f6a0c1b
--- /dev/null
+++ b/arch/x86/platform/efi/efi-bgrt.c
@@ -0,0 +1,76 @@
+/*
+ * Copyright 2012 Intel Corporation
+ * Author: Josh Triplett 
+ *
+ * Based on the bgrt driver:
+ * Copyright 2012 Red Hat, Inc 
+ * Author: Matthew Garrett
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+
+struct acpi_table_bgrt *bgrt_tab;
+void *bgrt_image;
+size_t bgrt_image_size;
+
+struct bmp_header {
+   u16 id;
+   u32 size;
+} __packed;
+
+void efi_bgrt_init(void)
+{
+   acpi_status status;
+   void __iomem *image;
+   bool ioremapped = false;
+   struct bmp_header bmp_header;
+
+   if (acpi_disabled)
+   return;
+
+   status = acpi_get_table("BGRT", 0,
+   (struct acpi_table_header **)&bgrt_tab);
+   if (ACPI_FAILURE(status))
+   return;
+
+   if (bgrt_tab->version != 1)
+   return;
+   if (bgrt_tab->image_type != 0 || !bgrt_tab->image_address)
+   return;
+
+   image = efi_lookup_mapped_addr(bgrt_tab->image_address);
+   if (!image) {
+   image = ioremap(bgrt_tab->image_address, sizeof(bmp_header));
+   ioremapped = true;
+   if (!image)
+   return;
+   }
+
+   memcpy_fromio(&bmp_header, image, sizeof(bmp_header));
+   if (ioremapped)
+   iounmap(image);
+   bgrt_image_size = bmp_header.size;
+
+   bgrt_image = kmalloc(bgrt_image_size, GFP_KERNEL);
+   if (!bgrt_image)
+   return;
+
+   if (ioremapped) {
+   image = ioremap(bgrt_tab->image_address, bmp_header.size);
+   if (!image) {
+   kfree(bgrt_image);
+   bgrt_image = NULL;
+   return;
+   }
+   }
+
+   memcpy_fromio(bgrt_image, image, bgrt_image_size);
+   if (ioremapped)
+   iounmap(image);
+}
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index ae35cc8..0226585 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -745,6 +746,11 @@ void __init efi_init(void)
 #endif
 }
 
+void __init efi_late_init(void)
+{
+   efi_bgrt_init();
+}
+
 void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
 {
u64 addr, npages;
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 8099895..119d58d 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -385,8 +385,8 @@ config ACPI_CUSTOM_METHOD
  to override that restriction).
 
 config ACPI_BGRT
-tristate "Boottime Graphics Resource Table support"
-default n
+   bool "Boottime Graphics Resource Table support"
+   depends on EFI
 help
  This driver adds support for exposing the ACPI Boottime Graphics
  Resource Table, which allows the operating sys

Re: [PATCH tip/core/rcu 0/5] Documentation and rcutorture changes

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:46:03PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 30, 2012 at 11:56:09AM -0700, Josh Triplett wrote:
> > On Thu, Aug 30, 2012 at 11:44:48AM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > This series covers changes to rcutorture and documentation updates.
> > > The individual patches in this series are as follows:
> > > 
> > > 1.Update rcutorture default values so that casual rcutorture
> > >   users will do more aggressive testing.
> > > 2.Make rcutorture track CPU-hotplug latency statistics.
> > > 3.Document SRCU's new-found ability to be used by offline and
> > >   idle CPUs, and also emphasize SRCU's limitations.
> > > 4.Use the new pr_*() interfaces in rcutorture.
> > > 5.Prevent kthread-initialization races in rcutorture.
> > > 
> > >   Thanx, Paul
> > > 
> > > 
> > > 
> > >  b/Documentation/RCU/checklist.txt |6 +
> > >  b/Documentation/RCU/whatisRCU.txt |9 +-
> > >  b/kernel/rcutorture.c |4 -
> > >  kernel/rcutorture.c   |  152 +++---
> > >  4 files changed, 108 insertions(+), 63 deletions(-)
> > 
> > Something seems wrong with this diffstat; how'd the b/ prefixes get
> > there, and why does it list kernel/rcutorture.c twice, once with and
> > once without?
> 
> Hmmm...  It seems quite reproducible.  I did the usual git-format-patch
> and ran the resulting set of patches through diffstat.  I seem to have a
> broken diffstat...
> 
> However, git diff --stat v3.6-rc1..hotplug.2012.08.28a generates the
> following:
> 
>  kernel/rcutree.c   |   93 +++-
>  kernel/rcutree.h   |3 --
>  kernel/rcutree_trace.c |4 +-
>  kernel/sched/core.c|   41 ++---
>  4 files changed, 43 insertions(+), 98 deletions(-)
> 
> Which does look much better.

You might try generating your cover letter template via git format-patch
--cover-letter, which will automatically give you a list of patches and
a git-produced diffstat; much easier than trying to format a cover
letter by hand.  Meanwhile, you might consider sending your patches as a
bug report to diffstat upstream: Thomas E. Dickey.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH tip/core/rcu 2/5] rcu: Track CPU-hotplug duration statistics

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 01:38:42PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 30, 2012 at 12:00:18PM -0700, Josh Triplett wrote:
> > On Thu, Aug 30, 2012 at 11:45:09AM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" 
> > > 
> > > Many rcutorture runs include CPU-hotplug operations in their stress
> > > testing.  This commit accumulates statistics on the durations of these
> > > operations in deference to the recent concern about the overhead and
> > > latency of these operations.
> > 
> > How many jiffies, on average, do these operations take?  Measuring these
> > using jiffies seems highly prone to repeated rounding error.
> 
> On my laptop, 30-140 depending on what hotplug patches I have in place.
> Some users have reported as few as 2-3 jiffies, but they don't use
> rcutorture.
> 
> I eagerly look forward to the time when I need to change the timebase for
> my own use.  ;-)

Fair enough.  In that case, this seems precise enough for the purpose it
serves.
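
For a rough sense of why the magnitude matters: a duration taken as the
difference of two jiffies readings can be off by just under one tick, so
the worst-case relative error is roughly one tick divided by the delta.
A userspace sketch (the helper name is made up for illustration, and the
100% cap for a zero delta is my own choice):

```c
/*
 * Worst-case relative error, in percent (rounded up), of a duration
 * measured as the difference of two tick counts.  Either endpoint may
 * fall anywhere within its tick, so the delta can be off by just under
 * one full tick.
 */
static unsigned int worst_case_error_pct(unsigned long delta_ticks)
{
	if (delta_ticks == 0)
		return 100;	/* nothing measured; cap the figure at 100% */
	/* Ceiling of 100 / delta_ticks without risking overflow. */
	return (unsigned int)(100 / delta_ticks + (100 % delta_ticks != 0));
}
```

At 2-3 jiffies that error bound is a large fraction of the measurement; at
30-140 jiffies it is a few percent at most.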

> > > Signed-off-by: Paul E. McKenney 
> > > Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> > > ---
> > >  kernel/rcutorture.c |   42 +-
> > >  1 files changed, 37 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> > > index 86315d3..c6cf6ff 100644
> > > --- a/kernel/rcutorture.c
> > > +++ b/kernel/rcutorture.c
> > > @@ -176,8 +176,14 @@ static long n_rcu_torture_boosts;
> > >  static long n_rcu_torture_timers;
> > >  static long n_offline_attempts;
> > >  static long n_offline_successes;
> > > +static unsigned long sum_offline;
> > > +static int min_offline = -1;
> > > +static int max_offline;
> > >  static long n_online_attempts;
> > >  static long n_online_successes;
> > > +static unsigned long sum_online;
> > > +static int min_online = -1;
> > > +static int max_online;
> > >  static long n_barrier_attempts;
> > >  static long n_barrier_successes;
> > >  static struct list_head rcu_torture_removed;
> > > @@ -1214,11 +1220,13 @@ rcu_torture_printk(char *page)
> > >  n_rcu_torture_boost_failure,
> > >  n_rcu_torture_boosts,
> > >  n_rcu_torture_timers);
> > > - cnt += sprintf(&page[cnt], "onoff: %ld/%ld:%ld/%ld ",
> > > -n_online_successes,
> > > -n_online_attempts,
> > > -n_offline_successes,
> > > -n_offline_attempts);
> > > + cnt += sprintf(&page[cnt],
> > > +"onoff: %ld/%ld:%ld/%ld %d,%d:%d,%d %lu:%lu (HZ=%d) ",
> > > +n_online_successes, n_online_attempts,
> > > +n_offline_successes, n_offline_attempts,
> > > +min_online, max_online,
> > > +min_offline, max_offline,
> > > +sum_online, sum_offline, HZ);
> > >   cnt += sprintf(&page[cnt], "barrier: %ld/%ld:%ld",
> > >  n_barrier_successes,
> > >  n_barrier_attempts,
> > > @@ -1490,8 +1498,10 @@ static int __cpuinit
> > >  rcu_torture_onoff(void *arg)
> > >  {
> > >   int cpu;
> > > + unsigned long delta;
> > >   int maxcpu = -1;
> > >   DEFINE_RCU_RANDOM(rand);
> > > + unsigned long starttime;
> > >  
> > >   VERBOSE_PRINTK_STRING("rcu_torture_onoff task started");
> > >   for_each_online_cpu(cpu)
> > > @@ -1509,6 +1519,7 @@ rcu_torture_onoff(void *arg)
> > >   printk(KERN_ALERT "%s" TORTURE_FLAG
> > >  "rcu_torture_onoff task: offlining %d\n",
> > >  torture_type, cpu);
> > > + starttime = jiffies;
> > >   n_offline_attempts++;
> > >   if (cpu_down(cpu) == 0) {
> > >   if (verbose)
> > > @@ -1516,12 +1527,23 @@ rcu_torture_onoff(void *arg)
> > >  "rcu_torture_onoff task: 
> > > offlined %d\n",
> > >  torture_type, cpu);
> > >   n_offline_successes++;
> > > + delta = jiffies - starttime;
> > > + sum_offl

Re: [PATCH tip/core/rcu 1/5] rcu: Update rcutorture defaults

2012-08-30 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:35:36PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 30, 2012 at 11:57:05AM -0700, Josh Triplett wrote:
> > On Thu, Aug 30, 2012 at 11:45:08AM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" 
> > > 
> > > A number of new features have been added to rcutorture over the years, but
> > > the defaults have not been updated to include them.  This commit therefore
> > > turns on a couple of them that have proven helpful and trustworthy, namely
> > > periodic progress reports and testing of NO_HZ.
> > > 
> > > Signed-off-by: Paul E. McKenney 
> > > Signed-off-by: Paul E. McKenney 
> > > ---
> > >  kernel/rcutorture.c |4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> > > index 25b1503..86315d3 100644
> > > --- a/kernel/rcutorture.c
> > > +++ b/kernel/rcutorture.c
> > > @@ -53,10 +53,10 @@ MODULE_AUTHOR("Paul E. McKenney  
> > > and Josh Triplett  > >  
> > >  static int nreaders = -1;/* # reader threads, defaults to 
> > > 2*ncpus */
> > >  static int nfakewriters = 4; /* # fake writer threads */
> > > -static int stat_interval;/* Interval between stats, in seconds. 
> > > */
> > > +static int stat_interval = 60;   /* Interval between stats, in seconds. 
> > > */
> > >   /*  Defaults to "only at end of test". */
> > 
> > Need to remove this comment about the default.
> 
> Good catch!  I have replaced it with "Zero means 'only at end of test'".

Good point, you definitely still need to document what zero means.

> > >  static bool verbose; /* Print more debug info. */
> > > -static bool test_no_idle_hz; /* Test RCU's support for tickless idle 
> > > CPUs. */
> > > +static bool test_no_idle_hz = 1; /* Test RCU support for tickless idle 
> > > CPUs. */
> > 
> > s/1/true/
> 
> Good point, fixed.
> 
> Thank you for looking this over!

With those two fixes:

Reviewed-by: Josh Triplett 

Re: [PATCH tip/core/rcu 1/3] rcu: Remove _rcu_barrier() dependency on __stop_machine()

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 12:03:01PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Currently, _rcu_barrier() relies on preempt_disable() to prevent
> any CPU from going offline, which in turn depends on CPU hotplug's
> use of __stop_machine().
> 
> This patch therefore makes _rcu_barrier() use get_online_cpus() to
> block CPU-hotplug operations.  This has the added benefit of removing
> the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> operations are excluded, there can be no callbacks to adopt.  This
> commit simplifies the code accordingly.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Impressive simplification!

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree.c   |   83 ++-
>  kernel/rcutree.h   |3 --
>  kernel/rcutree_trace.c |4 +-
>  3 files changed, 13 insertions(+), 77 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f280e54..9854a00 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1390,17 +1390,6 @@ static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
>   int i;
>   struct rcu_data *rdp = __this_cpu_ptr(rsp->rda);
>  
> - /*
> -  * If there is an rcu_barrier() operation in progress, then
> -  * only the task doing that operation is permitted to adopt
> -  * callbacks.  To do otherwise breaks rcu_barrier() and friends
> -  * by causing them to fail to wait for the callbacks in the
> -  * orphanage.
> -  */
> - if (rsp->rcu_barrier_in_progress &&
> - rsp->rcu_barrier_in_progress != current)
> - return;
> -
>   /* Do the accounting first. */
>   rdp->qlen_lazy += rsp->qlen_lazy;
>   rdp->qlen += rsp->qlen;
> @@ -1455,9 +1444,8 @@ static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
>   * The CPU has been completely removed, and some other CPU is reporting
>   * this fact from process context.  Do the remainder of the cleanup,
>   * including orphaning the outgoing CPU's RCU callbacks, and also
> - * adopting them, if there is no _rcu_barrier() instance running.
> - * There can only be one CPU hotplug operation at a time, so no other
> - * CPU can be attempting to update rcu_cpu_kthread_task.
> + * adopting them.  There can only be one CPU hotplug operation at a time,
> + * so no other CPU can be attempting to update rcu_cpu_kthread_task.
>   */
>  static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
>  {
> @@ -1519,10 +1507,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct 
> rcu_state *rsp)
>  
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
>  
> -static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
> -{
> -}
> -
>  static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
>  {
>  }
> @@ -2326,13 +2310,10 @@ static void rcu_barrier_func(void *type)
>  static void _rcu_barrier(struct rcu_state *rsp)
>  {
>   int cpu;
> - unsigned long flags;
>   struct rcu_data *rdp;
> - struct rcu_data rd;
>   unsigned long snap = ACCESS_ONCE(rsp->n_barrier_done);
>   unsigned long snap_done;
>  
> - init_rcu_head_on_stack(&rd.barrier_head);
>   _rcu_barrier_trace(rsp, "Begin", -1, snap);
>  
>   /* Take mutex to serialize concurrent rcu_barrier() requests. */
> @@ -2372,70 +2353,30 @@ static void _rcu_barrier(struct rcu_state *rsp)
>   /*
>* Initialize the count to one rather than to zero in order to
>* avoid a too-soon return to zero in case of a short grace period
> -  * (or preemption of this task).  Also flag this task as doing
> -  * an rcu_barrier().  This will prevent anyone else from adopting
> -  * orphaned callbacks, which could cause otherwise failure if a
> -  * CPU went offline and quickly came back online.  To see this,
> -  * consider the following sequence of events:
> -  *
> -  * 1.   We cause CPU 0 to post an rcu_barrier_callback() callback.
> -  * 2.   CPU 1 goes offline, orphaning its callbacks.
> -  * 3.   CPU 0 adopts CPU 1's orphaned callbacks.
> -  * 4.   CPU 1 comes back online.
> -  * 5.   We cause CPU 1 to post an rcu_barrier_callback() callback.
> -  * 6.   Both rcu_barrier_callback() callbacks are invoked, awakening
> -  *  us -- but before CPU 1's orphaned callbacks are invoked!!!
> +  * (or preemption of this task).  Exclude CPU-hotplug operations
> +  * to ensure that no offline CPU has callbacks queued.
>*/
>   init_completion(&rsp->barrier_completion);
>   atomic_set(&rsp->barrier_cpu_count, 1);
> -

Re: [PATCH tip/core/rcu 2/3] rcu: Disallow callback registry on offline CPUs

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 12:03:02PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Posting a callback after the CPU_DEAD notifier effectively leaks
> that callback unless/until that CPU comes back online.  Silence is
> unhelpful when attempting to track down such leaks, so this commit emits
> a WARN_ON_ONCE() and unconditionally leaks the callback when an offline
> CPU attempts to register a callback.  The rdp->nxttail[RCU_NEXT_TAIL] is
> set to NULL in the CPU_DEAD notifier and restored in the CPU_UP_PREPARE
> notifier, allowing _call_rcu() to determine exactly when posting callbacks
> is illegal.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

One suggestion below; with or without that change:

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree.c |   10 ++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 9854a00..5f8c4dd 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1503,6 +1503,9 @@ static void rcu_cleanup_dead_cpu(int cpu, struct 
> rcu_state *rsp)
>   WARN_ONCE(rdp->qlen != 0 || rdp->nxtlist != NULL,
> "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, 
> nxtlist=%p\n",
> cpu, rdp->qlen, rdp->nxtlist);
> + init_callback_list(rdp);
> + /* Disallow further callbacks on this CPU. */
> + rdp->nxttail[RCU_NEXT_TAIL] = NULL;
>  }
>  
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
> @@ -1925,6 +1928,12 @@ __call_rcu(struct rcu_head *head, void (*func)(struct 
> rcu_head *rcu),
>   rdp = this_cpu_ptr(rsp->rda);
>  
>   /* Add the callback to our list. */
> + if (unlikely(rdp->nxttail[RCU_NEXT_TAIL] == NULL)) {
> + /* _call_rcu() is illegal on offline CPU; leak the callback. */
> + WARN_ON_ONCE(1);

You can write this as:

if (WARN_ON_ONCE(rdp->nxttail[RCU_NEXT_TAIL] == NULL))

WARN_ON_ONCE also has a built-in unlikely() already.
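
A userspace sketch of the pattern, in case it helps.  This is not the
kernel macro: warn_on_once(), struct cb_list, and enqueue() are stand-ins
I made up, and the explicit "warned" flag replaces the per-call-site
static that the real macro hides.  The point is only that the warning
helper returns the condition, so the test and the warning collapse into a
single if:

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed, so the "once" part is observable. */
static int warn_count;

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): print a warning the first
 * time cond is true, and always return cond so callers can branch on it.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Toy callback list: a NULL tail pointer means "callbacks disallowed". */
struct cb_list {
	void **tail;
	int qlen;
};

/* Returns true if the callback was queued, false if it was dropped. */
static bool enqueue(struct cb_list *l)
{
	static bool warned;

	/* Check and warn in one expression, as suggested above. */
	if (warn_on_once(l->tail == NULL, &warned, "enqueue on disabled list"))
		return false;
	l->qlen++;
	return true;
}
```

Calling enqueue() repeatedly on a list whose tail is NULL drops every
callback but warns only the first time.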

> + local_irq_restore(flags);
> + return;
> + }
>   ACCESS_ONCE(rdp->qlen)++;
>   if (lazy)
>   rdp->qlen_lazy++;
> @@ -2462,6 +2471,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, 
> int preemptible)
>   rdp->qlen_last_fqs_check = 0;
>   rdp->n_force_qs_snap = rsp->n_force_qs;
>   rdp->blimit = blimit;
> + init_callback_list(rdp);  /* Re-enable callbacks on this CPU. */
>   rdp->dynticks->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
>   atomic_set(&rdp->dynticks->dynticks,
>  (atomic_read(&rdp->dynticks->dynticks) & ~0x1) + 1);
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 01/15] rcu: Add PROVE_RCU_DELAY to provoke difficult races

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:14AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> There have been some recent bugs that were triggered only when
> preemptible RCU's __rcu_read_unlock() was preempted just after setting
> ->rcu_read_lock_nesting to INT_MIN, which is a low-probability event.
> Therefore, reproducing those bugs (to say nothing of gaining confidence
> in alleged fixes) was quite difficult.  This commit therefore creates
> a new debug-only RCU kernel config option that forces a short delay
> in __rcu_read_unlock() to increase the probability of those sorts of
> bugs occurring.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

If you end up adding more such conditional race-provoking delays
elsewhere in the code, consider creating a prove_rcu_udelay() wrapper
to avoid multiple #ifdefs in the code.
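
Something along these lines, sketched in userspace C.  The udelay() here
is a stand-in that only records the requested delay, the #define takes
the place of Kconfig, and read_unlock_sketch() is a made-up call site;
only the shape of the wrapper is the suggestion:

```c
#include <limits.h>

/*
 * Assumption: defined here so the sketch exercises the delay path.
 * In the kernel this symbol would come from Kconfig.
 */
#define CONFIG_PROVE_RCU_DELAY 1

/* Hypothetical stand-in for udelay(): records the delay, does not spin. */
static unsigned long delayed_usecs;

static void udelay(unsigned long usecs)
{
	delayed_usecs += usecs;
}

/* The wrapper: the one place that needs the #ifdef. */
static inline void prove_rcu_udelay(unsigned long usecs)
{
#ifdef CONFIG_PROVE_RCU_DELAY
	udelay(usecs);
#else
	(void)usecs;	/* compiles away when the option is off */
#endif
}

static int nesting;

/* Call site loosely modelled on __rcu_read_unlock(): no #ifdef needed. */
static void read_unlock_sketch(void)
{
	nesting = INT_MIN;
	prove_rcu_udelay(10);	/* Make preemption more probable. */
}
```

Each additional race-provoking delay then costs one line at the call site.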

> ---
>  kernel/rcupdate.c |4 
>  lib/Kconfig.debug |   14 ++
>  2 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index 4e6a61b..29ca1c6 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #include 
> @@ -81,6 +82,9 @@ void __rcu_read_unlock(void)
>   } else {
>   barrier();  /* critical section before exit code. */
>   t->rcu_read_lock_nesting = INT_MIN;
> +#ifdef CONFIG_PROVE_RCU_DELAY
> + udelay(10); /* Make preemption more probable. */
> +#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
>   barrier();  /* assign before ->rcu_read_unlock_special load */
>   if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
>   rcu_read_unlock_special(t);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 2403a63..dacbbe4 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -629,6 +629,20 @@ config PROVE_RCU_REPEATEDLY
>  
>Say N if you are unsure.
>  
> +config PROVE_RCU_DELAY
> + bool "RCU debugging: preemptible RCU race provocation"
> + depends on DEBUG_KERNEL && PREEMPT_RCU
> + default n
> + help
> +  There is a class of races that involve an unlikely preemption
> +  of __rcu_read_unlock() just after ->rcu_read_lock_nesting has
> +  been set to INT_MIN.  This feature inserts a delay at that
> +  point to increase the probability of these races.
> +
> +  Say Y to increase probability of preemption of __rcu_read_unlock().
> +
> +  Say N if you are unsure.
> +
>  config SPARSE_RCU_POINTER
>   bool "RCU debugging: sparse-based checks for pointer usage"
>   default n
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 02/15] rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:15AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Because TINY_RCU's idle detection keys directly off of the nesting
> level, rather than from a separate variable as in TREE_RCU, the
> TINY_RCU dyntick-idle tracing on transition to idle must happen
> before the change to the nesting level.  This commit therefore makes
> this change by passing the desired new value (rather than the old value)
> of the nesting level in to rcu_idle_enter_common().
> 
> [ paulmck: Add fix for wrong-variable bug spotted by
>   Michael Wang . ]
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutiny.c |   31 ---
>  1 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> index 547b1fe..e4163c5 100644
> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -56,24 +56,27 @@ static void __call_rcu(struct rcu_head *head,
>  static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
>  
>  /* Common code for rcu_idle_enter() and rcu_irq_exit(), see 
> kernel/rcutree.c. */
> -static void rcu_idle_enter_common(long long oldval)
> +static void rcu_idle_enter_common(long long newval)
>  {
> - if (rcu_dynticks_nesting) {
> + if (newval) {
>   RCU_TRACE(trace_rcu_dyntick("--=",
> - oldval, rcu_dynticks_nesting));
> + rcu_dynticks_nesting, newval));
> + rcu_dynticks_nesting = newval;
>   return;
>   }
> - RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
> + RCU_TRACE(trace_rcu_dyntick("Start", rcu_dynticks_nesting, newval));
>   if (!is_idle_task(current)) {
>   struct task_struct *idle = idle_task(smp_processor_id());
>  
>   RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
> - oldval, rcu_dynticks_nesting));
> + rcu_dynticks_nesting, newval));
>   ftrace_dump(DUMP_ALL);
>   WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
> current->pid, current->comm,
> idle->pid, idle->comm); /* must be idle task! */
>   }
> + barrier();
> + rcu_dynticks_nesting = newval;
>   rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
>  }
>  
> @@ -84,17 +87,16 @@ static void rcu_idle_enter_common(long long oldval)
>  void rcu_idle_enter(void)
>  {
>   unsigned long flags;
> - long long oldval;
> + long long newval;
>  
>   local_irq_save(flags);
> - oldval = rcu_dynticks_nesting;
>   WARN_ON_ONCE((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
>   if ((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) ==
>   DYNTICK_TASK_NEST_VALUE)
> - rcu_dynticks_nesting = 0;
> + newval = 0;
>   else
> - rcu_dynticks_nesting  -= DYNTICK_TASK_NEST_VALUE;
> - rcu_idle_enter_common(oldval);
> + newval = rcu_dynticks_nesting - DYNTICK_TASK_NEST_VALUE;
> + rcu_idle_enter_common(newval);
>   local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_enter);
> @@ -105,13 +107,12 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
>  void rcu_irq_exit(void)
>  {
>   unsigned long flags;
> - long long oldval;
> + long long newval;
>  
>   local_irq_save(flags);
> - oldval = rcu_dynticks_nesting;
> - rcu_dynticks_nesting--;
> - WARN_ON_ONCE(rcu_dynticks_nesting < 0);
> - rcu_idle_enter_common(oldval);
> + newval = rcu_dynticks_nesting - 1;
> + WARN_ON_ONCE(newval < 0);
> + rcu_idle_enter_common(newval);
>   local_irq_restore(flags);
>  }
>  
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 03/15] rcu: Properly initialize ->boost_tasks on CPU offline

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:16AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
> structure, it does not NULL out the structure's ->boost_tasks field.
> This commit therefore fixes this issue.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree_plugin.h |7 ---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 7f3244c..b1b4851 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -584,8 +584,11 @@ static int rcu_preempt_offline_tasks(struct rcu_state 
> *rsp,
>   raw_spin_unlock(&rnp_root->lock); /* irqs still disabled */
>   }
>  
> + rnp->gp_tasks = NULL;
> + rnp->exp_tasks = NULL;
>  #ifdef CONFIG_RCU_BOOST
> - /* In case root is being boosted and leaf is not. */
> + rnp->boost_tasks = NULL;
> + /* In case root is being boosted and leaf was not. */
>   raw_spin_lock(&rnp_root->lock); /* irqs already disabled */
>   if (rnp_root->boost_tasks != NULL &&
>   rnp_root->boost_tasks != rnp_root->gp_tasks)
> @@ -593,8 +596,6 @@ static int rcu_preempt_offline_tasks(struct rcu_state 
> *rsp,
>   raw_spin_unlock(&rnp_root->lock); /* irqs still disabled */
>  #endif /* #ifdef CONFIG_RCU_BOOST */
>  
> - rnp->gp_tasks = NULL;
> - rnp->exp_tasks = NULL;
>   return retval;
>  }
>  
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 04/15] rcu: Permit RCU_NONIDLE() to be used from interrupt context

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:17AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> There is a need to use RCU from interrupt context, but either before
> rcu_irq_enter() is called or after rcu_irq_exit() is called.  If the
> interrupt occurs from idle, then lockdep-RCU will complain about such
> uses, as they appear to be illegal uses of RCU from the idle loop.
> In other environments, RCU_NONIDLE() could be used to properly protect
> the use of RCU, but RCU_NONIDLE() currently cannot be invoked except
> from process context.
> 
> This commit therefore modifies RCU_NONIDLE() to permit its use more
> globally.
> 
> Reported-by: Steven Rostedt 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Something seems wrong about this.  The addition of EXPORT_SYMBOL_GPL
suggests that such interrupt handlers might live in modules.  In what
situation might a module interrupt handler get called from the idle
loop, before rcu_irq_enter or after rcu_irq_exit, and need to know that
when using RCU?

- Josh Triplett

> ---
>  include/linux/rcupdate.h |6 ++
>  kernel/rcutiny.c |2 ++
>  kernel/rcutree.c |2 ++
>  3 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 115ead2..0fbbd52 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -210,14 +210,12 @@ extern void exit_rcu(void);
>   * to nest RCU_NONIDLE() wrappers, but the nesting level is currently
>   * quite limited.  If deeper nesting is required, it will be necessary
>   * to adjust DYNTICK_TASK_NESTING_VALUE accordingly.
> - *
> - * This macro may be used from process-level code only.
>   */
>  #define RCU_NONIDLE(a) \
>   do { \
> - rcu_idle_exit(); \
> + rcu_irq_enter(); \
>   do { a; } while (0); \
> - rcu_idle_enter(); \
> + rcu_irq_exit(); \
>   } while (0)
>  
>  /*
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> index e4163c5..2e073a2 100644
> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -115,6 +115,7 @@ void rcu_irq_exit(void)
>   rcu_idle_enter_common(newval);
>   local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_exit);
>  
>  /* Common code for rcu_idle_exit() and rcu_irq_enter(), see 
> kernel/rcutree.c. */
>  static void rcu_idle_exit_common(long long oldval)
> @@ -172,6 +173,7 @@ void rcu_irq_enter(void)
>   rcu_idle_exit_common(oldval);
>   local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_enter);
>  
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f280e54..96b8aff 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -447,6 +447,7 @@ void rcu_irq_exit(void)
>   rcu_idle_enter_common(rdtp, oldval);
>   local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_exit);
>  
>  /*
>   * rcu_idle_exit_common - inform RCU that current CPU is moving away from 
> idle
> @@ -542,6 +543,7 @@ void rcu_irq_enter(void)
>   rcu_idle_exit_common(rdtp, oldval);
>   local_irq_restore(flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_irq_enter);
>  
>  /**
>   * rcu_nmi_enter - inform RCU of entry to NMI context
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 05/15] rcu: Improve boost selection when moving tasks to root rcu_node

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:18AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The rcu_preempt_offline_tasks() moves all tasks queued on a given leaf
> rcu_node structure to the root rcu_node, which is done when the last CPU
> corresponding to the leaf rcu_node structure goes offline.  Now that
> RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
> operations during the initialization of each rcu_node structure's
> ->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
> of setting the root rcu_node's ->boost_tasks pointer.
> 
> The key point is that rcu_preempt_offline_tasks() runs as part of the
> CPU-hotplug process, so that a concurrent synchronize_rcu_expedited() is
> guaranteed to either have not started on the one hand (in which case there
> is no boosting on behalf of the expedited grace period) to be completely

Missing word: s/to be/or to be/

> initialized on the other (in which case, in absence of other priority

s/absence/the absence/

> boosting, all ->boost_tasks pointers will be initialized).  Therefore,
> if rcu_preempt_offline_tasks() finds that the ->boost_tasks pointer is
> equal to the ->exp_tasks pointer, it can be sure that it is correctly
> placed.
> 
> The case where there was boosting ongoing at the time that the

s/The/In the/

> synchronize_rcu_expedited() function started, different nodes might
> start boosting the tasks blocking the expedited grace period at different
> times.  In this mixed case, the root node will either be boosting tasks
> for the expedited grace period already, or it will start as soon as it
> gets done boosting for the normal grace period -- but in this latter
> case, the root node's tasks needed to be boosted in any case.
> 
> This commit therefore adds a check of the ->boost_tasks pointer against
> the ->exp_tasks pointer to the list that prevents updating ->boost_tasks.

Seems like some hint of this explanation really ought to end up in a
comment somewhere...

> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree_plugin.h |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index b1b4851..c930a47 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -591,7 +591,8 @@ static int rcu_preempt_offline_tasks(struct rcu_state *rsp,
>   /* In case root is being boosted and leaf was not. */
>   raw_spin_lock(&rnp_root->lock); /* irqs already disabled */
>   if (rnp_root->boost_tasks != NULL &&
> - rnp_root->boost_tasks != rnp_root->gp_tasks)
> + rnp_root->boost_tasks != rnp_root->gp_tasks &&
> + rnp_root->boost_tasks != rnp_root->exp_tasks)
>   rnp_root->boost_tasks = rnp_root->gp_tasks;
>   raw_spin_unlock(&rnp_root->lock); /* irqs still disabled */
>  #endif /* #ifdef CONFIG_RCU_BOOST */
> -- 
> 1.7.8
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH tip/core/rcu 06/15] rcu: Make offline-CPU checking allow for indefinite delays

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:19AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The rcu_implicit_offline_qs() function implicitly assumed that execution
> would progress predictably when interrupts are disabled, which is of course
> not guaranteed when running on a hypervisor.  Furthermore, this function
> is short, and is called from one place only in a short function.
> 
> This commit therefore ensures that the timing is checked before
> checking the condition, which guarantees correct behavior even given
> indefinite delays.  It also inlines rcu_implicit_offline_qs() into
> rcu_implicit_dynticks_qs().
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree.c |   53 +++++++++++++++++++++--------------------------------
>  1 files changed, 21 insertions(+), 32 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 96b8aff..9f44749 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -317,35 +317,6 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
>  }
>  
>  /*
> - * If the specified CPU is offline, tell the caller that it is in
> - * a quiescent state.  Otherwise, whack it with a reschedule IPI.
> - * Grace periods can end up waiting on an offline CPU when that
> - * CPU is in the process of coming online -- it will be added to the
> - * rcu_node bitmasks before it actually makes it online.  The same thing
> - * can happen while a CPU is in the process of coming online.  Because this
> - * race is quite rare, we check for it after detecting that the grace
> - * period has been delayed rather than checking each and every CPU
> - * each and every time we start a new grace period.
> - */
> -static int rcu_implicit_offline_qs(struct rcu_data *rdp)
> -{
> - /*
> -  * If the CPU is offline for more than a jiffy, it is in a quiescent
> -  * state.  We can trust its state not to change because interrupts
> -  * are disabled.  The reason for the jiffy's worth of slack is to
> -  * handle CPUs initializing on the way up and finding their way
> -  * to the idle loop on the way down.
> -  */
> - if (cpu_is_offline(rdp->cpu) &&
> - ULONG_CMP_LT(rdp->rsp->gp_start + 2, jiffies)) {
> - trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, "ofl");
> - rdp->offline_fqs++;
> - return 1;
> - }
> - return 0;
> -}
> -
> -/*
>   * rcu_idle_enter_common - inform RCU that current CPU is moving towards idle
>   *
>   * If the new value of the ->dynticks_nesting counter now is zero,
> @@ -675,7 +646,7 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp)
>   * Return true if the specified CPU has passed through a quiescent
>   * state by virtue of being in or having passed through an dynticks
>   * idle state since the last call to dyntick_save_progress_counter()
> - * for this same CPU.
> + * for this same CPU, or by virtue of having been offline.
>   */
>  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  {
> @@ -699,8 +670,26 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>   return 1;
>   }
>  
> - /* Go check for the CPU being offline. */
> - return rcu_implicit_offline_qs(rdp);
> + /*
> +  * Check for the CPU being offline, but only if the grace period
> +  * is old enough.  We don't need to worry about the CPU changing
> +  * state: If we see it offline even once, it has been through a
> +  * quiescent state.
> +  *
> +  * The reason for insisting that the grace period be at least
> +  * one jiffy old is that CPUs that are not quite online and that
> +  * have just gone offline can still execute RCU read-side critical
> +  * sections.
> +  */
> + if (ULONG_CMP_GE(rdp->rsp->gp_start + 2, jiffies))
> + return 0;  /* Grace period is not old enough. */
> + barrier();
> + if (cpu_is_offline(rdp->cpu)) {
> + trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, "ofl");
> + rdp->offline_fqs++;
> + return 1;
> + }
> + return 0;
>  }
>  
>  static int jiffies_till_stall_check(void)
> -- 
> 1.7.8
> 
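
The ordering described in the commit message (check the grace-period age
first, then the offline state) can be sketched in userspace C.  This is only
an illustration: the comparison macro mirrors the kernel's wrap-safe
ULONG_CMP_GE() helper, while struct fake_cpu, offline_qs_check(), and the
explicit "now" parameter are assumptions made up for this example and are
not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a >= b" for free-running counters such as jiffies. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical stand-in for the per-CPU state the real code consults. */
struct fake_cpu {
	bool offline;
	unsigned long offline_fqs;	/* how often we reported "offline" */
};

/*
 * Return 1 if the CPU may be treated as quiescent because it is offline.
 * The age test comes first: any delay between the two tests can only make
 * the grace period older, so it cannot invalidate the age check.
 */
int offline_qs_check(struct fake_cpu *cpu, unsigned long gp_start,
		     unsigned long now)
{
	assert(cpu != NULL);
	if (ULONG_CMP_GE(gp_start + 2, now))
		return 0;	/* Grace period is not old enough. */
	if (cpu->offline) {
		cpu->offline_fqs++;
		return 1;
	}
	return 0;
}
```

Because the comparison is done on the unsigned difference, the check keeps
working when the counter wraps around.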

Re: [PATCH tip/core/rcu 07/15] rcu: Fix obsolete rcu_initiate_boost() header comment

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:20AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
> runqueue locks) made rcu_initiate_boost() restore irq state when releasing
> the rcu_node structure's ->lock, but failed to update the header comment
> accordingly.  This commit therefore brings the header comment up to date.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree_plugin.h |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index c930a47..3ea60c9 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -1193,9 +1193,9 @@ static int rcu_boost_kthread(void *arg)
>   * kthread to start boosting them.  If there is an expedited grace
>   * period in progress, it is always time to boost.
>   *
> - * The caller must hold rnp->lock, which this function releases,
> - * but irqs remain disabled.  The ->boost_kthread_task is immortal,
> - * so we don't need to worry about it going away.
> + * The caller must hold rnp->lock, which this function releases.
> + * The ->boost_kthread_task is immortal, so we don't need to worry
> + * about it going away.
>   */
>  static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
>  {
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 08/15] rcu: Apply for_each_rcu_flavor() to increment_cpu_stall_ticks()

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:21AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The increment_cpu_stall_ticks() function listed each RCU flavor
> explicitly, with an ifdef to handle preemptible RCU.  This commit
> therefore applies for_each_rcu_flavor() to save a line of code.

And also mysteriously changes __get_cpu_var to __this_cpu_ptr without
documenting that (or the reason for it) in the commit message. :)

> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutree_plugin.h |    9 ++++-----
>  1 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 3ea60c9..139a803 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -2196,11 +2196,10 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp)
>  /* Increment ->ticks_this_gp for all flavors of RCU. */
>  static void increment_cpu_stall_ticks(void)
>  {
> - __get_cpu_var(rcu_sched_data).ticks_this_gp++;
> - __get_cpu_var(rcu_bh_data).ticks_this_gp++;
> -#ifdef CONFIG_TREE_PREEMPT_RCU
> - __get_cpu_var(rcu_preempt_data).ticks_this_gp++;
> -#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
> + struct rcu_state *rsp;
> +
> + for_each_rcu_flavor(rsp)
> + __this_cpu_ptr(rsp->rda)->ticks_this_gp++;
>  }
>  
>  #else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 09/15] rcu: Avoid rcu_print_detail_task_stall_rnp() segfault

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:22AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The rcu_print_detail_task_stall_rnp() function invokes
> rcu_preempt_blocked_readers_cgp() to verify that there are some preempted
> RCU readers blocking the current grace period outside of the protection
> of the rcu_node structure's ->lock.  This means that the last blocked
> reader might exit its RCU read-side critical section and remove itself
> from the ->blkd_tasks list before the ->lock is acquired, resulting in
> a segmentation fault when the subsequent code attempts to dereference
> the now-NULL gp_tasks pointer.
> 
> This commit therefore moves the test under the lock.  This will not
> have measurable effect on lock contention because this code is invoked
> only when printing RCU CPU stall warnings, in other words, in the common
> case, never.
> 
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutree_plugin.h |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 139a803..c02dc1d 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -422,9 +422,11 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
>   unsigned long flags;
>   struct task_struct *t;
>  
> - if (!rcu_preempt_blocked_readers_cgp(rnp))
> - return;
>   raw_spin_lock_irqsave(&rnp->lock, flags);
> + if (!rcu_preempt_blocked_readers_cgp(rnp)) {
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + return;
> + }
>   t = list_entry(rnp->gp_tasks,
>  struct task_struct, rcu_node_entry);
>   list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry)

Given the small number of lines of code inside the critical section
here, I think this would look clearer without the early return and
duplicate lock release:

    raw_spin_lock_irqsave(&rnp->lock, flags);
    if (rcu_preempt_blocked_readers_cgp(rnp)) {
        ...
    }
    raw_spin_unlock_irqrestore(&rnp->lock, flags);

- Josh Triplett
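
The single-release shape suggested above can be tried out in plain userspace
C.  The sketch below is an assumption-laden toy, not kernel code: toy_lock
only tracks whether it is held, toy_node stands in for rcu_node, and
report_blocked() copies task ids instead of printing stall details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (not a real spinlock). */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical node: a lock plus an array standing in for ->blkd_tasks. */
struct toy_node {
	struct toy_lock lock;
	const int *blocked;	/* ids of blocked tasks, or NULL if none */
	size_t nblocked;
};

/* Copy up to cap blocked-task ids into out; return how many were copied. */
size_t report_blocked(struct toy_node *n, int *out, size_t cap)
{
	size_t count = 0;

	toy_lock_acquire(&n->lock);
	if (n->blocked != NULL) {	/* the test is made under the lock */
		for (size_t i = 0; i < n->nblocked && count < cap; i++)
			out[count++] = n->blocked[i];
	}
	toy_lock_release(&n->lock);	/* the only release site */
	return count;
}
```

With one acquire and one release, there is no early-return path that could
forget to drop the lock.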

Re: [PATCH tip/core/rcu 10/15] rcu: Protect rcu_node accesses during CPU stall warnings

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:23AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The print_other_cpu_stall() function accesses a number of rcu_node
> fields without protection from the ->lock.  In theory, this is not
> a problem because the fields accessed are all integers, but in
> practice the compiler can get nasty.  Therefore, the commit extends
> the existing critical section to cover the entire loop body.
> 
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutree.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 9f44749..fbe43b0 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -746,14 +746,16 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
>   rcu_for_each_leaf_node(rsp, rnp) {
>   raw_spin_lock_irqsave(&rnp->lock, flags);
>   ndetected += rcu_print_task_stall(rnp);
> - raw_spin_unlock_irqrestore(&rnp->lock, flags);
> - if (rnp->qsmask == 0)
> + if (rnp->qsmask == 0) {
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
>   continue;
> + }
>   for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
>   if (rnp->qsmask & (1UL << cpu)) {
>   print_cpu_stall_info(rsp, rnp->grplo + cpu);
>   ndetected++;
>   }
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
>   }

Now that you've extended the lock over the rest of the loop body, I
think this would look much clearer if written without the continue and
duplicate lock release:

        ...
        if (rnp->qsmask != 0)
            for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
                ...
        raw_spin_unlock_irqrestore(&rnp->lock, flags);
    }
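
For illustration, here is a userspace sketch of that loop shape, with the
qsmask scan guarded by a single test and one unlock per iteration.  The
toy_leaf structure and count_stalled() are hypothetical names invented for
this example; the boolean "locked" field merely stands in for rnp->lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical leaf node covering CPUs grplo..grphi. */
struct toy_leaf {
	bool locked;		/* stands in for rnp->lock */
	unsigned long qsmask;	/* bit i set: CPU grplo + i has not checked in */
	int grplo;
	int grphi;
};

/* Count CPUs that have not yet reported a quiescent state. */
int count_stalled(struct toy_leaf *leaves, int nleaves)
{
	int ndetected = 0;

	for (int i = 0; i < nleaves; i++) {
		struct toy_leaf *rnp = &leaves[i];

		assert(!rnp->locked);
		rnp->locked = true;		/* "lock" */
		if (rnp->qsmask != 0)
			for (int cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
				if (rnp->qsmask & (1UL << cpu))
					ndetected++;
		rnp->locked = false;		/* single "unlock" */
	}
	return ndetected;
}
```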

Re: [PATCH tip/core/rcu 11/15] rcu: Avoid spurious RCU CPU stall warnings

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:24AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> If a given CPU avoids the idle loop but also avoids starting a new
> RCU grace period for a full minute, RCU can issue spurious RCU CPU
> stall warnings.  This commit fixes this issue by adding a check for
> ongoing grace period to avoid these spurious stall warnings.
> 
> Reported-by: Becky Bruce 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  kernel/rcutree.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index fbe43b0..e58097b 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -820,7 +820,8 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>   j = ACCESS_ONCE(jiffies);
>   js = ACCESS_ONCE(rsp->jiffies_stall);
>   rnp = rdp->mynode;
> - if ((ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && ULONG_CMP_GE(j, js)) {
> + if (rcu_gp_in_progress(rsp) &&
> + (ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && ULONG_CMP_GE(j, js)) {
>  
>   /* We haven't checked in, so go dump stack. */
>   print_cpu_stall(rsp);
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 12/15] rcu: Remove redundant memory barrier from __call_rcu()

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:25AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The first memory barrier in __call_rcu() is supposed to order any
> updates done beforehand by the caller against the actual queuing
> of the callback.  However, the second memory barrier (which is intended
> to order incrementing the queue lengths before queuing the callback)
> is also between the caller's updates and the queuing of the callback.
> The second memory barrier can therefore serve both purposes.
> 
> This commit therefore removes the first memory barrier.

I don't see any such second memory barrier in __call_rcu(), at least not
in current master.  Right after this smp_mb(), __call_rcu() enqueues the
callback and increments the queue length.

Did you add a second memory barrier in some other patch that hasn't made
it upstream yet?  If so, could you note that patch dependency explicitly
in the commit message?

- Josh Triplett

> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutree.c |2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index e58097b..5b6709b 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1923,8 +1923,6 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
>   head->func = func;
>   head->next = NULL;
>  
> - smp_mb(); /* Ensure RCU update seen before callback registry. */
> -
>   /*
>* Opportunistically note grace-period endings and beginnings.
>* Note that we might see a beginning right after we see an
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 13/15] rcu: Move TINY_PREEMPT_RCU away from raw_local_irq_save()

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:26AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The use of raw_local_irq_save() is unnecessary, given that local_irq_save()
> really does disable interrupts.  Also, it appears to interfere with lockdep.
> Therefore, this commit moves to local_irq_save().

It looks like the non-raw versions also include tracing, which typically
has recursive dependency problems with RCU.  Can all of these call sites
safely call into tracing without recursing back into RCU?

- Josh Triplett

> Reported-by: Fengguang Wu 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> Tested-by: Fengguang Wu 
> ---
>  kernel/rcutiny_plugin.h |   10 +++++-----
>  1 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
> index 918fd1e..3d01902 100644
> --- a/kernel/rcutiny_plugin.h
> +++ b/kernel/rcutiny_plugin.h
> @@ -278,7 +278,7 @@ static int rcu_boost(void)
>   rcu_preempt_ctrlblk.exp_tasks == NULL)
>   return 0;  /* Nothing to boost. */
>  
> - raw_local_irq_save(flags);
> + local_irq_save(flags);
>  
>   /*
>* Recheck with irqs disabled: all tasks in need of boosting
> @@ -287,7 +287,7 @@ static int rcu_boost(void)
>*/
>   if (rcu_preempt_ctrlblk.boost_tasks == NULL &&
>   rcu_preempt_ctrlblk.exp_tasks == NULL) {
> - raw_local_irq_restore(flags);
> + local_irq_restore(flags);
>   return 0;
>   }
>  
> @@ -317,7 +317,7 @@ static int rcu_boost(void)
>   t = container_of(tb, struct task_struct, rcu_node_entry);
>   rt_mutex_init_proxy_locked(&mtx, t);
>   t->rcu_boost_mutex = &mtx;
> - raw_local_irq_restore(flags);
> + local_irq_restore(flags);
>   rt_mutex_lock(&mtx);
>   rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
>  
> @@ -991,9 +991,9 @@ static void rcu_trace_sub_qlen(struct rcu_ctrlblk *rcp, int n)
>  {
>   unsigned long flags;
>  
> - raw_local_irq_save(flags);
> + local_irq_save(flags);
>   rcp->qlen -= n;
> - raw_local_irq_restore(flags);
> + local_irq_restore(flags);
>  }
>  
>  /*
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 12/15] rcu: Remove redundant memory barrier from __call_rcu()

2012-08-31 Thread Josh Triplett

On Fri, Aug 31, 2012 at 11:30:35AM -0700, Josh Triplett wrote:
> On Thu, Aug 30, 2012 at 11:56:25AM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" 
> > 
> > The first memory barrier in __call_rcu() is supposed to order any
> > updates done beforehand by the caller against the actual queuing
> > of the callback.  However, the second memory barrier (which is intended
> > to order incrementing the queue lengths before queuing the callback)
> > is also between the caller's updates and the queuing of the callback.
> > The second memory barrier can therefore serve both purposes.
> > 
> > This commit therefore removes the first memory barrier.
> 
> I don't see any such second memory barrier in __call_rcu(), at least not
> in current master.  Right after this smp_mb(), __call_rcu() enqueues the
> callback and increments the queue length.
> 
> Did you add a second memory barrier in some other patch that hasn't made
> it upstream yet?  If so, could you note that patch dependency explicitly
> in the commit message?

Argh, never mind.  I looked at the wrong branch, not master.  Looking at
master, I do indeed see the second smp_mb().

Reviewed-by: Josh Triplett 

Re: [PATCH tip/core/rcu 14/15] time: RCU permitted to stop idle entry via softirq

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:27AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The can_stop_idle_tick() function complains if a softirq vector is
> raised too late in the idle-entry process, presumably in order to
> prevent dangling softirq invocations from being delayed across the
> full idle period, which might be indefinitely long -- and if softirq
> was asserted any later than the call to this function, such a delay
> might well happen.
> 
> However, RCU needs to be able to use softirq to stop idle entry in
> order to be able to drain RCU callbacks from the current CPU, which in
> turn enables faster entry into dyntick-idle mode, which in turn reduces
> power consumption.  Because RCU takes this action at a well-defined
> point in the idle-entry path, it is safe for RCU to take this approach.
> 
> This commit therefore silences the error message that is sometimes
> produced when the going-idle CPU suddenly finds that it has an RCU_SOFTIRQ
> to process.  The error message will continue to be issued for other
> softirq vectors.
> 
> Reported-by: Sedat Dilek 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> Tested-by: Sedat Dilek 

Reviewed-by: Josh Triplett 

> ---
>  include/linux/interrupt.h |2 ++
>  kernel/time/tick-sched.c  |3 ++-
>  2 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index c5f856a..5e4e617 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -430,6 +430,8 @@ enum
>   NR_SOFTIRQS
>  };
>  
> +#define SOFTIRQ_STOP_IDLE_MASK (~(1 << RCU_SOFTIRQ))
> +
>  /* map softirq index to softirq name. update 'softirq_to_name' in
>   * kernel/softirq.c when adding a new softirq.
>   */
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 024540f..4b1785a 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -436,7 +436,8 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>   if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
>   static int ratelimit;
>  
> - if (ratelimit < 10) {
> + if (ratelimit < 10 &&
> + (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>   printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
>  (unsigned int) local_softirq_pending());
>   ratelimit++;
> -- 
> 1.7.8
> 
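
The masking trick in this patch can be sketched outside the kernel.  The
code below is a toy under stated assumptions: the TOY_ vector numbers are
illustrative rather than the kernel's real softirq numbering, and
should_complain() is a hypothetical helper that models only the rate-limited
test, not the printk itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative vector numbers; not the kernel's actual enum. */
enum toy_softirq {
	TOY_TIMER_SOFTIRQ,
	TOY_NET_SOFTIRQ,
	TOY_RCU_SOFTIRQ,
	TOY_NR_SOFTIRQS
};

/* Every vector except RCU's counts as a reason to complain. */
#define TOY_STOP_IDLE_MASK (~(1u << TOY_RCU_SOFTIRQ))

/*
 * Return true (and consume one unit of the rate limit) if a softirq other
 * than the RCU one is pending.  A pending RCU softirq alone is silent.
 */
bool should_complain(unsigned int pending, int *ratelimit)
{
	assert(ratelimit != NULL);
	if (*ratelimit < 10 && (pending & TOY_STOP_IDLE_MASK)) {
		(*ratelimit)++;
		return true;
	}
	return false;
}
```

Note that a pending RCU softirq on its own does not consume the rate limit,
so later reports for other vectors are not crowded out.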

Re: [PATCH tip/core/rcu 15/15] kmemleak: Replace list_for_each_continue_rcu with new interface

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:56:28AM -0700, Paul E. McKenney wrote:
> From: Michael Wang 
> 
> This patch replaces list_for_each_continue_rcu() with
> list_for_each_entry_continue_rcu() to save a few lines
> of code and allow removing list_for_each_continue_rcu().
> 
> Signed-off-by: Michael Wang 
> Acked-by: Catalin Marinas 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> ---
>  mm/kmemleak.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 45eb621..0de83b4 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1483,13 +1483,11 @@ static void *kmemleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>  {
>   struct kmemleak_object *prev_obj = v;
>   struct kmemleak_object *next_obj = NULL;
> - struct list_head *n = &prev_obj->object_list;
> + struct kmemleak_object *obj = prev_obj;
>  
>   ++(*pos);
>  
> - list_for_each_continue_rcu(n, &object_list) {
> - struct kmemleak_object *obj =
> - list_entry(n, struct kmemleak_object, object_list);
> + list_for_each_entry_continue_rcu(obj, &object_list, object_list) {
>   if (get_object(obj)) {
>   next_obj = obj;
>   break;
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 01/26] rcu: New rcu_user_enter() and rcu_user_exit() APIs

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:18PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> RCU currently insists that only idle tasks can enter RCU idle mode, which
> prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn
> would mean that usermode execution would always take scheduling-clock
> interrupts, even when there is only one task runnable on the CPU in
> question.
> 
> This commit therefore adds rcu_user_enter() and rcu_user_exit(), which
> allow non-idle tasks to enter RCU idle mode.  These are quite similar
> to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they
> omit the idle-task checks.
> 
> [ Updated to use "user" flag rather than separate check functions. ]
> 
> Signed-off-by: Frederic Weisbecker 
> Signed-off-by: Paul E. McKenney 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Daniel Lezcano 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: Ingo Molnar 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 

A few suggestions below: an optional microoptimization and some bugfixes.
With the bugfixes, and with or without the microoptimization:

Reviewed-by: Josh Triplett 

> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
[...]
> -static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
> +static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
> + bool user)
>  {
>   trace_rcu_dyntick("Start", oldval, 0);
> - if (!is_idle_task(current)) {
> + if (!is_idle_task(current) && !user) {

Microoptimization: putting the !user check first (here and in the exit
function) would allow the compiler to partially inline rcu_eqs_*_common
into the two trivial wrappers and constant-fold away the test for !user.
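
A small userspace sketch of the reordering, for illustration only.  All
names here (toy_is_idle_task(), toy_eqs_enter_common(), the two wrappers,
and the idle_task_checks counter) are hypothetical stand-ins.  The sketch
demonstrates only the short-circuit behavior; whether a given compiler
actually inlines and folds the test is not shown.

```c
#include <stdbool.h>

/* Counts how often the (hypothetical) idle-task predicate is evaluated. */
int idle_task_checks;

/* Stand-in for is_idle_task(current); always reports a non-idle task. */
static bool toy_is_idle_task(void)
{
	idle_task_checks++;
	return false;
}

/*
 * Return true when the "non-idle task entering idle" warning would fire.
 * Because !user is tested first, a caller passing a constant true never
 * evaluates the idle-task predicate at all.
 */
static inline bool toy_eqs_enter_common(bool user)
{
	return !user && !toy_is_idle_task();
}

/* Trivial wrappers that pass a compile-time constant flag. */
bool toy_idle_enter_warns(void)
{
	return toy_eqs_enter_common(false);
}

bool toy_user_enter_warns(void)
{
	return toy_eqs_enter_common(true);
}
```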

> +void rcu_idle_enter(void)
> +{
> + rcu_eqs_enter(0);
> +}

s/0/false/

> +void rcu_user_enter(void)
> +{
> + rcu_eqs_enter(1);
> +}

s/1/true/

> -static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
> +static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
> +int user)
>  {
>   smp_mb__before_atomic_inc();  /* Force ordering w/previous sojourn. */
>   atomic_inc(&rdtp->dynticks);
> @@ -464,7 +490,7 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
>   WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
>   rcu_cleanup_after_idle(smp_processor_id());
>   trace_rcu_dyntick("End", oldval, rdtp->dynticks_nesting);
> - if (!is_idle_task(current)) {
> + if (!is_idle_task(current) && !user) {

Same micro-optimization as the enter function.

> +void rcu_idle_exit(void)
> +{
> + rcu_eqs_exit(0);
> +}

s/0/false/

> +void rcu_user_exit(void)
> +{
> + rcu_eqs_exit(1);
> +}

s/1/true/

> @@ -539,7 +586,7 @@ void rcu_irq_enter(void)
>   if (oldval)
>   trace_rcu_dyntick("++=", oldval, rdtp->dynticks_nesting);
>   else
> - rcu_idle_exit_common(rdtp, oldval);
> + rcu_eqs_exit_common(rdtp, oldval, 1);

s/1/true/, and likewise in rcu_irq_exit.

- Josh Triplett

Re: [PATCH tip/core/rcu 02/26] rcu: New rcu_user_enter_irq() and rcu_user_exit_irq() APIs

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:19PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In some cases, it is necessary to enter or exit userspace-RCU-idle mode
> from an interrupt handler, for example, if some other CPU sends this
> CPU a resched IPI.  In this case, the current CPU would enter the IPI
> handler in userspace-RCU-idle mode, but would need to exit the IPI handler
> after having exited that mode.
> 
> To allow this to work, this commit adds two new APIs to TREE_RCU:
> 
> - rcu_user_enter_irq(). This must be called from an interrupt between
> rcu_irq_enter() and rcu_irq_exit().  After the irq calls rcu_irq_exit(),
> the irq handler will return into an RCU extended quiescent state.
> In theory, this interrupt is never a nested interrupt, but in practice
> it might interrupt softirq, which looks to RCU like a nested interrupt.
> 
> - rcu_user_exit_irq(). This must be called from a non-nesting
> interrupt, interrupting an RCU extended quiescent state, also
> between rcu_irq_enter() and rcu_irq_exit(). After the irq calls
> rcu_irq_exit(), the irq handler will return in an RCU non-quiescent
> state.

These names seem a bit confusing.  From the descriptions, it sounds like
you don't always need to pair them; rcu_irq_exit() will return to a
non-quiescent state, unless you call rcu_user_enter_irq and *don't* call
rcu_user_exit_irq.  Did I get that semantic right?

Given that, the "enter" and "exit" names seem confusing.  This seems
more like a flag you can set and clear, rather than a delimited region
as suggested by an enter/exit pair.

How about something vaguely like rcu_user_irq_set_eqs and
rcu_user_irq_clear_eqs?

- Josh Triplett

Re: [PATCH tip/core/rcu 02/26] rcu: New rcu_user_enter_irq() and rcu_user_exit_irq() APIs

2012-08-31 Thread Josh Triplett

On Fri, Aug 31, 2012 at 09:54:39PM +0200, Frederic Weisbecker wrote:
> 2012/8/31 Josh Triplett :
> > Given that, the "enter" and "exit" names seem confusing.  This seems
> > more like a flag you can set and clear, rather than a delimited region
> > as suggested by an enter/exit pair.
> >
> > How about something vaguely like rcu_user_irq_set_eqs and
> > rcu_user_irq_clear_eqs?
> 
> I'd rather suggest rcu_user_enter_after_irq and
> rcu_user_exit_after_irq. It describes precisely what it does.

Those names sound reasonable, sure; in the context of "after",
enter/exit sounds less confusing.

- Josh Triplett

Re: [PATCH tip/core/rcu 03/26] rcu: Make RCU_FAST_NO_HZ handle adaptive ticks

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:20PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
> the current CPU of RCU callbacks.  This is appropriate when the CPU is
> entering idle, where it doesn't have much useful to do anyway, but is most
> definitely not what you want when transitioning to user-mode execution.
> This commit therefore detects the adaptive-tick case, and refrains from
> burning CPU time getting rid of RCU callbacks in that case.

With the OOM handler from your other patch series, I don't know that it
makes as much sense in the idle case, either; perhaps it would make more
sense to wait and batch up more callbacks as long as you have memory,
and then run them in one big burst.
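
A rough userspace sketch of the batching idea, assuming a simple byte budget
stands in for "as long as you have memory". The names (toy_cb_batch,
toy_queue, toy_flush) and the fixed-size array are illustrative assumptions
and have nothing to do with the real RCU callback lists.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CBS 64

typedef void (*toy_cb_fn)(void *arg);

/* Hypothetical batch of deferred callbacks with a memory budget. */
struct toy_cb_batch {
	toy_cb_fn fn[TOY_MAX_CBS];
	void *arg[TOY_MAX_CBS];
	size_t count;         /* callbacks currently queued */
	size_t pending_bytes; /* memory the queued callbacks would free */
	size_t limit_bytes;   /* flush once this much memory is held back */
};

/* Run every queued callback in one burst; returns how many ran. */
static size_t toy_flush(struct toy_cb_batch *b)
{
	size_t ran = b->count;

	for (size_t i = 0; i < b->count; i++)
		b->fn[i](b->arg[i]);
	b->count = 0;
	b->pending_bytes = 0;
	return ran;
}

/*
 * Queue a callback and keep deferring until the budget or the array
 * is exhausted. Returns the number of callbacks run by this call.
 */
static size_t toy_queue(struct toy_cb_batch *b, toy_cb_fn fn, void *arg,
			size_t bytes)
{
	assert(b->count < TOY_MAX_CBS);
	b->fn[b->count] = fn;
	b->arg[b->count] = arg;
	b->count++;
	b->pending_bytes += bytes;

	if (b->pending_bytes >= b->limit_bytes || b->count == TOY_MAX_CBS)
		return toy_flush(b);
	return 0;
}

/* Example callback: bump a counter so a caller can observe the burst. */
static void toy_count_cb(void *arg)
{
	(*(int *)arg)++;
}
```

With a 100-byte limit and 40-byte callbacks, the first two calls only queue
and the third runs all three in one burst.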

> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree_plugin.h |   20 
>  1 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 7f3244c..b0f09d6 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -1997,6 +1997,26 @@ static void rcu_prepare_for_idle(int cpu)
>   if (!tne)
>   return;
>  
> +	/* Adaptive-tick mode, where usermode execution is idle to RCU. */
> +	if (!is_idle_task(current)) {
> +		rdtp->dyntick_holdoff = jiffies - 1;
> +		if (rcu_cpu_has_nonlazy_callbacks(cpu)) {
> +			trace_rcu_prep_idle("User dyntick with callbacks");
> +			rdtp->idle_gp_timer_expires =
> +				round_up(jiffies + RCU_IDLE_GP_DELAY,
> +					 RCU_IDLE_GP_DELAY);
> +		} else if (rcu_cpu_has_callbacks(cpu)) {
> +			rdtp->idle_gp_timer_expires =
> +				round_jiffies(jiffies + RCU_IDLE_LAZY_GP_DELAY);
> +			trace_rcu_prep_idle("User dyntick with lazy callbacks");
> +		} else {
> +			return;
> +		}
> +		tp = &rdtp->idle_gp_timer;
> +		mod_timer_pinned(tp, rdtp->idle_gp_timer_expires);
> +		return;
> +	}
> +
>   /*
>* If this is an idle re-entry, for example, due to use of
>* RCU_NONIDLE() or the new idle-loop tracing API within the idle
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 04/26] rcu: Settle config for userspace extended quiescent state

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:21PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> Create a new config option under the RCU menu that puts
> CPUs under RCU extended quiescent state (as in dynticks
> idle mode) when they run in userspace. This requires
> some contribution from architectures to hook into kernel
> and userspace boundaries.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

One question below, but nonetheless:

Reviewed-by: Josh Triplett 

>  arch/Kconfig |   10 ++
>  include/linux/rcupdate.h |8 
>  init/Kconfig |   10 ++
>  kernel/rcutree.c |5 -
>  4 files changed, 32 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 72f2fa1..1401a75 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -281,4 +281,14 @@ config SECCOMP_FILTER
>  
> See Documentation/prctl/seccomp_filter.txt for details.
>  
> +config HAVE_RCU_USER_QS
> + bool
> + help
> +   Provide kernel entry/exit hooks necessary for userspace
> +   RCU extended quiescent state. Syscalls need to be wrapped inside
> +   rcu_user_exit()-rcu_user_enter() through the slow path using
> +   TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs
> +   are already protected inside rcu_irq_enter/rcu_irq_exit() but
> +   preemption or signal handling on irq exit still need to be protected.
> +
>  source "kernel/gcov/Kconfig"
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 81d3d5c..e47 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -191,10 +191,18 @@ extern void rcu_idle_enter(void);
>  extern void rcu_idle_exit(void);
>  extern void rcu_irq_enter(void);
>  extern void rcu_irq_exit(void);
> +
> +#ifdef CONFIG_RCU_USER_QS
>  extern void rcu_user_enter(void);
>  extern void rcu_user_exit(void);
>  extern void rcu_user_enter_irq(void);
>  extern void rcu_user_exit_irq(void);
> +#else
> +static inline void rcu_user_enter(void) { }
> +static inline void rcu_user_exit(void) { }
> +#endif /* CONFIG_RCU_USER_QS */
> +
> +
>  extern void exit_rcu(void);
>  
>  /**
> diff --git a/init/Kconfig b/init/Kconfig
> index af6c7f8..f6a1830 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -441,6 +441,16 @@ config PREEMPT_RCU
> This option enables preemptible-RCU code that is common between
> the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
>  
> +config RCU_USER_QS
> + bool "Consider userspace as in RCU extended quiescent state"
> + depends on HAVE_RCU_USER_QS && SMP

Does this actually depend on SMP, or does it depend on the non-TINY RCU
implementation?  If the latter, it should depend on that rather than
SMP.

(I assume that the tiny RCU implementation simply doesn't need all this
machinery because it doesn't need coordinated quiescence at all?  Or
does tiny RCU still cause a periodic wakeup on UP?)

> + help
> +   This option sets hooks on kernel / userspace boundaries and
> +   puts RCU in extended quiescent state when the CPU runs in
> +   userspace. It means that when a CPU runs in userspace, it is
> +   excluded from the global RCU state machine and thus doesn't
> +   to keep the timer tick on for RCU.
> +
>  config RCU_FANOUT
>   int "Tree-based hierarchical RCU fanout value"
>   range 2 64 if 64BIT
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 8fdea17..e287c4a 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -424,6 +424,7 @@ void rcu_idle_enter(void)
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_enter);
>  
> +#ifdef CONFIG_RCU_USER_QS
>  /**
>   * rcu_user_enter - inform RCU that we are resuming userspace.
>   *
> @@ -438,7 +439,6 @@ void rcu_user_enter(void)
>  }
>  EXPORT_SYMBOL_GPL(rcu_user_enter);
>  
> -
>  /**
>   * rcu_user_enter_irq - inform RCU that we are going to resume userspace
>   * after the current irq returns.
> @@ -459,6 +459,7 @@ void rcu_user_enter_irq(void)
>   rdtp->dynticks_nesting = 1;
>   local_irq_restore(flags);
>  }
> +#endif
>  
>  /**
>   * rcu_irq_exit - inform RCU t

Re: [PATCH tip/core/rcu 05/26] rcu: Allow rcu_user_enter()/exit() to nest

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:22PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> Allow calls to rcu_user_enter() even if we are already
> in userspace (as seen by RCU) and allow calls to rcu_user_exit()
> even if we are already in the kernel.
> 
> This makes the APIs more flexible to be called from architectures.
> Exception entries for example won't need to know if they come from
> userspace before calling rcu_user_exit().
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 
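
For anyone skimming the thread, a minimal userspace sketch of the idempotent
enter/exit behavior the changelog describes. The in_user field mirrors the
flag the patch adds; the struct, the transition counter, and the toy_ names
are illustrative assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU dynticks state. */
struct toy_dynticks {
	bool in_user;    /* does RCU currently consider us in userspace? */
	int transitions; /* counts real extended-QS transitions */
};

/* Safe to call even if RCU already considers us in userspace. */
static void toy_user_enter(struct toy_dynticks *d)
{
	assert(d);
	if (!d->in_user) {
		d->in_user = true;
		d->transitions++; /* stands in for rcu_eqs_enter(1) */
	}
}

/* Safe to call even if RCU already considers us in the kernel. */
static void toy_user_exit(struct toy_dynticks *d)
{
	assert(d);
	if (d->in_user) {
		d->in_user = false;
		d->transitions++; /* stands in for rcu_eqs_exit(1) */
	}
}
```

Redundant calls are no-ops, so an exception entry can call the exit side
unconditionally without first checking where it came from.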

>  kernel/rcutree.c |   41 +
>  kernel/rcutree.h |3 +++
>  2 files changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index e287c4a..8bbc7fb 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -390,11 +390,9 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
>   */
>  static void rcu_eqs_enter(bool user)
>  {
> - unsigned long flags;
>   long long oldval;
>   struct rcu_dynticks *rdtp;
>  
> - local_irq_save(flags);
>   rdtp = &__get_cpu_var(rcu_dynticks);
>   oldval = rdtp->dynticks_nesting;
>   WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
> @@ -403,7 +401,6 @@ static void rcu_eqs_enter(bool user)
>   else
>   rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
>   rcu_eqs_enter_common(rdtp, oldval, user);
> - local_irq_restore(flags);
>  }
>  
>  /**
> @@ -420,7 +417,11 @@ static void rcu_eqs_enter(bool user)
>   */
>  void rcu_idle_enter(void)
>  {
> + unsigned long flags;
> +
> + local_irq_save(flags);
>   rcu_eqs_enter(0);
> + local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_enter);
>  
> @@ -435,7 +436,18 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
>   */
>  void rcu_user_enter(void)
>  {
> - rcu_eqs_enter(1);
> + unsigned long flags;
> + struct rcu_dynticks *rdtp;
> +
> + WARN_ON_ONCE(!current->mm);
> +
> + local_irq_save(flags);
> + rdtp = &__get_cpu_var(rcu_dynticks);
> + if (!rdtp->in_user) {
> + rdtp->in_user = true;
> + rcu_eqs_enter(1);
> + }
> + local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_user_enter);
>  
> @@ -530,11 +542,9 @@ static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
>   */
>  static void rcu_eqs_exit(bool user)
>  {
> - unsigned long flags;
>   struct rcu_dynticks *rdtp;
>   long long oldval;
>  
> - local_irq_save(flags);
>   rdtp = &__get_cpu_var(rcu_dynticks);
>   oldval = rdtp->dynticks_nesting;
>   WARN_ON_ONCE(oldval < 0);
> @@ -543,7 +553,6 @@ static void rcu_eqs_exit(bool user)
>   else
>   rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
>   rcu_eqs_exit_common(rdtp, oldval, user);
> - local_irq_restore(flags);
>  }
>  
>  /**
> @@ -559,7 +568,11 @@ static void rcu_eqs_exit(bool user)
>   */
>  void rcu_idle_exit(void)
>  {
> + unsigned long flags;
> +
> + local_irq_save(flags);
>   rcu_eqs_exit(0);
> + local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_exit);
>  
> @@ -572,7 +585,16 @@ EXPORT_SYMBOL_GPL(rcu_idle_exit);
>   */
>  void rcu_user_exit(void)
>  {
> - rcu_eqs_exit(1);
> + unsigned long flags;
> + struct rcu_dynticks *rdtp;
> +
> + local_irq_save(flags);
> + rdtp = &__get_cpu_var(rcu_dynticks);
> + if (rdtp->in_user) {
> + rdtp->in_user = false;
> + rcu_eqs_exit(1);
> + }
> + local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_user_exit);
>  
> @@ -2590,6 +2612,9 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
>   rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
>   WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
>   WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
> +#ifdef CONFIG_RCU_USER_QS
> + WARN_ON_ONCE(rdp->dynticks->in_user);
> +#endif
>   rdp->cpu = cpu;
>   rdp->

Re: [PATCH tip/core/rcu 06/26] rcu: Ignore userspace extended quiescent state by default

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:23PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> By default we don't want to enter into RCU extended quiescent
> state while in userspace because doing this produces some overhead
> (e.g. use of the syscall slow path). Keep it off by default, ready to
> be enabled when some feature like adaptive tickless needs it.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |5 -
>  kernel/rcutree.h |1 +
>  2 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 8bbc7fb..e2fd370 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -210,6 +210,9 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
>  DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
>   .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
>   .dynticks = ATOMIC_INIT(1),
> +#ifdef CONFIG_RCU_USER_QS
> + .ignore_user_qs = true,
> +#endif
>  };
>  
>  static int blimit = 10;  /* Maximum callbacks per rcu_do_batch. */
> @@ -443,7 +446,7 @@ void rcu_user_enter(void)
>  
>   local_irq_save(flags);
>   rdtp = &__get_cpu_var(rcu_dynticks);
> - if (!rdtp->in_user) {
> + if (!rdtp->ignore_user_qs && !rdtp->in_user) {
>   rdtp->in_user = true;
>   rcu_eqs_enter(1);
>   }
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index 0dd5fd6..c190582 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -103,6 +103,7 @@ struct rcu_dynticks {
>   int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
>  #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
>  #ifdef CONFIG_RCU_USER_QS
> + bool ignore_user_qs;/* Treat userspace as extended QS or not */
>   bool in_user;   /* Is the CPU in userland from RCU POV? */
>  #endif
>  };
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 07/26] rcu: Switch task's syscall hooks on context switch

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:24PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> Clear the syscalls hook of a task when it's scheduled out so that if
> the task migrates, it doesn't run the syscall slow path on a CPU
> that might not need it.
> 
> Also set the syscalls hook on the next task if needed.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/um/drivers/mconsole_kern.c |1 +
>  include/linux/rcupdate.h|2 ++
>  include/linux/sched.h   |8 
>  kernel/rcutree.c|   15 +++
>  kernel/sched/core.c |1 +
>  5 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
> index 664a60e..c17de0d 100644
> --- a/arch/um/drivers/mconsole_kern.c
> +++ b/arch/um/drivers/mconsole_kern.c
> @@ -705,6 +705,7 @@ static void stack_proc(void *arg)
>   struct task_struct *from = current, *to = arg;
>  
>   to->thread.saved_task = from;
> + rcu_switch(from, to);
>   switch_to(from, to, from);
>  }
>  
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e47..1fc0a0e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -197,6 +197,8 @@ extern void rcu_user_enter(void);
>  extern void rcu_user_exit(void);
>  extern void rcu_user_enter_irq(void);
>  extern void rcu_user_exit_irq(void);
> +extern void rcu_user_hooks_switch(struct task_struct *prev,
> +   struct task_struct *next);
>  #else
>  static inline void rcu_user_enter(void) { }
>  static inline void rcu_user_exit(void) { }
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c147e70..e4d5936 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1894,6 +1894,14 @@ static inline void rcu_copy_process(struct task_struct *p)
>  
>  #endif
>  
> +static inline void rcu_switch(struct task_struct *prev,
> +   struct task_struct *next)
> +{
> +#ifdef CONFIG_RCU_USER_QS
> + rcu_user_hooks_switch(prev, next);
> +#endif
> +}
> +
>  static inline void tsk_restore_flags(struct task_struct *task,
>   unsigned long orig_flags, unsigned long flags)
>  {
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index e2fd370..af92681 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -721,6 +721,21 @@ int rcu_is_cpu_idle(void)
>  }
>  EXPORT_SYMBOL(rcu_is_cpu_idle);
>  
> +#ifdef CONFIG_RCU_USER_QS
> +void rcu_user_hooks_switch(struct task_struct *prev,
> +struct task_struct *next)
> +{
> + struct rcu_dynticks *rdtp;
> +
> + /* Interrupts are disabled in context switch */
> + rdtp = &__get_cpu_var(rcu_dynticks);
> + if (!rdtp->ignore_user_qs) {
> + clear_tsk_thread_flag(prev, TIF_NOHZ);
> + set_tsk_thread_flag(next, TIF_NOHZ);
> + }
> +}
> +#endif /* #ifdef CONFIG_RCU_USER_QS */
> +
>  #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
>  
>  /*
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d325c4b..07c6d9a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2081,6 +2081,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  #endif
>  
>   /* Here we just switch the register state and the stack. */
> + rcu_switch(prev, next);
>   switch_to(prev, next, prev);
>  
>   barrier();
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 09/26] x86: Exception hooks for userspace RCU extended QS

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:26PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> Add necessary hooks to x86 exception for userspace
> RCU extended quiescent state support.
> 
> This includes traps, page fault, debug exceptions, etc...
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 
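
The pattern the hooks follow, sketched as a self-contained userspace model:
every return path of a handler pairs the enter hook with the exit hook, and
the exit hook only resumes the extended quiescent state when the exception
came from user mode. All toy_ names and the from_user field are assumptions
made for illustration; they are not the x86 code.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_regs {
	bool from_user; /* stand-in for user_mode(regs) */
};

static bool toy_eqs; /* true while RCU would ignore this CPU */

static void toy_exception_enter(const struct toy_regs *regs)
{
	(void)regs;
	toy_eqs = false; /* unconditional: leaving the EQS is idempotent */
}

static void toy_exception_exit(const struct toy_regs *regs)
{
	if (regs->from_user)
		toy_eqs = true; /* only resume the EQS when returning to user */
}

/*
 * Handler with an early return. Returns true if the full handling ran,
 * false if the (hypothetical) notifier asked to stop early.
 */
static bool toy_do_trap(const struct toy_regs *regs, bool notifier_stops)
{
	toy_exception_enter(regs);
	if (notifier_stops) {
		toy_exception_exit(regs);
		return false;
	}
	/* Real handling would go here and may use RCU read-side sections. */
	assert(!toy_eqs);
	toy_exception_exit(regs);
	return true;
}
```

Forgetting the exit hook on the early-return path would leave a user-mode
task outside the extended quiescent state after the trap, which is why the
macros in the patch grow braces around the return.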

>  arch/x86/include/asm/rcu.h |   20 
>  arch/x86/kernel/traps.c|   30 ++
>  arch/x86/mm/fault.c|   13 +++--
>  3 files changed, 53 insertions(+), 10 deletions(-)
>  create mode 100644 arch/x86/include/asm/rcu.h
> 
> diff --git a/arch/x86/include/asm/rcu.h b/arch/x86/include/asm/rcu.h
> new file mode 100644
> index 000..439815b
> --- /dev/null
> +++ b/arch/x86/include/asm/rcu.h
> @@ -0,0 +1,20 @@
> +#ifndef _ASM_X86_RCU_H
> +#define _ASM_X86_RCU_H
> +
> +#include 
> +#include 
> +
> +static inline void exception_enter(struct pt_regs *regs)
> +{
> +	rcu_user_exit();
> +}
> +
> +static inline void exception_exit(struct pt_regs *regs)
> +{
> +#ifdef CONFIG_RCU_USER_QS
> +	if (user_mode(regs))
> +		rcu_user_enter();
> +#endif
> +}
> +
> +#endif
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index b481341..ab82cbd 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -55,6 +55,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -180,11 +181,15 @@ vm86_trap:
>  #define DO_ERROR(trapnr, signr, str, name)   \
>  dotraplinkage void do_##name(struct pt_regs *regs, long error_code)  \
>  {\
> - if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)  \
> - == NOTIFY_STOP) \
> + exception_enter(regs);  \
> + if (notify_die(DIE_TRAP, str, regs, error_code, \
> + trapnr, signr) == NOTIFY_STOP) {\
> + exception_exit(regs);   \
>   return; \
> + }   \
>   conditional_sti(regs);  \
>   do_trap(trapnr, signr, str, regs, error_code, NULL);\
> + exception_exit(regs);   \
>  }
>  
>  #define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr)  \
> @@ -195,11 +200,15 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
>   info.si_errno = 0;  \
>   info.si_code = sicode;  \
>   info.si_addr = (void __user *)siaddr;   \
> - if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)  \
> - == NOTIFY_STOP) \
> + exception_enter(regs);  \
> + if (notify_die(DIE_TRAP, str, regs, error_code, \
> + trapnr, signr) == NOTIFY_STOP) {\
> + exception_exit(regs);   \
>   return; \
> + }   \
>   conditional_sti(regs);  \
>   do_trap(trapnr, signr, str, regs, error_code, &info);   \
> + exception_exit(regs);   \
>  }
>  
>  DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV,
> @@ -312,6 +321,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
>   ftrace_int3_handler(regs))
>   return;
>  #endif
> + exception_enter(regs);
>  #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
>   if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
>   SIGTRAP) == NOTIFY

Re: [PATCH tip/core/rcu 10/26] rcu: Exit RCU extended QS on kernel preemption after irq/exception

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:27PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> When an exception or an irq exits, and we are going to resume into
> interrupted kernel code, the low level architecture code calls
> preempt_schedule_irq() if there is a need to reschedule.
> 
> If the interrupt/exception occurred between a call to rcu_user_enter()
> (from syscall exit, exception exit, do_notify_resume exit, ...) and
> a real resume to userspace (iret,...), preempt_schedule_irq() can be
> called whereas RCU thinks we are in userspace. But preempt_schedule_irq()
> is going to run kernel code and may be some RCU read side critical
> section. We must exit the userspace extended quiescent state before
> we call it.
> 
> To solve this, just call rcu_user_exit() in the beginning of
> preempt_schedule_irq().
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/sched/core.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 07c6d9a..0bd599b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3564,6 +3564,7 @@ asmlinkage void __sched preempt_schedule_irq(void)
>   /* Catch callers which need to be fixed */
>   BUG_ON(ti->preempt_count || !irqs_disabled());
>  
> + rcu_user_exit();
>   do {
>   add_preempt_count(PREEMPT_ACTIVE);
>   local_irq_enable();
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 11/26] rcu: Exit RCU extended QS on user preemption

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:28PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> When exceptions or irq are about to resume userspace, if
> the task needs to be rescheduled, the arch low level code
> calls schedule() directly.
> 
> At that time we may be in extended quiescent state from RCU
> POV: the exception is not anymore protected inside
> rcu_user_exit() - rcu_user_enter() and the irq has called
> rcu_irq_exit() already.
> 
> Create a new API schedule_user() that calls schedule() inside
> rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
> will need to rely on it now to implement user preemption safely.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 
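
A small userspace model of why the wrapper is needed, assuming a single
boolean stands in for the per-CPU extended-QS state. The toy_ functions are
hypothetical stand-ins, not the kernel's rcu_user_exit(), schedule(), or
rcu_user_enter().

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_in_user_eqs;   /* true while RCU would ignore this CPU */
static int toy_schedule_calls; /* how many times the scheduler ran */

static void toy_rcu_user_exit(void)
{
	toy_in_user_eqs = false;
}

static void toy_rcu_user_enter(void)
{
	toy_in_user_eqs = true;
}

/* Stand-in for schedule(): it may use RCU, so it must not run in the EQS. */
static void toy_schedule(void)
{
	assert(!toy_in_user_eqs);
	toy_schedule_calls++;
}

/* Same shape as the new API: leave the EQS, schedule, then re-enter it. */
static void toy_schedule_user(void)
{
	toy_rcu_user_exit();
	toy_schedule();
	toy_rcu_user_enter();
}
```

Calling toy_schedule() directly while toy_in_user_eqs is true trips the
assertion; going through toy_schedule_user() does not, and the CPU is back
in the extended quiescent state afterwards.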

>  kernel/sched/core.c |7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0bd599b..e841dfc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3463,6 +3463,13 @@ asmlinkage void __sched schedule(void)
>  }
>  EXPORT_SYMBOL(schedule);
>  
> +asmlinkage void __sched schedule_user(void)
> +{
> +	rcu_user_exit();
> +	schedule();
> +	rcu_user_enter();
> +}
> +
>  /**
>   * schedule_preempt_disabled - called with preemption disabled
>   *
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 12/26] x86: Use the new schedule_user API on userspace preemption

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:29PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> This way we can exit the RCU extended quiescent state before
> we schedule a new task from irq/exception exit.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/x86/kernel/entry_64.S |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 69babd8..6230487 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -565,7 +565,7 @@ sysret_careful:
>   TRACE_IRQS_ON
>   ENABLE_INTERRUPTS(CLBR_NONE)
>   pushq_cfi %rdi
> - call schedule
> + call schedule_user
>   popq_cfi %rdi
>   jmp sysret_check
>  
> @@ -678,7 +678,7 @@ int_careful:
>   TRACE_IRQS_ON
>   ENABLE_INTERRUPTS(CLBR_NONE)
>   pushq_cfi %rdi
> - call schedule
> + call schedule_user
>   popq_cfi %rdi
>   DISABLE_INTERRUPTS(CLBR_NONE)
>   TRACE_IRQS_OFF
> @@ -974,7 +974,7 @@ retint_careful:
>   TRACE_IRQS_ON
>   ENABLE_INTERRUPTS(CLBR_NONE)
>   pushq_cfi %rdi
> - call  schedule
> + call  schedule_user
>   popq_cfi %rdi
>   GET_THREAD_INFO(%rcx)
>   DISABLE_INTERRUPTS(CLBR_NONE)
> @@ -1449,7 +1449,7 @@ paranoid_userspace:
>  paranoid_schedule:
>   TRACE_IRQS_ON
>   ENABLE_INTERRUPTS(CLBR_ANY)
> - call schedule
> + call schedule_user
>   DISABLE_INTERRUPTS(CLBR_ANY)
>   TRACE_IRQS_OFF
>   jmp paranoid_userspace
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 13/26] x86: Exit RCU extended QS on notify resume

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:30PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> do_notify_resume() may be called on irq or exception
> exit. But at that time the exception has already called
> rcu_user_enter() and the irq has already called rcu_irq_exit().
> 
> Since it can use RCU read side critical section, we must call
> rcu_user_exit() before doing anything there. Then we must call
> back rcu_user_enter() after this function because we know we are
> going to userspace from there.
> 
> This completes support for userspace RCU extended quiescent state
> in x86-64.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/x86/Kconfig |1 +
>  arch/x86/kernel/signal.c |4 
>  2 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index ba2657c..5cd953a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -97,6 +97,7 @@ config X86
>   select KTIME_SCALAR if X86_32
>   select GENERIC_STRNCPY_FROM_USER
>   select GENERIC_STRNLEN_USER
> + select HAVE_RCU_USER_QS if X86_64
>  
>  config INSTRUCTION_DECODER
>   def_bool (KPROBES || PERF_EVENTS || UPROBES)
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index b280908..bca0ab9 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -779,6 +779,8 @@ static void do_signal(struct pt_regs *regs)
>  void
>  do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
>  {
> + rcu_user_exit();
> +
>  #ifdef CONFIG_X86_MCE
>   /* notify userspace of pending MCEs */
>   if (thread_info_flags & _TIF_MCE_NOTIFY)
> @@ -804,6 +806,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
>  #ifdef CONFIG_X86_32
>   clear_thread_flag(TIF_IRET);
>  #endif /* CONFIG_X86_32 */
> +
> + rcu_user_enter();
>  }
>  
>  void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 14/26] rcu: Userspace RCU extended QS selftest

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:31PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> Provide a config option that enables the userspace
> RCU extended quiescent state on every CPU by default.
> 
> This is for testing purposes.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  init/Kconfig |8 
>  kernel/rcutree.c |2 +-
>  2 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index f6a1830..c26b8a1 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -451,6 +451,14 @@ config RCU_USER_QS
> excluded from the global RCU state machine and thus doesn't
> to keep the timer tick on for RCU.
>  
> +config RCU_USER_QS_FORCE
> + bool "Force userspace extended QS by default"
> + depends on RCU_USER_QS
> + help
> +   Set the hooks in user/kernel boundaries by default in order to
> +   test this feature that treats userspace as an extended quiescent
> +   state until we have a real user like a full adaptive nohz option.
> +
>  config RCU_FANOUT
>   int "Tree-based hierarchical RCU fanout value"
>   range 2 64 if 64BIT
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index af92681..ccf3cbf 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -210,7 +210,7 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
>  DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
>   .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
>   .dynticks = ATOMIC_INIT(1),
> -#ifdef CONFIG_RCU_USER_QS
> +#if defined(CONFIG_RCU_USER_QS) && !defined(CONFIG_RCU_USER_QS_FORCE)
>   .ignore_user_qs = true,
>  #endif
>  };
> -- 
> 1.7.8
> 
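
For anyone reading along, the effect of the changed #if can be modelled in
plain userspace C.  This is only a sketch: the function name and its two
boolean parameters are made up here to stand in for the two config symbols,
and nothing below is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the initializer in the patch:
 * ignore_user_qs starts out true only when RCU_USER_QS is enabled
 * and RCU_USER_QS_FORCE is not.
 */
static bool initial_ignore_user_qs(bool rcu_user_qs, bool rcu_user_qs_force)
{
	return rcu_user_qs && !rcu_user_qs_force;
}
```

So with the FORCE option set, the per-CPU flag is simply left at its
zero-initialized default, which is what turns the hooks on everywhere.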

Re: [PATCH tip/core/rcu 15/26] alpha: Fix preemption handling in idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:32PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> cpu_idle() is called on the boot CPU by the init code with
> preemption disabled. But the cpu_idle() function in alpha
> doesn't handle this when it calls schedule() directly.
> 
> Fix it by converting it into schedule_preempt_disabled().
> 
> Also disable preemption before calling cpu_idle() from
> secondary CPU entry code to stay consistent with this
> state.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Richard Henderson 
> Cc: Ivan Kokshaysky 
> Cc: Matt Turner 
> Cc: alpha 
> Cc: Paul E. McKenney 
> Cc: Michael Cree 

Reviewed-by: Josh Triplett 

>  arch/alpha/kernel/process.c |3 ++-
>  arch/alpha/kernel/smp.c |1 +
>  2 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
> index 153d3fc..eac5e01 100644
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -56,7 +56,8 @@ cpu_idle(void)
>  
>   while (!need_resched())
>   cpu_relax();
> - schedule();
> +
> + schedule_preempt_disabled();
>   }
>  }
>  
> diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
> index 35ddc02..a41ad90 100644
> --- a/arch/alpha/kernel/smp.c
> +++ b/arch/alpha/kernel/smp.c
> @@ -166,6 +166,7 @@ smp_callin(void)
>   DBGS(("smp_callin: commencing CPU %d current %p active_mm %p\n",
> cpuid, current, current->active_mm));
>  
> + preempt_disable();
>   /* Do nothing.  */
>   cpu_idle();
>  }
> -- 
> 1.7.8
> 
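
As I understand it, schedule_preempt_disabled() re-enables preemption without
rescheduling, calls schedule(), and disables preemption again before
returning.  A rough userspace model of that bookkeeping, assuming a single
global counter in place of the real per-task preempt count (the counter and
the schedule_calls variable are inventions for the sketch):

```c
#include <assert.h>

/* Hypothetical stand-ins for the preempt count and for schedule(). */
static int preempt_count;
static int schedule_calls;

static void preempt_disable(void)           { preempt_count++; }
static void preempt_enable_no_resched(void) { preempt_count--; }

/* Model of schedule(): entering it with preemption disabled is a bug. */
static void schedule(void)
{
	assert(preempt_count == 0);
	schedule_calls++;
}

/* Drop the caller's preempt-disable, schedule, then take it back. */
static void schedule_preempt_disabled(void)
{
	preempt_enable_no_resched();
	schedule();
	preempt_disable();
}
```

In this model a caller that enters with preemption disabled (as the boot CPU
does, and as smp_callin() does after this patch) gets control back with the
count unchanged, whereas a bare schedule() would trip the assertion.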

Re: [PATCH tip/core/rcu 08/26] x86: Syscall hooks for userspace RCU extended QS

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:25PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> Add syscall slow path hooks to notify syscall entry
> and exit on CPUs that want to support userspace RCU
> extended quiescent state.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Avi Kivity 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Josh Triplett 
> Cc: Kevin Hilman 
> Cc: Max Krasnyansky 
> Cc: Peter Zijlstra 
> Cc: Stephen Hemminger 
> Cc: Steven Rostedt 
> Cc: Sven-Thorsten Dietrich 
> Cc: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 

This seems reasonable; presumably you plan to add something actually
setting TIF_NOHZ in a subsequent patch series?

Reviewed-by: Josh Triplett 

>  arch/x86/include/asm/thread_info.h |   10 +++---
>  arch/x86/kernel/ptrace.c   |5 +
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 89f794f..c535d84 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -89,6 +89,7 @@ struct thread_info {
>  #define TIF_NOTSC16  /* TSC is not accessible in userland */
>  #define TIF_IA32 17  /* IA32 compatibility process */
>  #define TIF_FORK 18  /* ret_from_fork */
> +#define TIF_NOHZ 19  /* in adaptive nohz mode */
>  #define TIF_MEMDIE   20  /* is terminating due to OOM killer */
>  #define TIF_DEBUG21  /* uses debug registers */
>  #define TIF_IO_BITMAP22  /* uses I/O bitmap */
> @@ -114,6 +115,7 @@ struct thread_info {
>  #define _TIF_NOTSC   (1 << TIF_NOTSC)
>  #define _TIF_IA32(1 << TIF_IA32)
>  #define _TIF_FORK(1 << TIF_FORK)
> +#define _TIF_NOHZ(1 << TIF_NOHZ)
>  #define _TIF_DEBUG   (1 << TIF_DEBUG)
>  #define _TIF_IO_BITMAP   (1 << TIF_IO_BITMAP)
>  #define _TIF_FORCED_TF   (1 << TIF_FORCED_TF)
> @@ -126,12 +128,13 @@ struct thread_info {
>  /* work to do in syscall_trace_enter() */
>  #define _TIF_WORK_SYSCALL_ENTRY  \
>   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |   \
> -  _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
> +  _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \
> +  _TIF_NOHZ)
>  
>  /* work to do in syscall_trace_leave() */
>  #define _TIF_WORK_SYSCALL_EXIT   \
>   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP |\
> -  _TIF_SYSCALL_TRACEPOINT)
> +  _TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)
>  
>  /* work to do on interrupt/exception return */
>  #define _TIF_WORK_MASK   \
> @@ -141,7 +144,8 @@ struct thread_info {
>  
>  /* work to do on any return to user space */
>  #define _TIF_ALLWORK_MASK\
> - ((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT)
> + ((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |   \
> + _TIF_NOHZ)
>  
>  /* Only used for 64 bit */
>  #define _TIF_DO_NOTIFY_MASK  \
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index c4c6a5c..9f94f8e 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1463,6 +1464,8 @@ long syscall_trace_enter(struct pt_regs *regs)
>  {
>   long ret = 0;
>  
> + rcu_user_exit();
> +
>   /*
>* If we stepped into a sysenter/syscall insn, it trapped in
>* kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
> @@ -1526,4 +1529,6 @@ void syscall_trace_leave(struct pt_regs *regs)
>   !test_thread_flag(TIF_SYSCALL_EMU);
>   if (step || test_thread_flag(TIF_SYSCALL_TRACE))
>   tracehook_report_syscall_exit(regs, step);
> +
> + rcu_user_enter();
>  }
> -- 
> 1.7.8
> 
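
To spell out why adding _TIF_NOHZ to the work masks matters: the syscall
slow path is taken only when some flag in the mask is set, so a task with
only TIF_NOHZ set would skip the hooks unless the bit is in the mask.  A
cut-down userspace sketch follows.  TIF_NOHZ = 19 comes from the patch; the
value of TIF_SYSCALL_TRACE, the reduced masks, and the helper name are
assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TIF_SYSCALL_TRACE	0	/* assumed value, not shown in the patch */
#define TIF_NOHZ		19	/* from the patch */

#define _TIF_SYSCALL_TRACE	(1u << TIF_SYSCALL_TRACE)
#define _TIF_NOHZ		(1u << TIF_NOHZ)

/* Reduced versions of the entry mask before and after the patch. */
#define WORK_SYSCALL_ENTRY_OLD	(_TIF_SYSCALL_TRACE)
#define WORK_SYSCALL_ENTRY_NEW	(_TIF_SYSCALL_TRACE | _TIF_NOHZ)

/* Hypothetical helper: does this task go through the slow path? */
static bool takes_slow_path(unsigned int ti_flags, unsigned int work_mask)
{
	return (ti_flags & work_mask) != 0;
}
```

With the old mask a task flagged only with _TIF_NOHZ stays on the fast path
and never reaches rcu_user_exit(); with the new mask it does.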

Re: [PATCH tip/core/rcu 16/26] alpha: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:33PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the Alpha's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Richard Henderson 
> Cc: Ivan Kokshaysky 
> Cc: Matt Turner 
> Cc: alpha 
> Cc: Paul E. McKenney 
> Cc: Michael Cree 
> Cc: 3.2.x.. 

Reviewed-by: Josh Triplett 

>  arch/alpha/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
> index eac5e01..eb9558c 100644
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -54,9 +55,11 @@ cpu_idle(void)
>   /* FIXME -- EV6 and LCA45 know how to power down
>  the CPU.  */
>  
> + rcu_idle_enter();
>   while (!need_resched())
>   cpu_relax();
>  
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 
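
The shape these idle-loop patches share can be modelled in userspace as
below.  This is a sketch under assumptions: rcu_watching, resched_countdown
and rcu_read_section() are hypothetical stand-ins, and the real
rcu_idle_enter()/rcu_idle_exit() do considerably more than flip a flag.

```c
#include <assert.h>
#include <stdbool.h>

static bool rcu_watching = true;	/* hypothetical: RCU tracks this CPU */
static int resched_countdown;		/* polls left before need_resched() fires */
static int read_sections;

static void rcu_idle_enter(void) { rcu_watching = false; }
static void rcu_idle_exit(void)  { rcu_watching = true; }

/* A read-side critical section is only legal while RCU is watching. */
static void rcu_read_section(void)
{
	assert(rcu_watching);
	read_sections++;
}

static bool need_resched(void) { return resched_countdown-- <= 0; }

/* One iteration of the outer loop of cpu_idle(). */
static void idle_once(void)
{
	rcu_idle_enter();
	while (!need_resched())
		;			/* low-power wait; no RCU readers here */
	rcu_idle_exit();
	rcu_read_section();		/* e.g. tracing on the way to schedule() */
}
```

The point is only the bracketing: anything that might use RCU has to sit
outside the enter/exit pair, and the wait itself sits inside it.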

Re: [PATCH tip/core/rcu 17/26] cris: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:34PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the Cris's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Mikael Starvik 
> Cc: Jesper Nilsson 
> Cc: Cris 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/cris/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/cris/kernel/process.c b/arch/cris/kernel/process.c
> index 66fd017..7f65be6 100644
> --- a/arch/cris/kernel/process.c
> +++ b/arch/cris/kernel/process.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  //#define DEBUG
>  
> @@ -74,6 +75,7 @@ void cpu_idle (void)
>  {
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched()) {
>   void (*idle)(void);
>   /*
> @@ -86,6 +88,7 @@ void cpu_idle (void)
>   idle = default_idle;
>   idle();
>   }
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 18/26] frv: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:35PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the Frv's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: David Howells 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/frv/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
> index ff95f50..2eb7fa5 100644
> --- a/arch/frv/kernel/process.c
> +++ b/arch/frv/kernel/process.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -69,12 +70,14 @@ void cpu_idle(void)
>  {
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched()) {
>   check_pgt_cache();
>  
>   if (!frv_dma_inprogress && idle)
>   idle();
>   }
> + rcu_idle_exit();
>  
>   schedule_preempt_disabled();
>   }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 19/26] h8300: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:36PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the h8300's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Yoshinori Sato 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/h8300/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
> index 0e9c315..f153ed1 100644
> --- a/arch/h8300/kernel/process.c
> +++ b/arch/h8300/kernel/process.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -78,8 +79,10 @@ void (*idle)(void) = default_idle;
>  void cpu_idle(void)
>  {
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched())
>   idle();
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 20/26] m32r: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:37PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the m32r's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Hirokazu Takata 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/m32r/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
> index 3a4a32b..384e63f 100644
> --- a/arch/m32r/kernel/process.c
> +++ b/arch/m32r/kernel/process.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -82,6 +83,7 @@ void cpu_idle (void)
>  {
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched()) {
>   void (*idle)(void) = pm_idle;
>  
> @@ -90,6 +92,7 @@ void cpu_idle (void)
>  
>   idle();
>   }
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 21/26] m68k: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:38PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the m68k's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Acked-by: Geert Uytterhoeven 
> Cc: m68k 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/m68k/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
> index c488e3c..ac2892e 100644
> --- a/arch/m68k/kernel/process.c
> +++ b/arch/m68k/kernel/process.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -75,8 +76,10 @@ void cpu_idle(void)
>  {
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched())
>   idle();
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 22/26] mn10300: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:39PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the mn10300's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: David Howells 
> Cc: Koichi Yasutake 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/mn10300/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
> index 7dab0cd..e9cceba 100644
> --- a/arch/mn10300/kernel/process.c
> +++ b/arch/mn10300/kernel/process.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -107,6 +108,7 @@ void cpu_idle(void)
>  {
>   /* endless idle loop with no priority at all */
>   for (;;) {
> + rcu_idle_enter();
>   while (!need_resched()) {
>   void (*idle)(void);
>  
> @@ -121,6 +123,7 @@ void cpu_idle(void)
>   }
>   idle();
>   }
> + rcu_idle_exit();
>  
>   schedule_preempt_disabled();
>   }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 23/26] parisc: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:40PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the parisc's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: James E.J. Bottomley 
> Cc: Helge Deller 
> Cc: Parisc 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/parisc/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> index d4b94b3..c54a4db 100644
> --- a/arch/parisc/kernel/process.c
> +++ b/arch/parisc/kernel/process.c
> @@ -48,6 +48,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -69,8 +70,10 @@ void cpu_idle(void)
>  
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched())
>   barrier();
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   check_pgt_cache();
>   }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 24/26] score: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:41PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the scores's idle loop.

s/scores's/score/ or s/the scores's/score's/

> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Chen Liqin 
> Cc: Lennox Wu 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

With the fix above,
Reviewed-by: Josh Triplett 

>  arch/score/kernel/process.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/score/kernel/process.c b/arch/score/kernel/process.c
> index 2707023..637970c 100644
> --- a/arch/score/kernel/process.c
> +++ b/arch/score/kernel/process.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  void (*pm_power_off)(void);
>  EXPORT_SYMBOL(pm_power_off);
> @@ -50,9 +51,10 @@ void __noreturn cpu_idle(void)
>  {
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched())
>   barrier();
> -
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 25/26] xtensa: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:42PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker 
> 
> In the old days, the whole idle task was considered
> an RCU quiescent state. But as RCU became more and
> more successful over time, some RCU read-side critical
> sections were added even in the code of some
> architectures' idle tasks, for tracing for example.
> 
> So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> be called by the architecture to tell RCU about the part
> in the idle loop that doesn't make use of rcu read side
> critical sections, typically the part that puts the CPU
> in low power mode.
> 
> This is necessary for RCU to find the quiescent states in
> idle in order to complete grace periods.
> 
> Add this missing pair of calls in the xtensa's idle loop.
> 
> Reported-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Chris Zankel 
> Cc: 3.2.x.. 
> Cc: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  arch/xtensa/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
> index 2c8d6a3..bc44311 100644
> --- a/arch/xtensa/kernel/process.c
> +++ b/arch/xtensa/kernel/process.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -110,8 +111,10 @@ void cpu_idle(void)
>  
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   while (!need_resched())
>   platform_idle();
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   }
>  }
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 26/26] ia64: Add missing RCU idle APIs on idle loop

2012-08-31 Thread Josh Triplett

On Thu, Aug 30, 2012 at 02:05:43PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Traditionally, the entire idle task served as an RCU quiescent state.
> But when RCU read side critical sections started appearing within the
> idle loop, this traditional strategy became untenable.  The fix was to
> create new RCU APIs named rcu_idle_enter() and rcu_idle_exit(), which
> must be called by each architecture's idle loop so that RCU can tell
> when it is safe to ignore a given idle CPU.
> 
> Unfortunately, this fix was never applied to ia64, a shortcoming remedied
> by this commit.
> 
> Reported by: Tony Luck 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> Tested by: Tony Luck 

Reviewed-by: Josh Triplett 

>  arch/ia64/kernel/process.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
> index dd6fc14..3e316ec 100644
> --- a/arch/ia64/kernel/process.c
> +++ b/arch/ia64/kernel/process.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -279,6 +280,7 @@ cpu_idle (void)
>  
>   /* endless idle loop with no priority at all */
>   while (1) {
> + rcu_idle_enter();
>   if (can_do_pal_halt) {
>   current_thread_info()->status &= ~TS_POLLING;
>   /*
> @@ -309,6 +311,7 @@ cpu_idle (void)
>   normal_xtp();
>  #endif
>   }
> + rcu_idle_exit();
>   schedule_preempt_disabled();
>   check_pgt_cache();
>   if (cpu_is_offline(cpu))
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 01/23] rcu: Move RCU grace-period initialization into a kthread

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:16AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> As the first step towards allowing grace-period initialization to be
> preemptible, this commit moves the RCU grace-period initialization
> into its own kthread.  This is needed to keep large-system scheduling
> latency at reasonable levels.
> 
> Reported-by: Mike Galbraith 
> Reported-by: Dimitri Sivanich 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |  191 
> --
>  kernel/rcutree.h |3 +
>  2 files changed, 130 insertions(+), 64 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f280e54..e1c5868 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1040,6 +1040,103 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  }
>  
>  /*
> + * Body of kthread that handles grace periods.
> + */
> +static int rcu_gp_kthread(void *arg)
> +{
> + unsigned long flags;
> + struct rcu_data *rdp;
> + struct rcu_node *rnp;
> + struct rcu_state *rsp = arg;
> +
> + for (;;) {
> +
> + /* Handle grace-period start. */
> + rnp = rcu_get_root(rsp);
> + for (;;) {
> + wait_event_interruptible(rsp->gp_wq, rsp->gp_flags);
> + if (rsp->gp_flags)
> + break;
> + flush_signals(current);
> + }
> + raw_spin_lock_irqsave(&rnp->lock, flags);
> + rsp->gp_flags = 0;
> + rdp = this_cpu_ptr(rsp->rda);
> +
> + if (rcu_gp_in_progress(rsp)) {
> + /*
> +  * A grace period is already in progress, so
> +  * don't start another one.
> +  */
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + continue;
> + }
> +
> + if (rsp->fqs_active) {
> + /*
> +  * We need a grace period, but force_quiescent_state()
> +  * is running.  Tell it to start one on our behalf.
> +  */
> + rsp->fqs_need_gp = 1;
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + continue;
> + }
> +
> + /* Advance to a new grace period and initialize state. */
> + rsp->gpnum++;
> + trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
> + WARN_ON_ONCE(rsp->fqs_state == RCU_GP_INIT);
> + rsp->fqs_state = RCU_GP_INIT; /* Stop force_quiescent_state. */
> + rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
> + record_gp_stall_check_time(rsp);
> + raw_spin_unlock(&rnp->lock);  /* leave irqs disabled. */
> +
> + /* Exclude any concurrent CPU-hotplug operations. */
> + raw_spin_lock(&rsp->onofflock);  /* irqs already disabled. */
> +
> + /*
> +  * Set the quiescent-state-needed bits in all the rcu_node
> +  * structures for all currently online CPUs in breadth-first
> +  * order, starting from the root rcu_node structure.
> +  * This operation relies on the layout of the hierarchy
> +  * within the rsp->node[] array.  Note that other CPUs will
> +  * access only the leaves of the hierarchy, which still
> +  * indicate that no grace period is in progress, at least
> +  * until the corresponding leaf node has been initialized.
> +  * In addition, we have excluded CPU-hotplug operations.
> +  *
> +  * Note that the grace period cannot complete until
> +  * we finish the initialization process, as there will
> +  * be at least one qsmask bit set in the root node until
> +  * that time, namely the one corresponding to this CPU,
> +  * due to the fact that we have irqs disabled.
> +  */
> + rcu_for_each_node_breadth_first(rsp, rnp) {
> + raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> + rcu_preempt_check_blocked_tasks(rnp);
> + rnp->qsmask = rnp->qsmaskinit;
> + rnp->gpnum = rsp->gpnum;
> + rnp->completed = rsp->completed;
> + if (rnp == rdp->myn

Re: [PATCH tip/core/rcu 02/23] rcu: Allow RCU grace-period initialization to be preempted

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:17AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> RCU grace-period initialization is currently carried out with interrupts
> disabled, which can result in 200-microsecond latency spikes on systems
> on which RCU has been configured for 4096 CPUs.  This patch therefore
> makes the RCU grace-period initialization be preemptible, which should
> eliminate those latency spikes.  Similar spikes from grace-period cleanup
> and the forcing of quiescent states will be dealt with similarly by later
> patches.
> 
> Reported-by: Mike Galbraith 
> Reported-by: Dimitri Sivanich 
> Signed-off-by: Paul E. McKenney 

Does it make sense to have cond_resched() right before the continues,
which lead right back up to the wait_event_interruptible at the top of
the loop?  Or do you expect to usually find that event already
signalled?

In any case:

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |   17 ++---
>  1 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index e1c5868..ef56aa3 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1069,6 +1069,7 @@ static int rcu_gp_kthread(void *arg)
>* don't start another one.
>*/
>   raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + cond_resched();
>   continue;
>   }
>  
> @@ -1079,6 +1080,7 @@ static int rcu_gp_kthread(void *arg)
>*/
>   rsp->fqs_need_gp = 1;
>   raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + cond_resched();
>   continue;
>   }
>  
> @@ -1089,10 +1091,10 @@ static int rcu_gp_kthread(void *arg)
>   rsp->fqs_state = RCU_GP_INIT; /* Stop force_quiescent_state. */
>   rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
>   record_gp_stall_check_time(rsp);
> - raw_spin_unlock(&rnp->lock);  /* leave irqs disabled. */
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  
>   /* Exclude any concurrent CPU-hotplug operations. */
> - raw_spin_lock(&rsp->onofflock);  /* irqs already disabled. */
> + get_online_cpus();
>  
>   /*
>* Set the quiescent-state-needed bits in all the rcu_node
> @@ -1112,7 +1114,7 @@ static int rcu_gp_kthread(void *arg)
>* due to the fact that we have irqs disabled.
>*/
>   rcu_for_each_node_breadth_first(rsp, rnp) {
> - raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> + raw_spin_lock_irqsave(&rnp->lock, flags);
>   rcu_preempt_check_blocked_tasks(rnp);
>   rnp->qsmask = rnp->qsmaskinit;
>   rnp->gpnum = rsp->gpnum;
> @@ -1123,15 +1125,16 @@ static int rcu_gp_kthread(void *arg)
>   trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
>   rnp->level, rnp->grplo,
>   rnp->grphi, rnp->qsmask);
> - raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + cond_resched();
>   }
>  
>   rnp = rcu_get_root(rsp);
> - raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> + raw_spin_lock_irqsave(&rnp->lock, flags);
>   /* force_quiescent_state() now OK. */
>   rsp->fqs_state = RCU_SIGNAL_INIT;
> - raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> - raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + put_online_cpus();
>   }
>   return 0;
>  }
> -- 
> 1.7.8
> 
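
The locking change in this patch can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the kernel code:
the node array, the atomic_flag spinlock, and sched_yield() are hypothetical
stand-ins for the rcu_node tree, rnp->lock, and cond_resched(). The point is
that each node's lock is taken and dropped inside the loop body, and a
voluntary preemption point follows every iteration, so no lock is held
across the whole walk.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_NODES 8	/* hypothetical tree size, for illustration only */

struct node {
	atomic_flag lock;	/* stand-in for rnp->lock */
	unsigned long gpnum;	/* stand-in for rnp->gpnum */
};

static void node_lock(struct node *n)
{
	/* Spin until the flag was previously clear, yielding while we wait. */
	while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
		sched_yield();
}

static void node_unlock(struct node *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/*
 * Initialize every node for a new grace period.  The lock is held only
 * for one node at a time, and sched_yield() plays the role of the
 * cond_resched() added after each unlock.  Returns the number of
 * voluntary preemption points taken.
 */
static int init_nodes(struct node *nodes, int nr, unsigned long gpnum)
{
	int yields = 0;

	for (int i = 0; i < nr; i++) {
		node_lock(&nodes[i]);
		nodes[i].gpnum = gpnum;
		node_unlock(&nodes[i]);
		sched_yield();
		yields++;
	}
	return yields;
}

/* Small driver: set up the nodes, walk them, and check the result. */
int demo_init_nodes(void)
{
	static struct node nodes[NR_NODES];
	int yields;

	for (int i = 0; i < NR_NODES; i++)
		atomic_flag_clear(&nodes[i].lock);
	yields = init_nodes(nodes, NR_NODES, 42);
	for (int i = 0; i < NR_NODES; i++)
		assert(nodes[i].gpnum == 42);
	return yields;
}
```

Calling demo_init_nodes() walks all NR_NODES nodes and yields once per node,
which is the property the patch relies on to bound latency on large trees.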
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH tip/core/rcu 03/23] rcu: Move RCU grace-period cleanup into kthread

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:18AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> As a first step towards allowing grace-period cleanup to be preemptible,
> this commit moves the RCU grace-period cleanup into the same kthread
> that is now used to initialize grace periods.  This is needed to keep
> scheduling latency down to a dull roar.
> 
> Reported-by: Mike Galbraith 
> Reported-by: Dimitri Sivanich 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |  112 ++
>  1 files changed, 62 insertions(+), 50 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ef56aa3..9fad21c 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1045,6 +1045,7 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  static int rcu_gp_kthread(void *arg)
>  {
>   unsigned long flags;
> + unsigned long gp_duration;
>   struct rcu_data *rdp;
>   struct rcu_node *rnp;
>   struct rcu_state *rsp = arg;
> @@ -1135,6 +1136,65 @@ static int rcu_gp_kthread(void *arg)
>   rsp->fqs_state = RCU_SIGNAL_INIT;
>   raw_spin_unlock_irqrestore(&rnp->lock, flags);
>   put_online_cpus();
> +
> + /* Handle grace-period end. */
> + rnp = rcu_get_root(rsp);
> + for (;;) {
> + wait_event_interruptible(rsp->gp_wq,
> +  !ACCESS_ONCE(rnp->qsmask) &&
> +  !rcu_preempt_blocked_readers_cgp(rnp));
> + if (!ACCESS_ONCE(rnp->qsmask) &&
> + !rcu_preempt_blocked_readers_cgp(rnp))
> + break;
> + flush_signals(current);
> + }
> +
> + raw_spin_lock_irqsave(&rnp->lock, flags);
> + gp_duration = jiffies - rsp->gp_start;
> + if (gp_duration > rsp->gp_max)
> + rsp->gp_max = gp_duration;
> +
> + /*
> +  * We know the grace period is complete, but to everyone else
> +  * it appears to still be ongoing.  But it is also the case
> +  * that to everyone else it looks like there is nothing that
> +  * they can do to advance the grace period.  It is therefore
> +  * safe for us to drop the lock in order to mark the grace
> +  * period as completed in all of the rcu_node structures.
> +  *
> +  * But if this CPU needs another grace period, it will take
> +  * care of this while initializing the next grace period.
> +  * We use RCU_WAIT_TAIL instead of the usual RCU_DONE_TAIL
> +  * because the callbacks have not yet been advanced: Those
> +  * callbacks are waiting on the grace period that just now
> +  * completed.
> +  */
> + if (*rdp->nxttail[RCU_WAIT_TAIL] == NULL) {
> + raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> +
> + /*
> +  * Propagate new ->completed value to rcu_node
> +  * structures so that other CPUs don't have to
> +  * wait until the start of the next grace period
> +  * to process their callbacks.
> +  */
> + rcu_for_each_node_breadth_first(rsp, rnp) {
> + /* irqs already disabled. */
> + raw_spin_lock(&rnp->lock);
> + rnp->completed = rsp->gpnum;
> + /* irqs remain disabled. */
> + raw_spin_unlock(&rnp->lock);
> + }
> + rnp = rcu_get_root(rsp);
> + raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> + }
> +
> + rsp->completed = rsp->gpnum; /* Declare grace period done. */
> + trace_rcu_grace_period(rsp->name, rsp->completed, "end");
> + rsp->fqs_state = RCU_GP_IDLE;
> + if (cpu_needs_another_gp(rsp, rdp))
> + rsp->gp_flags = 1;
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
>   }
>   return 0;
>  }
> @@ -1182,57 +1242,9 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
>  static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned

Re: [PATCH tip/core/rcu 04/23] rcu: Allow RCU grace-period cleanup to be preempted

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:19AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> RCU grace-period cleanup is currently carried out with interrupts
> disabled, which can result in excessive latency spikes on large systems
> (many hundreds or thousands of CPUs).  This patch therefore makes the
> RCU grace-period cleanup be preemptible, including voluntary preemption
> points, which should eliminate those latency spikes.  Similar spikes from
> forcing of quiescent states will be dealt with similarly by later patches.
> 
> Reported-by: Mike Galbraith 
> Reported-by: Dimitri Sivanich 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |   11 +--
>  1 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 9fad21c..300aba6 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1170,7 +1170,7 @@ static int rcu_gp_kthread(void *arg)
>* completed.
>*/
>   if (*rdp->nxttail[RCU_WAIT_TAIL] == NULL) {
> - raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  
>   /*
>* Propagate new ->completed value to rcu_node
> @@ -1179,14 +1179,13 @@ static int rcu_gp_kthread(void *arg)
>* to process their callbacks.
>*/
>   rcu_for_each_node_breadth_first(rsp, rnp) {
> - /* irqs already disabled. */
> - raw_spin_lock(&rnp->lock);
> + raw_spin_lock_irqsave(&rnp->lock, flags);
>   rnp->completed = rsp->gpnum;
> - /* irqs remain disabled. */
> - raw_spin_unlock(&rnp->lock);
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + cond_resched();
>   }
>   rnp = rcu_get_root(rsp);
> - raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> + raw_spin_lock_irqsave(&rnp->lock, flags);
>   }
>  
>   rsp->completed = rsp->gpnum; /* Declare grace period done. */
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 05/23] rcu: Prevent offline CPUs from executing RCU core code

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:20AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Earlier versions of RCU invoked the RCU core from the CPU_DYING notifier
> in order to note a quiescent state for the outgoing CPU.  Because the
> CPU is marked "offline" during the execution of the CPU_DYING notifiers,
> the RCU core had to tolerate being invoked from an offline CPU.  However,
> commit b1420f1c (Make rcu_barrier() less disruptive) left only tracing
> code in the CPU_DYING notifier, so the RCU core need no longer execute
> on offline CPUs.  This commit therefore enforces this restriction.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 300aba6..84a6f55 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1892,6 +1892,8 @@ static void rcu_process_callbacks(struct softirq_action *unused)
>  {
>   struct rcu_state *rsp;
>  
> + if (cpu_is_offline(smp_processor_id()))
> + return;
>   trace_rcu_utilization("Start RCU core");
>   for_each_rcu_flavor(rsp)
>   __rcu_process_callbacks(rsp);
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 06/23] rcu: Break up rcu_gp_kthread() into subfunctions

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:21AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The rcu_gp_kthread() function is too large and furthermore needs to
> have the force_quiescent_state() code pulled in.  This commit therefore
> breaks up rcu_gp_kthread() into rcu_gp_init() and rcu_gp_cleanup().
> 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |  260 +-
>  1 files changed, 138 insertions(+), 122 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 84a6f55..c2c036f 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1040,160 +1040,176 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  }
>  
>  /*
> - * Body of kthread that handles grace periods.
> + * Initialize a new grace period.
>   */
> -static int rcu_gp_kthread(void *arg)
> +static int rcu_gp_init(struct rcu_state *rsp)
>  {
>   unsigned long flags;
> - unsigned long gp_duration;
>   struct rcu_data *rdp;
> - struct rcu_node *rnp;
> - struct rcu_state *rsp = arg;
> + struct rcu_node *rnp = rcu_get_root(rsp);
>  
> - for (;;) {
> + raw_spin_lock_irqsave(&rnp->lock, flags);
> + rsp->gp_flags = 0;
>  
> - /* Handle grace-period start. */
> - rnp = rcu_get_root(rsp);
> - for (;;) {
> - wait_event_interruptible(rsp->gp_wq, rsp->gp_flags);
> - if (rsp->gp_flags)
> - break;
> - flush_signals(current);
> - }
> + if (rcu_gp_in_progress(rsp)) {
> + /* Grace period already in progress, don't start another. */
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + return 0;
> + }
> +
> + if (rsp->fqs_active) {
> + /*
> +  * We need a grace period, but force_quiescent_state()
> +  * is running.  Tell it to start one on our behalf.
> +  */
> + rsp->fqs_need_gp = 1;
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + return 0;
> + }
> +
> + /* Advance to a new grace period and initialize state. */
> + rsp->gpnum++;
> + trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
> + WARN_ON_ONCE(rsp->fqs_state == RCU_GP_INIT);
> + rsp->fqs_state = RCU_GP_INIT; /* Stop force_quiescent_state. */
> + rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
> + record_gp_stall_check_time(rsp);
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +
> + /* Exclude any concurrent CPU-hotplug operations. */
> + get_online_cpus();
> +
> + /*
> +  * Set the quiescent-state-needed bits in all the rcu_node
> +  * structures for all currently online CPUs in breadth-first order,
> +  * starting from the root rcu_node structure, relying on the layout
> +  * of the tree within the rsp->node[] array.  Note that other CPUs
> +  * access only the leaves of the hierarchy, thus seeing that no
> +  * grace period is in progress, at least until the corresponding
> +  * leaf node has been initialized.  In addition, we have excluded
> +  * CPU-hotplug operations.
> +  *
> +  * The grace period cannot complete until the initialization
> +  * process finishes, because this kthread handles both.
> +  */
> + rcu_for_each_node_breadth_first(rsp, rnp) {
>   raw_spin_lock_irqsave(&rnp->lock, flags);
> - rsp->gp_flags = 0;
>   rdp = this_cpu_ptr(rsp->rda);
> + rcu_preempt_check_blocked_tasks(rnp);
> + rnp->qsmask = rnp->qsmaskinit;
> + rnp->gpnum = rsp->gpnum;
> + rnp->completed = rsp->completed;
> + if (rnp == rdp->mynode)
> + rcu_start_gp_per_cpu(rsp, rnp, rdp);
> + rcu_preempt_boost_start_gp(rnp);
> + trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
> + rnp->level, rnp->grplo,
> + rnp->grphi, rnp->qsmask);
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + cond_resched();
> + }
>  
> - if (rcu_gp_in_progress(rsp)) {
> - /*
> -  * A grace period is already in progress, so
> -  * don't start another one.
>

Re: [PATCH tip/core/rcu 07/23] rcu: Provide OOM handler to motivate lazy RCU callbacks

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:22AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
> large number of lazy callbacks, which as the name implies will be slow
> to be invoked.  This can be a problem on small-memory systems, where the
> default 6-second sleep for CPUs having only lazy RCU callbacks could well
> be fatal.  This commit therefore installs an OOM handler that ensures that
> every CPU with non-lazy callbacks has at least one non-lazy callback,
> in turn ensuring timely advancement for these callbacks.

Did you mean "every CPU with lazy callbacks" here?

> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> Tested-by: Sasha Levin 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.h|5 ++-
>  kernel/rcutree_plugin.h |   80 +++
>  2 files changed, 84 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index 117a150..effb273 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -315,8 +315,11 @@ struct rcu_data {
>   unsigned long n_rp_need_fqs;
>   unsigned long n_rp_need_nothing;
>  
> - /* 6) _rcu_barrier() callback. */
> + /* 6) _rcu_barrier() and OOM callbacks. */
>   struct rcu_head barrier_head;
> +#ifdef CONFIG_RCU_FAST_NO_HZ
> + struct rcu_head oom_head;
> +#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
>  
>   int cpu;
>   struct rcu_state *rsp;
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 7f3244c..bac8cc1 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -25,6 +25,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  #define RCU_KTHREAD_PRIO 1
>  
> @@ -2112,6 +2113,85 @@ static void rcu_idle_count_callbacks_posted(void)
>   __this_cpu_add(rcu_dynticks.nonlazy_posted, 1);
>  }
>  
> +/*
> + * Data for flushing lazy RCU callbacks at OOM time.
> + */
> +static atomic_t oom_callback_count;
> +static DECLARE_WAIT_QUEUE_HEAD(oom_callback_wq);
> +
> +/*
> + * RCU OOM callback -- decrement the outstanding count and deliver the
> + * wake-up if we are the last one.
> + */
> +static void rcu_oom_callback(struct rcu_head *rhp)
> +{
> + if (atomic_dec_and_test(&oom_callback_count))
> + wake_up(&oom_callback_wq);
> +}
> +
> +/*
> + * Post an rcu_oom_notify callback on the current CPU if it has at
> + * least one lazy callback.  This will unnecessarily post callbacks
> + * to CPUs that already have a non-lazy callback at the end of their
> + * callback list, but this is an infrequent operation, so accept some
> + * extra overhead to keep things simple.
> + */
> +static void rcu_oom_notify_cpu(void *flavor)
> +{
> + struct rcu_state *rsp = flavor;
> + struct rcu_data *rdp = __this_cpu_ptr(rsp->rda);
> +
> + if (rdp->qlen_lazy != 0) {
> + atomic_inc(&oom_callback_count);
> + rsp->call(&rdp->oom_head, rcu_oom_callback);
> + }
> +}
> +
> +/*
> + * If low on memory, ensure that each CPU has a non-lazy callback.
> + * This will wake up CPUs that have only lazy callbacks, in turn
> + * ensuring that they free up the corresponding memory in a timely manner.
> + */
> +static int rcu_oom_notify(struct notifier_block *self,
> +   unsigned long notused, void *nfreed)
> +{
> + int cpu;
> + struct rcu_state *rsp;
> +
> + /* Wait for callbacks from earlier instance to complete. */
> + wait_event(oom_callback_wq, atomic_read(&oom_callback_count) == 0);
> +
> + /*
> +  * Prevent premature wakeup: ensure that all increments happen
> +  * before there is a chance of the counter reaching zero.
> +  */
> + atomic_set(&oom_callback_count, 1);
> +
> + get_online_cpus();
> + for_each_online_cpu(cpu)
> + for_each_rcu_flavor(rsp)
> + smp_call_function_single(cpu, rcu_oom_notify_cpu,
> +  rsp, 1);
> + put_online_cpus();
> +
> + /* Unconditionally decrement: no need to wake ourselves up. */
> + atomic_dec(&oom_callback_count);
> +
> + *(unsigned long *)nfreed = 1;
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block rcu_oom_nb = {
> + .notifier_call = rcu_oom_notify
> +};
> +
> +static int __init rcu_register_oom_notifier(void)
> +{
> + register_oom_notifier(&rcu_oom_nb);
> + return 0;
> +}
> +early_initcall(rcu_register_oom_notifier);
> +
>  #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
>  
>  #ifdef CONFIG_RCU_CPU_STALL_INFO
> -- 
> 1.7.8
> 
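
The "prevent premature wakeup" step in rcu_oom_notify() can be illustrated
with a small single-threaded sketch. It assumes C11 atomics in place of the
kernel's atomic_t, and the names pending, wakeups, callback_done(), and
post_callbacks() are hypothetical. Each posted callback is assumed to
complete immediately, which is the worst case for a premature wakeup.
Starting the counter at 1 means it cannot reach zero while callbacks are
still being posted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int pending;	/* stand-in for oom_callback_count */
static int wakeups;		/* counts calls to the wake_up() stand-in */

/* Stand-in for rcu_oom_callback(): the last one out delivers the wakeup. */
static void callback_done(void)
{
	/* atomic_fetch_sub() returns the old value; old == 1 means we hit 0. */
	if (atomic_fetch_sub(&pending, 1) == 1)
		wakeups++;
}

/*
 * Post nr callbacks, each of which completes right away.  With bias set,
 * the counter starts at 1 so that it stays nonzero until posting is done.
 * Returns the number of wakeups delivered while posting.
 */
int post_callbacks(int nr, bool bias)
{
	wakeups = 0;
	atomic_store(&pending, bias ? 1 : 0);

	for (int i = 0; i < nr; i++) {
		atomic_fetch_add(&pending, 1);
		callback_done();
	}

	/* Drop the bias unconditionally: no need to wake ourselves up. */
	if (bias)
		atomic_fetch_sub(&pending, 1);
	return wakeups;
}
```

With the bias, post_callbacks(4, true) delivers no wakeup during posting.
Without it, post_callbacks(4, false) delivers one per callback, each of them
premature in the sense that more callbacks were still to be posted.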

Re: [PATCH tip/core/rcu 08/23] rcu: Segregate rcu_state fields to improve cache locality

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:23AM -0700, Paul E. McKenney wrote:
> From: Dimitri Sivanich 
> 
> The fields in the rcu_state structure that are protected by the
> root rcu_node structure's ->lock can share a cache line with the
> fields protected by ->onofflock.  This can result in excessive
> memory contention on large systems, so this commit applies
> ____cacheline_internodealigned_in_smp to the ->onofflock field in
> order to segregate them.
> 
> Signed-off-by: Dimitri Sivanich 
> Signed-off-by: Paul E. McKenney 
> Tested-by: Dimitri Sivanich 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.h |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index effb273..5d92b80 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -394,7 +394,8 @@ struct rcu_state {
>  
>   /* End of fields guarded by root rcu_node's lock. */
>  
> - raw_spinlock_t onofflock;   /* exclude on/offline and */
> + raw_spinlock_t onofflock ____cacheline_internodealigned_in_smp;
> + /* exclude on/offline and */
>   /*  starting new GP. */
>   struct rcu_head *orphan_nxtlist;/* Orphaned callbacks that */
>   /*  need a grace period. */
> -- 
> 1.7.8
> 
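
The effect of aligning ->onofflock can be checked with a userspace sketch.
The struct layouts, the 64-byte line size, and the helper names below are
assumptions made for illustration; the kernel's alignment macro and the real
rcu_state layout differ. The sketch only shows that an alignment specifier
on one field moves it onto a different cache line from the fields before it.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cache-line size, for illustration only */

/* Both groups of fields packed together, as before the patch. */
struct state_packed {
	unsigned long gpnum;		/* guarded by the root node's lock */
	unsigned long completed;	/* guarded by the root node's lock */
	int onofflock;			/* stand-in for the separate lock */
};

/* Same fields, with the lock pushed onto its own cache line. */
struct state_split {
	unsigned long gpnum;
	unsigned long completed;
	_Alignas(CACHELINE) int onofflock;
};

/* Index of the cache line containing a given byte offset. */
static size_t line_of(size_t offset)
{
	return offset / CACHELINE;
}

/* Returns 1 if ->completed and ->onofflock share a line when packed. */
int packed_fields_share_line(void)
{
	return line_of(offsetof(struct state_packed, completed)) ==
	       line_of(offsetof(struct state_packed, onofflock));
}

/* Returns 1 if they still share a line once onofflock is aligned. */
int split_fields_share_line(void)
{
	return line_of(offsetof(struct state_split, completed)) ==
	       line_of(offsetof(struct state_split, onofflock));
}
```

In the packed layout both fields land on line 0. In the split layout
onofflock starts at offset 64, so writers of the two groups no longer
contend for the same line.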

Re: [PATCH tip/core/rcu 10/23] rcu: Allow RCU quiescent-state forcing to be preempted

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:25AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> RCU quiescent-state forcing is currently carried out without preemption
> points, which can result in excessive latency spikes on large systems
> (many hundreds or thousands of CPUs).  This patch therefore inserts
> a voluntary preemption point into force_qs_rnp(), which should greatly
> reduce the magnitude of these spikes.
> 
> Reported-by: Mike Galbraith 
> Reported-by: Dimitri Sivanich 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 79c2c28..cce73ff 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1784,6 +1784,7 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
>   struct rcu_node *rnp;
>  
>   rcu_for_each_leaf_node(rsp, rnp) {
> + cond_resched();
>   mask = 0;
>   raw_spin_lock_irqsave(&rnp->lock, flags);
>   if (!rcu_gp_in_progress(rsp)) {
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 11/23] rcu: Adjust debugfs tracing for kthread-based quiescent-state forcing

2012-09-01 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:26AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Moving quiescent-state forcing into a kthread dispenses with the need
> for the ->n_rp_need_fqs field, so this commit removes it.
> 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  Documentation/RCU/trace.txt |   43 ---
>  kernel/rcutree.h|1 -
>  kernel/rcutree_trace.c  |3 +--
>  3 files changed, 17 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
> index f6f15ce..672d190 100644
> --- a/Documentation/RCU/trace.txt
> +++ b/Documentation/RCU/trace.txt
> @@ -333,23 +333,23 @@ o   Each element of the form "1/1 0:127 ^0" represents one struct
>  The output of "cat rcu/rcu_pending" looks as follows:
>  
>  rcu_sched:
> -  0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
> -  1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
> -  2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
> -  3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
> -  4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
> -  5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
> -  6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
> -  7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
> +  0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nn=146741
> +  1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nn=155792
> +  2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nn=136629
> +  3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nn=137723
> +  4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nn=123110
> +  5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nn=137456
> +  6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nn=120834
> +  7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nn=144888
>  rcu_bh:
> -  0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
> -  1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
> -  2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
> -  3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
> -  4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
> -  5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
> -  6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
> -  7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
> +  0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nn=145314
> +  1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nn=143180
> +  2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nn=117936
> +  3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nn=134863
> +  4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nn=110671
> +  5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nn=133235
> +  6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nn=110921
> +  7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nn=118542
>  
>  As always, this is once again split into "rcu_sched" and "rcu_bh"
>  portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
> @@ -377,17 +377,6 @@ o"gpc" is the number of times that an old grace period had
>  o"gps" is the number of times that a new grace period had started,
>   but this CPU was not yet aware of it.
>  
> -o"nf" is the number of times that this CPU suspected that the
> - current grace period had run for too long, and thus needed to
> - be forced.
> -
> - Please note that "forcing" consists of sending resched IPIs
> - to holdout CPUs.  If that CPU really still is in an old RCU
> - read-side critical section, then we really do have to wait for it.
> - The assumption behing "forcing" is that the CPU is not still in
> - an old RCU read-side critical section, but has not yet responded
> - for some other reason.
> -
>  o"nn" is the number of times that this CPU needed nothing.  Alert
>   readers will note that the rcu "nn" number for a given CPU very
>   closely matches the rcu_bh "np" number for that same CPU.  This
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index 1f26b1f..36916df 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -312,7 +312,6 @@ struc

Re: [PATCH tip/core/rcu 12/23] rcu: Prevent force_quiescent_state() memory contention

2012-09-02 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:27AM -0700, Paul E. McKenney wrote:
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
[...]
> @@ -1824,16 +1825,35 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
>  static void force_quiescent_state(struct rcu_state *rsp)
>  {
>   unsigned long flags;
> - struct rcu_node *rnp = rcu_get_root(rsp);
> + bool ret;
> + struct rcu_node *rnp;
> + struct rcu_node *rnp_old = NULL;
> +
> + /* Funnel through hierarchy to reduce memory contention. */
> + rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;

What makes this use of raw_smp_processor_id() safe?  (And, could you
document the answer here?)

> + for (; rnp != NULL; rnp = rnp->parent) {
> + ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
> +   !raw_spin_trylock(&rnp->fqslock);

So, the root lock will still get trylocked by one CPU per second-level
tree node, just not by every CPU?
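
To make the question concrete, here is a userspace sketch of the funnel
pattern under stated assumptions. It uses a two-level tree, a C11
atomic_flag in place of ->fqslock, and a hypothetical trylocks counter added
purely as instrumentation. It leaves out the gp_flags check from the patch.
A caller that loses the trylock at its leaf gives up without touching the
root, so the root lock sees at most one attempt per leaf at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fnode {
	atomic_flag fqslock;	/* stand-in for rnp->fqslock */
	struct fnode *parent;	/* NULL at the root */
	int trylocks;		/* instrumentation only: attempts on this lock */
};

static bool fnode_trylock(struct fnode *n)
{
	n->trylocks++;
	/* test_and_set returns the old value: false means we took the lock. */
	return !atomic_flag_test_and_set_explicit(&n->fqslock,
						  memory_order_acquire);
}

static void fnode_unlock(struct fnode *n)
{
	atomic_flag_clear_explicit(&n->fqslock, memory_order_release);
}

/*
 * Walk from a leaf towards the root, trylocking each level and releasing
 * the level below.  Returns the root (still locked) on success, or NULL
 * if some level was contended, in which case no lock is left held.
 */
static struct fnode *funnel_to_root(struct fnode *leaf)
{
	struct fnode *old = NULL;

	for (struct fnode *n = leaf; n != NULL; n = n->parent) {
		bool got = fnode_trylock(n);

		if (old != NULL)
			fnode_unlock(old);
		if (!got)
			return NULL;
		old = n;
	}
	return old;
}

/* Two leaves under one root; returns how often the root lock was tried. */
int demo_root_trylocks(void)
{
	struct fnode root = { .fqslock = ATOMIC_FLAG_INIT, .parent = NULL };
	struct fnode leaf_a = { .fqslock = ATOMIC_FLAG_INIT, .parent = &root };
	struct fnode leaf_b = { .fqslock = ATOMIC_FLAG_INIT, .parent = &root };
	struct fnode *loser, *winner;

	/* Pretend another CPU under leaf_a is mid-walk and holds its lock. */
	if (!fnode_trylock(&leaf_a))
		return -1;

	loser = funnel_to_root(&leaf_a);	/* stopped at the leaf */
	winner = funnel_to_root(&leaf_b);	/* reaches and locks the root */

	if (loser != NULL || winner != &root)
		return -1;
	return root.trylocks;
}
```

In this run the second caller under leaf_a never reaches the root, and the
root lock is tried exactly once, by the caller that came up through leaf_b.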

> @@ -2721,10 +2741,14 @@ static void __init rcu_init_levelspread(struct rcu_state *rsp)
>  static void __init rcu_init_one(struct rcu_state *rsp,
>   struct rcu_data __percpu *rda)
>  {
> - static char *buf[] = { "rcu_node_level_0",
> -"rcu_node_level_1",
> -"rcu_node_level_2",
> -"rcu_node_level_3" };  /* Match MAX_RCU_LVLS */
> + static char *buf[] = { "rcu_node_0",
> +"rcu_node_1",
> +"rcu_node_2",
> +"rcu_node_3" };  /* Match MAX_RCU_LVLS */

Why rename these?

- Josh Triplett

[PATCH] trace: Stop compiling in trace_clock unconditionally

2012-09-02 Thread Josh Triplett

Commit 56449f437add737a1e5e1cb7e00f63ac8ead1938, in April 2009, made
trace_clock available unconditionally, since CONFIG_X86_DS used it too.
Commit faa4602e47690fb11221e00f9b9697c8dc0d4b19, in March 2010, removed
CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split out from
CONFIG_TRACING for general use) has a dependency on trace_clock.  So,
only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING
enabled.

Signed-off-by: Josh Triplett 
---
 kernel/Makefile   |2 +-
 kernel/trace/Kconfig  |5 +
 kernel/trace/Makefile |6 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index c0cc67a..29d993b 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -98,7 +98,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += elfcore.o
 obj-$(CONFIG_BINFMT_ELF_FDPIC) += elfcore.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace/
 obj-$(CONFIG_TRACING) += trace/
-obj-$(CONFIG_X86_DS) += trace/
+obj-$(CONFIG_TRACE_CLOCK) += trace/
 obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8c4c070..e8b7c26 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -57,8 +57,12 @@ config HAVE_C_RECORDMCOUNT
 config TRACER_MAX_TRACE
bool
 
+config TRACE_CLOCK
+   bool
+
 config RING_BUFFER
bool
+   select TRACE_CLOCK
 
 config FTRACE_NMI_ENTER
bool
@@ -109,6 +113,7 @@ config TRACING
select NOP_TRACER
select BINARY_PRINTF
select EVENT_TRACING
+   select TRACE_CLOCK
 
 config GENERIC_TRACER
bool
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index b831087..1b8e4c7 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -17,11 +17,7 @@ endif
 
 CFLAGS_trace_events_filter.o := -I$(src)
 
-#
-# Make the trace clocks available generally: it's infrastructure
-# relied on by ptrace for example:
-#
-obj-y += trace_clock.o
+obj-$(CONFIG_TRACE_CLOCK) += trace_clock.o
 
 obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
 obj-$(CONFIG_RING_BUFFER) += ring_buffer.o
-- 
1.7.10.4


[PATCH] sound: Remove the last mention of SNDRV_MAIN_OBJECT_FILE

2012-09-02 Thread Josh Triplett

SNDRV_MAIN_OBJECT_FILE hasn't done anything since the pre-git days, and
the only remaining reference occurs as a #define in sound/last.c.  Drop
that last mention of it.

Signed-off-by: Josh Triplett 
---
 sound/last.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/last.c b/sound/last.c
index 7ffc182..43f2228 100644
--- a/sound/last.c
+++ b/sound/last.c
@@ -19,7 +19,6 @@
  *
  */
 
-#define SNDRV_MAIN_OBJECT_FILE
 #include 
 #include 
 
-- 
1.7.10.4


[PATCH] hid: Remove "default m" from HID_LOGITECH_DJ

2012-09-02 Thread Josh Triplett

HID_LOGITECH_DJ uses "default m", which enables it in default kernel
builds.  Since this module just enables extra, non-critical
functionality for one particular piece of hardware (specifically,
differentiating multiple wireless keyboards and mice as separate input
devices rather than treating them as one device), and the hardware works
just fine with the default USB HID support, drop the "default m".

Signed-off-by: Josh Triplett 
---
 drivers/hid/Kconfig |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig
index fbf4950..d004528 100644
--- a/drivers/hid/Kconfig
+++ b/drivers/hid/Kconfig
@@ -307,7 +307,6 @@ config HID_LOGITECH
 config HID_LOGITECH_DJ
    tristate "Logitech Unifying receivers full support"
    depends on HID_LOGITECH
-   default m
    ---help---
    Say Y if you want support for Logitech Unifying receivers and devices.
    Unifying receivers are capable of pairing up to 6 Logitech compliant
-- 
1.7.10.4


[PATCH 0/5] x86: Improve defconfigs for use on current systems

2012-09-02 Thread Josh Triplett

After repeatedly going through the cycle of building a "make defconfig" kernel,
trying to boot it, getting a kernel panic, turning on ext4, and rebuilding, I
figured I'd actually get the defconfigs fixed to work on modern systems with
ext4 root filesystems.  Patch 2 of this patch series does exactly that.
Hopefully this will prove uncontroversial.

To avoid extraneous noise in patch 2, I first updated the defconfigs to match
the current results of "make defconfig && make savedefconfig", resulting in
patch 1.

Finally, while reviewing the defconfigs, I also ran into a few other random
things to clean up, resulting in patches 3-5.  Patch 3 disables some library
code that only exists to support out-of-tree modules (in-tree modules properly
depend on it); patch 4 disables initrd support (in favor of initramfs); patch 5
disables a special-purpose test module that represents the one and only module
built by default.  These seem reasonable to me, but if anyone finds one of
these three changes objectionable, please feel free to drop that change.  I
primarily care about getting patches 1-2 merged, to avoid a very common
annoyance.

Josh Triplett (5):
  x86: Update defconfigs to current results of "make savedefconfig"
  x86: Switch to ext4 in defconfigs
  x86: Disable CONFIG_CRC_T10DIF in defconfigs
  x86, defconfig: Turn off CONFIG_BLK_DEV_RAM
  x86: Turn off DEBUG_NX_TEST module in defconfigs

 arch/x86/configs/i386_defconfig   |   23 +++
 arch/x86/configs/x86_64_defconfig |   23 +++
 2 files changed, 14 insertions(+), 32 deletions(-)

-- 
1.7.10.4


[PATCH 5/5] x86: Turn off DEBUG_NX_TEST module in defconfigs

2012-09-02 Thread Josh Triplett

The x86 defconfigs include exactly one module: test_nx.ko, a
special-purpose module which just exists to do evil things like
executing code off the stack to see if the kernel has enabled NX
support.  Anyone who actually uses that module can easily enable it
themselves, but the vast majority of kernel builds don't need it;
disable it by default.

Signed-off-by: Josh Triplett 
---
 arch/x86/configs/i386_defconfig   |1 -
 arch/x86/configs/x86_64_defconfig |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index a6533a2..5598547 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -298,7 +298,6 @@ CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 # CONFIG_DEBUG_RODATA_TEST is not set
-CONFIG_DEBUG_NX_TEST=m
 CONFIG_DEBUG_BOOT_PARAMS=y
 CONFIG_OPTIMIZE_INLINING=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 18f3cc4..671524d 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -297,7 +297,6 @@ CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 # CONFIG_DEBUG_RODATA_TEST is not set
-CONFIG_DEBUG_NX_TEST=m
 CONFIG_DEBUG_BOOT_PARAMS=y
 CONFIG_OPTIMIZE_INLINING=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
-- 
1.7.10.4


[PATCH 1/5] x86: Update defconfigs to current results of "make savedefconfig"

2012-09-02 Thread Josh Triplett

The x86 defconfigs have become somewhat out of date compared to the
current result of "make savedefconfig".  Update them to the current output, as
a prelude to further defconfig changes, to avoid unrelated noise in
those further changes.

Signed-off-by: Josh Triplett 
---
 arch/x86/configs/i386_defconfig   |   12 
 arch/x86/configs/x86_64_defconfig |   12 
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index 119db67..1903408 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -8,6 +8,8 @@ CONFIG_TASK_DELAY_ACCT=y
 CONFIG_TASK_XACCT=y
 CONFIG_TASK_IO_ACCOUNTING=y
 CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
 CONFIG_LOG_BUF_SHIFT=18
 CONFIG_CGROUPS=y
 CONFIG_CGROUP_FREEZER=y
@@ -34,8 +36,6 @@ CONFIG_SGI_PARTITION=y
 CONFIG_SUN_PARTITION=y
 CONFIG_KARMA_PARTITION=y
 CONFIG_EFI_PARTITION=y
-CONFIG_NO_HZ=y
-CONFIG_HIGH_RES_TIMERS=y
 CONFIG_SMP=y
 CONFIG_X86_GENERIC=y
 CONFIG_HPET_TIMER=y
@@ -231,8 +231,6 @@ CONFIG_SND_HRTIMER=y
 CONFIG_SND_HDA_INTEL=y
 CONFIG_SND_HDA_HWDEP=y
 CONFIG_HIDRAW=y
-CONFIG_HID_PID=y
-CONFIG_USB_HIDDEV=y
 CONFIG_HID_GYRATION=y
 CONFIG_LOGITECH_FF=y
 CONFIG_HID_NTRIG=y
@@ -243,11 +241,11 @@ CONFIG_HID_SAMSUNG=y
 CONFIG_HID_SONY=y
 CONFIG_HID_SUNPLUS=y
 CONFIG_HID_TOPSEED=y
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
 CONFIG_USB=y
 CONFIG_USB_DEBUG=y
 CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
-CONFIG_USB_DEVICEFS=y
-# CONFIG_USB_DEVICE_CLASS is not set
 CONFIG_USB_MON=y
 CONFIG_USB_EHCI_HCD=y
 # CONFIG_USB_EHCI_TT_NEWSCHED is not set
@@ -280,7 +278,6 @@ CONFIG_PROC_KCORE=y
 CONFIG_TMPFS_POSIX_ACL=y
 CONFIG_HUGETLBFS=y
 CONFIG_NFS_FS=y
-CONFIG_NFS_V3=y
 CONFIG_NFS_V3_ACL=y
 CONFIG_NFS_V4=y
 CONFIG_ROOT_NFS=y
@@ -299,7 +296,6 @@ CONFIG_DEBUG_KERNEL=y
 CONFIG_SCHEDSTATS=y
 CONFIG_TIMER_STATS=y
 CONFIG_DEBUG_STACK_USAGE=y
-CONFIG_SYSCTL_SYSCALL_CHECK=y
 CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 76eb290..c2c0448 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -8,6 +8,8 @@ CONFIG_TASK_DELAY_ACCT=y
 CONFIG_TASK_XACCT=y
 CONFIG_TASK_IO_ACCOUNTING=y
 CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
 CONFIG_LOG_BUF_SHIFT=18
 CONFIG_CGROUPS=y
 CONFIG_CGROUP_FREEZER=y
@@ -34,8 +36,6 @@ CONFIG_SGI_PARTITION=y
 CONFIG_SUN_PARTITION=y
 CONFIG_KARMA_PARTITION=y
 CONFIG_EFI_PARTITION=y
-CONFIG_NO_HZ=y
-CONFIG_HIGH_RES_TIMERS=y
 CONFIG_SMP=y
 CONFIG_CALGARY_IOMMU=y
 CONFIG_NR_CPUS=64
@@ -227,8 +227,6 @@ CONFIG_SND_HRTIMER=y
 CONFIG_SND_HDA_INTEL=y
 CONFIG_SND_HDA_HWDEP=y
 CONFIG_HIDRAW=y
-CONFIG_HID_PID=y
-CONFIG_USB_HIDDEV=y
 CONFIG_HID_GYRATION=y
 CONFIG_LOGITECH_FF=y
 CONFIG_HID_NTRIG=y
@@ -239,11 +237,11 @@ CONFIG_HID_SAMSUNG=y
 CONFIG_HID_SONY=y
 CONFIG_HID_SUNPLUS=y
 CONFIG_HID_TOPSEED=y
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
 CONFIG_USB=y
 CONFIG_USB_DEBUG=y
 CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
-CONFIG_USB_DEVICEFS=y
-# CONFIG_USB_DEVICE_CLASS is not set
 CONFIG_USB_MON=y
 CONFIG_USB_EHCI_HCD=y
 # CONFIG_USB_EHCI_TT_NEWSCHED is not set
@@ -280,7 +278,6 @@ CONFIG_PROC_KCORE=y
 CONFIG_TMPFS_POSIX_ACL=y
 CONFIG_HUGETLBFS=y
 CONFIG_NFS_FS=y
-CONFIG_NFS_V3=y
 CONFIG_NFS_V3_ACL=y
 CONFIG_NFS_V4=y
 CONFIG_ROOT_NFS=y
@@ -298,7 +295,6 @@ CONFIG_DEBUG_KERNEL=y
 CONFIG_SCHEDSTATS=y
 CONFIG_TIMER_STATS=y
 CONFIG_DEBUG_STACK_USAGE=y
-CONFIG_SYSCTL_SYSCALL_CHECK=y
 CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
-- 
1.7.10.4


[PATCH 2/5] x86: Switch to ext4 in defconfigs

2012-09-02 Thread Josh Triplett

The current x86 and x86-64 defconfigs do not enable ext4, which most
current distributions default to.  Switch the defconfigs to ext4, so
they will boot on current systems without additional configuration.

Signed-off-by: Josh Triplett 
---
 arch/x86/configs/i386_defconfig   |7 +++
 arch/x86/configs/x86_64_defconfig |7 +++
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index 1903408..2701b8a 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -260,10 +260,9 @@ CONFIG_RTC_CLASS=y
 CONFIG_DMADEVICES=y
 CONFIG_EEEPC_LAPTOP=y
 CONFIG_EFI_VARS=y
-CONFIG_EXT3_FS=y
-# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
-CONFIG_EXT3_FS_POSIX_ACL=y
-CONFIG_EXT3_FS_SECURITY=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index c2c0448..c17614e 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -260,10 +260,9 @@ CONFIG_AMD_IOMMU_STATS=y
 CONFIG_INTEL_IOMMU=y
 # CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
 CONFIG_EFI_VARS=y
-CONFIG_EXT3_FS=y
-# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
-CONFIG_EXT3_FS_POSIX_ACL=y
-CONFIG_EXT3_FS_SECURITY=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
-- 
1.7.10.4


[PATCH 3/5] x86: Disable CONFIG_CRC_T10DIF in defconfigs

2012-09-02 Thread Josh Triplett

CONFIG_CRC_T10DIF explicitly states that it exists only for use by
out-of-tree modules; anything in-kernel that needs it selects it.  Thus,
compile it out by default.

Signed-off-by: Josh Triplett 
---
 arch/x86/configs/i386_defconfig   |1 -
 arch/x86/configs/x86_64_defconfig |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index 2701b8a..d833bb6 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -311,4 +311,3 @@ CONFIG_SECURITY_SELINUX_BOOTPARAM=y
 CONFIG_SECURITY_SELINUX_DISABLE=y
 CONFIG_CRYPTO_AES_586=y
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index c17614e..7ddcd99 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -309,4 +309,3 @@ CONFIG_SECURITY_SELINUX=y
 CONFIG_SECURITY_SELINUX_BOOTPARAM=y
 CONFIG_SECURITY_SELINUX_DISABLE=y
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
-- 
1.7.10.4


[PATCH 4/5] x86, defconfig: Turn off CONFIG_BLK_DEV_RAM

2012-09-02 Thread Josh Triplett

The vast majority of systems either use initramfs or mount a root
filesystem directly from the kernel.  Distros have defaulted to
initramfs for years.  Only highly specialized systems would use an
actual filesystem-image initrd at this point, and such systems don't
rely on defconfig anyway.  Drop initrd support (and specifically RAM
block device support) from the defconfigs.

Signed-off-by: Josh Triplett 
---
 arch/x86/configs/i386_defconfig   |2 --
 arch/x86/configs/x86_64_defconfig |2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index d833bb6..a6533a2 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -144,8 +144,6 @@ CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
 CONFIG_DEBUG_DEVRES=y
 CONFIG_CONNECTOR=y
 CONFIG_BLK_DEV_LOOP=y
-CONFIG_BLK_DEV_RAM=y
-CONFIG_BLK_DEV_RAM_SIZE=16384
 CONFIG_BLK_DEV_SD=y
 CONFIG_BLK_DEV_SR=y
 CONFIG_BLK_DEV_SR_VENDOR=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 7ddcd99..18f3cc4 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -144,8 +144,6 @@ CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
 CONFIG_DEBUG_DEVRES=y
 CONFIG_CONNECTOR=y
 CONFIG_BLK_DEV_LOOP=y
-CONFIG_BLK_DEV_RAM=y
-CONFIG_BLK_DEV_RAM_SIZE=16384
 CONFIG_BLK_DEV_SD=y
 CONFIG_BLK_DEV_SR=y
 CONFIG_BLK_DEV_SR_VENDOR=y
-- 
1.7.10.4


Re: [PATCH tip/core/rcu 13/23] rcu: Control grace-period duration from sysfs

2012-09-03 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:28AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Some uses of RCU benefit from shorter grace periods, while others benefit
> more from the greater efficiency provided by longer grace periods.
> Therefore, this commit allows the durations to be controlled from sysfs.
> There are two sysfs parameters, one named "jiffies_till_first_fqs" that
> specifies the delay in jiffies from the end of grace-period initialization
> until the first attempt to force quiescent states, and the other named
> "jiffies_till_next_fqs" that specifies the delay (again in jiffies)
> between subsequent attempts to force quiescent states.  They both default
> to three jiffies, which is compatible with the old hard-coded behavior.
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 

Signed-off-by: Josh Triplett 

>  Documentation/kernel-parameters.txt |   11 +++
>  kernel/rcutree.c|   25 ++---
>  2 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index ad7e2e5..55ada04 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2385,6 +2385,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>   rcutree.rcu_cpu_stall_timeout= [KNL,BOOT]
>   Set timeout for RCU CPU stall warning messages.
>  
> + rcutree.jiffies_till_first_fqs= [KNL,BOOT]
> + Set delay from grace-period initialization to
> + first attempt to force quiescent states.
> + Units are jiffies, minimum value is zero,
> + and maximum value is HZ.
> +
> + rcutree.jiffies_till_next_fqs= [KNL,BOOT]
> + Set delay between subsequent attempts to force
> + quiescent states.  Units are jiffies, minimum
> + value is one, and maximum value is HZ.
> +
>   rcutorture.fqs_duration= [KNL,BOOT]
>   Set duration of force_quiescent_state bursts.
>  
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ed1be62..1d33240 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -226,6 +226,12 @@ int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
>  module_param(rcu_cpu_stall_suppress, int, 0644);
>  module_param(rcu_cpu_stall_timeout, int, 0644);
>  
> +static ulong jiffies_till_first_fqs = RCU_JIFFIES_TILL_FORCE_QS;
> +static ulong jiffies_till_next_fqs = RCU_JIFFIES_TILL_FORCE_QS;
> +
> +module_param(jiffies_till_first_fqs, ulong, 0644);
> +module_param(jiffies_till_next_fqs, ulong, 0644);
> +
>  static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *));
>  static void force_quiescent_state(struct rcu_state *rsp);
>  static int rcu_pending(int cpu);
> @@ -1193,6 +1199,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  static int rcu_gp_kthread(void *arg)
>  {
>   int fqs_state;
> + unsigned long j;
>   int ret;
>   struct rcu_state *rsp = arg;
>   struct rcu_node *rnp = rcu_get_root(rsp);
> @@ -1213,14 +1220,18 @@ static int rcu_gp_kthread(void *arg)
>  
>   /* Handle quiescent-state forcing. */
>   fqs_state = RCU_SAVE_DYNTICK;
> + j = jiffies_till_first_fqs;
> + if (j > HZ) {
> + j = HZ;
> + jiffies_till_first_fqs = HZ;
> + }
>   for (;;) {
> - rsp->jiffies_force_qs = jiffies +
> - RCU_JIFFIES_TILL_FORCE_QS;
> + rsp->jiffies_force_qs = jiffies + j;
>   ret = wait_event_interruptible_timeout(rsp->gp_wq,
>   (rsp->gp_flags & RCU_GP_FLAG_FQS) ||
>   (!ACCESS_ONCE(rnp->qsmask) &&
>!rcu_preempt_blocked_readers_cgp(rnp)),
> - RCU_JIFFIES_TILL_FORCE_QS);
> + j);
>   /* If grace period done, leave loop. */
>   if (!ACCESS_ONCE(rnp->qsmask) &&
>   !rcu_preempt_blocked_readers_cgp(rnp))
> @@ -1234,6 +1245,14 @@ static int rcu_gp_kthread(void *arg)
>   cond_resched();
>   flush_signals(current);
>   }
> + j = jiffies_till_next_fqs;
> +
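
For readers following the truncated hunk above, here is a minimal userspace sketch of the clamping it performs. The first helper mirrors the quoted "if (j > HZ)" block, including the write-back to the parameter. The second helper is an assumption: the quoted patch is cut off right after "j = jiffies_till_next_fqs;", so its bounds are taken from the kernel-parameters.txt text (minimum one, maximum HZ) and the write-back is guessed by analogy. The function names and the "hz" argument are hypothetical, not kernel API.

```c
#include <assert.h>

/*
 * Userspace model only, not kernel code.  "hz" stands in for the
 * kernel's HZ; both function names are made up for illustration.
 */

/* Mirrors the quoted hunk: cap at hz and store the capped value back. */
unsigned long clamp_first_fqs(unsigned long *param, unsigned long hz)
{
	unsigned long j = *param;

	assert(hz > 0);
	if (j > hz) {
		j = hz;
		*param = hz;
	}
	return j;
}

/*
 * ASSUMPTION: the real code is not visible in the quoted text.  The
 * bounds (1..hz) come from the documentation hunk; the write-back is
 * guessed by analogy with the first parameter.
 */
unsigned long clamp_next_fqs(unsigned long *param, unsigned long hz)
{
	unsigned long j = *param;

	assert(hz > 0);
	if (j > hz) {
		j = hz;
		*param = hz;
	} else if (j == 0) {
		j = 1;
		*param = 1;
	}
	return j;
}
```

With hz = 1000, a first-FQS setting of 5000 would be capped to 1000 and a next-FQS setting of 0 would be raised to 1, while the default of three jiffies passes through unchanged.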

Re: [PATCH tip/core/rcu 13/23] rcu: Control grace-period duration from sysfs

2012-09-03 Thread Josh Triplett

On Mon, Sep 03, 2012 at 02:30:16AM -0700, Josh Triplett wrote:
> On Thu, Aug 30, 2012 at 11:18:28AM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" 
> > 
> > Some uses of RCU benefit from shorter grace periods, while others benefit
> > more from the greater efficiency provided by longer grace periods.
> > Therefore, this commit allows the durations to be controlled from sysfs.
> > There are two sysfs parameters, one named "jiffies_till_first_fqs" that
> > specifies the delay in jiffies from the end of grace-period initialization
> > until the first attempt to force quiescent states, and the other named
> > "jiffies_till_next_fqs" that specifies the delay (again in jiffies)
> > between subsequent attempts to force quiescent states.  They both default
> > to three jiffies, which is compatible with the old hard-coded behavior.
> > 
> > Signed-off-by: Paul E. McKenney 
> > Signed-off-by: Paul E. McKenney 
> 
> Signed-off-by: Josh Triplett 

Er, sorry, typo:
Reviewed-by: Josh Triplett 

> >  Documentation/kernel-parameters.txt |   11 +++
> >  kernel/rcutree.c|   25 ++---
> >  2 files changed, 33 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index ad7e2e5..55ada04 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2385,6 +2385,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> > rcutree.rcu_cpu_stall_timeout= [KNL,BOOT]
> > Set timeout for RCU CPU stall warning messages.
> >  
> > +   rcutree.jiffies_till_first_fqs= [KNL,BOOT]
> > +   Set delay from grace-period initialization to
> > +   first attempt to force quiescent states.
> > +   Units are jiffies, minimum value is zero,
> > +   and maximum value is HZ.
> > +
> > +   rcutree.jiffies_till_next_fqs= [KNL,BOOT]
> > +   Set delay between subsequent attempts to force
> > +   quiescent states.  Units are jiffies, minimum
> > +   value is one, and maximum value is HZ.
> > +
> > rcutorture.fqs_duration= [KNL,BOOT]
> > Set duration of force_quiescent_state bursts.
> >  
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index ed1be62..1d33240 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -226,6 +226,12 @@ int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
> >  module_param(rcu_cpu_stall_suppress, int, 0644);
> >  module_param(rcu_cpu_stall_timeout, int, 0644);
> >  
> > +static ulong jiffies_till_first_fqs = RCU_JIFFIES_TILL_FORCE_QS;
> > +static ulong jiffies_till_next_fqs = RCU_JIFFIES_TILL_FORCE_QS;
> > +
> > +module_param(jiffies_till_first_fqs, ulong, 0644);
> > +module_param(jiffies_till_next_fqs, ulong, 0644);
> > +
> >  static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *));
> >  static void force_quiescent_state(struct rcu_state *rsp);
> >  static int rcu_pending(int cpu);
> > @@ -1193,6 +1199,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> >  static int rcu_gp_kthread(void *arg)
> >  {
> > int fqs_state;
> > +   unsigned long j;
> > int ret;
> > struct rcu_state *rsp = arg;
> > struct rcu_node *rnp = rcu_get_root(rsp);
> > @@ -1213,14 +1220,18 @@ static int rcu_gp_kthread(void *arg)
> >  
> > /* Handle quiescent-state forcing. */
> > fqs_state = RCU_SAVE_DYNTICK;
> > +   j = jiffies_till_first_fqs;
> > +   if (j > HZ) {
> > +   j = HZ;
> > +   jiffies_till_first_fqs = HZ;
> > +   }
> > for (;;) {
> > -   rsp->jiffies_force_qs = jiffies +
> > -   RCU_JIFFIES_TILL_FORCE_QS;
> > +   rsp->jiffies_force_qs = jiffies + j;
> > ret = wait_event_interruptible_timeout(rsp->gp_wq,
> > (rsp->gp_flags & RCU_GP_FLAG_FQS) ||
> > (!ACCESS_ONCE(rnp->qsmask) &&
> >  !rcu_preempt_blocked_readers_cgp(rnp)),
> > -   RCU_JIFFIES_TILL_FORCE_QS);
> > +

Re: [PATCH tip/core/rcu 14/23] rcu: Remove now-unused rcu_state fields

2012-09-03 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:29AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Moving the RCU grace-period processing to a kthread and adjusting the
> tracing resulted in two of the rcu_state structure's fields being unused.
> This commit therefore removes them.
> 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.h |7 ---
>  1 files changed, 0 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index 2d4cc18..8f0293c 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -378,13 +378,6 @@ struct rcu_state {
>  
>   u8  fqs_state cacheline_internodealigned_in_smp;
>   /* Force QS state. */
> - u8  fqs_active; /* force_quiescent_state() */
> - /*  is running. */
> - u8  fqs_need_gp;/* A CPU was prevented from */
> - /*  starting a new grace */
> - /*  period because */
> - /*  force_quiescent_state() */
> - /*  was running. */
>   u8  boost;  /* Subject to priority boost. */
>   unsigned long gpnum;/* Current gp number. */
>   unsigned long completed;/* # of last completed gp. */
> -- 
> 1.7.8
> 

Re: [PATCH tip/core/rcu 15/23] rcu: Make rcutree module parameters visible in sysfs

2012-09-03 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:30AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> The module parameters blimit, qhimark, and qlowmark (and more
> recently, rcu_fanout_leaf) have permission masks of zero, so
> that their values are not visible from sysfs.  This is unnecessary
> and inconvenient to administrators who might like an easy way to
> see what these values are on a running system.  This commit therefore
> sets their permission masks to 0444, allowing them to be read but
> not written.
> 
> Reported-by: Rusty Russell 
> Reported-by: Josh Triplett 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

>  kernel/rcutree.c |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 1d33240..55f20fd 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -88,7 +88,7 @@ LIST_HEAD(rcu_struct_flavors);
>  
>  /* Increase (but not decrease) the CONFIG_RCU_FANOUT_LEAF at boot time. */
>  static int rcu_fanout_leaf = CONFIG_RCU_FANOUT_LEAF;
> -module_param(rcu_fanout_leaf, int, 0);
> +module_param(rcu_fanout_leaf, int, 0444);
>  int rcu_num_lvls __read_mostly = RCU_NUM_LVLS;
>  static int num_rcu_lvl[] = {  /* Number of rcu_nodes at specified level. */
>   NUM_RCU_LVL_0,
> @@ -216,9 +216,9 @@ static int blimit = 10;   /* Maximum callbacks per rcu_do_batch. */
>  static int qhimark = 1;  /* If this many pending, ignore blimit. */
>  static int qlowmark = 100;   /* Once only this many pending, use blimit. */
>  
> -module_param(blimit, int, 0);
> -module_param(qhimark, int, 0);
> -module_param(qlowmark, int, 0);
> +module_param(blimit, int, 0444);
> +module_param(qhimark, int, 0444);
> +module_param(qlowmark, int, 0444);
>  
>  int rcu_cpu_stall_suppress __read_mostly; /* 1 = suppress stall warnings. */
>  int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
> -- 
> 1.7.8
> 
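
As a side note on what the 0 -> 0444 change means in practice, here is a small userspace sketch. It relies on two things I believe to be true of module_param(): a permission mask of 0 means no sysfs file is created at all, and a nonzero mask yields a file under /sys/module/<module>/parameters/<name>. The "rcutree" prefix is taken from the rcutree.* names in kernel-parameters.txt. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only.  Returns 1 and fills buf with the sysfs
 * path when the parameter would be exposed, 0 when perm is 0 (no sysfs
 * file) or when buf is too small.
 */
int param_sysfs_path(const char *module, const char *param,
		     unsigned int perm, char *buf, size_t len)
{
	int n;

	assert(module && param && buf);
	if (perm == 0)
		return 0;	/* perm 0: nothing appears in sysfs */
	n = snprintf(buf, len, "/sys/module/%s/parameters/%s", module, param);
	return n > 0 && (size_t)n < len;
}

/* True when user, group and others may all read the file. */
int perm_world_readable(unsigned int perm)
{
	return (perm & 0444) == 0444;
}

/* True when anyone at all may write the file. */
int perm_writable(unsigned int perm)
{
	return (perm & 0222) != 0;
}
```

Under those assumptions, blimit with mask 0444 would show up as /sys/module/rcutree/parameters/blimit, readable by everyone and writable by no one, whereas the 0644 used for the stall-warning parameters also lets root write.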

Re: [PATCH tip/core/rcu 16/23] rcu: Prevent initialization-time quiescent-state race

2012-09-03 Thread Josh Triplett

On Thu, Aug 30, 2012 at 11:18:31AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Now that the grace-period initialization procedure is preemptible, it is
> subject to the following race on systems whose rcu_node tree contains
> more than one node:
> 
> 1.CPU 31 starts initializing the grace period, including the
>   first leaf rcu_node structures, and is then preempted.
> 
> 2.CPU 0 refers to the first leaf rcu_node structure, and notes
>   that a new grace period has started.  It passes through a
>   quiescent state shortly thereafter, and informs the RCU core
>   of this rite of passage.
> 
> 3.CPU 0 enters an RCU read-side critical section, acquiring
>   a pointer to an RCU-protected data item.
> 
> 4.CPU 31 removes the data item referenced by CPU 0 from the
>   data structure, and registers an RCU callback in order to
>   free it.
> 
> 5.CPU 31 resumes initializing the grace period, including its
>   own rcu_node structure.  It invokes rcu_start_gp_per_cpu(),
>   which advances all callbacks, including the one registered
>   in #4 above, to be handled by the current grace period.
> 
> 6.The remaining CPUs pass through quiescent states and inform
>   the RCU core, but CPU 0 remains in its RCU read-side critical
>   section, still referencing the now-removed data item.
> 
> 7.The grace period completes and all the callbacks are invoked,
>   including the one that frees the data item that CPU 0 is still
>   referencing.  Oops!!!
> 
> This commit therefore moves the callback handling to precede initialization
> of any of the rcu_node structures, thus avoiding this race.

I don't think it makes sense to introduce and subsequently fix a race in
the same patch series. :)

Could you squash this patch into the one moving grace-period
initialization into a kthread?

- Josh Triplett

> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcutree.c |   33 +++--
>  1 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 55f20fd..d435009 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1028,20 +1028,6 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>   /* Prior grace period ended, so advance callbacks for current CPU. */
>   __rcu_process_gp_end(rsp, rnp, rdp);
>  
> - /*
> -  * Because this CPU just now started the new grace period, we know
> -  * that all of its callbacks will be covered by this upcoming grace
> -  * period, even the ones that were registered arbitrarily recently.
> -  * Therefore, advance all outstanding callbacks to RCU_WAIT_TAIL.
> -  *
> -  * Other CPUs cannot be sure exactly when the grace period started.
> -  * Therefore, their recently registered callbacks must pass through
> -  * an additional RCU_NEXT_READY stage, so that they will be handled
> -  * by the next RCU grace period.
> -  */
> - rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
> - rdp->nxttail[RCU_WAIT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
> -
>   /* Set state so that this CPU will detect the next quiescent state. */
>   __note_new_gpnum(rsp, rnp, rdp);
>  }
> @@ -1068,6 +1054,25 @@ static int rcu_gp_init(struct rcu_state *rsp)
>   rsp->gpnum++;
>   trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
>   record_gp_stall_check_time(rsp);
> +
> + /*
> +  * Because this CPU just now started the new grace period, we
> +  * know that all of its callbacks will be covered by this upcoming
> +  * grace period, even the ones that were registered arbitrarily
> +  * recently.Therefore, advance all RCU_NEXT_TAIL callbacks
> +  * to RCU_NEXT_READY_TAIL.  When the CPU later recognizes the
> +  * start of the new grace period, it will advance all callbacks
> +  * one position, which will cause all of its current outstanding
> +  * callbacks to be handled by the newly started grace period.
> +  *
> +  * Other CPUs cannot be sure exactly when the grace period started.
> +  * Therefore, their recently registered callbacks must pass through
> +  * an additional RCU_NEXT_READY stage, so that they will be handled
> +  * by the next RCU grace period.
> +  */
> + rdp = __this_cpu_ptr(rsp->rda);
> + rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
> +
>   raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  
>   /* Exclude any concurrent CPU-hotplug operations. */
> -- 
> 1.7.8
> 
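
To make the race in steps 1-7 and the fix in the quoted patch easier to follow, here is a deliberately simplified userspace model of the segmented callback list. It is a sketch, not the kernel's data structure: the real nxttail[] holds pointers into a linked list, while this model stores counts, and every name below is made up for illustration. The "advance one position" helper follows the wording of the comment the patch adds to rcu_gp_init().

```c
#include <assert.h>

/* Segment boundaries, in the same order as the kernel's RCU_*_TAIL. */
enum { DONE_TAIL, WAIT_TAIL, NEXT_READY_TAIL, NEXT_TAIL, NTAILS };

/* tail[i] = number of callbacks in segments 0..i (model, not kernel). */
struct cb_list {
	int tail[NTAILS];
};

/* Register one callback; it always lands in the NEXT segment. */
void cb_enqueue(struct cb_list *l)
{
	l->tail[NEXT_TAIL]++;
}

/* Old code in rcu_start_gp_per_cpu(): everything waits on this GP. */
void old_advance_all_to_wait(struct cb_list *l)
{
	l->tail[NEXT_READY_TAIL] = l->tail[NEXT_TAIL];
	l->tail[WAIT_TAIL] = l->tail[NEXT_TAIL];
}

/* New code at the start of rcu_gp_init(): NEXT -> NEXT_READY only. */
void new_mark_at_gp_start(struct cb_list *l)
{
	l->tail[NEXT_READY_TAIL] = l->tail[NEXT_TAIL];
}

/* Later, when the CPU notices the new GP: shift every segment by one. */
void advance_one_position(struct cb_list *l)
{
	l->tail[DONE_TAIL] = l->tail[WAIT_TAIL];
	l->tail[WAIT_TAIL] = l->tail[NEXT_READY_TAIL];
	l->tail[NEXT_READY_TAIL] = l->tail[NEXT_TAIL];
}

/* Is callback number idx (0-based) handled by the current GP? */
int waits_on_current_gp(const struct cb_list *l, int idx)
{
	assert(idx >= 0 && idx < l->tail[NEXT_TAIL]);
	return idx >= l->tail[DONE_TAIL] && idx < l->tail[WAIT_TAIL];
}
```

In this model, a callback registered while the initializing CPU is preempted (step 4) ends up in the WAIT segment if old_advance_all_to_wait() runs afterwards, which is the bug. With new_mark_at_gp_start() run before the preemption window and advance_one_position() run after it, the late callback sits in NEXT_READY and so waits for the following grace period.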
