* Robin Holt wrote:
> I have the patches sort-of finished. The patch set starts by
> moving the halt/shutdown/reboot functions over to a new
> kernel/reboot.c, next patch gets a checkpatch.pl cleanup to
> work, third patch is essentially the below patch against the
> new file, and the fourth pa
On 04/15, Robin Holt wrote:
>
> On Sat, Apr 13, 2013 at 06:30:22PM +0200, Oleg Nesterov wrote:
> > On 04/12, Robin Holt wrote:
> > >
> > > +void migrate_to_boot_cpu(void)
> > > +{
> > > + /* The boot cpu is always logical cpu 0 */
> > > + int reboot_cpu_id = 0;
> > > +
> > > + /* Make certain the c
On Mon, Apr 15, 2013 at 11:04:08AM -0500, Robin Holt wrote:
> On Sat, Apr 13, 2013 at 06:30:22PM +0200, Oleg Nesterov wrote:
> > On 04/12, Robin Holt wrote:
> > >
> > > +void migrate_to_boot_cpu(void)
> > > +{
> > > + /* The boot cpu is always logical cpu 0 */
> > > + int reboot_cpu_id = 0;
> > > +
On Sat, Apr 13, 2013 at 06:30:22PM +0200, Oleg Nesterov wrote:
> On 04/12, Robin Holt wrote:
> >
> > +void migrate_to_boot_cpu(void)
> > +{
> > + /* The boot cpu is always logical cpu 0 */
> > + int reboot_cpu_id = 0;
> > +
> > + /* Make certain the cpu I'm about to reboot on is online */
> >
I have the patches sort-of finished. The patch set starts by
moving the halt/shutdown/reboot functions over to a new
kernel/reboot.c, next patch gets a checkpatch.pl cleanup to
work, third patch is essentially the below patch against the
new file, and the fourth patch introduces a kernel boot para
On Mon, Apr 15, 2013 at 12:16:44PM +0200, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
> > From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
> > From: Robin Holt
> > Date: Wed, 3 Apr 2013 13:52:00 -0500
> > Subject: [PATCH] Migrate shutdown/reboot to boot cpu.
> >
> > We r
* Robin Holt wrote:
> From 1767003c943325e52ac78cac6fdbaf2ab63d Mon Sep 17 00:00:00 2001
> From: Robin Holt
> Date: Wed, 3 Apr 2013 13:52:00 -0500
> Subject: [PATCH] Migrate shutdown/reboot to boot cpu.
>
> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> minutes of
On 04/12/2013 03:01 PM, Robin Holt wrote:
> kernel/sys.c | 17 +++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 0da73cf..4d1047d 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -357,6 +357,19 @@ int unregister_reboot_n
On 04/12, Robin Holt wrote:
>
> +void migrate_to_boot_cpu(void)
> +{
> + /* The boot cpu is always logical cpu 0 */
> + int reboot_cpu_id = 0;
> +
> + /* Make certain the cpu I'm about to reboot on is online */
> + if (!cpu_online(reboot_cpu_id))
> + reboot_cpu_id = smp_
Meant to send this to Shawn. Too early in the morning.
Robin
On Fri, Apr 12, 2013 at 04:31:49AM -0500, Robin Holt wrote:
> On Fri, Apr 12, 2013 at 11:39:51AM +0530, Srivatsa S. Bhat wrote:
> > On 04/12/2013 11:07 AM, Ingo Molnar wrote:
> > >
> > > * Robin Holt wrote:
> > >
> > >> For the v3.9
On Fri, Apr 12, 2013 at 11:39:51AM +0530, Srivatsa S. Bhat wrote:
> On 04/12/2013 11:07 AM, Ingo Molnar wrote:
> >
> > * Robin Holt wrote:
> >
> >> For the v3.9 release, can we consider my awful patch?
> >
> > How about trying what I suggested, to make reboot affine to the boot CPU
> > explici
On 04/12/2013 11:07 AM, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
>> For the v3.9 release, can we consider my awful patch?
>
> How about trying what I suggested, to make reboot affine to the boot CPU
> explicitly, not by shutting down all the other CPUs, but by
> set_cpus_allowed() or
> s
* Robin Holt wrote:
> For the v3.9 release, can we consider my awful patch?
How about trying what I suggested, to make reboot affine to the boot CPU
explicitly, not by shutting down all the other CPUs, but by set_cpus_allowed()
or
so?
That should solve the regression, without the ugly speci
On Thu, Apr 11, 2013 at 03:08:20PM -0500, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
> > On 04/11/2013 07:53 PM, Russ Anderson wrote:
> > > On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> > >>
> > >> One more thing we have to note is
On 04/12/2013 01:38 AM, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
>> On 04/11/2013 07:53 PM, Russ Anderson wrote:
>>> On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
One more thing we have to note is that, there are 4 notifi
On Thu, Apr 11, 2013 at 08:15:27PM +0530, Srivatsa S. Bhat wrote:
> On 04/11/2013 07:53 PM, Russ Anderson wrote:
> > On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> >>
> >> One more thing we have to note is that, there are 4 notifiers for taking a
> >> CPU offline:
> >>
> >> CPU
On 04/11/2013 07:53 PM, Russ Anderson wrote:
> On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
>> On 04/11/2013 11:01 AM, Paul Mackerras wrote:
>>> On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
The optimal solution would be to just speed up the
disable_
On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> On 04/11/2013 11:01 AM, Paul Mackerras wrote:
> > On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> >> The optimal solution would be to just speed up the
> >> disable_nonboot_cpus() code so much that it isn't an iss
For the v3.9 release, can we consider my awful patch?
Thanks,
Robin
On Thu, Apr 11, 2013 at 06:15:18PM +0530, Srivatsa S. Bhat wrote:
> On 04/11/2013 11:01 AM, Paul Mackerras wrote:
> > On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> >> The optimal solution would be to just spee
On 04/11/2013 11:01 AM, Paul Mackerras wrote:
> On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
>> The optimal solution would be to just speed up the
>> disable_nonboot_cpus() code so much that it isn't an issue. That would
>> be good for suspending too, although I guess suspend isn
* Robin Holt wrote:
> On Thu, Apr 11, 2013 at 07:03:58AM -0500, Robin Holt wrote:
> > On Thu, Apr 11, 2013 at 02:00:27PM +0200, Ingo Molnar wrote:
> > >
> > > * Robin Holt wrote:
> > >
> > > > > Ok, so it looks profilable.
> > > > >
> > > > > The result above is not surprising: most CPUs sit
On Thu, Apr 11, 2013 at 07:03:58AM -0500, Robin Holt wrote:
> On Thu, Apr 11, 2013 at 02:00:27PM +0200, Ingo Molnar wrote:
> >
> > * Robin Holt wrote:
> >
> > > > Ok, so it looks profilable.
> > > >
> > > > The result above is not surprising: most CPUs sit in idle and don't do
> > > > anything
On Thu, Apr 11, 2013 at 02:00:27PM +0200, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
> > > Ok, so it looks profilable.
> > >
> > > The result above is not surprising: most CPUs sit in idle and don't do
> > > anything,
> > > while the loop goes on, right?
> > >
> > > The interesting thing t
* Robin Holt wrote:
> > Ok, so it looks profilable.
> >
> > The result above is not surprising: most CPUs sit in idle and don't do
> > anything,
> > while the loop goes on, right?
> >
> > The interesting thing to profile would be the parallel bring-down, with the
> > simplest global lock so
> Ok, so it looks profilable.
>
> The result above is not surprising: most CPUs sit in idle and don't do
> anything,
> while the loop goes on, right?
>
> The interesting thing to profile would be the parallel bring-down, with the
> simplest global lock solution you mentioned. In that case most
* Robin Holt wrote:
> I had the machine booted as 512 cpus.
> I tweaked the kernel like this:
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 39c9c4a..b42bd4f 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -368,8 +368,10 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
> */
> void k
On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> The optimal solution would be to just speed up the
> disable_nonboot_cpus() code so much that it isn't an issue. That would
> be good for suspending too, although I guess suspend isn't a big issue
> if you have a thousand CPU's.
>
>
On Wed, Apr 10, 2013 at 06:59:34PM +0200, Ingo Molnar wrote:
> * Russ Anderson wrote:
>
> > Yes, I have a test patch that replaces for_each_online_cpu(cpu) with a cpu
> > bitmask in disable_nonboot_cpus(). The lower level routines already take a
> > bitmask. It allows __stop_machine() to be c
On Wed, Apr 10, 2013 at 10:29:12AM -0500, Russ Anderson wrote:
> On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> >
> > Yeah, we've had issues with ACPI in the past, so I do think we should
> > always reboot using the BP. Even if it almost certainly works on 99+%
> > of all machin
> > I'm proposing to make 'reboot' overhead profilable, via a debug hack:
> >
> > echo 1 > /proc/sys/kernel/magic_dont_fully_reboot_flag
> >
> > perf record reboot
> >
> > perf is using NMIs to profile - and since much of cpu_down() is with irqs
> > disabled, NMI profiling would be
On 04/10/2013 09:59 AM, Ingo Molnar wrote:
>
> 4 minutes bootup is 240 seconds, with 1024 CPUs that's about 240 msecs per
> CPU.
>
> That sounds a lot, given that unlike bootup there's not much real work to be
> done
> during shutdown - we don't initialize anything, etc.
>
> Maybe much of tho
On Wed, Apr 10, 2013 at 07:22:36PM +0200, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
> > On Wed, Apr 10, 2013 at 06:59:34PM +0200, Ingo Molnar wrote:
> > >
> > > * Russ Anderson wrote:
> > >
> > > > Yes, I have a test patch that replaces for_each_online_cpu(cpu) with a
> > > > cpu
> > > >
* Robin Holt wrote:
> On Wed, Apr 10, 2013 at 06:59:34PM +0200, Ingo Molnar wrote:
> >
> > * Russ Anderson wrote:
> >
> > > Yes, I have a test patch that replaces for_each_online_cpu(cpu) with a
> > > cpu
> > > bitmask in disable_nonboot_cpus(). The lower level routines already take
> > >
On Wed, Apr 10, 2013 at 06:59:34PM +0200, Ingo Molnar wrote:
>
> * Russ Anderson wrote:
>
> > Yes, I have a test patch that replaces for_each_online_cpu(cpu) with a cpu
> > bitmask in disable_nonboot_cpus(). The lower level routines already take a
> > bitmask. It allows __stop_machine() to b
* Russ Anderson wrote:
> Yes, I have a test patch that replaces for_each_online_cpu(cpu) with a cpu
> bitmask in disable_nonboot_cpus(). The lower level routines already take a
> bitmask. It allows __stop_machine() to be called just once. That change
> reduces shutdown time on a 1024 cpu m
On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> On Wed, Apr 10, 2013 at 4:16 AM, Ingo Molnar wrote:
> >
> > I think rebooting on the same CPU where we booted up is something worth
> > having in
> > general, as a firmware robustness feature. (assuming the CPU in question is
> >
On Wed, Apr 10, 2013 at 4:16 AM, Ingo Molnar wrote:
>
> I think rebooting on the same CPU where we booted up is something worth
> having in
> general, as a firmware robustness feature. (assuming the CPU in question is
> still
> online)
Yeah, we've had issues with ACPI in the past, so I do think
On Wed, Apr 10, 2013 at 01:16:20PM +0200, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
> > On Mon, Apr 08, 2013 at 09:11:06AM -0700, H. Peter Anvin wrote:
> > > On 04/08/2013 08:57 AM, Ingo Molnar wrote:
> > > >
> > > > I think the original commit:
> > > >
> > > > f96972f2dc63 kernel/sys.c:
* Robin Holt wrote:
> On Mon, Apr 08, 2013 at 09:11:06AM -0700, H. Peter Anvin wrote:
> > On 04/08/2013 08:57 AM, Ingo Molnar wrote:
> > >
> > > I think the original commit:
> > >
> > > f96972f2dc63 kernel/sys.c: call disable_nonboot_cpus() in
> > > kernel_restart()
> > >
> > > actually re
On Mon, Apr 08, 2013 at 05:57:01PM +0200, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
> > We noticed that recently, reboot of a 1024 cpu machine takes approx 16
> > minutes of just stopping the cpus. The slowdown was tracked to commit
> > f96972f which went into v3.7 and then to the stable tre
On Mon, Apr 08, 2013 at 09:11:06AM -0700, H. Peter Anvin wrote:
> On 04/08/2013 08:57 AM, Ingo Molnar wrote:
> >
> > I think the original commit:
> >
> > f96972f2dc63 kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()
> >
> > actually regressed your 1024 CPU systems, and should poss
On Mon, Apr 08, 2013 at 05:57:01PM +0200, Ingo Molnar wrote:
>
> * Robin Holt wrote:
>
> > We noticed that recently, reboot of a 1024 cpu machine takes approx 16
> > minutes of just stopping the cpus. The slowdown was tracked to commit
> > f96972f which went into v3.7 and then to the stable tre
On 04/08/2013 08:57 AM, Ingo Molnar wrote:
>
> I think the original commit:
>
> f96972f2dc63 kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()
>
> actually regressed your 1024 CPU systems, and should possibly be reverted or
> fixed
> in some other fashion - such as by migrating t
* Robin Holt wrote:
> We noticed that recently, reboot of a 1024 cpu machine takes approx 16
> minutes of just stopping the cpus. The slowdown was tracked to commit
> f96972f which went into v3.7 and then to the stable trees.
>
> x86 does not need to be running the boot cpu to pull reset and I
44 matches
Mail list logo