Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: > > Ok, thanks for the update. I agree that we should be clearing the mapping > at node hot-remove since any cpu that would subsequently get onlined and > assume one of the previous cpu's ids is not guaranteed to have the same > affinity

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes
On Thu, 18 Oct 2012, Tang Chen wrote: > We are working on this problem. Since it is complicated, it really > takes us some time. Sorry for the delay. :) > > Actually, we intend to clear cpu-to-node mappings when a whole node is > removed. But the node hot-plug code is still under development, so

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread Tang Chen
On 10/18/2012 08:52 AM, David Rientjes wrote: On Wed, 10 Oct 2012, David Rientjes wrote: Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative patc

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes
On Wed, 10 Oct 2012, David Rientjes wrote: > It fixes a BUG() that only affects users who are doing node hot-remove, > which is still radically under development, and nobody cares about except > those on the cc list, but it also introduces the NULL pointer dereference > that is attempting to be

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes
On Wed, 10 Oct 2012, Andrew Morton wrote: > > > So for now, let me NACK that patch. You cannot go change stuff like > > > that. > > > > > > > Agreed, that makes the nack-count up to 2 now. Andrew, please remove > > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch > > cpu_hotplug-unm

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Andrew Morton
On Wed, 10 Oct 2012 13:30:29 -0700 (PDT) David Rientjes wrote: > > So for now, let me NACK that patch. You cannot go change stuff like > > that. > > > > Agreed, that makes the nack-count up to 2 now. Andrew, please remove > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch > cpu_hot

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes
On Wed, 10 Oct 2012, Peter Zijlstra wrote: > > If cpu_to_node() always returns a valid node id even if all cpus on the > > node are offline, then the cpumask_of_node() implementation, which the > > sched code is using, should either return an empty cpumask (if > > node_to_cpumask_map[nid] isn't

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: > I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc > you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the interfa

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang
At 10/10/2012 05:51 PM, Peter Zijlstra Wrote: > On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: >> >> Hmm, if per-cpu memory is preserved, and we can't offline and remove >> this memory. So we can't offline the node. >> >> But, if the node is hot added, and per-cpu memory doesn't use the >>

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: > > Hmm, if per-cpu memory is preserved, and we can't offline and remove > this memory. So we can't offline the node. > > But, if the node is hot added, and per-cpu memory doesn't use the > memory on this node. We can hotremove cpu/memory on

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang
At 10/10/2012 05:10 PM, Peter Zijlstra Wrote: > On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: >> On Tue, 9 Oct 2012, Peter Zijlstra wrote: >> >>> Well the code they were patching is in the wakeup path. As I think Tang >>> said, we leave !runnable tasks on whatever cpu they ran on last, e

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > Well the code they were patching is in the wakeup path. As I think Tang > > said, we leave !runnable tasks on whatever cpu they ran on last, even if > > that cpu is offlined, we try and fix u

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/10/2012 10:06 AM, Wen Congyang Wrote: > At 10/10/2012 07:27 AM, David Rientjes Wrote: >> On Tue, 9 Oct 2012, Peter Zijlstra wrote: >> >>> Well the code they were patching is in the wakeup path. As I think Tang >>> said, we leave !runnable tasks on whatever cpu they ran on last, even if >>> th

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/10/2012 07:27 AM, David Rientjes Wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > >> Well the code they were patching is in the wakeup path. As I think Tang >> said, we leave !runnable tasks on whatever cpu they ran on last, even if >> that cpu is offlined, we try and fix up state when w

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Peter Zijlstra wrote: > Well the code they were patching is in the wakeup path. As I think Tang > said, we leave !runnable tasks on whatever cpu they ran on last, even if > that cpu is offlined, we try and fix up state when we get a wakeup. > > On wakeup, it tries to find a cp

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > > return -1. As a result, cpumask_of_node(nid) will ret

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Peter Zijlstra wrote: > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > > find_next_bit() in for_each_cpu will ge

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Wen Congyang wrote: > I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, > it will be offlined before clearing cpu-to-node mapping. > > Here is the code in driver/acpi/processor_driver.c: > = > static int acpi_processor_handle_eject(stru

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is n

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/09/2012 06:04 PM, David Rientjes Wrote: > On Tue, 9 Oct 2012, Tang Chen wrote: > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. >> >> As Wen said below, nid is now set to -1 w

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Tang Chen wrote: > > > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this > > > should be called at CPU_DYING level and migrate_tasks() still sees a valid > > > cpu. > > As Wen said below, nid is now set to -1 when cpu is hotremoved. > I reproduce this proble

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Tang Chen
Hi David, Thanks for reviewing this patch. :) On 10/09/2012 04:34 PM, Wen Congyang wrote: At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { NUMA_NO_NODE. Yes, NUMA_

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/09/2012 02:21 PM, David Rientjes Wrote: > On Mon, 8 Oct 2012, Tang Chen wrote: > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 66b36ab..e76dce9 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); >>

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-08 Thread David Rientjes
On Mon, 8 Oct 2012, Tang Chen wrote: > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 66b36ab..e76dce9 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); > */ > static int select_fallback_rq(int cpu, struct t

[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-07 Thread Tang Chen
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: [ 609.824017] Call Trace: [ 609.824017] [ 609.