On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
> 
> >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> >>    Looking at the dmesg, the early boot messages seem to be
> >> confused as to how many CPUs there are, e.g.,
> >> 
> >> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] Hierarchical RCU implementation.
> >> [    0.000000]  RCU debugfs-based tracing is enabled.
> >> [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
> >> [    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> >> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> >> [    0.000000] NR_IRQS:16640 nr_irqs:456 0
> >> [    0.000000]  Offload RCU callbacks from all CPUs
> >> [    0.000000]  Offload RCU callbacks from CPUs: 0-3.
> >> 
> >>    but later shows 2:
> >> 
> >> [    0.233703] x86: Booting SMP configuration:
> >> [    0.236003] .... node  #0, CPUs:      #1
> >> [    0.255528] x86: Booted up 1 node, 2 CPUs
> >> 
> >>    In any event, the E8400 is a 2 core CPU with no hyperthreading.
> >
> >Well, this might explain some of the difficulties.  If RCU decides to wait
> >on CPUs that don't exist, we will of course get a hang.  And rcu_barrier()
> >was definitely expecting four CPUs.
> >
> >So what happens if you boot with maxcpus=2?  (Or build with
> >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang.  If so,
> >I might have some ideas for a real fix.
> 
>       Booting with maxcpus=2 makes no difference (the dmesg output is
> the same).
> 
>       Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
> dmesg has different CPU information at boot:
> 
> [    0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
> [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
>  [...]
> [    0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
>  [...]
> [    0.000000] Hierarchical RCU implementation.
> [    0.000000]        RCU debugfs-based tracing is enabled.
> [    0.000000]        RCU dyntick-idle grace-period acceleration is enabled.
> [    0.000000] NR_IRQS:4352 nr_irqs:440 0
> [    0.000000]        Offload RCU callbacks from all CPUs
> [    0.000000]        Offload RCU callbacks from CPUs: 0-1.

Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
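
The reasoning quoted above (a barrier that waits on CPUs that never came
up cannot complete) can be illustrated with a toy model. This is only a
sketch and not the kernel's rcu_barrier() code: the function name
barrier_completes and the set-based bookkeeping are made up for
illustration, and the CPU numbers are taken from the two dmesg excerpts.

```python
def barrier_completes(expected_cpus, responding_cpus):
    """Toy model: the barrier finishes only when every CPU it waits on responds.

    Hypothetical helper, not a kernel API.
    Returns (done, pending_cpus).
    """
    pending = set(expected_cpus) - set(responding_cpus)
    return len(pending) == 0, sorted(pending)


# First dmesg: nr_cpu_ids=4, but only CPUs 0 and 1 were booted.
print(barrier_completes(range(4), [0, 1]))

# CONFIG_NR_CPUS=2 build: callbacks are offloaded from CPUs 0-1 only.
print(barrier_completes(range(2), [0, 1]))
```

In the first case CPUs 2 and 3 stay pending forever, which is the hang;
in the second nothing is left pending.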

                                                        Thanx, Paul
