On 11/26/17 01:46, Andrew Turner wrote:
On 25 Nov 2017, at 23:41, Nathan Whitehorn <nwhiteh...@freebsd.org> wrote:
Author: nwhitehorn
Date: Sat Nov 25 23:41:05 2017
New Revision: 326218
URL: https://svnweb.freebsd.org/changeset/base/326218

Log:
  Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs
  are numbered densely from there to n_cpus.

  MFC after:    1 month

Modified:
  head/sys/kern/kern_clock.c
  head/sys/kern/kern_clocksource.c
  head/sys/kern/kern_shutdown.c
  head/sys/kern/kern_timeout.c
  head/sys/kern/sched_ule.c
  head/sys/kern/subr_pcpu.c

...
Modified: head/sys/kern/subr_pcpu.c
==============================================================================
--- head/sys/kern/subr_pcpu.c   Sat Nov 25 23:23:24 2017        (r326217)
+++ head/sys/kern/subr_pcpu.c   Sat Nov 25 23:41:05 2017        (r326218)
@@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu)
struct pcpu *
pcpu_find(u_int cpuid)
{
+       KASSERT(cpuid_to_pcpu[cpuid] != NULL,
+           ("Getting uninitialized PCPU %d", cpuid));

        return (cpuid_to_pcpu[cpuid]);
}
This breaks on one arm64 simulator I have where the device tree lists 8 cpus, 
but only 2 are enabled in the simulation. The ofw_cpu device nodes for these 
are cpu0 and cpu4 so the call to cpu_find in ofw_cpu_attach will hit this 
KASSERT when the CPU has not been enabled.

Andrew

cpu0: <Open Firmware CPU> on cpulist0
cpu1: <Open Firmware CPU> on cpulist0
panic: Getting uninitialized PCPU 1
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffff0000005f03e8  lr = 0xffff000000087048
         sp = 0xffff0000000105a0  fp = 0xffff0000000107b0

That's unfortunate. Removing the new KASSERT is probably not the right option, since all it is doing is indicating a pre-existing bug. Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU 4, which I suppose it was previously happily doing (it would only get dereferenced if you had cpufreq support, hence no previous panic).

A super quick-and-dirty fix would be to be fail attach on CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy behavior but without the panic. If you do not actually use ofw_cpu for anything, we could also just disable it temporarily on your platform (it's off for some PPC systems, you will note, for similar reasons).

A real fix is somewhat complex, since the driver relies on being able to get a mapping from platform-specific numbers in the device tree to an entry in pcpu, which intrinsically relies on reverse-engineering the platform's mapping between some kind of hardware CPU ID and the kernel CPU numbering. I can't think of a way to do that internally. We could make some kind of platform macro, but the whole concept is even somewhat dubious at the moment since a number of IBM systems with SMT don't even have a 1:1 relationship between CPUs as FreeBSD defines them and device tree nodes, so it's possible we need a serious rearchitecture of the driver.

Please let me know how I can help get this fixed.
-Nathan

_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to