On Thu, Jul 20, 2017 at 01:46:33AM +0200, Mark Martinec wrote:
> More news on the matter. As reported yesterday the locally built
> kernel with options INVARIANTS and DDB works fine and somehow avoids
> the trouble at attaching the da (mps) disks on an LSI controller, so
> today I wanted to get back to a reproducible hang - and sure enough,
> reverting to the generic kernel as distributed brings back the hang.
> 
> So I tried rebuilding the kernel while experimenting with options
> like DDB and INVARIANTS.
> 
> A locally built GENERIC kernel behaves the same as the original
> kernel from the distribution (as installed by freebsd-update),
> so no surprises there. It hangs trying to attach the first of the
> da disks (after first successfully attaching all the ada disks).
> Ctrl+Alt+Esc is unable to enter the debugger when the hang occurs
> (possibly due to an unresponsive USB keyboard at that time),
> even though debug.kdb.break_to_debugger was set to 1 at the
> loader prompt. The machine needs loader "Safe mode" to be able to boot.
> 
> Next, a locally built kernel with DDB and INVARIANTS works well
> (the remaining options come from an included GENERIC).
> 
> Now the funny part: a locally built kernel with just the DDB
> option (and the rest included from GENERIC) *also* works well.
> Somehow the DDB option makes a difference, even though the kernel
> debugger is never activated.

One thing to try at this point would be to disable EARLY_AP_STARTUP in
the kernel config. That is, take a configuration with which you're able
to reproduce the hang during boot, and remove "options
EARLY_AP_STARTUP".
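
As a minimal sketch of that change, assuming the custom kernel config
includes GENERIC as yours do, the option can be switched off with a
"nooptions" line rather than by editing GENERIC itself. The file name
and ident NOEARLYAP are placeholders I made up, and amd64 is assumed:

```
# /usr/src/sys/amd64/conf/NOEARLYAP  (hypothetical name; amd64 assumed)
include     GENERIC
ident       NOEARLYAP
nooptions   EARLY_AP_STARTUP    # drop the option inherited from GENERIC
```

Building and installing with KERNCONF=NOEARLYAP should then show whether
the hang still reproduces.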

This feature has a fairly large impact on the bootup process and has
had a few problems that manifested as hangs during boot. There was at
least one other case where an innocuous change to the kernel
configuration "fixed" the problem by introducing some second-order
effect (causing kernel threads to be scheduled in a different
order, for instance).

Regardless of whether the suggestion above makes a difference, it would
be helpful to see verbose dmesgs from both a clean boot and a boot that
hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
assertions that will cause the system to panic when the hang occurs,
making it easier to see what's going on.
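
For the verbose dmesg, one way (an assumption about your setup, not the
only option) is to enable verbose booting in /boot/loader.conf. A
successful boot then leaves the messages in /var/run/dmesg.boot:

```
# /boot/loader.conf
boot_verbose="YES"    # same effect as "boot -v" at the loader prompt
```

For the boot that hangs, presumably nothing reaches the disk, so the
output would have to be captured some other way, for example over a
serial console or from a photo of the screen.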

> 
> To reiterate: at the time of a hang the CPU fan starts revving up,
> and the USB keyboard is unresponsive (<scroll> does not enter scroll
> mode, caps lock and num lock do not toggle their LED indicators,
> Ctrl+Alt+Esc does not activate the kernel debugger). Loader "Safe mode"
> avoids the problem (presumably by disabling SMP).
> 
> Meanwhile I have successfully upgraded two other similar
> hosts from 11.0 to 11.1-RC3, no surprises there (but they do not
> have the same disk controller).
> 
> Not sure what to try next.
> 
>    Mark
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable