Hi, all,

It's been a really long time since I've had much to say hereabouts,
but as I'm in the middle of an upgrade cycle (12.4 to 13.2) I wanted
to post about an issue I ran into.  On both of my workstations, my
custom kernel would hang at boot.  I didn't see this on either of the
servers that I had already upgraded.  As I was bored at home today, I
tried booting a GENERIC kernel, built from the same source tree
(13.2-RELEASE-p1) as my custom kernel, and it booted just fine.

I don't have the ability to do serial console on either of my
workstations, nor any sort of network debugging, but when I did a
verbose boot on the office workstation, it didn't show anything
interesting.  However, at home, I noticed that the hang occurred
immediately after attach of:

        hwpstate_intel0: <Intel Speed Shift> on cpu0
        hwpstate_intel1: <Intel Speed Shift> on cpu1

The first time I pressed a key on this machine's PS/2 keyboard, it got
one step further:

        hwpstate_intel2: <Intel Speed Shift> on cpu2

This is a 6-core, 12-thread system, and the working kernel gets all
the way to

        hwpstate_intel11: <Intel Speed Shift> on cpu11

nearly instantly.

I took the working GENERIC configuration and pared it down to make a
new custom kernel, and it worked (I'm using it right now).  So I
compared the working and broken configurations, and noticed the
following options were present in the working configuration and not in
the broken one:

        options EARLY_AP_STARTUP
        options GZIO
        options IICHID_SAMPLING
        options KDB
        options KDB_TRACE
        options NUMA
        options SCSI_DELAY=5000
        options SC_PIXEL_MODE
        options VESA
        options ZSTDIO
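
For anyone who wants to repeat this kind of comparison, here is a minimal
Python sketch of one way to list the options that appear in one kernel
configuration but not in another.  It is an illustration, not necessarily
how the list above was produced.  The function names (`kernel_options`,
`only_in_first`) and the sample config text are hypothetical, and the
parser makes simplifying assumptions: it only understands `options` and
`nooptions` lines, treats everything after `#` as a comment, and does not
follow `include` directives.

```python
def kernel_options(config_text):
    """Return {option_name: value_or_None} parsed from kernel config text.

    Assumption: only `options` / `nooptions` lines matter, and `#`
    always starts a comment (quoted values containing `#` are not handled).
    """
    opts = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comment
        fields = line.split(None, 1)           # directive, remainder
        if len(fields) != 2:
            continue
        directive, rest = fields
        if directive not in ("options", "nooptions"):
            continue
        for item in rest.split(","):           # allow comma-separated lists
            name, _, value = item.strip().partition("=")
            if not name:
                continue
            if directive == "options":
                opts[name] = value or None
            else:
                opts.pop(name, None)           # nooptions cancels an earlier option
    return opts


def only_in_first(first_text, second_text):
    """Sorted option names set in the first config but absent from the second."""
    first = kernel_options(first_text)
    second = kernel_options(second_text)
    return sorted(name for name in first if name not in second)


# Hypothetical sample text standing in for the working and broken configs.
working = """
options 	SCHED_ULE		# ULE scheduler
options 	EARLY_AP_STARTUP
options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
"""
broken = """
options 	SCHED_ULE
"""

missing = only_in_first(working, broken)
print(missing)
```

With real files, the two strings would come from reading the config files
under the kernel source tree's `conf` directory instead of the inline
samples.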

The first one, EARLY_AP_STARTUP, stood out to me as likely related to
the problem -- most of the other options involve hardware or features
that this machine doesn't use, but I could easily imagine that
configuring power state controls on CPUs that haven't been started yet
might fail.  This option isn't mentioned anywhere in UPDATING, and the
comment in GENERIC isn't especially helpful, but I have a suspicion
that this option is now effectively mandatory, at least if `cpufreq`
is compiled into the kernel (as it is on all of my kernels and in
GENERIC as well).  To be 100% certain I should build the old config
with just that option enabled, and maybe I'll try that on my work
desktop since I still need to finish the upgrade there.

This option was apparently added in 2016 by jhb@, and in his
Phabricator description, he wrote:

        As a transition aid, the new behavior is moved under a new
        kernel option (EARLY_AP_STARTUP). This will allow the option
        to be turned off if need be during initial testing. I hope to
        enable this on x86 by default in a followup commit and to have
        all platforms moved over before 11.0. Once the transition is
        complete, the option will be removed along with the
        !EARLY_AP_STARTUP code.

Apparently we got all the way to 13.2 without the option being removed.
It should probably get at least a mention in UPDATING for anyone else
who hasn't tripped over this yet.

-GAWollman

