On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote:

> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> > Hello,
> > 
> > Recently I made a new build of 11-STABLE but encountered a boot hang
> > at this state:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> > 
> > It is easy to reproduce: I can just boot from any 11 or 12 ISO that
> > contains the commit.
> 
> I have just tested the latest HEAD (r318861) and stable/11 (r318854), and
> they both work fine in my environment (a VM with 4 vCPUs and 2GB of
> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input;
> he has been doing some tests on HEAD and AFAIK he hasn't seen any
> issues.
> 
> > I compiled various svn revisions to confirm that r318347 caused the 
> > issue and r318346 is fine. With r318347 or later including the latest 
> > 11-STABLE, the system will only boot with one virtual CPU in XenServer. 
> > Any more cpus and it hangs. I also tried a 12 kernel from head this 
> > afternoon and I have the same hang. I had this issue on XenServer 7 
> > (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I 
> > also did much of my testing with a GENERIC kernel to try to rule out 
> > kernel configuration mistakes. When it hangs, the performance 
> > monitoring in Xen tells me at least one CPU is pegged. r318674 boots 
> > fine on physical hardware without Xen involved.
> > 
> > Looking at r318347, which mentions EARLY_AP_STARTUP, and later seeing
> > r318763, which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> > my kernel, but that turned the hang into a panic that occurs with any
> > number of CPUs:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png
> 
> I guess this is on stable/11, right? The panic looks easier to debug
> than the hang, so let's start with this one. Can you enable the serial
> console and kernel debug options in order to get a trace? With just
> this it's almost impossible to know what went wrong.

Yes, this was on stable/11 amd64.

> If you still have that kernel around (and its debug symbols), can you
> do:
> 
> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80793344
> 
> (The address is the instruction pointer in the crash image; I think I
> got it right.)

I'll reproduce this soon and get the results from that command.

> In order to compile a stable/11 kernel with full debugging support you
> will have to add:
> 
> # For full debugger support use (turn off in stable branch):
> options       BUF_TRACKING            # Track buffer history
> options       DDB                     # Support DDB.
> options       FULL_BUF_TRACKING       # Track more buffer history
> options       GDB                     # Support remote GDB.
> options       DEADLKRES               # Enable the deadlock resolver
> options       INVARIANTS              # Enable calls of extra sanity checking
> options       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> options       WITNESS                 # Enable checks to detect deadlocks and cycles
> options       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> options       MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
> 
> to your kernel config file.

I'll work on that soon too when I get a chance, thanks.

> 
> Just to be sure, this is an amd64 kernel right?

yes

> 
> Roger.
> _______________________________________________
> [email protected] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "[email protected]"
>  
