Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Eugene Grosbein
On 24.07.2017 08:44, Mark Johnston wrote:
>> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:
>
> Is this amd64 GENERIC, or something else?
Custom kernel, amd64.
>
>>
>> - "call doadump" from DDB prompt works just fine;
>> - "shutdown -r now" reboots the system withou

Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-24 Thread Mark Martinec
2017-07-24 04:15, Mark Johnston wrote:
Could you try re-enabling EARLY_AP_STARTUP, applying the patch at the end of this email, and see if the message "sleeping before eventtimer init" appears in the boot output? If it does, it'll be followed by a backtrace that might be useful for tracking down

Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-24 Thread Mark Martinec
Thanks! Tried it, and the message (or a backtrace) does not show during a boot of a generic (patched) kernel, at least not in the last 40-line screen before the hang occurs. (It also does not show during a successful "Safe mode" boot.) Btw (may or may not be relevant): after the above experimen

Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-24 Thread Steven Hartland
Based on your boot info you're using mps, so this could be related to the mps fix committed to stable/11 today by ken@: https://svnweb.freebsd.org/changeset/base/321415
re@ cc'ed, as this could cause hangs for others too on 11.1-RELEASE if this is the case.
Regards
Steve
On 24/07/2017 15:5

Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-24 Thread Ken Merry
It is possible that the change I MFCed today (r321207 in head, r321415 in stable/11) is related, but Mark will have to boot his machine with the fix to see if it makes any difference. What happened in my case on one particular machine (not on most machines in our lab running the same code) was

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Eugene Grosbein
CCing mav@ as graid expert.
On 24.07.2017 08:44, Mark Johnston wrote:
>> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:
>>
>> - "call doadump" from DDB prompt works just fine;
>> - "shutdown -r now" reboots the system without problems;
>> - "sysctl debug.kdb.panic=1" t

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Mark Johnston
On Tue, Jul 25, 2017 at 12:03:05AM +0700, Eugene Grosbein wrote:
> Thanks, this helped:
>
> $ addr2line -f -e kernel.debug 0x80919c00
> g_raid_shutdown_post_sync
> /home/src/sys/geom/raid/g_raid.c:2458
>
> That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called just
> before

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Alexander Motin
I guess that the problem with g_raid_shutdown_post_sync in case of panic can be explained by the fact that it tries to write clean metadata in the regular (not dumping) way while the system is already in panic mode and there is no proper scheduling. Maybe it could just be bypassed in case of dumping (should be triv

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Warner Losh
I've often wondered why, for CAM at least, we don't automatically fall back to the dump way when scheduling is stopped rather than have two different interfaces and special knowledge of this in a lot of places...
Warner
On Mon, Jul 24, 2017 at 11:25 AM, Alexander Motin wrote:
> I guess that pro

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Eugene Grosbein
On 25.07.2017 00:22, Mark Johnston wrote:
> On Tue, Jul 25, 2017 at 12:03:05AM +0700, Eugene Grosbein wrote:
>> Thanks, this helped:
>>
>> $ addr2line -f -e kernel.debug 0x80919c00
>> g_raid_shutdown_post_sync
>> /home/src/sys/geom/raid/g_raid.c:2458
>>
>> That is GEOM_RAID's g_raid_shutdow

Re: stable/11: Kernel page fault with the following non-sleepable locks held: CAM device lock

2017-07-24 Thread Eugene Grosbein
On 23.07.2017 20:02, Eugene Grosbein wrote:
> Hi!
>
> Long story short: stable/11 r321371 started to panic at the moment of smartd
> invocation
> after my SSD died.
>
> I have Intel motherboard with graid-supported pseudo-raid.
> I use it in RAID1 mode with one HDD and one SSD.
>
> Yesterday th

Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-24 Thread Mark Martinec
2017-07-24 18:25, Ken Merry wrote:
It is possible that the change I MFCed today (r321207 in head, r321415 in stable/11) is related, but Mark will have to boot his machine with the fix to see if it makes any difference. What happened in my case on one particular machine (not on most machines in o