Re: Kernel dumps [was Re: possible changes from Panzura]

Vincent Hoffman Wed, 10 Jul 2013 16:07:25 -0700

On 10/07/2013 23:09, Kevin Day wrote:
>>
>> Those sound useful. Just out of curiosity, however, since we're on the 
>> topic of kernel dumps:  Has anyone even looked into the notion of an 
>> emergency fall-back network stack to enable remote kernel panic (or system 
>> hang) debugging, the way OS X lets you do?  I can't tell you the number of 
>> times I've NMI'd a Mac and connected to it remotely in a scenario where 
>> everything was totally wedged and just a couple of minutes in kgdb (or now 
>> lldb) quickly showed that everything was waiting on a specific lock and the 
>> problem became manifestly clear.
>>
>> The feature also lets you scrape a panic'd machine with automation, running 
>> some kgdb scripts against it to glean useful information for later analysis 
>> vs having to have someone schlep the dump image manually to triage.  It's 
>> going to be damn hard to live without this now, and if someone else isn't 
>> working on it, that's good to know too!
>
> At a previous employer, we had a system where on a panic it had a totally 
> separate stack capable of just IP/UDP/TFTP and would save its core via TFTP 
> to a server. This isn’t as nice as full remote debugging, but it was a whole 
> lot easier to develop. The caveats I remember were:
>
> 1) We didn’t want to implement ARP, so you had to write the MAC address of 
> the “dump server” to the kernel via sysctl before crashing.
> 2) We also didn’t want to have to deal with routing tables, so you had to 
> manually specify what interface to blast packets out to, also via sysctl.
> 3) After a panic we didn’t want to rely on interrupt processing working, so 
> it polled the network interface and blocked whenever it needed to. Since this 
> was an embedded system, it wasn’t too big of a deal - only one network driver 
> had to be hacked to support this. Basically a flag that would switch to 
> “disable normal processing, switch to polled fifos for input and output” 
> until reboot.
> 4) The whole system used only preallocated buffers and its own stack (carved 
> out from memory on boot) so even if the kernel’s malloc was trashed, we could 
> still dump.
>
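
To make caveats 1 and 4 above concrete, here is a minimal user-space sketch
in C. It is only an illustration, not the code from the system described:
every name in it (panic_cfg, panic_pool, panic_buf_get, panic_eth_header)
and the pool sizes are made up. It assumes the dump server's MAC address
was stored ahead of time (for instance from a sysctl handler), and that all
frame buffers live in static storage so the panic path never calls malloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PANIC_NBUFS    8       /* hypothetical pool size */
#define PANIC_BUFSIZE  1514    /* largest Ethernet frame, FCS excluded */
#define ETHERTYPE_IPV4 0x0800

/* Filled in before any crash; there is no ARP at panic time. */
struct panic_cfg {
    uint8_t server_mac[6];     /* MAC address of the dump server */
    uint8_t local_mac[6];      /* MAC address of the chosen interface */
};

/* Static storage: reserved at build/boot time, independent of malloc. */
static uint8_t panic_pool[PANIC_NBUFS][PANIC_BUFSIZE];
static uint8_t panic_inuse[PANIC_NBUFS];

/* Hand out the first free buffer, or NULL if the pool is exhausted. */
uint8_t *panic_buf_get(void)
{
    for (size_t i = 0; i < PANIC_NBUFS; i++) {
        if (!panic_inuse[i]) {
            panic_inuse[i] = 1;
            return panic_pool[i];
        }
    }
    return NULL;
}

/* Return a buffer to the pool; pointers not from the pool are ignored. */
void panic_buf_put(uint8_t *buf)
{
    for (size_t i = 0; i < PANIC_NBUFS; i++) {
        if (buf == panic_pool[i])
            panic_inuse[i] = 0;
    }
}

/* Write the 14-byte Ethernet header using only the preconfigured
 * addresses. Returns the header length. */
size_t panic_eth_header(const struct panic_cfg *cfg, uint8_t *frame)
{
    memcpy(frame, cfg->server_mac, 6);
    memcpy(frame + 6, cfg->local_mac, 6);
    frame[12] = (uint8_t)(ETHERTYPE_IPV4 >> 8);
    frame[13] = (uint8_t)(ETHERTYPE_IPV4 & 0xff);
    return 14;
}
```

The linear scan over the pool is deliberate: with a handful of buffers it
needs no free list or locking state that could itself be corrupted.
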
> I’m not sure this really would scratch your itch, but I believe this took me 
> no more than a day or two to implement. Parts #1 and #2 would be pretty easy, 
> but I’m not sure how generically the kernel could support an emergency 
> network mode that doesn’t require interrupts, for every network card out 
> there. Maybe that isn’t as important to you as it was to us.
>
> The whole exercise is much easier if you don’t use TFTP but a custom protocol 
> that doesn’t require the crashing system to receive any packets. If it can 
> just blast away at some random host, oblivious to whether it’s working or 
> not, it’s a lot less code to write.
>
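
A send-only protocol like the one suggested above could be sketched in C
roughly as follows. This is an assumption-laden illustration rather than
any existing wire format: the header layout (magic, length, offset), the
DUMP_* constants, and the function names are all hypothetical. The idea is
that each packet carries the offset of its payload in the dump, so the
sender never has to receive anything and the receiver can place chunks
wherever they belong and notice holes left by lost packets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC   0x44554d50u   /* "DUMP" in ASCII, hypothetical */
#define DUMP_HDR_LEN 16            /* magic(4) + length(4) + offset(8) */
#define DUMP_CHUNK   1024          /* hypothetical payload bytes per packet */

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Encode one chunk as header + payload in network byte order.
 * Returns the packet length, or 0 if it does not fit in out. */
size_t dump_encode_chunk(uint8_t *out, size_t outcap, uint64_t offset,
                         const uint8_t *data, uint32_t len)
{
    if (outcap < DUMP_HDR_LEN || len > outcap - DUMP_HDR_LEN)
        return 0;
    put_be32(out, DUMP_MAGIC);
    put_be32(out + 4, len);
    put_be64(out + 8, offset);
    memcpy(out + DUMP_HDR_LEN, data, len);
    return DUMP_HDR_LEN + (size_t)len;
}

/* Transmit hook; a real one would push the packet at a polled NIC. */
typedef void (*dump_tx_fn)(const uint8_t *pkt, size_t len, void *arg);

/* Walk a memory region and emit it chunk by chunk. Nothing is ever
 * received or retried. Returns the number of packets handed to tx. */
size_t dump_send_region(const uint8_t *mem, size_t memlen, uint8_t *scratch,
                        size_t scratchcap, dump_tx_fn tx, void *arg)
{
    size_t sent = 0, off = 0, maxpay;

    if (scratchcap <= DUMP_HDR_LEN)
        return 0;
    maxpay = scratchcap - DUMP_HDR_LEN;
    if (maxpay > DUMP_CHUNK)
        maxpay = DUMP_CHUNK;
    while (off < memlen) {
        size_t n = memlen - off;
        if (n > maxpay)
            n = maxpay;
        tx(scratch, dump_encode_chunk(scratch, scratchcap, off,
                                      mem + off, (uint32_t)n), arg);
        off += n;
        sent++;
    }
    return sent;
}

/* Stand-in transmit hook that only counts what it is given. */
struct dump_tx_stats { size_t packets; size_t bytes; };

void dump_tx_count(const uint8_t *pkt, size_t len, void *arg)
{
    struct dump_tx_stats *st = arg;
    (void)pkt;
    st->packets++;
    st->bytes += len;
}
```

The single scratch buffer is reused for every packet, which fits the
"preallocated buffers only" constraint: the sender needs no per-packet
allocation and keeps no state beyond the current offset.
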
There was some work on something similar at one point, though I'm not
sure what came of it:
http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html


Vince

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
>

