On 7/11/13 6:09 AM, Kevin Day wrote:
Those sound useful. Just out of curiosity, however, since we're on the topic
of kernel dumps: Has anyone even looked into the notion of an emergency
fall-back network stack to enable remote kernel panic (or system hang)
debugging, the way OS X lets you do? I can't tell you the number of times I've
NMI'd a Mac and connected to it remotely in a scenario where everything was
totally wedged and just a couple of minutes in kgdb (or now lldb) quickly
showed that everything was waiting on a specific lock and the problem became
manifestly clear.
The feature also lets you scrape a panic'd machine with automation, running
some kgdb scripts against it to glean useful information for later analysis vs
having to have someone schlep the dump image manually to triage. It's going to
be damn hard to live without this now, and if someone else isn't working on it,
that's good to know too!
I could imagine that we could stash away a vimage stack just for this
purpose.
You would set it up on boot and leave it detached until you need it.
You would just need to switch the interfaces over to the new stack on panic
and put them into 'poll' mode.
Or maybe you'd need more (like pre-allocating mbufs for it to use).
Just an idea.
At a previous employer, we had a system where on a panic it had a totally
separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to
a server. This isn’t as nice as full remote debugging, but it was a whole lot
easier to develop. The caveats I remember were:
1) We didn’t want to implement ARP, so you had to write the MAC address of the
“dump server” to the kernel via sysctl before crashing.
2) We also didn’t want to have to deal with routing tables, so you had to
manually specify what interface to blast packets out to, also via sysctl.
3) After a panic we didn’t want to rely on interrupt processing working, so it
polled the network interface and blocked whenever it needed to. Since this was
an embedded system, it wasn’t too big of a deal - only one network driver had
to be hacked to support this. Basically a flag that would switch to “disable
normal processing, switch to polled fifos for input and output” until reboot.
4) The whole system used only preallocated buffers and its own stack (carved
out from memory on boot) so even if the kernel’s malloc was trashed, we could
still dump.
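
To make caveats 1, 2 and 4 a bit more concrete, here is a rough userland sketch
of how such a stack might build one outgoing Ethernet/IPv4/UDP frame. It is not
the code from that system: struct dump_cfg, dump_frame, dump_build_frame and
the 1024-byte payload limit are all names and values made up for illustration.
The destination MAC comes from a preconfigured struct (so no ARP), and the
frame is assembled in a statically reserved buffer (so no malloc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN      14
#define IP_HDR_LEN       20
#define UDP_HDR_LEN      8
#define DUMP_PAYLOAD_MAX 1024   /* assumed chunk size, not from the original system */
#define DUMP_FRAME_MAX   (ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN + DUMP_PAYLOAD_MAX)

static_assert(DUMP_FRAME_MAX <= 1514, "frame must fit a standard Ethernet frame");

/* Hypothetical settings, written (e.g. via sysctl) long before any panic. */
struct dump_cfg {
    uint8_t  dst_mac[6];        /* dump server MAC, so no ARP is needed */
    uint8_t  src_mac[6];
    uint32_t src_ip, dst_ip;    /* host byte order */
    uint16_t src_port, dst_port;
};

/* Reserved at build time, so the panic path never touches malloc. */
static uint8_t dump_frame[DUMP_FRAME_MAX];

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)(v & 0xffff));
}

/* Internet checksum (RFC 1071) over an even-length header. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build one frame in dump_frame; returns its length, or 0 if len is too big. */
size_t dump_build_frame(const struct dump_cfg *cfg, const void *data, size_t len)
{
    uint8_t *eth = dump_frame;
    uint8_t *ip  = eth + ETH_HDR_LEN;
    uint8_t *udp = ip + IP_HDR_LEN;

    if (len > DUMP_PAYLOAD_MAX)
        return 0;

    memcpy(eth, cfg->dst_mac, 6);
    memcpy(eth + 6, cfg->src_mac, 6);
    put16(eth + 12, 0x0800);                    /* EtherType: IPv4 */

    memset(ip, 0, IP_HDR_LEN);
    ip[0] = 0x45;                               /* IPv4, 20-byte header */
    put16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + len));
    ip[8] = 64;                                 /* TTL */
    ip[9] = 17;                                 /* protocol: UDP */
    put32(ip + 12, cfg->src_ip);
    put32(ip + 16, cfg->dst_ip);
    put16(ip + 10, ip_checksum(ip, IP_HDR_LEN));

    put16(udp, cfg->src_port);
    put16(udp + 2, cfg->dst_port);
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + len));
    put16(udp + 6, 0);                          /* 0 = no UDP checksum (legal over IPv4) */

    memcpy(udp + UDP_HDR_LEN, data, len);
    return ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN + len;
}
```

The resulting bytes would then be handed to whatever polled transmit routine
the one hacked driver provides (caveat 3). Very short frames may still need
padding to the Ethernet minimum, which the sketch leaves to the driver or
hardware.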
I’m not sure this really would scratch your itch, but I believe this took me no
more than a day or two to implement. Parts #1 and #2 would be pretty easy, but
I’m not sure how generically the kernel could support an emergency network mode
that doesn’t require interrupts for every network card out there. Maybe that
isn’t as important to you as it was to us.
The whole exercise is much easier if you don’t use TFTP but a custom protocol
that doesn’t require the crashing system to receive any packets. If it can just
blast away at some random host, oblivious to whether it’s working or not, it’s
a lot less code to write.
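
As a sketch of what such a send-only protocol could look like (assuming a
made-up 24-byte header; CHUNK_MAGIC, dump_blast, dump_tx_fn and tx_count are
hypothetical names, not an existing protocol or API): each packet carries its
own offset and the total image size, so the receiver can write chunks straight
into a file at the right place and spot any holes afterwards, without the
crashing box ever having to listen for an ACK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_MAGIC    0x44554d50u  /* "DUMP", an arbitrary marker */
#define CHUNK_HDR_LEN  24           /* magic(4) + offset(8) + total(8) + len(4) */
#define CHUNK_DATA_MAX 1024         /* assumed payload size per packet */

static_assert(CHUNK_HDR_LEN + CHUNK_DATA_MAX <= 1472, "must fit one UDP datagram");

/* Transmit hook: in a real system this would be the polled driver send path. */
typedef void (*dump_tx_fn)(const uint8_t *pkt, size_t len, void *arg);

/* Single preallocated packet buffer, reused for every chunk. */
static uint8_t chunk_pkt[CHUNK_HDR_LEN + CHUNK_DATA_MAX];

static void be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void be64(uint8_t *p, uint64_t v)
{
    be32(p, (uint32_t)(v >> 32));
    be32(p + 4, (uint32_t)(v & 0xffffffffu));
}

/*
 * Send the whole image as self-describing chunks and never wait for a reply.
 * Returns the number of packets handed to tx().
 */
uint64_t dump_blast(const uint8_t *image, uint64_t total, dump_tx_fn tx, void *arg)
{
    uint64_t off = 0, sent = 0;

    while (off < total) {
        uint32_t n = (total - off > CHUNK_DATA_MAX)
            ? CHUNK_DATA_MAX : (uint32_t)(total - off);

        be32(chunk_pkt, CHUNK_MAGIC);
        be64(chunk_pkt + 4, off);       /* where this chunk belongs */
        be64(chunk_pkt + 12, total);    /* lets the receiver detect gaps */
        be32(chunk_pkt + 20, n);
        memcpy(chunk_pkt + CHUNK_HDR_LEN, image + off, n);

        tx(chunk_pkt, CHUNK_HDR_LEN + n, arg);
        off += n;
        sent++;
    }
    return sent;
}

/* Stand-in transmit hook for trying the sketch out: it only counts traffic. */
struct tx_stats { uint64_t packets, bytes; };

static void tx_count(const uint8_t *pkt, size_t len, void *arg)
{
    struct tx_stats *s = arg;

    (void)pkt;
    s->packets++;
    s->bytes += len;
}
```

Calling dump_blast(image, size, tx_count, &stats) on a 2500-byte buffer hands
three packets to the hook. Lost packets simply leave holes in the received
file, which is the price of not listening for replies.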
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"