I've got two OpenBSD 3.9 firewalls/routers in a CARP configuration.  They
are both IBM NetFinity 40004 servers with dual P3 650MHz chips and 512MB of 
memory each.  Twice now, the backup firewall has disappeared from my Nagios 
monitoring and I've found (through remote serial console) only a ddb{1}> 
prompt.

According to man ddb, this can happen when the kernel panics or when a break 
signal is sent from the console (and ddb.console is set to 1).  In my case, 
no one is using the console at these times and ddb.console is set to 0 
anyway.  However, "show panic" seems to indicate it wasn't a kernel panic 
either:

ddb{1}> show panic
the kernel did not panic

I feel like I'm missing something obvious here.  Is there some undocumented 
condition that can cause a system to drop to ddb, or am I investigating this 
the wrong way?  I tried using trace and hangman to gather more information, 
but hangman just confused the hell out of me and the trace command gave me: 
apm_cpu_idle(0,0,0,0,0) at apm_cpu_idle+0x4a

After a few more investigative commands, I started getting only "Faulted in 
DDB; continuing..." and tried rebooting.  "boot dump" yielded a nonresponsive 
system and a trip to the datacenter to cold boot the machine.

Anyone have any ideas?  Perhaps I can disable part of APM and avoid this 
problem in the future?  What other techniques can I use to debug this if it 
happens again - is there a good doc out there that is a little more 
descriptive than man ddb?

-- 
Regards,
Neil Schelly
Senior Systems Administrator

W: 978-667-5115 x213
M: 508-410-4776

OASIS Open http://www.oasis-open.org
"Advancing E-Business Standards Since 1993"
