I've got two OpenBSD 3.9 firewall/routers in a CARP configuration. They are both IBM NetFinity 40004 servers with dual P3 650MHz chips and 512MB of memory each. Twice now, the backup firewall has disappeared from my Nagios monitoring, and I've found (through the remote serial console) only a ddb{1}> prompt.

According to man ddb, this can happen when the kernel panics or when a break signal is sent from the console (and ddb.console is set to 1). In my case, no one is using the console at these times, and ddb.console is set to 0 anyway. However, "show panic" seems to indicate it wasn't a kernel panic either:

    ddb{1}> show panic
    the kernel did not panic

I feel like I'm missing something obvious here. Is there some undocumented condition that can cause a system to crash to ddb, or am I investigating the panic wrong?

I tried using trace and hangman to gather more information, but hangman just confused the hell out of me, and the trace command gave me:

    apm_cpu_idle(0,0,0,0,0) at apm_cpu_idle+0x4a

After a few more investigative commands, I started getting only "Faulted in DDB; continuing..." and tried rebooting. "boot dump" yielded a nonresponsive system and a trip to the datacenter to cold boot the machine.

Anyone have any ideas? Perhaps I can disable part of APM and avoid this problem in the future? What other techniques can I use to debug this if it happens again? Is there a good doc out there that is a little more descriptive than man ddb?

-- 
Regards,
Neil Schelly
Senior Systems Administrator
W: 978-667-5115 x213
M: 508-410-4776

OASIS Open http://www.oasis-open.org
"Advancing E-Business Standards Since 1993"