On 7/9/2021 18:06, Karl Denninger wrote:
> On 7/9/2021 16:17, Ryan Stone wrote:
>> On Thu, Jul 8, 2021 at 8:54 PM Karl Denninger <k...@denninger.net> wrote:
>>> I will see if I can get at least a panic backtrace, although the
>>> impacted box is a pcEngines firewall that boots off an SD card.
>> Have you checked whether netdump supports your NICs?  You should be
>> able to get a full vmcore off if so.
>
> Yes; the box in question is in heavy production and I will not be able to get an isolated period of time to pull a core (assuming the remote dump works) until sometime this weekend.
>
> Will advise once I (hopefully) have it.

Ok, so I have good news and bad news.

I have the trap, and it is definitely in libalias; it appears to be triggered by a NAT translation attempt.

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer     = 0x20:0xffffffff8275b7cc
stack pointer           = 0x28:0xfffffe0017b6b310
frame pointer           = 0x28:0xfffffe0017b6b320
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_1)
trap number             = 18
panic: integer divide fault
cpuid = 1
time = 1625883072
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0017b6b020
vpanic() at vpanic+0x17b/frame 0xfffffe0017b6b070
panic() at panic+0x43/frame 0xfffffe0017b6b0d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0017b6b130
trap() at trap+0x67/frame 0xfffffe0017b6b240
calltrap() at calltrap+0x8/frame 0xfffffe0017b6b240
--- trap 0x12, rip = 0xffffffff8275b7cc, rsp = 0xfffffe0017b6b310, rbp = 0xfffffe0017b6b320 ---
HouseKeeping() at HouseKeeping+0x1c/frame 0xfffffe0017b6b320
LibAliasInLocked() at LibAliasInLocked+0x2f/frame 0xfffffe0017b6b3e0
LibAliasIn() at LibAliasIn+0x46/frame 0xfffffe0017b6b410
ipfw_nat() at ipfw_nat+0x234/frame 0xfffffe0017b6b460
ipfw_chk() at ipfw_chk+0x1350/frame 0xfffffe0017b6b670
ipfw_check_packet() at ipfw_check_packet+0xf0/frame 0xfffffe0017b6b760
pfil_run_hooks() at pfil_run_hooks+0xb0/frame 0xfffffe0017b6b7f0
ip_input() at ip_input+0x427/frame 0xfffffe0017b6b8a0
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe0017b6b8f0
ether_demux() at ether_demux+0x138/frame 0xfffffe0017b6b920
ether_nh_input() at ether_nh_input+0x33b/frame 0xfffffe0017b6b980
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe0017b6b9d0
ether_input() at ether_input+0x4b/frame 0xfffffe0017b6ba00
iflib_rxeof() at iflib_rxeof+0xad6/frame 0xfffffe0017b6bae0
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0017b6bb20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0017b6bb80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe0017b6bbb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0017b6bbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0017b6bbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 7m23s
netdump: overwriting mbuf zone pointers
netdump in progress. searching for server...
netdumping to 192.168.10.100 (ac:1f:6b:ad:d8:cb)
Dumping 190 out of 1882 MB:. . . . . . . . . . . . .
** DUMP FAILED (ERROR 60) **

Now the bad news -- as you can see, the attempted remote dump fails, possibly because the network code is hosed at that point. I get a file of exactly 69632 bytes (repeatedly) on the remote machine the dump is sent to; it looks like the first piece of it is indeed received, but that's all, and then the panicked unit reboots.

On the server (remote) end I have this in the "info" file:

Dump from IpGw [192.168.10.200]
Dump incomplete: client timed out

So it looks like the server got the first part of the dump and replied, but the crashed box never sent anything else.
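For reference, the 60 in the DUMP FAILED line reads as an errno value, and on FreeBSD errno 60 is ETIMEDOUT, which would be consistent with the "client timed out" note in the info file. A minimal sketch of that lookup follows; the table is a small subset copied from FreeBSD's sys/errno.h, and the helper name is illustrative only (it is not part of netdump).

```python
# Subset of errno values as defined in FreeBSD's sys/errno.h.
# Hard-coded on purpose: Python's errno module follows the host OS,
# and other systems (Linux, for example) number these differently.
FREEBSD_ERRNO = {
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    60: ("ETIMEDOUT", "Operation timed out"),
    61: ("ECONNREFUSED", "Connection refused"),
    65: ("EHOSTUNREACH", "No route to host"),
}


def describe_dump_error(code: int) -> str:
    """Hypothetical helper: map a dump error code to a FreeBSD errno name."""
    name, text = FREEBSD_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return f"{name}: {text}"


print(describe_dump_error(60))
```

This only decodes the number; it does not say which side stopped responding first.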

-rw-------   1 root  wheel      2 Jul  9 22:11 bounds.IpGw
-rw-------   1 root  wheel     66 Jul  9 22:10 info.IpGw.0
-rw-------   1 root  wheel      0 Jul  9 22:11 info.IpGw.1
-rw-------   1 root  wheel  69632 Jul  9 22:00 vmcore.IpGw.0
-rw-------   1 root  wheel  69632 Jul  9 22:11 vmcore.IpGw.1
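As a sanity check, the "time = 1625883072" value from the panic output can be converted to wall-clock time and compared with the file timestamps above, and the partial vmcore size can be checked against the page size. A minimal sketch, assuming the dump server's clock is at UTC-4 and that pages are 4096 bytes; neither is stated in this message, so both are labeled as assumptions in the code.

```python
from datetime import datetime, timedelta, timezone

PANIC_EPOCH = 1625883072   # "time =" line from the panic output
VMCORE_BYTES = 69632       # size of each partial vmcore file
PAGE_SIZE = 4096           # assumption: 4 KiB pages
ASSUMED_TZ = timezone(timedelta(hours=-4))  # assumption: server clock is UTC-4

# Convert the panic time to UTC, then to the assumed local zone.
panic_utc = datetime.fromtimestamp(PANIC_EPOCH, tz=timezone.utc)
panic_local = panic_utc.astimezone(ASSUMED_TZ)

# Check whether the truncated file is a whole number of pages.
pages, leftover = divmod(VMCORE_BYTES, PAGE_SIZE)

print(panic_utc.isoformat())
print(panic_local.strftime("%b %d %H:%M"))
print(pages, leftover)
```

Under those assumptions the panic lands at 22:11 on Jul 9, matching the timestamp on vmcore.IpGw.1, and 69632 bytes is exactly 17 pages of 4096 bytes with nothing left over.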

Without a complete core I can't give you a good traceback.  I may be able to get a local device on this unit sometime over the weekend -- not sure as of yet, as it is in production use.

This is an extremely reliable panic -- uptime is only a few minutes before it blows up.

--
Karl Denninger
k...@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
