Hi. I can see the following heartbeat panic when a machine is forwarding a heavy load of short packets:

[ 745.0068385] cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0068385] panic: cpu15: softints stuck for 16 seconds
[ 745.0168386] cpu15: Begin traceback...
[ 745.0168386] cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0268387] vpanic() at cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0268387] netbsd:vpanic+0x173
[ 745.0368390] cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0368390] panic() at cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0468390] netbsd:panic+0x3c
[ 745.0468390] heartbeat() at netbsd:heartbeat+0x353
[ 745.0568392] hardclock() at netbsd:hardclock+0x8b
[ 745.0668393] Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
[ 745.0668393] --- interrupt ---
[ 745.0768393] psref_release() at netbsd:psref_release+0x83
[ 745.0768393] ipintr() at netbsd:ipintr+0xef
[ 745.0868396] softint_dispatch() at netbsd:softint_dispatch+0x103
[ 745.0868396] DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff8288589fc0f0
[ 745.0968395] Xsoftintr() at netbsd:Xsoftintr+0x4c
[ 745.0968395] --- interrupt ---
[ 745.1068397] f9faeac0f5baeac4:
[ 745.1068397] cpu15: End traceback...
[ 745.1068397] fatal breakpoint trap in supervisor mode
[ 745.1168399] trap type 1 code 0 rip 0xffffffff80235425 cs 0x8 rflags 0x202 cr2 0 ilevel 0x7 rsp 0xffff8288589fbc68
[ 745.1268401] curlwp 0xffffd8070facf6c0 pid 0.175 lowest kstack 0xffff8288589f72c0
Stopped in pid 0.175 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x173
panic() at netbsd:panic+0x3c
heartbeat() at netbsd:heartbeat+0x353
hardclock() at netbsd:hardclock+0x8b
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
--- interrupt ---
psref_release() at netbsd:psref_release+0x83
ipintr() at netbsd:ipintr+0xef
softint_dispatch() at netbsd:softint_dispatch+0x103
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff8288589fc0f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
(snip)

wm and ixg have the hw.{wm,ixg}N.txrx_workqueue sysctl. Changing it from 0
to 1 avoids the panic. Many other drivers have no way to avoid the problem.

I think it would be good to change the default behavior from panic to
something else, because the GENERIC kernel enables HEARTBEAT by default.
One idea is to print a warning message at sufficient intervals.

Regards.

-- 
-----------------------------------------------
SAITOH Masanobu (msai...@execsw.org
                 msai...@netbsd.org)
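
To make the "warning message at sufficient intervals" idea concrete, here is a rough user-space sketch in C. It is not the kernel's heartbeat code: stall_warn_state, stall_should_warn(), stall_report() and the 60-second WARN_INTERVAL_SEC are all hypothetical names and values chosen for illustration. In the kernel, an existing rate-limiting helper such as ratecheck(9) might serve the same purpose.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: minimum number of seconds between two warnings. */
#define WARN_INTERVAL_SEC 60u

/* Hypothetical per-CPU bookkeeping for rate-limited stall warnings. */
struct stall_warn_state {
	bool warned;            /* has any warning been printed yet? */
	unsigned last_warn_sec; /* time of the most recent warning */
};

/*
 * Decide whether a warning may be printed at time now_sec.
 * Records the time and returns true if so, otherwise returns false.
 */
static bool
stall_should_warn(struct stall_warn_state *st, unsigned now_sec)
{
	if (st->warned && now_sec - st->last_warn_sec < WARN_INTERVAL_SEC)
		return false;
	st->warned = true;
	st->last_warn_sec = now_sec;
	return true;
}

/*
 * Report a stalled CPU with a warning instead of a panic.
 * Returns true if a line was actually printed.
 */
static bool
stall_report(struct stall_warn_state *st, int cpu, unsigned stalled_sec,
    unsigned now_sec)
{
	if (!stall_should_warn(st, now_sec))
		return false;
	printf("warning: cpu%d: softints stuck for %u seconds\n",
	    cpu, stalled_sec);
	return true;
}
```

With this shape, a stall detected on every clock tick would produce at most one message per WARN_INTERVAL_SEC, so the console stays usable while the condition is still reported.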