So, while investigating my WAPBL performance problems, it looks like I can crash the machine (not reliably, but more often than not) with a simple seq 1 3000 | xargs mkdir command. I get the following backtrace in ddb (wetware OCR):

panic: wapbl_register_deallocation: out of resources
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff8016f01d cs 8 rflags 246 cr2 ffff80011fc2d000 cpl 0 rsp fffffe811e0fe6f0
Stopped in pid 12551.1 (mkdir) at netbsd:breakpoint+0x5: leave
db{3}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
wapbl_register_inode() at netbsd:wapbl_register_inode
ffs_truncate() at netbsd:ffs_truncate+0x917
ufs_direnter() at netbsd:ufs_direnter+0x481
ufs_mkdir() at netbsd:ufs_mkdir+0x617
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdir() at netbsd:do_sys_mkdir+0x10f
syscall() at netbsd:syscall+0xc4

It's unreasonable to take a dump because that would take an estimated four to five hours. Is there any reasonable way to get a dump out of a 16G box?

On reboot, while mounting one file system (NOT the one I was operating on when the crash happened), the "replaying log to disk" step took several minutes. I physically walked to the server to see whether the discs were actually busy, and there was a strange pattern: out of the five discs that the RAID is built on, four were blinking at ~7Hz while the fifth was idle. The position of the idle disc changed on a regular basis (about every two seconds), but I could not find a pattern in how it moved around. Possibly, two discs were sometimes idle at the same time.

Any idea why that took so long? The file system in question is small.
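
For anyone who wants to try this on a test box, here is a minimal sketch that wraps the mkdir command from the top of this message in a scratch directory. The function name repro_mkdir and the default directory name are made up for illustration; pass a directory on the WAPBL-mounted file system you want to exercise. The only part taken from the report is the seq | xargs mkdir pipeline itself.

```shell
#!/bin/sh
# Sketch of the reproduction from this report.
# repro_mkdir and the default directory name are hypothetical; pass a
# directory on the file system under test as the first argument.
repro_mkdir() {
    target="${1:-./wapbl-mkdir-test}"
    mkdir -p "$target" || return 1
    # Run in a subshell so the caller's working directory stays unchanged.
    (
        cd "$target" || exit 1
        # Same pipeline as in the report: 3000 directory names, which
        # xargs hands to mkdir in a small number of invocations.
        seq 1 3000 | xargs mkdir
    )
}

repro_mkdir "$@"
```

If the machine survives, the scratch directory should contain 3000 subdirectories and can simply be removed afterwards. Only run this where a panic is acceptable.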