So, while investigating my WAPBL performance problems, it looks like I can 
crash the machine (not reliably, but more often than not) with a simple
        seq 1 3000 | xargs mkdir
command. I get the following backtrace in ddb (wetware OCR):

panic: wapbl_register_deallocation: out of resources
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff8016f01d cs 8 rflags 246 cr2 ffff80011fc2d000 
cpl 0 rsp fffffe811e0fe6f0
Stopped in pid 12551.1 (mkdir) at       netbsd:breakpoint+0x5:  leave
db{3}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
wapbl_register_inode() at netbsd:wapbl_register_inode
ffs_truncate() at netbsd:ffs_truncate+0x917
ufs_direnter() at netbsd:ufs_direnter+0x481
ufs_mkdir() at netbsd:ufs_mkdir+0x617
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdir() at netbsd:do_sys_mkdir+0x10f
syscall() at netbsd:syscall+0xc4
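
For anyone who wants to try this on a scratch file system, the one-liner can 
be wrapped so that the directories land in a chosen place. This is only a 
sketch: mkdir_burst and its two arguments are names made up for illustration, 
and nothing here is specific to WAPBL beyond issuing many mkdir calls.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are illustrative only):
# create COUNT empty directories named 1..COUNT inside DIR.
mkdir_burst() {
    dir=$1
    count=${2:-3000}    # default matches the 3000 used in the report
    # Subshell, so the caller's working directory is left untouched.
    ( cd "$dir" && seq 1 "$count" | xargs mkdir )
}
```

Calling it as `mkdir_burst /path/to/scratch 3000` should be equivalent to 
running the original command inside that directory.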

It's unreasonable to take a dump because that would take an estimated four 
to five hours. Is there any reasonable way to get a dump out of a 16G box?

On reboot, while mounting one file system (NOT the one I was operating on 
when the crash happened), the "replaying log to disk" step took several minutes.
I physically walked to the server to see whether the discs were 
actually busy, and there was a strange pattern: Out of the five discs that 
the RAID was built on, four were blinking at ~7Hz while the fifth was idle.
The position of the idle disc changed on a regular basis (about every two 
seconds), but I could not find a pattern in how it moved around. Possibly, 
two discs were sometimes idle at the same time.
Any idea why that took so long? The file system in question is small.
