Hi people,

At work, some of our storage lives on a SUN StorEdge 3510 FibreChannel array; the disks are divided into two volumes, which are attached via isp(4) controllers. Two separate machines each mount a single volume (that is, box1 mounts the first volume (/dev/da1) and box2 mounts the second volume (/dev/da2)). Both volumes are formatted with UFS2.
Overnight, we reset the shelf in order to activate its new management IP address, which made the /dev/da[12] devices temporarily unavailable. This resulted in the following panic on the rather busy mail storage server (the other server has minor load and was fine):

---
(da0:isp0:0:1:0): lost device
(da0:isp0:0:1:0): removing device entry
(da1:isp0:0:2:0): lost device
g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 78035 (lmtpd)
(da1:isp0:0:2:0): Invalidating pack
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
g_vfs_done():da1s1[READ(offset=102319669248, length=16384)]error = 6
panic: vinvalbuf: dirty bufs
cpuid = 2
Uptime: 54d15h48m38s
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x56
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0681303
stack pointer           = 0x28:0xe8d973f0
frame pointer           = 0x28:0xe8d973f8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 78066 (lmtpd)
trap number             = 12
---

Looking at the source of vinvalbuf(), which calls bufobj_invalbuf(), it seems this panic is raised when a bufobj still contains dirty buffers even though the preceding wait for them to be flushed completed without error. The code is in /sys/kern/vfs_subr.c.

The sync routine called there eventually translates to bufsync() in /sys/kern/vfs_bio.c, which calls the filesystem's own sync routine. It appears the return status of vfs_bio_awrite() in ffs_syncvnode() is not checked, whereas all the other write paths are checked; I believe this could provoke the panic.

As the machine is in production use, it was instantly rebooted by a colleague, so I have no vmcore, backtrace or anything. I therefore hope the information provided here is adequate. Can someone with more FreeBSD VFS knowledge please take a look at this?

Thanks!

--
Rink P.W. Springer - http://rink.nu
"It's you isn't it? THE BASTARD OPERATOR FROM HELL!"
"In the flesh, on the phone and in your account..."
        - BOFH #3
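P.S. To make the suspected failure mode more concrete, below is a small, self-contained userland sketch. It is not the kernel code, and all toy_* names are made up for illustration; it only models the pattern I believe I see in ffs_syncvnode(). The synchronous write path propagates its error, but the asynchronous path (vfs_bio_awrite() in the real code) is issued without checking the result, so a buffer whose write failed (e.g. with error 6 once the da device disappeared) silently stays dirty, and the later "no dirty buffers left" check in vinvalbuf()/bufobj_invalbuf() trips:

---
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct buf; "dirty" plays the role of B_DELWRI. */
struct toybuf {
        int  id;
        bool dirty;
};

/* Stand-in for bwrite(): 0 on success, an errno on failure (buf stays dirty). */
static int
toy_bwrite(struct toybuf *bp, int io_error)
{
        if (io_error != 0)
                return (io_error);
        bp->dirty = false;
        return (0);
}

/*
 * Models the dirty-buffer loop as I read ffs_syncvnode(): the synchronous
 * path propagates the write error, the asynchronous path does not.
 */
static int
toy_syncvnode(struct toybuf *bufs, int n, bool async, int io_error)
{
        int error, i;

        for (i = 0; i < n; i++) {
                if (!bufs[i].dirty)
                        continue;
                if (async)
                        /* like the vfs_bio_awrite() call: result ignored */
                        (void)toy_bwrite(&bufs[i], io_error);
                else if ((error = toy_bwrite(&bufs[i], io_error)) != 0)
                        return (error);
        }
        return (0);
}

/* Models the sanity check that fired as "panic: vinvalbuf: dirty bufs". */
static void
toy_invalbuf(struct toybuf *bufs, int n)
{
        int i;

        for (i = 0; i < n; i++)
                if (bufs[i].dirty)
                        printf("would panic: dirty buf %d left behind\n",
                            bufs[i].id);
}

int
main(void)
{
        struct toybuf bufs[2] = { { 1, true }, { 2, true } };

        /* Every write fails with error 6 (device gone), yet sync reports 0. */
        printf("toy_syncvnode returned %d\n", toy_syncvnode(bufs, 2, true, 6));
        toy_invalbuf(bufs, 2);
        return (0);
}
---

Built with plain cc, this prints that the sync "succeeded" (returned 0) while both buffers are still dirty, which is exactly the situation the vinvalbuf panic message complains about.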