savecore -d
System dump time: Sat May 10 11:51:56 2008
Constructing namelist /var/crash/x4500/unix.1
Constructing corefile /var/crash/x4500/vmcore.1
45% done
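Following the dumpadm advice quoted further down, a quick check of where the next dump will go can be scripted. This is a minimal Python 3.7+ sketch, not something used in this thread. It assumes that dumpadm(1M) run with no arguments prints one "Label: value" setting per line (such as "Dump device:" and "Savecore enabled:"), and the function names are hypothetical. It can only flag that the dump device is a swap slice; whether that slice is one side of a swap mirror still has to be checked separately.

```python
import subprocess
from typing import Dict, List


def parse_dumpadm(output: str) -> Dict[str, str]:
    """Parse 'Label: value' lines as printed by dumpadm with no options.

    Assumption: one setting per line, label and value separated by the
    first colon, e.g. "Dump device: /dev/dsk/c0t0d0s1 (swap)".
    """
    settings = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            settings[label.strip()] = value.strip()
    return settings


def dump_warnings(settings: Dict[str, str]) -> List[str]:
    """Return human-readable warnings about a parsed dump configuration."""
    warnings = []
    device = settings.get("Dump device", "")
    if not device:
        warnings.append("no dump device reported")
    elif "(swap)" in device:
        # dumpadm does not say whether this slice belongs to a swap mirror;
        # that has to be confirmed by hand.
        warnings.append(f"dump device {device.split()[0]} is a swap slice; "
                        "confirm it is not one side of a swap mirror")
    if settings.get("Savecore enabled", "").lower() != "yes":
        warnings.append("savecore is not enabled at boot")
    return warnings


def current_dump_warnings() -> List[str]:
    """Run dumpadm (Solaris only, normally as root) and check its settings."""
    result = subprocess.run(["dumpadm"], capture_output=True, text=True,
                            check=True)
    return dump_warnings(parse_dumpadm(result.stdout))
```

On the server itself, current_dump_warnings() returns an empty list when neither check fires; it is meant to complement, not replace, the deliberate test panic suggested below.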
When we get the second x4500 in, we can do more testing in that area. But more importantly, we need to try to work out why the UFS file systems exported over NFS failed to recover properly. They are mounted "hard", so IO should wait, and yet it seems to just fail the IO and make the mount point invisible. Logging in to each and every server to remount the file systems is somewhat tedious.

# df -h
Filesystem             size   used  avail capacity  Mounted on
/                       64G    24G    39G    39%    /
/dev                    64G    24G    39G    39%    /dev
proc                     0K     0K     0K     0%    /proc
[snip]
swap                   1.1G    12K   1.1G     1%    /var/run
df: cannot statvfs /export/test: No such file or directory

# ls -l /export/
test-www01:~# ls -la /export/
total 16
drwxr-xr-x   6 root     sys          512 Mar 26 16:09 .
drwxr-xr-x  19 root     root         512 Mar 19 11:30 ..
drwxr-xr-x  23 root     root         512 Apr 14 11:54 home
drwxr-xr-x   2 root     root         512 Mar 17 16:10 nfs

No "test" directory there.

# mount
/export/test on x4500-01-vip:/export/test remote/read/write/setuid/nodevices/vers=3/hard/intr/quota/xattr/dev=4700002 on Tue Mar 25 11:10:52 2008
# mkdir -p /export/test/roo
mkdir: "/export/test/roo": No such file or directory
# umount /export/test
# mount /export/test
# df -h
test-x4500-01-vip:/export/test
                        98G   4.1G    93G     5%    /export/test

More info from the panic:

> $c
top_end_sync+0xcb(fffffffedefea000, ffffff001f4ca424, b, 0)
ufs_fsync+0x1cb(ffffffff3659f980, 10000, ffffffff6f0ccc70)
fop_fsync+0x51(ffffffff3659f980, 10000, ffffffff6f0ccc70)
rfs3_create+0x604(ffffff001f4ca7c8, ffffff001f4ca8b8, ffffff04e7627d80, ffffff001f4cab20, ffffffff6f0ccc70)
common_dispatch+0x444(ffffff001f4cab20, ffffffffa71cc1c0, 2, 4, fffffffff8553a78, ffffffffc039d3d0)
rfs_dispatch+0x2d(ffffff001f4cab20, ffffffffa71cc1c0)
svc_getreq+0x1c6(ffffffffa71cc1c0, fffffffec69bddc0)
svc_run+0x171(fffffffecc7581c0)
svc_do_run+0x85(1)
nfssys+0x748(e, fec80fc8)
sys_syscall32+0x101()
> ::panicinfo
cpu      3
thread   ffffffff17b8c820
message  BAD TRAP: type=e (#pf Page fault) rp=ffffff001f4ca220 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
rdi      fffffffedefea000
rsi      9
rdx      0
rcx      ffffffff17b8c820
r8       0
r9       ffffff054797dc48
rax      0
rbx      97eaffc
rbp      ffffff001f4ca350
r10      0
r11      fffffffec8b93868
r12      27991000
r13      fffffffed1b59c00
r14      fffffffecf8d8cc0
r15      1000
fsbase   0
gsbase   fffffffec3d5a580
ds       4b
es       4b
fs       0
gs       1c3
trapno   e
err      10
rip      0
cs       30
rflags   10246
rsp      ffffff001f4ca318
ss       38
gdt_hi   0
gdt_lo   500001ef
idt_hi   0
idt_lo   40000fff
ldt      0
task     70
cr0      8005003b
cr2      0
cr3      1fcbbc000
cr4      6f8

Nathan Kroenert - Server ESG wrote:
> Dumping to /dev/dsk/c6t0d0s1
>
> certainly looks like a non-mirrored dump dev...
>
> You might try a manual savecore telling it to ignore the dump valid
> header and see what you get...
>
> savecore -d
>
> and perhaps try telling it to look directly at the dump device...
>
> savecore -f <device>
>
> You should also, when you get the chance, deliberately panic the box to
> make sure you can actually capture a dump...
>
> dumpadm is your friend as far as checking where you are going to dump
> to, and if it's one side of your swap mirror, that's bad, M'Kay?
>
> :)
>
> Nathan.
>
> Jorgen Lundman wrote:
>> OK, this is a pretty damn poor panic report if I may say so; I've not
>> had much sleep.
>>
>> Solaris Express Developer Edition 9/07 snv_70b X86
>> Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
>> Use is subject to license terms.
>> Assembled 30 August 2007
>>
>> SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
>>
>> Even though it dumped, it wrote nothing to /var/crash/. Perhaps
>> because swap is mirrored.
>>
>> Jorgen Lundman wrote:
>>> We had a panic around noon on Saturday, from which it mostly recovered
>>> by itself. All ZFS NFS exports just remounted, but the UFS on zdev NFS
>>> exports did not, and needed a manual umount && mount on all clients for
>>> some reason.
>>>
>>> Is this a known bug we should consider a patch for?
>>>
>>> May 10 11:49:46 x4500-01.unix ufs: [ID 912200 kern.notice] quota_ufs: over hard disk limit (pid 477, uid 127409, inum 1047211, fs /export/zero1)
>>> May 10 11:51:26 x4500-01.unix unix: [ID 836849 kern.notice]
>>> May 10 11:51:26 x4500-01.unix ^Mpanic[cpu3]/thread=ffffffff17b8c820:
>>> May 10 11:51:26 x4500-01.unix genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff001f4ca220 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
>>> May 10 11:51:26 x4500-01.unix unix: [ID 100000 kern.notice]
>>> May 10 11:51:26 x4500-01.unix unix: [ID 839527 kern.notice] nfsd:
>>> May 10 11:51:26 x4500-01.unix unix: [ID 753105 kern.notice] #pf Page fault
>>> May 10 11:51:26 x4500-01.unix unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x0
>>> May 10 11:51:26 x4500-01.unix unix: [ID 243837 kern.notice] pid=477, pc=0x0, sp=0xffffff001f4ca318, eflags=0x10246
>>> May 10 11:51:26 x4500-01.unix unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
>>> May 10 11:51:26 x4500-01.unix unix: [ID 354241 kern.notice] cr2: 0 cr3: 1fcbbc000 cr8: c
>>> May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] rdi: fffffffedefea000 rsi: 9 rdx: 0
>>> May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] rcx: ffffffff17b8c820 r8: 0 r9: ffffff054797dc48
>>> May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] rax: 0 rbx: 97eaffc rbp: ffffff001f4ca350
>>> May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] r10: 0 r11: fffffffec8b93868 r12: 27991000
>>> May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] r13: fffffffed1b59c00 r14: fffffffecf8d8cc0 r15: 1000
>>> May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] fsb: 0 gsb: fffffffec3d5a580 ds: 4b
>>> May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
>>> May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] trp: e err: 10 rip: 0
>>> May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] cs: 30 rfl: 10246 rsp: ffffff001f4ca318
>>> May 10 11:51:27 x4500-01.unix unix: [ID 266532 kern.notice] ss: 38
>>> May 10 11:51:27 x4500-01.unix unix: [ID 100000 kern.notice]
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca100 unix:die+c8 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca210 unix:trap+135b ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca220 unix:_cmntrap+e9 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 802836 kern.notice] ffffff001f4ca350 0 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca3d0 ufs:top_end_sync+cb ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca440 ufs:ufs_fsync+1cb ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca490 genunix:fop_fsync+51 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4ca770 nfssrv:rfs3_create+604 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4caa70 nfssrv:common_dispatch+444 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4caa90 nfssrv:rfs_dispatch+2d ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4cab80 rpcmod:svc_getreq+1c6 ()
>>> May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4cabf0 rpcmod:svc_run+171 ()
>>> May 10 11:51:28 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4cac30 rpcmod:svc_do_run+85 ()
>>> May 10 11:51:28 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4caec0 nfs:nfssys+748 ()
>>> May 10 11:51:28 x4500-01.unix genunix: [ID 655072 kern.notice] ffffff001f4caf10 unix:brand_sys_syscall32+1a3 ()
>>> May 10 11:51:28 x4500-01.unix unix: [ID 100000 kern.notice]
>>> May 10 11:51:28 x4500-01.unix genunix: [ID 672855 kern.notice] syncing file systems...
>>> May 10 11:51:28 x4500-01.unix genunix: [ID 733762 kern.notice] 8
>>> May 10 11:51:29 x4500-01.unix genunix: [ID 733762 kern.notice] 5
>>> May 10 11:51:30 x4500-01.unix genunix: [ID 733762 kern.notice] 2
>>> May 10 11:51:54 x4500-01.unix last message repeated 20 times
>>> May 10 11:51:55 x4500-01.unix genunix: [ID 622722 kern.notice] done (not all i/o completed)
>>> May 10 11:51:56 x4500-01.unix genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c6t0d0s1, offset 65536, content: kernel
>>>
>>
>

-- 
Jorgen Lundman       | <[EMAIL PROTECTED]>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss