On Tue, Dec 30, 2008 at 10:46 AM, Magnus Bergman <m...@citynetwork.se> wrote:
> Hi again,
>
> No ideas? I have spent quite some time trying to recover, but no luck
> yet. Any ideas or hints on recovery would be great! I'm soon running
> out of time and will have to rebuild the zones and restore the data,
> but I'd much rather be able to recover it from the datasets.
>
> Since posting the initial question I have identified one more dataset
> that has the same problem and can't be mounted. Basically it's the two
> datasets with the most disk activity.
>
> Regards //Magnus
>
> Begin forwarded message:
>
>> From: Magnus Bergman <m...@citynetwork.se>
>> Date: December 28, 2008 18:11:44 GMT+01:00
>> To: zfs-discuss@opensolaris.org
>> Subject: [zfs-discuss] zfs mount hangs
>>
>> Hi,
>>
>> System: Netra 1405, 4x450MHz, 4GB RAM, 2x146GB (root pool) and
>> 2x146GB (space pool). snv_98.
>>
>> After a panic the system hangs on boot, and manual attempts to mount
>> (at least) one dataset in single-user mode hang as well.
>>
>> The panic:
>>
>> Dec 27 04:42:11 base ^Mpanic[cpu0]/thread=300021c1a20:
>> Dec 27 04:42:11 base unix: [ID 521688 kern.notice] [AFT1] errID 0x00167f73.1c737868 UE Error(s)
>> Dec 27 04:42:11 base See previous message(s) for details
>> Dec 27 04:42:11 base unix: [ID 100000 kern.notice]
>> Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433efc0 SUNW,UltraSPARC-II:cpu_aflt_log+5b4 (3, 2a10433f208, 2a10433f2e0, 10, 2a10433f207, 2a10433f208)
>> Dec 27 04:42:11 base genunix: [ID 179002 kern.notice] %l0-3: 000002a10433f0cb 00000000000f0000 00000000012ccc00 00000000012cd000
>> Dec 27 04:42:11 base %l4-7: 000002a10433f208 0000000000000170 00000000012ccc00 0000000000000001
>> Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433f210 SUNW,UltraSPARC-II:cpu_async_error+cdc (7fe00000, 0, 180200000, 40, 0, a0b7ff60)
>> Dec 27 04:42:11 base genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 000000000180c000 0000000000000000 000002a10433f3d4
>> Dec 27 04:42:11 base %l4-7: 00000000012cc400 000000007e600000 00000000012cc400 0000000000000001
>> Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433f410 unix:ktl0+48 (2a10433fec0, 2a10433ff80, 180e580, 6, 180c000, 1800000)
>> Dec 27 04:42:11 base genunix: [ID 179002 kern.notice] %l0-3: 0000000000000002 0000000000001400 0000000080001601 00000000012c1578
>> Dec 27 04:42:11 base %l4-7: 0000000ae394c629 0000060017a32260 000000000000000b 000002a10433f4c0
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f560 unix:resume+240 (300021c1a20, 180c000, 1835c40, 6001c1f20c8, 16, 30001e4cc40)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000000 00000180048279c0 000002a1035dbca0
>> Dec 27 04:42:12 base %l4-7: 0000000000000001 0000000001867800 0000000025be86dc 00000000018bbc00
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f610 genunix:cv_wait+3c (3001365ba10, 3001365ba10, 1, 18d0c00, c44000, 0)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice] %l0-3: 0000000000c44002 00000000018d0e58 0000000000000001 0000000000c44002
>> Dec 27 04:42:12 base %l4-7: 0000000000000000 0000000000000001 0000000000000002 0000000001326e5c
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f6c0 zfs:zio_wait+30 (3001365b778, 6001cdcf7e8, 3001365ba18, 3001365ba10, 30034dc1f48, 1)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice] %l0-3: 000006001cdcf7f0 000000000000ffff 0000000000000100 000000000000fc00
>> Dec 27 04:42:12 base %l4-7: 00000000018d7000 000000000c6eefd9 000000000c6eefd8 000000000c6eefd8
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f770 zfs:zil_commit_writer+2d0 (6001583be00, 4b0, 1b1a4d54, 42a03, cfc67, 0)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice] %l0-3: 0000060018b5d068 ffffffffffffffff 0000060010ce1040 000006001583be88
>> Dec 27 04:42:12 base %l4-7: 0000060013760380 00000000000000c0 000003002bf81138 000003001365b778
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f820 zfs:zil_commit+68 (6001583be00, 1b1a5ae5, 38bc5, 6001583be7c, 1b1a5ae5, 0)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000001 00000600177fe080 000006001c1f2ad8
>> Dec 27 04:42:13 base %l4-7: 00000000000001c0 0000000000000001 0000060010c78000 0000000000000000
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f8d0 zfs:zfs_fsync+f8 (18e5800, 0, 134fc00, 3001c2c4860, 134fc00, 134fc00)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice] %l0-3: 000003001e94d948 0000000000010000 000000000180c008 0000000000000008
>> Dec 27 04:42:13 base %l4-7: 0000060013760458 0000000000000000 000000000134fc00 00000000018d2000
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f980 genunix:fop_fsync+40 (300131ed600, 10, 60011c08b68, 0, 60010c77200, 30028320b40)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice] %l0-3: 0000060011f6c828 0000000000000007 000006001c1f20c8 00000000013409d8
>> Dec 27 04:42:13 base %l4-7: 0000000000000000 0000000000000001 0000000000000000 00000000018bcc00
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433fa30 genunix:fdsync+40 (7, 10, 0, 184, 10, 30007adda40)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 000000000000f071 00000000f0710000 000000000000f071
>> Dec 27 04:42:13 base %l4-7: 0000000000000001 000000000180c000 0000000000000000 0000000000000000
>> Dec 27 04:42:14 base unix: [ID 100000 kern.notice]
>> Dec 27 04:42:14 base genunix: [ID 672855 kern.notice] syncing file systems...
>> Dec 27 04:42:14 base genunix: [ID 904073 kern.notice] done
>> Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 201454 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x00167f73.1c737868
>> Dec 27 04:42:15 base AFSR 0x00000001<ME>.80200000<PRIV,UE> AFAR 0x00000000.a0b7ff60
>> Dec 27 04:42:15 base AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x101b708
>> Dec 27 04:42:15 base UDBH 0x0203<UE> UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
>> Dec 27 04:42:15 base UDBH Syndrome 0x3 Memory Module U1402 U0402 U1401 U0401
>> Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 325743 kern.warning] WARNING: [AFT1] errID 0x00167f73.1c737868 Syndrome 0x3 indicates that this may not be a memory module problem
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 151010 kern.info] [AFT2] errID 0x00167f73.1c737868 PA=0x00000000.a0b7ff60
>> Dec 27 04:42:16 base E$tag 0x00000000.1cc01416 E$State: Exclusive E$parity 0x0e
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x0070ba48.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00ec48b9.495349e1
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x4955a237.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x20): 0x00000800.00000000 *Bad* PSYND=0xff00
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x0070ba28.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x027a7ea6.494f4aeb
>> Dec 27 04:47:56 base genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_98 64-bit
>> Dec 27 04:47:56 base genunix: [ID 172908 kern.notice] Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
>> Dec 27 04:47:56 base Use is subject to license terms.
>>
>> My guess would be a broken CPU, maybe the old E-cache problem...
>>
>> Anyway, "zfs mount space" works fine, but "zfs mount space/postfix" hangs.
>> A look at the zfs process shows:
>>
>> # echo "0t236::pid2proc|::walk thread|::findstack -v" | mdb -k
>> stack pointer for thread 30001cecc00: 2a100fa2181
>> [ 000002a100fa2181 cv_wait+0x3c() ]
>>   000002a100fa2231 txg_wait_open+0x58(60014aa1158, d000b, 0, 60014aa119c, 60014aa119e, 60014aa1150)
>>   000002a100fa22e1 dmu_tx_assign+0x3c(60022dd3780, 1, 7, 60013cd5918, 5b, 1)
>>   000002a100fa2391 dmu_free_long_range_impl+0xc4(600245fbdb0, 60025f69750, 0, 400, 0, 1)
>>   000002a100fa2451 dmu_free_long_range+0x44(600245fbdb0, 43b12, 0, ffffffffffffffff, 1348800, 0)
>>   000002a100fa2511 zfs_rmnode+0x68(60025bb6f20, 12, 600243af9e0, 1, 600243af880, 600245fbdb0)
>>   000002a100fa25d1 zfs_inactive+0x134(600243af988, 0, 60025f6fef8, 4000, 420, 60025bb6f20)
>>   000002a100fa2681 zfs_rename+0x73c(6002401e400, 40800000004, 6002401e400, 60021860041, 60022dd3780, 60025bb6fe8)
>>   000002a100fa27c1 fop_rename+0xac(6002401e400, 60021860030, 6002401e400, 60021860041, 60010c03e08, 0)
>>   000002a100fa2881 zfs_replay_rename+0xb4(18bbc00, 6002400e8b0, 0, 60014a94000, 0, 0)
>>   000002a100fa2951 zil_replay_log_record+0x244(18d1ed0, 60017108000, 2a100fa3450, 0, 6002347fc80, 60014a94000)
>>   000002a100fa2a41 zil_parse+0x160(58, 132573c, 13253a4, 2a100fa3450, cff2c, 1978d7)
>>   000002a100fa2ba1 zil_replay+0xa4(9050200ff00ff, 600243af880, 600243af8b0, 40000, 60022ad91d8, 6002347fc80)
>>   000002a100fa2c81 zfsvfs_setup+0x94(600243af880, 1, 18d1c00, 600151e8400, 18d0c00, 0)
>>   000002a100fa2d31 zfs_domount+0x2dc(60011f08d08, 60022afe480, 60011f08d08, 600243af890, 0, 400)
>>   000002a100fa2e11 zfs_mount+0x1ec(60011f08d08, 6002401e200, 2a100fa39d8, 100, 0, 2)
>>   000002a100fa2f71 domount+0xaf0(100, 1, 6002401e200, 8077, 60011f08d08, 0)
>>   000002a100fa3121 mount+0xec(60023dd7388, 2a100fa3ad8, 0, ff104ed8, 100, 45bd0)
>>   000002a100fa3221 syscall_ap+0x44(2a0, ffbfe8a8, 115b9e8, 60023dd72d0, 15, 0)
>>   000002a100fa32e1 syscall_trap32+0xcc(45bd0, ffbfe8a8, 100, ff104ed8, 0, 0)
>>
>> zpool status and fmdump don't indicate any problems.
>>
>> Is there any possibility to recover the dataset? I do have backups of
>> all data, but I would really like to be able to recover it to save
>> some time.
>>
>> Anything special to look for in the zdb output? Any other diagnostics
>> that would be useful?
>>
>> Thanks in advance!
>>
>> Best Regards //Magnus
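For anyone who wants to reproduce that capture, here is a rough sketch of the
steps, assuming a hung "zfs mount" process and the pool/dataset names from the
forwarded message (space/postfix); zdb flags and output differ between builds,
so treat this as a starting point rather than a recipe:

  # ps -ef | grep 'zfs mount'
      (note the PID of the hung zfs command; it happened to be 236 above)
  # echo "0t<PID>::pid2proc|::walk thread|::findstack -v" | mdb -k
      (kernel stacks for every thread of that process, as shown above)
  # zdb -iv space/postfix
      (dump the dataset's intent log, if your zdb build supports the -i
       option; the stack above is stuck in zil_replay, so the log chain
       is the first thing worth inspecting)
  # zfs mount -o ro space/postfix
      (untested idea: a read-only mount may defer log replay until a later
       read-write remount, which would at least make the data readable)

Since the mounts hang rather than fail, grabbing the mdb stack before and
after each attempt is probably worth more than the attempt itself.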
I had a similar problem, but did not run truss to find the cause, as it
was not a live filesystem yet. Recreating the filesystem with the same
name resulted in it not mounting and just hanging, but if I created it
with a different name it would mount and run perfectly fine. I settled
on the new name, continued on, and have not noticed the problem again.
But seeing this post, I'll capture as much data as I can if it happens
again.

--
Brent Jones
br...@servuhome.net
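For the archives, a minimal sketch of the workaround Brent describes, using
the dataset names from Magnus's thread purely as placeholders; it sidesteps
the hung mount by abandoning the affected dataset and restoring from backup
rather than recovering the stuck log:

  # zfs create space/postfix2
      (a dataset under a new name mounts normally, per Brent's experience)
  # ... restore the backed-up data into /space/postfix2 and repoint the
        zone/service at the new dataset ...
  # zfs destroy space/postfix
      (retire the dataset whose mount hangs; destroy does not need the
       dataset to be mounted, but grab the mdb stack first if it stalls too)

Anything written to the old dataset since the last backup is lost, which is
exactly the time Magnus is hoping to save, so this is a last resort rather
than a recovery.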