On Tue, Dec 30, 2008 at 10:46 AM, Magnus Bergman <m...@citynetwork.se> wrote:
> Hi again,
>
> No ideas? I have spent quite some time trying to recover, but no luck
> yet. Any ideas or hints on recovery would be great! I'm soon running
> out of time and will have to rebuild the zones and restore the data,
> but I'd much rather recover it from the datasets.
>
> Since posting the initial question I have identified one more dataset
> that has the same problem and can't be mounted. Basically it's the two
> datasets with the most disk activity.
>
> Regards //Magnus
>
> Begin forwarded message:
>
>> From: Magnus Bergman <m...@citynetwork.se>
>> Date: December 28, 2008 18:11:44  GMT+01:00
>> To: zfs-discuss@opensolaris.org
>> Subject: [zfs-discuss] zfs mount hangs
>>
>> Hi,
>>
>> System: Netra 1405, 4x450MHz, 4GB RAM, 2x146GB (root pool) and
>> 2x146GB (space pool). snv_98.
>>
>> After a panic, the system hangs on boot, and manual attempts to mount
>> (at least) one dataset in single-user mode hang as well.
>>
>> The Panic:
>>
>> Dec 27 04:42:11 base ^Mpanic[cpu0]/thread=300021c1a20:
>> Dec 27 04:42:11 base unix: [ID 521688 kern.notice] [AFT1] errID
>> 0x00167f73.1c737868 UE Error(s)
>> Dec 27 04:42:11 base     See previous message(s) for details
>> Dec 27 04:42:11 base unix: [ID 100000 kern.notice]
>> Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433efc0
>> SUNW,UltraSPARC-II:cpu_aflt_log+5b4 (3, 2a10433f208, 2a10433f2e0, 10,
>> 2a10433f207, 2a10433f208)
>> Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 000002a10433f0cb 00000000000f0000 00000000012ccc00 00000000012cd000
>> Dec 27 04:42:11 base   %l4-7: 000002a10433f208 0000000000000170
>> 00000000012ccc00 0000000000000001
>> Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433f210
>> SUNW,UltraSPARC-II:cpu_async_error+cdc (7fe00000, 0, 180200000, 40, 0,
>> a0b7ff60)
>> Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000000000000000 000000000180c000 0000000000000000 000002a10433f3d4
>> Dec 27 04:42:11 base   %l4-7: 00000000012cc400 000000007e600000
>> 00000000012cc400 0000000000000001
>> Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433f410
>> unix:ktl0+48 (2a10433fec0, 2a10433ff80, 180e580, 6, 180c000, 1800000)
>> Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000000000000002 0000000000001400 0000000080001601 00000000012c1578
>> Dec 27 04:42:11 base   %l4-7: 0000000ae394c629 0000060017a32260
>> 000000000000000b 000002a10433f4c0
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f560
>> unix:resume+240 (300021c1a20, 180c000, 1835c40, 6001c1f20c8, 16,
>> 30001e4cc40)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000000000000000 0000000000000000 00000180048279c0 000002a1035dbca0
>> Dec 27 04:42:12 base   %l4-7: 0000000000000001 0000000001867800
>> 0000000025be86dc 00000000018bbc00
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f610
>> genunix:cv_wait+3c (3001365ba10, 3001365ba10, 1, 18d0c00, c44000, 0)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000000000c44002 00000000018d0e58 0000000000000001 0000000000c44002
>> Dec 27 04:42:12 base   %l4-7: 0000000000000000 0000000000000001
>> 0000000000000002 0000000001326e5c
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f6c0
>> zfs:zio_wait+30 (3001365b778, 6001cdcf7e8, 3001365ba18, 3001365ba10,
>> 30034dc1f48, 1)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 000006001cdcf7f0 000000000000ffff 0000000000000100 000000000000fc00
>> Dec 27 04:42:12 base   %l4-7: 00000000018d7000 000000000c6eefd9
>> 000000000c6eefd8 000000000c6eefd8
>> Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f770
>> zfs:zil_commit_writer+2d0 (6001583be00, 4b0, 1b1a4d54, 42a03, cfc67,
>> 0)
>> Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000060018b5d068 ffffffffffffffff 0000060010ce1040 000006001583be88
>> Dec 27 04:42:12 base   %l4-7: 0000060013760380 00000000000000c0
>> 000003002bf81138 000003001365b778
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f820
>> zfs:zil_commit+68 (6001583be00, 1b1a5ae5, 38bc5, 6001583be7c,
>> 1b1a5ae5, 0)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000000000000001 0000000000000001 00000600177fe080 000006001c1f2ad8
>> Dec 27 04:42:13 base   %l4-7: 00000000000001c0 0000000000000001
>> 0000060010c78000 0000000000000000
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f8d0
>> zfs:zfs_fsync+f8 (18e5800, 0, 134fc00, 3001c2c4860, 134fc00, 134fc00)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 000003001e94d948 0000000000010000 000000000180c008 0000000000000008
>> Dec 27 04:42:13 base   %l4-7: 0000060013760458 0000000000000000
>> 000000000134fc00 00000000018d2000
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f980
>> genunix:fop_fsync+40 (300131ed600, 10, 60011c08b68, 0, 60010c77200,
>> 30028320b40)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000060011f6c828 0000000000000007 000006001c1f20c8 00000000013409d8
>> Dec 27 04:42:13 base   %l4-7: 0000000000000000 0000000000000001
>> 0000000000000000 00000000018bcc00
>> Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433fa30
>> genunix:fdsync+40 (7, 10, 0, 184, 10, 30007adda40)
>> Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:
>> 0000000000000000 000000000000f071 00000000f0710000 000000000000f071
>> Dec 27 04:42:13 base   %l4-7: 0000000000000001 000000000180c000
>> 0000000000000000 0000000000000000
>> Dec 27 04:42:14 base unix: [ID 100000 kern.notice]
>> Dec 27 04:42:14 base genunix: [ID 672855 kern.notice] syncing file
>> systems...
>> Dec 27 04:42:14 base genunix: [ID 904073 kern.notice]  done
>> Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 201454 kern.warning]
>> WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at
>> TL=0, errID 0x00167f73.1c737868
>> Dec 27 04:42:15 base     AFSR 0x00000001<ME>.80200000<PRIV,UE> AFAR
>> 0x00000000.a0b7ff60
>> Dec 27 04:42:15 base     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00
>> Fault_PC 0x101b708
>> Dec 27 04:42:15 base     UDBH 0x0203<UE> UDBH.ESYND 0x03 UDBL 0x0000
>> UDBL.ESYND 0x00
>> Dec 27 04:42:15 base     UDBH Syndrome 0x3 Memory Module U1402 U0402
>> U1401 U0401
>> Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 325743 kern.warning]
>> WARNING: [AFT1] errID 0x00167f73.1c737868 Syndrome 0x3 indicates that
>> this may not be a memory module problem
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 151010 kern.info] [AFT2]
>> errID 0x00167f73.1c737868 PA=0x00000000.a0b7ff60
>> Dec 27 04:42:16 base     E$tag 0x00000000.1cc01416 E$State: Exclusive
>> E$parity 0x0e
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x00): 0x0070ba48.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x08): 0x00000000.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x10): 0x00ec48b9.495349e1
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x18): 0x4955a237.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
>> E$Data (0x20): 0x00000800.00000000 *Bad* PSYND=0xff00
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x28): 0x0070ba28.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x30): 0x00000000.00000000
>> Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x38): 0x027a7ea6.494f4aeb
>> Dec 27 04:47:56 base genunix: [ID 540533 kern.notice] ^MSunOS Release
>> 5.11 Version snv_98 64-bit
>> Dec 27 04:47:56 base genunix: [ID 172908 kern.notice] Copyright
>> 1983-2008 Sun Microsystems, Inc.  All rights reserved.
>> Dec 27 04:47:56 base Use is subject to license terms.
>>
>> My guess would be a broken CPU, maybe the old E-cache problem...
>>
>> Anyway, "zfs mount space" works fine, but "zfs mount space/postfix"
>> hangs. A look at the zfs process shows:
>>
>> # echo "0t236::pid2proc|::walk thread|::findstack -v" | mdb -k
>> stack pointer for thread 30001cecc00: 2a100fa2181
>> [ 000002a100fa2181 cv_wait+0x3c() ]
>>   000002a100fa2231 txg_wait_open+0x58(60014aa1158, d000b, 0,
>> 60014aa119c,
>>   60014aa119e, 60014aa1150)
>>   000002a100fa22e1 dmu_tx_assign+0x3c(60022dd3780, 1, 7, 60013cd5918,
>> 5b, 1)
>>   000002a100fa2391 dmu_free_long_range_impl+0xc4(600245fbdb0,
>> 60025f69750, 0,
>>   400, 0, 1)
>>   000002a100fa2451 dmu_free_long_range+0x44(600245fbdb0, 43b12, 0,
>>   ffffffffffffffff, 1348800, 0)
>>   000002a100fa2511 zfs_rmnode+0x68(60025bb6f20, 12, 600243af9e0, 1,
>> 600243af880
>>   , 600245fbdb0)
>>   000002a100fa25d1 zfs_inactive+0x134(600243af988, 0, 60025f6fef8,
>> 4000, 420,
>>   60025bb6f20)
>>   000002a100fa2681 zfs_rename+0x73c(6002401e400, 40800000004,
>> 6002401e400,
>>   60021860041, 60022dd3780, 60025bb6fe8)
>>   000002a100fa27c1 fop_rename+0xac(6002401e400, 60021860030,
>> 6002401e400,
>>   60021860041, 60010c03e08, 0)
>>   000002a100fa2881 zfs_replay_rename+0xb4(18bbc00, 6002400e8b0, 0,
>> 60014a94000,
>>   0, 0)
>>   000002a100fa2951 zil_replay_log_record+0x244(18d1ed0, 60017108000,
>> 2a100fa3450
>>   , 0, 6002347fc80, 60014a94000)
>>   000002a100fa2a41 zil_parse+0x160(58, 132573c, 13253a4, 2a100fa3450,
>> cff2c,
>>   1978d7)
>>   000002a100fa2ba1 zil_replay+0xa4(9050200ff00ff, 600243af880,
>> 600243af8b0,
>>   40000, 60022ad91d8, 6002347fc80)
>>   000002a100fa2c81 zfsvfs_setup+0x94(600243af880, 1, 18d1c00,
>> 600151e8400,
>>   18d0c00, 0)
>>   000002a100fa2d31 zfs_domount+0x2dc(60011f08d08, 60022afe480,
>> 60011f08d08,
>>   600243af890, 0, 400)
>>   000002a100fa2e11 zfs_mount+0x1ec(60011f08d08, 6002401e200,
>> 2a100fa39d8, 100, 0
>>   , 2)
>>   000002a100fa2f71 domount+0xaf0(100, 1, 6002401e200, 8077,
>> 60011f08d08, 0)
>>   000002a100fa3121 mount+0xec(60023dd7388, 2a100fa3ad8, 0, ff104ed8,
>> 100, 45bd0
>>   )
>>   000002a100fa3221 syscall_ap+0x44(2a0, ffbfe8a8, 115b9e8,
>> 60023dd72d0, 15, 0)
>>   000002a100fa32e1 syscall_trap32+0xcc(45bd0, ffbfe8a8, 100,
>> ff104ed8, 0, 0)
>>
>>
>> zpool status and fmdump don't indicate any problems.
>>
>> Is there any possibility of recovering the dataset? I do have backups
>> of all the data, but I would really like to recover it to save some time.
>>
>> Anything special to look for in zdb output? Any other diagnostics that
>> would be useful?
>>
>> Thanks in advance!
>>
>> Best Regards //Magnus
>>
>>
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>

I had a similar problem, but did not run truss to find the cause, as it
was not a live filesystem yet.
Recreating the filesystem with the same name resulted in it not
mounting and just hanging, but if I created it with a different name
it would mount and run perfectly fine.
I settled on the new name, carried on, and have not noticed the
problem since.
But seeing this post, I'll capture as much data as I can if it happens again.
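For reference, the rename workaround described above might look roughly
like the following sketch. The dataset names are taken from the original
report ("space/postfix") plus a hypothetical replacement name; this is not
a tested recipe, since on an affected system the original mount simply
hangs and the commands require a live pool:

```shell
# Sketch of the workaround, not a tested recipe.
# "space/postfix" is the unmountable dataset from the report;
# "space/postfix2" is a hypothetical new name.

# The original dataset hangs on mount (ZIL replay never completes):
#   zfs mount space/postfix      # <- hangs

# Workaround reported above: create a fresh dataset under a new name
# and restore the data into it from backup.
zfs create space/postfix2
# ... restore data from backup into /space/postfix2 ...

# Once satisfied with the restore, the unmountable dataset can be
# destroyed (this is irreversible):
#   zfs destroy space/postfix
```

Note the caveat in Brent's report: recreating the filesystem under the
*same* name reportedly hung again, so the new name appears to matter here.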

-- 
Brent Jones
br...@servuhome.net
