Hello ZFS mailing list,
We are using ZFS (over iSCSI) on OpenSolaris build 57.
Today we encountered two crashes during a ZFS send/receive operation.
We were trying to replicate a snapshot with the built-in zfs send and
zfs receive tools.
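For readers unfamiliar with this kind of replication, here is a minimal sketch of the pipeline. It assumes the stream is piped to the receiving machine over ssh, which is only one possible transport. The helper name build_replication_cmd and all dataset, snapshot, and host names are hypothetical placeholders, and the helper only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints a zfs send/receive pipeline instead of executing it.
# Every name passed in is a hypothetical placeholder.
build_replication_cmd() {
    snapshot=$1      # source snapshot, e.g. stor/data@snap1
    target_host=$2   # receiving host, assumed reachable over ssh
    target_fs=$3     # dataset to receive into on the target
    printf 'zfs send %s | ssh %s zfs receive %s\n' \
        "$snapshot" "$target_host" "$target_fs"
}
```

Calling build_replication_cmd stor/data@snap1 backuphost backup/data prints the pipeline so it can be reviewed before it is run by hand.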
When we analyzed the resulting crash dump files, we found that the crash
was ZFS related.
bash-3.00# mdb -k unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15
uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba fctl nca
lofs zfs random sppp cpc fcip crypto fcp logindmux ptm ipc nfs ]
> ::status
debugging crash dump vmcore.0 (64-bit) from NAS002
operating system: 5.11 snv_57 (i86pc)
panic message:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff3017b300 [L0
ZFS plain file] 20000L/20000P DVA[0]=<0:3b98ed1e800:25800> fletcher2
uncompressed LE contiguous birth=806063 fill=1 cksum=a487e32d
dump content: kernel pages only
So we ran zpool status -v and found that this snapshot (the one
we were trying to replicate) had 28 permanent errors:
bash-3.00# zpool status -v
  pool: home
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        home        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s7  ONLINE       0     0     0
            c2d0s7  ONLINE       0     0     0

errors: No known data errors

  pool: stor
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        stor        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c4t8d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
stor/[EMAIL PROTECTED]:01:00:/1003/kreos11/HB1030/
C_Root/Documents and Settings/bvp/My Documents/My Pictures/
confidential/tconfidential/confidential/96
stor/[EMAIL PROTECTED]:01:00:/1003/kreos11/HB1030/
C_Root/Documents and Settings/bvp/My Documents/My Pictures//
confidential/tconfidential/confidential/97
....
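
A check like the one above could be scripted so that it runs before each replication. The following is a minimal sketch, assuming the "pool:" and "errors: Permanent errors have been detected" lines look as they do in the output shown here. The function name pools_with_errors is a hypothetical helper, not part of ZFS.

```shell
#!/bin/sh
# Reads `zpool status -v` output on stdin and prints the name of each
# pool whose section reports permanent errors.
# Assumption: the output uses the "pool:" and "errors:" lines shown above.
pools_with_errors() {
    awk '
        # Remember the pool the following lines belong to.
        $1 == "pool:" { pool = $2 }
        # Report that pool when its errors line lists permanent errors.
        /Permanent errors have been detected/ { print pool }
    '
}
```

Typical use would be zpool status -v | pools_with_errors, skipping the replication if anything is printed.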
So we decided to destroy this snapshot and then started another
replication.
This time the server crashed again :-(
What can we do to avoid this kind of problem? Or is this a known bug
in build 57?
Thanks for your reply.
Kristof
_______________________________________________
zfs-discuss mailing list
[EMAIL PROTECTED]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss