On 04/ 2/10 02:52 PM, Andrej Gortchivkin wrote:
Hi All,
I just came across a strange (well... at least for me) situation with ZFS and I
hope you might be able to help me out. Recently I built a new machine from
scratch for my storage needs, which include various CIFS / NFS and, most
importantly, VMware ESX based operations (in conjunction with COMSTAR). The
machine is based on fairly new hardware and runs x86 OpenSolaris b134, with a
RAID-Z pool on top of 4 x 1TB SATA-2 Samsung HDDs plus one additional HDD for
hot-spare purposes.
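(For completeness, the layout I was aiming for is the usual raid-z + hot-spare
setup, i.e. something along the lines of

    zpool create ZPOOL_SAS_1234 raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 spare c7t4d0

-- that's only an approximation pieced together from the device names in the
output below, not the exact command I ran.)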
Yesterday one of the HDDs started producing errors, and while that in itself
didn't surprise me, I was surprised to see permanent errors reported on some
files.
Here is the output I got right after the resilvering finished:
--------------------------------------------------------------------------------------------------
  pool: ZPOOL_SAS_1234
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 2h45m with 4 errors on Fri Apr 2 03:01:34 2010
config:

        NAME              STATE     READ WRITE CKSUM
        ZPOOL_SAS_1234    DEGRADED   381     0     0
          c7t0d0          ONLINE       0     0     0
          c7t1d0          ONLINE       0     0     0
          c7t2d0          ONLINE       0     0     0
          spare-3         DEGRADED   363     0     1
            c7t3d0        DEGRADED   381     0     3  too many errors
            c7t4d0        ONLINE       0     0   730  326G resilvered
        spares
          c7t4d0          INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN1_DATASTORE01
        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN2_DATASTORE02
        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN5_DATASTORE05
--------------------------------------------------------------------------------------------------
Although I'm sure the "c7t3d0" HDD is having some issues (obviously I'm about
to replace it), I still don't understand why I would get corruption in the
files, considering that all the other drives show zero problems in the READ,
WRITE and CKSUM columns. Perhaps I'm missing something about the ZFS concept
and its redundancy, but my understanding of RAID-Z is that it operates much
like RAID-5, which should mean that if one HDD goes down for whatever reason,
the data stored on my ZFS pool / datasets should remain unharmed thanks to the
redundancy.
You don't appear to have any redundancy! How did you create the pool
(should be in "zpool history")?
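In the status you pasted, the data disks sit directly under the top-level pool
with no raidz vdev in between, which is what a plain stripe looks like. Purely
as an illustration (the device names are taken from your output, the actual
command is only a guess), compare:

    # non-redundant stripe, one spare
    zpool create ZPOOL_SAS_1234 c7t0d0 c7t1d0 c7t2d0 c7t3d0 spare c7t4d0

    # single-parity raid-z, one spare
    zpool create ZPOOL_SAS_1234 raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 spare c7t4d0

The second form would show a raidz1 vdev in "zpool status" and survive the loss
of one disk. "zpool history ZPOOL_SAS_1234" will show which of the two you
actually ran.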
--
Ian.