comment below...

On Apr 14, 2010, at 1:49 AM, Richard Skelton wrote:

> Hi,
> I have installed OpenSolaris snv_134 from the iso at genunix.org.
> Mon Mar 8 2010 New OpenSolaris preview, based on build 134
> I created a zpool:-
>        NAME        STATE     READ WRITE CKSUM
>        tank        ONLINE       0     0     0
>          c7t4d0    ONLINE       0     0     0
>          c7t5d0    ONLINE       0     0     0
>          c7t6d0    ONLINE       0     0     0
>          c7t8d0    ONLINE       0     0     0
>          c7t9d0    ONLINE       0     0     0
>        logs
>          c5d1p1    ONLINE       0     0     0
>        cache
>          c5d1p2    ONLINE       0     0     0
> 
> The log device and cache are each one half of a 128GB OCZ VERTEX-TURBO flash
> card.
> 
> I am getting good NFS performance but have seen this error:-
> r...@brszfs02:~# zpool status tank
>  pool: tank
> state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>        Sufficient replicas exist for the pool to continue functioning in a
>        degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>        repaired.
> scrub: none requested
> config:
> 
>        NAME        STATE     READ WRITE CKSUM
>        tank        DEGRADED     0     0     0
>          c7t4d0    ONLINE       0     0     0
>          c7t5d0    ONLINE       0     0     0
>          c7t6d0    ONLINE       0     0     0
>          c7t8d0    ONLINE       0     0     0
>          c7t9d0    ONLINE       0     0     0
>        logs
>          c5d1p1    FAULTED      0     4     0  too many errors
>        cache
>          c5d1p2    ONLINE       0     0     0
> 
> errors: No known data errors
> 
> r...@brszfs02:~# fmadm faulty
> --------------- ------------------------------------  -------------- ---------
> TIME            EVENT-ID                              MSG-ID         SEVERITY
> --------------- ------------------------------------  -------------- ---------
> Mar 25 13:14:34 6c0bd163-56bf-ee92-e393-ce2063355b52  ZFS-8000-FD    Major
> 
> Host        : brszfs02
> Platform    : HP-Compaq-dc7700-Convertible-Minitower    Chassis_id  : CZC7264JN4
> Product_sn  :
> 
> Fault class : fault.fs.zfs.vdev.io
> Affects     : zfs://pool=tank/vdev=4ec464b5bf74a898
>                  faulted but still in service
> Problem in  : zfs://pool=tank/vdev=4ec464b5bf74a898
>                  faulted but still in service
> 
> Description : The number of I/O errors associated with a ZFS device exceeded
>                     acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
>              for more information.
> 
> Response    : The device has been offlined and marked as faulted.  An attempt
>                     will be made to activate a hot spare if available.
> 
> Impact      : Fault tolerance of the pool may be compromised.
> 
> Action      : Run 'zpool status -x' and replace the bad device.
> 
> r...@brszfs02:~# iostat -En c5d1
> c5d1             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: OCZ VERTEX-TURB Revision:  Serial No: 062F97G71C5T676 Size: 128.04GB <128035160064 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> 
> 
> As there seem to be no hardware errors reported by iostat, I ran zpool
> clear tank and a scrub on Monday.
> Up to now I have seen no new errors. I have set up a cron job to scrub at 01:30
> each day.
> 
> Is the flash card faulty or is this a ZFS problem?

In my testing of flash-based SSDs, this is the most common error.
Since the drive is not reporting media errors or hard errors, the only
interim conclusion is that something in the data path caused data
to be corrupted. That can mean the drive does not report these errors,
the errors are transient, or an error occurred that is not related to
the data itself (e.g., phantom writes).
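
If you want to check the drive-side counters across many devices rather
than by eye, something like the following might do. It is only a rough
sketch: it assumes the output of `iostat -En` has been saved as text,
unquoted and with each line unwrapped, and the function names
parse_iostat_errors and nonzero_counters are made up for illustration.
The counter labels are the ones visible in the output in this thread;
other drivers may print a different set.

```python
import re

# Counter labels as they appear in the `iostat -En` output quoted in this thread.
COUNTERS = [
    "Soft Errors", "Hard Errors", "Transport Errors", "Media Error",
    "Device Not Ready", "No Device", "Recoverable", "Illegal Request",
    "Predictive Failure Analysis",
]
COUNTER_RE = re.compile(
    "(" + "|".join(re.escape(c) for c in COUNTERS) + r"):\s*(\d+)"
)
# A device stanza starts with "<device> Soft Errors: ...".
DEVICE_RE = re.compile(r"^\s*(\S+)\s+Soft Errors:")


def parse_iostat_errors(text):
    """Return {device: {counter: value}} parsed from `iostat -En` text."""
    devices = {}
    current = None
    for line in text.splitlines():
        start = DEVICE_RE.match(line)
        if start:
            current = devices.setdefault(start.group(1), {})
        if current is not None:
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return devices


def nonzero_counters(devices):
    """Keep only the devices and counters whose value is not zero."""
    flagged = {}
    for dev, counters in devices.items():
        bad = {name: value for name, value in counters.items() if value}
        if bad:
            flagged[dev] = bad
    return flagged
```

An empty result from nonzero_counters() only says the drive reported
nothing. As noted above, that does not rule out a problem in the data path.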

For example, my current bad-boy says:
        $ iostat -En
        ...
        c7t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
        Vendor: USB2.0 Product: VAULT DRIVE Revision: 1100 Serial No: Size: 8.12GB <8120172544 bytes>
        Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 103
        Predictive Failure Analysis: 0
        ...
        $ pfexec zpool status -v syspool
          pool: syspool
         state: ONLINE
        status: One or more devices has experienced an error resulting in data
                corruption.  Applications may be affected.
        action: Restore the file in question if possible.  Otherwise restore the
                entire pool from backup.
           see: http://www.sun.com/msg/ZFS-8000-8A
         scrub: scrub completed after 0h1m with 325 errors on Wed Apr 14 11:06:58 2010
        config:

                NAME        STATE     READ WRITE CKSUM
                syspool     ONLINE       0     0   330
                  c7t0d0s0  ONLINE       0     0   690

        errors: Permanent errors have been detected in the following files:

                syspool/rootfs-nmu-...@initial:/var/lib/dpkg/info/man-db.list
                syspool/rootfs-nmu-...@initial:/var/lib/dpkg/triggers/File
        ...
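
The ZFS side of the comparison can be pulled out of `zpool status` in
much the same way. Again this is just a sketch under assumptions: it
reads saved, unquoted status text, the name vdev_error_counts is
hypothetical, and rows whose counters are not plain integers (for
instance if zpool abbreviates a large count) are simply skipped.

```python
def vdev_error_counts(status_text):
    """Return {name: (read, write, cksum)} for config rows with any non-zero counter."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The config table begins at its header row.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config:
            continue
        # The "errors:" summary line ends the config table.
        if fields and fields[0] == "errors:":
            break
        # Data rows carry name, state, then three integer counters;
        # section labels such as "logs" or "cache" have fewer fields.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counts = tuple(int(f) for f in fields[2:5])
            if any(counts):
                flagged[fields[0]] = counts
    return flagged
```

Run against the two status listings in this thread, it would flag
c5d1p1 for its 4 write errors, and both syspool and c7t0d0s0 for their
checksum errors, while the iostat counters for the same devices show no
media or hard errors.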

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
