Re: [zfs-discuss] Drive Checksum error

Richard Elling Tue, 16 Dec 2008 17:04:50 -0800

Glaser, David wrote:
> Hi all,
> 
> A few weeks ago I was inquiring of the group on how often to do zfs 
> scrubs of pools on our x4500's. Figures that the first time I try 
> to do a monthly scrub of our pools, we get one of the three machines
> to throw an error. On one of the machines, there's one disk that has 
> registered one Checksum error. Sun lists it as an 'unrecoverable I/O 
> error'. Is it really an unrecoverable error? Is the drive really bad
> (i.e., does it warrant a call to Sun for an RMA of the drive)? Researching
> the error message suggests that you can set a threshold of checksum
> errors before it throws an error, but I'd figure that one is too many.

I presume you mean that a "zpool status" shows a data error?
If so, try "zpool status -xv" to see which file(s) are affected.
If ZFS is managing the redundancy, it should be able to recover
the data.

Depending on the drive, disk drive vendors spec an unrecoverable error
rate (UER) of about 1 error for every 1e15 bits read. So it is not really
all that unlikely to see one on a system the size of an X4500, which can
hold ~3.8e14 bits.
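
As a rough back-of-envelope sketch of that arithmetic (the function names
are only illustrative, and treating errors as independent, Poisson-like
events is my simplifying assumption, not something the drive vendors spec):

```python
import math

# Figures taken from the text above; everything else is an assumption.
BITS_PER_ERROR = 1e15   # vendor spec: about 1 unrecoverable error per 1e15 bits read
X4500_BITS = 3.8e14     # approximate capacity of a full X4500, in bits


def expected_unrecoverable_errors(bits_read, bits_per_error=BITS_PER_ERROR):
    """Expected number of unrecoverable read errors for a given number of bits read."""
    return bits_read / bits_per_error


def prob_at_least_one_error(bits_read, bits_per_error=BITS_PER_ERROR):
    """Chance of one or more errors, assuming independent (Poisson) errors."""
    return 1.0 - math.exp(-expected_unrecoverable_errors(bits_read, bits_per_error))


if __name__ == "__main__":
    # One scrub of a completely full X4500 reads roughly X4500_BITS bits.
    print("expected errors per full read: %.2f"
          % expected_unrecoverable_errors(X4500_BITS))
    print("chance of at least one error:  %.0f%%"
          % (100 * prob_at_least_one_error(X4500_BITS)))
```

Under those assumptions, a single full read of the box has roughly a
one-in-three chance of hitting at least one unrecoverable error, so one
checksum error after a scrub is not by itself proof of a failing drive.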

> So, is there a way to see if it is a bad disk, or just zfs being a 
> pain? Should I reset the checksum error counter and re-run the scrub?

Don't kill the canary!  Check the error logs for more details. Also
make sure you are up to date on the Marvell SATA controller patches.

Jonathan wrote:
> If you start seeing hundreds of errors, be sure to check things like the
> cable.  I had a SATA cable come loose on a home ZFS fileserver and scrub
> was throwing hundreds of errors even though the drive itself was fine. I
> don't want to think about what could have happened with UFS...

X4500s don't have any SATA cables :-)
  -- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
