Glaser, David wrote:
> Hi all,
>
> A few weeks ago I was inquiring of the group on how often to do zfs
> scrubs of pools on our x4500's. Figures that the first time I try
> to do a monthly scrub of our pools, we get one of the three machines
> to throw an error. On one of the machines, there's one disk that has
> registered one Checksum error. Sun lists it as an 'unrecoverable I/O
> error'. Is it really an unrecoverable error? Is the drive really bad
> (i.e. warrant a call to SUN for an RMA of the drive?) Researching
> the error message says that you can set the plateau of checksum
> errors before it throws an error, but I'd figure that one is too many.

I presume you mean that "zpool status" shows a data error? If so,
try "zpool status -xv" to see which file(s) are affected. If ZFS is
managing the redundancy, it should be able to recover the data.

Depending on the drive, disk drive vendors spec 1 unrecoverable error
(UER) for every 1e15 bits read. So it is not really all that unlikely
to see one on a system the size of an X4500, which can hold ~3.8e14 bits.

> So, is there a way to see if it is a bad disk, or just zfs being a
> pain? Should I reset the checksum error counter and re-run the scrub?

Don't kill the canary! Check the error logs for more details, and
make sure you are up to date on Marvell SATA controller patches.

Jonathan wrote:
> If you start seeing hundreds of errors be sure to check things like the
> cable. I had a SATA cable come loose on a home ZFS fileserver and scrub
> was throwing 100's of errors even though the drive itself was fine, I
> don't want to think about what could have happened with UFS...

X4500s don't have any SATA cables :-)
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
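
As a rough check on the error-rate estimate in the reply above, here is a minimal Python sketch. It takes the two figures quoted there (1 UER per 1e15 bits read, ~3.8e14 bits on an X4500) and assumes that errors are independent, Poisson-like events, which is a simplification and not something a drive vendor specifies. The function names are hypothetical.

```python
import math

def expected_uers(bits_read, bits_per_error=1e15):
    """Expected number of unrecoverable read errors for the bits read."""
    return bits_read / bits_per_error

def prob_at_least_one(bits_read, bits_per_error=1e15):
    """Chance of at least one error, assuming independent (Poisson) errors."""
    return 1.0 - math.exp(-expected_uers(bits_read, bits_per_error))

if __name__ == "__main__":
    full_pool_bits = 3.8e14  # approximate X4500 capacity quoted above
    print(f"expected errors per full read: {expected_uers(full_pool_bits):.2f}")
    print(f"P(at least one error): {prob_at_least_one(full_pool_bits):.2f}")
```

Under these assumptions, one full read of the machine gives about 0.38 expected errors and roughly a one-in-three chance of seeing at least one, so a single checksum error after a scrub is not surprising. A scrub only reads allocated data, so the real figure for a partly full pool would be lower.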