On Mar 11, 2010, at 11:28 PM, Paul Tetley wrote:

> Hi,
>
> My zpool is reporting unrecoverable errors with the metadata:
>
>   pool: rpool2
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>
> (snip)
>
> errors: Permanent errors have been detected in the following files:
>
>         <metadata>:<0x0>
>         <metadata>:<0x1>
>
> It initially reported a DEGRADED pool, but after a reboot, the pool is now
> ONLINE and a quick inspection indicates that my data is present and intact
> (though the errors stop the file-systems in the pool from mounting at boot -
> it drops into maintenance mode). My reading of
> http://www.sun.com/msg/ZFS-8000-8A indicates I should destroy the pool and
> start again, but http://www.crypticide.com/dropsafe/article/2162 gives me
> some small hope that this might be fixable...
>
> The pool has been 'a little flakey' since I built it two months back. I've
> been getting small numbers of read and checksum errors on a few of the disks
> each day. Initially I replaced the disks, but they would always pass all
> testing, so lately I've just been clearing the errors each day and looking
> for another solution. I thought I had found it when I discovered WD had a
> firmware patch (http://www.3ware.com/kb/article.aspx?id=15592 ,
> http://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html)
> which solved bugs in the drive spin-up behaviour which has been causing
> problems for various hardware RAID controllers. So, yesterday I shut down
> the machine, pulled 2 of the 4 troublesome disks, and applied the firmware
> upgrade (04.05G05). When I booted, the pool was degraded and showed
> metadata errors.
> After a shutdown and cold start, the pool was ONLINE but
> still has metadata errors (so somewhat inconsistent guidance from zpool
> status).
>
> Can anyone explain what this 'metadata' is?
>
> More Details:
> • This is a backup server so I can rebuild if necessary, but on
>   principle I'd like to have a go at fixing it...
> • The zpool has 96 x 2TB drives divided into RAIDZ2 sets of 8 (6+2).
> • The drives are Western Digital RE-4's (WD2002FYPS).
> • Running OpenSolaris build snv_111b.
> • Drives are in two AIC JBODs connected via SAS.
> • HBA is an LSI 3801E

It has been suggested to check the firmware release for this controller.
This CR has a workaround that might help, too.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775
 -- richard

> • Server is 1RU SuperMicro Intel.
> Any advice appreciated!
>
> :-)
> Paul Tetley
> NearMap Pty Ltd
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Atlanta, March 16-18, 2010 http://nexenta-atlanta.eventbrite.com
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
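
For reference, the pool layout described in the original post (96 x 2TB drives in RAIDZ2 sets of 8, i.e. 6 data + 2 parity) can be worked through with a short calculation. This is only a rough sketch: the function name `pool_layout` is made up for illustration, and the capacity figures assume nominal 2 TB drives and ignore ZFS metadata, reservation, and padding overhead, so the real usable space would be somewhat lower.

```python
# Figures taken from the post: 96 drives of 2 TB, RAIDZ2 sets of 8 (6 data + 2 parity).
TOTAL_DRIVES = 96
DRIVES_PER_VDEV = 8
PARITY_PER_VDEV = 2   # RAIDZ2 keeps two parity drives' worth of space per set
DRIVE_TB = 2          # assumption: nominal vendor terabytes, no filesystem overhead


def pool_layout(total_drives, drives_per_vdev, parity_per_vdev, drive_tb):
    """Return the vdev count and nominal raw/usable capacity for the pool."""
    if total_drives % drives_per_vdev != 0:
        raise ValueError("drives do not divide evenly into sets")
    vdevs = total_drives // drives_per_vdev
    data_drives = vdevs * (drives_per_vdev - parity_per_vdev)
    return {
        "vdevs": vdevs,
        "data_drives": data_drives,
        "raw_tb": total_drives * drive_tb,
        "usable_tb": data_drives * drive_tb,
    }


layout = pool_layout(TOTAL_DRIVES, DRIVES_PER_VDEV, PARITY_PER_VDEV, DRIVE_TB)
print(layout)
# {'vdevs': 12, 'data_drives': 72, 'raw_tb': 192, 'usable_tb': 144}
```

So the pool consists of 12 RAIDZ2 sets, each of which can lose up to two drives without losing data, and holds roughly 144 TB of nominal data capacity out of 192 TB raw.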