Hi James,

It's almost definitely a memory problem. I'd change it ASAP if I were you.
I lost about 70 MB from my ZFS pool for this very reason just a few weeks ago. Luckily I had enough snapshots from before the rot set in to recover most of what I lost.

Joe

--
Dr Joe Karthauser

On 19 Jul 2012, at 16:29, James Snow <s...@teardrop.org> wrote:

> I have a ZFS server on which I've seen periodic checksum errors on
> almost every drive. While scrubbing the pool last night, it began to
> report unrecoverable data errors on a single file.
>
> I compared an md5 of the supposedly corrupted file to an md5 of the
> original copy, stored on different media. They were the same, suggesting
> no corruption.
>
> A large file was being written to the pool while the scrub was in
> progress, and the entire array became unresponsive. The OS was still up,
> but 'zpool status' showed the scrub progress stuck at the same spot,
> with the throughput rate falling. 'shutdown -r now' stalled. Eventually
> I hard power cycled the system.
>
> Now, attempting to read the file that ZFS reports errors on yields
> "Input/output error." The scrub completed, with the following result:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     7
>           mirror-0   ONLINE       0     0     0
>             aacd0p1  ONLINE       0     0     0
>             aacd4p1  ONLINE       0     0     1
>           mirror-1   ONLINE       0     0     0
>             aacd1p1  ONLINE       0     0     0
>             aacd5p1  ONLINE       0     0     0
>           mirror-2   ONLINE       0     0    14
>             aacd2p1  ONLINE       0     0    14
>             aacd6p1  ONLINE       0     0    14
>           mirror-3   ONLINE       0     0     0
>             aacd3p1  ONLINE       0     0     0
>             aacd7p1  ONLINE       0     0     0
>
> The system configuration is as follows:
>
>   Controller:  Adaptec 2805
>   Motherboard: Supermicro X8STE
>   Drive Cage:  2x Supermicro CSE-M35T-1
>   Memory:      2x Kingston 12GB ECC (KVR1066D3E7SK3/12G)
>   PSU:         Nexus RX-7000
>   OS:          9.0-RELEASE-p3
>   ZFS:         ZFS filesystem version 5, ZFS storage pool version 28
>
>
> The Adaptec card has 2 ports, each of which uses a 4-port fan-out cable.
> The cables are routed as shown:
>
>         /--- aacd0 (ST1000DM003-9YN1 CC4D)
>        / /-- aacd1 (ST1000DM003-9YN1 CC4D)
> p1-----
>        \ \-- aacd2 (WDC WD1001FALS-0 05.0)
>         \--- aacd3 (WDC WD1001FALS-0 05.0)
>
>         /--- aacd4 (ST1000DM003-9YN1 CC4D)
>        / /-- aacd5 (ST1000DM003-9YN1 CC4D)
> p2-----
>        \ \-- aacd6 (WDC WD1002FAEX-0 05.0)
>         \--- aacd7 (WDC WD1002FAEX-0 05.0)
>
> You can see that each ZFS mirror device is comprised of one drive from
> each drive carrier, on separate ports, on separate cables.
>
> Since I have seen periodic checksum errors on almost every drive but the
> only common component is the Adapter controller and the motherboard, I
> suspect the controller. (Or the motherboard, but I'm starting with the
> controller since it's much simpler to swap out.)
>
> Could it be something else? What else I should be looking at? Any input
> greatly appreciated.
>
>
> -Snow
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
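
For reference, the md5 comparison described in the quoted message (the copy on the pool against the original on other media) could be scripted roughly as below. This is only a sketch: the function names file_md5 and same_content are made up for illustration and were not used in the thread. The file is read in chunks so a large file does not have to fit in memory. A file that ZFS refuses to read would raise OSError (the "Input/output error" above) instead of producing a digest.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

Usage would be along the lines of same_content("/tank/some/file", "/mnt/backup/some/file"), where both paths are placeholders. Matching digests suggest the data read back is intact, but both reads pass through the same RAM and controller, so this does not rule out the hardware faults discussed above.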