Re: Checksum errors across ZFS array

2012-07-22 Holger Kipp
On 21.07.2012 at 00:57, "James Snow" wrote: > On Fri, Jul 20, 2012 at 03:46:21PM -0700, Doug Barton wrote: >> >> You probably know this already, but just in case ... Software memory >> tests cannot tell you conclusively that memory is good, only that it's >> bad. > > I may have known that in a

Re: Checksum errors across ZFS array

2012-07-21 Doug Barton
On 07/20/2012 15:55, James Snow wrote: > On Fri, Jul 20, 2012 at 03:46:21PM -0700, Doug Barton wrote: >> >> You probably know this already, but just in case ... Software memory >> tests cannot tell you conclusively that memory is good, only that it's >> bad. > > I may have known that in a past li

Re: Checksum errors across ZFS array

2012-07-20 Steven Hartland
- Original Message - From: "Dr Josef Karthauser" So, take care if the memory doesn't report any failures, it might still be faulty. p.s. It was my fault that I wasn't running ECC memory on the system! :/. We've even seen this with ECC memory. Running the memory in a different machin

Re: Checksum errors across ZFS array

2012-07-20 James Snow
On Fri, Jul 20, 2012 at 03:46:21PM -0700, Doug Barton wrote: > > You probably know this already, but just in case ... Software memory > tests cannot tell you conclusively that memory is good, only that it's > bad. I may have known that in a past life but certainly wasn't thinking about it now.

Re: Checksum errors across ZFS array

2012-07-20 Doug Barton
On 07/20/2012 15:22, James Snow wrote: > I've run memtest for about 20 hours now (13 hours in one pass, 7 and > counting on the second) and seen no errors. Hrm. You probably know this already, but just in case ... Software memory tests cannot tell you conclusively that memory is good, only that it

Re: Checksum errors across ZFS array

2012-07-20 James Snow
On Fri, Jul 20, 2012 at 04:09:28PM +0100, Dr Josef Karthauser wrote: > Take care though, my system which had been working fine for about > a year when I noticed the ZFS rot (which all appears to be recent > in time). I ran memcheck+ on it for 8 hours or so, and it showed no > errors at all. Howeve

Re: Checksum errors across ZFS array

2012-07-20 Dr Josef Karthauser
On 19 Jul 2012, at 18:15, James Snow wrote: > On Thu, Jul 19, 2012 at 06:05:32PM +0100, Dr Joe Karthauser wrote: > >> Hi James, >> >> It's almost definitely a memory problem. I'd change it ASAP if I were >> you. >> >> I lost about 70mb from my zfs pool for this very reason just a few >> weeks a

Re: Checksum errors across ZFS array

2012-07-19 Steven Hartland
- Original Message - From: "James Snow" On Thu, Jul 19, 2012 at 06:05:32PM +0100, Dr Joe Karthauser wrote: Hi James, It's almost definitely a memory problem. I'd change it ASAP if I were you. I lost about 70mb from my zfs pool for this very reason just a few weeks ago. Luckily I h

Re: Checksum errors across ZFS array

2012-07-19 Steven Hartland
- Original Message - From: "James Snow" I have a ZFS server on which I've seen periodic checksum errors on almost every drive. While scrubbing the pool last night, it began to report unrecoverable data errors on a single file. I compared an md5 of the supposedly corrupted file to an

Re: Checksum errors across ZFS array

2012-07-19 James Snow
On Thu, Jul 19, 2012 at 06:05:32PM +0100, Dr Joe Karthauser wrote: > Hi James, > > It's almost definitely a memory problem. I'd change it ASAP if I were > you. > > I lost about 70mb from my zfs pool for this very reason just a few > weeks ago. Luckily I had enough snapshots from before the rot set

Re: Checksum errors across ZFS array

2012-07-19 Dr Joe Karthauser
Hi James, It's almost definitely a memory problem. I'd change it ASAP if I were you. I lost about 70mb from my zfs pool for this very reason just a few weeks ago. Luckily I had enough snapshots from before the rot set in to recover most of what I lost. Joe -- Dr Joe Karthauser On 19 Jul 201

Checksum errors across ZFS array

2012-07-19 James Snow
I have a ZFS server on which I've seen periodic checksum errors on almost every drive. While scrubbing the pool last night, it began to report unrecoverable data errors on a single file. I compared an md5 of the supposedly corrupted file to an md5 of the original copy, stored on different media. T
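The opening message describes checking the suspect file by comparing its md5 against an md5 of the original copy kept on different media. Below is a minimal Python sketch of that check, assuming both copies are reachable as local paths. The helper names `md5_of` and `first_difference` are hypothetical, and reporting the offset of the first differing byte is an addition for illustration, not something described in the thread.

```python
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first byte that differs, or None if the files match.

    If one file is a strict prefix of the other, the offset is the length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact byte inside the mismatching chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Common prefix matched, so the files differ only in length.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no differences.
                return None
            offset += len(a)


if __name__ == "__main__" and len(sys.argv) == 3:
    suspect, reference = sys.argv[1], sys.argv[2]
    print(f"{md5_of(suspect)}  {suspect}")
    print(f"{md5_of(reference)}  {reference}")
    off = first_difference(suspect, reference)
    print("identical" if off is None else f"first differing byte at offset {off}")
```

Usage would be along the lines of `python compare.py /path/on/pool/file /path/on/other/media/file` (the script name and paths are placeholders). As the later replies in the thread point out, a clean result on one read only shows the bytes returned that time agreed; it does not prove the hardware is sound.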