I see the checksum errors with 'zpool status -v', but as far as I know the
drives are good. 'fmadm faulty' reports nothing:

 root@Atlas:/home/user# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: volume0
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 50.5K in 0h14m with 0 errors on Fri Oct 12 09:37:27 2012
config:

        NAME        STATE     READ WRITE CKSUM
        volume0     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     2
            c3t0d0  ONLINE       0     0     3
            c3t3d0  ONLINE       0     0     4


root@Atlas:/home/user# fmadm faulty
root@Atlas:/home/user#



Funny thing is, this only happens during big copies, when I'm moving an image
from one source to another over the network.
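For the next big copy I'll try to watch it live, with something like this (pool
name as in the status output above; the fmdump date format may need tweaking):

root@Atlas:/home/user# while true; do zpool status -v volume0; sleep 10; done
root@Atlas:/home/user# fmdump -e -t 12Oct12 | tail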


-----Original Message-----
From: Udo Grabowski (IMK) [mailto:udo.grabow...@kit.edu] 
Sent: Friday, October 12, 2012 2:00 PM
To: Discussion list for OpenIndiana
Subject: Re: [OpenIndiana-discuss] Openindiana ZFS server crashes and reboots

On 10/12/12 07:34 PM, Bentley, Dain wrote:
> Hello Udo, thanks for the reply.  Here is the text from fmdump -eV.
> Is there anything I should be looking for?

So BOTH disks spit out ZFS checksum errors like a machine gun. This can be
either a controller/cable problem or a memory problem (you don't have ECC
memory with an ECC-supporting processor, e.g. a Xeon? Otherwise those errors
would be reported by 'fmdump -e'). Or it is some problem in the OS: those
checksum problems haunt me on my home workstation (also no ECC) when scrubbing
the rpool mirror, although the disks and cables are OK and no errors occur when
not scrubbing. But the sheer number of errors here does not look like that
symptom.
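(If you are not sure whether the box has ECC at all, smbios should tell you;
if I remember right, the physical memory array record carries the error
correction type:

  # smbios -t SMB_TYPE_MEMARRAY | grep -i ecc

If that reports 'None', flipped bits in RAM will never show up in 'fmdump -e'.)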

'zpool status -v' should show some degradation of the mirror with the exact
checksum counts, and also whether files or metadata are affected; 'fmadm faulty'
lists components retired due to errors. If you have data corruption, that could
occasionally cause reboots.
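(Something along these lines, with your pool name; only clear once you are
convinced the hardware is good, otherwise you just reset the counters:

  # zpool status -v volume0
  # fmadm faulty
  # zpool clear volume0
  # zpool scrub volume0

Then watch whether the CKSUM columns climb again during the scrub or the next
big copy.)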

Hard to say how to hunt this down, other than checking cables and memory
seating, looking for newer HBA or disk/BIOS firmware, and torturing the memory
with advanced memory testers.
The fact that the machine reboots makes me doubt that this is a pure ZFS/disk
problem; I have never seen that. The machine eventually stalls when those
errors hammer it too hard, but that would not cause a reboot. Memory is always
the best bet, but it could also be a motherboard problem.
Maybe the vendor has some hardware checking tool on the CD in the box or on
their website?
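(For the firmware check, 'iostat -En' prints vendor, product, revision and
serial number for each disk, along with the soft/hard/transport error counters
the drives themselves report:

  # iostat -En c3t0d0 c3t3d0

For the memory, something like memtest86+ booted from CD and left running
overnight is the usual torture.)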

>
> TIME                           CLASS
> Oct 11 2012 06:06:59.370787527 ereport.fs.zfs.data nvlist version: 0
>          class = ereport.fs.zfs.data
>          ena = 0x9ebd5c95f9f00401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0x6a8f5a381b7e2f2c
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0x6a8f5a381b7e2f2c
>          pool_context = 0
>          pool_failmode = wait
>          zio_err = 50
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x6047da
>          __ttl = 0x1
>          __tod = 0x50769a43 0x1619c4c7
>
> Oct 11 2012 06:06:59.370787942 ereport.fs.zfs.checksum nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0x9ebd5c95f9f00401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0x6a8f5a381b7e2f2c
>                  vdev = 0xdd2eef656bfc1db5
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0x6a8f5a381b7e2f2c
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0xdd2eef656bfc1db5
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t0d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG1JY4D/a
>          parent_guid = 0xfd55a23a93069d20
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x14893b600
>          zio_size = 0x4600
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x6047da
>          cksum_expected = 0x521c0ebafd7 0x2be41b8abfd1b3 0xfccf5694dcdbaa49 0x3e24575fd2f4aa4d
>          cksum_actual = 0x521c2ebafd7 0x2be436a2bfd1b3 0xfcd00e26f8dbaa49 0x4161c187aaf4aa4d
>          cksum_algorithm = fletcher4
>          __ttl = 0x1
>          __tod = 0x50769a43 0x1619c666
>
> Oct 11 2012 06:06:59.370788126 ereport.fs.zfs.checksum nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0x9ebd5c95f9f00401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0x6a8f5a381b7e2f2c
>                  vdev = 0x377ede8d0fb06f7e
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0x6a8f5a381b7e2f2c
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0x377ede8d0fb06f7e
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t3d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG39LKD/a
>          parent_guid = 0xfd55a23a93069d20
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x14893b600
>          zio_size = 0x4600
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x6047da
>          cksum_expected = 0x521c0ebafd7 0x2be41b8abfd1b3 0xfccf5694dcdbaa49 0x3e24575fd2f4aa4d
>          cksum_actual = 0x521c2ebafd7 0x2be436a2bfd1b3 0xfcd00e26f8dbaa49 0x4161c187aaf4aa4d
>          cksum_algorithm = fletcher4
>          __ttl = 0x1
>          __tod = 0x50769a43 0x1619c71e
>
> ... (1000 more of them in rapid succession)...
>
> Oct 12 2012 07:54:53.123217977 ereport.fs.zfs.checksum nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0xe644965bba100401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0xda1ed4eddd886ca2
>                  vdev = 0x76d6d5ecc9007061
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0xda1ed4eddd886ca2
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0x76d6d5ecc9007061
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t0d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG1JY4D/a
>          parent_guid = 0xb85bb36665652fa6
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x6414e7e00
>          zio_size = 0xa400
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x1c3121
>          cksum_expected = 0xd682bb3691e 0x110e9a533de6720 0x6bb562c60c3d27a8 0xd15dd859803c65fd
>          cksum_actual = 0xd682db3691e 0x110e9df4bde6720 0x6bb8ae9ba83d27a8 0xf14a4b22583c65fd
>          cksum_algorithm = fletcher4
>          bad_ranges = 0x2fd0 0x2fd8
>          bad_ranges_min_gap = 0x8
>          bad_range_sets = 0x1
>          bad_range_clears = 0x0
>          bad_set_bits = 0x0 0x0 0x0 0x2 0x0 0x0 0x0 0x0
>          bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
>          __ttl = 0x1
>          __tod = 0x5078050d 0x7582839
>
> Oct 12 2012 09:31:27.252886984 ereport.fs.zfs.checksum nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0x3a9567025fb00801
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0xda1ed4eddd886ca2
>                  vdev = 0x4dd27d7d2cff5683
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0xda1ed4eddd886ca2
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0x4dd27d7d2cff5683
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t3d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG39LKD/a
>          parent_guid = 0xb85bb36665652fa6
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x4f8cf3c00
>          zio_size = 0xca00
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x110e12
>          cksum_expected = 0x109aa00f800f 0x1aae29387c03843 0x289b0f3d871cdb32 0xb91c890d879e9fc4
>          cksum_actual = 0x109aa20f800f 0x1aae2d39fc03843 0x289f125e231cdb32 0xe3fb48d05f9e9fc4
>          cksum_algorithm = fletcher4
>          bad_ranges = 0x49d0 0x49d8
>          bad_ranges_min_gap = 0x8
>          bad_range_sets = 0x1
>          bad_range_clears = 0x0
>          bad_set_bits = 0x0 0x0 0x0 0x2 0x0 0x0 0x0 0x0
>          bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
>          __ttl = 0x1
>          __tod = 0x50781baf 0xf12bfc8
>


-- 
Dr.Udo Grabowski    Inst.f.Meteorology a.Climate Research IMK-ASF-SAT
www-imk.fzk.de/asf/sat/grabowski/ www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology            http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany  T:(+49)721 608-26026 F:-926026


_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
