Hi all, this might not be a ZFS issue (and thus might be on the wrong list), but maybe someone here can give us a good hint:
We are operating 13 X4500s and have started to play with non-Sun-blessed SSDs in them. As we were running Solaris 10u5 before and wanted to use the SSDs as log devices, we upgraded to the latest and greatest 10u8 and changed the zpool layout[1]. However, on the first machine we found many, many problems with various disks "failing" in different vdevs (I wrote about this on this list in December, IIRC). After going through this with Sun, they gave us hints but mostly blamed (maybe rightfully) the Intel X25-E in there; we considered the 2.5"-to-3.5" converter to be at fault as well. So for the next test we placed the SSD into the tray without the conversion unit, but that box (a different one) failed with the same problems.

Now, having "learned" from this experience, we did the same to yet another box, but without any SSD: we jumpstarted the box, installed 10u8, redid the zpool and started to fill data in. During today's scrub this suddenly happened:

s09:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 3.89% done, 4h2m to go
config:

        NAME          STATE     READ WRITE CKSUM
        atlashome     DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     1
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     2
            c4t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     1
            spare     DEGRADED     0     0     0
              c4t4d0  DEGRADED     5     0    11  too many errors
              c0t4d0  ONLINE       0     0     0  5.38G resilvered
          raidz1      ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
        spares
          c0t4d0      INUSE     currently in use
          c7t7d0      AVAIL
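Once c0t4d0 finishes resilvering we see two ways to proceed; a minimal sketch using the pool and device names from the status above (we have not run either yet, so treat this as illustrative only):

# Option 1: keep the hot spare permanently by detaching the
# failing disk from the temporary spare mirror.
zpool detach atlashome c4t4d0

# Option 2: give c4t4d0 the benefit of the doubt -- clear its
# error counters and detach the spare, returning it to AVAIL.
zpool clear atlashome c4t4d0
zpool detach atlashome c0t4d0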
Also similar to the other hosts was the much, much higher soft/hard error count in iostat:

s09:~# iostat -En|grep Soft
c2t0d0 Soft Errors: 1 Hard Errors: 2 Transport Errors: 0
c3t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
c5t0d0 Soft Errors: 2805 Hard Errors: 0 Transport Errors: 0
c6t0d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c4t0d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c1t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t1d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t1d0 Soft Errors: 4002 Hard Errors: 1 Transport Errors: 0
c4t1d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t1d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c1t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c6t2d0 Soft Errors: 4002 Hard Errors: 1 Transport Errors: 0
c0t1d0 Soft Errors: 4002 Hard Errors: 2 Transport Errors: 0
c4t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c5t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c6t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c1t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t4d0 Soft Errors: 4004 Hard Errors: 6 Transport Errors: 0
c5t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t5d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c1t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c5t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t6d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c4t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t4d0 Soft Errors: 4001 Hard Errors: 0 Transport Errors: 0
c6t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t7d0 Soft Errors: 4000 Hard Errors: 1 Transport Errors: 0
c4t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t6d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c6t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c7t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t1d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c7t2d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c7t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c7t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t7d0 Soft Errors: 3997 Hard Errors: 0 Transport Errors: 0

And this after an uptime of only a couple of days:

s09:~# uptime
  4:27pm  up 2 day(s), 21:31,  1 user,  load average: 0.17, 0.34, 1.45
s09:~# uname -a
SunOS s09 5.10 Generic_142901-03 i86pc i386 i86pc

We had checked these numbers before the upgrade: no hard errors at all, and an order of magnitude fewer soft errors, even after tens of days of uptime. Is anyone aware of a regression when going to 10u8? Might it be ZFS-related, or can the hardware of three X4500s really rot away within days of an upgrade when the environmental conditions have not changed at all? (A minimal helper for tracking these counters over time is sketched after the footnote below.)

Thanks a lot in advance for any hint

Carsten

[1] Before, we used 3 vdevs with 15, 15 and 16 disks inside; now we are using 9 vdevs with 5 disks each, plus 2 hot spares. A sketch of the corresponding create command follows.
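For completeness, the new layout from [1] corresponds roughly to a create command along these lines. This is a sketch reconstructed from the zpool status output above, not a copy of what we actually typed:

# Reconstructed from the zpool status above; 9 x 5-disk raidz1
# vdevs plus two hot spares. Illustrative only.
zpool create atlashome \
    raidz1 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
    raidz1 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 \
    raidz1 c7t1d0 c0t2d0 c1t2d0 c4t2d0 c5t2d0 \
    raidz1 c6t2d0 c7t2d0 c0t3d0 c1t3d0 c4t3d0 \
    raidz1 c5t3d0 c6t3d0 c7t3d0 c1t4d0 c4t4d0 \
    raidz1 c5t4d0 c6t4d0 c7t4d0 c0t5d0 c1t5d0 \
    raidz1 c4t5d0 c5t5d0 c6t5d0 c7t5d0 c0t6d0 \
    raidz1 c1t6d0 c4t6d0 c5t6d0 c6t6d0 c7t6d0 \
    raidz1 c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 \
    spare c0t4d0 c7t7d0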
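And since comparing these counters before and after an event is exactly what we keep doing by hand, here is the kind of minimal snapshot-and-diff helper one could use. A sketch only: the snapshot directory under /var/tmp is an arbitrary choice, adjust to taste:

#!/bin/ksh
# Minimal sketch: snapshot the per-device error counters and diff
# against the previous snapshot so growing counts stand out.
DIR=/var/tmp/iostat-snaps          # hypothetical snapshot location
mkdir -p "$DIR"
NOW="$DIR/errors.$(date '+%Y%m%d-%H%M%S')"
iostat -En | grep "Soft Errors" | sort > "$NOW"
# The second-newest file is the previous snapshot ($NOW is newest).
PREV=$(ls -t "$DIR"/errors.* 2>/dev/null | sed -n 2p)
if [ -n "$PREV" ]; then
    echo "Changes since $PREV:"
    diff "$PREV" "$NOW"
fi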