Hi all, this might not be a ZFS issue (and thus might be on the wrong list), but maybe someone here can give us a good hint:
We are operating 13 X4500s and have started to play with non-Sun-blessed SSDs in them. As we were running Solaris 10u5 before and wanted to use the SSDs as log devices, we upgraded to the latest and greatest 10u8 and changed the zpool layout[1]. However, on the first machine we found many, many problems with various disks "failing" in different vdevs (I wrote about this on this list in December, IIRC). After going through this with Sun, they gave us hints but mostly blamed (maybe rightfully) the Intel X25-E in there; we considered the 2.5"-to-3.5" converter to be at fault as well. So for the next test we placed the SSD into the tray without the conversion unit, but that box (a different one) failed with the same problems.

Now, having "learned" from this experience, we did the same to yet another box, but without any SSD: we jumpstarted the box, installed 10u8, redid the zpool and started to fill data in. During today's scrub this suddenly happened:

s09:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 3.89% done, 4h2m to go
config:

        NAME          STATE     READ WRITE CKSUM
        atlashome     DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     1
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     2
            c4t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     1
            spare     DEGRADED     0     0     0
              c4t4d0  DEGRADED     5     0    11  too many errors
              c0t4d0  ONLINE       0     0     0  5.38G resilvered
          raidz1      ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
        spares
          c0t4d0      INUSE     currently in use
          c7t7d0      AVAIL
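Once c0t4d0 finishes resilvering we see two ways to proceed; a minimal sketch using the pool and device names from the status above (we have not run either yet, so treat this as illustrative only):

# Option 1: keep the hot spare permanently by detaching the
# failing disk from the temporary spare mirror.
zpool detach atlashome c4t4d0

# Option 2: give c4t4d0 the benefit of the doubt -- clear its
# error counters and detach the spare, returning it to AVAIL.
zpool clear atlashome c4t4d0
zpool detach atlashome c0t4d0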
Also similar to the other hosts was the much, much higher soft/hard error count in iostat:

s09:~# iostat -En|grep Soft
c2t0d0 Soft Errors: 1 Hard Errors: 2 Transport Errors: 0
c3t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
c5t0d0 Soft Errors: 2805 Hard Errors: 0 Transport Errors: 0
c6t0d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c4t0d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c1t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t1d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t1d0 Soft Errors: 4002 Hard Errors: 1 Transport Errors: 0
c4t1d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t1d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c1t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c6t2d0 Soft Errors: 4002 Hard Errors: 1 Transport Errors: 0
c0t1d0 Soft Errors: 4002 Hard Errors: 2 Transport Errors: 0
c4t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c5t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c6t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c1t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t4d0 Soft Errors: 4004 Hard Errors: 6 Transport Errors: 0
c5t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t5d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c1t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c5t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t6d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c4t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t4d0 Soft Errors: 4001 Hard Errors: 0 Transport Errors: 0
c6t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t7d0 Soft Errors: 4000 Hard Errors: 1 Transport Errors: 0
c4t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t6d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c6t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c7t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t1d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c7t2d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c7t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c7t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t7d0 Soft Errors: 3997 Hard Errors: 0 Transport Errors: 0

And this after an uptime of only a couple of days:

s09:~# uptime
  4:27pm  up 2 day(s), 21:31,  1 user,  load average: 0.17, 0.34, 1.45
s09:~# uname -a
SunOS s09 5.10 Generic_142901-03 i86pc i386 i86pc

We had checked these numbers before the upgrade: no hard errors at all, and an order of magnitude fewer soft errors, even after tens of days of uptime. Is anyone aware of a regression when going to 10u8? Might it be ZFS-related, or can the hardware of three X4500s really rot away within days of an upgrade when the environmental conditions have not changed at all? (A minimal helper for tracking these counters over time is sketched after the footnote below.)

Thanks a lot in advance for any hint

Carsten

[1] Before, we used 3 vdevs with 15, 15 and 16 disks inside; now we are using 9 vdevs with 5 disks each, plus 2 hot spares. A sketch of the corresponding create command follows.
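For completeness, the new layout from [1] corresponds roughly to a create command along these lines. This is a sketch reconstructed from the zpool status output above, not a copy of what we actually typed:

# Reconstructed from the zpool status above; 9 x 5-disk raidz1
# vdevs plus two hot spares. Illustrative only.
zpool create atlashome \
    raidz1 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
    raidz1 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 \
    raidz1 c7t1d0 c0t2d0 c1t2d0 c4t2d0 c5t2d0 \
    raidz1 c6t2d0 c7t2d0 c0t3d0 c1t3d0 c4t3d0 \
    raidz1 c5t3d0 c6t3d0 c7t3d0 c1t4d0 c4t4d0 \
    raidz1 c5t4d0 c6t4d0 c7t4d0 c0t5d0 c1t5d0 \
    raidz1 c4t5d0 c5t5d0 c6t5d0 c7t5d0 c0t6d0 \
    raidz1 c1t6d0 c4t6d0 c5t6d0 c6t6d0 c7t6d0 \
    raidz1 c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 \
    spare c0t4d0 c7t7d0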
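And since comparing these counters before and after an event is exactly what we keep doing by hand, here is the kind of minimal snapshot-and-diff helper one could use. A sketch only: the snapshot directory under /var/tmp is an arbitrary choice, adjust to taste:

#!/bin/ksh
# Minimal sketch: snapshot the per-device error counters and diff
# against the previous snapshot so growing counts stand out.
DIR=/var/tmp/iostat-snaps          # hypothetical snapshot location
mkdir -p "$DIR"
NOW="$DIR/errors.$(date '+%Y%m%d-%H%M%S')"
iostat -En | grep "Soft Errors" | sort > "$NOW"
# The second-newest file is the previous snapshot ($NOW is newest).
PREV=$(ls -t "$DIR"/errors.* 2>/dev/null | sed -n 2p)
if [ -n "$PREV" ]; then
    echo "Changes since $PREV:"
    diff "$PREV" "$NOW"
fi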