An update: well, things didn't quite turn out as expected. I decided to follow the path right down to the disks for clues. Digging into the adapter diagnostics with LSIUTIL revealed an adapter link issue:
  Adapter Phy 5: Link Down
    Invalid DWord Count            5,969,575
    Running Disparity Error Count  5,782,581
    Loss of DWord Synch Count      0
    Phy Reset Problem Count        0

After replacing cables, I eventually replaced the controller, and then things really went pear-shaped. It turns out the backplane, which ran without major issues on the Supermicro controller, refused to operate with the LSI SAS3081E-R (with the latest firmware): the card wouldn't initialise, links only ran at 1.5Gb/s, most disks were offline, etc.

Replacing the backplane (the whole JBOD) fixed the adapter link problems, but timeouts still occur when scrubbing.

Oh look, the device names moved. They used to start at c4t8d0, but it has "made it right" all by itself. EYHOBG!

iostat -e -n:

  ---- errors ----
  s/w  h/w  trn  tot  device
    0    0    0    0  c4t0d0
    0    0    0    0  c4t1d0
    0    2    8   10  c4t2d0
    0    3   18   21  c4t3d0
    0    0    0    0  c4t4d0
    0    2   12   14  c4t5d0
    0    1    8    9  c4t6d0
    0    2   15   17  c4t7d0
    0    0    0    0  c4t8d0
    0    0    0    0  c4t9d0
    0    0    0    0  c4t10d0
    0    0    0    0  c4t11d0
    0    0    0    0  c4t12d0
    0    0    0    0  c4t13d0
    0   11   84   95  c4t41d0
    0    8   62   70  c4t42d0
    0   10   72   82  c4t43d0
    0   19  147  166  c4t44d0
    0   12  102  114  c4t45d0
    0   19  145  164  c4t46d0
    0   13  108  121  c4t47d0
    0    7   62   69  c4t48d0
    0   14  113  127  c4t49d0
    0   11   96  107  c4t50d0
    0   11   91  102  c4t51d0
    0    8   64   72  c4t52d0
    0   13  108  121  c4t53d0
    0   11  106  117  c4t54d0
    0   10   82   92  c4t55d0
    0   10   88   98  c4t56d0
    0   12   85   97  c4t57d0
    0    6   38   44  c4t58d0

and zpool status:

  One or more devices has experienced an unrecoverable error.  An
  attempt was made to correct the error.  Applications are unaffected.

    c4t2d0   ONLINE  0  0  1  25.5K repaired
    c4t55d0  ONLINE  0  0  4  102K repaired

I note that after these errors, there are no errors in the LSI adapter diagnostic logs. The data disks are all new WD10EARS. If the OpenSolaris and ZFS combination weren't so robust, this would have ended badly.

The next step will be trying different timeout settings on the controller to see if that helps.

P.S. I have a client with a "suspect", nearly full, 20TB zpool to try to scrub, so this is a big issue for me.
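For anyone watching these counters over time: rather than eyeballing the error table, the `iostat -e -n` output can be filtered mechanically. A rough sketch (the sample data and the 50-error threshold are purely illustrative, and it assumes the Solaris column order shown above: s/w, h/w, trn, tot, device):

```shell
#!/bin/sh
# Illustrative sample of `iostat -e -n` error lines (not real output).
iostat_err_sample='  0   2   8  10 c4t2d0
  0   0   0   0 c4t8d0
  0  19 147 166 c4t44d0'

# Print devices whose transport error count ("trn", third column)
# exceeds the threshold given as $1.
flag_bad_disks() {
  awk -v thr="$1" '$3 > thr { print $5, "trn=" $3 }'
}

printf '%s\n' "$iostat_err_sample" | flag_bad_disks 50
```

In a real run you would pipe `iostat -e -n` straight into the filter; comparing two snapshots taken a few minutes apart shows which disks are actively accumulating errors rather than carrying old counts.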
A resilver of a 1TB disk takes up to 40 hrs., so I expect a scrub to take a week (or two), and at present it would probably result in multiple disk failures.

Mark.
--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss