Hi Claus,

Claus Guttesen wrote:
> Hi.
>
> I installed solaris express developer edition (b79) on a supermicro
> quad-core harpertown E5405 with 8 GB ram and two internal sata-drives.
> I installed solaris onto one of the internal drives. I added an areca
> arc-1680 sas-controller and configured it in jbod-mode. I attached an
> external sas-cabinet with 16 sas-drives 1 TB (931 binary GB). I
> created a raidz2-pool with ten disks and one spare. I then copied some
> 400 GB of small files each approx. 1 MB. To simulate a disk-crash I
> pulled one disk out of the cabinet and zfs faulted the drive and used
> the spare and started a resilver.
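For reference, a ten-disk raidz2 pool with a single hot spare like the
one described would be created roughly like this - a sketch only, with
illustrative device names rather than ones taken from your system:

  # ten data disks in one raidz2 vdev, plus a single hot spare
  zpool create ef1 raidz2 c3t0d0 c3t0d1 c3t0d2 c3t0d3 c3t0d4 \
      c3t0d5 c3t0d6 c3t0d7 c3t1d0 c3t1d1 \
      spare c3t1d2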
I'm not convinced that this is a valid test; yanking a disk out will
have physical-layer effects apart from removing the device from your
system. I think relling or roch would have something to say on this
also.

> During the resilver-process one of the remaining disks had a
> checksum-error and was marked as degraded. The zpool is now
> unavailable. I first tried to add another spare but got I/O-error. I
> then tried to replace the degraded disk by adding a new one:
>
> # zpool add ef1 c3t1d3p0
> cannot open '/dev/dsk/c3t1d3p0': I/O error
>
> Partial dmesg:
>
> Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
> Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
> id=1 lun=3 fatal error on target, device was gone
> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
> arcmsr0: tran reset level=1

A tran reset with level=1 is a bus reset.

> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
> arcmsr0: tran reset level=0

A tran reset with level=0 is a target-specific reset, which arcmsr
doesn't support.

...

> Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
> Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
> id=1 lun=3 fatal error on target, device was gone

The command timed out because your system configuration was
unexpectedly changed in a manner which arcmsr doesn't support.

....

> /usr/sbin/zpool status
>   pool: ef1
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>  scrub: resilver in progress, 0.02% done, 5606h29m to go
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         ef1            DEGRADED     0     0     0
>           raidz2       DEGRADED     0     0     0
>             spare      ONLINE       0     0     0
>               c3t0d0p0 ONLINE       0     0     0
>               c3t1d2p0 ONLINE       0     0     0
>             c3t0d1p0   ONLINE       0     0     0
>             c3t0d2p0   ONLINE       0     0     0
>             c3t0d0p0   FAULTED     35 1.61K     0  too many errors
>             c3t0d4p0   ONLINE       0     0     0
>             c3t0d5p0   DEGRADED     0     0    34  too many errors
>             c3t0d6p0   ONLINE       0     0     0
>             c3t0d7p0   ONLINE       0     0     0
>             c3t1d0p0   ONLINE       0     0     0
>             c3t1d1p0   ONLINE       0     0     0
>         spares
>           c3t1d2p0     INUSE     currently in use
>
> errors: No known data errors

A double disk failure while resilvering - not a good state for your
pool to be in. Can you wait for the resilver to complete? Every minute
that goes by tends to decrease the estimate of how long remains.

In addition, why are you using p0 devices rather than GPT-labelled
disks (or whole-disk s0 slices)?

> When I try to start cli64 to access the arc-1680-card it hangs as well.
> Is this a deficiency in the arcmsr-driver?

I'll quibble - "this" can mean several things. Yes, there seems to be
an issue with arcmsr's handling of uncoordinated device removal; I
advise against doing this.

I don't know how cli64 works, and you haven't provided any message
output from the system at the time when "it hangs" - is that the cli64
utility, the system, or your zpool?

For interest - which version of arcmsr are you running?


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
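P.S. For completeness: 'zpool add' adds a new top-level vdev to the
pool rather than replacing a disk; 'zpool replace' is the command for
swapping out a failed drive. Once the resilver finishes, the usual
sequence would be roughly the following - a sketch only, reusing device
names from your 'zpool status' output above and assuming the controller
can actually see the replacement disk; adjust to whatever your system
reports:

  # clear the error counters on the degraded disk if it is otherwise healthy
  zpool clear ef1 c3t0d5p0

  # replace the faulted disk with the new one
  zpool replace ef1 c3t0d0p0 c3t1d3p0

  # when that resilver completes, the hot spare should return to AVAIL;
  # if it stays INUSE, detach it by hand
  zpool detach ef1 c3t1d2p0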