Hi.

I installed Solaris Express Developer Edition (b79) on a Supermicro
quad-core Harpertown E5405 with 8 GB RAM and two internal SATA drives,
and installed Solaris onto one of them. I added an Areca ARC-1680 SAS
controller, configured it in JBOD mode, and attached an external SAS
cabinet holding 16 SAS drives of 1 TB (931 GiB) each. I created a
raidz2 pool from ten disks plus one spare, then copied some 400 GB of
small files, each approximately 1 MB. To simulate a disk crash I
pulled one disk out of the cabinet; ZFS faulted the drive, brought in
the spare, and started a resilver.
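
For reference, the pool was created roughly like this (a sketch from
memory, not my shell history; the device names are taken from the
zpool status output further down, and c3t0d3p0 is an assumption for
the tenth disk):

```shell
# Sketch of the original pool creation. Device names follow the
# zpool status output below; c3t0d3p0 is an assumed name for the
# tenth data disk.
zpool create ef1 raidz2 \
    c3t0d0p0 c3t0d1p0 c3t0d2p0 c3t0d3p0 c3t0d4p0 \
    c3t0d5p0 c3t0d6p0 c3t0d7p0 c3t1d0p0 c3t1d1p0 \
    spare c3t1d2p0
```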

During the resilver, one of the remaining disks reported a checksum
error and was marked as degraded. The zpool is now unavailable. I
first tried to add another spare but got an I/O error. I then tried to
replace the degraded disk by adding a new one:

# zpool add ef1 c3t1d3p0
cannot open '/dev/dsk/c3t1d3p0': I/O error
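
(For comparison, my understanding is that swapping out a single disk
is normally done with 'zpool replace' rather than 'zpool add', which
would instead try to add a new top-level vdev. A sketch, assuming
c3t0d5p0 — the degraded disk in the status output below — is the one
to replace:)

```shell
# Presumed replacement command: resilver from the degraded c3t0d5p0
# onto the new disk c3t1d3p0, then detach the old one.
zpool replace ef1 c3t0d5p0 c3t1d3p0
```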

Partial dmesg:

Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
id=1 lun=3 fatal error on target, device was gone
Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=1
Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=0
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
id=8 lun=0 ccb='0xffffff02e0c8be00' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
id=8 lun=0 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
id=0 lun=0 ccb='0xffffff02e0c92a00' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
id=0 lun=0 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=1
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=0
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
id=0 lun=5 ccb='0xffffff02e0c97200' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
id=0 lun=5 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=1
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=0
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
id=1 lun=3 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=1
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
arcmsr0: tran reset level=0
Jul 25 13:15:00 malene scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci10b5,[EMAIL 
PROTECTED]/pci10b5,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],3
(sd8):
Jul 25 13:15:00 malene  offline or reservation conflict

/usr/sbin/zpool status
  pool: ef1
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: resilver in progress, 0.02% done, 5606h29m to go
config:

        NAME            STATE     READ WRITE CKSUM
        ef1             DEGRADED     0     0     0
          raidz2        DEGRADED     0     0     0
            spare       ONLINE       0     0     0
              c3t0d0p0  ONLINE       0     0     0
              c3t1d2p0  ONLINE       0     0     0
            c3t0d1p0    ONLINE       0     0     0
            c3t0d2p0    ONLINE       0     0     0
            c3t0d0p0    FAULTED     35 1.61K     0  too many errors
            c3t0d4p0    ONLINE       0     0     0
            c3t0d5p0    DEGRADED     0     0    34  too many errors
            c3t0d6p0    ONLINE       0     0     0
            c3t0d7p0    ONLINE       0     0     0
            c3t1d0p0    ONLINE       0     0     0
            c3t1d1p0    ONLINE       0     0     0
        spares
          c3t1d2p0      INUSE     currently in use

errors: No known data errors

When I try to start cli64 to access the ARC-1680 card, it hangs as well.

Is this a deficiency in the arcmsr driver?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss