[zfs-discuss] Pool problem

Robert Thurlow Tue, 20 Mar 2007 04:56:07 -0800

I have this external Firewire box with 4 IDE drives in it, attached to
a Sunblade 2500.  I've built the following pool on them:


banff[1]% zpool status
  pool: pond
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pond         ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t0d1  ONLINE       0     0     0
            c11t0d0  ONLINE       0     0     0
            c11t0d1  ONLINE       0     0     0

errors: No known data errors

I've partly filled it with data to stress it, and now when I run a
'zpool scrub', it gets to this point and stops:

banff[13]# zpool status
  pool: pond
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 35.30% done, 4h3m to go

This is usually followed with repeated complaints on the console:

Mar 19 18:46:30 banff scsi: WARNING: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd10):
Mar 19 18:46:30 banff   Error for Command: read(10)   Error Level: Retryable
Mar 19 18:46:30 banff scsi:   Requested Block: 218914280   Error Block: 218914280
Mar 19 18:46:30 banff scsi:   Vendor: WDC WD25   Serial Number:
Mar 19 18:46:30 banff scsi:   Sense Key: Media_Error
Mar 19 18:46:30 banff scsi:   ASC: 0x4b (data phase error), ASCQ: 0x0, FRU: 0x0

(repeats four times)

Mar 19 18:47:33 banff scsi: WARNING: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd11):
Mar 19 18:47:33 banff   SCSI transport failed: reason 'reset': retrying command
Mar 19 19:01:36 banff scsi: WARNING: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd10):
Mar 19 19:01:36 banff   SCSI transport failed: reason 'reset': giving up
Mar 19 19:09:34 banff scsi: WARNING: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd11):
Mar 19 19:09:34 banff   SCSI transport failed: reason 'reset': giving up
Mar 19 19:12:37 banff scsi: WARNING: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd10):
Mar 19 19:12:37 banff   SCSI transport failed: reason 'reset': giving up

After getting one of these, the machine and the storage never manage
to work it out and resume; all access grinds to a halt until I reboot
the server.  I also see the same blocks failing on the same disks, and
I'm surprised bad block mapping doesn't manage to come into play.  I
tried writing a zero to one of the drives to try to force this, but
it didn't help:

banff[20]# dd if=/dev/zero of=/dev/rdsk/c10t0d0 oseek=218914280 count=1
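
The arithmetic behind that command can be sketched as follows. This is only an illustration: it assumes the "Error Block" in the sd warnings counts 512-byte sectors from the start of the device (dd's default block size is also 512 bytes), which has not been verified for this Firewire bridge. The helper name sector_to_bytes is made up for the example, and the script only prints the dd command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: convert a sector number to a byte offset.
# Assumption: the reported block number is in units of the given sector size.
sector_to_bytes() {
    # $1 = sector number, $2 = sector size in bytes
    echo $(( $1 * $2 ))
}

BLOCK=218914280   # "Error Block" from the console messages
BS=512            # assumed sector size; also dd's default block size

echo "byte offset of the failing block: `sector_to_bytes $BLOCK $BS`"

# Print (do not run) the equivalent dd with the block size made explicit.
echo "dd if=/dev/zero of=/dev/rdsk/c10t0d0 bs=$BS oseek=$BLOCK count=1"
```

Under those assumptions the failing block sits roughly 112 GB into the drive.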

Help?  I don't understand why zfs isn't handling this.  I do not have
confidence that the external case is my friend here (more about that
in another post), but I'm surprised at this failure mode.

Thanks,
Rob T