Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Eric,
Hard to say. I'll use MDB next time it happens for more info. The applications using any zpool lock up.
-J
On Jan 3, 2008 3:33 PM, Eric Schrock <[EMAIL PROTECTED]> wrote:
> When you say "starts throwing sense errors," does that mean every I/O to
> the drive will fail, or some arbitrary

Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Eric Schrock
When you say "starts throwing sense errors," does that mean every I/O to the drive will fail, or some arbitrary percentage of I/Os will fail? If it's the latter, ZFS is trying to do the right thing by recognizing these as transient errors, but eventually the ZFS diagnosis should kick in. What doe

Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Eric Schrock
This should be pretty much fixed on build 77. It will lock up for the duration of a single command timeout, but ZFS should recover quickly without queueing up additional commands. Since the default timeout is 60 seconds, and we retry 3 times, and we do a probe afterwards, you may see hangs of up

Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Eric, I'd really like to suggest a helpful idea, but all I can suggest is an end result. When we run ZFS on top of STK arrays that do the RAID, the arrays offline their bad disks very quickly and the applications never notice. In the X4500s, ZFS times out and locks up the applications. If ZFS is going to b

Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Albert,
Thank you for the link. ZFS isn't offlining the disk in b77.
-J
On Jan 3, 2008 3:07 PM, Albert Chin <[EMAIL PROTECTED]> wrote:
>
> On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
> > There seems to be a persistent issue we have with ZFS where one of the
> > SATA

Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Albert Chin
On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
> There seems to be a persistent issue we have with ZFS where one of the
> SATA disks in a zpool on a Thumper starts throwing sense errors, ZFS
> does not offline the disk and instead hangs all zpools across the
> system. If it is

[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hello, We have a persistent issue with ZFS: when one of the SATA disks in a zpool on a Thumper starts throwing sense errors, ZFS does not offline the disk and instead hangs all zpools across the system. If it is not caught soon enough, application data ends up in an inconsistent s