Hi Laurent,

Yes, you should be able to offline a faulty device in a redundant
configuration, as long as enough devices are available to keep
the pool redundant.

On my Solaris Nevada system (latest bits), injecting a fault
into a disk in a RAID-Z configuration and then offlining that disk
works as expected.
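
For what it's worth, the offline/online part of my test boiled down
to the commands below (pool and device names are just from my test
setup, and I'm leaving out the fault-injection step since that
tool's syntax can vary between builds):

# zpool offline tank c1t1d0
# zpool status tank
# zpool online tank c1t1d0

After the offline, zpool status reports the device as OFFLINE and
the pool as DEGRADED; the online brings it back, and the pool
resilvers if needed.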

On my Solaris 10 system, I'm unable to offline a faulted disk in
a RAID-Z configuration, so I will get back to you with a bug ID
or some other plausible explanation.

Thanks for reporting this problem.

Cindy


Laurent Blume wrote:
You could offline the disk if *this* disk (not
the pool) had a replica. Nothing wrong with the
documentation. Hmm, maybe it is a little misleading
here. I walked into the same "trap".


I apologize for being daft here, but I don't find any ambiguity in the 
documentation.
This is explicitly stated as being possible.

"This scenario is possible assuming that the systems in question see the storage 
once it is attached to the new switches, possibly through different controllers than 
before, and your pools are set up as RAID-Z or mirrored configurations."

And further down, it even says that it's not possible to offline two devices in a 
RAID-Z, with that exact error as an example:

"You cannot take a pool offline to the point where it becomes faulted. For 
example, you cannot take offline two devices out of a RAID-Z configuration, nor can 
you take offline a top-level virtual device.

# zpool offline tank c1t0d0
cannot offline c1t0d0: no valid replicas
"

http://docs.sun.com/app/docs/doc/819-5461/gazgm?l=en&a=view
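
The way I read that, a single offline in a single-parity RAID-Z is
fine, and only a *second* offline should produce that error, i.e.
something like this (device names purely illustrative):

# zpool offline tank c1t1d0
# zpool offline tank c1t2d0
cannot offline c1t2d0: no valid replicas

My case is the first command, on a pool with even more redundancy,
and it still fails.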

I don't understand what you mean by this disk not having a replica. It's 
RAID-Z2: by definition, everything it contains can be reconstructed from the 
other disks in the pool. That's why the pool is still working fine.


The pool is not using the disk anymore anyway, so
(from the zfs point of view) there is no need to
offline the disk. If you want to stop the I/O system
from trying to access the disk, pull it out or wait
until it gives up...


Yes, there is. I don't want the disk to come back online if the system reboots, 
because what actually happens is that it *never* gives up (well, at least not 
in more than 24 hours), and all I/O to the zpool stops as long as those errors 
persist. Yes, I know it should continue working. In practice, it does not 
(though it used to be much worse in previous versions of S10, with all I/O 
stopping on all disks and volumes, both ZFS and UFS, and usually ending in a 
panic).
And the zpool command hangs and never finishes. The only way to get out of it 
is to use cfgadm to send multiple hardware resets to the SATA device, then 
disconnect it. At that point, zpool completes and shows the disk as having 
faulted.
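
For the record, the recovery dance looks more or less like this (the
attachment point name is specific to my box, and I'm quoting the
cfgadm subcommand from memory, so treat it as a sketch):

# cfgadm -al                            (find the SATA attachment point)
# cfgadm -x sata_reset_device sata1/3   (repeated a few times)
# cfgadm -c disconnect sata1/3
# zpool status tank                     (finally completes, disk FAULTED)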


Laurent
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
