Good morning Cindy,

> Hi,
> 
> Testing how ZFS reacts to a failed disk can be difficult to anticipate
> because some systems don't react well when you remove a disk.
I am in the process of finding that out for my systems. That's why I am doing 
these tests. 
> On an x4500, for example, you have to unconfigure a disk before you can
> remove it.
I have already had similar experiences with disks attached via AHCI. Still,
zpool status either doesn't notice immediately that they have been removed,
or sometimes doesn't notice at all. But that's a topic for another thread.
> 
> Before removing a disk, I would consult your h/w docs to see what the
> recommended process is for removing components.
Spec-wise, all the drives, backplanes, controllers and drivers I am using
support hotplug. Still, ZFS seems to have difficulties with it.
> 
> Swapping disks between the main pool and the spare pool isn't an
> accurate test of a disk failure and a spare kicking in.

That's correct. Note, though, that this wasn't the point of my test procedure; 
I just intentionally mixed up some disks.

> 
> If you want to test a spare in a ZFS storage pool kicking in, then yank
> a disk from the main pool (after reviewing your h/w docs) and observe
> the spare behavior.
I am aware of that procedure. Thanks. 
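
Just to make it explicit, once the disk is pulled I would watch things with
something like this (pool and device names are placeholders):

    zpool status tank            # the hot spare should show up as INUSE and resilver
    zpool detach tank c7t3d0     # afterwards, detach the replaced (failed) disk so the
                                 # spare becomes a permanent member, or detach the
                                 # spare instead to return it to the spare list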

> If a disk fails in real time, I doubt it will be when the pool is
> exported and the system is shut down.

Agreed. Once again: the export, reboot, import sequence was specifically 
followed to eliminate any side effects of hotplug behaviour.
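
For completeness, the sequence was simply (pool name is a placeholder):

    zpool export tank     # cleanly take the pool offline and mark it exported
    init 6                # reboot; disks were swapped while the system was down
    zpool import tank     # re-import; ZFS reads the labels to find the members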

> 
> In general, ZFS pools don't need to be exported to replace failed disks.
> I've seen unpredictable behavior when devices/controllers change on live
> pools. I would review the doc pointer I provided for recommended disk
> replacement practices.
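
For the archives, the manual replacement path I take from the docs looks
roughly like this (pool and device names are made up):

    zpool replace tank c7t3d0 c7t9d0    # replace a failed disk with one in a new slot
    zpool replace tank c7t3d0           # or: new disk inserted into the same slot
    zpool status -v tank                # watch the resilver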
> 
> I can't comment on the autoreplace behavior with a pool exported and
> a swap of disks. Maybe someone else can. The point of the autoreplace
> feature is to allow you to take a new replacement disk and automatically
> replace a failed disk without having to use the zpool replace command.
> It's not a way to swap existing disks in the same pool.
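
For anyone else reading along, the property in question is set per pool and
is off by default (pool name is a placeholder):

    zpool set autoreplace=on tank
    zpool get autoreplace tank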

The interesting point here is finding out whether one would be able to, for 
example, replace a controller with a different type in case of a hardware 
failure, or even just move the physical disks to a different enclosure for 
whatever reason. Once again, the naive assumption was that ZFS would 
automatically find the members of a previously exported pool via the 
information (metadata) present on each of the pool members (disks, vdevs, 
files, whatever).
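
That assumption is based on how import discovery appears to work (pool name
is a placeholder):

    zpool import                    # scan device labels and list importable pools
    zpool import tank               # import by name (or by numeric pool ID)
    zpool import -d /dev/dsk tank   # restrict the label scan to a given device directory
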
The situation now, after the scrub has finished, is that the pool reports no 
known data errors, but still shows the dubious state of the same device, 
c7t11d0, being listed both as an available spare and as an online pool member 
at the same time. This status persists across another export/import cycle 
(this time without an intermediate reboot).
My next steps will be to swap the controller for an mpt-driven type and 
rebuild the pool from scratch. Then I may repeat the test.
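
Rebuilding will be along these lines (layout and device names are made up and
will depend on how the new controller enumerates the disks):

    zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 spare c0t4d0
    zpool set autoreplace=on tank
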
Thanks so far for your support. I have learned a lot.

Regards,

Sebastian