I finally found the cause of the error.

Since my disks are mounted in cassettes holding four each, I had to disconnect 
all of their cables to replace the crashed disk.

When re-attaching the cables I accidentally reversed their order. In my 
early tests this was not a problem, since ZFS identified the disks anyway, 
regardless of which controller a disk was connected to (as long as the 
controllers were listed for the pool).

What happened was some kind of race condition(?). Since the disk on 
controller c2d0 had crashed, it was listed as corrupt. But instead of 
attaching the new disk that was supposed to replace the crashed one, I 
accidentally attached a healthy disk that was already a member of the pool 
(previously on controller c3d1) to c2d0, and that is when my problem developed.

ZFS therefore listed the original c2d0 disk as faulted, but then detected a 
healthy disk on c2d0, i.e. c2d0 was both OK and faulted! This meant that any 
command acting on c2d0 would fail, because both disks were listed as c2d0.

In this condition there seems to be no way of telling ZFS to discard the 
faulty disk entry, since both disks were assigned the name c2d0 and were 
connected to the same controller.

My resolution (after many hours of moving disks and rebooting to track down 
the error) was to make sure that only the new, not-yet-assigned disk was 
connected to c2d0, and none of the disks already assigned to the pool.
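For anyone hitting the same situation, the recovery boils down to something like the following sketch. The pool name "tank" is an assumption (my actual pool name is not shown here); the commands themselves are standard zpool administration commands.

```shell
# With ONLY the new, unassigned disk cabled as c2d0 (and every disk that
# already belongs to the pool kept off that controller), check that the
# pool now shows a single, unambiguous c2d0 entry:
zpool status tank

# Then replace the faulted device in place with the new disk on c2d0.
# (Pool name "tank" is hypothetical.)
zpool replace tank c2d0

# Re-run status to watch the resilver progress:
zpool status tank
```

The key point is the cabling step before any zpool command: as long as two physical disks had ever answered as c2d0, no command addressing c2d0 could succeed.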

I have to consider it a bug that you can't remove/clear this kind of error 
in ZFS in order to repair your pool.

Despite all my efforts, it seems my pool ended up corrupted in the end 
(probably due to the scrubbing and resilvering), so now I have some hours to 
kill restoring my data...
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss