In the past week I've lost a few drives on a Thumper I look after, and
I've noticed a few issues with the resilver process that could be
improved (or perhaps already have been; the system is running
Solaris 10 update 8).
1) While the pool has been resilvering, I have been copying a large
(2 TB) filesystem from another box. Performance was OK for the initial
send (45 MB/s), but has been pretty terrible for incrementals. The
problem looks like latency: sending an effectively empty snapshot
normally gets a response in under a second, but during a resilver it
can take 30-40 seconds. I do a daily incremental send of a set of
filesystems with about 4000 small snapshots (1000 users, snapshots
every six hours), which normally takes about three hours; while
resilvering, it barely gets 10% of the way through in a day.
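For reference, the daily job boils down to a loop like the one below,
and timing a send of an effectively empty snapshot is how I measured
the latency. Pool, filesystem, host, and snapshot names here are made
up; the real script has more error handling:

    # Daily job: send the latest 6-hourly snapshot of each user
    # filesystem incrementally to the backup box. 'tank/home',
    # 'backuphost', '@prev' and '@latest' are placeholders.
    for fs in $(zfs list -H -o name -t filesystem -r tank/home); do
        zfs send -i "$fs@prev" "$fs@latest" | \
            ssh backuphost zfs receive -F "backup/${fs#tank/}"
    done

    # Latency check: an incremental between two snapshots with no
    # changes between them normally completes in under a second;
    # during a resilver it takes 30-40 seconds.
    time zfs send -i tank/home/u1@prev tank/home/u1@latest | \
        ssh backuphost zfs receive backup/home/u1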
2) If a drive fails in one vdev while a drive in another is
resilvering, both resilvers start over. I have yet to complete the
resilver of the first drive that failed, because other drives keep
failing when the current resilver is 90% done! Is it really necessary
to restart when a drive fails in another vdev? As more drives fail,
the resilvers get progressively longer.
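For what it's worth, I've been watching the restarts with a simple
poll ('tank' stands in for the real pool name):

    # Print the resilver progress line every ten minutes;
    # 'tank' is a placeholder for the real pool name.
    while true; do
        date
        zpool status tank | grep 'scrub:'
        sleep 600
    done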
3) Detaching a failed device while a spare is resilvering causes the
resilver to restart. Is that necessary?
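Concretely: with a hot spare resilvering in place of a failed disk, a
detach like the following (pool and device names made up) kicks the
resilver back to 0%:

    # Detach the failed disk while its hot spare is still
    # resilvering; this restarts the spare's resilver from
    # the beginning.
    zpool detach tank c0t3d0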
Thanks,
--
Ian.