Hi,

I have a pool of 22 SATA disks of 1 TB each in a RAIDZ3 configuration. It is 
filled with files of an average size of 2 MB. I filled it randomly to resemble 
the expected workload in production use.
Problems arise when I try to scrub or resilver this pool. The operation takes 
the better part of a week (!). During this time the disk being resilvered is at 
100% utilisation with >300 writes/s, but achieves only 3 MB/s, which is only 
about 3% of its best-case throughput.
Having a window of one week with degraded redundancy is intolerable. It is 
quite likely that one loses more disks during this period, eventually leading 
to a total loss of the pool, not to mention the degraded performance during 
this period. In fact, in previous tests I lost a pool in a 6x11 RAIDZ2 
configuration.
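
The throughput figures quoted above can be sanity-checked with a little 
arithmetic. This is only a rough sketch: the 100 MB/s best-case rate is an 
assumption derived from the "about 3%" estimate, 1 TB is taken as 10^12 bytes, 
and the disk is assumed to be rewritten in full.

```python
# Back-of-envelope check of the resilver figures quoted in this post.
DISK_BYTES = 10**12            # one 1 TB disk (assumed 10^12 bytes)
WRITES_PER_S = 300             # observed write rate (reported as >300/s)
OBSERVED_BPS = 3 * 10**6       # observed throughput: 3 MB/s
BEST_CASE_BPS = 100 * 10**6    # ASSUMPTION: 3 MB/s is about 3% of this

# Average size of a single write at the observed rates.
avg_write_bytes = OBSERVED_BPS / WRITES_PER_S

# Time to rewrite the whole disk at the observed and best-case rates.
observed_days = DISK_BYTES / OBSERVED_BPS / 86400
best_case_hours = DISK_BYTES / BEST_CASE_BPS / 3600

print(f"average write size: {avg_write_bytes / 1000:.0f} KB")
print(f"full-disk rewrite at 3 MB/s: {observed_days:.2f} days")
print(f"full-disk rewrite at 100 MB/s: {best_case_hours:.2f} hours")
```

Under these assumptions each write carries only about 10 KB, and rewriting a 
full disk takes close to four days rather than under three hours, which is 
consistent with a seek-bound rather than bandwidth-bound workload.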

I skimmed through the resilver code and found that it simply enumerates all 
objects in the pool and checks them one by one, with up to maxinflight I/O 
requests in flight in parallel. Because this does not take the on-disk order 
of the data into account, it leads to this pathological performance. I also 
found Bug 6678033, which states that a prefetch might fix this.
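
To illustrate why the traversal order matters, here is a toy model, not the 
actual ZFS code. It assumes blocks land at uniformly random disk offsets and 
compares the total head movement when they are visited in logical 
(enumeration) order with the movement when they are visited sorted by offset. 
All names and sizes are hypothetical.

```python
import random

def total_seek_distance(offsets):
    """Sum of head movements when visiting offsets in the given order,
    starting with the head at offset 0."""
    pos = 0
    dist = 0
    for off in offsets:
        dist += abs(off - pos)
        pos = off
    return dist

random.seed(0)                  # fixed seed so the run is repeatable
DISK_SIZE = 10**12              # hypothetical 1 TB disk
N_BLOCKS = 100_000              # hypothetical number of blocks to visit

# Logical order: blocks as enumerated, scattered randomly across the disk.
logical_order = [random.randrange(DISK_SIZE) for _ in range(N_BLOCKS)]
# Disk order: the same blocks, visited in ascending offset order.
disk_order = sorted(logical_order)

seek_logical = total_seek_distance(logical_order)
seek_sorted = total_seek_distance(disk_order)

print(f"logical order: {seek_logical:.3e} bytes of head travel")
print(f"disk order:    {seek_sorted:.3e} bytes of head travel")
print(f"ratio:         {seek_logical / seek_sorted:.0f}x")
```

In this model the sorted pass is a single sweep across the disk, while the 
logical-order pass moves the head a large fraction of the disk width for 
almost every block. Real disks, queueing, and the ZFS allocator will behave 
differently, so this only shows the direction of the effect.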

Now my questions:
1) Are there tunings that could speed up resilver, possibly with a negative 
effect on normal performance? I thought of raising recordsize to the expected 
filesize of 2MB. Could this help?
2) What is the state of the fix for Bug 6678033? When will it be ready?
3) Do you have any configuration hints for setting up a pool layout which might 
help resilver performance? (aside from using hardware RAID instead of RAIDZ)

Thanks for any hints.
sensille
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
