I wrote:
I have been experimenting with a softraid mirror, using two cheap SATA
disks.
In general the performance is very good - except when rebuilding. A quick
set of sums suggests that the problem is seek time.
The disks are 7200rpm, therefore one can hope for 120 seeks per second.
"systat iostat" (while rebuilding) gives this (edited to fit in an email):
DEVICE READ WRITE RTPS WTPS SEC
wd0 3801088 3827302 58 59 1.0
wd1 0 3801088 0 58 0.0
sd0 0 0 0 0 0.0
Totals 3801088 7628390 58 117 1.0
[snip]
I've patched src/sys/dev/softraid_raid1.c so that my machine doesn't write
back to the source chunk(s) when rebuilding - making sure that the other
writes go to both disks.
Now my rebuild speed is much better:
DEVICE READ WRITE RTPS WTPS SEC
wd0 61525196 9830 938 0 0.5
wd1 0 61525196 0 938 0.5
sd0 0 0 0 0 0.0
Totals 61525196 61535027 938 939 1.0
---
FWIW: I found that the rebuild code is (currently) only used by raid1, so
there is no benefit in increasing the buffer size for all rebuild vs tweaking
the rebuild logic.
---
I've tested this patch on a virtual-box machine with a 3 way raid 1 mirror
and it seems to do sensible things there. (i.e. reads from the good disks,
rebuild writes go only to the bad disk, other writes go to all disks and no
data corruption).
I'm now running it on my (experimental) real machine... and the performance
increase is substantial, as you can see above. (Fingers crossed that my
data will be safe).
The patch feels a bit "copy-and-paste"y, and I'll tidy it up in a day or
so (and repost). But if you want to risk it, here it is:
xerxes:~/new_src/src/sys/dev$ cvs diff -u softraid_raid1.c
Index: softraid_raid1.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid_raid1.c,v
retrieving revision 1.23
diff -u softraid_raid1.c
--- softraid_raid1.c 26 Mar 2010 11:20:34 -0000 1.23
+++ softraid_raid1.c 14 Apr 2010 21:10:55 -0000
@@ -447,13 +447,25 @@
}
} else {
/* writes go on all working disks */
+ /* XXX - copy and paste code warning */
x = i;
scp = sd->sd_vol.sv_chunks[x];
switch (scp->src_meta.scm_status) {
- case BIOC_SDONLINE:
case BIOC_SDSCRUB:
case BIOC_SDREBUILD:
b->b_flags |= B_WRITE;
+ break;
+
+ case BIOC_SDONLINE:
+ /* when rebuilding, take care to only send the
+ * rebuild writes to disks that need them
+ */
+ if (wu->swu_flags & SR_WUF_REBUILD) {
+ wu->swu_io_count--;
+ sr_ccb_put(ccb);
+ continue;
+ } else
+ b->b_flags |= B_WRITE;
break;
case BIOC_SDHOTSPARE: /* should never happen */