First, let me defend softraid. The rebuild code is designed to offer maximum data protection, and with this in mind certain assumptions were made.
That said, I am not opposed to a patch to improve performance, but as with all things softraid, the corner cases are many and complicated. A valid patch will keep (at least) the following things in mind:

* Read/write failures on valid disks
* Colliders in front of or behind the rebuild IO
* Multi-chunk failures

The code also SHALL reuse the normal IO paths. This code cannot be made much more complicated before violating the KISS principle.

The patch below is a little iffy because I don't know (yet) how it would behave in the aforementioned cases. But improve the patch, make sure you cover the corner cases, and we might have a winner.

/marco

On Wed, Apr 14, 2010 at 09:25:25PM +0000, Matthew Roberts wrote:
> I wrote:
>> I have been experimenting with a softraid mirror, using two cheap SATA
>> disks.
>> In general the performance is very good - except when rebuilding. A quick
>> set of sums suggests that the problem is seek time.
>>
>> The disks are 7200rpm, therefore one can hope for 120 seeks per second.
>> "systat iostat" (while rebuilding) gives this (edited to fit in an email):
>>
>>   DEVICE         READ      WRITE  RTPS  WTPS   SEC
>>   wd0         3801088    3827302    58    59   1.0
>>   wd1               0    3801088     0    58   0.0
>>   sd0               0          0     0     0   0.0
>>   Totals      3801088    7628390    58   117   1.0
>
> [snip]
>
> I've patched src/sys/dev/softraid_raid1.c so that my machine doesn't write
> back to the source chunk(s) when rebuilding - making sure that the other
> writes go to both disks.
>
> Now my rebuild speed is much better:
>
>   DEVICE         READ      WRITE  RTPS  WTPS   SEC
>   wd0        61525196       9830   938     0   0.5
>   wd1               0   61525196     0   938   0.5
>   sd0               0          0     0     0   0.0
>   Totals     61525196   61535027   938   939   1.0
>
> ---
>
> FWIW: I found that the rebuild code is (currently) only used by raid1, so
> there is no benefit in increasing the buffer size for all rebuilds vs.
> tweaking the rebuild logic.
>
> ---
>
> I've tested this patch on a VirtualBox machine with a 3-way RAID 1 mirror
> and it seems to do sensible things there (i.e. reads come from the good
> disks, rebuild writes go only to the bad disk, other writes go to all disks
> and there is no data corruption).
>
> I'm now running it on my (experimental) real machine... and the performance
> increase is substantial, as you can see above. (Fingers crossed that my
> data will be safe.)
>
> The patch feels a bit "copy-and-paste"-y, and I'll tidy it up in a day or
> so (and repost). But if you want to risk it, here it is:
>
> xerxes:~/new_src/src/sys/dev$ cvs diff -u softraid_raid1.c
> Index: softraid_raid1.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/softraid_raid1.c,v
> retrieving revision 1.23
> diff -u softraid_raid1.c
> --- softraid_raid1.c	26 Mar 2010 11:20:34 -0000	1.23
> +++ softraid_raid1.c	14 Apr 2010 21:10:55 -0000
> @@ -447,13 +447,25 @@
>  			}
>  		} else {
>  			/* writes go on all working disks */
> +			/* XXX - copy and paste code warning */
>  			x = i;
>  			scp = sd->sd_vol.sv_chunks[x];
>  			switch (scp->src_meta.scm_status) {
> -			case BIOC_SDONLINE:
>  			case BIOC_SDSCRUB:
>  			case BIOC_SDREBUILD:
>  				b->b_flags |= B_WRITE;
> +				break;
> +
> +			case BIOC_SDONLINE:
> +				/* when rebuilding, take care to only send the
> +				 * rebuild writes to disks that need them
> +				 */
> +				if (wu->swu_flags & SR_WUF_REBUILD) {
> +					wu->swu_io_count--;
> +					sr_ccb_put(ccb);
> +					continue;
> +				} else
> +					b->b_flags |= B_WRITE;
>  				break;
>
>  			case BIOC_SDHOTSPARE:	/* should never happen */
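
For readers following along, the per-chunk decision the diff makes can be summarised as a small standalone predicate. The sketch below is illustrative only: it is not softraid code, it uses simplified stand-in names (chunk_status, want_write, rebuild_wu), and it ignores the ccb/work-unit bookkeeping the real IO path has to do.

/*
 * Illustrative sketch only -- not softraid code.  It restates the
 * per-chunk write-dispatch decision the diff makes: chunks being
 * scrubbed or rebuilt always take the write, healthy (online) chunks
 * are skipped when the write belongs to a rebuild work unit, and
 * hotspare/offline chunks never get writes here.
 */
#include <stdbool.h>
#include <stdio.h>

enum chunk_status {
	CHUNK_ONLINE,		/* stands in for BIOC_SDONLINE */
	CHUNK_SCRUB,		/* BIOC_SDSCRUB */
	CHUNK_REBUILD,		/* BIOC_SDREBUILD */
	CHUNK_HOTSPARE,		/* BIOC_SDHOTSPARE */
	CHUNK_OFFLINE		/* BIOC_SDOFFLINE */
};

/* Should a write in this work unit be sent to a chunk in this state? */
static bool
want_write(enum chunk_status status, bool rebuild_wu)
{
	switch (status) {
	case CHUNK_SCRUB:
	case CHUNK_REBUILD:
		/* the chunk being reconstructed always takes the write */
		return true;
	case CHUNK_ONLINE:
		/* skip rebuild writes to chunks that already hold the data */
		return !rebuild_wu;
	default:
		/* hotspare / offline: never written here */
		return false;
	}
}

int
main(void)
{
	/* rebuild write: skipped on the online chunk, sent to the rebuilding one */
	printf("online chunk, rebuild wu:  %d\n", want_write(CHUNK_ONLINE, true));
	printf("rebuild chunk, rebuild wu: %d\n", want_write(CHUNK_REBUILD, true));
	/* normal write: goes to every working chunk */
	printf("online chunk, normal wu:   %d\n", want_write(CHUNK_ONLINE, false));
	return 0;
}

The interesting arm is CHUNK_ONLINE: during a rebuild work unit the online chunk already holds the data that was just read from it, so writing it back buys nothing and only adds seeks on the source disk.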