At 10:14 5/2/2005, Arne "Wörner" wrote:
--- Allen <[EMAIL PROTECTED]> wrote:
> Also you should keep in mind, there could simply be some really goofy
> controller option enabled, that forces the RAID5 to behave in a
> "degraded" state for reads -- forcing it to read up all the other
> disks in the stripe and calculate the XOR again, to make sure the data
> it read off the disk matches the checksum.  It's rare, but I've seen
> it before, and it will cause exactly this sort of RAID5 performance
> inversion.  Since the XOR is recalculated on every write and requires
> only reading up one sector on a different disk, options that do the
> above will result in read scores drastically lower than writes to the
> same array.
>
Isn't that compensated by the cache? I mean:
We would just
1. read all the blocks that correspond to the requested block
2. put them all into the cache
3. check the parity bits (XOR should be very fast, especially in comparison to the disc read times)
4. keep them in the cache (some kind of read ahead...)
5. send the requested block to the driver

Your steps are reasonable, but note that #3 does not hold on cards that support RAID5 but lack hardware XOR. Some of the very cheap i960-based cards have this failing, so the XOR itself is slow on top of everything else.


However, the cache doesn't play a part in this at all. What matters is the difference between these two read cycles. Assume a 5-disk system (4 data blocks plus one parity block per stripe) and a request for just one of the 4 data blocks.

Scenario A, verified read disabled:
1. RAID card reads up one block from the appropriate drive.  Done.

Scenario B, verified read enabled:
1. RAID card reads up ALL blocks in the stripe (5 reads).
2. RAID card pretends the requested block is on a "degraded" drive, and calculates it from the other 3 data blocks plus the parity (XOR) block.
3. RAID card reports the value back, or throws some kind of error if the calculated value does not match what was read.
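
To make the difference concrete, here is a rough Python sketch of the two read cycles on one stripe of a 5-disk array. It is only a toy model and not how any particular controller's firmware works: the names Stripe, plain_read and verified_read are made up for illustration, and each "disk read" is just a counter increment.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Toy model of one RAID5 stripe: N data blocks plus one parity block."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(self.data)
        self.reads = 0  # number of simulated disk reads

    def read_data(self, index):
        self.reads += 1
        return self.data[index]

    def read_parity(self):
        self.reads += 1
        return self.parity


def plain_read(stripe, index):
    """Scenario A: read just the requested block."""
    return stripe.read_data(index)


def verified_read(stripe, index):
    """Scenario B: read the whole stripe, rebuild the requested block as if
    its drive were degraded, and compare with what was actually read."""
    blocks = [stripe.read_data(i) for i in range(len(stripe.data))]
    parity = stripe.read_parity()
    others = [b for i, b in enumerate(blocks) if i != index]
    rebuilt = xor_blocks(others + [parity])
    if rebuilt != blocks[index]:
        raise IOError("parity mismatch on verified read")
    return blocks[index]


if __name__ == "__main__":
    blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]

    a = Stripe(blocks)
    plain_read(a, 2)
    print("plain read:", a.reads, "disk read(s)")

    b = Stripe(blocks)
    verified_read(b, 2)
    print("verified read:", b.reads, "disk read(s)")
```

In this model the plain read costs one disk read and the verified read costs five, regardless of how fast the XOR itself is.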


As you can see, the cache just doesn't play a part in what I was describing, which is basically the array performing as though it were degraded when in fact it is not, in order to catch failures that would otherwise be missed.

