I think reading all the disks is the worst case, not the normal case. If your block size is large enough to hold an entire row, you should only have to read one disk to get that data.
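To make that concrete, here is a rough sketch of which member disks a single read touches on a striped array. It assumes plain round-robin striping with a fixed chunk size; the function name `disks_touched`, the 64 KiB chunk size and the three-disk layout are illustrative assumptions, not values taken from my setup.

```python
def disks_touched(offset, length, chunk_size, num_disks):
    """Return the sorted indices of the disks that a read of `length` bytes
    starting at byte `offset` touches, assuming simple round-robin striping
    (chunk k lives on disk k % num_disks)."""
    if length <= 0:
        return []
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # Once the read covers at least num_disks chunks, every disk is involved.
    if last_chunk - first_chunk + 1 >= num_disks:
        return list(range(num_disks))
    return sorted({chunk % num_disks for chunk in range(first_chunk, last_chunk + 1)})


KIB = 1024
CHUNK = 64 * KIB   # assumed chunk size, for illustration only
DISKS = 3          # assumed number of array members

# A 2 KiB row that sits inside one chunk involves a single disk.
print(disks_touched(10 * KIB, 2 * KIB, CHUNK, DISKS))    # [0]
# The same row straddling a chunk boundary involves two disks.
print(disks_touched(63 * KIB, 2 * KIB, CHUNK, DISKS))    # [0, 1]
# A 1 MiB row spans 16 chunks, so every disk is involved.
print(disks_touched(0, 1024 * KIB, CHUNK, DISKS))        # [0, 1, 2]
```

Under these assumptions a small row only hits more than one disk when it happens to straddle a chunk boundary, while a row much larger than the chunk size always hits all of them.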
I, for instance, stopped using multiple data directories and use a RAID0 instead. The number of blocks read is not the same for all the disks, as it would be if every disk were involved in every transaction.

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda1             11.80         1.60       105.60          8        528
sdb              17.20       867.20         0.00       4336          0
sdc               2.60         0.00       155.20          0        776
sdd              16.40       796.80         0.00       3984          0
sde              21.80      1113.60         8.00       5568         40
md0              56.00      2777.60         8.00      13888         40

sdb, sdd and sde are raided as md0 on an EC2 xlarge instance, and the number of blocks read differs from disk to disk.

Of course my rows are small (1-2 KB), so I should rarely cross a block boundary. With 1MB rows you are more likely to, so multiple data directories might be better for you. I think it all depends on your data size.

-Anthony

On Mon, Apr 26, 2010 at 10:09:58PM +0200, Roland Hänel wrote:
> RAID0 decreases the performance of multiple, concurrent random reads because
> for each read request (I assume that at least a couple of stripe sizes are
> read), all hard disks are involved in that read.
>
> Consider the following example: you want to read 1MB out of each of two
> files
>
> a) both files are on the same RAID0 of two disks. For the first 1MB read
> request, both disks contain some stripes of this request, both disks have to
> move their heads to the correct location and do the read. The second read
> request has to wait until the first one finishes, because it is served from
> the same disks and depends on the same disk heads.
>
> b) files are on separate disks. Both reads can be done at the same time,
> because disk heads can move independently.
>
> Or look at it this way: if you issue a read request on a RAID0, and your
> disks have 8ms access time, then after the read request, the whole RAID0 is
> completely blocked for 8ms. If you handle the disks independently, only the
> disk containing the file is blocked.
>
> RAID0 has its advantages of course. Streaming reads/writes (e.g. during a
> compaction) will be extremely fast.
>
> -Roland
>
>
> 2010/4/26 Paul Prescod <p...@prescod.net>
>
> > 2010/4/26 Roland Hänel <rol...@haenel.me>:
> > > Ryan, I agree with you on the hot spots, however for the physical disk
> > > performance, even the worst case hot spot is not worse than RAID0: in a
> > > hot spot scenario, it might be that 90% of your reads go to one hard
> > > drive. But with RAID0, 100% of your reads will go to *all* hard drives.
> >
> > RAID0 is designed specifically to improve performance (both latency
> > and bandwidth). I'm unclear about why you think it would decrease
> > importance. Perhaps you're thinking of another RAID type?
> >
> > Paul Prescod
> >

--
------------------------------------------------------------------------
Anthony Molinaro                   <antho...@alumni.caltech.edu>