Hi Bharath,

What does "iostat -dmx 5" say while you're running the benchmark? Let it print out 10 or 15 lines and copy-paste the output here.
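If sysstat isn't installed on those boxes, it's available from the usual distro packages. Something along these lines (just a sketch; the output filename is arbitrary and you can tweak the interval and count) will capture a few minutes of samples you can attach:

  iostat -dmx 5 30 > iostat-during-ycsb.txt

The columns worth looking at are r/s, w/s, await and %util, alongside the rMB/s and wMB/s throughput numbers.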
How do you know the disks have unused bandwidth? It sounds like they're just bottlenecked on seeks. Some upcoming work in 0.94 should give you a good boost here (Dhruba's work to do checksumming at the HBase level).

-Todd

On Mon, Feb 13, 2012 at 8:43 PM, Bharath Ravi <bharathra...@gmail.com> wrote:
> Hi all,
>
> I have a distributed HBase setup, on which I'm running the YCSB benchmark
> (https://github.com/brianfrankcooper/YCSB/wiki/running-a-workload).
> There are 5 region servers, each a dual-core machine with around 4GB of
> memory, connected simply by a 1 Gbps Ethernet switch.
>
> The number of "handlers" per regionserver is set to 500 (!) and HDFS's
> maximum number of receivers per datanode is 4096.
>
> The benchmark dataset is large enough not to fit in memory.
> Update/insert/write throughput goes up to 8000 ops/sec easily.
> However, I see read latencies on the order of seconds, and read
> throughputs of only a few hundred ops per second.
>
> "top" tells me that the CPUs on the regionservers spend 70-80% of their
> time waiting for IO, while the disks and network have plenty of unused
> bandwidth. How could I diagnose where the read bottleneck is?
>
> Any help would be greatly appreciated :)
>
> Thanks in advance!
> --
> Bharath Ravi

--
Todd Lipcon
Software Engineer, Cloudera