Hi all, I'm fairly new to HBase and was a bit surprised by the performance I am seeing for a range scan. I am running a range scan over ~3 million rows in an HBase cluster with 4 region servers, each a fairly large AWS instance (24 HDD). I'm pulling a single float value from each row and computing the average.
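As a rough sanity check on the data volume involved, here is a minimal sketch of the raw payload estimate. It assumes each value is a 4-byte float and counts only the values themselves, not row keys or any other per-cell data HBase returns with them. The function name and constants are hypothetical, for illustration only.

```python
# Back-of-the-envelope estimate of the raw value payload for the scan.
# Assumptions (not measured): 4-byte floats, values only, decimal megabytes.

ROWS = 3_000_000      # approximate number of rows scanned
FLOAT_BYTES = 4       # assumed size of one stored float value


def payload_mb(rows: int, value_bytes: int) -> float:
    """Return the total size of the values alone, in MB (10^6 bytes)."""
    return rows * value_bytes / 1_000_000


if __name__ == "__main__":
    print(f"{payload_mb(ROWS, FLOAT_BYTES):.1f} MB of raw float values")
```

If the values were stored as 8-byte doubles instead, the same calculation would give twice that figure.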
When I run this range scan, it takes ~0.5 sec to execute, and there is not much performance improvement on repeated runs. This seems long to me. 3 million floats should take maybe 10-20 MB to read from disk and transfer, and it should be much faster the second time around since the data is supposed to be in memory in the block cache at that point.

Additionally, I tried running a number of these range scans concurrently against HBase. Again, the performance seemed worse than I expected. The average execution time goes up quite a bit at what seems like low QPS. For example, at 1 QPS the average response time is several seconds.

Are these performance numbers typical, or is there some user error that is causing them to be worse than normal?

Thanks,
Marcell