I submitted a job in yarn-client mode that simply reads an HBase table containing tens of millions of records and then performs a count action. The job runs for much longer than I expected, so I wonder whether the amount of data to read is the cause. However, my Hadoop cluster has 20 nodes, so the HBase table doesn't seem that big (tens of millions of records).
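For reference, the job is essentially the following (a minimal sketch; the table name is a placeholder, and I'm assuming the standard TableInputFormat route for Spark 0.9 / HBase 0.96):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.SparkContext

    val sc = new SparkContext("yarn-client", "hbase-count")

    // HBase configuration; "my_table" is a placeholder for the real table name
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    // Each HBase region becomes one Spark partition/task
    val rdd = sc.newAPIHadoopRDD(hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(rdd.count())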
I'm using CDH 5.0.0 (Spark 0.9 and HBase 0.96). BTW, while the job was running I could see logs on the console, and specifically I'd like to know what the following log lines mean:

    14/09/30 09:45:20 INFO scheduler.TaskSetManager: Starting task 0.0:20 as TID 20 on executor 2: b04.jsepc.com (PROCESS_LOCAL)
    14/09/30 09:45:20 INFO scheduler.TaskSetManager: Serialized task 0.0:20 as 13454 bytes in 0 ms
    14/09/30 09:45:20 INFO scheduler.TaskSetManager: Finished TID 19 in 16426 ms on b04.jsepc.com (progress: 18/86)
    14/09/30 09:45:20 INFO scheduler.DAGScheduler: Completed ResultTask(0, 19)

Thanks