Can you look at your HBase UI to check whether your job is just reading from a 
single region server? 
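
On the log lines you quoted: as far as I can tell, "Finished TID 19 in 16426 ms ... (progress: 18/86)" means that task 19 took about 16.4 seconds and that 18 of the 86 tasks in this stage have finished. The "Starting task" and "Serialized task" lines are the scheduler launching the next task on that executor. If you save the console output, a small script can count where the tasks ran. Below is a minimal sketch in Python; the function names are my own, and the pattern is only assumed to match the format of the lines in your mail, not every Spark version.

```python
import re

# Pattern for the "Finished TID" line printed by TaskSetManager in the quoted log.
# Assumption: the line format is exactly the one shown in the quoted message.
FINISHED = re.compile(
    r"Finished TID (?P<tid>\d+) in (?P<ms>\d+)\s+ms on (?P<host>\S+)"
    r".*?\(progress: (?P<done>\d+)/(?P<total>\d+)\)"
)

def parse_finished(line):
    """Return task id, duration, host and progress from one log line, or None."""
    m = FINISHED.search(line)
    if m is None:
        return None
    return {
        "tid": int(m.group("tid")),
        "ms": int(m.group("ms")),
        "host": m.group("host"),
        "done": int(m.group("done")),
        "total": int(m.group("total")),
    }

def tasks_per_host(lines):
    """Count finished tasks per executor host (hypothetical helper)."""
    counts = {}
    for line in lines:
        rec = parse_finished(line)
        if rec is not None:
            counts[rec["host"]] = counts.get(rec["host"], 0) + 1
    return counts

# One line copied from the quoted log, joined back onto a single line.
line = ("14/09/30 09:45:20 INFO scheduler.TaskSetManager: Finished TID 19 in 16426 "
        "ms on b04.jsepc.com (progress: 18/86)")
print(parse_finished(line))
print(tasks_per_host([line]))
```

If nearly all finished tasks land on one host, that is a rough hint (not proof) that the reads are concentrated in one place, and the HBase UI can then confirm how the regions are spread over the region servers.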

Best, 

-- 
Nan Zhu


On Monday, September 29, 2014 at 10:21 PM, Tao Xiao wrote:

> I submitted a job in Yarn-Client mode, which simply reads from an HBase table 
> containing tens of millions of records and then does a count action. The job 
> runs for a much longer time than I expected, so I wonder whether it is 
> because there is too much data to read. Actually, there are 20 nodes in my 
> Hadoop cluster, so the HBase table does not seem that big (tens of millions of 
> records).
> 
> I'm using CDH 5.0.0 (Spark 0.9 and HBase 0.96).
> 
> BTW, while the job was running, I could see logs on the console, and 
> specifically I'd like to know what the following log means:
> 
> > 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Starting task 0.0:20 as 
> > TID 20 on executor 2: b04.jsepc.com (PROCESS_LOCAL)
> > 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Serialized task 0.0:20 as 
> > 13454 bytes in 0 ms
> > 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Finished TID 19 in 16426 
> > ms on b04.jsepc.com (progress: 18/86)
> > 14/09/30 09:45:20 INFO scheduler.DAGScheduler: Completed ResultTask(0, 19)
> > 
> 
> Thanks 
