Hi Sean,

Do I need to specify the number of executors when submitting the job? I suppose the number of executors will be determined by the number of regions of the table, just as in a MapReduce job, where you needn't specify the number of map tasks when reading from an HBase table.
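In case I do need to ask for them explicitly, I suppose the submit command would look roughly like this (just a sketch; the resource sizes, class name and jar below are placeholders, not my actual script):

    spark-submit --master yarn-cluster \
        --num-executors 20 \
        --executor-cores 2 \
        --executor-memory 2g \
        --class com.example.ReadHBaseJob \
        my-job.jar

Is that the right way to raise the number of executors on Yarn, or should it scale with the number of regions automatically?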
The script I used to submit my job can be seen in my second post; please refer to that.

2014-10-08 13:44 GMT+08:00 Sean Owen <so...@cloudera.com>:
> How did you run your program? I don't see from your earlier post that
> you ever asked for more executors.
>
> On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:
> > I found the reason why reading HBase is so slow. Although each
> > regionserver serves multiple regions for the table I'm reading, the
> > number of Spark workers allocated by Yarn is too low. Actually, I could
> > see that the table has dozens of regions spread over about 20
> > regionservers, but only two Spark workers were allocated by Yarn. What
> > is worse, the two workers ran one after another, so the Spark job lost
> > its parallelism.
> >
> > So now the question is: why are only 2 workers allocated?