You do need to specify the number of executors (and executor cores) to use. Executors are not like mappers: they may do much more in their lifetime than just read splits from HBase, so it would not make sense to determine their number from what the first line of the program happens to do.
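For example, here is a minimal sketch of requesting executors explicitly when running on YARN; the numbers are illustrative, and whether the spark.executor.instances property is honored depends on your Spark version (the spark-submit flags --num-executors and --executor-cores achieve the same thing):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only: ask YARN for 20 executors with 4 cores each.
// Equivalent spark-submit flags: --num-executors 20 --executor-cores 4 --executor-memory 4g
val conf = new SparkConf()
  .setAppName("ReadHBase")
  .set("spark.executor.instances", "20")  // older versions may only accept --num-executors
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "4g")

val sc = new SparkContext(conf)
```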
On Oct 8, 2014 8:00 AM, "Tao Xiao" <xiaotao.cs....@gmail.com> wrote:

> Hi Sean,
>
> Do I need to specify the number of executors when submitting the job?
> I supposed the number of executors would be determined by the number of
> regions of the table, just as with a MapReduce job, where you needn't
> specify the number of map tasks when reading from an HBase table.
>
> The script I use to submit my job can be seen in my second post. Please
> refer to that.
>
>
> 2014-10-08 13:44 GMT+08:00 Sean Owen <so...@cloudera.com>:
>
>> How did you run your program? I don't see from your earlier post that
>> you ever asked for more executors.
>>
>> On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao <xiaotao.cs....@gmail.com>
>> wrote:
>> > I found the reason why reading HBase is so slow. Although each
>> > regionserver serves multiple regions for the table I'm reading, the
>> > number of Spark workers allocated by YARN is too low. I can see that
>> > the table has dozens of regions spread over about 20 regionservers,
>> > but only two Spark workers are allocated by YARN. What is worse, the
>> > two workers run one after the other, so the Spark job loses
>> > parallelism.
>> >
>> > So now the question is: why are only 2 workers allocated?
>> >
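For reference, reading an HBase table in Spark typically looks roughly like the sketch below (the table name "my_table" is illustrative, and sc is the application's SparkContext). The resulting RDD gets one partition per region, which sets the number of tasks, but it does not by itself make YARN grant more executors:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// "my_table" is an illustrative table name.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

// One RDD partition per HBase region, analogous to one map task per region in MapReduce.
val hbaseRdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(hbaseRdd.partitions.length)  // roughly the number of regions in the table
```

Even with dozens of partitions, only as many tasks run concurrently as there are executor cores available, which is why two executors make the read effectively serial.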