Hi Sean,

   Do I need to specify the number of executors when submitting the job? I
supposed the number of executors would be determined by the number of
regions in the table, just as a MapReduce job needn't specify the number
of map tasks when reading from an HBase table.
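
  If executors do have to be requested explicitly, I suppose the submit
command would look something like the sketch below (just an illustration;
the class name, jar and numbers are placeholders, not my actual script):

    # hypothetical submit script - all values here are illustrative
    spark-submit \
      --master yarn-client \
      --num-executors 20 \
      --executor-cores 2 \
      --executor-memory 2g \
      --class com.example.MyHBaseJob \
      my-hbase-job.jar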

  The script I used to submit my job is in my second post in this thread;
please refer to that.



2014-10-08 13:44 GMT+08:00 Sean Owen <so...@cloudera.com>:

> How did you run your program? I don't see from your earlier post that
> you ever asked for more executors.
>
> On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:
> > I found the reason why reading from HBase is so slow. Although each
> > regionserver serves multiple regions of the table I'm reading, the
> > number of Spark workers allocated by YARN is too low. I can see that
> > the table has dozens of regions spread over about 20 regionservers,
> > but only two Spark workers are allocated by YARN. What is worse, the
> > two workers run one after the other, so the Spark job loses its
> > parallelism.
> >
> > So now the question is: why are only 2 workers allocated?
>
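
For what it's worth, the scan itself should parallelize by region: the
HBase RDD gets one partition (and thus one task) per region, which is
separate from how many executors YARN grants. A minimal sketch to verify
this, assuming the table is read through TableInputFormat (the table name
below is a placeholder, not my actual table):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object RegionCountCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RegionCountCheck"))

        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table") // placeholder name

        val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
          classOf[ImmutableBytesWritable], classOf[Result])

        // One partition per region of the table; how many executors actually
        // run those tasks is decided at submit time, not by the region count.
        println("partitions = " + rdd.partitions.length)
        sc.stop()
      }
    }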
