If there's only one partition, by definition it will only be handled by one executor. Repartition to divide the work up. Note, however, that this will also result in multiple output files. If you absolutely need them combined into a single file, I suggest using the Unix/Linux 'cat' command to concatenate the files afterwards.
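For illustration, here is a minimal sketch of how the repartition call could fit into the job described below, assuming the DataStax spark-cassandra-connector Java API. The keyspace and table names, the output path, and the choice of 24 partitions are placeholders, not values from this thread:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import com.datastax.spark.connector.japi.CassandraRow;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    public class ExportToCsv {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("cassandra-csv-export");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Read the table; keyspace/table names are placeholders.
            JavaRDD<CassandraRow> rows =
                    javaFunctions(sc).cassandraTable("my_keyspace", "my_table");

            // Spread the rows across (for example) 24 partitions so every
            // executor gets a share of the map/save work, instead of a single
            // partition landing on one executor.
            JavaRDD<String> csv = rows
                    .repartition(24)
                    .map(new Function<CassandraRow, String>() {
                        @Override
                        public String call(CassandraRow row) throws Exception {
                            StringBuilder sb = new StringBuilder();
                            sb.append(row.getString(10)).append(",");
                            sb.append(row.getString(11)).append(",");
                            sb.append(row.getString(8)).append(",");
                            sb.append(row.getString(7));
                            return sb.toString();
                        }
                    });

            // Writes one part-NNNNN file per partition; concatenate them
            // afterwards (e.g. with 'cat') if a single file is required.
            csv.saveAsTextFile("/tmp/export");

            sc.stop();
        }
    }

With 24 partitions, the map and save work becomes 24 tasks that the scheduler can spread across all three executors.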
Rich

On Sep 22, 2015 9:20 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> Have you tried using repartition to spread the load?
>
> Cheers
>
> On Sep 22, 2015, at 4:22 AM, Chirag Dewan <chirag.de...@ericsson.com> wrote:
>
> > Hi,
> >
> > I am using Spark to access around 300m rows in Cassandra.
> >
> > My job is pretty simple: I am just mapping each row into CSV format and
> > saving it as a text file. My map method looks like this:
> >
> > public String call(CassandraRow row) throws Exception {
> >     StringBuilder sb = new StringBuilder();
> >     sb.append(row.getString(10));
> >     sb.append(",");
> >     sb.append(row.getString(11));
> >     sb.append(",");
> >     sb.append(row.getString(8));
> >     sb.append(",");
> >     sb.append(row.getString(7));
> >     return sb.toString();
> > }
> >
> > I have a 3-node cluster. I observe that the driver starts on Node A and
> > executors are spawned on all 3 nodes, but the executor on Node B or C is
> > doing all the tasks. It starts a saveAsTextFile job with 1 output
> > partition, stores the RDDs in memory, and also commits the file on the
> > local file system.
> >
> > This executor is using a lot of system memory and CPU while the others
> > are sitting idle.
> >
> > Am I doing something wrong? Is my RDD correctly partitioned?
> >
> > Thanks in advance.
> >
> > Chirag