On Wed, Apr 23, 2014 at 12:06 AM, jaeholee <jho...@lbl.gov> wrote:
> How do you determine the number of partitions? For example, I have 16
> workers, and the number of cores and the worker memory set in spark-env.sh
> are:
>
> CORE = 8
> MEMORY = 16g
>
So you have the capacity to work on 16 * 8 = 128 tasks at a time. If you use
fewer than 128 partitions, some CPUs will go unused and the job will complete
more slowly than it could. But depending on what you are actually doing with
the data, you may want to use many more partitions than that. I find it fairly
difficult to estimate the memory use -- it is much easier to just try with
more partitions and see whether that is enough :). The overhead of using more
partitions than necessary is pretty small.

> The .csv data I have is about 500MB, but I am eventually going to use a file
> that is about 15GB.
>
> Is the MEMORY variable in spark-env.sh different from spark.executor.memory
> that you mentioned? If they're different, how do I set
> spark.executor.memory?

It is probably the same thing. Sorry for confusing you. To be sure, check the
Spark master web UI while the application is connected. You will see how much
RAM is used per worker on the main page.

You can set "spark.executor.memory" through
SparkConf.set("spark.executor.memory", "10g") when creating the SparkContext.
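Putting the two together, here is a rough Scala sketch of what I mean. The app
name, the file path, and the exact numbers are just placeholders -- adjust
them for your cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    // Set the executor memory on the SparkConf before creating the context.
    val conf = new SparkConf()
      .setAppName("csv-job")                    // placeholder app name
      .set("spark.executor.memory", "10g")

    val sc = new SparkContext(conf)

    // Ask for at least this many partitions when reading the file. With
    // 16 workers * 8 cores you want at least 128; a few times that is
    // usually fine too, since the per-partition overhead is small.
    val lines = sc.textFile("hdfs://.../data.csv", 128)

Once the application is connected, you can confirm on the master web UI how
much memory each worker actually got.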