Hi Guillermo,

What exactly do you mean by "each iteration"?  Are you caching data in
memory?

-Sandy

On Wed, Feb 4, 2015 at 5:02 AM, Guillermo Ortiz <konstt2...@gmail.com>
wrote:

> I'm running a Spark job that processes an 80 GB file in HDFS.
> I have 5 slaves:
> (32 cores / 256 GB / 7 physical disks) x 5
>
> I have been trying many different configurations with YARN.
> yarn.nodemanager.resource.memory-mb 196Gb
> yarn.nodemanager.resource.cpu-vcores 24
>
> I have tried running the job with different numbers of executors and
> memory sizes (1-4g).
> With 20 executors, each iteration (128 MB) takes 25s, and it never
> spends a really long time waiting on GC.
>
> When I run around 60 executors, the processing time is about 45s and
> some tasks take up to one minute because of GC.
>
> I have no idea why it triggers GC when I run more executors
> simultaneously.
> The other question is why it takes more time to process each
> block. My theory is that there are only 7 physical disks, and
> having 20 processes writing is not the same as having 5.
>
> The code is pretty simple: it's just a map function which parses a line
> and writes the output to HDFS. There are a lot of substring calls inside
> the function, which could cause GC.
>
> Any theories about this?
>
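
For reference, the per-node arithmetic behind the disk-contention theory in the quoted message can be sketched as follows. The node, disk, vcore and memory figures come from the message itself. The even spread of executors across nodes and the 4g executor size (the top of the quoted 1-4g range) are assumptions, and the helper name `per_node_load` is hypothetical. YARN container overhead is ignored.

```python
# Figures taken from the quoted message.
NODES = 5
DISKS_PER_NODE = 7
YARN_VCORES_PER_NODE = 24       # yarn.nodemanager.resource.cpu-vcores
YARN_MEMORY_GB_PER_NODE = 196   # yarn.nodemanager.resource.memory-mb, as quoted


def per_node_load(total_executors, executor_memory_gb):
    """Estimate the load on one node.

    Assumption (not stated in the thread): executors are spread
    evenly across the nodes.
    """
    executors_per_node = total_executors / NODES
    return {
        "executors_per_node": executors_per_node,
        "executors_per_disk": executors_per_node / DISKS_PER_NODE,
        # Sum of executor heaps only; YARN container overhead is ignored.
        "heap_gb_per_node": executors_per_node * executor_memory_gb,
    }


if __name__ == "__main__":
    for total in (20, 60):
        load = per_node_load(total, executor_memory_gb=4)
        print(total, "executors:", load)
```

Under these assumptions, 20 executors means fewer than one executor per physical disk on each node, while 60 means more than one per disk. That is consistent with the disk-contention theory but does not prove it, and it does not by itself explain the longer GC pauses.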
