My guess would be that you have a thread leak in the user code.
More memory will not solve the problem; it will only push it a bit further away.
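
A common source of this error is creating a new RPC/REST client (and its
worker threads) for every record inside map(). Below is a minimal sketch of
the pattern I would check for, assuming Apache HttpClient and a hypothetical
enrichment endpoint (not your actual code): create the client once per
parallel task in open() and release it in close(), so each task slot holds a
bounded number of threads instead of spawning new ones per record.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class EnrichMapper extends RichMapFunction<String, String> {

    // One client per parallel task instance, created in open() and closed in
    // close(), instead of a fresh client (and its threads) per record.
    private transient CloseableHttpClient httpClient;

    @Override
    public void open(Configuration parameters) {
        httpClient = HttpClients.createDefault();
    }

    @Override
    public String map(String rawRecord) throws Exception {
        // Hypothetical enrichment endpoint; replace with the real RPC/REST call.
        HttpGet request = new HttpGet("http://enrichment-service/lookup?key=" + rawRecord);
        try (CloseableHttpResponse response = httpClient.execute(request)) {
            String extra = EntityUtils.toString(response.getEntity());
            return rawRecord + "," + extra;
        }
    }

    @Override
    public void close() throws Exception {
        if (httpClient != null) {
            httpClient.close();
        }
    }
}

If the client is instead created inside map() (or an async/thread-pooled
client is never closed), each of the 80 parallel tasks keeps spawning native
threads until the JVM hits the OS limit, which matches the symptom you see.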

On Mon, Aug 1, 2016 at 9:15 PM, Paulo Cezar <paulo.ce...@gogeo.io> wrote:

> Hi folks,
>
>
> I'm trying to run a DataSet program, but after around 200k records are 
> processed, a "java.lang.OutOfMemoryError: unable to create new native thread" 
> stops me.
>
>
> I'm deploying Flink (via bin/yarn-session.sh) on a YARN cluster with 10 nodes 
> (each with 8 cores) and starting 10 task managers, each with 8 slots and 6GB 
> of RAM.
>
>
> Except for the data sink that writes to HDFS and runs with a parallelism of 
> 1, my job runs with a parallelism of 80 and has two input datasets, each an 
> HDFS file of around 6GB and 20 million lines. Most of my map functions use 
> external services via RPC or REST APIs to enrich the raw data with info from 
> other sources.
>
> Might I be doing something wrong, or should I really have more memory 
> available?
>
> Thanks,
> Paulo Cezar
>
>
