Hi Flink mailing list,

I am Ilan from the Start.io data platform team, and I need some guidance.

We have a flow with the following use case:


  *   We read files from AWS S3 buckets, process them on our cluster, and sink
the data into files using the Flink file sink.
  *   The jobs always use the same jar, which we uploaded to every job manager
on the cluster.
  *   We are submitting jobs constantly through the REST API.
  *   Each job reads one or more files from S3.
  *   The jobs can run from 20 seconds up to 3.5 hours.
  *   The jobs run in batch mode.
  *   We are running Flink 1.13.1.
  *   We are running in cluster mode using Docker; the same machines are used
for the task manager and the job manager.

We keep hitting the same error over and over again, in both the job manager
and the task manager.

After the cluster has been running for a while, with jobs finishing correctly,
the task manager and job manager stop working due to:
Caused by: java.lang.OutOfMemoryError: unable to create new native thread.
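
To see which process is accumulating threads before this error appears, a
small script along the following lines could be run on each host. It is only
a sketch: it assumes a Linux host where /proc/<pid>/status exposes a
"Threads:" field, and the function names are made up for illustration.

```python
import os
import re


def parse_thread_count(status_text):
    """Return the 'Threads:' value from /proc/<pid>/status text, or None."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_process_name(status_text):
    """Return the 'Name:' value from /proc/<pid>/status text, or '?'."""
    match = re.search(r"^Name:\s+(.+)$", status_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else "?"


def thread_counts(proc_root="/proc"):
    """Return (pid, name, threads) tuples, largest thread count first."""
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited or is not readable
        threads = parse_thread_count(text)
        if threads is not None:
            results.append((int(entry), parse_process_name(text), threads))
    return sorted(results, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Print the ten processes with the most threads.
    for pid, name, threads in thread_counts()[:10]:
        print(f"{pid:>8} {name:<20} {threads}")
```

Running it periodically and comparing the counts for the job manager and
task manager JVMs would show whether their thread counts keep growing as
jobs are submitted.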


We also see sporadic java.lang.NoClassDefFoundError failures; we are not sure
whether they are related.

Our setup and configuration are as follows:
  *   5-node cluster running on Docker
  *   Relevant memory config:
jobmanager.memory.heap.size: 1600m
taskmanager.memory.process.size: 231664m
taskmanager.memory.network.fraction: 0.3
taskmanager.memory.jvm-metaspace.size: 10g
jobmanager.memory.jvm-metaspace.size: 2g
taskmanager.memory.framework.off-heap.size: 1g

  *   Host details:
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
max user processes              (-u) 1547269
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

cat /proc/sys/kernel/threads-max: 3094538
kernel.pid_max = 57344
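
For reference, the thread-related limits listed above can be compared side by
side. The sketch below assumes that on Linux every thread takes an ID from
the same space as process IDs, so kernel.pid_max may cap the total thread
count before the other limits are reached. The function name is illustrative,
and container-level limits (for example a cgroup pids limit under Docker) are
not considered.

```python
def tightest_limit(limits):
    """Return the (name, value) pair with the smallest value."""
    name = min(limits, key=limits.get)
    return name, limits[name]


# Values copied from the host details above.
host_limits = {
    "kernel.threads-max": 3094538,
    "max user processes (ulimit -u)": 1547269,
    "kernel.pid_max": 57344,
}

name, value = tightest_limit(host_limits)
print(f"tightest limit: {name} = {value}")
```

With these values the smallest limit is kernel.pid_max at 57344, which is far
below both max user processes and threads-max.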


We tried increasing the max user processes, and also both increasing and
decreasing the JVM metaspace size.

Should we keep increasing the max number of processes on the host? Is there a
way to limit the number of threads through the Flink configuration?

What should we do? Any insights?
I can provide more information as needed.

Thanks in advance

 Ilan
