Hi Flink mailing list, I am Ilan from Start.io data platform team, need some guidance.
We have a flow with the following use case: * We read files from AWS S3 buckets process them on our cluster and sink the data into files using Flink file sink. * The jobs use always the same jar, we uploaded it to every job manager on the cluster. * We are submitting jobs constantly through the REST API. * Each job reads one or more files from S3. * The jobs can run from 20 seconds up to 3.5 hours. * The jobs run on batch mode * Running flink 1.13.1 * We are running in cluster mode using docker, same machines are being used for task and job manager. We are struggling with the same error, over and over again. We encounter it in the job manager and in the task manager. After a while that the cluster is running and jobs are finishing correctly the task and job manager fail to operate due to: Caused by: java.lang.OutOfMemoryError: unable to create new native thread. We also see some sporadic failure of java.lang.NoClassDefFoundError, not sure it is related. Our set up and configuration are as follow: * 5 nodes cluster running on docker * Relevant memory config: jobmanager.memory.heap.size: 1600m taskmanager.memory.process.size: 231664m taskmanager.memory.network.fraction: 0.3 taskmanager.memory.jvm-metaspace.size: 10g jobmanager.memory.jvm-metaspace.size: 2g taskmanager.memory.framework.off-heap.size: 1g · Host details max locked memory (kbytes, -l) 65536 max memory size (kbytes, -m) unlimited open files (-n) 1024 max user processes (-u) 1547269 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited cat /proc/sys/kernel/threads-max: 3094538 kernel.pid_max = 57344 We try to increase the max user processes, also to increase and decrease the jvm-metaspace. Should we keep increasing the max number of processes on the host, Is there a way to limit the number of threads from flink config? What should we do? Any insights? I can provide more information as needed. Thanks in advance Ilan