Hi,

I am running Flink on a cluster with 24 workers, each with 16 cores.
Starting the cluster works fine, and the web interface confirms that 384
slots are available. Executing my code with parallelism 24 works fine, but
when I try a higher parallelism, e.g. 384, the job submission never completes.
Submitting from the web interface does not start the job either, and it gives
no errors. I also tried starting four 1-slot TaskManagers on each machine and
executing with parallelism 96, but I hit the same problem. The code is not
very complicated; the logical graph has only 3 steps.
Attached is a file with the jstacks of the CliFrontend (which is using CPU)
and the StandaloneSessionClusterEntrypoint, as well as the jstack of the
TaskManagerRunner on a remote machine (cloud-12). The jstacks are all from
this last scenario, when executing from the command line.
 
My relevant conf is as follows: 

queryable-state.enable: true
jobmanager.rpc.address: cloud-11
jobmanager.rpc.port: 6123
taskmanager.heap.mb: 28672
jobmanager.heap.mb: 14240
taskmanager.memory.fraction: 0.7
taskmanager.network.numberOfBuffers: 16384
taskmanager.network.bufferSizeInBytes: 16384
taskmanager.memory.task.off-heap.size: 4000m
taskmanager.memory.managed.size: 10000m
#taskmanager.numberOfTaskSlots: 16 # for the normal setup
taskmanager.numberOfTaskSlots: 1 # when running multiple TaskManagers per machine
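
As a sanity check on the numbers above, here is a small sketch of the arithmetic behind the two setups. The variable and function names are purely illustrative, and treating the network buffer pool as numberOfBuffers times bufferSizeInBytes is my assumption about how those two settings combine, not something I have verified against the Flink source.

```python
# Values copied from the post; all names here are illustrative only.
WORKERS = 24
CORES_PER_WORKER = 16
TASKMANAGERS_PER_WORKER_ALT = 4      # alternative setup: four 1-slot TaskManagers
NUM_NETWORK_BUFFERS = 16384          # taskmanager.network.numberOfBuffers
BUFFER_SIZE_BYTES = 16384            # taskmanager.network.bufferSizeInBytes


def total_slots(task_managers: int, slots_per_task_manager: int) -> int:
    """Total slots the cluster offers for a given TaskManager layout."""
    return task_managers * slots_per_task_manager


# Normal setup: one TaskManager per worker, 16 slots each.
normal_slots = total_slots(WORKERS, CORES_PER_WORKER)

# Alternative setup: four TaskManagers per worker, 1 slot each.
alt_slots = total_slots(WORKERS * TASKMANAGERS_PER_WORKER_ALT, 1)

# Assumption: network buffer memory per TaskManager is count * size.
network_buffer_mib = NUM_NETWORK_BUFFERS * BUFFER_SIZE_BYTES // (1024 * 1024)

print(f"normal setup slots:      {normal_slots}")
print(f"alternative setup slots: {alt_slots}")
print(f"network buffers per TM:  {network_buffer_mib} MiB")
```

So the requested parallelism (384 and 96) matches the number of available slots in both setups.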

Am I doing something wrong?
Thanks in advance!

  jstack.jstack
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2502/jstack.jstack>
  


