I'm using Flink 1.4.2 and running Flink on YARN. The job runs with a parallelism of 2, and each task manager is allocated 1 core. When a container's memory usage exceeds its allocated memory, YARN kills the container, as expected:

{"debug_level":"INFO","debug_timestamp":"2018-12-04 15:52:29,276","debug_thread":"flink-akka.actor.default-dispatcher-17","debug_file":"YarnFlinkResourceManager.java", "debug_line":"545","debug_message":"Diagnostics for container container_1528884788062_18043_01_000002 in state COMPLETE : exitStatus=Pmem limit exceeded (-104) diagnostics=Container [pid=29271,containerID=container_1528884788062_18043_01_000002] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 13.4 GB of 2.1 GB virtual memory used. Killing container.

The job manager then tries to start a new task manager, but fails with the following error:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #5 (Source: Custom Source -> from: (zoneId, cityId, time_stamp) -> select: (DeliveryZoneFromId(zoneId) AS zone, CityFromCityId(cityId) AS city, +(CAST(time_stamp), 19800000) AS time_stamp) -> to: Row -> Sink: Unnamed (2/2)) @ (unassigned) - [SCHEDULED] > with groupID < cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < SlotSharingGroup [cbc357ccb763df2852fee8c4fc7d55f2] >. Resources available to scheduler: Number of instances=1, total number of slots=1, available slots=0
    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:263)
    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:142)
    at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$1(Execution.java:440)
    at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
    at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
    at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:438)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:503)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:900)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:854)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1175)
    at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
    at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Why is the job manager not able to allocate a new task manager when there are plenty of resources available in the cluster? Flink tries to redeploy the job 5 times, as per the configured restart strategy, and then fails the job. Can someone point me in the right direction to debug this issue?

Thanks!

-- Sent from:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/