[jira] [Commented] (FLINK-10818) RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job.

sean.miao (JIRA) Sun, 11 Nov 2018 19:09:17 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-10818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683152#comment-16683152
 ]


sean.miao commented on FLINK-10818:
-----------------------------------

I ran into the same problem.

The YARN cluster has enough resources.

After the restart, neither the YARN ApplicationMaster ID nor the container IDs changed.

The list of failed jobs in the same YARN application (all of them failed with the same exception):

!image-2018-11-12-11-08-13-572.png!

The exception is:

!image-2018-11-12-11-05-38-159.png!
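To make the numbers in the quoted exception concrete ("Number of instances=6, total number of slots=6, available slots=0"), here is a toy model of slot accounting. With all operators in one slot sharing group, the six parallel subtasks (the "(1/6)" in the message) need six slots. If the slots from the failed attempt were still counted as occupied when the restart is scheduled, the request could not be served even though the same TaskManager containers are still running. The SlotAccounting class and its methods are hypothetical, written only for illustration; it is not Flink's Scheduler, and "stale slots" as the cause here is an assumption the logs do not confirm.

```java
/**
 * Toy model of scheduler slot accounting (hypothetical, not Flink's Scheduler).
 * Each subtask of one slot sharing group needs one slot.
 */
public class SlotAccounting {
    private final int totalSlots;
    private int occupiedSlots;

    public SlotAccounting(int instances, int slotsPerInstance) {
        this.totalSlots = instances * slotsPerInstance;
    }

    public int availableSlots() {
        return totalSlots - occupiedSlots;
    }

    /** Reserves one slot per subtask, or throws if too few slots are free. */
    public void allocate(int subtasks) {
        if (subtasks > availableSlots()) {
            throw new IllegalStateException("Not enough free slots: requested=" + subtasks
                    + ", total=" + totalSlots + ", available=" + availableSlots());
        }
        occupiedSlots += subtasks;
    }

    /** Returns slots to the pool, never going below zero occupied. */
    public void release(int subtasks) {
        occupiedSlots = Math.max(0, occupiedSlots - subtasks);
    }

    public static void main(String[] args) {
        // 6 TaskManagers with 1 slot each, matching the reported numbers.
        SlotAccounting pool = new SlotAccounting(6, 1);
        pool.allocate(6); // first run: parallelism 6 in one sharing group

        // Hypothetical: the job fails but the old slots are still marked occupied.
        try {
            pool.allocate(6); // restart attempt
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        pool.release(6);  // once the old slots are returned...
        pool.allocate(6); // ...the restart can be scheduled
        System.out.println("available after restart: " + pool.availableSlots());
    }
}
```

Running main first prints the shortage message with available=0, then schedules the restart once the earlier slots have been released.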

> RestartStrategies.fixedDelayRestart Occur  NoResourceAvailableException: Not 
> enough free slots available to run the job.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10818
>                 URL: https://issues.apache.org/jira/browse/FLINK-10818
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.6.2
>         Environment: JDK 1.8
> Flink 1.6.0 
> Hadoop 2.7.3
>            Reporter: ambition
>            Priority: Major
>         Attachments: image-2018-11-12-11-05-38-159.png, 
> image-2018-11-12-11-06-33-387.png, image-2018-11-12-11-08-13-572.png
>
>
> Our production Flink on YARN job sets its restart strategy in code like this:
> {code:java}
> // Restart up to 5 times, waiting 1000 ms between attempts
> exeEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5, 1000L));
> {code}
> But after the job had been running for some days, the following exception occurred:
> {code:java}
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
> Not enough free slots available to run the job. You can decrease the operator 
> parallelism or increase the number of slots per TaskManager in the 
> configuration. Task to schedule: < Attempt #5 (Source: KafkaJsonTableSource 
> -> Map -> where: (AND(OR(=(app_key, _UTF-16LE'C4FAF9CE1569F541'), =(app_key, 
> _UTF-16LE'F5C7F68C7117630B'), =(app_key, _UTF-16LE'57C6FF4B5A064D29')), 
> OR(=(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'ios'), 
> =(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'android')), IS 
> NOT NULL(server_id))), select: (MT_Date_Format_Mode(receive_time, 
> _UTF-16LE'yyyyMMddHHmm', 10) AS date_p, LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', 
> os_type)) AS os_type, MT_Date_Format_Mode(receive_time, _UTF-16LE'HHmm', 10) 
> AS date_mm, server_id) (1/6)) @ (unassigned) - [SCHEDULED] > with groupID < 
> cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < 
> 690dbad267a8ff37c8cb5e9dbedd0a6d >. Resources available to scheduler: Number 
> of instances=6, total number of slots=6, available slots=0
>    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:281)
>    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:155)
>    at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$2(Execution.java:491)
>    at org.apache.flink.runtime.executiongraph.Execution$$Lambda$44/1664178385.apply(Unknown Source)
>    at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
>    at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2116)
>    at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:489)
>    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:521)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:945)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:875)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1262)
>    at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
>    at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at java.lang.Thread.run(Thread.java:745)
> {code}
>  
> This exception happened when the job restarted (the trace goes through ExecutionGraph.restart). The issue is linked to
> https://issues.apache.org/jira/browse/FLINK-4486
>  
>  
>  
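A simplified model of what fixedDelayRestart(5, 1000L) asks for: up to 5 restart attempts, with a fixed 1000 ms pause before each one. If every attempt runs into the slot shortage reported above, the budget is used up and the job fails for good. FixedDelayRestartSketch is a hypothetical class written for illustration only; it is not Flink's FixedDelayRestartStrategy, and the delay is shortened in the demo so it finishes quickly.

```java
import java.util.function.BooleanSupplier;

/**
 * Simplified fixed-delay restart policy (hypothetical, not Flink's
 * FixedDelayRestartStrategy): retry up to maxAttempts times, sleeping
 * delayMillis before each attempt, then give up.
 */
public class FixedDelayRestartSketch {
    private final int maxAttempts;
    private final long delayMillis;
    private int attemptsUsed;

    public FixedDelayRestartSketch(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    public boolean canRestart() {
        return attemptsUsed < maxAttempts;
    }

    /**
     * Calls 'restart' until it reports success or the attempts run out.
     * Returns true if the job came back up.
     */
    public boolean recover(BooleanSupplier restart) {
        while (canRestart()) {
            try {
                Thread.sleep(delayMillis); // fixed delay before every attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
            attemptsUsed++;
            if (restart.getAsBoolean()) {
                return true; // scheduling succeeded
            }
        }
        return false; // budget exhausted: the job fails terminally
    }

    public int attemptsUsed() {
        return attemptsUsed;
    }

    public static void main(String[] args) {
        // Same attempt count as fixedDelayRestart(5, 1000L); delay shortened to 10 ms.
        FixedDelayRestartSketch policy = new FixedDelayRestartSketch(5, 10L);
        // Model the reported situation: every restart hits "available slots=0".
        boolean recovered = policy.recover(() -> false);
        System.out.println("recovered=" + recovered + ", attempts=" + policy.attemptsUsed());
    }
}
```

In this sketch the attempt counter is never reset after a successful restart, so the five attempts cover the job's whole lifetime; check the documentation for the Flink version in use before relying on that detail.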



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
