[jira] [Commented] (FLINK-10818) RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job.

Till Rohrmann (JIRA) Thu, 08 Nov 2018 13:21:10 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-10818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679502#comment-16679502
 ]


Till Rohrmann commented on FLINK-10818:
---------------------------------------

Could you check whether your Yarn cluster actually had the required resources? 
If other jobs are running in your cluster, they may have taken the resources 
this job needs. Moreover, you could check whether the problem also occurs with 
Flink {{1.6.2}} and the new mode (not legacy).
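
To make the resource check concrete, here is a minimal sketch in plain Java (no Flink dependency) that plugs in the numbers from the quoted exception. The class and method names are hypothetical, and it assumes all operators share one slot sharing group, so the job needs as many slots as its parallelism. It is an illustration of the arithmetic, not a diagnosis.

```java
public class SlotBudgetSketch {

    /** Returns true if the scheduler has enough free slots for the job. */
    public static boolean hasEnoughSlots(int requiredSlots, int availableSlots) {
        return availableSlots >= requiredSlots;
    }

    /** Total time spent waiting between restart attempts before the strategy gives up. */
    public static long totalRestartDelayMillis(int restartAttempts, long delayMillis) {
        return Math.multiplyExact((long) restartAttempts, delayMillis);
    }

    public static void main(String[] args) {
        // Numbers taken from the quoted exception message.
        int parallelism = 6;     // subtask "(1/6)"
        int availableSlots = 0;  // "available slots=0"

        // Assumption: one slot sharing group, so required slots == parallelism.
        int requiredSlots = parallelism;

        System.out.println("enough slots: "
                + hasEnoughSlots(requiredSlots, availableSlots));

        // fixedDelayRestart(5, 1000L): 5 attempts, 1000 ms apart.
        System.out.println("restart delay budget (ms): "
                + totalRestartDelayMillis(5, 1000L));
    }
}
```

With 5 attempts spaced 1000 ms apart, the strategy waits about 5 seconds in total. If the slots are not free again within that window, the final attempt would fail in the way the trace shows; a longer delay or more attempts would widen the window.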

> RestartStrategies.fixedDelayRestart Occur  NoResourceAvailableException: Not 
> enough free slots available to run the job.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10818
>                 URL: https://issues.apache.org/jira/browse/FLINK-10818
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.6.2
>         Environment: JDK 1.8
> Flink 1.6.0 
> Hadoop 2.7.3
>            Reporter: ambition
>            Priority: Major
>
> Our online Flink-on-Yarn job sets its restart strategy in code like this:
> {code:java}
> exeEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5, 1000L));
> {code}
> But after the job had been running for some days, the following exception 
> occurred:
> {code:java}
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
> Not enough free slots available to run the job. You can decrease the operator 
> parallelism or increase the number of slots per TaskManager in the 
> configuration. Task to schedule: < Attempt #5 (Source: KafkaJsonTableSource 
> -> Map -> where: (AND(OR(=(app_key, _UTF-16LE'C4FAF9CE1569F541'), =(app_key, 
> _UTF-16LE'F5C7F68C7117630B'), =(app_key, _UTF-16LE'57C6FF4B5A064D29')), 
> OR(=(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'ios'), 
> =(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'android')), IS 
> NOT NULL(server_id))), select: (MT_Date_Format_Mode(receive_time, 
> _UTF-16LE'yyyyMMddHHmm', 10) AS date_p, LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', 
> os_type)) AS os_type, MT_Date_Format_Mode(receive_time, _UTF-16LE'HHmm', 10) 
> AS date_mm, server_id) (1/6)) @ (unassigned) - [SCHEDULED] > with groupID < 
> cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < 
> 690dbad267a8ff37c8cb5e9dbedd0a6d >. Resources available to scheduler: Number 
> of instances=6, total number of slots=6, available slots=0
>    at 
> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:281)
>    at 
> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:155)
>    at 
> org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$2(Execution.java:491)
>    at 
> org.apache.flink.runtime.executiongraph.Execution$$Lambda$44/1664178385.apply(Unknown
>  Source)
>    at 
> java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
>    at 
> java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2116)
>    at 
> org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:489)
>    at 
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:521)
>    at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:945)
>    at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:875)
>    at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1262)
>    at 
> org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
>    at 
> org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>    at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>    at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at java.lang.Thread.run(Thread.java:745)
> {code}
>  
> This exception happened when the job started. The issue is linked to 
> https://issues.apache.org/jira/browse/FLINK-4486
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
