Thanks Gary,

I am compiling a new version of Mesos and when I test it again I will reply
here if I found an error.


On Wed, 11 Sep 2019, 09:22 Gary Yao, <g...@ververica.com> wrote:

> Hi Felipe,
>
> I am glad that you were able to fix the problem yourself.
>
> > But I suppose that Mesos will allocate Slots and Task Managers
> dynamically.
> > Is that right?
>
> Yes, that is the case since Flink 1.5 [1].
>
> > Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal
> or
> > less the available cores on a single node of the cluster. I am not sure
> about
> > this parameter, but only after this configuration it worked.
>
> I would need to see JobManager and Mesos logs to understand why this
> resolved
> your issue. If you do not set mesos.resourcemanager.tasks.cpus explicitly,
> Flink will request CPU resources equal to the number of TaskManager slots
> (taskmanager.numberOfTaskSlots) [2]. Maybe this value was too high in your
> configuration?
>
> Best,
> Gary
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
> [2]
> https://github.com/apache/flink/blob/0a405251b297109fde1f9a155eff14be4d943887/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosTaskManagerParameters.java#L344
>
> On Tue, Sep 10, 2019 at 10:41 AM Felipe Gutierrez <
> felipe.o.gutier...@gmail.com> wrote:
>
>> I managed to find what was going wrong. I will write here just for the
>> record.
>>
>> First, the master machine was not login automatically at itself. So I had
>> to give permission for it.
>>
>> chmod og-wx ~/.ssh/authorized_keys
>> chmod 750 $HOME
>>
>> Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal
>> or less the available cores on a single node of the cluster. I am not sure
>> about this parameter, but only after this configuration it worked.
>>
>> Felipe
>> *--*
>> *-- Felipe Gutierrez*
>>
>> *-- skype: felipe.o.gutierrez*
>> *--* *https://felipeogutierrez.blogspot.com
>> <https://felipeogutierrez.blogspot.com>*
>>
>>
>> On Fri, Sep 6, 2019 at 10:36 AM Felipe Gutierrez <
>> felipe.o.gutier...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running Mesos without DC/OS [1] and Flink on it. Whe I start my
>>> cluster I receive some messages that I suppose everything was started.
>>> However, I see 0 slats available on the Flink web dashboard. But I suppose
>>> that Mesos will allocate Slots and Task Managers dynamically. Is that right?
>>>
>>> $ ./bin/mesos-appmaster.sh &
>>> [1] 16723
>>> flink@r03:~/flink-1.9.0$ I0906 10:22:45.080328 16943 sched.cpp:239]
>>> Version: 1.9.0
>>> I0906 10:22:45.082672 16996 sched.cpp:343] New master detected at
>>> mas...@xxx.xxx.xxx.xxx:5050
>>> I0906 10:22:45.083276 16996 sched.cpp:363] No credentials provided.
>>> Attempting to register without authentication
>>> I0906 10:22:45.086840 16997 sched.cpp:751] Framework registered with
>>> 22f6a553-e8ac-42d4-9a90-96a8d5f002f0-0003
>>>
>>> Then I deploy my Flink application. When I use the first command to
>>> deploy the application starts. However, the tasks remain CREATED until
>>> Flink throws a timeout exception. In other words, it never turns to RUNNING.
>>> When I use the second comman to deploy the application it does not start
>>> and I receive the exception of "Could not allocate all requires slots
>>> within timeout of 300000 ms. Slots required: 2". The full stacktrace is
>>> below.
>>>
>>> $ /home/flink/flink-1.9.0/bin/flink run
>>> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>>> $ ./bin/mesos-appmaster-job.sh run
>>> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>>>
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html#mesos-without-dcos
>>> ps.: my application runs normally on a standalone Flink cluster.
>>>
>>> ------------------------------------------------------------
>>>  The program finished with the following exception:
>>>
>>> org.apache.flink.client.program.ProgramInvocationException: Job failed.
>>> (JobID: 7ad8d71faaceb1ac469353452c43dc2a)
>>> at
>>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
>>> at
>>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
>>> at
>>> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
>>> at
>>> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
>>> at org.hello_flink_mesos.App.<init>(App.java:35)
>>> at org.hello_flink_mesos.App.main(App.java:285)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
>>> at
>>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
>>> at
>>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
>>> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>>> at
>>> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
>>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
>>> execution failed.
>>> at
>>> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
>>> at
>>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
>>> ... 22 more
>>> Caused by:
>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>> Could not allocate all requires slots within timeout of 300000 ms. Slots
>>> required: 2, slots allocated: 0, previous allocation IDs: [], execution
>>> status: completed exceptionally: java.util.concurrent.CompletionException:
>>> java.util.concurrent.CompletionException:
>>> java.util.concurrent.TimeoutException/java.util.concurrent.CompletableFuture@b520de3[Completed
>>> exceptionally], incomplete: 
>>> java.util.concurrent.CompletableFuture@36f3d30c[Not
>>> completed, 1 dependents]
>>> at
>>> org.apache.flink.runtime.executiongraph.SchedulingUtils.lambda$scheduleEager$1(SchedulingUtils.java:194)
>>> at
>>> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>>> at
>>> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:633)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.lambda$new$0(FutureUtils.java:656)
>>> at
>>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>>> at
>>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:190)
>>> at
>>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>>> at
>>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:700)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:484)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:380)
>>> at
>>> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>>> at
>>> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:998)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>>> at
>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> Thanks,
>>> Felipe
>>> *--*
>>> *-- Felipe Gutierrez*
>>>
>>> *-- skype: felipe.o.gutierrez*
>>> *--* *https://felipeogutierrez.blogspot.com
>>> <https://felipeogutierrez.blogspot.com>*
>>>
>>

Reply via email to