Never mind, I'll post this new problem as a new thread.

On Wed, Mar 28, 2018 at 6:35 PM, Juho Autio <juho.au...@rovio.com> wrote:

> Thank you. The YARN job starts now, but the Flink job itself seems to be
> stuck in a bad state.
>
> Flink UI keeps showing status CREATED for all sub-tasks and nothing seems
> to be happening.
>
> (For the record, this is what I did: export HADOOP_CLASSPATH=`hadoop classpath`,
> as found at https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html)
>
> I found this in the JobManager log:
>
> 2018-03-28 15:26:17,449 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93) switched from state RUNNING to FAILING.
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 20, slots allocated: 8
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$36(ExecutionGraph.java:984)
> at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:551)
> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:789)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
> at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
> at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
> at java.lang.Thread.run(Thread.java:748)
>
> After this there was:
>
> 2018-03-28 15:26:17,521 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Restarting the job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93).
>
> And some time after that:
>
> 2018-03-28 15:27:39,125 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - GET operation failed
> java.io.EOFException: Premature end of GET request
> at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:275)
> at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:117)
>
> The TaskManager logs don't contain any errors.
>
> Is that BlobServerConnection error severe enough to make the job get
> stuck like this? How can I debug this further?
>
> Thanks!
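A quick sanity check for the NoResourceAvailableException above, sketched as a POSIX shell function. It assumes a fixed number of TaskManagers (`-yn`) with `-ys` slots each, and relies on Flink's default slot sharing, under which a job needs as many slots as its highest operator parallelism. Whether Flink 1.5 still provisions TaskManagers according to `-yn` is an assumption here, and `slots_sufficient` is a hypothetical helper name, not part of Flink.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if NODES TaskManagers with SLOTS_PER_TM slots
# each provide at least PARALLELISM slots. Assumes default slot sharing, where
# a job needs as many slots as its highest operator parallelism.
slots_sufficient() {
  nodes=$1
  slots_per_tm=$2
  parallelism=$3
  [ "$((nodes * slots_per_tm))" -ge "$parallelism" ]
}

# Example with the variables from the launch command (they must be set first):
# slots_sufficient "$NODE_COUNT" "$SLOT_COUNT" "$PARALLELISM" || echo "not enough slots" >&2
```

If the check fails, the cluster cannot offer enough slots even before any scheduling timeout comes into play; if it passes, the cause lies elsewhere.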
>
> On Wed, Mar 28, 2018 at 5:56 PM, Gary Yao <g...@data-artisans.com> wrote:
>
>> Hi Juho,
>>
>> Can you try submitting with HADOOP_CLASSPATH=`hadoop classpath` set? [1]
>> For example:
>>   HADOOP_CLASSPATH=`hadoop classpath` flink-${FLINK_VERSION}/bin/flink run [...]
>>
>> Best,
>> Gary
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths
>>
>>
>> On Wed, Mar 28, 2018 at 4:26 PM, Juho Autio <juho.au...@rovio.com> wrote:
>>
>>> I built a new Flink distribution from the release-1.5 branch today.
>>>
>>> I tried running a job but got this error:
>>> java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
>>>
>>> I use yarn-cluster mode.
>>>
>>> The jersey-core jar is in the Hadoop lib directory on my EMR cluster, but
>>> it seems it's no longer used.
>>>
>>> I checked that jersey-core classes are not included in the new
>>> distribution, but they were not included in my previously built Flink
>>> 1.5-SNAPSHOT either, and that one works. Has something changed recently
>>> to cause this?
>>>
>>> Is this a Flink bug, or should I fix this by somehow explicitly telling
>>> the Flink YARN app to use the Hadoop lib now?
>>>
>>> More details below if needed.
>>>
>>> Thanks,
>>> Juho
>>>
>>>
>>> My launch command is basically:
>>>
>>> flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster -yn ${NODE_COUNT} \
>>>   -ys ${SLOT_COUNT} -yjm ${JOB_MANAGER_MEMORY} -ytm ${TASK_MANAGER_MEMORY} \
>>>   -yst -yD restart-strategy=fixed-delay -yD restart-strategy.fixed-delay.attempts=3 \
>>>   -yD "restart-strategy.fixed-delay.delay=30 s" -p ${PARALLELISM} $@
>>>
>>>
>>> I'm also setting this to fix a classloading error (with the previous
>>> build, which still works):
>>> -yD classloader.resolve-order=parent-first
>>>
>>>
>>> Error stack trace:
>>>
>>> java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
>>> at java.lang.ClassLoader.defineClass1(Native Method)
>>> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>>> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>>> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
>>> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
>>> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
>>> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.getClusterDescriptor(FlinkYarnSessionCli.java:971)
>>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createDescriptor(FlinkYarnSessionCli.java:273)
>>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:449)
>>> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:92)
>>> at org.apache.fli
>>> Command exiting with ret '31'
>>>
>>>
>>
>
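To confirm that the jersey-core jar mentioned above is actually on the classpath that HADOOP_CLASSPATH=`hadoop classpath` hands to Flink, one can list the resolved entries. A minimal POSIX shell sketch; `find_jar_on_classpath` is a hypothetical helper, and the usage line assumes a Hadoop version whose `hadoop classpath` supports `--glob` (plain `hadoop classpath` prints wildcard entries such as `.../lib/*`, which a substring search would not match).

```shell
#!/bin/sh
# Hypothetical helper: print every entry of a colon-separated classpath ($1)
# whose path contains the given substring ($2); exits non-zero if none match.
find_jar_on_classpath() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -F -- "$2"
}

# On a node with the Hadoop client installed (assumes `--glob` is supported):
# find_jar_on_classpath "$(hadoop classpath --glob)" jersey-core
```

An empty result would suggest the jar is not on that classpath at all, rather than being shadowed by something in the Flink distribution.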
