Hi,

./yarn-session.sh -n 3 -s 1 -jm 1024 -tm 4096

will start a YARN session with 4 containers (3 workers, 1 master). Once the session is running, you can submit as many jobs as you want to this session, using

./bin/flink run ./path/to/jar

The YARN session will create a hidden file in conf/ which contains the connection details for the ./bin/flink tool.

With

./flink run -m yarn-cluster -yn 3 -yjm 1024 -ytm 4096 ../path/to/jar

you can start the job in /path/to/jar directly on YARN. Behind the scenes, we'll start a YARN session just for executing this one job.

Since you're currently debugging your job, I would suggest starting one YARN session and submitting jobs against it.
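In case it helps, here is a minimal sketch (a hypothetical WordCount-style job in Flink's Scala API, with made-up HDFS paths, not your program) showing that the deployment mode is not chosen in the code: the program only obtains the execution environment, and the ./bin/flink tool decides at submission time whether the job goes to the standing session or to a dedicated per-job cluster on YARN.

import org.apache.flink.api.scala._

object WordCountOnYarn {
  def main(args: Array[String]): Unit = {
    // Resolved by the submission context: the same jar works against a
    // standing YARN session and with "-m yarn-cluster".
    val env = ExecutionEnvironment.getExecutionEnvironment

    val counts = env
      .readTextFile("hdfs:///path/to/input")      // made-up path
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .groupBy(0)
      .sum(1)

    counts.writeAsCsv("hdfs:///path/to/output")   // made-up path
    env.execute("WordCount on YARN")
  }
}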
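Regarding the collect() call that Stephan explains below, here is a minimal sketch of how you could keep a large result out of the JobManager while debugging (the GRecord type, the parsing and the paths are made up and only stand in for your FlinkDataTypes): write the full result to HDFS and only bring a small sample back to the client if you need to inspect records locally.

import org.apache.flink.api.scala._

// Stand-in for a custom type like your GValue; not your actual class.
case class GRecord(id: Long, value: Double)

object DebugWithoutBigCollect {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val records: DataSet[GRecord] = env
      .readTextFile("hdfs:///path/to/input")      // made-up path
      .map { line =>
        val fields = line.split("\t")             // made-up record layout
        GRecord(fields(0).toLong, fields(1).toDouble)
      }

    // Risky with large results: collect() ships the whole DataSet back through
    // the JobManager (as a ListAccumulator), which is what runs out of heap in
    // the log below.
    // val all: Seq[GRecord] = records.collect()

    // Safer while debugging: keep the full result on HDFS; if you need to look
    // at a few records locally, collect only a small sample, e.g.
    // records.first(10).collect().
    records.writeAsText("hdfs:///path/to/output") // made-up path
    env.execute("debug run without a big collect()")
  }
}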
Robert


On Mon, May 18, 2015 at 5:58 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Michele!
>
> It looks like there are quite a few things going wrong in your case. Let me
> see what I can deduce from the output you are showing me:
>
>
> 1) You seem to run into a bug that exists in the 0.9-milestone-1 and has
> been fixed in the master:
>
> As far as I can tell, you call "collect()" on a data set, and the type that
> you collect is a custom type. The milestone has a bug there: it uses the
> wrong classloader, so "collect()" is unfortunately not supported there on
> custom types. The latest SNAPSHOT version should have this fixed.
> We are pushing to finalize the code for the 0.9 release, so hopefully we
> will have an official release with that fix in a few weeks.
>
>
> 2) The results that you collect seem to be very large, so the JobManager
> actually runs out of memory while collecting them. This situation should be
> improved in the current master as well, even though it is still possible to
> break the master's heap with the collect() call (we plan to fix that soon
> as well).
>
> Can you try and see if the latest SNAPSHOT version solves your issue?
>
>
> BTW: It looks like you are actually starting two YARN sessions (Robert can
> probably comment on this):
>
> ./yarn-session.sh -n 3 -s 1 -jm 1024 -tm 4096
> ./flink run -m yarn-cluster -yn 3 -yjm 1024 -ytm 4096 ../path/to/jar
>
> The first line starts a YARN session against which you can run multiple
> jobs. The second line actually starts another dedicated YARN session for
> that job.
>
>
> Greetings,
> Stephan
>
>
> On Mon, May 18, 2015 at 5:30 PM, Michele Bertoni <
> michele1.bert...@mail.polimi.it> wrote:
>
>> Hi,
>>
>> I have a problem running my app on a YARN cluster.
>>
>> I developed it on my computer and everything works fine. Then we set up
>> the environment on Amazon EMR, reading data from HDFS, not S3.
>>
>> We run it with these commands:
>>
>> ./yarn-session.sh -n 3 -s 1 -jm 1024 -tm 4096
>> ./flink run -m yarn-cluster -yn 3 -yjm 1024 -ytm 4096 ../path/to/jar
>>
>> We are using flink 0.9.0-milestone-1.
>>
>> After running it, the terminal windows where we launch it totally crash;
>> the last messages are:
>>
>> 05/18/2015 15:19:56 Job execution switched to status FINISHED.
>> 05/18/2015 15:19:56 Job execution switched to status FAILING.
>>
>> This is the error from the yarn log:
>>
>> 2015-05-18 15:19:53 ERROR ApplicationMaster$$anonfun$2$$anon$1:66 - Could
>> not process accumulator event of job 5429fe215a3953101cd575cd82b596c8
>> received from akka://flink/deadLetters.
>> java.lang.OutOfMemoryError: Java heap space
>>   at org.apache.flink.api.common.accumulators.ListAccumulator.read(ListAccumulator.java:91)
>>   at org.apache.flink.runtime.accumulators.AccumulatorEvent.getAccumulators(AccumulatorEvent.java:124)
>>   at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:350)
>>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>   at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:99)
>>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
>>   at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>   at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>   at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>   at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:91)
>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> 2015-05-18 15:19:56 ERROR OneForOneStrategy:66 - java.lang.ClassNotFoundException: LowLevel.FlinkImplementation.FlinkDataTypes$GValue
>> org.apache.commons.lang3.SerializationException: java.lang.ClassNotFoundException: LowLevel.FlinkImplementation.FlinkDataTypes$GValue
>>   at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:230)
>>   at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:268)
>>   at org.apache.flink.api.common.accumulators.ListAccumulator.getLocalValue(ListAccumulator.java:51)
>>   at org.apache.flink.api.common.accumulators.ListAccumulator.getLocalValue(ListAccumulator.java:35)
>>   at org.apache.flink.runtime.jobmanager.accumulators.AccumulatorManager.getJobAccumulatorResults(AccumulatorManager.java:77)
>>   at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:300)
>>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>   at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:99)
>>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
>>   at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>   at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>   at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>   at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:91)
>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: java.lang.ClassNotFoundException: LowLevel.FlinkImplementation.FlinkDataTypes$GValue
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:274)
>>   at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1663)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>   at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:224)
>>   ... 25 more
>>
>> Can you help me understand something?
>>
>> Thanks
>>
>