Hi Lydia,

it looks like that, yes. I would check your HDFS access rights for the output directory.
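For example, you could check from the command line whether the user running the Flink JobManager is allowed to write to the parent directory. A rough sketch (I am assuming the parent directory is /users and that Flink runs as the user "flink"; please adapt both to your setup):

  # inspect owner and permission bits of the parent directory
  hdfs dfs -ls hdfs://grips2:9000/users

  # create it if it does not exist yet
  hdfs dfs -mkdir -p hdfs://grips2:9000/users

  # give the Flink user write access, either via ownership or via permission bits
  hdfs dfs -chown -R flink hdfs://grips2:9000/users
  hdfs dfs -chmod -R 775 hdfs://grips2:9000/users

It is also worth checking that the output path does not already exist as a plain file, because then no output directory can be created under that name.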
Cheers,
Till

On Mon, Feb 1, 2016 at 11:28 AM, Lydia Ickler <ickle...@googlemail.com> wrote:

> Hi Till,
>
> thanks for your reply!
> I tested it with the Wordcount example.
> Everything works fine if I run the command:
> ./flink run -p 3 /home/flink/examples/WordCount.jar
> Then the program gets executed by my 3 workers.
> If I want to save the output to a file:
> ./flink run -p 3 /home/flink/examples/WordCount.jar hdfs://grips2:9000/users/Flink_1000.csv hdfs://grips2:9000/users/Wordcount_1000
>
> I get the following error message:
> What am I doing wrong? Is something wrong with my cluster writing permissions?
>
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://grips2:9000/users/Wordcount_1000s, delimiter: ))': Output directory could not be created.
>     at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
>     at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
>     at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
>     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
>     at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
>     at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
>     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
>     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
>     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://grips2:9000/users/Wordcount_1000s, delimiter: ))': Output directory could not be created.
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:867)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:851)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:851)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:341)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.IOException: Output directory could not be created.
>     at org.apache.flink.api.common.io.FileOutputFormat.initializeGlobal(FileOutputFormat.java:295)
>     at org.apache.flink.runtime.jobgraph.OutputFormatVertex.initializeOnMaster(OutputFormatVertex.java:84)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:863)
>     ... 29 more
>
> The exception above occurred while trying to run your command.
>
>
> On 28.01.2016 at 10:44, Till Rohrmann <till.rohrm...@gmail.com> wrote:
>
> Hi Lydia,
>
> what do you mean with master? Usually when you submit a program to the cluster and don’t specify the parallelism in your program, then it will be executed with the parallelism.default value as parallelism. You can specify the value in your cluster configuration flink-config.yaml file.
> Alternatively you can always specify the parallelism via the CLI client with the -p option.
>
> Cheers,
> Till
>
>
> On Thu, Jan 28, 2016 at 9:53 AM, Lydia Ickler <ickle...@googlemail.com> wrote:
>
>> Hi all,
>>
>> I am doing some operations on a DataSet<Tuple3<Integer,Integer,Double>> …
>> (see code below)
>> When I run my program on a cluster with 3 machines I can see within the web client that only my master is executing the program.
>> Do I have to specify somewhere that all machines have to participate?
>> Usually the cluster executes in parallel.
>>
>> Any suggestions?
>>
>> Best regards,
>> Lydia
>>
>> DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, input);
>>
>> DataSet<Tuple3<Integer, Integer, Double>> initial = matrixA.groupBy(0).sum(2);
>>
>> //normalize by maximum value
>> initial = initial.cross(initial.max(2)).map(new normalizeByMax());
>>
>> matrixA.join(initial).where(1).equalTo(0)
>>     .map(new ProjectJoinResultMapper()).groupBy(0, 1).sum(2);
>>
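PS: For completeness, the two ways to set the parallelism from the quoted mail would look roughly like this (the value 3 simply mirrors your 3-worker setup; the configuration file ships as conf/flink-conf.yaml in the Flink distribution):

  # cluster-wide default in conf/flink-conf.yaml
  parallelism.default: 3

  # or per job via the CLI client
  ./flink run -p 3 /home/flink/examples/WordCount.jar hdfs://grips2:9000/users/Flink_1000.csv hdfs://grips2:9000/users/Wordcount_1000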