Re: Flink Kafka example in Scala

2015-07-16 Thread Aljoscha Krettek
Hi, this looks like the flink-connector-kafka jar is not available where the job is running. Did you put it in the library folder of Flink on all the machines, or did you submit it with the job? On Thu, Jul 16, 2015, 21:05 Wendong wrote: > Hi Gyula, > > Cool. I removed .print and the error was go

One improvement suggestion: the “flink-xx-jobmanager-linux-3lsu.log” file cannot be automatically recovered/detected after an accidental delete

2015-07-16 Thread Chenliang (Liang, DataSight)
One improvement suggestion; please check whether it is valid. To verify that the system is adequately reliable, testers usually deliberately perform delete operations. Steps: 1. go to "flink\build-target\log" 2. delete the “flink-xx-jobmanager-linux-3lsu.log” file 3. run jobs along with writing log info, m

Re: Containment Join Support

2015-07-16 Thread Fabian Hueske
Hi Martin, good to hear that you like Flink :-) AFAIK, there are no plans to add a containment join. The Flink community is currently working on adding support for outer joins. Regarding a containment join, I am not sure about the number of use cases. I would rather try to implement it on top of F

Re: Flink Kafka example in Scala

2015-07-16 Thread Wendong
Hi Gyula, Cool. I removed .print and the error was gone. However, env.execute failed with errors: … Caused by: java.lang.Exception: Call to registerInputOutput() of invokable failed at org.apache.flink.runtime.taskmanager.Task.run(Task.java:504) ... Caused by: org.apache.fli

Re: Gelly EOFException

2015-07-16 Thread Flavio Pompermaier
But isn't it a problem to have multiple vertices with the same id? On 16 Jul 2015 19:34, "Andra Lungu" wrote: > For now, there is a validator that checks whether the vertex ids > correspond to the target/src ids in the edges. If you want to check for > vertex ID uniqueness, you'll

Re: Flink Kafka example in Scala

2015-07-16 Thread Gyula Fóra
Hey, The reason you are getting that error is because you are calling print after adding a sink, which is an invalid operation. Remove either addSink or print :) Cheers, Gyula On Thu, Jul 16, 2015 at 7:37 PM Wendong wrote: > Thanks! I tried your updated MySimpleStringSchema and it works for both
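A minimal sketch of the fix Gyula describes (MyKafkaSink is a hypothetical class, and the stream is a stand-in): give each stream exactly one terminal operation, and do not chain print onto the result of addSink.

    import org.apache.flink.streaming.api.scala._

    object SinkOrPrint {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val stream = env.fromElements("a", "b", "c")

        // Pick exactly one terminal operation for the stream:
        stream.print()                        // the debugging sink, or
        // stream.addSink(new MyKafkaSink())  // the real sink (hypothetical class)
        // stream.addSink(...).print()        // invalid: a sink cannot be consumed again

        env.execute("sink-or-print")
      }
    }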

Re: Flink Kafka example in Scala

2015-07-16 Thread Wendong
Thanks! I tried your updated MySimpleStringSchema and it works for both source and sink. However, my problem is the runtime error "Data stream sinks cannot be copied" listed in the previous post. I hope someone has run into the problem before and can give a hint. Wendong

Re: Gelly EOFException

2015-07-16 Thread Andra Lungu
For now, there is a validator that checks whether the vertex ids correspond to the target/src ids in the edges. If you want to check for vertex ID uniqueness, you'll have to implement your own custom validator... I have seen people hit the same error outside Gelly, so I doubt that the lack of unique id
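A minimal sketch of such a uniqueness check, written as plain DataSet operations rather than a Gelly GraphValidator (the (id, value) tuples are hypothetical stand-ins for the real vertex data set): count occurrences per vertex id and keep the ids that occur more than once.

    import org.apache.flink.api.scala._

    object DuplicateVertexIds {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // (id, value) pairs standing in for the vertex data set
        val vertices = env.fromElements((1L, "a"), (2L, "b"), (1L, "c"))

        val duplicates = vertices
          .map(v => (v._1, 1)) // project each vertex to (id, 1)
          .groupBy(0)
          .sum(1)              // occurrences per id
          .filter(_._2 > 1)    // ids that appear more than once

        duplicates.print()     // here: (1, 2)
      }
    }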

Re: Submitting jobs from within Scala code

2015-07-16 Thread Till Rohrmann
Good to hear that your problem is solved :-) Cheers, Till On Thu, Jul 16, 2015 at 5:45 PM, Philipp Goetze < philipp.goe...@tu-ilmenau.de> wrote: > Hi Till, > > many thanks for your effort. I finally got it working. > > I'm a bit embarrassed because the issue was solved by using the same > flink

Re: Flink Scala performance

2015-07-16 Thread Vinh June
I just checked the web job manager; it says that the runtime for the Flink job is 349ms, but it actually takes 18s using the "time" command in the terminal. Should I care more about the latter timing?

Re: Submitting jobs from within Scala code

2015-07-16 Thread Philipp Goetze
Hi Till, many thanks for your effort. I finally got it working. I'm a bit embarrassed because the issue was solved by using the same flink-dist JAR as the locally running Flink version. That is to say, I used an older snapshot version for compiling than for running :-[ Best Regards, Philipp On

Re: Re: Submitting jobs from within Scala code

2015-07-16 Thread Till Rohrmann
Hi Philipp, what I usually do to run a Flink program on a cluster from within my IDE is create a RemoteExecutionEnvironment. Since I have one UDF (the map function which doubles the values) defined, I also need to specify the jar containing this class. In my case, the jar is called test-1.0-SNAPSH
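A minimal sketch of the setup Till describes (host, port, and jar path are placeholders, not the values from his mail):

    import org.apache.flink.api.scala._

    object RemoteJob {
      def main(args: Array[String]): Unit = {
        // The listed jar must contain the classes of all UDFs used in the program.
        val env = ExecutionEnvironment.createRemoteEnvironment(
          "jobmanager-host", 6123, "/path/to/job-assembly.jar")

        val doubled = env.fromElements(1, 2, 3).map(_ * 2) // the UDF shipped in the jar
        doubled.print()
      }
    }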

Re: Gelly EOFException

2015-07-16 Thread Flavio Pompermaier
I thought a bit about this error... In my job I was generating multiple vertices with the same id. Could this cause such errors? Maybe there could be a check for such situations in Gelly... On Tue, Jul 14, 2015 at 10:00 PM, Andra Lungu wrote: > Hello, > > Sorry for the delay. The bug is not in Ge

Re: Cluster execution -jar files-

2015-07-16 Thread Matthias J. Sax
As the JavaDoc explains: > * @param jarFiles The JAR files with code that needs to be shipped to > the cluster. If the program uses > * user-defined functions, user-defined input formats, > or any libraries, those must be > * provided in the J

Re: Flink Scala performance

2015-07-16 Thread Stephan Ewen
Is it possible that it takes a long time to spawn JVMs on your system? That this takes up all the time? On Thu, Jul 16, 2015 at 3:34 PM, Vinh June wrote: > Here are my logs > http://pastebin.com/AJwiy2D8 > http://pastebin.com/K05H3Qur > from client log, it seems to take ~2s, but with "time flink

Fwd: Re: Submitting jobs from within Scala code

2015-07-16 Thread Philipp Goetze
Hey Till, I think my previous mail was intercepted or something similar. However, you can find my reply below. I already tried a simpler job which just does an env.fromElements... but still the same stack trace. How do you normally submit jobs (jars) from within the code? Best Regards, Philipp

Re: getIndexOfThisSubtask: starts at 0 or 1?

2015-07-16 Thread Ufuk Celebi
You are right, Arnaud. Sorry about this. :( I'm pushing a fix right now (for 0.9.1 as well). Thanks for reporting this! On 16 Jul 2015, at 16:22, LINZ, Arnaud wrote: > Hello, > > According to the documentation, getIndexOfThisSubtask starts from 1: > > /** > * Gets the number o

getIndexOfThisSubtask: starts at 0 or 1?

2015-07-16 Thread LINZ, Arnaud
Hello, According to the documentation, getIndexOfThisSubtask starts from 1: /** * Gets the number of the parallel subtask. The numbering starts from 1 and goes up to the parallelism, * as returned by {@link #getNumberOfParallelSubtasks()}. * * @return
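For reference, a minimal sketch of the behavior in question: the index is zero-based, ranging from 0 to getNumberOfParallelSubtasks() - 1; the quoted JavaDoc saying the numbering starts from 1 is what Ufuk's fix above corrects.

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.api.scala._

    object SubtaskIndexDemo {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        env.fromElements("a", "b", "c")
          .map(new RichMapFunction[String, String] {
            override def map(value: String): String = {
              // zero-based: 0 .. getNumberOfParallelSubtasks() - 1
              val idx = getRuntimeContext.getIndexOfThisSubtask
              s"subtask-$idx: $value"
            }
          })
          .print()
      }
    }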

Re: Cluster execution -jar files-

2015-07-16 Thread Juan Fumero
Missing reference: [1] https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/cluster_execution.html On Thu, 2015-07-16 at 16:04 +0200, Juan Fumero wrote: > Hi, > I would like to use the createRemoteEnvironment to run the application > in a cluster and I have some questions. Followin

Cluster execution -jar files-

2015-07-16 Thread Juan Fumero
Hi, I would like to use createRemoteEnvironment to run the application on a cluster, and I have some questions. Following the documentation in [1], it is not clear to me how to use it. What should be the content of the jar file? All the external libraries that I use, or do I need to include the pr

Re: Flink Scala performance

2015-07-16 Thread Vinh June
Here are my logs: http://pastebin.com/AJwiy2D8 http://pastebin.com/K05H3Qur From the client log, it seems to take ~2s, but with "time flink run ...", the actual time is ~18s.

Re: Flink Scala performance

2015-07-16 Thread Stephan Ewen
If you use the sample data from the example, there must be an issue with the setup. In Flink's standalone mode, it runs in 100ms on my machine. It may be possible that the command line client takes a long time to start up, so it appears that the program run time is long. If it takes so long, one

Re: Submitting jobs from within Scala code

2015-07-16 Thread Till Rohrmann
Hi Philipp, it seems that Stephan was right and that your JobGraph is somehow corrupted. You can see from the JobSubmissionException that the JobGraph contains a vertex whose InvokableClassName is null. Furthermore, even the ID and the vertex name are null. This is a strong indicator, t

No accumulator results in streaming

2015-07-16 Thread LINZ, Arnaud
Hello, I've been struggling with this simple issue for hours now: I am unable to get the accumulator result of a streaming execution; the accumulator map in the JobExecutionResult is always empty. Simple test code (directly inspired by the documentation): My source = public static clas
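For comparison, a minimal sketch of the documented accumulator pattern in a batch job (all names are hypothetical; Arnaud's report is precisely that the same pattern yields an empty accumulator map in a streaming context): register an IntCounter in open() and read it from the JobExecutionResult after execute().

    import org.apache.flink.api.common.accumulators.IntCounter
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.Configuration

    class CountingMapper extends RichMapFunction[String, String] {
      private val counter = new IntCounter()
      override def open(parameters: Configuration): Unit =
        getRuntimeContext.addAccumulator("num-records", counter)
      override def map(value: String): String = {
        counter.add(1) // count every record that passes through
        value
      }
    }

    object AccumulatorDemo {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        env.fromElements("a", "b", "c")
          .map(new CountingMapper)
          .output(new org.apache.flink.api.java.io.DiscardingOutputFormat[String])
        val result = env.execute("accumulator test")
        println(result.getAccumulatorResult[Integer]("num-records")) // expected: 3
      }
    }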

Re: Flink Scala performance

2015-07-16 Thread Vinh June
@Stephan: I use the sample data that comes with the example.

Re: Submitting jobs from within Scala code

2015-07-16 Thread Philipp Goetze
Hey Till, here is the console output, now with log4j: 0 [pool-7-thread-1-ScalaTest-running-FlinkCompileIt] INFO org.apache.flink.client.program.Client - Starting program in interactive mode 121 [pool-7-thread-1-ScalaTest-running-FlinkCompileIt] DEBUG org.apache.flink.api.scala.ClosureCleaner$

Re: Flink Scala performance

2015-07-16 Thread Chiwan Park
You can increase Flink's managed memory by increasing the TaskManager JVM heap (taskmanager.heap.mb) in flink-conf.yaml. There is some explanation of the options in the Flink documentation [1]. Regards, Chiwan Park [1] https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#common-options >
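For reference, the relevant flink-conf.yaml entry looks like this minimal sketch (the value is an arbitrary example, not a recommendation); Flink's managed memory is then taken as a fraction of this heap:

    # flink-conf.yaml: JVM heap size of each TaskManager, in megabytes
    taskmanager.heap.mb: 2048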

Re: Submitting jobs from within Scala code

2015-07-16 Thread Stephan Ewen
Could you also look into the JobManager logs? You may be submitting a corrupt JobGraph... On Thu, Jul 16, 2015 at 11:45 AM, Till Rohrmann wrote: > When you run your program from the IDE, then you can specify a > log4j.properties file. There you can configure where and what to log. It > should be

Re: Flink Scala performance

2015-07-16 Thread Vinh June
I found it in the JobManager log: "21:16:54,986 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 25 MB for Flink managed memory." Is there a way to explicitly assign this for local execution?

Re: Flink Scala performance

2015-07-16 Thread Stephan Ewen
Vinh, are you using the sample data built into the example, or are you using your own data? On Thu, Jul 16, 2015 at 8:54 AM, Vinh June wrote: > I ran it on local, from terminal. > And it's the Word Count example so it's small

Re: Flink Scala performance

2015-07-16 Thread Ufuk Celebi
Hey Vinh, you have to look into the logs folder and find the log of the TaskManager (something like *taskmanager*.log) – Ufuk On 16 Jul 2015, at 11:35, Vinh June wrote: > Hi Max, > When I call 'flink run', it doesn't show any information like that

Re: Submitting jobs from within Scala code

2015-07-16 Thread Till Rohrmann
When you run your program from the IDE, then you can specify a log4j.properties file. There you can configure where and what to log. It should be enough to place the log4j.properties file in the resource folder of your project. An example properties file could look like: log4j.rootLogger=INFO, tes
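Till's example file is truncated above; a minimal log4j.properties sketch along the same lines (the appender name and pattern are assumptions, not his original):

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n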

Re: Flink Scala performance

2015-07-16 Thread Vinh June
Hi Max, when I call 'flink run', it doesn't show any information like that.

Re: Submitting jobs from within Scala code

2015-07-16 Thread Philipp Goetze
Hi Till, the problem is that this is the only output :( Or is it possible to get more verbose log output? Maybe it is important to note that both Flink and our project are built with Scala 2.11. Best Regards, Philipp On 16.07.2015 11:12, Till Rohrmann wrote: Hi Philipp, could you post

Re: Submitting jobs from within Scala code

2015-07-16 Thread Till Rohrmann
Hi Philipp, could you post the complete log output? This might help to get to the bottom of the problem. Cheers, Till On Thu, Jul 16, 2015 at 11:01 AM, Philipp Goetze < philipp.goe...@tu-ilmenau.de> wrote: > Hi community, > > in our project we try to submit built Flink programs to the jobmanag

Submitting jobs from within Scala code

2015-07-16 Thread Philipp Goetze
Hi community, in our project we try to submit built Flink programs to the jobmanager from within Scala code. The test program is executed correctly when submitted via the wrapper script "bin/flink run ..." and also with the webclient. But when executed from within Scala code, nothing seems

Re: Flink Kafka example in Scala

2015-07-16 Thread Aljoscha Krettek
Hi, your first example doesn't work because the SimpleStringSchema does not work for sinks. You can use this modified serialization schema: https://gist.github.com/aljoscha/e131fa8581f093915582. This works for both source and sink (I think the current SimpleStringSchema is not correct and should be

Re: Sort Benchmark infrastructure

2015-07-16 Thread Hawin Jiang
Hi George, thanks for the details. It looks like I have a long way to go. For big data benchmarking, I would like to use those test cases, test data, and test methodology to test different big data technologies. BTW, I agree with you that no one system will necessarily be optimal for all cases for

Re: Flink Scala performance

2015-07-16 Thread Maximilian Michels
Hi Vinh, if you run your program locally, then Flink uses the local execution mode, which allocates only a little managed memory. Managed memory is used by Flink to perform operations on serialized data. These operations can get slow if too little memory is allocated, because data needs to be spille

Re: How to handle Exists and Not Exists queries in Flink

2015-07-16 Thread Michele Bertoni
ds1.filter(// here the selection of query 1) ds2.filter(// here the selection of query 2) exists: ds1.join(ds2.distinct(id)).where(id).equalTo(id) { // join by your join key(s); note the distinct operator, otherwise you will get many lines for each input line (left, right) => left // collect left } or ds1.cogr
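A self-contained sketch of the two patterns Michele outlines (the tuple schemas and data are hypothetical): EXISTS as an equi-join against a distinct key set, NOT EXISTS as a coGroup that keeps only the left rows with no right-side match.

    import org.apache.flink.api.scala._
    import org.apache.flink.util.Collector

    object ExistsNotExists {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val ds1 = env.fromElements((1, "a"), (2, "b"), (3, "c"))
        val ds2 = env.fromElements((1, "x"), (1, "y"))

        // EXISTS: ds1 rows whose key appears in ds2; distinct avoids one
        // output line per matching ds2 row
        val exists = ds1.join(ds2.distinct(0)).where(0).equalTo(0) {
          (left, right) => left
        }

        // NOT EXISTS: ds1 rows whose key has no match in ds2
        val notExists = ds1.coGroup(ds2).where(0).equalTo(0) {
          (lefts, rights, out: Collector[(Int, String)]) =>
            if (rights.isEmpty) lefts.foreach(out.collect)
        }

        exists.print() // (1,"a")
        // notExists.print() would print (2,"b") and (3,"c")
      }
    }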

Containment Join Support

2015-07-16 Thread Martin Junghanns
Hi everyone, first of all, thanks for building this great framework! We are using Flink, and especially Gelly, for building a graph analytics stack (gradoop.com). I was wondering whether there is [planned] support for a containment join operator. Consider the following example: DataSet> left := {[0, 1],
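One way to follow Fabian's suggestion further up and build a containment join on Flink's existing primitives, as a minimal sketch under one plausible reading of the example (a collection-valued left side joined with single ids on containment; all names and data are hypothetical): explode each left-side collection into (element, collection) pairs, then equi-join against the right side.

    import org.apache.flink.api.scala._

    object ContainmentJoin {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val left  = env.fromElements(Seq(0L, 1L), Seq(2L, 3L)) // collections of ids
        val right = env.fromElements(1L, 4L)                   // single ids

        // A pair matches iff the right id is contained in the left collection.
        // Apply distinct afterwards if a collection may contain several
        // matching ids.
        val exploded = left.flatMap(s => s.map(e => (e, s)))
        val containing = exploded.join(right).where(_._1).equalTo(x => x) {
          (pair, _) => pair._2
        }

        containing.print() // prints List(0, 1), the only collection containing 1
      }
    }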