Re: Compile fails with scala 2.11.4

2016-01-20 Thread Robert Metzger
Hi, in the latest master, the "tools/change-scala-version.sh" should be fixed. Also, the 1.0-SNAPSHOT version deployed to the snapshot repository should be good again. @Ritesh: The commands were correct. I'm not sure if Flink builds with Scala 2.11.4, the default 2.11 version we are using is 2.11

Re: flink 1.0-SNAPSHOT scala 2.11 compilation error

2016-01-20 Thread Robert Metzger
Hey David, the issue should be resolved now. Please let me know if it's still an issue for you. Regards, Robert On Fri, Jan 15, 2016 at 4:02 PM, David Kim wrote: > Thanks Till! I'll keep an eye out on the JIRA issue. Many thanks for the > prompt reply. > > Cheers, > David > > On Fri, Jan 15, 2

Actual byte-streams in multiple-node pipelines

2016-01-20 Thread Tal Maoz
Hey, I’m a new user to Flink and I’m trying to figure out if I can build a pipeline I’m working on using Flink. I have a data source that sends out a continuous data stream at a bandwidth of anywhere between 45MB/s to 600MB/s (yes, that’s MiB/s, not Mib/s, and NOT a series of individual messages

Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
Hello all, I'm trying to run a job using FlinkML and I'm confused about the source of an error. The job reads a libSVM formatted file and trains an SVM classifier on it. I've tried this with small datasets and everything works out fine. When trying to run the same job on a large dataset (~11GB

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Stephan Ewen
Hi! Does this error occur in 0.10 or in 1.0-SNAPSHOT? It is probably an incorrectly configured Kryo instance (not a problem of the sorter). What is strange is that it occurs in the "MapReferenceResolver" - there should be no reference resolution during serialization / deserialization. Can you tr

Re: Frequent exceptions killing streaming job

2016-01-20 Thread Robert Metzger
Hey Nick, I had a discussion with Stephan Ewen on how we could resolve the issue. I filed a JIRA with our suggested approach: https://issues.apache.org/jira/browse/FLINK-3264 By handling this directly in the KafkaConsumer, we would avoid fetching data we can not handle anyways (discarding in the

parallelism parameter and output relation

2016-01-20 Thread Serkan Taş
I am working on this example http://www.itshared.org/2015/03/naive-bayes-on-apache-flink.html to learn and get some more experience on the platform. The question is: by default, the output of the process is two files (named 1 and 2) located in a created folder. If I set parallelism to 1, FileNotFound excepti

Re: Actual byte-streams in multiple-node pipelines

2016-01-20 Thread Ritesh Kumar Singh
I think with sufficient processing power Flink can do the above-mentioned task using the stream API. Thanks, *Ritesh Kumar Singh,* *https://riteshtoday.wordpress.com/* On Wed,

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
It's on 0.10. I've tried explicitly registering SparseVector (which is done anyway by registerFlinkMLTypes which is called when

Re: Actual byte-streams in multiple-node pipelines

2016-01-20 Thread Robert Metzger
Hi Tal, that sounds like an interesting use case. I think I need a bit more details about your use case to see how it can be done with Flink. You said you need low latency, what latency is acceptable for you? Also, I was wondering how are you going to feed the input data into Flink? If the data i

Re: Results of testing Flink quickstart against 0.10-SNAPSHOT and 1.0-SNAPSHOT (re. Dependency on non-existent org.scalamacros:quasiquotes_2.11:)

2016-01-20 Thread Robert Metzger
Hi Prez, thanks a lot for the thorough research you did on this issue. The issue with "1.0-SNAPSHOT with fetched binary dependencies" should be resolved by a fix I've pushed to master yesterday: a) The "change-scala-version" script wasn't adopted to the renamed examples directory, that's why it f

Re: Actual byte-streams in multiple-node pipelines

2016-01-20 Thread Tal Maoz
Hey Robert, Thanks for responding! The latency I'm talking about would be no more than 1 second from input to output (meaning, bytes should flow immediately through the pipeline and get to the other side after going through the processing). You can assume the processors have enough power to work i

Re: integration with a scheduler

2016-01-20 Thread Robert Metzger
Hi Serkan, I would suggest to have a look at the "./bin/flink" tool. It allows you to start ("run") and stop ("cancel") batch and streaming jobs. Flink doesn't support suspending jobs. You can also use the JobManager web interface (default port: 8081) to get the status of the job and also to canc

Re: Reading Binary Data (Matrix) with Flink

2016-01-20 Thread Saliya Ekanayake
Thank you, I saw the readHadoopFile, but I was not sure how it can be used to the following, which is what I need. The logic of the code requires an entire row to operate on, so in our current implementation with P tasks, each of them will read a rectangular block of (N/P) x N from the matrix. Is t

Could not upload the jar files to the job manager IOException

2016-01-20 Thread Ana M. Martinez
Hi all, I am running some experiments with flink in an Amazon cluster and every now and then (it seems to appear at random) I get the following IOException: > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Could not upload the jar files to the job man

Re: Could not upload the jar files to the job manager IOException

2016-01-20 Thread Robert Metzger
Hi, can you check the log file of the JobManager you're trying to submit the job to? Maybe there you can find helpful information why it failed. On Wed, Jan 20, 2016 at 3:23 PM, Ana M. Martinez wrote: > Hi all, > > I am running some experiments with flink in an Amazon cluster and every > now an

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
I haven't been able to reproduce this with other datasets. Taking a smaller sample from the large dataset I'm using (link to data ) causes the same problem however. I'm wondering if the implementation of readLibSVM is what

groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Vinaya M S
Hi, I'm new to Flink. Can anyone help me with the error below. Exception in thread "main" java.lang.Error: Unresolved compilation problem: The method groupBy(int) is undefined for the type SingleOutputStreamOperator,capture#1-of ?> The code snippet is: DataStream> dataStream = env

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Vinaya M S
Version Flink 0.10. Example is mentioned in https://ci.apache.org/projects/flink/flink-docs-release-0.8/streaming_guide.html Please let me know where I can find these kinds of updates. Thank you. On Wed, Jan 20, 2016 at 9:41 AM, Robert Metzger wrote: > Hi. > > which Flink version are you using?

Re: parallelism parameter and output relation

2016-01-20 Thread Robert Metzger
Hi Serkan, yes, with parallelism=1 you'll get one file; with anything higher, Flink creates a directory with a file for each parallel instance. In your case, Flink cannot create (or write to) the file because there is already a directory with the same name. Can you delete the directory and

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Robert Metzger
Hi. which Flink version are you using? Starting from Flink 0.10, "groupBy" has been renamed to "keyBy". Where did you find the "groupBy" example? On Wed, Jan 20, 2016 at 3:37 PM, Vinaya M S wrote: > Hi, > > I'm new to Flink. Can anyone help me with the error below. > > Exception in thread "ma

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Robert Metzger
Hi, as you can see from the URL, it's the documentation for Flink 0.8. The current documentation is here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html In the 0.10 release announcement ( http://flink.apache.org/news/2015/11/16/release-0.10.0.html ) the cha

Re: Results of testing Flink quickstart against 0.10-SNAPSHOT and 1.0-SNAPSHOT (re. Dependency on non-existent org.scalamacros:quasiquotes_2.11:)

2016-01-20 Thread Prez Cannady
Morning, Robert. You’re right; the 1.0-SNAPSHOT with fetched binaries issue is resolved now. Unfortunately, it now emits the same error as 0.10-SNAPSHOT with fetched binaries. There is a fix for that: https://github.com/apache/flink/pull/1511 It’s

Re: Compile fails with scala 2.11.4

2016-01-20 Thread Ritesh Kumar Singh
Thanks for the update Robert, I tried it out and it works fine for Scala version 2.11.4. I've made a docker image of the same and put it up on the hub just in case anyone else needs it. Thanks, *Ritesh Kumar Singh,* *https://rites

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Stephan Ewen
The bug looks to be in the serialization via Kryo while spilling windows. Note that Kryo is here used as a fallback serializer, since the SparseVector is not a transparent type to Flink. I think there are two possible reasons: 1) Kryo, or our Kryo setup, has an issue here 2) Kryo is inconsistentl

Re: flink 1.0-SNAPSHOT scala 2.11 compilation error

2016-01-20 Thread David Kim
Hi Robert, Thanks for following up. The issue is resolved! Cheers, David On Wed, Jan 20, 2016 at 3:08 AM, Robert Metzger wrote: > Hey David, > > the issue should be resolved now. Please let me know if its still an issue > for you. > > Regards, > Robert > > > On Fri, Jan 15, 2016 at 4:02 PM, Da

An interesting apache project: Reef

2016-01-20 Thread kovas boguta
Some people here (especially Flink contributors) might be interested to know about this project: https://reef.apache.org/index.html It is lower-level than Flink (and less mature), but with similar architectural sensibilities and emphasis on interfaces. It would be pretty interesting to compare the

Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread HungChang
The original port is in use, so I'm changing the web port, but it fails to start. Can I ask where I made a mistake? The error: Exception in thread "main" java.lang.NullPointerException at org.apache.flink.runtime.minicluster.FlinkMiniCluster.startWebServer(FlinkMiniCluster.scala:295) at org.ap

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread Till Rohrmann
It seems that the web server could not be instantiated. The reason for this problem should be in your logs. Could you look it up and post the reason here? Additionally, we should build in a sanity check to avoid the NPE. Cheers, Till On Wed, Jan 20, 2016 at 5:06 PM, HungChang wrote: > The or

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
OK here's what I tried: * Build Flink (mvn clean install) from the branch you linked (kryo) * Build my uber-jar, I use SBT with 1.0-SNAPSHOT as the Flink version, added local maven repo to resolvers so that it picks up the previously installed version (I hope) * Launch local cluster from newly bui

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-20 Thread Maximilian Bode
Hi Stephan, thanks for your fast answer. Just setting the Flink-managed memory to a low value would not have worked for us, as we need joins etc. in the same job. After investigating the JDBCInputFormat, we found the line statement = dbConn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, Re
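The change Maximilian describes is swapping the scroll-insensitive result set (whose rows the driver may keep buffered, eventually exhausting the heap) for a forward-only, read-only one. A minimal sketch of that statement creation, using only the standard JDBC API (`createStreamingStatement` is a hypothetical helper name, not the actual JDBCInputFormat code):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ForwardOnlyStatements {
    /** Creates a statement whose result set the driver may stream instead of buffering. */
    static Statement createStreamingStatement(Connection dbConn) throws SQLException {
        // TYPE_FORWARD_ONLY instead of TYPE_SCROLL_INSENSITIVE: the driver is free
        // to discard rows once they have been read, rather than caching them all.
        return dbConn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                      ResultSet.CONCUR_READ_ONLY);
    }
}
```

Note that some drivers additionally need a fetch-size hint before they actually stream rows; that part is driver-specific.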

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Till Rohrmann
You could change the version of Stephan’s branch via mvn versions:set -DnewVersion=MyCustomBuildVersion and then mvn versions:commit. Now after you install the Flink binaries you can reference them in your project by setting the version of your Flink dependencies to MyCustomBuildVersion. That way,
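The steps Till describes, written out as a command sketch (the SBT dependency line is illustrative; the artifact and version names are examples, not taken from the thread):

```shell
# In the checkout of Stephan's branch: give the build a private version number
# so the locally installed artifacts cannot be shadowed by 1.0-SNAPSHOT
# artifacts pulled from the snapshot repository.
mvn versions:set -DnewVersion=MyCustomBuildVersion
mvn versions:commit

# Install the renamed binaries into the local Maven repository.
mvn clean install -DskipTests

# Then, in the test project, reference that version, e.g. in SBT:
#   "org.apache.flink" %% "flink-scala" % "MyCustomBuildVersion"
```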

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread HungChang
Yea I'm wondering why the web server cannot be instantiated, because changing the port from 8081 works well in the following demo sample of Flink. https://github.com/dataArtisans/flink-streaming-demo/blob/master/src/main/scala/com/dataartisans/flink_demo/utils/DemoStreamEnvironment.scala so is t

Error starting job manager in 1.0-SNAPSHOT

2016-01-20 Thread Andrew Whitaker
Hi, I'm getting the following error when attempting to start the job manager: ``` ./bin/jobmanager.sh start cluster streaming ``` ``` 10:51:27,824 INFO org.apache.flink.runtime.jobmanager.JobManager - Registered UNIX signal handlers for [TERM, HUP, INT] 10:51:27,914 INFO org.apache.flink.

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread Till Rohrmann
I guess it’s easiest to simply enable logging and see what the problem is. If you run it from the IDE then you can also set a breakpoint in WebMonitorUtils.startWebRuntimeMonitor and see what the exception is. Cheers, Till ​ On Wed, Jan 20, 2016 at 6:04 PM, HungChang wrote: > Yea I'm wondering

Re: Error starting job manager in 1.0-SNAPSHOT

2016-01-20 Thread Stephan Ewen
Hi! As of a few weeks ago, there is no "streaming" or "batch" mode any more. There is only one mode that handles both. I think the argument "streaming" passed to the script is then incorrectly interpreted as the hostname to bind the JobManager network interface to. Then you get the "UnknownHostExc

Re: Reading Binary Data (Matrix) with Flink

2016-01-20 Thread Till Rohrmann
With readHadoopFile you can use all of Hadoop’s FileInputFormats and thus you can also do everything with Flink, what you can do with Hadoop. Simply take the same Hadoop FileInputFormat which you would take for your MapReduce job. Cheers, Till ​ On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread Stephan Ewen
I think this null pointer comes when the log files are not found (bug in 0.10). You can double check by either trying 1.0-SNAPSHOT or putting for test an absolute path of a file that exists for the log file. Greetings, Stephan On Wed, Jan 20, 2016 at 6:33 PM, Till Rohrmann wrote: > I guess it’

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-20 Thread Stephan Ewen
Super, thanks for finding this. Makes a lot of sense to have a result set that does not hold onto data. Would be great if you could open a pull request with this fix, as other users will benefit from that as well! Cheers, Stephan On Wed, Jan 20, 2016 at 6:03 PM, Maximilian Bode < maximilian.b...@tn

Re: An interesting apache project: Reef

2016-01-20 Thread Stephan Ewen
Thanks for the pointers. I actually know some of the Reef people (like Markus Weimer) and we talked about ideas a few times. Especially around batch and iterations. There were many common thoughts and good discussions. In the streaming space, I think Flink and Reef diverged quite a bit (checkpoin

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
Alright I will try to do that. I've tried running the job with a CSV file as input, and using DenseVectors to represent the features, still the same IndexOutOfBounds error. On Wed, Jan 20, 2016 at 6:05 PM, Till Rohrmann wrote: > You could change the version of Stephan’s branch via mvn versions:

Re: Error starting job manager in 1.0-SNAPSHOT

2016-01-20 Thread Andrew Whitaker
Stephan, Thanks so much for the quick response. That worked for me! On Wed, Jan 20, 2016 at 11:34 AM, Stephan Ewen wrote: > Hi! > > As of a few weeks ago, there is no "streaming" or "batch" mode any more. > There is only one mode that handles both. > > I think the argument "streaming" passed to

Re: DeserializationSchema isEndOfStream usage?

2016-01-20 Thread Robert Metzger
I've now merged the pull request. DeserializationSchema.isEndOfStream() should now be evaluated correctly by the Kafka 0.9 and 0.8 connectors. Please let me know if the updated code has any issues. I'll fix the issues asap. On Wed, Jan 13, 2016 at 5:06 PM, David Kim wrote: > Thanks Robert! I'll

Re: Error starting job manager in 1.0-SNAPSHOT

2016-01-20 Thread Stephan Ewen
Sorry for the confusion with that. The 1.0-SNAPSHOT is changing quite a bit; we are trying to consolidate as much as possible for 1.0 to keep breaking changes after that low. On Wed, Jan 20, 2016 at 7:58 PM, Andrew Whitaker < andrew.whita...@braintreepayments.com> wrote: > Stephen, > > Thanks so m

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Stephan Ewen
Can you again post the stack trace? With the patched branch, the reference mapper should not be used any more (which is where the original exception occurred). On Wed, Jan 20, 2016 at 7:38 PM, Theodore Vasiloudis < theodoros.vasilou...@gmail.com> wrote: > Alright I will try to do that. > > I've t

Re: Results of testing Flink quickstart against 0.10-SNAPSHOT and 1.0-SNAPSHOT (re. Dependency on non-existent org.scalamacros:quasiquotes_2.11:)

2016-01-20 Thread Stephan Ewen
Hi Prez! I merged the pull request into master a while back. Have a look here ( https://github.com/apache/flink/commits/master commits of January 15th). Is it possible that you are using a cached older version? Greetings, Stephan On Wed, Jan 20, 2016 at 4:00 PM, Prez Cannady wrote: > Morni

Re: integration with a scheduler

2016-01-20 Thread Stephan Ewen
If you want to programmatically start / stop / cancel jobs, have a look at the class "Client" ( https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/Client.java ) From the classes RemoteEnvironment or the RemoteExecutor, you can see how to use i

Re: InvalidTypesException - Input mismatch: Basic type 'Integer' expected but was 'Long'

2016-01-20 Thread Biplob Biswas
Hello everyone, I am still stuck with this issue, can anyone point me in the right direction? Thanks & Regards Biplob Biswas On Mon, Jan 18, 2016 at 2:24 PM, Biplob Biswas wrote: > Hi Till, > > I am using flink 0.10.1 and if i am not wrong it corresponds to the > 1.0-Snapshot you mentioned. >

Re: Actual byte-streams in multiple-node pipelines

2016-01-20 Thread Stephan Ewen
This sounds quite feasible, actually, though it is a pretty unique use case. Like Robert said, you can write map() and flatMap() function on byte[] arrays. Make sure that the byte[] that the sources produce are not super small and not too large (I would start with 1-4K or so). You can control how
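A minimal stdlib-only sketch of the chunking Stephan suggests: splitting a raw stream into byte[] records of at most a few KiB before handing them to a source function. `ByteChunker` and the 4 KiB size are assumptions, chosen from the 1-4K range mentioned above:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits a raw input stream into byte[] chunks of at most CHUNK_SIZE bytes. */
public class ByteChunker {
    static final int CHUNK_SIZE = 4 * 1024; // 4 KiB, from the suggested 1-4K range

    static List<byte[]> chunk(InputStream in) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[CHUNK_SIZE];
        int n;
        while ((n = in.read(buf)) != -1) {
            // Copy only the bytes actually read, so the final chunk may be shorter.
            chunks.add(Arrays.copyOf(buf, n));
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> chunks = chunk(new ByteArrayInputStream(new byte[10_000]));
        System.out.println(chunks.size()); // 10000 bytes -> 4096 + 4096 + 1808
    }
}
```

In a real job the chunks would be emitted by a SourceFunction and processed by map()/flatMap() over byte[], as Robert and Stephan describe.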

Re: InvalidTypesException - Input mismatch: Basic type 'Integer' expected but was 'Long'

2016-01-20 Thread Stephan Ewen
Hi! Can you check if the problem persists in the 1.0-SNAPSHOT branch? It may be fixed in the newest version already, since we cannot reproduce it in the latest version. Thanks, Stephan On Wed, Jan 20, 2016 at 9:56 PM, Biplob Biswas wrote: > Hello everyone, > > I am still stuck with this issue, c

Re: integration with a scheduler

2016-01-20 Thread Serkan Taş
Thank you very much Stephan and Robert. As Robert suggests, the most common way is to execute a batch script, but I want to go beyond that. I am going to work on both alternatives. Best regards, > On 20 Jan 2016 at 22:53, Stephan Ewen wrote: > > If you want to programmatica

Re: Compile fails with scala 2.11.4

2016-01-20 Thread Chiwan Park
Thanks for sharing, Ritesh! Regards, Chiwan Park > On Jan 21, 2016, at 12:28 AM, Ritesh Kumar Singh > wrote: > > Thanks for the update Robert, I tried it out and it works fine for > scala_2.11.4 version. > I've made a docker image of the same and put it up on the hub just in case > anyone el

How to prepare data for K means clustering

2016-01-20 Thread Ashutosh Kumar
I saw the example code for K-means clustering. It takes input data points as pairs of double values (e.g. "1.2 2.3\n5.3 7.2"). My question is how do I convert my business data to this format. I have customer data which has columns like household income, education and several others. I want to do clusteri

Re: How to prepare data for K means clustering

2016-01-20 Thread Chiwan Park
Hi Ashutosh, You can use basic Flink DataSet operations such as map and filter to transform your data. Basically, you have to declare a distance metric between each record in the data. In the example, we use Euclidean distance (see the euclideanDistance method in the Point class). In map method in SelectNeare
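A minimal sketch of the mapping described above: turning a raw customer record into a numeric feature vector with a Euclidean distance, loosely mirroring the Point class from the K-means example. `fromCsv` and the two-column layout (income, years of education) are assumptions for illustration:

```java
/** A feature vector for K-means, modeled on the example's Point class. */
public class Point {
    final double[] features;

    Point(double... features) {
        this.features = features;
    }

    /** Euclidean distance between two points of equal dimensionality. */
    double euclideanDistance(Point other) {
        double sum = 0;
        for (int i = 0; i < features.length; i++) {
            double d = features[i] - other.features[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Hypothetical mapping from a raw record "income,educationYears" to a Point. */
    static Point fromCsv(String line) {
        String[] cols = line.split(",");
        return new Point(Double.parseDouble(cols[0]),
                         Double.parseDouble(cols[1]));
    }
}
```

In a Flink job this would be a map over the text input, e.g. `env.readTextFile(path).map(Point::fromCsv)`. One practical caveat: features on very different scales (income in the tens of thousands vs. education in years) should be normalized first, or the larger-scale column will dominate the distance.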