Re: Reading Binary Data (Matrix) with Flink

2016-01-19 Thread Chiwan Park
Hi Saliya, You can use input formats from Hadoop in Flink via the readHadoopFile method. The method returns a DataSet of type Tuple2. Note that the MapReduce-equivalent transformation in Flink is composed of map, groupBy, and reduceGroup. > On Jan 20, 2016, at 3:04 PM, Suneel Marthi
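A minimal sketch of what Chiwan describes, assuming the Scala API and a hypothetical HDFS path; TextInputFormat stands in for whatever Hadoop input format fits the data:

    import org.apache.flink.api.scala._
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    object HadoopInputSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // Read through a Hadoop input format; yields a DataSet of (key, value) pairs.
        val input: DataSet[(LongWritable, Text)] =
          env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text],
            "hdfs:///path/to/input")
        // MapReduce-equivalent pipeline: map, then groupBy plus reduceGroup.
        val reduced = input
          .map(pair => (pair._2.toString, 1))
          .groupBy(0)
          .reduceGroup(group => group.reduce((a, b) => (a._1, a._2 + b._2)))
        reduced.print()
      }
    }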

Re: Reading Binary Data (Matrix) with Flink

2016-01-19 Thread Suneel Marthi
Guess you are looking for Flink's BinaryInputFormat to be able to read blocks of data from HDFS: https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake wrote: > Hi, > > I am trying
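For the binary-matrix case, a subclass only needs to implement deserialize. A rough sketch (class name hypothetical) for Short-valued elements:

    import java.io.IOException
    import org.apache.flink.api.common.io.BinaryInputFormat
    import org.apache.flink.core.memory.DataInputView

    // Reads fixed-width matrix elements, one Short per record.
    class ShortMatrixInputFormat extends BinaryInputFormat[Short] {
      @throws[IOException]
      override protected def deserialize(reuse: Short, dataInput: DataInputView): Short =
        dataInput.readShort()
    }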

Reading Binary Data (Matrix) with Flink

2016-01-19 Thread Saliya Ekanayake
Hi, I am trying to use Flink to perform a parallel batch operation on an NxN matrix represented as a binary file. Each (i,j) element is stored as a Java Short value. In typical MapReduce programming with Hadoop, each map task will read a block of rows of this matrix and perform computation on that b

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-19 Thread Stephan Ewen
Hi! This kind of error (GC overhead exceeded) usually means that the system is reaching a state where it has very many live objects and frees little memory during each collection. As a consequence, it is basically busy with garbage collection only. Your job probably has about 500-600 MB o

Re: Flink Stream: How to ship results through socket server

2016-01-19 Thread Matthias J. Sax
Yes (if I understand correctly what you aim for). On 01/19/2016 05:57 PM, Saiph Kappa wrote: > Thanks for your reply Matthias. So it is not possible to open a socket > server in the JobGraph and have it open during the lifetime of the > job, is that what you are saying? And it is required to have

Re: DataSet in Streaming application under Flink

2016-01-19 Thread Till Rohrmann
Hi Sylvain, what you could do, for example, is load a static data set, e.g. from HDFS, in the open method of your comparator and cache it there. The open method is called once for each task when it is created. The comparator could then be a RichMapFunction implementation. By making the field stor
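A minimal sketch of Till's suggestion, with the loading logic left out since it depends on the setup (names hypothetical):

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration

    class CompareAgainstB extends RichMapFunction[String, String] {
      // Cached copy of the small data set B, populated once per task in open().
      @transient private var cachedB: Seq[String] = _

      override def open(parameters: Configuration): Unit = {
        // Load B here, e.g. from HDFS; open() is called once when the task is created.
        cachedB = Seq.empty
      }

      override def map(a: String): String = {
        // Compare the incoming element against the cached B ...
        a
      }
    }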

Re: Flink Stream: How to ship results through socket server

2016-01-19 Thread Saiph Kappa
Thanks for your reply Matthias. So it is not possible to open a socket server in the JobGraph and have it open during the lifetime of the job, is that what you are saying? And it is required to have an external process open that socket server? On Tue, Jan 19, 2016 at 5:38 PM, Matthias J. Sax

DataSet in Streaming application under Flink

2016-01-19 Thread Sylvain Hotte
Hi, I want to know if it is possible to load a small data set in a streaming application under Flink. Here's an example: I have a data stream A and a data set B. I need to compare every tuple of A to the tuples of B. Since B is small, it would be loaded on all nodes and be persistent (not reloaded at every co

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-19 Thread Maximilian Bode
Hi Robert, I am using 0.10.1. > On 19.01.2016 at 17:42, Robert Metzger wrote: > > Hi Max, > > which version of Flink are you using? > > On Tue, Jan 19, 2016 at 5:35 PM, Maximilian Bode > wrote: > Hi everyone, > > I am facing a problem using the JDBCInput

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-19 Thread Robert Metzger
Hi Max, which version of Flink are you using? On Tue, Jan 19, 2016 at 5:35 PM, Maximilian Bode < maximilian.b...@tngtech.com> wrote: > Hi everyone, > > I am facing a problem using the JDBCInputFormat which occurred in a larger > Flink job. As a minimal example I can reproduce it when just writin

Re: Flink Stream: How to ship results through socket server

2016-01-19 Thread Matthias J. Sax
Your "SocketWriter-Thread" code will run on your client. All code in "main" runs on the client. execute() itself runs on the client, too. Of course, it triggers the job submission to the cluster. In this step, the assembled job from the previous calls is translated into the JobGraph which is submi

JDBCInputFormat GC overhead limit exceeded error

2016-01-19 Thread Maximilian Bode
Hi everyone, I am facing a problem using the JDBCInputFormat which occurred in a larger Flink job. As a minimal example, I can reproduce it when just writing data into a CSV file after having read it from a database, i.e. DataSet> existingData = env.createInput( JDBCInputFormat.buildJDBCInput
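The snippet is cut off, but the builder it starts looks roughly like this (driver, URL, and query are hypothetical stand-ins):

    import org.apache.flink.api.java.io.jdbc.JDBCInputFormat

    val jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
      .setDrivername("org.apache.derby.jdbc.EmbeddedDriver") // hypothetical driver
      .setDBUrl("jdbc:derby:memory:exampledb")               // hypothetical URL
      .setQuery("SELECT id, val FROM sometable")             // hypothetical query
      .finish()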

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
Seems you are right. It works on the current 1.0-SNAPSHOT version, which has a different signature...

    def writeToSocket(
        hostname: String,
        port: Integer,
        schema: SerializationSchema[T]): DataStreamSink[T] = {
      javaStream.writeToSocket(hostname, port, schema)
    }

in

Re: Discarded messages on filter and checkpoint ack

2016-01-19 Thread Don Frascuchon
Yes... In fact, my filter function was wrong. It's working well now. Thanks for your response Stephan! On Tue, Jan 19, 2016 at 14:25, Stephan Ewen () wrote: > Hi! > > There are no acks for individual messages in Flink. All messages that an > operator receives between two checkpoint barrie

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Saiph Kappa
I think this is a bug in the Scala API.

    def writeToSocket(
        hostname: scala.Predef.String,
        port: java.lang.Integer,
        schema: org.apache.flink.streaming.util.serialization.SerializationSchema[T, scala.Array[scala.Byte]])
      : org.apache.flink.streaming.api.datastream.DataStreamSink[T] = { /* compiled

Flink Stream: How to ship results through socket server

2016-01-19 Thread Saiph Kappa
Hi, This is a simple example that I found using Flink Stream. I changed it so the Flink client can be executed on a remote cluster, and so that it can open a socket server to ship its results to any other consumer machine. It seems to me that the socket server is not being opened in the remote clus

Re: Discarded messages on filter and checkpoint ack

2016-01-19 Thread Stephan Ewen
Hi! There are no acks for individual messages in Flink. All messages that an operator receives between two checkpoint barriers fail or succeed together. That's why dropped messages in filters need no dedicated acks. If the next checkpoint barrier passes through the whole topology, the set of mess
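For context, barriers flow through the topology at the configured checkpoint interval; a one-line sketch of enabling them (interval hypothetical):

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Inject checkpoint barriers every 5 seconds; all records between two
    // barriers fail or succeed together, so no per-message acks are needed.
    env.enableCheckpointing(5000)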

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
It should work. Your error message indicates that your DataStream is of type [String,Array[Byte]] and not of type [String]. > Type mismatch, expected: SerializationSchema[String, Array[Byte]], actual: > SimpleStringSchema Can you maybe share your code? -Matthias On 01/19/2016 01:57 PM, Saiph

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Saiph Kappa
It's DataStream[String]. So it seems that SimpleStringSchema cannot be used in writeToSocket regardless of the type of the DataStream. Right? On Tue, Jan 19, 2016 at 1:32 PM, Matthias J. Sax wrote: > What type is your DataStream? It must be DataStream[String] to work with > SimpleStringSchema. >

Discarded messages on filter and checkpoint ack

2016-01-19 Thread Don Frascuchon
Hello, working with Flink checkpointing, I see that messages discarded by a filter function are not acked for the checkpoint. How can I auto-confirm those discarded messages? Is the ack notification triggered by a sink? Thanks in advance!

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
What type is your DataStream? It must be DataStream[String] to work with SimpleStringSchema. If you have a different type, just implement a customized SerializationSchema. -Matthias On 01/19/2016 11:26 AM, Saiph Kappa wrote: > When I use SimpleStringSchema I get the error: Type mismatch, expect
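A rough sketch of such a customized schema against the 0.10-era interface, which takes two type parameters (as the error message in this thread shows); the pair type is just an example:

    import org.apache.flink.streaming.util.serialization.SerializationSchema

    // Serializes (String, Int) pairs as UTF-8 "key,value" lines.
    class PairSchema extends SerializationSchema[(String, Int), Array[Byte]] {
      override def serialize(element: (String, Int)): Array[Byte] =
        s"${element._1},${element._2}\n".getBytes("UTF-8")
    }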

Re: k means Clustering for images

2016-01-19 Thread Chiwan Park
Hi Ashutosh, There is a K-means implementation [1] in the Flink examples and a pending pull request about K-Means [2] in FlinkML (the machine learning library based on Flink). I think you can start from these implementations. [1]: https://github.com/apache/flink/blob/master/flink-examples/flink-example

integration with a scheduler

2016-01-19 Thread serkan . tas
Hi, I am planning to integrate Flink with our job scheduler product to execute jobs (especially batch-like ones) on Flink, which may be part of some other DAG-style job chain. I need some control abilities like start, stop, suspend, get status... Where should I go through? -- Serkan Tas Likya B

Re: Problem to show logs in task managers

2016-01-19 Thread Ana M. Martinez
Hi Till, Sorry for the delay, you were right, I was not restarting the YARN cluster… Many thanks for your help! Ana On 11 Jan 2016, at 14:39, Till Rohrmann <trohrm...@apache.org> wrote: You have to restart the YARN cluster to let your changes take effect. You can do that via HADOOP_HO

k means Clustering for images

2016-01-19 Thread Ashutosh Kumar
I am looking for steps to perform clustering on an image repository. Could you provide me a few pointers? Thanks Ashutosh

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Saiph Kappa
When I use SimpleStringSchema I get the error: Type mismatch, expected: SerializationSchema[String, Array[Byte]], actual: SimpleStringSchema. I think SimpleStringSchema extends SerializationSchema[String], and therefore cannot be used as an argument of writeToSocket. Can you confirm this please? s.wr

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
There is SimpleStringSchema. -Matthias On 01/18/2016 11:21 PM, Saiph Kappa wrote: > Hi Matthias, > > Thanks for your response. The method .writeToSocket seems to be what I > was looking for. Can you tell me what kind of serialization schema > should I use assuming my socket server receives strin

Re: Unexpected behavior with Scala App trait.

2016-01-19 Thread Andrea Sella
Hi Chiwan, > I'm not an expert in Scala, but it seems to be a closure-cleaning problem. The Scala > App trait extends the DelayedInit trait to initialize the object, but Flink's > serialization stack doesn't handle this special initialization. (It is just > my opinion, not verified.) I arrived at the same conclusion
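The usual workaround, sketched below: drop the App trait and define an explicit main method, so object fields are initialized eagerly instead of through DelayedInit.

    import org.apache.flink.api.scala._

    // Instead of `object MyJob extends App { ... }`:
    object MyJob {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        env.fromElements(1, 2, 3).map(_ * 2).print()
      }
    }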