Re: Using Flink with Redis question

2015-09-04 Thread Márton Balassi
Hey Jerry, Jay is on the right track; this issue has to do with the Flink operator life cycle. When you call execute(), all your user-defined classes get serialized so that they can be shipped to the workers on the cluster. To execute some code before your FlatMapFunction starts processing the data y
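The lifecycle pattern Márton describes — keep the non-serializable client out of the serialized operator and create it on the worker before processing starts — can be sketched with plain JDK serialization, without a Flink dependency. `FakeRedisClient`, `LifecycleSketch`, and the `open()` method here are illustrative stand-ins for Jedis and for `RichFlatMapFunction.open()`, not real Flink or Jedis API:

```java
import java.io.*;

// Stand-in for a real (non-serializable) Jedis connection.
class FakeRedisClient {
    String get(String key) { return "value-for-" + key; }
}

public class LifecycleSketch implements Serializable {
    // transient: skipped during serialization, so shipping the operator
    // to a worker never tries to serialize the live connection
    transient FakeRedisClient client;

    // Mirrors the role of RichFlatMapFunction.open(): build the connection
    // on the worker, after deserialization but before any record is processed.
    void open() { client = new FakeRedisClient(); }

    String lookup(String key) { return client.get(key); }

    public static void main(String[] args) throws Exception {
        LifecycleSketch op = new LifecycleSketch();

        // Simulate shipping the operator to a worker: serialize, then deserialize.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new ObjectOutputStream(bytes).writeObject(op);
        LifecycleSketch shipped = (LifecycleSketch) new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray())).readObject();

        // The transient client did not survive the trip...
        System.out.println(shipped.client == null);  // true
        // ...so open() re-creates it before processing starts.
        shipped.open();
        System.out.println(shipped.lookup("k"));     // value-for-k
    }
}
```

In the real Flink code the same shape applies: declare the Jedis field `transient`, extend a Rich function, and construct the client in `open()`.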

Re: Using Flink with Redis question

2015-09-04 Thread Jay Vyas
Maybe wrapping Jedis with a serializable class will do the trick? But in general, is there a way to reference jar classes in Flink apps without serializing them? > On Sep 4, 2015, at 1:36 PM, Jerry Peng wrote: > > Hello, > > So I am trying to use jedis (redis java client) with Flink streami

Using Flink with Redis question

2015-09-04 Thread Jerry Peng
Hello, So I am trying to use jedis (redis java client) with Flink streaming api, but I get an exception: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452) at or

Fwd: Memory management issue

2015-09-04 Thread Ricarda Schueler
Hi All, We're running into a memory management issue when using the iterateWithTermination function. Using a small amount of data, everything works perfectly fine. However, as soon as the main memory is filled up on a worker, nothing seems to be happening any more. Once this happens, any worker

Flink join with external source

2015-09-04 Thread Jerry Peng
Hello, Does Flink currently support operators that use Redis? If I'm using the streaming API in Flink and need to look up something in a Redis database during the processing of the stream, how can I do that?

Re: Flink to ingest from Kafka to HDFS?

2015-09-04 Thread Aljoscha Krettek
Hi, I have an open Pull Request for a RollingFile sink. It is integrated with checkpointing, so it can provide exactly-once behavior. If you're interested, please check it out: https://github.com/apache/flink/pull/1084 Cheers, Aljoscha On Wed, 26 Aug 2015 at 10:31 Stephan Ewen wrote: > BTW: We

BloomFilter Exception

2015-09-04 Thread Flavio Pompermaier
Hi to all, running a job with Flink 0.10-SNAPSHOT I got the following Exception: java.lang.IllegalArgumentException: expectedEntries should be > 0 at org.apache.flink.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122) at org.apache.flink.runtime.opera

Re: Convergence Criterion in IterativeDataSet

2015-09-04 Thread Sachin Goel
Hi Andres, Does something like this solve what you're trying to achieve? https://github.com/apache/flink/pull/918/files Regards Sachin On Sep 4, 2015 6:24 PM, "Stephan Ewen" wrote: > I think you can do this with the current interface. The convergence > criterion object stays around, so you should

Re: Convergence Criterion in IterativeDataSet

2015-09-04 Thread Stephan Ewen
I think you can do this with the current interface. The convergence criterion object stays around, so you should be able to simply store the current aggregator value in a field (when the check is invoked). Any round but the first could compare against that field. On Fri, Sep 4, 2015 at 2:25 PM, An
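Stephan's point is that the criterion object lives across supersteps, so it can cache the previous round's aggregate in a field and compare against it. A minimal plain-JDK sketch of that pattern (no Flink dependency; `FixedPointCriterion`, the `isConverged` signature, and the `EPSILON` tolerance are illustrative, loosely mirroring Flink's `ConvergenceCriterion.isConverged(iteration, value)`):

```java
// Sketch of a fixed-point convergence check that remembers the previous
// round's aggregate in a field, as suggested in the thread.
public class FixedPointCriterion {
    private static final double EPSILON = 1e-6;  // assumed tolerance
    private Double previous;  // null in the first round, so round 1 never converges

    boolean isConverged(int iteration, double aggregate) {
        boolean converged = previous != null
                && Math.abs(aggregate - previous) < EPSILON;
        previous = aggregate;  // remember for the next round's comparison
        return converged;
    }

    public static void main(String[] args) {
        FixedPointCriterion criterion = new FixedPointCriterion();
        double[] aggregates = {10.0, 4.0, 4.0000001};  // toy per-round values
        for (int round = 1; round <= aggregates.length; round++) {
            System.out.println("round " + round + ": "
                    + criterion.isConverged(round, aggregates[round - 1]));
        }
        // round 3 converges: |4.0000001 - 4.0| < 1e-6
    }
}
```

In an actual IterativeDataSet the same logic would sit inside the registered convergence criterion, with the aggregate delivered by the Aggregator.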

Convergence Criterion in IterativeDataSet

2015-09-04 Thread Andres R. Masegosa
Hi, I'm trying to implement some machine learning algorithms that involve several iterations until convergence (to a fixed point). My idea is to use an IterativeDataSet with an Aggregator which produces the result (i.e. a set of parameters defining the model). From the interface "ConvergenceCrite

Re: How to create a stream of data batches

2015-09-04 Thread Stephan Ewen
Interesting question, you are the second to ask that. Batching in user code is a way, as Matthias said. We have on the roadmap a way to transform a stream to a set of batches, but it will be a bit until this is in. See https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Str

Re: How to create a stream of data batches

2015-09-04 Thread Matthias J. Sax
Hi Andres, you could do this by using your own data type, for example: public class MyBatch { private ArrayList data = new ArrayList<>(); } In the DataSource, you need to specify your own InputFormat that reads multiple tuples into a batch and emits the whole batch at once. However, be aware,
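The MyBatch idea from this thread can be fleshed out as a small generic wrapper plus a helper that groups elements into fixed-size batches, similar to what a custom InputFormat would emit. This is a plain-Java sketch with illustrative names (`BatchSketch`, `toBatches`), not Flink API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // The user-defined batch type suggested in the thread.
    static class MyBatch<T> {
        final List<T> data = new ArrayList<>();
    }

    // Group a plain list into batches of at most batchSize elements;
    // a custom InputFormat would do the same while reading its input.
    static <T> List<MyBatch<T>> toBatches(List<T> input, int batchSize) {
        List<MyBatch<T>> batches = new ArrayList<>();
        MyBatch<T> current = new MyBatch<>();
        for (T element : input) {
            current.data.add(element);
            if (current.data.size() == batchSize) {  // batch full: emit it
                batches.add(current);
                current = new MyBatch<>();
            }
        }
        if (!current.data.isEmpty()) batches.add(current);  // trailing partial batch
        return batches;
    }

    public static void main(String[] args) {
        List<MyBatch<Integer>> batches = toBatches(List.of(1, 2, 3, 4, 5), 2);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).data);   // [5]
    }
}
```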

Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Matthias J. Sax
+1 for dropping On 09/04/2015 11:04 AM, Maximilian Michels wrote: > +1 for dropping Hadoop 2.2.0 binary and source-compatibility. The > release is hardly used and complicates the important high-availability > changes in Flink. > > On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen wrote: >> I am good

How to create a stream of data batches

2015-09-04 Thread Andres R. Masegosa
Hi, I'm trying to code some machine learning algorithms on top of Flink, such as variational Bayes learning algorithms. Instead of working at a data-element level (i.e. using map transformations), it would be far more efficient to work at a "batch of elements" level (i.e. I get a batch of elemen

Re: Question on flink and hdfs

2015-09-04 Thread Maximilian Michels
Hi Jerry, If you don't want to use Hadoop, simply pick _any_ Flink version. We recommend the Hadoop 1 version because it contains the least dependencies, i.e. you need to download less and the installation occupies less space. Other than that, it doesn't really matter if you don't use the HDFS fun

Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Maximilian Michels
+1 for dropping Hadoop 2.2.0 binary and source-compatibility. The release is hardly used and complicates the important high-availability changes in Flink. On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen wrote: > I am good with that as well. Mind that we are not only dropping a binary > distribution

Re: Question on flink and hdfs

2015-09-04 Thread Welly Tambunan
Hi Jerry, yes, that's possible. You can download the appropriate version https://flink.apache.org/downloads.html Cheers On Fri, Sep 4, 2015 at 1:57 AM, Jerry Peng wrote: > Hello, > > Does flink require hdfs to run? I know you can use hdfs to checkpoint and > process fil

Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Welly Tambunan
Hi Stephan, Thanks for your clarification. Basically we will have lots of sensors that will push this kind of data to a queuing system (currently we are using RabbitMQ, but will soon move to Kafka). We will also use the same pipeline to process the historical data. I also want to minimize the chai

Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Welly Tambunan
Hi Stephan, Cheers On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen wrote: > We will definitely also try to get the chaining overhead down a bit. > > BTW: To reach this kind of throughput, you need sources that can produce > very fast... > > On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan wrote: >

Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Stephan Ewen
I am good with that as well. Mind that we are not only dropping a binary distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0. Let's also reconfigure Travis to test - Hadoop 1 - Hadoop 2.3 - Hadoop 2.4 - Hadoop 2.6 - Hadoop 2.7 On Fri, Sep 4, 2015 at 6:19 AM, Chiwan

Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Stephan Ewen
We will definitely also try to get the chaining overhead down a bit. BTW: To reach this kind of throughput, you need sources that can produce very fast... On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan wrote: > Hi Stephan, > > That's good information to know. We will hit that throughput easily
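The equivalence this thread's subject asks about — `filter().map()` versus a single `flatMap()` — comes down to fusing two chained operators into one that emits zero or one records per input, saving a hop through the operator chain. A plain-Java sketch of the two shapes (illustrative names, not Flink API):

```java
import java.util.ArrayList;
import java.util.List;

public class FuseSketch {
    // filter(x -> x % 2 == 0) followed by map(x -> x * 10), as two passes
    static List<Integer> filterThenMap(List<Integer> in) {
        List<Integer> filtered = new ArrayList<>();
        for (int x : in) if (x % 2 == 0) filtered.add(x);
        List<Integer> mapped = new ArrayList<>();
        for (int x : filtered) mapped.add(x * 10);
        return mapped;
    }

    // The fused flatMap: one pass, emitting 0 or 1 records per input element.
    static List<Integer> fusedFlatMap(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int x : in) {
            if (x % 2 == 0) out.add(x * 10);  // filter and map in one step
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4);
        System.out.println(filterThenMap(input));  // [20, 40]
        System.out.println(fusedFlatMap(input));   // [20, 40]
    }
}
```

Both produce the same stream; as Stephan notes, Flink's operator chaining already keeps the two-operator version cheap, so the fused form mainly trims the remaining per-record chaining overhead.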