Optimizations not performed - please confirm

2016-06-28 Thread Ovidiu-Cristian MARCU
Hi, The optimizer internals described in this document [1] are probably not up-to-date. Can you please confirm if this is still valid: “The following optimizations are not performed Join reordering (or operator reordering in general): Joins / Filters / Reducers are not re-ordered in Flink. This

Flink for historical time series processing

2016-06-28 Thread Mindaugas Zickus
Hi All, I wonder if Flink is a right tool for processing historical time series data e.g. many small files. Our use case: we have clickstream histories (time series) of many users. We would like to calculate user specific sliding count window aggregates over past periods for a sample of users

How to count number of records received per second in processing time while using event time characteristic

2016-06-28 Thread Saiph Kappa
Hi, I have a flink streaming application and I want to count records received per second (as a way of measuring the throughput of my application). However, I am using the EventTime time characteristic, as follows: val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeChara

Re: maximum size of window

2016-06-28 Thread Vishnu Viswanath
Hi, Thank you for the responses. I am not sure if I will be able to use Fold/Reduce function, but I will keep that in mind. I have one more question, so what is the implication of having a key that splits the data into window of very small size(=> large number of small windows) ? Thanks and Rega

Re: Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-28 Thread ANDREA SPINA
Hi Max, thank you for the fast reply and sorry: I use flink-1.0.3. Yes I tested on dummy dataset with numOfBuffers = 16384 and decreasing the parallelism degree and this solution solved the first exception. Anyway on the 80GiB dataset I struggle with the second exception. Regards, Andrea 2016-06-

Re: Adding 3rd party moving average and other 'indicators'

2016-06-28 Thread Maximilian Michels
Hi Anton, I would suggest you simply put your moving average code in a MapFunction where you can keep track of the current average using a class field. Cheers, Max On Fri, Jun 24, 2016 at 10:05 PM, Anton wrote: > Hello > > I'm currently trying to learn Flink. And so far am really impressed by i

Re: Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-28 Thread Maximilian Michels
Hi Andrea, The number of network buffers should be sufficient. Actually, assuming you have 16 task slots on each of the 25 nodes, it should be enough to have 16^2 * 25 * 4 = 14400 network buffers. See https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#background So we have

Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-28 Thread ANDREA SPINA
Hi everyone, I am running some Flink experiments with Peel benchmark http://peel-framework.org/ and I am struggling with exceptions: the environment is a 25-nodes cluster, 16 cores per nodes. The dataset is ~80GiB and is located on Hdfs 2.7.1. At the beginning I tried with 400 as degree of parall

Re: maximum size of window

2016-06-28 Thread Aljoscha Krettek
Hi, one thing to add: if you use a ReduceFunction or a FoldFunction for your window the state will not grow with bigger window sizes or larger numbers of elements because the result is eagerly computed. In that case, state size is only dependent on the number of individual keys. Cheers, Aljoscha

Re: Way to hold execution of one of the map operator in Co-FlatMaps

2016-06-28 Thread Aljoscha Krettek
Hi, I might lead to flooding, yes. But I'm afraid it's the only way to go right now. Cheers, Aljoscha On Mon, 27 Jun 2016 at 17:57 Biplob Biswas wrote: > Hi, > > I was afraid of buffering because I am not sure when the second map > function > would get data, wouldn't the first map be flooded wi

Re: maximum size of window

2016-06-28 Thread Kostas Kloudas
Hi Vishnu, RocksDB allows for storing the window contents on disk when the state of a window becomes too big. BUT when you have to trigger and apply the computation of your window function on that big window, then all of its state is loaded in memory. So although during the window formation ph