Re: PLC/Scada/Sensor anomaly detection

2016-05-07 Thread Aljoscha Krettek
Hi, I'm afraid not. I think the first step would be finding a way to get the data into Flink. The rest could be simple, since there are already some good libraries such as the CEP library. Cheers, Aljoscha On Wed, 4 May 2016 at 00:37 Ivan wrote: > Hello! > Has anyone used Flink in "production"

Re: Window

2016-05-07 Thread Aljoscha Krettek
Hi, you could do it with a combination of GlobalWindows and the ContinuousProcessingTimeTrigger: env.socketTextStream("192.168.1.101", ) .map(new mapper()) .keyBy(0) .window(GlobalWindows.create()) .trigger(PurgingTrigger.of(ContinuousProcessingTimeTrigger.of(Time.seconds(5 .apply

Re: How to choose the 'parallelism.default' value

2016-05-07 Thread Aljoscha Krettek
Could it be that the TaskManagers are configured with not-enough memory? On Thu, 5 May 2016 at 13:35 Robert Metzger wrote: > The default value of taskmanager.network.numberOfBuffers is 2048. I would > recommend to use a multiple of that value, for example 16384 (given that > you have enough memo

Re: question regarding windowed stream

2016-05-07 Thread Aljoscha Krettek
Hi, yes, the input does indeed play a role. If not elements are incoming then there will also be no window. Cheers, Aljoscha On Fri, 6 May 2016 at 12:18 Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > I have a requirement where I want to do aggregation on one data stream > every 5

Re: general design questions when using flink

2016-05-07 Thread Aljoscha Krettek
Hi, if it is a fixed number of event types and logical pipelines I would probably split them into several jobs to achieve good isolation. There are, however people who go a different way and integrate everything into a general-purpose job that can be dynamically modified and also deals with errors

Re: How to choose the 'parallelism.default' value

2016-05-07 Thread Punit Naik
I am afraid not. On 07-May-2016 1:24 PM, "Aljoscha Krettek" wrote: > Could it be that the TaskManagers are configured with not-enough memory? > > On Thu, 5 May 2016 at 13:35 Robert Metzger wrote: > >> The default value of taskmanager.network.numberOfBuffers is 2048. I would >> recommend to use a

How to make Flink read all files in HDFS folder and do transformations on th e data

2016-05-07 Thread Palle
Hi there. I've got a HDFS folder containing a lot of files. All files contains a lot of JSON objects, one for each line. I will have several TB in the HDFS folder. My plan is to make Flink read all files and all JSON objects and then do some analysis on the data, actually very similar to the fl

Re: How to make Flink read all files in HDFS folder and do transformations on th e data

2016-05-07 Thread Flavio Pompermaier
I had the same issue :) I resolved it reading all file paths in a collection, then using this code: env.fromCollection(filePaths).rebalance().map(file2pojo) You can have your dataset of Pojos! The rebalance() is necessary to exploit parallelism,otherwise the pipeline will be executed with parall

Re: How to make Flink read all files in HDFS folder and do transformations on th e data

2016-05-07 Thread Fabian Hueske
Hi Palle, you can recursively read all files in a folder as explained in the "Recursive Traversal of the Input Path Directory" section of the Data Source documentation [1]. The easiest way to read line-wise JSON objects is to use ExecutionEnvironment.readTextFile() which reads text files linewise

Re: How to make Flink read all files in HDFS folder and do transformations on th e data

2016-05-07 Thread Flavio Pompermaier
Sorry Palle, I wrongly understood that you were trying to read a single json object per file...the solution suggested by Fabian is definitely the right solution for your specific use case! Best, Flavio On 7 May 2016 12:52, "Fabian Hueske" wrote: > Hi Palle, > > you can recursively read all files

Unable to understand datastream error message

2016-05-07 Thread subash basnet
Hello all, I am getting the below error on execute of StreamExecutionEnvironment. *Caused by: java.lang.IllegalStateException: Iteration FeedbackTransformation{id=15, name='Feedback', outputType=PojoType]>, parallelism=4} does not have any feedback edges.* The run method inside the thread class

Re: Unable to understand datastream error message

2016-05-07 Thread Aljoscha Krettek
Could you please post your code. On Sat, 7 May 2016 at 19:16 subash basnet wrote: > Hello all, > > I am getting the below error on execute of StreamExecutionEnvironment. > > > *Caused by: java.lang.IllegalStateException: Iteration > FeedbackTransformation{id=15, name='Feedback', > outputType=Poj