Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread pushpendra.jaiswal
Hi Fabian I am also looking for this solution, could you help me with two things: 1. How this is different from Queryable state. 2. How to query this key-value state from DS2 even if its running in the same application. e.g. val keyedStream = stream.keyby(_.key) val otherStream = somekafka.cr

Re: assignTimestamp after keyBy

2016-09-07 Thread pushpendra.jaiswal
Please refer https://ci.apache.org/projects/flink/flink-docs-master/dev/event_timestamps_watermarks.html for assigning timestamps. You can do map after keyby to assign timestamps e.g: val withTimestampsAndWatermarks: DataStream[MyEvent] = stream .filter( _.severity == WARNING )

Unable to find KvState flink Queryablestate

2016-09-07 Thread Pushpendra Jaiswal
On querying to Queryable State it throws Exception, Exception org.apache.flink.runtime.query.UnknownKvStateLocation: No KvStateLocation found for KvState instance with name 'queryStore'. at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleKvS

assignTimestamp after keyBy

2016-09-07 Thread Dong-iL, Kim
Hi. my stream data is from some files. ( files -> kafka -> flink(source -> keyBy -> windowing) ) data is arranged in a file. I wanna assingTimestamp after keyBy. How can I do that. Regards.

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
That depends. 1) Growing/Shrinking: This should work. New entries can always be inserted. In order to remove entries from the k-v-state you have to set the value to null. Note that you need an explicit delete-value record to trigger the eviction. 2) Multiple lookups: This does only work if all look

Re: Exception in CEP 1.1.2

2016-09-07 Thread Gyula Fóra
Interestingly on my local machine I could not reproduce the problem, maybe it was some build issue on the other machine. Have to investigate tomorrow :) Gyula Gyula Fóra ezt írta (időpont: 2016. szept. 7., Sze, 17:37): > Hi, > > I will try to get some minimal input to reproduce this. We were r

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Chakravarthy varaga
I'm understanding this better with your explanation.. With this use case,each element in DS1 has to look up against a 'bunch of keys' from DS2 and DS2 could shrink/expand in terms of the no., of keys will the key-value shard work in this case? On Wed, Sep 7, 2016 at 7:44 PM, Fabian Hueske

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
Operator state is always local in Flink. However, with key-value state, you can have something which behaves kind of similar to a distribute hashmap, because each operator holds a different shard/partition of the hashtable. If you have to do only a single key lookup for each element of DS1, you sh

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Chakravarthy varaga
certainly, what I thought as well... The output of DataStream2 could be in 1000s and there are state updates... reading this topic from the other job, job1, is okie. However, assuming that we maintain this state into a collection, and updating the state (by reading from the topic) in this collectio

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
Is writing DataStream2 to a Kafka topic and reading it from the other job an option? 2016-09-07 19:07 GMT+02:00 Chakravarthy varaga : > Hi Fabian, > > Thanks for your response. Apparently these DataStream > (Job1-DataStream1 & Job2-DataStream2) are from different flink applications > running

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Chakravarthy varaga
Hi Fabian, Thanks for your response. Apparently these DataStream (Job1-DataStream1 & Job2-DataStream2) are from different flink applications running within the same cluster. DataStream2 (from Job2) applies transformations and updates a 'cache' on which (Job1) needs to work on. Our inte

Re: fsbackend with nfs

2016-09-07 Thread Robert Metzger
Hi CPC, It should be possible to use the FsBackend with NFS. However, I'm not sure how well it will perform. Regards, Robert On Mon, Sep 5, 2016 at 2:11 PM, CPC wrote: > Hi, > > Is it possible to use flinkstatebackend with nfs? We dont want to deploy > hadoop in our environment and we want to

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
Hi, Flink does not provide shared state. However, you can broadcast a stream to CoFlatMapFunction, such that each operator has its own local copy of the state. If that does not work for you because the state is too large and if it is possible to partition the state (and both streams), you can als

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Chakravarthy varaga
Hi Team, Can someone help me here? Appreciate any response ! Best Regards Varaga On Mon, Sep 5, 2016 at 4:51 PM, Chakravarthy varaga < chakravarth...@gmail.com> wrote: > Hi Team, > > I'm working on a Flink Streaming application. The data is injected > through Kafka connectors. The payl

Re: Exception in CEP 1.1.2

2016-09-07 Thread Gyula Fóra
Hi, I will try to get some minimal input to reproduce this. We were reading events from Kafka so I might need some time. Thanks Till for looking into this Gyula Till Rohrmann ezt írta (időpont: 2016. szept. 7., Sze, 17:34): > Hi Gyula, > > could you send us en example input which reproduces t

Re: Exception in CEP 1.1.2

2016-09-07 Thread Till Rohrmann
Hi Gyula, could you send us en example input which reproduces the problem? The underlying problem is that the system expects a state to be still stored in the `SharedBuffer` which has already been removed. This should actually not happen and it clearly indicates a bug. Cheers, Till On Wed, Sep

Re: Flink Iterations vs. While loop

2016-09-07 Thread Till Rohrmann
Hi Dan, first a general remark: I fear that your L-BFGS implementation is not well suited for large scale problems. You might wanna take a look at [1]. In the case of the while loop solution you're actually executing n jobs with n being the number of iterations. Thus, you have to add the executio

Exception in CEP 1.1.2

2016-09-07 Thread Gyula Fóra
Hi guys, We tried building a simple pattern with the CEP library that matches 2 events with 2 filter conditions (where) but we get a strange error that comes from the stream operator: Pattern, ?> viewAndClick = Pattern .> begin("view") .where(Either::isLeft)

Re: How to assign timestamp for event time in a stream?

2016-09-07 Thread Till Rohrmann
Hi, using event time and assigning timestamps does not order the stream records. In order to do that you can define a window and sort the elements in each window using Java sorting, for example. Alternatively, you can write your own operator which has a priority queue and always emits the elements

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-07 Thread Till Rohrmann
Hi Andrea, the exception says that you don't have enough heap memory available to keep a factors block in memory. You always have to create an object on the heap when the user function is called. You can try the following out to solve the problem. 1. Further decrease the taskmanager.memory.fract

How to assign timestamp for event time in a stream?

2016-09-07 Thread jiecxy
The program is to read the unordered records from a log file, and to print the record in order. But it doesn't change the order, is there anything wrong in my code? Can anyone give me an example? This is my program: Note: the class Tokenizer is to transfer the log to four parts. Like this:

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-07 Thread ANDREA SPINA
Ok, I'm still struggling with ALS. Now I'm running with a dataset of 2M users, 250K items, 700 rates per users (1,4B ratings). 50 latent factors, 400 numOfBlocks, 400 DOP. Somehow I got the error, from the JM log I catch the previous mentioned exception: 09/06/2016 19:30:18 CoGroup (CoGroup

Re: Flink Iterations vs. While loop

2016-09-07 Thread Till Rohrmann
Usually, the while loop solution should perform much worse since it will execute with each new iteration all previous iterations steps without persisting the intermediate results. Thus, it should have a quadratic complexity in terms of iteration step operations instead of a linear complexity. Addit

Re: Not able to query : Queryable State

2016-09-07 Thread Till Rohrmann
I think the exception message is saying what’s the problem. The job simply does not exist. You can verify that by running bin/flink list or look it up in the web interface. The reason is that calling env.getStreamGraph.getJobGraph will generate a new JobGraph (not the one which is sent to the JobM

Re: 回复:Re: 回复:Re: fromParallelCollection

2016-09-07 Thread Timo Walther
If your data comes from HBase maybe it would also good to implement a HBase source. A current HBase sink is in the making: https://github.com/apache/flink/pull/2332 Maybe it would be better to save your data in an HDFS (e.g. CSV file) and use the built-in "readFile()". This does the parallelis

Re: Re: 回复:Re: modify coGroup GlobalWindows_GlobalWindow

2016-09-07 Thread Till Rohrmann
Hi Rimin, I have to admit that I don't really understand what you're trying to achieve. Could you try again to explain your problem? Cheers, Till On Tue, Sep 6, 2016 at 3:53 PM, wrote: > think your anwser. > but i can not get your ideal."If all elements of "words2" have been > processed, the r