Re: [Discussion] Query regarding Join and Windows

2016-07-01 Thread Aljoscha Krettek
Hi, yes, the window operator is stateful, which means that it will pick up where it left off and restore its state in case of a failure. You're right about the graph: chained operators are shown as one box. Cheers, Aljoscha On Fri, 1 Jul 2016 at 04:52 Vinay Patil wrote: > Hi, > > Just watched the video on R
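The restore only happens if checkpointing is enabled on the job. A minimal sketch, assuming the default state backend (the 10-second interval is an arbitrary choice for illustration):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // snapshot all operator state, including window contents, every 10 seconds
    env.enableCheckpointing(10000);
    // after a failure the job restarts from the latest completed checkpoint,
    // so the window operator resumes with the elements it had already collected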

Re: [Discussion] Query regarding Join and Windows

2016-06-30 Thread Vinay Patil
Hi, Just watched the video on Robust Stream Processing. So when we say a Window is a stateful operator, does it mean that even if the task manager doing the window operation fails, it will pick up from the state left earlier when it comes back up? (Have not read more on state for now) Also in one o

Re: [Discussion] Query regarding Join and Windows

2016-06-30 Thread Vinay Patil
Hi Aljoscha, Just wanted to check if it works with it. Anyway, to solve the problem, what we have thought of is to push a heartbeat message to Kafka at a certain interval, so that we always get a continuous stream and that edge case will never occur, right? One more question I have regarding the fai
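A rough sketch of the heartbeat idea using the plain Kafka producer API (the topic name, interval and serializers below are made-up placeholders, not something agreed in the thread; downstream operators would have to filter the marker records out before joining):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);

    // publish a marker record at a fixed interval so the source never goes idle
    // and event-time watermarks keep advancing
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(
        () -> producer.send(new ProducerRecord<>("heartbeat-topic", "HEARTBEAT")),
        0, 30, TimeUnit.SECONDS);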

Re: [Discussion] Query regarding Join and Windows

2016-06-30 Thread Aljoscha Krettek
Hi, I think the problem is that the DeltaFunction needs to have this signature: DeltaFunction<TaggedUnion<Tuple2<…>, Tuple2<…>>> because the Trigger will see elements from both input streams which are represented as a TaggedUnion that can contain an element from either side. May I ask why you want to use the DeltaTrigge
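A sketch of a DeltaFunction with that signature, assuming purely for illustration that both input streams carry Tuple2<String, Long> and the value to compare sits in field f1 (the TaggedUnion accessors used here are the ones from CoGroupedStreams in flink-streaming-java):

    DeltaFunction<CoGroupedStreams.TaggedUnion<Tuple2<String, Long>, Tuple2<String, Long>>> deltaFunction =
        new DeltaFunction<CoGroupedStreams.TaggedUnion<Tuple2<String, Long>, Tuple2<String, Long>>>() {
            @Override
            public double getDelta(
                    CoGroupedStreams.TaggedUnion<Tuple2<String, Long>, Tuple2<String, Long>> oldPoint,
                    CoGroupedStreams.TaggedUnion<Tuple2<String, Long>, Tuple2<String, Long>> newPoint) {
                // each union element holds a value from either the first or the second input
                long oldValue = oldPoint.isOne() ? oldPoint.getOne().f1 : oldPoint.getTwo().f1;
                long newValue = newPoint.isOne() ? newPoint.getOne().f1 : newPoint.getTwo().f1;
                return newValue - oldValue;
            }
        };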

Re: [Discussion] Query regarding Join and Windows

2016-06-29 Thread Vinay Patil
Hi, Yes, the concepts are getting clearer to me now. One last thing I want to try before going for a custom trigger: I want to try the Delta Trigger but I am not able to get the syntax right, this is how I am trying it: TypeInformation<Tuple2<…>> typeInfo = TypeInformation.of(new TypeHint<Tuple2<…>>() { }); // sourc
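For the syntax, something along these lines should compile (a sketch assuming Tuple2<String, Long> elements; if the trigger is applied to the co-grouped window, the element type would instead be the TaggedUnion type from the message above):

    TypeInformation<Tuple2<String, Long>> typeInfo =
        TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {});

    // DeltaTrigger.of(threshold, deltaFunction, stateSerializer) also needs a serializer
    // for the element type, which can be derived from the TypeInformation
    TypeSerializer<Tuple2<String, Long>> serializer = typeInfo.createSerializer(env.getConfig());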

Re: [Discussion] Query regarding Join and Windows

2016-06-29 Thread Aljoscha Krettek
Hi, you can use ingestion time if you don't care about the timestamps in your events, yes. If elements from the two streams happen to arrive at such times that they are not put into the same window then you won't get a match, correct. Regarding ingestion time and out-of-order events: I think this

Re: [Discussion] Query regarding Join and Windows

2016-06-29 Thread Vinay Patil
Hi, OK. Inside the checkAndGetNextWatermark(lastElement, extractedTimestamp) method, both of these parameters are coming out the same (timestamp value); I was expecting the last element's timestamp value in the 1st param when I extract it. Let's say I decide to use IngestionTime (since I am getting accurate result
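For reference, the two callbacks of AssignerWithPunctuatedWatermarks look like this (a sketch with a hypothetical event type MyEvent). The extractedTimestamp argument is simply the value that extractTimestamp just returned for the same element, which is why the two values appear identical:

    public class PunctuatedAssigner implements AssignerWithPunctuatedWatermarks<MyEvent> {

        @Override
        public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
            return element.getTimestamp();   // hypothetical accessor, must return milliseconds
        }

        @Override
        public Watermark checkAndGetNextWatermark(MyEvent lastElement, long extractedTimestamp) {
            // extractedTimestamp is what extractTimestamp returned for lastElement,
            // so extracting the timestamp from lastElement again yields the same value
            return new Watermark(extractedTimestamp);
        }
    }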

Re: [Discussion] Query regarding Join

2016-06-29 Thread Aljoscha Krettek
Hi, the element will be kept around indefinitely if no new watermark arrives. I think the same problem will persist for AssignerWithPunctuatedWatermarks since there you also might not get the required "last watermark" to trigger processing of the last window. Cheers, Aljoscha On Wed, 29 Jun 2016

Re: [Discussion] Query regarding Join

2016-06-29 Thread Vinay Patil
Hi Aljoscha, This clears a lot of doubts now. So now let's say the stream pauses for a while or stops completely on Friday; let us assume that the last message did not get processed and is kept in the internal buffers. So when the stream starts again on Monday, will it consider the last eleme

Re: [Discussion] Query regarding Join

2016-06-29 Thread Aljoscha Krettek
Hi, the reason why the last element might never be emitted is the way the ascending timestamp extractor works. I'll try and explain with an example. Let's say we have a window size of 2 milliseconds, elements arrive starting with timestamp 0, window begin timestamp is inclusive, end timestamp is e
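A worked trace of that example, filled in under the assumption that the ascending extractor emits a watermark of (last timestamp - 1), which is how AscendingTimestampExtractor behaves:

    window size 2 ms, so the windows are [0, 2) and [2, 4), end timestamps exclusive
    element ts=0 arrives -> watermark -1; element ts=1 arrives -> watermark 0
    element ts=2 arrives -> watermark 1, so window [0, 2) fires with elements {0, 1}
    window [2, 4) only fires once the watermark reaches 3, i.e. after an element with
    timestamp >= 4 has been seen; if the stream stops at ts=2, that last element stays
    in the window operator's buffers and is never emitted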

Re: [Discussion] Query regarding Join

2016-06-28 Thread Vinay Patil
Hi Aljoscha, Thanks a lot for your inputs. I still did not get what you mean when you say I will not face this issue in the case of a continuous stream; let's consider the following example: Assume that the stream runs continuously from Monday to Friday, and on Friday it stops after 5.00 PM, will I still fac

Re: [Discussion] Query regarding Join

2016-06-28 Thread Aljoscha Krettek
Hi, ingestion time can only be used if you don't care about the timestamp in the elements. So if you have those you should probably use event time. If your timestamps really are strictly increasing then the ascending extractor is good. And if you have a continuous stream of incoming elements you w
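For completeness, the ascending extractor mentioned here is used like this (a sketch with a hypothetical MyEvent type whose getTimestamp() returns milliseconds since the epoch, on the sourceStream from earlier in the thread):

    DataStream<MyEvent> withTimestamps = sourceStream.assignTimestampsAndWatermarks(
        new AscendingTimestampExtractor<MyEvent>() {
            @Override
            public long extractAscendingTimestamp(MyEvent element) {
                // must be monotonically increasing per parallel source instance
                return element.getTimestamp();
            }
        });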

Re: [Discussion] Query regarding Join

2016-06-28 Thread Vinay Patil
Hi Aljoscha, Thank you for your response. So do you suggest using a different approach for extracting the timestamp (as given in the documentation) instead of the AscendingTimestampExtractor? Is that the reason I am seeing this unexpected behaviour? In the case of a continuous stream, I would not see any data loss? A

Re: [Discussion] Query regarding Join

2016-06-28 Thread Aljoscha Krettek
Hi, first regarding tumbling windows: even if you have 5-minute windows it can happen that elements that are only seconds apart go into different windows. Consider the following case: |   x |  x   | These are two 5-minute windows and the two elements are only seconds apa
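To make the boundaries concrete (illustrative numbers, not Vinay's actual data): tumbling windows are aligned to fixed boundaries (multiples of the window size since the epoch), not to the first element that arrives, so two elements a few seconds apart can straddle a boundary:

    window 1: [12:00:00, 12:05:00)  <- element with timestamp 12:04:58
    window 2: [12:05:00, 12:10:00)  <- element with timestamp 12:05:02
    the elements are only 4 seconds apart, yet they land in different 5-minute windows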

Re: [Discussion] Query regarding Join

2016-06-27 Thread Vinay Patil
Hi, Following is the timestamp I am getting from the DTO; here is the timestamp difference between the two records: 1466115892162154279 1466116026233613409 So the time difference is roughly 3 min; even if I apply a window of 5 min, I am not getting the last record (last timestamp value above), usi

Re: [Discussion] Query regarding Join

2016-06-27 Thread Vinay Patil
Just an update: when I keep IngestionTime and remove the timestamp I am generating, I am getting all the records, but for Event Time I am getting one record less. I checked the time difference between the two records; it is 3 min. I tried keeping the window time at 5 mins, but even that did not work.

Re: [Discussion] Query regarding Join

2016-06-27 Thread Vinay Patil
Hi, Actually I am only publishing 5 messages each to two different Kafka topics (using JUnit); even if I keep the window at 500 seconds the result is the same. I am not understanding why it is not sending the 5th element to the co-group operator even when the keys are the same. I actually cannot share the

Re: [Discussion] Query regarding Join

2016-06-27 Thread Aljoscha Krettek
Hi, what timestamps are you assigning? Is it guaranteed that all of them would fall into the same 30-second window? The issue with duplicate printing in the ElementSelector is strange. Could you post a more complete code example so that I can reproduce the problem? Cheers, Aljoscha On Mon, 27 Ju

Re: [Discussion] Query regarding Join

2016-06-27 Thread Vinay Patil
Hi, I am able to get the matching and non-matching elements. However, when I am unit testing the code, I am getting one record less inside the overridden coGroup function. Testing the following way: 1) Insert 5 messages into local Kafka topic (test1) 2) Insert different 5 messages into local ka

Re: [Discussion] Query regarding Join

2016-06-15 Thread Fabian Hueske
Can you add a flag to each element emitted by the CoGroupFunction that indicates whether it was joined or not? Then you can use split to distinguish between both cases and handle both streams differently. Best, Fabian 2016-06-15 6:45 GMT+02:00 Vinay Patil : > Hi Jark, > > I am able to get the no
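A sketch of what Fabian describes, assuming purely for illustration that both streams carry Tuple2<String, String> keyed via the ElementSelector from earlier in the thread, with the payload in f1:

    DataStream<Tuple2<Boolean, String>> flagged = sourceStream
        .coGroup(destStream)
        .where(new ElementSelector())
        .equalTo(new ElementSelector())
        .window(TumblingEventTimeWindows.of(Time.seconds(30)))
        .apply(new CoGroupFunction<Tuple2<String, String>, Tuple2<String, String>, Tuple2<Boolean, String>>() {
            @Override
            public void coGroup(Iterable<Tuple2<String, String>> first,
                                Iterable<Tuple2<String, String>> second,
                                Collector<Tuple2<Boolean, String>> out) {
                boolean joined = first.iterator().hasNext() && second.iterator().hasNext();
                for (Tuple2<String, String> left : first) {
                    out.collect(Tuple2.of(joined, left.f1));      // true = key matched in both streams
                }
                if (!joined) {
                    for (Tuple2<String, String> right : second) {
                        out.collect(Tuple2.of(false, right.f1));  // key only present in the second stream
                    }
                }
            }
        });

    // split on the flag so joined and non-joined records can be handled differently
    SplitStream<Tuple2<Boolean, String>> split = flagged.split(new OutputSelector<Tuple2<Boolean, String>>() {
        @Override
        public Iterable<String> select(Tuple2<Boolean, String> value) {
            return Collections.singletonList(value.f0 ? "joined" : "not-joined");
        }
    });
    DataStream<Tuple2<Boolean, String>> joinedStream = split.select("joined");
    DataStream<Tuple2<Boolean, String>> notJoinedStream = split.select("not-joined");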

Re: [Discussion] Query regarding Join

2016-06-14 Thread Vinay Patil
Hi Jark, I am able to get the non-matching elements in a stream. Of course we can collect the matching elements in the same stream as well; however, I want to perform additional operations on the joined stream before writing it to S3, so I would have to include a separate join operator for the s

Re: [Discussion] Query regarding Join

2016-06-14 Thread Vinay Patil
You are right; I debugged it for all elements, and I can do that now. Thanks a lot. Regards, Vinay Patil On Tue, Jun 14, 2016 at 11:56 AM, Jark Wu wrote: > In `coGroup(Iterable iter1, Iterable iter2, > Collector out)` , when both iter1 and iter2 are not empty, it > means they are matched elements

Re: [Discussion] Query regarding Join

2016-06-13 Thread Jark Wu
In `coGroup(Iterable<…> iter1, Iterable<…> iter2, Collector<…> out)`, when both iter1 and iter2 are not empty, it means they are matched elements from both streams. When one of iter1 and iter2 is empty, it means that they are unmatched. - Jark Wu (wuchong) > On Jun 14, 2016, at 12:46 PM, Vinay Patil wrote: >
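A minimal sketch of the check Jark describes, with String elements chosen just for illustration; this would be the function passed to apply() on the windowed co-group:

    CoGroupFunction<String, String, String> outerJoin = new CoGroupFunction<String, String, String>() {
        @Override
        public void coGroup(Iterable<String> iter1, Iterable<String> iter2, Collector<String> out) {
            boolean leftPresent = iter1.iterator().hasNext();
            boolean rightPresent = iter2.iterator().hasNext();
            if (leftPresent && rightPresent) {
                out.collect("matched: " + iter1.iterator().next());            // key occurred in both streams
            } else if (leftPresent) {
                out.collect("unmatched, only in first stream: " + iter1.iterator().next());
            } else {
                out.collect("unmatched, only in second stream: " + iter2.iterator().next());
            }
        }
    };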

Re: [Discussion] Query regarding Join

2016-06-13 Thread Vinay Patil
Hi Matthias, I did not get you; even if we use CoGroup we have to apply it on a key: sourceStream.coGroup(destStream) .where(new ElementSelector()) .equalTo(new ElementSelector()) .window(TumblingEventTimeWindows.of(Time.seconds(30))) .apply(new CoGroupFunction() { private static final long seri

Re: [Discussion] Query regarding Join

2016-06-13 Thread Matthias J. Sax
You need to do an outer join. However, there is no built-in support for outer joins yet. You can use Window-CoGroup to implement the outer join as your own operator. -Matthias On 06/13/2016 06:53 PM, Vinay Patil wrote: > Hi, > > I have a question regarding the join operation, consider the follow

[Discussion] Query regarding Join

2016-06-13 Thread Vinay Patil
Hi, I have a question regarding the join operation; consider the following dummy example: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime); DataStreamSource<Integer> sourceStream = env.fromElements(10,2