Hi,
yes, the window operator is stateful, which means that after a failure it
will restore its state and pick up where it left off.
You're right about the graph: chained operators are shown as one box.
Cheers,
Aljoscha
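To illustrate the "pick up where it left off" behaviour, here is a toy plain-Java model (not Flink code; the class and method names are made up): a stateful operator's state can be snapshotted, and a new instance restored from the snapshot continues from the same counts.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a stateful operator: per-key state is snapshotted, and after
// a failure a new instance restored from the snapshot resumes counting
// where the old one left off.
class StatefulCountSketch {
    final Map<String, Long> counts = new HashMap<>();

    void process(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    Map<String, Long> snapshot() {
        return new HashMap<>(counts);
    }

    static StatefulCountSketch restore(Map<String, Long> snapshot) {
        StatefulCountSketch op = new StatefulCountSketch();
        op.counts.putAll(snapshot);
        return op;
    }
}
```

In Flink the snapshotting is done by the checkpointing mechanism rather than by the operator author, but the effect on window state is the same.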
On Fri, 1 Jul 2016 at 04:52 Vinay Patil wrote:
> Hi,
>
> Just watched the video on R
Hi,
Just watched the video on Robust Stream Processing.
So when we say the window is a stateful operator, does it mean that even if
the task manager doing the window operation fails, it will pick up from
the state it left earlier when it comes back up? (Have not read more on state
for now)
Also in one o
Hi Aljoscha,
Just wanted to check if it works with it.
Anyway, to solve the problem, what we have thought of is to push a heartbeat
message to Kafka at a certain interval, so that we always get a continuous
stream and that edge case will never occur, right?
One more question I have regarding the fai
Hi,
I think the problem is that the DeltaFunction needs to have this signature:
DeltaFunction<CoGroupedStreams.TaggedUnion<Tuple2<...>, Tuple2<...>>>
because the Trigger will see elements from both input streams, which are
represented as a TaggedUnion that can contain an element from either side.
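To illustrate why the delta function must handle either side, here is a minimal plain-Java sketch of the tagged-union idea (a hypothetical class mirroring Flink's CoGroupedStreams.TaggedUnion, not the actual implementation):

```java
// Minimal sketch of a tagged union over two input types: it holds an
// element from exactly one of the two inputs, so code consuming it must
// check which side is present.
class TaggedUnion<T1, T2> {
    private final T1 one;
    private final T2 two;

    private TaggedUnion(T1 one, T2 two) {
        this.one = one;
        this.two = two;
    }

    static <T1, T2> TaggedUnion<T1, T2> one(T1 value) {
        return new TaggedUnion<>(value, null);
    }

    static <T1, T2> TaggedUnion<T1, T2> two(T2 value) {
        return new TaggedUnion<>(null, value);
    }

    boolean isOne() {
        return one != null;
    }

    T1 getOne() {
        return one;
    }

    T2 getTwo() {
        return two;
    }
}
```

A DeltaFunction over such a union has to unwrap the value from whichever side is set before computing the delta.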
May I ask why you want to use the DeltaTrigge
Hi,
Yes, the concepts are getting clearer to me now.
One last thing I want to try before going for a custom trigger: I want to try
the DeltaTrigger, but I am not able to get the syntax right. This is how I am
trying it:
TypeInformation<Tuple2<...>> typeInfo = TypeInformation.of(new
TypeHint<Tuple2<...>>() {
});
// sourc
Hi,
you can use ingestion time if you don't care about the timestamps in your
events, yes. If elements from the two streams happen to arrive at such
times that they are not put into the same window then you won't get a
match, correct.
Regarding ingestion time and out-of-order events. I think this
Hi,
Ok.
Inside the checkAndGetNextWatermark(lastElement, extractedTimestamp) method,
both of these parameters come out the same (the timestamp value); I was
expecting the last element's timestamp value in the 1st param when I extract it.
Let's say I decide to use IngestionTime (since I am getting accurate result
Hi,
the element will be kept around indefinitely if no new watermark arrives.
I think the same problem will persist for AssignerWithPunctuatedWatermarks
since there you also might not get the required "last watermark" to trigger
processing of the last window.
Cheers,
Aljoscha
On Wed, 29 Jun 2016
Hi Aljoscha,
This clears a lot of doubts now.
So now let's say the stream pauses for a while or stops completely on
Friday; let us assume that the last message did not get processed and is
kept in the internal buffers.
So when the stream starts again on Monday, will it consider the last
eleme
Hi,
the reason why the last element might never be emitted is the way the
ascending timestamp extractor works. I'll try and explain with an example.
Let's say we have a window size of 2 milliseconds, elements arrive starting
with timestamp 0, window begin timestamp is inclusive, end timestamp is
e
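The mechanics of this example can be sketched in plain Java (this mirrors the arithmetic behind ascending-timestamp watermarks, where the watermark trails the highest seen timestamp by one and an event-time window [start, end) fires once the watermark reaches end - 1; the class and method names here are made up for illustration):

```java
class WatermarkSketch {
    // Ascending extractor: the watermark trails the largest timestamp
    // seen so far by 1 ms.
    static long watermark(long[] seenTimestamps) {
        long max = Long.MIN_VALUE;
        for (long t : seenTimestamps) {
            max = Math.max(max, t);
        }
        return max - 1;
    }

    // An event-time window [start, end) fires once the watermark passes
    // the window's maximum timestamp, which is end - 1.
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }
}
```

With a 2 ms window [0, 2) and elements at timestamps 0 and 1, the watermark is 0 and the window cannot fire; only when an element with timestamp >= 2 arrives does the watermark reach 1 and the window fire. That is why the last window may never be emitted if the stream simply stops.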
Hi Aljoscha,
Thanks a lot for your inputs.
I still did not get you when you say I will not face this issue in the case
of a continuous stream; let's consider the following example:
Assume that the stream runs continuously from Monday to Friday, and on
Friday it stops after 5.00 PM. Will I still fac
Hi,
ingestion time can only be used if you don't care about the timestamp in
the elements. So if you have those you should probably use event time.
If your timestamps really are strictly increasing then the ascending
extractor is good. And if you have a continuous stream of incoming elements
you w
Hi Aljoscha,
Thank you for your response.
So do you suggest using a different approach for extracting the timestamp (as
given in the document) instead of the AscendingTimestampExtractor?
Is that the reason I am seeing this unexpected behaviour? In the case of a
continuous stream, would I not see any data loss?
A
Hi,
first regarding tumbling windows: even if you have 5 minute windows it can
happen that elements that are only seconds apart go into different windows.
Consider the following case:
|x | x |
These are two 5-minute windows and the two elements are only seconds apart.
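The window assignment behind this can be sketched numerically in plain Java (tumbling windows are aligned to multiples of the window size, e.g. minutes 0-5 and 5-10; the class name here is illustrative):

```java
class WindowMath {
    // Start of the tumbling window an element falls into. Windows are
    // aligned to multiples of the window size, not to the first element.
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - (timestampMillis % sizeMillis);
    }
}
```

Two elements at 4:59 and 5:01 (299 000 ms and 301 000 ms) are only two seconds apart, yet they land in the windows starting at 0 and at 300 000 ms respectively, so they are never co-grouped.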
Hi,
Following are the timestamps I am getting from the DTO; here is the timestamp
difference between the two records:
1466115892162154279
1466116026233613409
So the time difference is roughly 3 min; even if I apply a window of 5 min,
I am not getting the last record (the last timestamp value above),
usi
Just an update: when I keep IngestionTime and remove the timestamp I am
generating, I am getting all the records, but for Event Time I am getting
one record less. I checked the time difference between two records; it is 3
min. I tried keeping the window time at 5 mins, but even that did not work.
Hi,
Actually I am only publishing 5 messages each to two different Kafka topics
(using JUnit); even if I keep the window at 500 seconds the result is the same.
I do not understand why it is not sending the 5th element to the co-group
operator even when the keys are the same.
I actually cannot share the
Hi,
what timestamps are you assigning? Is it guaranteed that all of them would
fall into the same 30 second window?
The issue with duplicate printing in the ElementSelector is strange. Could
you post a more complete code example so that I can reproduce the problem?
Cheers,
Aljoscha
On Mon, 27 Ju
Hi,
I am able to get the matching and non-matching elements.
However, when I am unit testing the code, I am getting one record less
inside the overridden coGroup function.
Testing in the following way:
1) Insert 5 messages into local kafka topic (test1)
2) Insert different 5 messages into local ka
Can you add a flag to each element emitted by the CoGroupFunction that
indicates whether it was joined or not?
Then you can use split to distinguish between both cases and handle both
streams differently.
Best, Fabian
2016-06-15 6:45 GMT+02:00 Vinay Patil :
> Hi Jark,
>
> I am able to get the no
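Fabian's flag-and-split suggestion can be sketched with plain Java collections (illustrative class and types, not the actual CoGroupFunction API): for one key's window, every element is emitted together with a flag that says whether the two sides matched.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JoinFlagSketch {
    // For one key's window: the sides matched iff both are non-empty.
    // Emit every element paired with that joined/unjoined flag.
    static List<Map.Entry<Integer, Boolean>> tag(List<Integer> left, List<Integer> right) {
        boolean joined = !left.isEmpty() && !right.isEmpty();
        List<Map.Entry<Integer, Boolean>> out = new ArrayList<>();
        for (Integer v : left) {
            out.add(new SimpleEntry<>(v, joined));
        }
        for (Integer v : right) {
            out.add(new SimpleEntry<>(v, joined));
        }
        return out;
    }
}
```

Downstream, split (or a simple filter) on the flag then separates the joined stream from the unmatched one, so each can be handled differently.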
Hi Jark,
I am able to get the non-matching elements in a stream.
Of course we can collect the matching elements in the same stream as well;
however I want to perform additional operations on the joined stream before
writing it to S3, so I would have to include a separate join operator for
the s
You are right; I debugged it for all elements, and I can do that now.
Thanks a lot.
Regards,
Vinay Patil
On Tue, Jun 14, 2016 at 11:56 AM, Jark Wu
wrote:
> In `coGroup(Iterable iter1, Iterable iter2,
> Collector out)` , when both iter1 and iter2 are not empty, it
> means they are matched elements
In `coGroup(Iterable iter1, Iterable iter2,
Collector out)`, when both iter1 and iter2 are not empty, it means
they are matched elements from both streams.
When one of iter1 and iter2 is empty, it means that the elements are unmatched.
- Jark Wu (wuchong)
> On 14 Jun 2016, at 12:46, Vinay Patil wrote:
>
Hi Matthias,
I did not get you; even if we use CoGroup we have to apply it on a key:
sourceStream.coGroup(destStream)
.where(new ElementSelector())
.equalTo(new ElementSelector())
.window(TumblingEventTimeWindows.of(Time.seconds(30)))
.apply(new CoGroupFunction() {
private static final long seri
You need to do an outer-join. However, there is no built-in support for
outer-joins yet.
You can use Window-CoGroup to implement the outer-join as your own operator.
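The idea behind building an outer join on top of coGroup can be sketched with plain Java collections (illustrative class and element types; the real operator receives the two sides per key and window, as discussed later in the thread):

```java
import java.util.ArrayList;
import java.util.List;

class OuterJoinSketch {
    // Full outer join of the two sides of one key/window: pair up matched
    // elements, pad the missing side with "null" for unmatched ones.
    static List<String> join(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        if (left.isEmpty()) {
            for (String r : right) {
                out.add("null," + r);
            }
        } else if (right.isEmpty()) {
            for (String l : left) {
                out.add(l + ",null");
            }
        } else {
            for (String l : left) {
                for (String r : right) {
                    out.add(l + "," + r);
                }
            }
        }
        return out;
    }
}
```

An inner join would only produce the matched case; the two padded branches are what coGroup adds over a plain join.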
-Matthias
On 06/13/2016 06:53 PM, Vinay Patil wrote:
> Hi,
>
> I have a question regarding the join operation, consider the follow
Hi,
I have a question regarding the join operation, consider the following
dummy example:
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
DataStreamSource sourceStream =
env.fromElements(10,2