Yeah, that's my observation too. Basically, small chunks of late data can
add up quickly when data is read at a faster rate.
On a related note, I would expect that if there is no late data produced
into Kafka, then regardless of the rate at which the data is read, this
problem should not occur.
To take care
One explanation would be that during catch-up, data is consumed with higher
throughput because it's just read from Kafka.
Hence, you'd also see more late data per minute while the job catches up,
until it reads data at the rate at which it is produced into Kafka.
Would that explain your observations?
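To put rough, purely illustrative numbers on it: if the topic is produced at
1,000 events/s and about 0.1% of them arrive late, steady-state processing
shows roughly 1 late event per second; if the job catches up at ten times
that rate after a restart, the same fraction surfaces as roughly 10 late
events per second, so a per-minute late-data count spikes until the consumer
has caught up.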
Thanks for the explanation. I looked at this metric closely and noticed
there are some events arriving out of order. My hypothesis is that when the
job is restarted, all of the small out-of-order chunks add up and show a
significant number. The graph below shows the number of out of order
ev
Hi Nara and Sihua,
That's indeed unexpected behavior, and it would be good to identify the
reason for the late data.
As Sihua said, watermarks are currently not checkpointed and reset to
Long.MIN_VALUE upon restart.
AFAIK, the main reason why WMs are not checkpointed is that the special
type of
Thanks Sihua. If it's reset to Long.MIN_VALUE, I can't explain why
outOfOrderEvents are reported, because the event time on the data will
always be greater than Long.MIN_VALUE.
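To spell out that reasoning with a trivial, purely illustrative check:

// Illustrative only: any real event timestamp (epoch millis) is greater
// than Long.MIN_VALUE, so a "timestamp < currentWatermark" style check
// cannot flag events as out of order while the watermark is still at its
// initial value right after a restart.
val initialWatermark: Long = Long.MinValue
val eventTime: Long = System.currentTimeMillis()
assert(eventTime > initialWatermark)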
Following are the steps to reproduce this scenario.
- A source to produce events with timestamps that are increasing for ev
Hi,
Is it possible that the watermark in the TimerService is not getting reset
when a job is restarted from a previous checkpoint? I would expect the
watermark in a TimerService to also go back in time.
I have the following ProcessFunction implementation.
override def processElement(
e: TraceEvent,
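Sketched out more fully, it does something like the following; the
TraceEvent fields, the metric name, and the exact out-of-order check here
are illustrative assumptions rather than the exact production code:

import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

// Hypothetical event type; the real TraceEvent is assumed to carry an
// event-time timestamp in epoch milliseconds.
case class TraceEvent(id: String, timestamp: Long)

class OutOfOrderCountingFunction
    extends ProcessFunction[TraceEvent, TraceEvent] {

  @transient private var outOfOrderEvents: Counter = _

  override def open(parameters: Configuration): Unit = {
    // Register the counter that shows up in the metrics graphs.
    outOfOrderEvents =
      getRuntimeContext.getMetricGroup.counter("outOfOrderEvents")
  }

  override def processElement(
      e: TraceEvent,
      ctx: ProcessFunction[TraceEvent, TraceEvent]#Context,
      out: Collector[TraceEvent]): Unit = {
    // An element is counted as out of order if its timestamp is already
    // behind the current watermark of the TimerService. Right after a
    // restart the watermark should be Long.MIN_VALUE, so this branch
    // should not be taken until watermarks have advanced again.
    if (e.timestamp < ctx.timerService().currentWatermark()) {
      outOfOrderEvents.inc()
    }
    out.collect(e)
  }
}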