Re: Custom Watermark Instance being created multiple times for KafkaIO

2019-05-23 Thread Lukasz Cwik
The watermark should be checkpointed along with partition offsets. You will have one watermark class instance for each bundle that is processing. You will have one bundle processed per checkpoint and also one bundle per split (so that Kafka can be read in parallel by multiple workers) and also one

Re: Custom Watermark Instance being created multiple times for KafkaIO

2019-05-22 Thread rahul patwari
Hi Lukasz, There was a bug in my code. When the topic is idle, I indeed get watermark as (now - maxDelay). I have a few questions: I have created a static int variable in my watermark class and incremented the variable inside the constructor. I ran the pipeline in SparkRunner for approximately 3

Re: Custom Watermark Instance being created multiple times for KafkaIO

2019-05-22 Thread Lukasz Cwik
On Wed, May 22, 2019 at 11:17 AM rahul patwari wrote: > will watermark also get checkpointed by default along with the offset of > the partition? > > We have found a limitation for CustomTimestampPolicyWithLimitedDelay. Consider > this scenario: > If we are processing a stream of events from Kafk

Re: Custom Watermark Instance being created multiple times for KafkaIO

2019-05-22 Thread rahul patwari
will watermark also get checkpointed by default along with the offset of the partition? We have found a limitation for CustomTimestampPolicyWithLimitedDelay. Consider this scenario: If we are processing a stream of events from Kafka with event timestamps older than the current processing time(say

Custom Watermark Instance being created multiple times for KafkaIO

2019-05-21 Thread rahul patwari
Hi, We are using withTimestampPolicyFactory (TimestampPolicyFactory