I am assigning timestamps using a  threshold-based extractor
<https://gist.github.com/shikhar/2d9306e2ebd8ca89728c>   -- the static delta
from last timestamp is probably sufficient and the PriorityQueue for
allowing outliers not necessary, that is something I added while figuring
out what was going on.

The timestamps across partitions don't differ that much in normal operation
when stream processing is caught up with the head of the partitions, so the
thresholding works well. However, during catch-up, like if I stop for a bit
& start the job again, or there is no offset in ZK and I'm using
'auto.offset.reset=smallest', the source tends to emit messages with much
larger deviations, and the timestamp extraction which is not partition-aware
will start providing an incorrect watermark.


Aljoscha Krettek wrote
> Hi,
> in general it should not be a problem if one parallel instance of a sink
> is responsible for several Kafka partitions. It can become a problem if
> the timestamps in the different partitions differ by a lot and the
> watermark assignment logic is not able to handle this.
> 
> How are you assigning the timestamps/watermarks in your job?
> 
> Cheers,
> Aljoscha
>> On 08 Feb 2016, at 21:51, shikhar &lt;

> shikhar@

> &gt; wrote:
>> 
>> Stephan explained in that thread that we're picking the min watermark
>> when
>> doing operations that join streams from multiple sources. If we have m:n
>> partition-source assignment where m>n, the source is going to end up with
>> the max watermark. Having m<=n ensures that the lowest watermark is used.
>> 
>> Re: automatic enforcement, perhaps allowing for more than 1 Kafka
>> partition
>> on a source should require opt-in, e.g. allowOversubscription()
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4788.html
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive at Nabble.com.





--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4816.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to