Re: Kafka Streams - Out of Order Handling

2021-04-28 Thread Marcus Horsley-Rai
Thanks very much for taking the time to answer, Matthias! Very much appreciated.

All the best,
Marcus

On Wed, Apr 7, 2021 at 10:22 PM Matthias J. Sax wrote:
> Sorry for late reply...
>
> > I only see issues of out of order data in my re-partitioned topic as a
> > result of a rebalance happenin

2021-04-07 Thread Matthias J. Sax
Sorry for late reply...

> I only see issues of out of order data in my re-partitioned topic as a result
> of a rebalance happening.

If you re-partition, you may actually see out-of-order data even if there is no rebalance. In the end, during repartitioning you have multiple upstream writers for
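
Why repartitioning alone can produce out-of-order data can be illustrated with a toy Python model (not Kafka code). Two hypothetical upstream writers, "A" and "B", each emit records with increasing timestamps, but A is further ahead in event time. Both append to the same partition of a repartition topic, and that partition's log only reflects arrival order at the broker, so the timestamps interleave. Writer names, timestamps and arrival order are made up for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (upstream writer id, event timestamp in ms)


def partition_log(writes: List[Tuple[int, Record]]) -> List[Record]:
    """Order records as a single partition would: by arrival at the broker only."""
    return [rec for _, rec in sorted(writes, key=lambda w: w[0])]


def late_offsets(log: List[Record]) -> List[int]:
    """Offsets whose timestamp is smaller than that of some earlier record."""
    late: List[int] = []
    max_ts = float("-inf")
    for offset, (_, ts) in enumerate(log):
        if ts < max_ts:
            late.append(offset)
        max_ts = max(max_ts, ts)
    return late


# Each entry is (arrival order at the broker, (writer, event timestamp)).
# Writer A is ahead in event time; writer B lags behind.
writes = [
    (1, ("A", 100)),
    (2, ("B", 40)),
    (3, ("A", 110)),
    (4, ("B", 50)),
]

log = partition_log(writes)
print(log)
print(late_offsets(log))
```

Each writer's own records are still in timestamp order; the disorder only appears once both streams share one partition (offsets 1 and 3 are "late" here), which is why no rebalance is needed to trigger it.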

2021-03-12 Thread Marcus Horsley-Rai
Thanks Matthias - that's great to know.

> Increasing the grace period should not really affect throughput, but
> latency.

Yes, a slip of the tongue on my part, you’re right :-)

One last question if I may? I only see issues of out of order data in my re-partitioned topic as a result of a rebalan
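
The point that a larger grace period costs latency rather than throughput can be seen in a small Python simulation. This is a simplified model, not Kafka Streams' implementation: tumbling windows of length `size`, stream time as the largest timestamp seen so far, a window treated as closed once stream time reaches window end plus `grace`, and records for closed windows dropped. `emitted_at` holds the stream time at which a window's final result would be emitted if only final results were forwarded (roughly what `suppress(Suppressed.untilWindowCloses(...))` is for).

```python
from typing import Dict, List, Tuple


def simulate(timestamps: List[int], size: int, grace: int
             ) -> Tuple[Dict[int, int], Dict[int, int], List[int]]:
    """Count records per tumbling window, processing them in arrival order.

    Returns (count per window start, stream time at which each window
    closed, timestamps dropped because their window was already closed).
    """
    stream_time = float("-inf")
    counts: Dict[int, int] = {}
    emitted_at: Dict[int, int] = {}
    dropped: List[int] = []
    for ts in timestamps:
        stream_time = max(stream_time, ts)  # stream time never goes backwards
        start = ts - ts % size
        if start + size + grace <= stream_time:
            dropped.append(ts)              # too late: window already closed
            continue
        counts[start] = counts.get(start, 0) + 1
        # Close ("emit") every window whose end + grace has been reached.
        for s in sorted(counts):
            if s not in emitted_at and s + size + grace <= stream_time:
                emitted_at[s] = stream_time
    return counts, emitted_at, dropped


arrivals = [0, 5, 12, 3, 25, 8]  # 3 and 8 arrive out of order
print(simulate(arrivals, size=10, grace=0))
print(simulate(arrivals, size=10, grace=10))
```

With `grace=0` the window starting at 0 closes at stream time 12 and the late records 3 and 8 are dropped; with `grace=10` record 3 is still counted, but that window's result is only emitted at stream time 25. Both runs handle the same six records; what changes is how long the result waits.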

2021-03-10 Thread Matthias J. Sax
> will it consider a timestamp in the body of the message, if we have
> implemented a custom TimeExtractor?

Yes.

> Or, which I feel is more likely - does TimeExtractor stream time only apply
> later on once deserialisation has happened?

Well, the extractor does apply after deserialization, bu
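
To make the "after deserialization" point concrete, here is a Python sketch of an extractor that reads an event time from the already-deserialized message body. In Kafka Streams the real hook is the Java interface `org.apache.kafka.streams.processor.TimestampExtractor`, with `long extract(ConsumerRecord<Object, Object> record, long partitionTime)`; the Python class `ConsumedRecord`, the payload field `event_time` and the fallback order below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ConsumedRecord:
    """Hypothetical stand-in for a consumed record whose value is already deserialized."""
    timestamp: int          # timestamp stored in the Kafka record itself (ms)
    value: Dict[str, Any]   # deserialized payload


def extract_event_time(record: ConsumedRecord, partition_time: int) -> int:
    """Prefer an event time carried in the payload (hypothetical field
    'event_time'); otherwise fall back to the record's own timestamp,
    and as a last resort to the partition's current time."""
    ts = record.value.get("event_time")
    if isinstance(ts, int) and ts >= 0:
        return ts
    if record.timestamp >= 0:
        return record.timestamp
    return partition_time


print(extract_event_time(ConsumedRecord(1000, {"event_time": 900}), 500))
print(extract_event_time(ConsumedRecord(1000, {}), 500))
print(extract_event_time(ConsumedRecord(-1, {}), 500))
```

Because the extractor receives the deserialized value, it can look inside the payload; the three calls return 900, 1000 and 500 respectively.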

2021-03-10 Thread Marcus Horsley-Rai
Thanks for your reply Matthias, and really great talks :-)

You’re right that I only have one input topic - though it does have 20 partitions. The pointer to max.task.idle.ms cleared something up for me; I read the following line from Kafka docs but couldn’t find what configuration they were r
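
The decision that `max.task.idle.ms` governs can be sketched as a toy Python model under simplifying assumptions (check the `StreamsConfig` documentation for the exact semantics in your Kafka version): a task holds one buffer per input partition; if every buffer has data it can safely pick the oldest record, and if only some do, it waits until it has been idle for `max_task_idle_ms` before processing anyway. The function `may_process` and the partition names are hypothetical.

```python
from typing import Dict, List


def may_process(buffers: Dict[str, List[int]], idle_ms: int,
                max_task_idle_ms: int) -> bool:
    """Decide whether a task may process its next record (toy model).

    buffers: buffered record timestamps per input partition of the task.
    idle_ms: how long the task has already waited with some buffer empty.
    """
    if not any(buffers.values()):
        return False      # nothing buffered at all
    if all(buffers.values()):
        return True       # every partition has data: safe to pick the oldest
    # Some partition is empty: an older record might still arrive there,
    # so wait until the idle budget is used up.
    return idle_ms >= max_task_idle_ms


print(may_process({"left-0": [10], "right-0": []}, idle_ms=100, max_task_idle_ms=500))
print(may_process({"left-0": [10], "right-0": []}, idle_ms=600, max_task_idle_ms=500))
print(may_process({"input-3": [100]}, idle_ms=0, max_task_idle_ms=500))
```

In the last call the task has a single partition, as each task does when the application reads only one input topic, so in this model there is never another partition to wait for and the setting has no effect.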

2021-03-09 Thread Matthias J. Sax
In general, Kafka Streams tries to process messages in timestamp order, i.e., oldest message first. However, Kafka Streams always needs to process messages in offset order per partition, and thus, the timestamp synchronization applies to records from different topics (e.g., if you join two topics). Ther
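
The ordering rule described here can be sketched in Python as a simplified model of a single task: each input partition is a queue of record timestamps in offset order, and the task repeatedly processes the head record with the smallest timestamp. The partition names `orders-0` and `payments-0` are hypothetical (think of a join of two topics), and Kafka Streams' real implementation differs in detail.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def process_order(partitions: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Toy model of timestamp synchronization within one task.

    Only the head record of each partition is a candidate, so records are
    interleaved across partitions by timestamp but never reordered within one.
    """
    queues: Dict[str, Deque[int]] = {p: deque(ts) for p, ts in partitions.items()}
    order: List[Tuple[str, int]] = []
    while any(queues.values()):
        # Pick the non-empty partition whose head record is the oldest.
        name = min((p for p, q in queues.items() if q), key=lambda p: queues[p][0])
        order.append((name, queues[name].popleft()))
    return order


print(process_order({"orders-0": [10, 30, 20], "payments-0": [15, 25]}))
```

The record with timestamp 20 in `orders-0` is processed after 30 because it sits behind 30 in that partition's offset order: synchronization can interleave partitions by timestamp, but it cannot fix out-of-order data inside a single partition.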