If you have out-of-order data, there is no guarantee that the current
record has larger timestamp than previous record. Data is still
processed in offset order.

Also, you use selectKey() and groupByKey() thus triggering a
repartioning that may introduce out-of-order data downstream even if all
your original input data is ordered by timestamp.


-Matthias

On 12/13/18 10:53 PM, Dmitry Minkovsky wrote:
> I just discovered that 2.1.0 included MAX_TASK_IDLE_MS_CONFIG and dropped
> everything to play with it! Exciting!
> 
> I built the following topology, attempting to synchronize two
> topic-partitions:
> https://gist.github.com/dminkovsky/45aa29aefefad564f9663cd36ad21ce1.
> 
> However, I keep failing at
> https://gist.github.com/dminkovsky/45aa29aefefad564f9663cd36ad21ce1#file-gistfile1-txt-L42.
> Am I misunderstanding how this feature works? The two topic-partitions
> appear to be grouped together in a single task:
> 
> [2018-12-13 16:38:29,082] DEBUG
> (org.apache.kafka.streams.processor.internals.TaskManager:120)
> stream-thread
> [migration-fed2f8fa-150e-494c-b1c8-0594f6d52a29-StreamThread-1] Adding
> assigned tasks as active: {0_0=[join-requests-0,
> settings-update-requests-0],
> 1_0=[migration-KSTREAM-AGGREGATE-STATE-STORE-0000000008-repartition-0]}
> 
> 
> So shouldn't this new config enable effectively synchronizing them?
> 
> 
> Thank you,
> 
> Dmitry
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to