Hi,
I have asked this on StackOverflow, but will ask it here as well.

I have an Apache Kafka 2.6 Producer which writes to topic-A (TA). I also have a 
Kafka Streams application which consumes from TA and writes to topic-B (TB). In 
the streams application, I have a custom timestamp extractor which extracts the 
timestamp from the message payload.
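
For reference, the extractor is essentially the following (a simplified sketch; 
MyEvent and getEventTime() are stand-ins for my actual payload type):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class PayloadTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        // MyEvent / getEventTime() are placeholders for my real payload type.
        MyEvent event = (MyEvent) record.value();
        if (event != null && event.getEventTime() > 0) {
            return event.getEventTime();
        }
        // Fall back to the record's own timestamp if the payload carries none.
        return record.timestamp();
    }
}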

For one of my failure-handling test cases, I shut down the Kafka cluster while 
my applications are running.

When the producer application tries to write messages to TA, it cannot, because 
the cluster is down, and hence (I assume) it buffers the messages. Let's say it 
receives 4 messages m1, m2, m3, m4 in increasing time order (i.e. m1 is first 
and m4 is last).

When I bring the Kafka cluster back online, the producer sends the buffered 
messages to the topic, but they are not in order. I receive, for example, m2, 
then m3, then m1, and then m4.

Why is that? Is it because the buffering in the producer is multi-threaded, 
with each thread producing to the topic at the same time?
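
For context, here is a simplified sketch of roughly how my producer is set up 
(the broker address, key, and value are just placeholders; the ordering-related 
settings mentioned in the comment are assumed to be at their defaults):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// I have not overridden the settings I suspect are relevant to ordering, i.e.
// retries, max.in.flight.requests.per.connection and enable.idempotence are at
// their 2.6 defaults.

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("topic-A", "some-key", "some-value"));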

I assumed that the custom timestamp extractor would help in ordering the 
messages when consuming them, but it does not. Or maybe my understanding of 
the timestamp extractor is wrong.

If not, then what are the specific uses of the timestamp extractor? Just to 
associate a timestamp with an event?

One solution I got on SO was to simply stream all events from TA to another 
intermediate topic (say TA'), using the timestamp extractor on that stream. 
But I am not sure whether this will cause the events to get reordered based 
on the extracted timestamp.
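
If I have understood the suggestion correctly, it would look something like 
this (a rough sketch; "topic-A-intermediate" stands for TA', the application id 
and broker address are placeholders, and PayloadTimestampExtractor is the 
extractor shown above):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ta-to-ta-prime");     // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder

StreamsBuilder builder = new StreamsBuilder();

// Read TA with the payload timestamp extractor and forward every event to TA'.
builder.stream("topic-A",
        Consumed.with(Serdes.String(), Serdes.String())
                .withTimestampExtractor(new PayloadTimestampExtractor()))
       .to("topic-A-intermediate");

new KafkaStreams(builder.build(), props).start();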

Regards,
Neeraj
