Hi,
I had some basic questions on sequence of tasks for streaming application
restart in case of failure or otherwise.
Say my stream is structured this way
source-topic
branched into 2 kstreams
source-topic-1
source-topic-2
each mapped to 2 new kstreams (new key,value pairs) backed by 2 kafka
topics
source-topic-1-new
source-topic-2-new
each aggregated to new ktable backed by internal changelog topics
source-topic-1-new-table (scource-topic-1-new-changelog)
source-topic-2-new-table (scource-topic-2-new-changelog)
table1 left join table2 -> to final stream
Results of final stream are then persisted into another data storage
So if you see I have following physical topics or state stores
source-topic
source-topic-1-new
source-topic-2-new
scource-topic-1-new-changelog
scource-topic-2-new-changelog
Now at a give point if the streaming application is stopped there is some
data in all these topics.
Barring the source-topic all other topic has data inserted by the streaming
application.
Also I suppose streaming application stores the offset for each of the
topic as where it was last.
So when I restart the application how does the processing starts again?
Will it pick the data from last left changelog topics and process them
first and then process the source topic data from the offset last left?
Or it will start from source topic. I really don't want it to maintain
offset to changelog tables because any old key's value can be modified as
part of aggregation again.
Bit confused here, any light would help a lot.
Thanks
Sachin