Reprocessing messsages in kafka streams vs Transformer init

xardaso Fri, 07 Oct 2022 13:58:15 -0700

Hi Everyone,

I have a question related to messages auto reprocessing in kafka streams and 
Transformer/Processor init()/close() methods.


Is possible that in some scenarios (failures, rebalance etc.) a message is 
processed twice by Transformer.transform() without calling Transformer.init() 
between the first and the second processing?

To illustrate my question with an example. Let's say I have kafka streams 
application with a default at_least_once semantics setting. The topology is the 
following - read messages from one topic, apply Transformer and produce 
messages to another topic. Transformer:

new TransformerSupplier() {
     Transformer get() {
         return new Transformer() {
             private ProcessorContext context;

             void init(ProcessorContext context) {
                 logger.info("init called());

this.context = context;
             }

             KeyValue transform(K key, V value) {
                 logger.info("Transform offset: {}, partition: {}, 
context.offset(), context.partition());
                 return new KeyValue(key, value);
             }

             void close() {
                 logger.info("close() called.")
             }
         }
     }
 }

Is it possible that the app will log "Transform offset: x, partition: y" twice 
with the same x and y values each time without logging "init() called." between 
(chronologically)?

I tried to simulate some failure scenarios (for example when 
CommitFailedException is thrown) but in my simple test cases init() was always 
called between the first and the second reprocessing. Although I could have 
easily missed something or a scenario where this is not true which is why I'm 
asking this question here.

Sorry, if my question is not clear - please let me know in this case and I'll 
try to clarify.

Thank you for any help you can provide.

Regards,
Tomasz

Reprocessing messsages in kafka streams vs Transformer init

Reply via email to