1. The record will be re-read, but the state will not be rolled back (i.e.,
there is no undo of step (2)). Thus, on re-processing you would add the
record again and "over count" in step (3) -- I assume the trigger would
still fire.
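
To make this concrete, here is a rough sketch of the kind of processor I
understand you are describing (the store name "buffer-store", the types, and
the threshold of 100 are just placeholders, not your actual code). The point
is that the put() in process() is not undone if the application crashes
before the next commit, so the record gets added a second time on restart:

import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class CountTriggerProcessor implements Processor<String, String> {

    private static final long TRIGGER_SIZE = 100; // placeholder threshold

    private ProcessorContext context;
    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        this.context = context;
        // the store must be attached to this processor in the Topology
        this.store = (KeyValueStore<String, String>) context.getStateStore("buffer-store");
    }

    @Override
    public void process(final String key, final String value) {
        // step 2: add to state -- this write is not undone if we crash before
        // the next commit; after restart the record is re-read and added again
        store.put(key, value);

        // step 3: count-based trigger
        if (store.approximateNumEntries() >= TRIGGER_SIZE) {
            final List<KeyValue<String, String>> batch = new ArrayList<>();
            try (final KeyValueIterator<String, String> it = store.all()) {
                while (it.hasNext()) {
                    batch.add(it.next());
                }
            }
            for (final KeyValue<String, String> entry : batch) {
                // 3.1 process / 3.3 forward (this write is buffered by the producer)
                context.forward(entry.key, entry.value);
                // 3.2 delete from the state store
                store.delete(entry.key);
            }
            // 3.4 request a commit: Streams will flush the state stores, flush
            // the producer, and commit the input offsets
            context.commit();
        }
    }

    @Override
    public void close() {}
}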

2. I assume that by "forward" you mean writing to an output topic. If you
write to an output topic, the producer buffers multiple records to
increase write efficiency. Thus, it can happen that the write goes out
immediately (if the batch is full) or later. So yes, there can be
duplicates in the output topic, because the write can happen before the flush.
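
The relevant knobs are the regular producer configs, which you can pass
through the StreamsConfig with the producer prefix (the values below are just
example placeholders): a batch can be sent as soon as it fills up, well before
Streams calls flush() at commit time.

import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

final Properties props = new Properties();
// a batch is sent as soon as it reaches this size (bytes), even before
// Streams flushes the producer at commit time
props.put(StreamsConfig.producerPrefix(ProducerConfig.BATCH_SIZE_CONFIG), 16384);
// how long the producer waits for more records before sending a non-full batch
props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), 100);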

3. As mentioned above, the write can happen before flushing.

4. Yes. For this case, the state would be reset, too.
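
For completeness, exactly-once is a single config (application id and
bootstrap servers below are just placeholders):

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
// writes to output topics, writes to the state changelog topics, and the
// input offset commit become one transaction; on failure the transaction is
// aborted and the local state is rebuilt to a consistent point
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);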


-Matthias

On 9/18/18 10:09 PM, Vishnu Viswanath wrote:
> Hi All,
> 
> I have a Kafka Streams application (Processor API) that roughly does the
> steps below.
> 
> 1. read
> 2. add to state
> 3. check state size (count-based trigger)
> 3.1 process
> 3.2 delete records from state
> 3.3 forward
> 3.4 commit
> (Kafka internally does)
> 3.4.1 flush state
> 3.4.2 flush producer
> 3.4.3 commit offset
> 
> Have the following questions regarding failures at different steps:
> 
> 1. Am I correct in assuming that a failure at steps 1, 2, 3.1 & 3.2 is fine
> (the application will read records from the last committed offset, rebuild
> the state, and continue with the rest of the steps)?
> 2. When will the forward at step 3.3 take effect? Will it be sent out only at
> 3.4.2 (when the producer is flushed), or can it happen before that? If it can
> happen before, can any failure after 3.3 cause duplicates?
> 3. If the forward takes effect only during flush, can a failure between
> 3.4.2 and 3.4.3 still cause duplicates?
> 4. Will setting processing.guarantee=exactly_once solve the duplicates
> issue?
> 
> Thanks,
> Vishnu
> 
