[ https://issues.apache.org/jira/browse/KAFKA-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Kim resolved KAFKA-18019.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Convert INVALID_PRODUCER_ID_MAPPING from abortable error to fatal error
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-18019
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18019
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Ritika Reddy
>            Assignee: Ritika Reddy
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Since we bump epoch on abort, we no longer need to call InitProducerId to
> fence requests. InitProducerId will only be called when the producer starts
> up to fence a previous instance.
> With this change, some other calls to InitProducerId were inspected including
> the call after receiving an InvalidPidMappingException. This exception was
> changed to abortable as part of [KIP-360: Improve reliability of
> idempotent/transactional
> producer|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820].
> However, this change means that we can violate EOS guarantees. As an example:
> Consider an application that is copying data from one partition to another
> * Application instance A processes to offset 4
> * Application instance B comes up and fences application instance A
> * Application instance B processes to offset 5
> * Application instances A and B are idle for transaction.id.expiration.ms,
> transaction id expires on server
> * Application instance A attempts to process offset 5 (since in its view,
> that is next) -- if we recover from invalid pid mapping, we can duplicate
> this processing
> Thus, INVALID_PID_MAPPING should be fatal to the producer.
> This is consistent with [KIP-1050: Consistent error handling for
> Transactions|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions]
> where errors that are fatal to the producer are in the "application
> recoverable" category. This is a grouping that indicates to the client that
> the producer needs to restart and recovery on the application side is
> necessary. KIP-1050 is approved so we are consistent with that decision.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
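
For readers who want to trace the duplicate-processing scenario from the issue description, here is a minimal, self-contained sketch of that timeline. It is a toy model, not Kafka code: `Coordinator`, `InvalidPidMapping`, `simulate`, and the `copy-app` transactional id are all hypothetical names introduced for illustration, and the epoch handling is deliberately simplified (a real broker would assign a fresh producer id after expiration rather than restart the epoch at 0).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InvalidPidMappingSketch {

    /** Hypothetical stand-in for the broker error; not the real Kafka exception class. */
    static class InvalidPidMapping extends RuntimeException {}

    /** Toy coordinator: tracks the current epoch for each transactional id. */
    static class Coordinator {
        private final Map<String, Integer> epochs = new HashMap<>();

        /** Registers the id (or bumps its epoch), fencing any older instance. */
        int initProducerId(String txnId) {
            int epoch = epochs.getOrDefault(txnId, -1) + 1;
            epochs.put(txnId, epoch);
            return epoch;
        }

        /** Models transaction.id.expiration.ms elapsing: the mapping is forgotten. */
        void expire(String txnId) {
            epochs.remove(txnId);
        }

        /** Appends the offset to the output if the caller holds the current epoch. */
        void commit(String txnId, int epoch, long offset, List<Long> output) {
            Integer current = epochs.get(txnId);
            if (current == null) {
                throw new InvalidPidMapping();
            }
            if (current != epoch) {
                throw new IllegalStateException("fenced by a newer instance");
            }
            output.add(offset);
        }
    }

    /** Replays the timeline; returns the offsets written to the output partition. */
    public static List<Long> simulate(boolean recoverFromInvalidPidMapping) {
        Coordinator coordinator = new Coordinator();
        List<Long> output = new ArrayList<>();
        String txnId = "copy-app"; // hypothetical transactional id

        // Instance A starts and processes offset 4.
        int epochA = coordinator.initProducerId(txnId);
        coordinator.commit(txnId, epochA, 4L, output);

        // Instance B starts, fences A, and processes offset 5.
        int epochB = coordinator.initProducerId(txnId);
        coordinator.commit(txnId, epochB, 5L, output);

        // Both instances stay idle until the transactional id expires.
        coordinator.expire(txnId);

        // Instance A wakes up and tries offset 5, which is next in its view.
        try {
            coordinator.commit(txnId, epochA, 5L, output);
        } catch (InvalidPidMapping e) {
            if (recoverFromInvalidPidMapping) {
                // Abortable behaviour: re-initialize and retry, duplicating offset 5.
                epochA = coordinator.initProducerId(txnId);
                coordinator.commit(txnId, epochA, 5L, output);
            }
            // Fatal behaviour: instance A stops and the application must recover.
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println("recoverable: " + simulate(true));  // recoverable: [4, 5, 5]
        System.out.println("fatal:       " + simulate(false)); // fatal:       [4, 5]
    }
}
```

In this model the stale instance A can no longer be told apart from a fresh producer once the mapping has expired, so any automatic recovery lets it write offset 5 a second time. Treating the error as fatal leaves the decision to the application.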