Luke Chen created KAFKA-18911:

             Summary: alterPartition gets stuck when getting out-of-date errors
                 Key: KAFKA-18911
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.9.0
            Reporter: Luke Chen
            Assignee: Luke Chen

When the leader node sends the AlterPartition request to the controller, the 
controller performs some checks before processing it. On the leader side, when 
an error comes back, we decide whether the request should be retried. However, 
in some non-retry cases, we directly return false without changing the state:

case Errors.UNKNOWN_TOPIC_OR_PARTITION =>
  info(s"Failed to alter partition to $proposedIsrState since the controller doesn't know about " +
    "this topic or partition. Partition state may be out of sync, awaiting new the latest metadata.")
  false
case Errors.UNKNOWN_TOPIC_ID =>
  info(s"Failed to alter partition to $proposedIsrState since the controller doesn't know about " +
    "this topic. Partition state may be out of sync, awaiting new the latest metadata.")
  false
case Errors.FENCED_LEADER_EPOCH =>
  info(s"Failed to alter partition to $proposedIsrState since the leader epoch is old. " +
    "Partition state may be out of sync, awaiting new the latest metadata.")
  false
case Errors.INVALID_UPDATE_VERSION =>
  info(s"Failed to alter partition to $proposedIsrState because the partition epoch is invalid. " +
    "Partition state may be out of sync, awaiting new the latest metadata.")
  false
case Errors.INVALID_REQUEST =>
  info(s"Failed to alter partition to $proposedIsrState because the request is invalid. " +
    "Partition state may be out of sync, awaiting new the latest metadata.")
  false
case Errors.NEW_LEADER_ELECTED =>
  // The operation completed successfully but this replica got removed from the replica set by the controller
  // while completing a ongoing reassignment. This replica is no longer the leader but it does not know it
  // yet. It should remain in the current pending state until the metadata overrides it.
  // This is only raised in KRaft mode.
  info(s"The alter partition request successfully updated the partition state to $proposedIsrState but " +
    "this replica got removed from the replica set while completing a reassignment. " +
    "Waiting on new metadata to clean up this replica.")
  false

As the log message says, "Partition state may be out of sync, awaiting new the 
latest metadata". But because the partition state is not updated, it stays in 
the `PendingExpandIsr` or `PendingShrinkIsr` state, which keeps `isInflight` 
set to true. While in this state, the partition state will never be updated 
again.
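
The stuck state described above can be illustrated with a small toy model. 
This is only a sketch under stated assumptions, not Kafka's actual code: 
`ToyPartition`, `CommittedState`, `maybeExpandIsr`, `handleNonRetriableError` 
and `applyLeaderMetadata` are hypothetical names chosen for illustration, and 
the only idea borrowed from the issue is that a pending state reports 
`isInflight = true` and that nothing resets it when the error handler returns 
false.

```scala
// Toy model of the behaviour described in this issue. NOT Kafka's real classes;
// all names except PendingExpandIsr and isInflight are assumptions.
sealed trait PartitionState {
  def isr: Set[Int]
  def isInflight: Boolean
}

final case class CommittedState(isr: Set[Int]) extends PartitionState {
  val isInflight: Boolean = false
}

final case class PendingExpandIsr(newReplicaId: Int, lastCommittedState: CommittedState)
    extends PartitionState {
  // While the change is pending, the effective ISR is still the committed one.
  val isr: Set[Int] = lastCommittedState.isr
  val isInflight: Boolean = true
}

final class ToyPartition(initialIsr: Set[Int]) {
  var state: PartitionState = CommittedState(initialIsr)

  // Proposes an ISR expansion only when no other change is in flight.
  def maybeExpandIsr(replicaId: Int): Boolean = state match {
    case committed: CommittedState if !committed.isr.contains(replicaId) =>
      state = PendingExpandIsr(replicaId, committed)
      true
    case _ =>
      false // already in the ISR, or a change is still in flight
  }

  // Models the non-retry branches quoted above: log, leave the state untouched,
  // and return false ("do not retry").
  def handleNonRetriableError(): Boolean = {
    println(s"Failed to alter partition to $state, awaiting new metadata")
    false
  }

  // Models a leadership change: new metadata overrides whatever state we had.
  def applyLeaderMetadata(newIsr: Set[Int]): Unit =
    state = CommittedState(newIsr)
}

object StuckIsrDemo {
  def main(args: Array[String]): Unit = {
    val partition = new ToyPartition(Set(1, 2))
    println(partition.maybeExpandIsr(3))   // first proposal is accepted
    partition.handleNonRetriableError()    // error arrives, state is not reset
    println(partition.state.isInflight)    // still in flight
    println(partition.maybeExpandIsr(3))   // further proposals are rejected
    partition.applyLeaderMetadata(Set(1, 2, 3))
    println(partition.state.isInflight)    // cleared only by new metadata
  }
}
```

In this model, once `handleNonRetriableError` returns without touching 
`state`, every later call to `maybeExpandIsr` is rejected until 
`applyLeaderMetadata` is called, which mirrors the "stuck until leadership 
change" behaviour reported here.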


The impact of this issue is that the ISR remains in a stale (wrong) state 
until a leadership change occurs.
