[ https://issues.apache.org/jira/browse/KAFKA-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lucas Brutschy updated KAFKA-16198:
-----------------------------------
    Description: 
The current reconciliation code in `AsyncKafkaConsumer`'s `MembershipManager` may lose part of the server-provided assignment when metadata is delayed. The reason is incorrect handling of partially resolved topic names, as in this example:
 * We get assigned {{T1-1}} and {{T2-1}}
 * We reconcile {{T1-1}}, {{T2-1}} remains in {{assignmentUnresolved}} since the topic id {{T2}} is not known yet
 * We get new cluster metadata, which includes {{T2}}, so {{T2-1}} is moved to {{assignmentReadyToReconcile}}
 * We call {{reconcile}} -- {{T2-1}} is now treated as the full assignment, so {{T1-1}} is being revoked
 * We end up with assignment {{T2-1}}, which is inconsistent with the broker-side target assignment.

Generally, this seems to be a problem around semantics of the internal collections `assignmentUnresolved` and `assignmentReadyToReconcile`. Absence of a topic in `assignmentReadyToReconcile` may mean either revocation of the topic partition(s), or unavailability of a topic name for the topic.

Internal state with simpler and correct invariants could be achieved by using a single collection `currentTargetAssignment` which is based on topic IDs and always corresponds to the latest assignment received from the broker. During every attempted reconciliation, all topic IDs will be resolved from the local cache, which should not introduce a lot of overhead. `assignmentUnresolved` and `assignmentReadyToReconcile` are removed.

  was:
The current reconciliation code in `AsyncKafkaConsumer`'s `MembershipManager` may lose part of the server-provided assignment when metadata is delayed.
The reason is incorrect handling of partially resolved topic names, as in this example:
 * We get assigned {{T1-1}} and {{T2-1}}
 * We reconcile {{T1-1}}, {{T2-1}} remains in {{assignmentUnresolved}} since the topic id {{T2}} is not known yet
 * We get new cluster metadata, which includes {{T2}}, so {{T2-1}} is moved to {{assignmentReadyToReconcile}}
 * We call {{reconcile}} -- {{T2-1}} is now treated as the full assignment, so {{T1-1}} is being revoked
 * We end up with assignment {{T2-1}}, which is inconsistent with the broker-side target assignment.

Generally, this seems to be a problem around semantics of the internal collections `assignmentUnresolved` and `assignmentReadyToReconcile`. Absence of a topic in `assignmentReadyToReconcile` may mean either revocation of the topic partition(s), or unavailability of a topic name for the topic.

Internal state with simpler and correct invariant could be achieved by using a single collection `currentTargetAssignment` which is based on topic IDs and always corresponds to the latest assignment received from the broker. During every attempted reconciliation, all topic IDs will be resolved from the local cache, which should not introduce a lot of overhead. `assignmentUnresolved` and `assignmentReadyToReconcile` are removed.


> Reconciliation may lose partitions when topic metadata is delayed
> -----------------------------------------------------------------
>
>                 Key: KAFKA-16198
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16198
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Lucas Brutschy
>            Assignee: Lucas Brutschy
>            Priority: Critical
>              Labels: clients, consumer, kip-848, kip-848-client-support
>             Fix For: 3.8.0
>
>
> The current reconciliation code in `AsyncKafkaConsumer`'s `MembershipManager`
> may lose part of the server-provided assignment when metadata is delayed.
> The reason is incorrect handling of partially resolved topic names, as in this
> example:
>  * We get assigned {{T1-1}} and {{T2-1}}
>  * We reconcile {{T1-1}}, {{T2-1}} remains in {{assignmentUnresolved}}
> since the topic id {{T2}} is not known yet
>  * We get new cluster metadata, which includes {{T2}}, so {{T2-1}} is
> moved to {{assignmentReadyToReconcile}}
>  * We call {{reconcile}} -- {{T2-1}} is now treated as the full assignment,
> so {{T1-1}} is being revoked
>  * We end up with assignment {{T2-1}}, which is inconsistent with the
> broker-side target assignment.
>
> Generally, this seems to be a problem around semantics of the internal
> collections `assignmentUnresolved` and `assignmentReadyToReconcile`. Absence
> of a topic in `assignmentReadyToReconcile` may mean either revocation of the
> topic partition(s), or unavailability of a topic name for the topic.
> Internal state with simpler and correct invariants could be achieved by using
> a single collection `currentTargetAssignment` which is based on topic IDs and
> always corresponds to the latest assignment received from the broker. During
> every attempted reconciliation, all topic IDs will be resolved from the local
> cache, which should not introduce a lot of overhead. `assignmentUnresolved`
> and `assignmentReadyToReconcile` are removed.
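
To make the proposed invariant concrete, here is a minimal, self-contained sketch of the single-collection approach described in the issue. It is not the actual `MembershipManager` code: the class `ReconciliationSketch`, its methods, the `topicNamesById` cache and the use of plain strings for topic IDs are all assumptions made for illustration. Only the name `currentTargetAssignment` comes from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not the real Kafka client code.
public class ReconciliationSketch {
    // Latest assignment received from the broker, keyed by topic ID.
    private final Map<String, SortedSet<Integer>> currentTargetAssignment = new HashMap<>();
    // Stand-in for the local metadata cache: topic ID -> topic name (assumed).
    private final Map<String, String> topicNamesById = new HashMap<>();
    // Partitions currently owned, rendered as "topicName-partition".
    private SortedSet<String> ownedPartitions = new TreeSet<>();

    // Always replace the whole target with the latest broker-side assignment.
    public void onAssignmentReceived(Map<String, Set<Integer>> assignment) {
        currentTargetAssignment.clear();
        assignment.forEach((id, parts) -> currentTargetAssignment.put(id, new TreeSet<>(parts)));
    }

    public void onMetadataUpdate(String topicId, String topicName) {
        topicNamesById.put(topicId, topicName);
    }

    // Resolve every topic ID of the full target on each attempt. A topic whose
    // name is not yet known is skipped for now and picked up by a later call;
    // topics that were already resolved stay part of the result.
    public SortedSet<String> reconcile() {
        SortedSet<String> resolved = new TreeSet<>();
        for (Map.Entry<String, SortedSet<Integer>> e : currentTargetAssignment.entrySet()) {
            String name = topicNamesById.get(e.getKey());
            if (name == null) {
                continue; // metadata delayed for this topic
            }
            for (int partition : e.getValue()) {
                resolved.add(name + "-" + partition);
            }
        }
        ownedPartitions = resolved;
        return ownedPartitions;
    }

    public static void main(String[] args) {
        ReconciliationSketch member = new ReconciliationSketch();
        member.onMetadataUpdate("id-1", "T1");            // T1 is known, T2 is not
        member.onAssignmentReceived(Map.of("id-1", Set.of(1), "id-2", Set.of(1)));
        System.out.println(member.reconcile());           // only T1-1 can be resolved
        member.onMetadataUpdate("id-2", "T2");            // delayed metadata arrives
        System.out.println(member.reconcile());           // T1-1 is kept, T2-1 is added
    }
}
```

Because each call to `reconcile` starts from the complete target rather than from a "ready" subset, a topic missing from the resolved set can only mean that its name is not available yet, never that it was revoked. In the scenario from the description, the second reconciliation therefore yields both `T1-1` and `T2-1`.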