divijvaidya commented on PR #13111: URL: https://github.com/apache/kafka/pull/13111#issuecomment-1387158089
Thanks for the comments, folks. I would like to break the conversation into multiple FAQs, which will hopefully address the questions and points raised above.

**What is the motivation? Is it to make the latest versions compatible with pre-2.8 clients (for the scope of this bug), or is it to protect the server when older clients are used?**

It's the latter. Currently, the bug manifests as availability loss for the impacted topic, since replication stops for that topic. This is recoverable by deleting the metadata file, after which the broker recreates it from Zk. However, once KIP-405 (Tiered Storage) is merged in, the bug will begin to impact data integrity, because the metadata for a segment uses the topic Id as a key. When the same segment for the same topic is uploaded with different topic Ids, it leads to an unrecoverable situation. I would be happy to discuss a different solution from the one proposed in this PR, as long as it protects the server against the above two cases.

**Does this bug impact client/server in the same major version as well?**

Yes. A <2.8 client with a >=2.8 server will hit this bug.

**Can the users migrate to the newer versions?**

That would be ideal. But practically, there are many cases where users rely on third-party libraries which haven't updated their client version. We have been observing multiple cases where customers are facing this bug. As a community, we can push the problem back to the users and request that they upgrade their software, or we can empathise with their situation and try to find a path forward that has no side effects and doesn't burden the newer clients/servers. In some cases the former is the right thing to do, but I would argue that in this particular case we have a simple and safe fix that prevents the majority of cases. Hence, we can strive to improve the experience of the users and go with the latter option.
**Why is this PR safe to merge?**

The change in this PR breaks the premise that Zk is the source of truth, since it updates Zk with a value that is stored locally in the controller. This is not ideal, but it is a safe change to make, primarily because topic Ids are immutable and the controller context is either empty or consistent with the latest state of the system. More specifically:

1. We update Zk *only when* it doesn't have a topic Id during alter partition, which is not possible (since create topic would have allocated a topic Id) unless it hits this bug. Hence, we won't encounter a scenario where we "overwrite" an existing topic Id.
2. Topic Ids are immutable. They only change for a topic when it is deleted and re-created, and in that case the controller context removes the topic Id from its local cache on deletion. Hence, the topic Id in the local cache of a controller is always the one that should correctly be associated with a particular topic.
3. `zkClient.setTopicIds()` ensures that Zk is only updated from the latest controller (by verifying the controller epoch), eliminating the possibility of a stale controller updating Zk with a stale topic Id.

**What are the alternative ways to protect the state of the server against this bug?**

1. As Colin suggested, we could potentially start storing topic Ids in a different place in Zk so that they don't get overwritten by older clients. I believe that is a more intrusive change (though more holistic, covering 100% of bug scenarios) than what I suggested above.
2. If a topic Id mismatch is detected, consider the partition a "bad partition" and perform the recovery steps listed in https://issues.apache.org/jira/browse/KAFKA-14190 manually. Stop archival to remote storage as soon as a topic Id mismatch is detected. We should probably make this change in addition to the change in this PR.

Any other suggestions?

-- This is an automated message from the Apache Git Service.
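To make the safety argument concrete, here is a minimal Python model of the two guards described above: write the cached topic Id only when Zk has none, and reject writes from a stale controller epoch. This is an illustration only, not actual Kafka code; the names (`ZkStore`, `maybe_repair_topic_id`) are invented, and the real implementation lives in the controller and `zkClient.setTopicIds()`.

```python
class ZkStore:
    """Hypothetical stand-in for the topic znodes in ZooKeeper."""

    def __init__(self, controller_epoch):
        self.topic_ids = {}  # topic name -> topic Id
        self.controller_epoch = controller_epoch

    def set_topic_id(self, topic, topic_id, caller_epoch):
        # Models the epoch check in zkClient.setTopicIds(): a stale
        # controller (older epoch) must not write.
        if caller_epoch < self.controller_epoch:
            raise RuntimeError("stale controller epoch, write rejected")
        self.topic_ids[topic] = topic_id


def maybe_repair_topic_id(zk, topic, cached_topic_id, controller_epoch):
    """Write the controller's cached topic Id to Zk only if Zk has none.

    Models the invariant from point 1 above: an existing topic Id in Zk
    is never overwritten. Returns True if a repair write happened.
    """
    if topic in zk.topic_ids:
        return False  # Id already present in Zk; leave it alone
    zk.set_topic_id(topic, cached_topic_id, controller_epoch)
    return True
```

For example, a second call for the same topic is a no-op (`maybe_repair_topic_id` returns `False`), and a caller with an older epoch than the store's is rejected, so a stale controller can never plant a stale Id.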
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org