[ https://issues.apache.org/jira/browse/KAFKA-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin McCabe resolved KAFKA-15230.
----------------------------------
    Fix Version/s: 3.7.0
       Resolution: Fixed

> ApiVersions data between controllers is not reliable
> ----------------------------------------------------
>
>                 Key: KAFKA-15230
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15230
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Arthur
>            Assignee: Colin McCabe
>            Priority: Critical
>             Fix For: 3.7.0
>
>
> While testing ZK migrations, I noticed a case where the controller was not
> starting the migration due to the missing ApiVersions data from other
> controllers. This was unexpected because the quorum was running and the
> followers were replicating the metadata log as expected. After examining a
> heap dump of the leader, it was in fact the case that the ApiVersions map of
> NodeApiVersions was empty.
>
> After further investigation and offline discussion with [~jsancio], we
> realized that after the initial leader election, the connection from the Raft
> leader to the followers will become idle and eventually time out and close.
> This causes NetworkClient to purge the NodeApiVersions data for the closed
> connections.
>
> There are two main side effects of this behavior:
> 1) If migrations are not started within the idle timeout period (10 minutes,
> by default), then they will not be able to be started. After this timeout
> period, I was unable to restart the controllers in such a way that the leader
> had active connections with all followers.
> 2) Dynamically updating features, such as "metadata.version", is not
> guaranteed to be safe.
>
> There is a partial workaround for the migration issue. If we set
> "connections.max.idle.ms" to -1, the Raft leader will never disconnect from
> the followers. However, if a follower restarts, the leader will not
> re-establish a connection.
>
> The feature update issue has no safe workarounds.



--
This message was sent by Atlassian Jira (v8.20.10#820010)
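
The idle-timeout behavior described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka code: ToyClient, connect, poll, versions_known_after, and the "example.feature" entry are hypothetical names invented for illustration, and the model makes no claim about how NetworkClient is actually implemented. The only details taken from the report are the 10 minute default idle timeout, the purge of per-node version data when an idle connection closes, and the -1 setting that disables the idle timeout.

```python
from dataclasses import dataclass, field

# 10 minutes, the default idle timeout mentioned in the report.
DEFAULT_MAX_IDLE_MS = 10 * 60 * 1000


@dataclass
class ToyClient:
    """Hypothetical stand-in for a network client; not Kafka's NetworkClient."""

    max_idle_ms: int = DEFAULT_MAX_IDLE_MS
    last_active_ms: dict = field(default_factory=dict)  # node id -> last activity time
    api_versions: dict = field(default_factory=dict)    # node id -> advertised versions

    def connect(self, node_id, versions, now_ms):
        # A new connection records the peer's advertised versions.
        self.last_active_ms[node_id] = now_ms
        self.api_versions[node_id] = versions

    def poll(self, now_ms):
        # A negative timeout models the "-1" workaround: never close idle connections.
        if self.max_idle_ms < 0:
            return
        for node_id, last in list(self.last_active_ms.items()):
            if now_ms - last > self.max_idle_ms:
                # Closing the idle connection also drops the version data for that node.
                del self.last_active_ms[node_id]
                self.api_versions.pop(node_id, None)


def versions_known_after(idle_ms, max_idle_ms=DEFAULT_MAX_IDLE_MS):
    """Return the node ids whose versions are still known after idle_ms of silence."""
    client = ToyClient(max_idle_ms=max_idle_ms)
    for node_id in (1, 2):  # two hypothetical follower controllers
        client.connect(node_id, {"example.feature": 1}, now_ms=0)
    client.poll(now_ms=idle_ms)
    return sorted(client.api_versions)


if __name__ == "__main__":
    eleven_minutes = 11 * 60 * 1000
    print(versions_known_after(eleven_minutes))                  # default timeout
    print(versions_known_after(eleven_minutes, max_idle_ms=-1))  # idle timeout disabled
```

In this model the leader's view of follower versions is empty once the idle period passes, which matches the empty map seen in the heap dump. The model does not cover the follower-restart case, where the report says the leader does not re-establish the connection even with the timeout disabled.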