[ https://issues.apache.org/jira/browse/KAFKA-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165484#comment-17165484 ]
Stanislav Kozlovski commented on KAFKA-10301: --------------------------------------------- Merged in master ([fa6e5b892227db24568e2cec0a0a0969bc1ec4b5|https://github.com/apache/kafka/commit/fa6e5b892227db24568e2cec0a0a0969bc1ec4b5]) and 2.6 ([fb0a79ddc7c2144cd58b9fc27e4d36ab8f063a78|https://github.com/apache/kafka/commit/fb0a79ddc7c2144cd58b9fc27e4d36ab8f063a78]) > Partition#remoteReplicasMap can be empty in certain race conditions > ------------------------------------------------------------------- > > Key: KAFKA-10301 > URL: https://issues.apache.org/jira/browse/KAFKA-10301 > Project: Kafka > Issue Type: Bug > Reporter: Stanislav Kozlovski > Assignee: Stanislav Kozlovski > Priority: Blocker > > In Partition#updateAssignmentAndIsr, we would previously update the > `partition#remoteReplicasMap` by adding the new replicas to the map and then > removing the old ones > ([source]([https://github.com/apache/kafka/blob/7f9187fe399f3f6b041ca302bede2b3e780491e7/core/src/main/scala/kafka/cluster/Partition.scala#L657)] > During a recent refactoring, we changed it to first clear the map and then > add all the replicas to it > ([source]([https://github.com/apache/kafka/blob/2.6/core/src/main/scala/kafka/cluster/Partition.scala#L663])) > While this is done in a write lock (`inWriteLock(leaderIsrUpdateLock)`), not > all callers that access the map structure use a lock. Some examples: > - Partition#updateFollowerFetchState > - DelayedDeleteRecords#tryComplete > - Partition#getReplicaOrException - called in > `checkEnoughReplicasReachOffset` without a lock, which itself is called by > DelayedProduce. I think this can fail a `ReplicaManager#appendRecords` call. > While we want to polish the code to ensure these sort of race conditions > become harder (or impossible) to introduce, it sounds safest to revert to the > previous behavior given the timelines regarding the 2.6 release. Jira > https://issues.apache.org/jira/browse/KAFKA-10302 tracks further > modifications to the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)