[ https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987217#comment-16987217 ]

ASF GitHub Bot commented on KAFKA-9261:
---------------------------------------

hachikuji commented on pull request #7770: KAFKA-9261; Client should handle 
unavailable leader metadata
URL: https://github.com/apache/kafka/pull/7770
 
 
   The client caches metadata fetched from Metadata requests. Previously, each 
metadata response overwrote all of the metadata from the previous one, so we 
could rely on the expectation that the broker only returned the leaderId for a 
partition if it had connection information available. This behavior changed 
with KIP-320: having the leader epoch allows the client to filter out 
partition metadata that is known to be stale, so we can no longer rely on the 
request-level guarantee of leader availability. There is no mechanism similar 
to the leader epoch to track the staleness of broker metadata, so we still 
overwrite all of the broker metadata from each response. As a result, the 
partition metadata can get out of sync with the broker metadata in the 
client's cache, and it is no longer safe to validate inside the `Cluster` 
constructor that each leader has an associated `Node`.
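   To make the failure mode concrete, here is a minimal, self-contained sketch 
(hypothetical names and types, not the actual `Cluster` code) of how mixing 
retained partition metadata with a freshly overwritten broker map trips the 
non-null check:

```java
import java.util.Map;
import java.util.Objects;

public class StaleLeaderExample {

    // Broker connection info, keyed by id (hypothetical simplified type).
    record Node(int id, String host, int port) {}

    // Partition metadata retained from an older, epoch-filtered response.
    record CachedPartition(String topic, int partition, int leaderId) {}

    static Node resolveLeader(CachedPartition partition, Map<Integer, Node> brokersById) {
        // Old assumption: every cached leaderId maps to a broker from the latest
        // response. When the partition entry survives from an earlier response but
        // the broker map was overwritten, get() returns null and this throws the
        // NullPointerException reported in KAFKA-9261.
        return Objects.requireNonNull(brokersById.get(partition.leaderId()),
                "No broker metadata found for leader " + partition.leaderId());
    }

    public static void main(String[] args) {
        // Broker map fully overwritten by the newest response: broker 1 is gone.
        Map<Integer, Node> latestBrokers = Map.of(2, new Node(2, "broker2", 9092));
        // Partition entry kept from an earlier response (its leader epoch was newer).
        CachedPartition cached = new CachedPartition("events", 0, 1);
        resolveLeader(cached, latestBrokers); // throws NullPointerException
    }
}
```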
   
   Fixing this issue was unfortunately not straightforward because the cache 
was built to maintain references to broker metadata through the `Node` object 
at the partition level. In order to keep the state consistent, each `Node` 
reference would need to be updated based on the new broker metadata. Instead 
of doing that, this patch restructures the cache to follow the Metadata 
response schema more closely: broker node information is maintained at the 
top level in a single collection, and cached partition metadata only 
references the id of the broker. To accommodate this, we have removed 
`PartitionInfoAndEpoch` and altered `MetadataResponse.PartitionMetadata` to 
eliminate its `Node` references.
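   As a rough illustration of the new cache shape (the names here are 
simplified and hypothetical, not the exact classes in the patch), broker nodes 
live in one top-level map and partition metadata carries only broker ids, 
resolved lazily to an `Optional`:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MetadataCacheSketch {

    record Node(int id, String host, int port) {}

    // Mirrors the Metadata response schema: partitions reference brokers by id only.
    record PartitionMetadata(String topic, int partition,
                             Optional<Integer> leaderId, List<Integer> replicaIds) {}

    // Broker nodes are kept in a single top-level collection.
    private final Map<Integer, Node> nodesById;
    private final Map<String, List<PartitionMetadata>> partitionsByTopic;

    MetadataCacheSketch(Map<Integer, Node> nodesById,
                        Map<String, List<PartitionMetadata>> partitionsByTopic) {
        this.nodesById = Map.copyOf(nodesById);
        this.partitionsByTopic = Map.copyOf(partitionsByTopic);
    }

    List<PartitionMetadata> partitionsFor(String topic) {
        return partitionsByTopic.getOrDefault(topic, List.of());
    }

    // Leader resolution is best-effort: an unknown broker id yields an empty
    // Optional instead of failing a non-null check at construction time.
    Optional<Node> leaderFor(PartitionMetadata partition) {
        return partition.leaderId().map(nodesById::get);
    }
}
```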
   
   Note that one of the side benefits of the refactor here is that we 
virtually eliminate one of the hotspots in Metadata request handling in 
`MetadataCache.getEndpoints` (which was renamed to `maybeFilterAliveReplicas`). 
The only reason this was expensive was that we had to build a new collection 
of `Node` representations for each of the replica lists. That information was 
discarded during serialization anyway, so the effort was wasted. Now we work 
with the lower-level id lists, and no copy of the replicas is needed (at least 
for all versions other than 0).
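   A sketch of the idea (simplified signature, not the broker's actual code): 
filtering happens on raw broker ids, so no `Node` collections are materialized 
only to be thrown away during serialization:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified sketch, not the broker's actual code or signature.
public class FilterAliveReplicasSketch {

    // Keep only replica ids that belong to live brokers. No Node objects are
    // built; the id list can be handed straight to response serialization.
    static List<Integer> maybeFilterAliveReplicas(List<Integer> replicaIds,
                                                  Set<Integer> liveBrokerIds,
                                                  boolean filterUnavailable) {
        // Per the description above, only the oldest protocol version needs the
        // filtered copy; newer versions return the id list untouched.
        if (!filterUnavailable)
            return replicaIds;
        return replicaIds.stream()
                .filter(liveBrokerIds::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Broker 2 is not alive, so it is dropped only when filtering is requested.
        System.out.println(maybeFilterAliveReplicas(List.of(1, 2, 3), Set.of(1, 3), true));  // [1, 3]
        System.out.println(maybeFilterAliveReplicas(List.of(1, 2, 3), Set.of(1, 3), false)); // [1, 2, 3]
    }
}
```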
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NPE when updating client metadata
> ---------------------------------
>
>                 Key: KAFKA-9261
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9261
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>       at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>       at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>       at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>       at org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>       at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>       at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>       at org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>       at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>       at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>       at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>       at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>       at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>       at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>       at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>       at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>       at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
> The client assumes that if a leader is included in the response, then node 
> information must also be available. There are at least a couple of possible 
> reasons this assumption can fail:
> 1. The client is able to detect stale partition metadata using the available 
> leader epoch information. If stale partition metadata is detected, the client 
> ignores it and uses the last known metadata. However, it cannot detect stale 
> broker information and will always accept the latest update. This means that 
> the latest metadata may be a mix of multiple metadata responses, and therefore 
> the invariant will not generally hold.
> 2. There is no lock which protects both the fetching of partition metadata 
> and the fetching of the live brokers when handling a Metadata request. This 
> means an UpdateMetadata request can arrive concurrently and break the intended 
> invariant.
> It seems case 2 has been possible for a long time, but it should be extremely 
> rare. Case 1 was only made possible with KIP-320, which added the leader 
> epoch tracking. It should also be rare, but the window for inconsistent 
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates 
> and not assume that the leader is among the live endpoints.
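
A minimal sketch of that defensive behavior, using made-up names rather than 
the actual client classes: a leader id that does not resolve to a known broker 
is reported as leader-unknown and a metadata refresh is requested, instead of 
dereferencing a missing Node.
{code}
import java.util.Map;
import java.util.Optional;

// Hypothetical names only; this is not the actual consumer/metadata code.
public class DefensiveLeaderLookup {

    record Node(int id, String host, int port) {}

    private final Map<Integer, Node> liveBrokers;
    private boolean updateRequested = false;

    DefensiveLeaderLookup(Map<Integer, Node> liveBrokers) {
        this.liveBrokers = Map.copyOf(liveBrokers);
    }

    // If the cached leader id no longer maps to a live endpoint, report the
    // leader as unknown and ask for a metadata refresh rather than assuming
    // the Node is present.
    Optional<Node> currentLeader(int leaderId) {
        Node node = liveBrokers.get(leaderId);
        if (node == null) {
            updateRequested = true;
            return Optional.empty();
        }
        return Optional.of(node);
    }

    boolean updateRequested() {
        return updateRequested;
    }
}
{code}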



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
