[ 
https://issues.apache.org/jira/browse/KAFKA-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657776#comment-16657776
 ] 

ASF GitHub Bot commented on KAFKA-7235:
---------------------------------------

hzxa21 opened a new pull request #5821: KAFKA-7235: Detect outdated control 
requests and bounced brokers using broker generation
URL: https://github.com/apache/kafka/pull/5821
 
 
   This PR introduces the concept of a broker generation and uses it to let the 
controller detect quickly bounced brokers and to let brokers reject outdated 
control requests.
   
   It has the changes required to implement KIP-380:
   
   [Common]
   - Refactor ZookeeperClient to expose the zookeeper `multi` request directly
   - Refactor KafkaZkClient to use MultiRequest instead of a raw zookeeper 
transaction
   - Atomically get the creation transaction id (czxid) with broker znode 
creation and use it as the broker epoch to identify the broker generation 
across bounces
   - Introduce LeaderAndIsrRequest V2, UpdateMetadataRequest V5 and 
StopReplicaRequest V1 to include the broker epoch in control requests and 
normalize their schemas to make them more memory efficient
   - Add STALE_BROKER_EPOCH error
   
   [Broker]
   - Cache the current broker epoch after broker znode registration
   - Reject LeaderAndIsrRequest, UpdateMetadataRequest and StopReplicaRequest 
if the request's broker epoch is less than the current broker epoch, and 
respond with the STALE_BROKER_EPOCH error
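
The broker-side check above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class `BrokerEpochGuard`, its `Errors` enum and its method names are hypothetical. Only the comparison rule (reject when the request's broker epoch is less than the cached one) and the `STALE_BROKER_EPOCH` error name come from the description.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the broker-side staleness check; not the PR's code. */
final class BrokerEpochGuard {
    /** Illustrative error codes; only STALE_BROKER_EPOCH is named in the description. */
    enum Errors { NONE, STALE_BROKER_EPOCH }

    // Epoch cached after the broker registers its znode (assumed to be the znode's czxid).
    private final AtomicLong currentBrokerEpoch = new AtomicLong(-1L);

    /** Record the epoch obtained when the broker znode was created. */
    void onRegistered(long czxid) {
        currentBrokerEpoch.set(czxid);
    }

    /** Decide which error a control request carrying the given broker epoch should get. */
    Errors validate(long requestBrokerEpoch) {
        // A smaller epoch means the request was built for an earlier generation of this broker.
        return requestBrokerEpoch < currentBrokerEpoch.get()
                ? Errors.STALE_BROKER_EPOCH
                : Errors.NONE;
    }
}
```

In this sketch a request carrying an epoch equal to the cached one is accepted; only strictly older epochs are rejected, matching the `<` comparison in the description.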
   
   [Controller]
   - Cache/update broker epochs in `controllerContext.brokerEpochsCache` after 
reading from zk when processing `BrokerChange` event and `onControllerFailover`
   - Detect bounced brokers in the `BrokerChange` event by comparing the broker 
epochs read from zk with the cached broker epochs, and trigger the necessary 
state changes
   - Avoid sending out requests to dead brokers
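
The bounce detection above can be illustrated with a small sketch. It is not the controller code from this PR: the class `BouncedBrokerDetector` and the method `onBrokerChange` are hypothetical, and the sketch assumes that a broker counts as bounced when it is still registered but its epoch in zk is larger than the epoch the controller has cached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of controller-side bounce detection; not the PR's code. */
final class BouncedBrokerDetector {
    // brokerId -> broker epoch last seen by the controller.
    private final Map<Integer, Long> cachedEpochs = new HashMap<>();

    /**
     * Compare the epochs just read from zk with the cached ones, return the ids of
     * brokers that re-registered with a newer epoch, then refresh the cache.
     */
    Set<Integer> onBrokerChange(Map<Integer, Long> liveEpochsFromZk) {
        Set<Integer> bounced = new TreeSet<>();
        for (Map.Entry<Integer, Long> entry : liveEpochsFromZk.entrySet()) {
            Long cached = cachedEpochs.get(entry.getKey());
            // Known broker with a newer epoch: it restarted between two reads.
            if (cached != null && entry.getValue() > cached) {
                bounced.add(entry.getKey());
            }
        }
        // Brokers missing from zk drop out of the cache; new brokers are added.
        cachedEpochs.clear();
        cachedEpochs.putAll(liveEpochsFromZk);
        return bounced;
    }
}
```

Comparing only the set of broker ids would miss a broker that restarts between two reads, because its id is present both times; comparing epochs is what makes the fast bounce visible.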
   
   [Test]
   - Add `BrokerEpochIntegrationTest` to test broker processing new versions of 
the control requests and rejecting requests with stale broker epoch
   - Add a test case in `ControllerIntegrationTest` to test controller 
detecting bounced brokers
   - Add test cases in `RequestResponseTest` to test serialization and 
deserialization for new versions of the control requests
   - Add `ControlRequestTest` unit test to test control request schema 
normalization
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use brokerZkNodeVersion to prevent broker from processing outdated controller 
> request
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7235
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7235
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> Currently a broker can process controller requests that were sent before the 
> broker restarted. This could cause a few problems. Here is one example:
> Let's assume partitions p1 and p2 exist on broker1.
> 1) Controller generates LeaderAndIsrRequest with p1 to be sent to broker1.
> 2) Before controller sends the request, broker1 is quickly restarted.
> 3) The LeaderAndIsrRequest with p1 is delivered to broker1.
> 4) After processing the first LeaderAndIsrRequest, broker1 starts to 
> checkpoint the high watermark for all partitions that it owns. Thus it may 
> overwrite the high watermark checkpoint file with only the hw for partition 
> p1. The hw for partition p2 is now lost, which could be a problem.
> In general, the correctness of the broker logic currently relies on a few 
> assumptions, e.g. that the first LeaderAndIsrRequest received by a broker 
> contains all partitions hosted by the broker. These assumptions could break 
> if a broker can receive controller requests that were generated before it 
> restarted. 
> One reasonable solution to the problem is to include the 
> expectedBrokeNodeZkVersion in the controller requests. The broker should 
> remember its broker znode zkVersion after it registers itself in zookeeper. 
> The broker can then reject controller requests whose 
> expectedBrokeNodeZkVersion differs from its broker znode zkVersion.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
