[ https://issues.apache.org/jira/browse/KAFKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592884#comment-13592884 ]
Neha Narkhede commented on KAFKA-513: ------------------------------------- Thanks for the patch! This patch is great and is very close to being checked in. Few review suggestions - 1. ControllerChannelManager 1.1 The log message when the controller sends the data includes the controller's epoch, but the response doesn't. It is useful to know the epoch wherever the controller id is logged. 1.2 The "," between controller id and epoch probably could be skipped, this pattern is not followed elsewhere in this patch. 2. Partition 2.1 The error statement when the broker is dropping the leaderAndIsr request should be - "since local leader epoch %d is >= the request's leader epoch" 2.2 Also, let's add "Broker %d aborted..." to the following statement similar to "Broker %d discarded..." "Aborted the become-follower state change since leader %d for partition [%s,%d]" 3. PartitionStateMachine In electLeaderForPartition() API, it is useful to include the contents of the StateChangedFailedException in the state change log. This is because it is useful to know that after starting the state change, it got aborted because another controller with a higher controller epoch was detected. So something like - "Controller %d epoch %d aborted leader election for partition [%s,%d] since ..." 4. ReplicaManager The log statements here are of the form of [Replica Manager on Broker %d]: Handling... This is different from the ones in Partition which are like "Broker %d started become-follower...". I guess the default logIdent, which is meant for the main log, is the cause of this discrepancy. For the purpose of the state change log, we actually don't care if it is Replica Manager or Partition on that broker, it would just suffice to have the simplest statement to communicate the state changes like the ones in Partition except the logIdent. 5. ReplicaStateMachine Same as 4 6. Rename stateChangeLogMerger.scala to have 1st letter in caps. 7. StateChangeLogMerger This tool looks great now. Thanks for including the review suggestions. Few minor observations - 7.1 You check if at least one of the two log input options are specified. I think we should also check that at most one of them is specified. In other words, what's the expected behavior if both are speci fied ? > Add state change log to Kafka brokers > ------------------------------------- > > Key: KAFKA-513 > URL: https://issues.apache.org/jira/browse/KAFKA-513 > Project: Kafka > Issue Type: Sub-task > Affects Versions: 0.8 > Reporter: Neha Narkhede > Assignee: Swapnil Ghike > Priority: Blocker > Labels: p1, replication, tools > Fix For: 0.8 > > Attachments: kafka-513-v1.patch, kafka-513-v2.patch, > kafka-513-v3.patch, kafka-513-v4.patch > > Original Estimate: 96h > Remaining Estimate: 96h > > Once KAFKA-499 is checked in, every controller to broker communication can be > modelled as a state change for one or more partitions. Every state change > request will carry the controller epoch. If there is a problem with the state > of some partitions, it will be good to have a tool that can create a timeline > of requested and completed state changes. This will require each broker to > output a state change log that has entries like > [2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for > partition [foo, 0] from controller 2, epoch 1 > [2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for > partition [foo, 0] from controller 2, epoch 1 > On controller, this will look like - > [2012-09-10 10:06:17,198] controller 2, epoch 1, initiated state change > request LeaderAndIsr() for partition [foo, 0] > We need a tool that can collect the state change log from all brokers and > create a per-partition timeline of state changes - > [foo, 0] > [2012-09-10 10:06:17,198] controller 2, epoch 1 initiated state change > request LeaderAndIsr() > [2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() from > controller 2, epoch 1 > [2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() from > controller 2, epoch 1 > This JIRA involves adding the state change log to each broker and adding the > tool to create the timeline -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira