[ BTW, after some more research, I think what might be happening here is that we had some de facto network partitioning as a side effect of renaming some network interfaces. If that's the case, I'd like to know how to get everything back into sync. ]
Hi. I'm seeing something weird: if I do a MetadataRequest, what I get back says I have out-of-sync replicas, but if I use kafka-topics.sh, it says I don't. I'm running Kafka 0.8.1.1 (still, for the moment) on Java 1.7.0_55.

The code I have to do this uses kafka-python:

======
#!/usr/bin/python

import logging
import signal
import sys

# Should use argparse, but we shouldn't use python 2.6, either...
from optparse import OptionParser

import simplejson as json

from kafka.client import KafkaClient
from kafka.protocol import KafkaProtocol

#logging.basicConfig(level=logging.DEBUG)


def main():
    parser = OptionParser()
    parser.add_option('-t', '--topic', dest='topic',
                      help='topic to which we should subscribe',
                      default='mytopic')
    parser.add_option('-b', '--broker', dest='kafkaHost',
                      help='Kafka broker to which we should connect',
                      default='host309-ilg1.rtc.vrsn.com')
    (options, args) = parser.parse_args()

    kafka = KafkaClient('%s:9092' % options.kafkaHost)

    # WARNING: terrible abuse of private methods follows.
    id = kafka._next_id()
    request = KafkaProtocol.encode_metadata_request(kafka.client_id, id)
    response = kafka._send_broker_unaware_request(id, request)
    (brokers, topics) = KafkaProtocol.decode_metadata_response(response)

    if options.topic != '*':
        topics_we_want = [options.topic]
    else:
        topics_we_want = sorted(topics.keys())

    for topic in topics_we_want:
        for partition in sorted(topics[topic].keys()):
            meta = topics[topic][partition]
            # Any replica listed in 'replicas' but missing from 'isr' is out of sync.
            delta = set(meta.replicas) - set(meta.isr)
            if len(delta) == 0:
                print 'topic', topic, 'partition', partition, 'leader', meta.leader, 'replicas', meta.replicas, 'isr', meta.isr
            else:
                print 'topic', topic, 'partition', partition, 'leader', meta.leader, 'replicas', meta.replicas, 'isr', meta.isr, 'OUT-OF-SYNC', delta

    sys.exit(0)


if __name__ == "__main__":
    #logging.basicConfig(level=logging.DEBUG)
    main()
======

And if I run that against "mytopic", I get:

topic mytopic partition 0 leader 311 replicas (311, 323) isr (311, 323)
topic mytopic partition 1 leader 323 replicas (323, 312) isr (312, 323)
topic mytopic partition 2 leader 324 replicas (324, 313) isr (324, 313)
topic mytopic partition 3 leader 309 replicas (309, 314) isr (314, 309)
topic mytopic partition 4 leader 315 replicas (310, 315) isr (315,) OUT-OF-SYNC set([310])
topic mytopic partition 5 leader 311 replicas (311, 316) isr (311, 316)
topic mytopic partition 6 leader 312 replicas (312, 317) isr (317, 312)
topic mytopic partition 7 leader 318 replicas (313, 318) isr (318, 313)
topic mytopic partition 8 leader 314 replicas (314, 319) isr (314, 319)
topic mytopic partition 9 leader 315 replicas (315, 320) isr (320, 315)
topic mytopic partition 10 leader 316 replicas (316, 321) isr (316, 321)
topic mytopic partition 11 leader 317 replicas (317, 322) isr (317, 322)
topic mytopic partition 12 leader 318 replicas (318, 323) isr (318, 323)
topic mytopic partition 13 leader 324 replicas (319, 324) isr (324,) OUT-OF-SYNC set([319])
topic mytopic partition 14 leader 320 replicas (320, 309) isr (320, 309)
topic mytopic partition 15 leader 321 replicas (321, 310) isr (321,) OUT-OF-SYNC set([310])
topic mytopic partition 16 leader 312 replicas (312, 320) isr (312, 320)
topic mytopic partition 17 leader 323 replicas (323, 313) isr (323, 313)
topic mytopic partition 18 leader 324 replicas (324, 314) isr (314, 324)
topic mytopic partition 19 leader 309 replicas (309, 315) isr (309, 315)

but if I do:

/opt/kafka/bin/kafka-topics.sh --describe --zookeeper host301:2181 --topic mytopic

I get:
Topic:mytopic PartitionCount:20 ReplicationFactor:2 Configs:retention.bytes=100000000000
Topic: mytopic Partition: 0 Leader: 311 Replicas: 311,323 Isr: 311,323
Topic: mytopic Partition: 1 Leader: 323 Replicas: 323,312 Isr: 312,323
Topic: mytopic Partition: 2 Leader: 324 Replicas: 324,313 Isr: 324,313
Topic: mytopic Partition: 3 Leader: 309 Replicas: 309,314 Isr: 314,309
Topic: mytopic Partition: 4 Leader: 315 Replicas: 310,315 Isr: 315,310
Topic: mytopic Partition: 5 Leader: 311 Replicas: 311,316 Isr: 311,316
Topic: mytopic Partition: 6 Leader: 312 Replicas: 312,317 Isr: 317,312
Topic: mytopic Partition: 7 Leader: 318 Replicas: 313,318 Isr: 318,313
Topic: mytopic Partition: 8 Leader: 314 Replicas: 314,319 Isr: 314,319
Topic: mytopic Partition: 9 Leader: 315 Replicas: 315,320 Isr: 320,315
Topic: mytopic Partition: 10 Leader: 316 Replicas: 316,321 Isr: 316,321
Topic: mytopic Partition: 11 Leader: 317 Replicas: 317,322 Isr: 317,322
Topic: mytopic Partition: 12 Leader: 318 Replicas: 318,323 Isr: 318,323
Topic: mytopic Partition: 13 Leader: 324 Replicas: 319,324 Isr: 324,319
Topic: mytopic Partition: 14 Leader: 320 Replicas: 320,309 Isr: 320,309
Topic: mytopic Partition: 15 Leader: 321 Replicas: 321,310 Isr: 321,310
Topic: mytopic Partition: 16 Leader: 312 Replicas: 312,320 Isr: 312,320
Topic: mytopic Partition: 17 Leader: 323 Replicas: 323,313 Isr: 323,313
Topic: mytopic Partition: 18 Leader: 324 Replicas: 324,314 Isr: 314,324
Topic: mytopic Partition: 19 Leader: 309 Replicas: 309,315 Isr: 309,315

and if I do:

/opt/kafka/bin/kafka-topics.sh --describe --zookeeper host301-ilg1:2181 --under-replicated-partitions

it prints nothing.

Looking at a system-call trace of kafka-topics.sh, I never see it do a MetadataRequest at all; I do see it connect to ZK and fish around in there, though.

If I poke around in ZK manually, I see, for example (looking at partition 4, since that's one it says is out of sync):

[zk: localhost:2181(CONNECTED) 14] get /brokers/topics/mytopic/partitions/4/state
{"controller_epoch":9,"leader":315,"version":1,"leader_epoch":15,"isr":[315,310]}
cZxid = 0x100000032
ctime = Fri Oct 31 21:20:31 UTC 2014
mZxid = 0x44d07e3b0
mtime = Fri Apr 17 11:44:32 UTC 2015
pZxid = 0x100000032
cversion = 0
dataVersion = 27
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 81
numChildren = 0

I can do the metadata request against every broker in our cluster, and I get the same results: the same three partitions show as out of sync. I can also get that key from ZK on all our ZK instances, and I get the same basic thing as above.

Looking at the filesystem on 310, the one that has mytopic-4 in it, I see log segments being updated there with the current time on them, so something's writing there at least -- which doesn't preclude it from being a bit behind, I suppose, but it's not like the mod-times are last February. (-:
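For reference, here's a rough sketch of that cross-check (comparing the ISR in each partition's state znode against the ISR in one broker's metadata response). It's only illustrative: it assumes the kazoo ZooKeeper client, leans on the same private kafka-python calls as the script above, and the host names are just examples. It prints only the partitions where the two views disagree:

======
#!/usr/bin/python
# Sketch: diff the ISR stored in ZooKeeper against the ISR a broker reports
# in its metadata response, for every partition of one topic.
# Assumes the kazoo ZooKeeper client; host names below are examples only.
import simplejson as json

from kazoo.client import KazooClient
from kafka.client import KafkaClient
from kafka.protocol import KafkaProtocol

TOPIC = 'mytopic'

zk = KazooClient(hosts='host301:2181')
zk.start()

kafka = KafkaClient('host309-ilg1.rtc.vrsn.com:9092')
# Same private-method abuse as above to get a raw MetadataResponse.
rid = kafka._next_id()
request = KafkaProtocol.encode_metadata_request(kafka.client_id, rid)
response = kafka._send_broker_unaware_request(rid, request)
(brokers, topics) = KafkaProtocol.decode_metadata_response(response)

for partition in sorted(topics[TOPIC].keys()):
    meta = topics[TOPIC][partition]
    path = '/brokers/topics/%s/partitions/%d/state' % (TOPIC, partition)
    data, stat = zk.get(path)
    state = json.loads(data)
    zk_isr = set(state['isr'])
    broker_isr = set(meta.isr)
    if zk_isr != broker_isr:
        print 'partition', partition, 'ZK isr', sorted(zk_isr), \
            'broker-reported isr', sorted(broker_isr)

zk.stop()
======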
Here's what I see in state-change.log for 'mytopic,4' on broker 315:

state-change.log:[2015-04-16 13:26:42,424] TRACE Broker 315 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:310,ISR:310,315,LeaderEpoch:14,ControllerEpoch:8),ReplicationFactor:2),AllReplicas:310,315) correlation id 0 from controller 312 epoch 9 for partition [mytopic,4] (state.change.logger)
state-change.log:[2015-04-16 13:26:42,430] WARN Broker 315 received invalid LeaderAndIsr request with correlation id 0 from controller 312 epoch 9 with an older leader epoch 14 for partition [mytopic,4], current leader epoch is 14 (state.change.logger)
state-change.log:[2015-04-16 13:26:42,702] TRACE Broker 315 cached leader info (LeaderAndIsrInfo:(Leader:310,ISR:310,315,LeaderEpoch:14,ControllerEpoch:8),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 0 (state.change.logger)
state-change.log:[2015-04-16 13:26:55,541] TRACE Broker 315 cached leader info (LeaderAndIsrInfo:(Leader:310,ISR:310,315,LeaderEpoch:14,ControllerEpoch:8),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 3 (state.change.logger)
state-change.log:[2015-04-17 11:42:52,215] TRACE Broker 315 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) correlation id 503 from controller 312 epoch 9 for partition [mytopic,4] (state.change.logger)
state-change.log:[2015-04-17 11:42:52,215] TRACE Broker 315 handling LeaderAndIsr request correlationId 503 from controller 312 epoch 9 starting the become-leader transition for partition [mytopic,4] (state.change.logger)
state-change.log:[2015-04-17 11:42:52,216] TRACE Broker 315 stopped fetchers as part of become-leader request from controller 312 epoch 9 with correlation id 503 for partition [mytopic,4] (state.change.logger)
state-change.log:[2015-04-17 11:42:52,216] TRACE Broker 315 completed LeaderAndIsr request correlationId 503 from controller 312 epoch 9 for the become-leader transition for partition [mytopic,4] (state.change.logger)
state-change.log:[2015-04-17 11:42:52,217] TRACE Broker 315 cached leader info (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 503 (state.change.logger)
state-change.log:[2015-04-17 11:43:06,187] TRACE Broker 315 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) correlation id 547 from controller 312 epoch 9 for partition [mytopic,4] (state.change.logger)
state-change.log:[2015-04-17 11:43:06,187] WARN Broker 315 received invalid LeaderAndIsr request with correlation id 547 from controller 312 epoch 9 with an older leader epoch 15 for partition [mytopic,4], current leader epoch is 15 (state.change.logger)
state-change.log:[2015-04-17 11:43:06,212] TRACE Broker 315 cached leader info (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 547 (state.change.logger)
state-change.log:[2015-04-17 11:44:30,347] TRACE Broker 315 cached leader info (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 549 (state.change.logger)

and for 312:

[2015-04-17 11:42:52,207] TRACE Controller 312 epoch 9 started leader election for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,210] TRACE Controller 312 epoch 9 elected leader 315 for Offline partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 changed partition [mytopic,4] from OnlinePartition to OnlinePartition with leader 315 (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending become-follower LeaderAndIsr request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 310 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending become-leader LeaderAndIsr request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 315 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 322 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 313 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 316 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 319 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 310 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 309 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 318 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 312 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 321 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 315 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 324 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 323 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 317 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 311 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Broker 312 cached leader info (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 503 (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 320 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:42:52,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 503 to broker 314 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,043] TRACE Controller 312 epoch 9 changed state of replica 310 for partition [mytopic,4] from OnlineReplica to OfflineReplica (state.change.logger)
[2015-04-17 11:43:06,186] TRACE Controller 312 epoch 9 sending become-leader LeaderAndIsr request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 315 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,188] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 322 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,190] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 313 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,193] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 316 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,195] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 319 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,197] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 309 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,199] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 318 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,201] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 312 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,203] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 321 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,205] TRACE Broker 312 cached leader info (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 547 (state.change.logger)
[2015-04-17 11:43:06,206] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 315 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,208] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 324 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,211] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 323 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,213] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 317 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,215] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 311 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,217] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 320 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:43:06,219] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 547 to broker 314 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,292] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 548 to broker 310 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,310] TRACE Controller 312 epoch 9 changed state of replica 310 for partition [mytopic,4] from OfflineReplica to OnlineReplica (state.change.logger)
[2015-04-17 11:44:30,317] TRACE Controller 312 epoch 9 sending become-follower LeaderAndIsr request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 310 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,320] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 322 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,322] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 313 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,324] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 316 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,327] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 319 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,329] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 310 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,331] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 309 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,334] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 318 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,336] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 312 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,338] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 321 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,340] TRACE Broker 312 cached leader info (LeaderAndIsrInfo:(Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 549 (state.change.logger)
[2015-04-17 11:44:30,341] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9) with correlationId 549 to broker 315 for partition [mytopic,4] (state.change.logger)
[2015-04-17 11:44:30,344] TRACE Controller 312 epoch 9 sending UpdateMetadata request (Leader:315,ISR:315,LeaderEpoch:15,ControllerEpoch:9)

and for 310:

[2015-04-16 13:26:42,406] TRACE Broker 310 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:310,ISR:310,315,LeaderEpoch:14,ControllerEpoch:8),ReplicationFactor:2),AllReplicas:310,315) correlation id 0 from controller 312 epoch 9 for partition [mytopic,4] (state.change.logger)
[2015-04-16 13:26:42,410] WARN Broker 310 received invalid LeaderAndIsr request with correlation id 0 from controller 312 epoch 9 with an older leader epoch 14 for partition [mytopic,4], current leader epoch is 14 (state.change.logger)
[2015-04-16 13:26:42,556] TRACE Broker 310 cached leader info (LeaderAndIsrInfo:(Leader:310,ISR:310,315,LeaderEpoch:14,ControllerEpoch:8),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by controller 312 epoch 9 with correlation id 0 (state.change.logger)
[2015-04-16 13:26:55,776] TRACE Broker 310 cached leader info (LeaderAndIsrInfo:(Leader:310,ISR:310,315,LeaderEpoch:14,ControllerEpoch:8),ReplicationFactor:2),AllReplicas:310,315) for partition [mytopic,4] in response to UpdateMetadata request sent by control

Which one is right? Should I not be using MetadataRequests to figure out who is and isn't in sync? If there's something else I should be using instead, what is it?

-Steve