Each broker should have a controller log, and at any given time only one of them will host the controller, while the other brokers' controller logs will be almost empty. If you find entries like "controller start-up" in the controller logs of multiple brokers, or if more than one controller log has a large amount of data, that would indicate the controller has migrated from one broker to another recently.
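As a rough sketch of that check, the snippet below scans the text of each broker's controller log for start-up entries and flags the case where more than one broker has them. The pattern, the function names and the broker-name-to-path mapping are all assumptions made for illustration; adjust the pattern to whatever your controller logs actually print.

```python
import re
from pathlib import Path

# Assumption: start-up entries contain text such as "controller start-up",
# "controller startup" or "Controller starting up". Adjust to your logs.
STARTUP_PATTERN = re.compile(r"controller start(ing up|-?up)", re.IGNORECASE)


def count_startup_lines(log_text, pattern=STARTUP_PATTERN):
    """Count the lines of one controller log that match the start-up pattern."""
    return sum(1 for line in log_text.splitlines() if pattern.search(line))


def brokers_with_startups(logs, pattern=STARTUP_PATTERN):
    """Given {broker_name: log_text}, return the brokers whose log has start-up entries."""
    return sorted(
        name for name, text in logs.items()
        if count_startup_lines(text, pattern) > 0
    )


def migration_suspected(logs, pattern=STARTUP_PATTERN):
    """True if more than one broker's controller log shows a controller start-up."""
    return len(brokers_with_startups(logs, pattern)) > 1


def load_logs(paths):
    """Read controller logs from disk.

    `paths` maps a broker name to the path of its controller log
    (a hypothetical layout; collect the files however suits your setup).
    """
    return {name: Path(p).read_text(errors="replace") for name, p in paths.items()}
```

To use it, copy the controller log from each broker to one machine, pass a mapping of broker name to file path to `load_logs`, and hand the result to `migration_suspected`. This only covers the "start-up entries on multiple brokers" signal; comparing the file sizes of the controller logs is the other signal mentioned above.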
If that happens, then you probably hit this one: KAFKA-1578 <https://issues.apache.org/jira/browse/KAFKA-1578>, it is fixed in the 0.8.2 release.

Guozhang

On Thu, Jan 29, 2015 at 1:55 PM, Allen Wang <[email protected]> wrote:

> We are using 0.8.1.1.
>
> How do we identify controller migration? Is it in logs or some metrics?
>
> Allen
>
> On Tue, Jan 27, 2015 at 9:35 AM, Guozhang Wang <[email protected]> wrote:
>
> > Allen, which version of Kafka are you using? And if you have multiple
> > brokers, is there a controller migration happened before?
> >
> > Guozhang
> >
> > On Fri, Jan 23, 2015 at 3:56 PM, Allen Wang <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > We tried the ReassignPartitionsCommand to move partitions to new brokers.
> > > The execution initially showed message "Successfully started reassignment
> > > of partitions ...". But when I tried to verify using --verify option, it
> > > reported some reassignments have failed:
> > >
> > > ERROR: Assigned replicas (0,5,2) don't match the list of replicas for reassignment (0,5) for partition [vhs_playback_event,1]
> > > ERROR: Assigned replicas (4,5,0,2) don't match the list of replicas for reassignment (4,5) for partition [vhs_playback_event,11]
> > > ERROR: Assigned replicas (3,5,0,2) don't match the list of replicas for reassignment (3,5) for partition [vhs_playback_event,16]
> > >
> > > I noticed that the assigned replicas in the error messages include both old
> > > assignment and new assignment. Is this a real error or just means
> > > partitions are being copied and current state does not match the final
> > > expected state?
> > >
> > > Since I was confused by the errors, I ran the same
> > > ReassignPartitionsCommand with the same assignment again but got some
> > > additional failure messages complaining that three partitions do not exist:
> > >
> > > [2015-01-23 18:15:41,333] ERROR Skipping reassignment of partition [vhs_playback_event,16] since it doesn't exist (kafka.admin.ReassignPartitionsCommand)
> > > [2015-01-23 18:15:41,455] ERROR Skipping reassignment of partition [vhs_playback_event,15] since it doesn't exist (kafka.admin.ReassignPartitionsCommand)
> > > [2015-01-23 18:15:41,499] ERROR Skipping reassignment of partition [vhs_playback_event,17] since it doesn't exist (kafka.admin.ReassignPartitionsCommand)
> > >
> > > These partitions later reappeared from the output of --verify.
> > >
> > > The other thing is that at one point the BytesOut from one broker exceeds
> > > 100Mbytes, which is quite alarming.
> > >
> > > In the end, the reassignment was done according to the input file to
> > > ReassignPartitionsCommand. But the UnderReplicatedPartitions for the
> > > brokers keeps showing a positive number, even though the output of describe
> > > topic command and ZooKeeper data show the ISRs are all in sync, and
> > > Replica-MaxLag is 0.
> > >
> > > To sum up, the overall execution is successful but the error messages are
> > > quite noisy and the metric is not consistent with what appears to be.
> > >
> > > Does anyone have the similar experience and is there anything we can do get
> > > it done smoother? What can we do to reset the inconsistent
> > > UnderReplicatedPartitions metric?
> > >
> > > Thanks,
> > > Allen
> > >
> >
> >
> > --
> > -- Guozhang
> >
>
--
-- Guozhang
