I am using the SimpleConsumer to consume specific partitions on specific processes. The workflow is roughly as follows:
i) An external arbiter assigns partitions to specific processes. It guarantees that: a) all partitions are consumed by the cluster, and b) a single partition is only consumed by a single host.
ii) Each host, when assigned partitions, figures out the leaders for each of those partitions. It then starts one thread per Kafka leader that it needs to connect to and consumes from these partitions.
iii) Offset management is also done external to Kafka.
iv) If any of these requests lead to error responses which signify that partitions have moved to new leaders (UnknownTopicOrPartition, LeaderNotAvailable, NotLeaderForPartitionCode, etc.), we refresh the broker -> partition mapping for the partitions we own, start/stop threads based on which leaders we need to talk to, and rinse and repeat.

I refresh the broker -> partition mapping by querying Kafka brokers serially with TopicMetadataRequest for the partitions that my host owns. As soon as I have leaders for all the partitions I own, I break out, i.e. I don't ask all of my seed brokers the same question.

The TopicMetadataRequest just seems to be a proxy to some data stored in ZK. I am not sure whether this request will always give me all the partitions I am querying for. For example, what do I get if one of the partitions I am making a TopicMetadataRequest for is in the middle of moving to a new leader?

If I lose a broker, the thread for that broker starts complaining about not being able to connect to it, I refresh the broker -> partition mapping, and the partitions should now be assigned to another broker. Similarly, if one of the downed brokers comes back up, some partitions will be assigned to it, the per-broker consumer thread which owned those partitions will complain, and I refresh the broker -> partition mapping and learn about the new leader for those partitions. I am still worried about the transition phase, as I mentioned in the previous paragraph.
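To make the refresh step concrete, here is a rough Python sketch of the logic (my real code uses the JVM client, so treat this as pseudocode that happens to run). `fetch_leaders` is a hypothetical stand-in for sending a TopicMetadataRequest to a single broker; it is not a real Kafka client call. I assume it returns only the partitions that currently have a known leader and raises `OSError` if the broker is unreachable. `refresh_leaders` and `plan_threads` are likewise names I made up for illustration.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Partition = Tuple[str, int]  # (topic, partition id)
Broker = str                 # "host:port"


def refresh_leaders(
    seed_brokers: Iterable[Broker],
    owned: Set[Partition],
    fetch_leaders: Callable[[Broker, Set[Partition]], Dict[Partition, Broker]],
) -> Tuple[Dict[Partition, Broker], Set[Partition]]:
    """Ask seed brokers one at a time; stop once every owned partition has a leader.

    Returns (leaders found, partitions still without a leader).
    """
    leaders: Dict[Partition, Broker] = {}
    for broker in seed_brokers:
        missing = owned - set(leaders)
        if not missing:
            break  # every owned partition is resolved, skip the remaining brokers
        try:
            # Hypothetical wrapper around a TopicMetadataRequest to one broker.
            found = fetch_leaders(broker, missing)
        except OSError:
            continue  # broker unreachable, try the next seed broker
        # Keep only answers for partitions we actually asked about.
        leaders.update({p: b for p, b in found.items() if p in missing})
    return leaders, owned - set(leaders)


def plan_threads(
    running: Set[Broker], leaders: Dict[Partition, Broker]
) -> Tuple[Set[Broker], Set[Broker]]:
    """Return (brokers needing a new thread, brokers whose thread can stop)."""
    needed = set(leaders.values())
    return needed - running, running - needed
```

The second return value of `refresh_leaders` is exactly the case I am unsure about: partitions for which no broker reported a leader.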
I don't quite know what to do if I send a TopicMetadataRequest and don't get results for all the partitions I queried. Maybe just retry with an exponential back-off until I get results? Again, is this even possible, or is the ZK partition -> leader mapping always consistent? Thanks!
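For reference, the retry I have in mind looks roughly like the Python sketch below. `refresh_once` is a hypothetical callable that takes the set of partitions still lacking a leader and returns whatever partition -> leader entries it could find (possibly none); the attempt count and delays are arbitrary placeholders, not recommendations. Whether a partial answer can actually occur is the open question.

```python
import time
from typing import Callable, Dict, Set, Tuple

Partition = Tuple[str, int]  # (topic, partition id)
Broker = str                 # "host:port"


def refresh_with_backoff(
    refresh_once: Callable[[Set[Partition]], Dict[Partition, Broker]],
    owned: Set[Partition],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[Partition, Broker]:
    """Retry the metadata refresh, doubling the wait after each incomplete answer."""
    leaders: Dict[Partition, Broker] = {}
    delay = base_delay
    for attempt in range(max_attempts):
        missing = owned - set(leaders)
        # Only ask again for the partitions that are still unresolved.
        leaders.update(refresh_once(missing))
        if owned <= set(leaders):
            return leaders
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential growth, capped
    raise TimeoutError(
        "no leader found for: %s" % sorted(owned - set(leaders))
    )
```

Passing `sleep` in as a parameter is just so the loop can be exercised without real waiting.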