Hey, Guys,
I am using the high level consumer. I have a daemon process that checks the
lag for a topic.
Suppose I have a topic with 5 partitions, and partition 0, 1 has lag of 0,
while the other 3 all have lags. In this case, should I best start 3
threads, or 5 threads to read from this topic again to achieve best
performance?
I am currently using 3 threads in this case, but it seems that each thread
still first try to get hold of partition0, 1 first.(which seems unnecessary
in my case)
Another question is that I am currently using a signal thread to spawn
different thread to read from kafka. So if a topic has 5 partitions, 5
signals will be sent, and 5 different threads will start polling the topic
in the following manner:
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(kafkaTopic, new* Integer(1))*;
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer
.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(kafkaTopic);
KafkaStream<byte[], byte[]> stream = streams.*get(0);*
ConsumerIterator<byte[], byte[]> it = stream.iterator();
As you can see, I specify *Integer(1) to ensure there is only one stream in
the polling thread.*
But in my testing, I am using:
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, *new Integer(numOfPartitions));*
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer
.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
executor = Executors.newFixedThreadPool(*numOfPartitions*);
for (final KafkaStream stream : streams) {
executor.submit(new ConsumerTest(stream, threadNumber,this.targetTopic));
threadNumber++;
}
Are these two methods fundamentally the same, or the later one is
preferred?
Thanks much!
Chen