As mentioned in previous emails, we are also working on a re-implementation of the consumer. I would like to use this email thread to discuss the details of the public API. I would also like us to be picky about the public API now, so that it is as good as possible and we don't need to break it in the future.
The best way to get a feel for the API is to take a look at the javadoc <http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html>. The hope is to get the API docs good enough that they are self-explanatory. You can also take a look at the configs here <http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html>.

Some background info on the implementation:

At a high level, the primary difference in this consumer is that it removes the distinction between the "high-level" and "low-level" consumer. The new consumer API is non-blocking: instead of returning a blocking iterator, the consumer provides a poll() API that returns a list of records. We think this is better than the blocking iterators because it effectively decouples the threading strategy used for processing messages from the consumer. It is worth noting that the consumer is entirely single-threaded and runs in the user thread. The advantage is that it can be easily rewritten in less multi-threading-friendly languages.

The consumer batches data and multiplexes I/O over TCP connections to each of the brokers it communicates with, for high throughput. The consumer also allows long poll to reduce the end-to-end message latency for low-throughput data.

The consumer provides a group management facility that supports the concept of a group with multiple consumer instances (just like the current consumer). This is done through a custom heartbeat and group management protocol that is transparent to the user. At the same time, it gives users the option to subscribe to a fixed set of partitions and not use group management at all.

The offset management strategy defaults to Kafka-based offset management, and the API provides a way for the user to plug in a customized offset store to manage the consumer's offsets.

Another key difference is that this consumer does not depend on ZooKeeper at all.
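
To make the poll() model described above a bit more concrete, here is a minimal sketch of what a poll loop looks like from the caller's side. It is only an illustration under stated assumptions: SketchConsumer, deliver() and drain() are hypothetical names made up for this example and are not part of the proposed KafkaConsumer API, records are plain strings held in memory, and the timeout / long-poll behavior is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory stand-in for a poll-based consumer.
// This is NOT the proposed KafkaConsumer API; it only mimics its shape.
class SketchConsumer {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int maxBatch;

    SketchConsumer(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    // Illustration helper: simulates a record arriving from a broker.
    void deliver(String record) {
        pending.add(record);
    }

    // Returns up to maxBatch records that are already available.
    // It never blocks; an empty list means "nothing right now".
    List<String> poll() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxBatch && !pending.isEmpty()) {
            batch.add(pending.remove());
        }
        return batch;
    }
}

public class PollLoopSketch {
    // The caller owns the loop and the thread: it decides what to do with
    // each batch (process inline, hand off to a pool, and so on).
    static List<String> drain(SketchConsumer consumer) {
        List<String> processed = new ArrayList<>();
        List<String> batch;
        while (!(batch = consumer.poll()).isEmpty()) {
            for (String record : batch) {
                processed.add(record.toUpperCase());
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        SketchConsumer consumer = new SketchConsumer(2);
        consumer.deliver("a");
        consumer.deliver("b");
        consumer.deliver("c");
        System.out.println(drain(consumer)); // prints [A, B, C]
    }
}
```

The point of the sketch is that the consumer itself starts no threads: because poll() simply hands back a list, the application is free to process records in the calling thread or pass each batch to whatever threading model it prefers.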

More details about the new consumer design are here <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design>.

Please take a look at the new API <http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html> and give us any thoughts you may have.

Thanks,
Neha