Hi Jay,

Thanks for taking the time for the details! Appreciate that.
Just clarifying a couple of things along the same lines:

1) The new producer and consumer are being designed to take care of auto-balancing between partitions, right?

2) With the currently available producer and consumer, is my current setup (please see attached file) a good design in terms of scalability?

Thanks,
Krishna

On Wednesday, March 19, 2014, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey Krishna,
>
> Let me clarify the current state of things a little:
>
> 1. Kafka offers a single producer interface as well as two consumer
> interfaces: the low-level "simple consumer", which just directly makes
> network requests, and the higher-level interface, which handles
> fault tolerance, partition assignment, etc. These have been in all
> releases and not too much has changed with them.
>
> 2. Partitioning in the producer is controlled by the key specified with
> the message. This key is used to assign the message to a partition, and
> this is the normal mechanism for balancing load. If no key is specified,
> the producer will connect to a single broker at random and send its
> traffic there (to minimize the number of TCP connections). If you have
> many producers this will also balance traffic, but if you have just one
> it will not, and you will want to specify some partitioning key (you can
> even just use a random number if you like). This behavior has really,
> really confused people and seems to have been a mistake on our part.
>
> In an effort to simplify these interfaces as well as improve a lot of
> other things, we are working on a future replacement producer and
> consumer. The intention is that these will replace the existing clients
> (both the current producer and the simple and high-level consumers).
> This is the KafkaProducer and KafkaConsumer discussion you are referring
> to. These are not yet available; the code is being written right now.
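(For illustration, the keyed-partitioning behavior Jay describes can be sketched roughly as below. This is a minimal Python sketch, not the real Kafka client API; the function name and the CRC32 hash choice are assumptions, shown only to make the "hash the key, mod the partition count" idea and the random-key workaround concrete.)

```python
# Sketch of the partition-assignment behavior described above --
# illustrative only, not the actual Kafka producer code.
import random
import zlib

def choose_partition(key, num_partitions):
    # No key: the 0.8.x producer instead sticks to one randomly chosen
    # broker per metadata refresh; modeled here as one random pick.
    if key is None:
        return random.randrange(num_partitions)
    # Keyed: a stable hash always maps the same key to the same
    # partition, and spreads distinct keys across partitions.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Distinct keys spread across partitions; a random key per message
# approximates round-robin balancing for a single producer.
assignments = [choose_partition("user-%d" % i, 4) for i in range(8)]
```

The point of the random-key trick is that even one producer then spreads its traffic, instead of pinning all of it to a single broker between metadata refreshes.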
> The producer is available in beta form on trunk if you want to try it
> out, but the consumer does not yet exist, so you definitely can't use
> that. :-)
>
> Hope that helps!
>
> Cheers,
>
> -Jay
>
> On Wed, Mar 19, 2014 at 2:33 AM, Krishna Raj
> <reach.krishna...@gmail.com> wrote:
>
>> Hello Experts & Kafka Team,
>>
>> It's exciting to learn and work with Kafka. I have been going through
>> a lot of pages and Q&A.
>>
>> We are building an infrastructure & topology using Kafka for event
>> processing in our application.
>>
>> We need some advice about designing the producer and consumer.
>>
>> *Please find attached a file / the picture below* of the current
>> setup we are thinking of.
>>
>> [image: Inline image 1]
>>
>> *1) Producer:*
>>
>> I understand that from 0.8.1, message balancing is done in a fashion
>> where the producer will choose a partition after every metadata
>> refresh (the default for which is 10 minutes).
>>
>> Questions are:
>>
>> a. *Is there any mechanism other than changing the metadata
>> refresh?* (I understand that implementing the logic with a custom
>> class is no longer supported in 0.8.1.)
>>
>> b. We ultimately want the messages to be evenly distributed across
>> partitions so that the consumers' load is also evenly distributed,
>> which paves the way for scalability, reduces lag, and will help us
>> scale easily, since we can just add a partition with a corresponding
>> consumer node attached to it. Is this advised? And to achieve this,
>> *what is the optimal metadata refresh time without affecting
>> performance?*
>>
>> *2) Consumer*
>>
>> a. I was under the impression that SimpleConsumer has more
>> flexibility and features. But after reading Neha's JavaDoc below, I
>> am liking the KafkaConsumer features and the reduced need to handle
>> things at a granular level.
>> *What is the advised consumer, SimpleConsumer or KafkaConsumer?*
>>
>> Neha's KafkaConsumer JavaDoc:
>> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
>>
>> b. For keeping track of the offset at each consumer node, I am
>> thinking of manually controlling the offset commit (to ensure that
>> processing a message is neither missed nor duplicated). On failure or
>> exception, I would also log the current offset to a file or something
>> before exiting, so that when I start my consumer again I can resume
>> from the offset where I left off. *Is this a good design?*
>>
>> Thanks for the time, and I really appreciate the effort in making
>> Kafka amazing :)
>>
>> Thanks,
>> KR
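(On question 2b, the file-based offset-checkpointing idea could look roughly like this. This is a minimal sketch under my own assumptions: the file name, the single-partition view, and the write-then-rename scheme are all illustrative; a real deployment would keep one checkpoint per topic/partition.)

```python
# Sketch of manual offset checkpointing to a local file, so a
# restarted consumer can resume where it left off. Illustrative only.
import os

CHECKPOINT = "consumer.offset"  # hypothetical checkpoint file name

def load_offset(path=CHECKPOINT):
    """Return the last committed offset, or 0 on first start."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(f.read().strip())

def commit_offset(offset, path=CHECKPOINT):
    """Durably record the offset; write-then-rename keeps the file
    from being left half-written if the process dies mid-commit."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, path)

# Usage: after processing a message at `offset`, commit offset + 1 so
# the next run starts at the first unprocessed message.
last = load_offset()
commit_offset(last + 1)
```

One caveat worth noting: checkpointing after processing gives at-least-once semantics, not exactly-once; if the process dies between processing a message and committing its offset, that message will be reprocessed on restart.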