1. Correct.

2. The producer does not use or depend on ZooKeeper anymore. It refreshes its view of the cluster metadata by sending a TopicMetadataRequest to any of the Kafka brokers. It maps a message to a partition using the following rules:

   2.1 If a message has no key, use any available partition.
   2.2 If a message has a key and the user has defined a custom partitioner, use it to map the key to a partition id.
   2.3 If a message has a key and the user has not defined a custom partitioner, use the default hash-based partitioner that ships with Kafka.
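For illustration only, the three rules above could be sketched roughly as follows. This is not the Kafka producer's actual code: `choose_partition`, `hash_partitioner`, and the `Partitioner` type are hypothetical names, and CRC32 is an assumed stand-in for whatever hash the default partitioner really uses.

```python
import random
import zlib
from typing import Callable, Optional, Sequence

# Hypothetical type: a partitioner maps (key, number of partitions) to a partition id.
Partitioner = Callable[[bytes, int], int]


def hash_partitioner(key: bytes, num_partitions: int) -> int:
    """Stand-in for a default hash-based partitioner.

    CRC32 is an assumption made for this sketch, not necessarily Kafka's hash.
    """
    return zlib.crc32(key) % num_partitions


def choose_partition(
    key: Optional[bytes],
    available_partitions: Sequence[int],
    num_partitions: int,
    custom_partitioner: Optional[Partitioner] = None,
) -> int:
    """Pick the partition id that the producer would put in the produce request."""
    if key is None:
        # Rule 2.1: no key, so any available partition will do.
        return random.choice(list(available_partitions))
    if custom_partitioner is not None:
        # Rule 2.2: key present and a user-defined partitioner is configured.
        return custom_partitioner(key, num_partitions)
    # Rule 2.3: key present, fall back to the default hash-based partitioner.
    return hash_partitioner(key, num_partitions)
```

In this sketch, a call such as `choose_partition(b"user-42", [0, 1, 2, 3], 4)` returns the same partition id every time for that key, while `choose_partition(None, [0, 1, 2, 3], 4)` may return any of the listed ids. Either way the choice is made on the producer side, before the request reaches a broker.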
Thanks,
Neha

On Wed, May 22, 2013 at 1:33 PM, Dave Peterson <dspeter...@tagged.com> wrote:

> Ok, the picture I have in my mind of how things work in 0.8 (from a
> producer's point of view) is as follows:
>
> 1. An application program sends log messages to a producer. Each
>    message is provided as a key/value pair, where the key is chosen
>    by the application and the value is the message contents. By its
>    choice of key, the application may influence or control which
>    partition the message gets sent to.
>
> 2. The producer receives messages as key/value pairs. From talking
>    with zookeeper, it knows the set of available brokers and which
>    partitions each broker has. If the sending application provided a
>    key for a given message, the contents of the key may optionally
>    influence the producer's choice of broker and partition to send the
>    message to, according to some convention understood by both
>    application program and producer.
>
> Is this correct?
>
> Thanks,
> Dave
>
> On Wed, May 22, 2013 at 9:28 AM, Jun Rao <jun...@gmail.com> wrote:
> > Dave,
> >
> > Currently, the broker expects each producer request to specify the
> > exact partition id (-1 is no longer valid). The mapping from a
> > message to a partition is done at the producer client. The producer
> > can choose a random partition (from the existing list of partitions)
> > or deterministically choose a partition based on the key.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, May 21, 2013 at 1:12 PM, Dave Peterson <dspeter...@tagged.com> wrote:
> >
> >> In my case, there is a load balancer between the producers and the
> >> brokers, so I want the behavior described for the Java client (null
> >> key specifies "any partition"). If the Key field of each individual
> >> message specifies the partition to send it to, then I don't
> >> understand the purpose of the 32-bit partition identifier that
> >> precedes each message set in a produce request: what if a produce
> >> request specifies "partition N" for a given message set, and then
> >> each individual message in the set specifies a different partition
> >> in its Key field? Also, the above-mentioned partition identifier is
> >> a 32-bit integer and the Key field of each individual message can
> >> contain data of arbitrary length, which seems inconsistent. Is a
> >> partition identifier a 32-bit integer, or can it be of arbitrary
> >> length?
> >>
> >> Thanks,
> >> Dave
> >>
> >> On Tue, May 21, 2013 at 12:30 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> > Dave,
> >> >
> >> > Colin described the producer behavior of picking the partition for
> >> > a message before it is sent to Kafka broker correctly. However,
> >> > I'm interested in knowing your use case a little better to see why
> >> > you would rather have the broker decide the partition?
> >> >
> >> > Thanks,
> >> > Neha
> >> >
> >> > On Tue, May 21, 2013 at 12:05 PM, Colin Blower <cblo...@barracuda.com> wrote:
> >> >
> >> >> The key is used by the client to decide which partition to send
> >> >> the message to. By the time the client is creating the produce
> >> >> request, it should be known which partition each message is being
> >> >> sent to. I believe Neha described the behavior of the Java client
> >> >> which sends messages with a null key to any partition.
> >> >>
> >> >> The key is described in past tense because of the use case for
> >> >> persisting keys with messages. The key is persisted through the
> >> >> broker so that a consumer knows what key was used to partition
> >> >> the message on the producer side.
> >> >>
> >> >> I don't believe that you can have the broker decide which
> >> >> partition a message goes to.
> >> >>
> >> >> --
> >> >> Colin B.
> >> >>
> >> >> On 05/21/2013 11:48 AM, Dave Peterson wrote:
> >> >> > I'm looking at the document entitled "A Guide to the Kafka
> >> >> > Protocol" located here:
> >> >> >
> >> >> > https://cwiki.apache.org/KAFKA/a-guide-to-the-kafka-protocol.html
> >> >> >
> >> >> > It shows a produce request as containing a number of message
> >> >> > sets, which are grouped first by topic and second by partition
> >> >> > (a 32-bit integer). However, each message in a message set
> >> >> > contains a Key field, which is described as follows:
> >> >> >
> >> >> >     The key is an optional message key that was used for
> >> >> >     partition assignment. The key can be null.
> >> >> >
> >> >> > I notice the use of "was" (past tense) above. That seems to
> >> >> > suggest that the Key field was once used to specify a partition
> >> >> > (at the granularity of each individual message), but the plan
> >> >> > for the future is to instead use the 32-bit partition value
> >> >> > preceding each message set. Is this correct? If so, when I am
> >> >> > creating a produce request for 0.8, what should I use for the
> >> >> > 32-bit partition value, and how does this relate to the Key
> >> >> > field of each individual message? Ideally, I would like to just
> >> >> > send a produce request and let the broker choose the partition.
> >> >> > How do I accomplish this in 0.8, and are there plans to change
> >> >> > this after 0.8?
> >> >> >
> >> >> > Thanks,
> >> >> > Dave
> >> >> >
> >> >> > On Tue, May 21, 2013 at 10:47 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> >> >> No. In 0.8, if you don't specify a key for a message, it is
> >> >> >> sent to any of the available partitions. In other words, the
> >> >> >> partition id is selected on the producer and the server
> >> >> >> doesn't get -1 as the partition id.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Neha
> >> >> >>
> >> >> >> On Tue, May 21, 2013 at 9:54 AM, Dave Peterson <dspeter...@tagged.com> wrote:
> >> >> >>
> >> >> >>> In the version 0.8 wire format for a produce request, does a
> >> >> >>> value of -1 still indicate "use a random partition" as it did
> >> >> >>> for 0.7?
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Dave