Re: produce request wire format question

Neha Narkhede Wed, 22 May 2013 16:30:29 -0700

1. Correct
2. The producer no longer uses or depends on ZooKeeper. It refreshes
its view of the cluster metadata by sending a TopicMetadataRequest to any of
the Kafka brokers. It maps a message to a partition using the following
rules:
2.1 If a message has no key, use any available partition
2.2 If a message has a key and the user has defined a custom partitioner,
use it to map the key to a partition id
2.3 If a message has a key and the user has not defined a custom
partitioner, use the default hash based partitioner that ships with Kafka
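
The three rules above can be sketched roughly as follows. This is only an
illustration in Python, not the actual producer code: the names
choose_partition and default_hash_partitioner are hypothetical, and CRC32 is
an assumed stand-in for whatever hash the default partitioner really uses.

```python
import random
import zlib
from typing import Callable, Optional, Sequence

# A partitioner maps (key, number of partitions) to a partition id.
Partitioner = Callable[[bytes, int], int]


def default_hash_partitioner(key: bytes, num_partitions: int) -> int:
    # Assumption: CRC32 stands in for the real default hash function.
    return zlib.crc32(key) % num_partitions


def choose_partition(
    key: Optional[bytes],
    available_partitions: Sequence[int],
    num_partitions: int,
    custom_partitioner: Optional[Partitioner] = None,
) -> int:
    """Pick a concrete partition id on the producer side."""
    if key is None:
        # Rule 2.1: no key, so any available partition will do.
        return random.choice(list(available_partitions))
    if custom_partitioner is not None:
        # Rule 2.2: a user-defined partitioner maps the key to a partition id.
        return custom_partitioner(key, num_partitions)
    # Rule 2.3: fall back to the default hash-based partitioner.
    return default_hash_partitioner(key, num_partitions)
```

In every branch the result is a concrete partition id, so the produce request
never needs to carry -1.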


Thanks,
Neha


On Wed, May 22, 2013 at 1:33 PM, Dave Peterson <[email protected]> wrote:

> Ok, the picture I have in my mind of how things work in 0.8 (from a
> producer's point of view) is as follows:
>
>     1.  An application program sends log messages to a producer.  Each
>         message is provided as a key/value pair, where the key is chosen
>         by the application and the value is the message contents.  By its
>         choice of key, the application may influence or control which
>         partition the message gets sent to.
>
>     2.  The producer receives messages as key/value pairs.  From talking
>         with zookeeper, it knows the set of available brokers and which
>         partitions each broker has.  If the sending application provided a
>         key for a given message, the contents of the key may optionally
>         influence the producer's choice of broker and partition to send the
>         message to, according to some convention understood by both
>         application program and producer.
>
> Is this correct?
>
> Thanks,
> Dave
>
> On Wed, May 22, 2013 at 9:28 AM, Jun Rao <[email protected]> wrote:
> > Dave,
> >
> > Currently, the broker expects each producer request to specify the exact
> > partition id (-1 is no longer valid). The mapping from a message to a
> > partition is done at the producer client. The producer can choose a
> > random partition (from the existing list of partitions) or
> > deterministically choose a partition based on the key.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, May 21, 2013 at 1:12 PM, Dave Peterson <[email protected]
> >wrote:
> >
> >> In my case, there is a load balancer between the producers and the
> >> brokers, so I want the behavior described for the Java client (null key
> >> specifies "any partition").  If the Key field of each individual message
> >> specifies the partition to send it to, then I don't understand the
> >> purpose of the 32-bit partition identifier that precedes each message
> >> set in a produce request: what if a produce request specifies
> >> "partition N" for a given message set, and then each individual message
> >> in the set specifies a different partition in its Key field?  Also, the
> >> above-mentioned partition identifier is a 32-bit integer and the Key
> >> field of each individual message can contain data of arbitrary length,
> >> which seems inconsistent.  Is a partition identifier a 32-bit integer,
> >> or can it be of arbitrary length?
> >>
> >> Thanks,
> >> Dave
> >>
> >> On Tue, May 21, 2013 at 12:30 PM, Neha Narkhede <
> [email protected]>
> >> wrote:
> >> > Dave,
> >> >
> >> > Colin correctly described how the producer picks the partition for a
> >> > message before it is sent to the Kafka broker. However, I'd like to
> >> > understand your use case a little better to see why you would
> >> > rather have the broker decide the partition.
> >> >
> >> > Thanks,
> >> > Neha
> >> >
> >> >
> >> > On Tue, May 21, 2013 at 12:05 PM, Colin Blower <[email protected]
> >> >wrote:
> >> >
> >> >> The key is used by the client to decide which partition to send the
> >> >> message to. By the time the client is creating the produce request,
> >> >> it should be known which partition each message is being sent to. I
> >> >> believe Neha described the behavior of the Java client which sends
> >> >> messages with a null key to any partition.
> >> >>
> >> >> The key is described in past tense because of the use case for
> >> >> persisting keys with messages. The key is persisted through the
> >> >> broker so that a consumer knows what key was used to partition the
> >> >> message on the producer side.
> >> >>
> >> >> I don't believe that you can have the broker decide which partition a
> >> >> message goes to.
> >> >>
> >> >> --
> >> >> Colin B.
> >> >>
> >> >> On 05/21/2013 11:48 AM, Dave Peterson wrote:
> >> >> > I'm looking at the document entitled "A Guide to the Kafka Protocol"
> >> >> > located here:
> >> >> >
> >> >> > https://cwiki.apache.org/KAFKA/a-guide-to-the-kafka-protocol.html
> >> >> >
> >> >> > It shows a produce request as containing a number of message sets,
> >> >> > which are grouped first by topic and second by partition (a 32-bit
> >> >> > integer).  However, each message in a message set contains a Key
> >> >> > field, which is described as follows:
> >> >> >
> >> >> >     The key is an optional message key that was used for partition
> >> >> >     assignment.
> >> >> >     The key can be null.
> >> >> >
> >> >> > I notice the use of "was" (past tense) above.  That seems to suggest
> >> >> > that the Key field was once used to specify a partition (at the
> >> >> > granularity of each individual message), but the plan for the future
> >> >> > is to instead use the 32-bit partition value preceding each message
> >> >> > set.  Is this correct?  If so, when I am creating a produce request
> >> >> > for 0.8, what should I use for the 32-bit partition value, and how
> >> >> > does this relate to the Key field of each individual message?
> >> >> > Ideally, I would like to just send a produce request and let the
> >> >> > broker choose the partition.  How do I accomplish this in 0.8, and
> >> >> > are there plans to change this after 0.8?
> >> >> >
> >> >> > Thanks,
> >> >> > Dave
> >> >> >
> >> >> > On Tue, May 21, 2013 at 10:47 AM, Neha Narkhede <
> >> [email protected]>
> >> >> wrote:
> >> >> >> No. In 0.8, if you don't specify a key for a message, it is sent
> >> >> >> to any of the available partitions. In other words, the partition
> >> >> >> id is selected on the producer and the server doesn't get -1 as
> >> >> >> the partition id.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Neha
> >> >> >>
> >> >> >>
> >> >> >> On Tue, May 21, 2013 at 9:54 AM, Dave Peterson <
> >> [email protected]
> >> >> >wrote:
> >> >> >>
> >> >> >>> In the version 0.8 wire format for a produce request, does a
> >> >> >>> value of -1 still indicate "use a random partition" as it did
> >> >> >>> for 0.7?
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Dave
> >> >> >>>
> >> >>
> >> >>
> >> >>
> >>
>
