Re: produce request wire format question

Dave Peterson Thu, 23 May 2013 09:43:44 -0700

Ok, thanks for the information.  Looking at the wire format for the
metadata response, I see that the right hand side of the TopicMetadata
production contains a TopicErrorCode, and the right hand side of the
PartitionMetadata production contains a PartitionErrorCode.  Are both
of these 16-bit values?  In general, where it isn't stated explicitly in
the documentation, can I assume that all error codes are 16-bit values?


Thanks,
Dave


On Wed, May 22, 2013 at 4:29 PM, Neha Narkhede <[email protected]> wrote:
> 1. Correct
> 2. The producer does not use or depend on zookeeper anymore. It refreshes
> its view of the cluster metadata by using a TopicMetadataRequest to any of
> the kafka brokers. It maps a message to a partition using the following
> rules -
> 2.1 If a message has no key, use any available partition
> 2.2 If a message has a key and the user has defined a custom partitioner,
> use it to map the key to a partition id
> 2.3 If a message has a key and the user has not defined a custom
> partitioner, use the default hash based partitioner that ships with Kafka
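
The three rules above can be sketched in Python roughly as follows. This is an illustration only, not the Kafka producer's code: `choose_partition`, its parameters, and the CRC32-based default are assumptions made for the example, since the thread does not say which hash the default partitioner uses.

```python
import random
import zlib
from typing import Callable, Optional, Sequence

# Hypothetical signature: a partitioner maps (key, number of partitions)
# to an index into the partition list.
Partitioner = Callable[[bytes, int], int]


def default_hash_partitioner(key: bytes, num_partitions: int) -> int:
    # Stand-in for "the default hash based partitioner"; CRC32 is an
    # assumption chosen for this sketch, not Kafka's actual hash.
    return zlib.crc32(key) % num_partitions


def choose_partition(
    key: Optional[bytes],
    partitions: Sequence[int],
    partitioner: Optional[Partitioner] = None,
) -> int:
    """Pick a concrete partition id on the producer side."""
    if not partitions:
        raise ValueError("no partitions known for this topic")
    if key is None:
        # Rule 2.1: no key, so any available partition will do.
        return random.choice(partitions)
    # Rule 2.2: a user-defined partitioner takes precedence.
    # Rule 2.3: otherwise fall back to the default hash partitioner.
    pick = partitioner if partitioner is not None else default_hash_partitioner
    return partitions[pick(key, len(partitions)) % len(partitions)]
```

Whatever the partitioner, the result is always a real partition id, so the broker never has to interpret a placeholder such as -1.
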
>
> Thanks,
> Neha
>
>
> On Wed, May 22, 2013 at 1:33 PM, Dave Peterson <[email protected]> wrote:
>
>> Ok, the picture I have in my mind of how things work in 0.8 (from a
>> producer's point of view) is as follows:
>>
>>     1.  An application program sends log messages to a producer.  Each
>>         message is provided as a key/value pair, where the key is chosen
>>         by the application and the value is the message contents.  By its
>>         choice of key, the application may influence or control which
>>         partition the message gets sent to.
>>
>>     2.  The producer receives messages as key/value pairs.  From talking
>>         with zookeeper, it knows the set of available brokers and which
>>         partitions each broker has.  If the sending application provided a
>>         key for a given message, the contents of the key may optionally
>>         influence the producer's choice of broker and partition to send the
>>         message to, according to some convention understood by both
>>         application program and producer.
>>
>> Is this correct?
>>
>> Thanks,
>> Dave
>>
>> On Wed, May 22, 2013 at 9:28 AM, Jun Rao <[email protected]> wrote:
>> > Dave,
>> >
>> > Currently, the broker expects each producer request to specify the exact
>> > partition id (-1 is no longer valid). The mapping from a message to a
>> > partition is done at the producer client. The producer can choose a
>> > random partition (from the existing list of partitions) or
>> > deterministically choose a partition based on the key.
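
As a rough illustration of what this means for a client assembling a produce request: every message must already carry a concrete, non-negative partition id before the messages are grouped by topic and then by partition (the grouping described in the protocol guide quoted further down). The function name and tuple layout below are hypothetical, and no wire encoding is attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical in-memory message: (topic, partition id, key, value).
Message = Tuple[str, int, Optional[bytes], bytes]
Grouped = Dict[str, Dict[int, List[Tuple[Optional[bytes], bytes]]]]


def group_for_produce_request(messages: Iterable[Message]) -> Grouped:
    """Group messages first by topic, then by partition id."""
    grouped = defaultdict(lambda: defaultdict(list))
    for topic, partition, key, value in messages:
        if partition < 0:
            # Per the thread, -1 ("pick one for me") is no longer valid.
            raise ValueError(
                f"partition must be chosen by the client, got {partition}"
            )
        grouped[topic][partition].append((key, value))
    # Convert to plain dicts so callers get an ordinary nested mapping.
    return {topic: dict(parts) for topic, parts in grouped.items()}
```
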
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Tue, May 21, 2013 at 1:12 PM, Dave Peterson <[email protected]> wrote:
>> >
>> >> In my case, there is a load balancer between the producers and the
>> >> brokers, so I want the behavior described for the Java client (null key
>> >> specifies "any partition").  If the Key field of each individual message
>> >> specifies the partition to send it to, then I don't understand the purpose
>> >> of the 32-bit partition identifier that precedes each message set in a
>> >> produce request: what if a produce request specifies "partition N" for a
>> >> given message set, and then each individual message in the set
>> >> specifies a different partition in its Key field?  Also, the above-
>> >> mentioned partition identifier is a 32-bit integer and the Key field of
>> >> each individual message can contain data of arbitrary length, which
>> >> seems inconsistent.  Is a partition identifier a 32-bit integer, or can it
>> >> be of arbitrary length?
>> >>
>> >> Thanks,
>> >> Dave
>> >>
>> >> On Tue, May 21, 2013 at 12:30 PM, Neha Narkhede <[email protected]> wrote:
>> >> > Dave,
>> >> >
>> >> > Colin correctly described the producer behavior of picking the
>> >> > partition for a message before it is sent to the Kafka broker.
>> >> > However, I'm interested in knowing your use case a little better, to
>> >> > see why you would rather have the broker decide the partition.
>> >> >
>> >> > Thanks,
>> >> > Neha
>> >> >
>> >> >
>> >> > On Tue, May 21, 2013 at 12:05 PM, Colin Blower <[email protected]> wrote:
>> >> >
>> >> >> The key is used by the client to decide which partition to send the
>> >> >> message to. By the time the client is creating the produce request, it
>> >> >> should be known which partition each message is being sent to. I believe
>> >> >> Neha described the behavior of the Java client which sends messages with
>> >> >> a null key to any partition.
>> >> >>
>> >> >> The key is described in past tense because of the use case for
>> >> >> persisting keys with messages. The key is persisted through the broker
>> >> >> so that a consumer knows what key was used to partition the message on
>> >> >> the producer side.
>> >> >>
>> >> >> I don't believe that you can have the broker decide which partition a
>> >> >> message goes to.
>> >> >>
>> >> >> --
>> >> >> Colin B.
>> >> >>
>> >> >> On 05/21/2013 11:48 AM, Dave Peterson wrote:
>> >> >> > I'm looking at the document entitled "A Guide to the Kafka Protocol"
>> >> >> > located here:
>> >> >> >
>> >> >> > https://cwiki.apache.org/KAFKA/a-guide-to-the-kafka-protocol.html
>> >> >> >
>> >> >> > It shows a produce request as containing a number of message sets,
>> >> >> > which are grouped first by topic and second by partition (a 32-bit
>> >> >> > integer).  However, each message in a message set contains a Key
>> >> >> > field, which is described as follows:
>> >> >> >
>> >> >> >     The key is an optional message key that was used for partition
>> >> >> >     assignment.  The key can be null.
>> >> >> >
>> >> >> > I notice the use of "was" (past tense) above.  That seems to suggest
>> >> >> > that the Key field was once used to specify a partition (at the
>> >> >> > granularity of each individual message), but the plan for the future
>> >> >> > is to instead use the 32-bit partition value preceding each message
>> >> >> > set.  Is this correct?  If so, when I am creating a produce request
>> >> >> > for 0.8, what should I use for the 32-bit partition value, and how
>> >> >> > does this relate to the Key field of each individual message?
>> >> >> > Ideally, I would like to just send a produce request and let the
>> >> >> > broker choose the partition.  How do I accomplish this in 0.8, and are
>> >> >> > there plans to change this after 0.8?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Dave
>> >> >> >
>> >> >> > On Tue, May 21, 2013 at 10:47 AM, Neha Narkhede <[email protected]> wrote:
>> >> >> >> No. In 0.8, if you don't specify a key for a message, it is sent to
>> >> >> >> any of the available partitions. In other words, the partition id is
>> >> >> >> selected on the producer and the server doesn't get -1 as the
>> >> >> >> partition id.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Neha
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, May 21, 2013 at 9:54 AM, Dave Peterson <[email protected]> wrote:
>> >> >> >>
>> >> >> >>> In the version 0.8 wire format for a produce request, does a value
>> >> >> >>> of -1 still indicate "use a random partition" as it did for 0.7?
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Dave
>> >> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >>
>>
