Hi Joe, Drew,

In 0.8 HEAD, if the key is null, the DefaultEventHandler randomly chooses
an available partition and never calls the partitioner.partition(key,
numPartitions) method. This is done in lines 204 to 212 of the GitHub
commit Drew pointed to, though that piece of code is slightly different now
because of KAFKA-1017 and KAFKA-959.
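
To illustrate, here is a minimal sketch of that null-key branch (simplified
names and signatures, not the actual Kafka source):

import scala.util.Random

trait Partitioner {
  def partition(key: Any, numPartitions: Int): Int
}

class DefaultPartitioner extends Partitioner {
  // hash-based choice, roughly what kafka.producer.DefaultPartitioner does
  def partition(key: Any, numPartitions: Int): Int =
    (key.hashCode & Int.MaxValue) % numPartitions
}

// With a null key the partitioner is never consulted; a random
// available partition is chosen instead.
def choosePartition(p: Partitioner, key: Any, numPartitions: Int): Int =
  if (key == null) Random.nextInt(numPartitions)
  else p.partition(key, numPartitions)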

I ran a local test today which showed that, with the DefaultPartitioner and
a null key in the messages, data was appended to multiple partitions. For
this test, I set topic.metadata.refresh.interval.ms to 1 second, because 0.8
HEAD sticks to a single partition for a given
topic.metadata.refresh.interval.ms interval (as is being discussed in the
other e-mail thread on dev@kafka).
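
For reference, the test producer looked roughly like this (a sketch; the
broker address and topic name are placeholders):

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "localhost:9092") // assumed local broker
props.put("serializer.class", "kafka.serializer.StringEncoder")
// re-pick a random partition every second instead of every 10 minutes
props.put("topic.metadata.refresh.interval.ms", "1000")

val producer = new Producer[String, String](new ProducerConfig(props))
for (i <- 1 to 1000)
  // null key: the partitioner is bypassed and a partition is picked at random
  producer.send(new KeyedMessage[String, String]("perfpayload1", null, "msg-" + i))
producer.close()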

Please let me know if you see different results.

Thanks,
Swapnil



On 9/13/13 1:48 PM, "Joe Stein" <crypt...@gmail.com> wrote:

>Isn't this a bug?
>
>I don't see why we would want users to have to write code to generate random
>partition keys just to randomly distribute the data across partitions; that
>is Kafka's job, isn't it?
>
>Or, if supplying a null key is not supported, should we tell the user so
>(throw an exception) in KeyedMessage, like we do for the topic, rather than
>treat null as a key to hash?
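>
>Something like this sketch is what I mean (simplified; only the topic check
>exists today, the null-key guard is the hypothetical part):
>
>case class KeyedMessage[K, V](topic: String, key: K, message: V) {
>  if (topic == null)
>    throw new IllegalArgumentException("Topic cannot be null.")
>  // hypothetical addition for the null-key case:
>  // if (key == null)
>  //   throw new IllegalArgumentException("Key cannot be null.")
>}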
>
>My preference is to put those three lines back in and let the key be null,
>giving folks randomness, unless it's not a bug and there is a good reason
>for the change.
>
>Is there something about
>https://issues.apache.org/jira/browse/KAFKA-691 that requires the lines to
>be taken out? I haven't had a chance to look through it yet.
>
>My thought is that a new person coming in would expect to see the
>partitions filling up in a round-robin fashion, as our docs say. We should
>either force them in the API to know they have to do this themselves, or
>give them the ability for it to happen when passing nothing in.
>
>/*******************************************
> Joe Stein
> Founder, Principal Consultant
> Big Data Open Source Security LLC
> http://www.stealth.ly
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>********************************************/
>
>
>On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <d...@gradientx.com> wrote:
>
>> I ran into this problem as well, Prashant. The handling of a null
>> partition key was recently changed:
>>
>> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
>>
>> It no longer assigns a random partition to data with a null partition
>> key. I had to change my code to generate random partition keys to get the
>> randomly distributed behavior the producer used to have.
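>>
>> Roughly what I ended up doing (a sketch; the helper name is mine):
>>
>> import kafka.producer.KeyedMessage
>>
>> val rnd = new java.util.Random
>> // attach a throwaway random key so the hash partitioner
>> // spreads messages across partitions
>> def randomKeyed(topic: String, payload: String) =
>>   new KeyedMessage[String, String](topic, rnd.nextInt(Int.MaxValue).toString, payload)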
>>
>>
>> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <amasin...@gmail.com>
>> wrote:
>>
>> > Thanks Neha
>> >
>> > I will try applying this property and circle back.
>> >
>> > Also, I have been attempting to execute kafka-producer-perf-test.sh
>> > and I receive the following error:
>> >
>> >        Error: Could not find or load main class
>> > kafka.perf.ProducerPerformance
>> >
>> > I am running against 0.8.0-beta1
>> >
>> > Seems like perf is a separate project in the workspace.
>> >
>> > Does sbt package-assembly bundle the perf jar as well?
>> >
>> > Neither producer-perf-test nor consumer-test is working with this
>> > build.
>> >
>> >
>> >
>> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede
>> > <neha.narkh...@gmail.com> wrote:
>> >
>> > > As Jun suggested, one reason could be that
>> > > topic.metadata.refresh.interval.ms is too high. Did you observe
>> > > whether the distribution improves after
>> > > topic.metadata.refresh.interval.ms has passed?
>> > >
>> > > Thanks
>> > > Neha
>> > >
>> > >
>> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <amasin...@gmail.com>
>> > > wrote:
>> > >
>> > > > I am using Kafka 0.8 ...
>> > > >
>> > > >
>> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <jun...@gmail.com> wrote:
>> > > >
>> > > > > Which revision of 0.8 are you using? In a recent change, the
>> > > > > producer will stick to a partition for
>> > > > > topic.metadata.refresh.interval.ms (defaults to 10 mins) before
>> > > > > picking another partition at random.
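>> > > > >
>> > > > > A simplified sketch of that sticky behavior (not the actual code):
>> > > > >
>> > > > > var cached = -1
>> > > > > var lastPickMs = 0L
>> > > > > def stickyPartition(numPartitions: Int, refreshMs: Long): Int = {
>> > > > >   val now = System.currentTimeMillis
>> > > > >   if (cached < 0 || now - lastPickMs > refreshMs) {
>> > > > >     // re-pick a random partition only after the interval elapses
>> > > > >     cached = scala.util.Random.nextInt(numPartitions)
>> > > > >     lastPickMs = now
>> > > > >   }
>> > > > >   cached
>> > > > > }
>> > > > >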
>> > > > > Thanks,
>> > > > > Jun
>> > > > >
>> > > > >
>> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar
>> > > > > <amasin...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I created a topic with 4 partitions and for some reason the
>> > producer
>> > > is
>> > > > > > pushing only to one partition.
>> > > > > >
>> > > > > > This is consistently happening across all topics that I
>> > > > > > created ...
>> > > > > >
>> > > > > > Is there a specific configuration that I need to apply to
>> > > > > > ensure that load is evenly distributed across all partitions?
>> > > > > >
>> > > > > >
>> > > > > > Group       Topic         Pid  Offset  logSize  Lag  Owner
>> > > > > > perfgroup1  perfpayload1  0    10965   11220    255  perfgroup1_XXXX-0
>> > > > > > perfgroup1  perfpayload1  1    0       0        0    perfgroup1_XXXX-1
>> > > > > > perfgroup1  perfpayload1  2    0       0        0    perfgroup1_XXXXX-2
>> > > > > > perfgroup1  perfpayload1  3    0       0        0    perfgroup1_XXXXX-3
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
