Hi Prashant,

I tried a local test using a very short topic.metadata.refresh.interval.ms
on the producer. The server had two partitions, and data was appended to
both of them. Could you check whether you have set
topic.metadata.refresh.interval.ms on your producer to a very high value?
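
For reference, here is roughly how that property can be set when the
producer is created. This is only a minimal sketch against the 0.8
javaapi producer; the broker address and topic name are placeholders.

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class RefreshIntervalExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; adjust for your environment.
            props.put("metadata.broker.list", "localhost:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Refresh topic metadata (and re-pick the partition for null-keyed
            // messages) every 60 seconds instead of a very large value.
            props.put("topic.metadata.refresh.interval.ms", "60000");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
            producer.close();
        }
    }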

Swapnil

On 9/13/13 8:46 PM, "Jun Rao" <jun...@gmail.com> wrote:

>Without fixing KAFKA-1017, the issue is that each producer will maintain
>min(#partitions, #brokers) socket connections. If you have lots of
>producers, the open file handles on the broker could become an issue.
>
>So, what KAFKA-1017 does is pick a random partition, stick to it for a
>configurable amount of time, and then switch to another random partition.
>This matches the 0.7 behavior when a load balancer is used, and it reduces
>the number of socket connections significantly.
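>
>(Not the actual producer code, just an illustrative sketch of that idea:
>pick a random partition, reuse it until the configured interval elapses,
>then pick again.)
>
>    import java.util.Random;
>
>    // Sketch of "sticky" random partition selection.
>    public class StickyRandomChooser {
>        private final long refreshIntervalMs;
>        private final Random random = new Random();
>        private int currentPartition = -1;
>        private long lastPickMs = 0L;
>
>        public StickyRandomChooser(long refreshIntervalMs) {
>            this.refreshIntervalMs = refreshIntervalMs;
>        }
>
>        public synchronized int partition(int numPartitions) {
>            long now = System.currentTimeMillis();
>            if (currentPartition < 0 || now - lastPickMs >= refreshIntervalMs) {
>                currentPartition = random.nextInt(numPartitions);  // re-pick
>                lastPickMs = now;
>            }
>            return currentPartition;  // stick to it in between
>        }
>    }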
>
>The issue you are reporting seems like a bug though. Which revision in 0.8
>are you using?
>
>Thanks,
>
>Jun
>
>
>On Fri, Sep 13, 2013 at 8:28 PM, prashant amar <amasin...@gmail.com>
>wrote:
>
>> Hi Guozhang, Joe, Drew
>>
>> In our case we have been running for the past 3 weeks, and it has been
>> consistently writing only to the first partition. The rest of the
>> partitions have empty index files.
>>
>> Not sure if I am hitting any issue here.
>>
>> I am using the offset checker as my barometer. I also inspected the
>> folder directly, and it indicates the same.
>>
>> On Friday, September 13, 2013, Guozhang Wang wrote:
>>
>> > Hello Joe,
>> >
>> > The reasons we make the producers produce to a fixed partition for each
>> > metadata-refresh interval are the following:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-1017
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-959
>> >
>> > So, in a word, the randomness is still preserved, but within one
>> > metadata-refresh interval the assignment is fixed.
>> >
>> > I agree that the document should be updated accordingly.
>> >
>> > Guozhang
>> >
>> >
>> > On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <crypt...@gmail.com> wrote:
>> >
>> > > Isn't this a bug?
>> > >
>> > > I don't see why we would want users to have to write code that
>> > > generates random partition keys just to distribute the data randomly
>> > > across partitions; that is Kafka's job, isn't it?
>> > >
>> > > Or, if a null value is supplied, tell the user this is not supported
>> > > (throw an exception) in KeyedMessage, like we do for the topic, rather
>> > > than treating null as a key to hash?
>> > >
>> > > My preference is to put those three lines back in, let the key be
>> > > null, and give folks randomness, unless it's not a bug and there is a
>> > > good reason for it?
>> > >
>> > > Is there something about
>> > > https://issues.apache.org/jira/browse/KAFKA-691 that requires those
>> > > lines to be taken out? I haven't had a chance to look through it yet.
>> > >
>> > > My thought is that a new person coming in would expect to see the
>> > > partitions filling up in a round-robin fashion, as our docs say, unless
>> > > we either force them in the API to know they have to do this themselves
>> > > or give them the ability for this to happen when passing nothing in.
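>> > >
>> > > (For now, the workarounds people seem to be left with are generating
>> > > random keys in application code or plugging in a custom partitioner via
>> > > partitioner.class. A rough sketch of the latter is below, assuming the
>> > > 0.8 Partitioner trait with a partition(key, numPartitions) method and a
>> > > constructor that takes VerifiableProperties; note the partitioner is
>> > > only consulted when the key is non-null.)
>> > >
>> > >     import java.util.Random;
>> > >     import kafka.producer.Partitioner;
>> > >     import kafka.utils.VerifiableProperties;
>> > >
>> > >     // Sketch: spread messages randomly instead of hashing the key.
>> > >     // Register the fully-qualified class name via "partitioner.class".
>> > >     public class RandomPartitioner implements Partitioner<Object> {
>> > >         private final Random random = new Random();
>> > >
>> > >         public RandomPartitioner(VerifiableProperties props) { }
>> > >
>> > >         public int partition(Object key, int numPartitions) {
>> > >             return random.nextInt(numPartitions);
>> > >         }
>> > >     }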
>> > >
>> > > /*******************************************
>> > >  Joe Stein
>> > >  Founder, Principal Consultant
>> > >  Big Data Open Source Security LLC
>> > >  http://www.stealth.ly
>> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> > > ********************************************/
>> > >
>> > >
>> > > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <d...@gradientx.com> wrote:
>> > >
>> > > > I ran into this problem as well, Prashant. The default partition key
>> > > > was recently changed:
>> > > >
>> > > > https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
>> > > >
>> > > > It no longer assigns a random partition to data with a null partition
>> > > > key. I had to change my code to generate random partition keys to get
>> > > > the randomly distributed behavior the producer used to have.
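>> > > >
>> > > > (A rough sketch of that kind of workaround, assuming the 0.8 javaapi
>> > > > producer with string-encoded keys and values; the random key exists
>> > > > only so the default hashing partitioner lands on varying partitions.
>> > > > Broker and topic names are placeholders.)
>> > > >
>> > > >     import java.util.Properties;
>> > > >     import java.util.Random;
>> > > >     import kafka.javaapi.producer.Producer;
>> > > >     import kafka.producer.KeyedMessage;
>> > > >     import kafka.producer.ProducerConfig;
>> > > >
>> > > >     public class RandomKeySender {
>> > > >         public static void main(String[] args) {
>> > > >             Properties props = new Properties();
>> > > >             props.put("metadata.broker.list", "localhost:9092");  // placeholder
>> > > >             props.put("serializer.class", "kafka.serializer.StringEncoder");
>> > > >
>> > > >             Producer<String, String> producer =
>> > > >                 new Producer<String, String>(new ProducerConfig(props));
>> > > >             Random rnd = new Random();
>> > > >             for (int i = 0; i < 100; i++) {
>> > > >                 // Throwaway random key so the default (hashing) partitioner
>> > > >                 // spreads messages instead of taking the null-key path.
>> > > >                 String key = Integer.toString(rnd.nextInt(Integer.MAX_VALUE));
>> > > >                 producer.send(
>> > > >                     new KeyedMessage<String, String>("test-topic", key, "msg-" + i));
>> > > >             }
>> > > >             producer.close();
>> > > >         }
>> > > >     }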
>> > > >
>> > > >
>> > > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <amasin...@gmail.com> wrote:
>> > > >
>> > > > > Thanks Neha
>> > > > >
>> > > > > I will try applying this property and circle back.
>> > > > >
>> > > > > Also, I have been attempting to execute kafka-producer-perf-test.sh,
>> > > > > and I receive the following error:
>> > > > >
>> > > > >        Error: Could not find or load main class kafka.perf.ProducerPerformance
>> > > > >
>> > > > > I am running against 0.8.0-beta1
>> > > > >
>> > > > > Seems like perf is a separate project in the workspace.
>> > > > >
>> > > > > Does sbt package-assembly bundle the perf jar as well?
>> > > > >
>> > > > > Neither producer-perf-test nor consumer-test is working with this
>> > > > > build.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>> > > > >
>> > > > > > As Jun suggested, one reason could be that the
>> > > > > > topic.metadata.refresh.interval.ms is too high. Did you observe
>> > > > > > whether the distribution improves after
>> > > > > > topic.metadata.refresh.interval.ms has passed?
>> > > > > >
>> > > > > > Thanks
>> > > > > > Neha
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <amasin...@gmail.com> wrote:
>> > > > > >
>> > -- Guozhang
>> >
>>
