Swapnil, what do you mean by "I did a local test today that showed that choosing DefaultPartitioner with null key in the messages appended data to multiple partitions"?

Are messages being duplicated across partitions?

-Chetan

On Sat, Sep 14, 2013 at 9:02 PM, Swapnil Ghike <sgh...@linkedin.com> wrote:
> Hi Joe, Drew,
>
> In 0.8 HEAD, if the key is null, the DefaultEventHandler randomly chooses
> an available partition and never calls the partitioner.partition(key,
> numPartitions) method. This is done in lines 204 to 212 of the github
> commit Drew pointed to, though that piece of code is slightly different
> now because of KAFKA-1017 and KAFKA-959.
>
> I did a local test today that showed that choosing DefaultPartitioner
> with null key in the messages appended data to multiple partitions. For
> this test, I set topic.metadata.refresh.interval.ms to 1 second, because
> 0.8 HEAD sticks to a partition within a given
> topic.metadata.refresh.interval.ms window (as is being discussed in the
> other e-mail thread on dev@kafka).
>
> Please let me know if you see different results.
>
> Thanks,
> Swapnil
>
>
> On 9/13/13 1:48 PM, "Joe Stein" <crypt...@gmail.com> wrote:
>
> > Isn't this a bug?
> >
> > I don't see why we would want users to have to code and generate random
> > partition keys to randomly distribute the data to partitions; that is
> > Kafka's job, isn't it?
> >
> > Or, if supplying a null value is not supported, tell the user (throw an
> > exception) in KeyedMessage, like we do for topic, rather than treating
> > null as a key to hash?
> >
> > My preference is to put those three lines back in, let the key be null,
> > and give folks randomness, unless it's not a bug and there is a good
> > reason for it.
> >
> > Is there something about
> > https://issues.apache.org/jira/browse/KAFKA-691 that requires the lines
> > taken out?
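[Editor's note: the dispatch logic Swapnil describes can be sketched in Python as below. This is an illustration of the behavior, not Kafka's actual Scala code; `hash_partition` is a stand-in for the DefaultPartitioner.]

```python
import random

def hash_partition(key, num_partitions):
    """Stand-in for DefaultPartitioner: hash the key modulo partition count."""
    return abs(hash(key)) % num_partitions

def choose_partition(key, available_partitions):
    """Sketch of the 0.8 HEAD DefaultEventHandler dispatch described above."""
    if key is None:
        # Null key: pick a random available partition; the configured
        # partitioner is never called.
        return random.choice(available_partitions)
    return hash_partition(key, len(available_partitions))

partitions = [0, 1, 2, 3]
hits = {choose_partition(None, partitions) for _ in range(1000)}
# With 1000 null-key sends over 4 partitions, data lands on multiple
# partitions with overwhelming probability -- matching Swapnil's test.
print(sorted(hits))
```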
> > I haven't had a chance to look through it yet.
> >
> > My thought is that a new person coming in would expect to see the
> > partitions filling up in a round-robin fashion, as our docs say, unless
> > we force them in the API to know they have to do this themselves, or we
> > give them the ability for this to happen when passing nothing in.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> >
> > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <d...@gradientx.com> wrote:
> >
> > > I ran into this problem as well Prashant. The default partition key
> > > was recently changed:
> > >
> > > https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> > >
> > > It no longer assigns a random partition to data with a null partition
> > > key. I had to change my code to generate random partition keys to get
> > > the randomly distributed behavior the producer used to have.
> > >
> > >
> > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <amasin...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Neha
> > > >
> > > > I will try applying this property and circle back.
> > > >
> > > > Also, I have been attempting to execute kafka-producer-perf-test.sh
> > > > and I receive the following error:
> > > >
> > > > Error: Could not find or load main class
> > > > kafka.perf.ProducerPerformance
> > > >
> > > > I am running against 0.8.0-beta1.
> > > >
> > > > Seems like perf is a separate project in the workspace.
> > > >
> > > > Does sbt package-assembly bundle the perf jar as well?
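[Editor's note: Drew's workaround — generating a random key per message so the default hash partitioner spreads the load — can be sketched as below. This is an illustration in Python, not his actual producer code, and the partition count of 4 is an assumption.]

```python
import random

def hash_partition(key, num_partitions):
    """Stand-in for the default hash partitioner."""
    return abs(hash(key)) % num_partitions

def partition_with_random_key(num_partitions):
    """Drew's workaround: attach a fresh random key to every message so
    that hashing the key spreads messages across partitions, restoring
    the randomly distributed behavior null keys used to provide."""
    key = str(random.random())
    return hash_partition(key, num_partitions)

num_partitions = 4  # assumed, matching the 4-partition topic in this thread
hits = {partition_with_random_key(num_partitions) for _ in range(1000)}
# 1000 random keys over 4 partitions hit every partition with
# overwhelming probability.
print(sorted(hits))
```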
> > > > Neither producer-perf-test nor consumer-test is working with this
> > > > build.
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede
> > > > <neha.narkh...@gmail.com> wrote:
> > > >
> > > > > As Jun suggested, one reason could be that
> > > > > topic.metadata.refresh.interval.ms is too high. Did you observe
> > > > > whether the distribution improves after
> > > > > topic.metadata.refresh.interval.ms has passed?
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > >
> > > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar
> > > > > <amasin...@gmail.com> wrote:
> > > > >
> > > > > > I am using the kafka 0.8 version ...
> > > > > >
> > > > > >
> > > > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <jun...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Which revision of 0.8 are you using? In a recent change, a
> > > > > > > producer will stick to a partition for
> > > > > > > topic.metadata.refresh.interval.ms (defaults to 10 mins)
> > > > > > > before picking another partition at random.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar
> > > > > > > <amasin...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I created a topic with 4 partitions and for some reason the
> > > > > > > > producer is pushing only to one partition.
> > > > > > > >
> > > > > > > > This is consistently happening across all topics that I
> > > > > > > > created ...
> > > > > > > >
> > > > > > > > Is there a specific configuration that I need to apply to
> > > > > > > > ensure that load is evenly distributed across all
> > > > > > > > partitions?
> > > > > > > >
> > > > > > > > Group       Topic         Pid  Offset  logSize  Lag  Owner
> > > > > > > > perfgroup1  perfpayload1  0    10965   11220    255  perfgroup1_XXXX-0
> > > > > > > > perfgroup1  perfpayload1  1    0       0        0    perfgroup1_XXXX-1
> > > > > > > > perfgroup1  perfpayload1  2    0       0        0    perfgroup1_XXXXX-2
> > > > > > > > perfgroup1  perfpayload1  3    0       0        0    perfgroup1_XXXXX-3
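[Editor's note: the sticky behavior Jun describes, which explains why a 4-partition topic can show all of its data on a single partition, can be illustrated with a small simulation. This is a sketch of the idea, not Kafka's code; time is passed in explicitly rather than read from a clock.]

```python
import random

class StickyRandomPartitioner:
    """Sketch of the 0.8 behavior Jun describes: pick a random partition
    and stick to it until topic.metadata.refresh.interval.ms elapses,
    then pick another partition at random."""

    def __init__(self, num_partitions, refresh_interval_ms):
        self.num_partitions = num_partitions
        self.refresh_interval_ms = refresh_interval_ms
        self.current = None
        self.chosen_at_ms = None

    def partition(self, now_ms):
        if (self.current is None
                or now_ms - self.chosen_at_ms >= self.refresh_interval_ms):
            self.current = random.randrange(self.num_partitions)
            self.chosen_at_ms = now_ms
        return self.current

# With the default 10-minute interval, every send inside that window
# lands on one partition, producing skew like the offset output above.
p = StickyRandomPartitioner(num_partitions=4, refresh_interval_ms=600_000)
first = p.partition(now_ms=0)
print(all(p.partition(now_ms=t) == first for t in range(0, 600_000, 60_000)))
```

Lowering `topic.metadata.refresh.interval.ms` (as Swapnil did for his test) shortens the window, so the producer re-picks a partition more often and the distribution evens out sooner.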