Swapnil

What do you mean by "I did a local test today that showed that choosing
DefaultPartitioner with null key in the messages appended data to multiple
partitions"?

Are messages being duplicated across partitions?

-Chetan


On Sat, Sep 14, 2013 at 9:02 PM, Swapnil Ghike <sgh...@linkedin.com> wrote:

> Hi Joe, Drew,
>
> In 0.8 HEAD, if the key is null, the DefaultEventHandler randomly chooses
> an available partition and never calls the partitioner.partition(key,
> numPartitions) method. This is done in lines 204 to 212 of the github
> commit Drew pointed to, though that piece of code is slightly different now
> because of KAFKA-1017 and KAFKA-959.
>
> I did a local test today that showed that choosing DefaultPartitioner with
> a null key in the messages appended data to multiple partitions. For this
> test, I set topic.metadata.refresh.interval.ms to 1 second, because 0.8
> HEAD sticks to a partition for a given topic.metadata.refresh.interval.ms
> (as is being discussed in the other e-mail thread on dev@kafka).
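
The null-key path described above can be sketched as follows. This is a
minimal illustration, not the actual DefaultEventHandler source: the
function name and the leader-availability list are stand-ins.

```python
import random

# Illustrative sketch (not the 0.8 source) of the null-key path: with no
# key, the event handler picks a random partition from those whose leader
# is available, and never calls partitioner.partition(key, numPartitions).
def choose_partition(key, leader_available):
    """leader_available[p] is True if partition p currently has a leader."""
    num_partitions = len(leader_available)
    if key is None:
        available = [p for p in range(num_partitions) if leader_available[p]]
        if not available:
            raise RuntimeError("no partition has an available leader")
        return random.choice(available)
    # Keyed messages still go through the hash-based partitioner.
    return hash(key) % num_partitions

print(choose_partition(None, [True, False, True, True]))  # 0, 2 or 3
```

The point is that the hash partitioner is bypassed entirely for null keys,
so a custom partitioner never even sees those messages.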
>
> Please let me know if you see different results.
>
> Thanks,
> Swapnil
>
>
>
> On 9/13/13 1:48 PM, "Joe Stein" <crypt...@gmail.com> wrote:
>
> >Isn't this a bug?
> >
> >I don't see why we would want users to have to write code that generates
> >random partition keys just to distribute data randomly across partitions.
> >That is Kafka's job, isn't it?
> >
> >Or, if supplying a null key is not supported, tell the user so (throw an
> >exception) in KeyedMessage, like we do for topic, rather than treating
> >null as a key to hash?
> >
> >My preference is to put those three lines back in and let the key be null
> >to give folks randomness, unless it's not a bug and there is a good reason
> >for the change.
> >
> >Is there something about
> >https://issues.apache.org/jira/browse/KAFKA-691 that requires those lines
> >to be taken out? I haven't had a chance to look through it yet.
> >
> >My thought is that a new person coming in would expect to see the
> >partitions filling up in round-robin fashion, as our docs say, unless we
> >either force them in the API to know they have to do this, or give them
> >the ability for it to happen when passing nothing in.
> >
> >/*******************************************
> > Joe Stein
> > Founder, Principal Consultant
> > Big Data Open Source Security LLC
> > http://www.stealth.ly
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >********************************************/
> >
> >
> >On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <d...@gradientx.com> wrote:
> >
> >> I ran into this problem as well Prashant.  The default partition key was
> >> recently changed:
> >>
> >>
> >>
> >>
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> >>
> >> It no longer assigns a random partition to data with a null partition
> >> key. I had to change my code to generate random partition keys to get
> >> the randomly distributed behavior the producer used to have.
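
The workaround Drew describes can be sketched roughly like this. The helper
is hypothetical; it does not model kafka.producer.KeyedMessage itself, only
the idea of attaching a random key so the hash partitioner spreads the data.

```python
import random

# Hypothetical sketch of the workaround: attach a random key to each
# message so the hash-based partitioner distributes data evenly again.
def keyed_with_random_key(topic, payload, num_partitions):
    key = random.randint(0, num_partitions - 1)
    return (topic, key, payload)  # stand-in for a (topic, key, message) triple

msgs = [keyed_with_random_key("perfpayload1", b"data", 4) for _ in range(1000)]
used = {key for _, key, _ in msgs}
print(sorted(used))  # with 1000 messages, almost surely [0, 1, 2, 3]
```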
> >>
> >>
> >> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <amasin...@gmail.com>
> >> wrote:
> >>
> >> > Thanks Neha
> >> >
> >> > I will try applying this property and circle back.
> >> >
> >> > Also, I have been attempting to execute kafka-producer-perf-test.sh and
> >> > I receive the following error:
> >> >
> >> >        Error: Could not find or load main class
> >> >               kafka.perf.ProducerPerformance
> >> >
> >> > I am running against 0.8.0-beta1
> >> >
> >> > Seems like perf is a separate project in the workspace.
> >> >
> >> > Does sbt package-assembly bundle the perf jar as well?
> >> >
> >> > Neither producer-perf-test nor consumer-test is working with this
> >> > build.
> >> >
> >> >
> >> >
> >> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <neha.narkh...@gmail.com>
> >> > wrote:
> >> >
> >> > > As Jun suggested, one reason could be that the
> >> > > topic.metadata.refresh.interval.ms is too high. Did you observe
> >> > > whether the distribution improves after
> >> > > topic.metadata.refresh.interval.ms has passed?
> >> > >
> >> > > Thanks
> >> > > Neha
> >> > >
> >> > >
> >> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <amasin...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I am using Kafka 0.8 ...
> >> > > >
> >> > > >
> >> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <jun...@gmail.com> wrote:
> >> > > >
> >> > > > > Which revision of 0.8 are you using? In a recent change, a
> >> > > > > producer will stick to a partition for
> >> > > > > topic.metadata.refresh.interval.ms (defaults to 10 mins) before
> >> > > > > picking another partition at random.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Jun
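
The sticky behavior Jun describes might look roughly like this. The class
and its names are illustrative assumptions, not Kafka's actual source: the
producer keeps one randomly chosen partition until the metadata refresh
interval elapses, then picks again.

```python
import random
import time

# Illustrative sketch: stick to one random partition until
# topic.metadata.refresh.interval.ms elapses, then re-choose at random.
class StickyPartitionChooser:
    def __init__(self, num_partitions, refresh_interval_ms=600_000):  # 10 min default
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_ms / 1000.0
        self.current = None
        self.chosen_at = 0.0

    def partition(self, now=None):
        now = time.monotonic() if now is None else now
        if self.current is None or now - self.chosen_at >= self.refresh_interval_s:
            self.current = random.randrange(self.num_partitions)
            self.chosen_at = now
        return self.current

chooser = StickyPartitionChooser(num_partitions=4, refresh_interval_ms=1000)
p1 = chooser.partition(now=0.0)
p2 = chooser.partition(now=0.5)   # within the interval: same partition
p3 = chooser.partition(now=1.5)   # interval elapsed: re-chosen at random
print(p1 == p2)  # True
```

This is why a low-traffic producer with a long refresh interval can appear
to write to only one partition, as reported below.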
> >> > > > >
> >> > > > >
> >> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <amasin...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > I created a topic with 4 partitions, and for some reason the
> >> > > > > > producer is pushing only to one partition.
> >> > > > > >
> >> > > > > > This is consistently happening across all topics that I
> >> > > > > > created ...
> >> > > > > >
> >> > > > > > Is there a specific configuration that I need to apply to
> >> > > > > > ensure that load is evenly distributed across all partitions?
> >> > > > > >
> >> > > > > >
> >> > > > > > Group       Topic         Pid  Offset  logSize  Lag  Owner
> >> > > > > > perfgroup1  perfpayload1  0    10965   11220    255  perfgroup1_XXXX-0
> >> > > > > > perfgroup1  perfpayload1  1    0       0        0    perfgroup1_XXXX-1
> >> > > > > > perfgroup1  perfpayload1  2    0       0        0    perfgroup1_XXXXX-2
> >> > > > > > perfgroup1  perfpayload1  3    0       0        0    perfgroup1_XXXXX-3
> >> > > > >
> >> > > >
> >> > >
> >> >
>
>
