I just took a look at this change. I agree with Joe; not to put too fine a point on it, but this is a confusing hack.

Jun, I don't think wanting to minimize the number of TCP connections is going to be a very common need for people with fewer than 10k producers. I also don't think people are going to get very good load balancing out of this, because most people don't have a ton of producers. I think instead we will spend the next year explaining this behavior, which 99% of people will think is a bug (because it is crazy, non-intuitive, and breaks their usage).

Why was this done by adding special default behavior in the null-key case instead of as a partitioner? The argument that the partitioner interface doesn't have sufficient information to choose a partition is not a good argument for hacking changes into the default; it is an argument for *improving* the partitioner interface. The whole point of a partitioner interface is to make it possible to plug in non-standard behavior like this, right?

-Jay
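(To make the "improve the partitioner interface" idea concrete, here is a rough sketch of what an availability-aware partitioner hook might look like. The trait and every name in it are hypothetical illustrations, not the actual 0.8 Partitioner API; the point is only that the event handler could hand the plug-in the list of currently available partitions instead of special-casing null keys itself.)

    import java.util.Random

    // Hypothetical contract: the producer's event handler passes in the partitions
    // that currently have a live leader, so a plug-in can implement random, sticky,
    // or availability-aware policies without any null-key special case.
    trait AvailabilityAwarePartitioner {
      def partition(key: Any, availablePartitions: Seq[Int], numPartitions: Int): Int
    }

    // One possible default: hash real keys, pick a random *available* partition for
    // null keys, and fall back to any partition if none are known to be up.
    class RandomAvailablePartitioner extends AvailabilityAwarePartitioner {
      private val rand = new Random
      def partition(key: Any, availablePartitions: Seq[Int], numPartitions: Int): Int =
        if (key != null) math.abs(key.hashCode) % numPartitions
        else if (availablePartitions.nonEmpty) availablePartitions(rand.nextInt(availablePartitions.size))
        else rand.nextInt(numPartitions)
    }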
On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <jun...@gmail.com> wrote:

> Joe,
>
> Thanks for bringing this up. I want to clarify this a bit.
>
> 1. Currently, the producer-side logic is that if the partitioning key is not provided (i.e., it is null), the partitioner won't be called. We did that because we want to select a random and "available" partition to send messages to, so that if some partitions are temporarily unavailable (because of broker failures), messages can still be sent to other partitions. Doing this in the partitioner is difficult since the partitioner doesn't know which partitions are currently available (the DefaultEventHandler does).
>
> 2. As Joel said, the common use case in production is that there are many more producers than #partitions in a topic. In this case, sticking to a partition for a few minutes is not going to cause too much imbalance in the partitions, and it has the benefit of reducing the # of socket connections. My feeling is that this will benefit most production users. In fact, if one uses a hardware load balancer for producing data in 0.7, it behaves in exactly the same way (a producer will stick to a broker until the reconnect interval is reached).
>
> 3. It is true that if one is testing a topic with more than one partition (which is not the default), this behavior can be a bit weird. However, I think it can be mitigated by running multiple test producer instances.
>
> 4. Someone reported on the mailing list that all data shows up in only one partition after a few weeks. This is clearly not the expected behavior. We can take a closer look to see if this is a real issue.
>
> Do you think these address your concerns?
>
> Thanks,
>
> Jun
>
> On Sat, Sep 14, 2013 at 11:18 AM, Joe Stein <crypt...@gmail.com> wrote:
>
> > How about creating a new class called RandomRefreshPartitioner, copying the DefaultPartitioner code to it, and then reverting the DefaultPartitioner code? I appreciate that this is a one-time burden for folks using the existing 0.8-beta1 who are bumping into KAFKA-1017 in production, since they would have to switch to the RandomRefreshPartitioner, and folks deploying to production would have to consider this property change.
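(Purely as an illustration of that suggestion: a sketch of what an opt-in RandomRefreshPartitioner could look like, keeping the connection-saving "stick to one random partition until a refresh interval expires" behavior for the deployments that want it, while the default goes back to plain random/hashed partitioning. The field and parameter names are made up for this sketch; the actual KAFKA-1017 behavior lives in the producer's DefaultEventHandler rather than in a partitioner, which is exactly the difficulty Jun describes, so a class like this would only work if the producer actually called the partitioner for null keys.)

    import java.util.Random

    // Hypothetical opt-in partitioner: for null keys, pick one random partition and
    // keep using it until refreshMs has elapsed, then pick again. This keeps the
    // number of broker connections low at the cost of short-term imbalance.
    class RandomRefreshPartitioner(refreshMs: Long = 10 * 60 * 1000L) {
      private val rand = new Random
      private var cachedPartition = -1
      private var lastRefresh = 0L

      def partition(key: Any, numPartitions: Int): Int =
        if (key != null) math.abs(key.hashCode) % numPartitions
        else synchronized {
          val now = System.currentTimeMillis
          if (cachedPartition < 0 || now - lastRefresh > refreshMs) {
            cachedPartition = rand.nextInt(numPartitions)   // re-roll after the interval
            lastRefresh = now
          }
          cachedPartition
        }
    }

Something along these lines would be selected through the producer's partitioner.class property like any other custom partitioner, leaving the out-of-the-box default unchanged.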
> > I make this suggestion keeping in mind the new folks coming on board with Kafka: this way, when everyone is in development and testing mode for the first time, their experience matches how it would work in production. In dev/test, when first using Kafka, they won't have so many producers relative to partitions, but they would look to parallelize their consumers, IMHO.
> >
> > The random broker change sounds like maybe a bigger change to make this late in the release cycle, if we can otherwise accommodate both folks trying Kafka for the first time, through their development and testing, and full-blown production deploys.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop
> > ********************************************/
> >
> > On Sep 14, 2013, at 8:17 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > > Thanks for bringing this up - it is definitely an important point to discuss. The underlying issue of KAFKA-1017 was uncovered to some degree by the fact that in our deployment we did not significantly increase the total number of partitions over 0.7 - i.e., in 0.7 we had say four partitions per broker, now we are using (say) eight partitions across the cluster. So with random partitioning every producer would end up connecting to nearly every broker (unlike 0.7, in which we would connect to only one broker within each reconnect interval). In a production-scale deployment that causes the high number of connections that KAFKA-1017 addresses.
> > >
> > > You are right that the fix of sticking to one partition over the metadata refresh interval goes against true consumer parallelism, but this would be the case only if there are few producers. If you have a sizable number of producers, on average all partitions would get uniform volumes of data.
> > >
> > > One tweak to KAFKA-1017 that I think is reasonable would be, instead of sticking to a random partition, to stick to a random broker and send to random partitions within that broker. This would make the behavior closer to 0.7 wrt number of connections, and closer to random partitioning provided the number of partitions per broker is high enough, which is why I mentioned the partition count (in our usage) in 0.7 vs 0.8 above. Thoughts?
> > >
> > > Joel
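(A rough sketch of the tweak Joel describes, just to make it concrete: pick one broker at random and stick to it for the refresh interval, but rotate randomly among that broker's own partitions. All names below, including the leader-lookup map, are hypothetical; this is not code from the actual patch.)

    import java.util.Random

    // Hypothetical helper: stick to one randomly chosen broker per refresh interval,
    // but spread sends randomly across the partitions that broker currently leads.
    class BrokerStickyChooser(refreshMs: Long = 10 * 60 * 1000L) {
      private val rand = new Random
      private var cachedBroker = -1
      private var lastRefresh = 0L

      // partitionsByLeader: brokerId -> partitions currently led by that broker
      // (in the real producer this would come from the cached topic metadata).
      def choosePartition(partitionsByLeader: Map[Int, Seq[Int]]): Int = synchronized {
        require(partitionsByLeader.nonEmpty, "no partitions with a live leader")
        val now = System.currentTimeMillis
        if (!partitionsByLeader.contains(cachedBroker) || now - lastRefresh > refreshMs) {
          val brokers = partitionsByLeader.keys.toIndexedSeq
          cachedBroker = brokers(rand.nextInt(brokers.size))   // re-pick the sticky broker
          lastRefresh = now
        }
        val partitions = partitionsByLeader(cachedBroker)
        partitions(rand.nextInt(partitions.size))              // random partition on that broker
      }
    }

With enough partitions per broker this keeps the per-producer connection count down while still spreading data across that broker's partitions, which is the trade-off Joel is pointing at.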
> > > On Friday, September 13, 2013, Joe Stein wrote:
> > >
> > > > First, let me apologize for not realizing/noticing this until today. One reason I left my last company was not being paid to work on Kafka nor being able to afford any time for a while to work on it. Now in my new gig (just wrapped up my first week, woo hoo), while I am still not "paid to work on Kafka", I can afford some more time for it now, and maybe in 6 months I will be able to hire folks to work on Kafka (with more and more time for myself to work on it too) while we also work on client projects (especially Kafka-based ones).
> > > >
> > > > So, I understand the changes that were made to fix open file handles and make the random pinning be time-based (with a very large default time). Got all that.
> > > >
> > > > But doesn't this completely negate what has been communicated to the community for a very long time, and the expectation they have? I think it does.
> > > >
> > > > The expected functionality for random partitioning is that "this can be done in a round-robin fashion simply to balance load" and that the "producer" does it for you.
> > > >
> > > > Isn't a primary use case for partitions to parallelize consumers? If so, then the expectation would be that all consumers would be getting, in parallel and equally, in a "round robin fashion", the data that was produced for the topic... simply to balance load... with the producer handling it and with the client application not having to do anything. This randomness occurring every 10 minutes can't balance load.
> > > >
> > > > If users are going to work around this anyway (as I would honestly do too), using a pseudo-random key and essentially forcing real randomness simply to balance load to my consumers running in parallel, would we still end up hitting the KAFKA-1017 problem anyway? If not, then why can't we just give users the functionality and put back the 3 lines of code: 1) if(key == null) 2) random.nextInt(numPartitions) 3) else ... If we would bump into KAFKA-1017 by working around it, then we have not really solved the root-cause problem, and we are removing expected functionality for a corner case that might have other workarounds and/or code changes to solve it another way - or am I still not getting something?
> > > >
> > > > Also, I was looking at testRandomPartitioner in AsyncProducerTest and I don't see how this would ever fail: the assertion is always for partitionId == 0, and it should be checking that data is going to different partitions for a topic, right?
> > > >
> > > > Let me know. I think this is an important discussion, and even if it ends up as telling the community to only use one partition (that is all you need, and partitions become our super columns - Apache Cassandra joke, it's funny), then we manage and support it and that is just how it is. But if partitions are a good thing, and having multiple consumers scale in parallel for a single topic is also good, then we have to manage and support that.
> > > >
> > > > /*******************************************
> > > >  Joe Stein
> > > >  Founder, Principal Consultant
> > > >  Big Data Open Source Security LLC
> > > >  http://www.stealth.ly
> > > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > > ********************************************/
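(For completeness, this is the kind of workaround being described: sending each message with a throwaway random key so the default hash partitioner spreads data across all partitions again. Sketched against the 0.8 Scala producer API as I understand it; the broker list and topic name are placeholders. Note that, as the question above points out, doing this re-opens connections to most brokers, which is exactly the situation KAFKA-1017 was trying to avoid.)

    import java.util.{Properties, UUID}
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object RandomKeyWorkaround {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("metadata.broker.list", "broker1:9092")              // placeholder broker list
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        val producer = new Producer[String, String](new ProducerConfig(props))

        // Throwaway random key per message: the default partitioner hashes the key,
        // so every send can land on a different partition (and a different broker).
        for (i <- 1 to 100) {
          val key = UUID.randomUUID().toString
          producer.send(new KeyedMessage[String, String]("my-topic", key, "message-" + i))
        }
        producer.close()
      }
    }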