I would be in favor of that. I agree this is better than 0.7. -Jay
On Tue, Sep 17, 2013 at 10:19 AM, Joel Koshy <[email protected]> wrote:

> I agree that minimizing the number of producer connections (while being a good thing) is really required only in very large production deployments, and the net effect of the existing change is counter-intuitive to users who expect an immediate even distribution across _all_ partitions of the topic.
>
> However, I don't think it is a hack, because it is almost exactly the same behavior as 0.7 in one of its modes. The 0.7 producer (which I think was even more confusing) had three modes:
> i) ZK send
> ii) Config send (a): static list of broker1:port1,broker2:port2,etc.
> iii) Config send (b): static list of a hardwareVIP:VIPport
>
> (i) and (ii) would achieve even distribution. (iii) would effectively select one broker and distribute to partitions on that broker within each reconnect interval. (iii) is very similar to what we now do in 0.8. (Although we stick to one partition during each metadata refresh interval, that could be changed to stick to one broker and distribute across partitions on that broker.)
>
> At the same time, I agree with Joe's suggestion that we should keep the more intuitive pre-KAFKA-1017 behavior as the default and move the change in KAFKA-1017 to a more specific partitioner implementation.
>
> Joel
>
> On Sun, Sep 15, 2013 at 8:44 AM, Jay Kreps <[email protected]> wrote:
>
>> Let me ask another question which I think is more objective. Let's say 100 random, smart infrastructure specialists try Kafka. Of these 100, how many do you believe will
>> 1. say that this behavior is what they expected to happen?
>> 2. be happy with this behavior?
>> I am not being facetious; I am genuinely looking for a numerical estimate. I am trying to figure out if nobody thought about this or if my estimate is just really different. For what it is worth, my estimates are 0 and 5 respectively.
>>
>> This would be fine except that we changed it from the good behavior to the bad behavior to fix an issue that probably only we have.
>>
>> -Jay
>>
>> On Sun, Sep 15, 2013 at 8:37 AM, Jay Kreps <[email protected]> wrote:
>>
>>> I just took a look at this change. I agree with Joe; not to put too fine a point on it, but this is a confusing hack.
>>>
>>> Jun, I don't think wanting to minimize the number of TCP connections is going to be a very common need for people with fewer than 10k producers. I also don't think people are going to get very good load balancing out of this, because most people don't have a ton of producers. I think instead we will spend the next year explaining this behavior, which 99% of people will think is a bug (because it is crazy, non-intuitive, and breaks their usage).
>>>
>>> Why was this done by adding special default behavior in the null key case instead of as a partitioner? The argument that the partitioner interface doesn't have sufficient information to choose a partition is not a good argument for hacking in changes to the default; it is an argument for *improving* the partitioner interface.
>>>
>>> The whole point of a partitioner interface is to make it possible to plug in non-standard behavior like this, right?
>>>
>>> -Jay
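
For illustration, a minimal sketch of what the time-based "sticky" choice could look like if it were expressed through a pluggable partitioner, as Jay suggests, rather than as special null-key handling. This is a sketch only: the trait below is a stand-in for the producer's Partitioner interface, and names such as StickyRandomPartitioner and stickyIntervalMs are invented here.

    import java.util.Random

    // Stand-in for the producer's pluggable partitioner interface (illustrative).
    trait SimplePartitioner {
      def partition(key: Any, numPartitions: Int): Int
    }

    // Picks a random partition and sticks to it until the interval elapses,
    // i.e. the disputed behavior, but as an opt-in partitioner implementation.
    class StickyRandomPartitioner(stickyIntervalMs: Long = 10 * 60 * 1000L)
        extends SimplePartitioner {
      private val rand = new Random
      private var current = -1
      private var lastPickMs = 0L

      override def partition(key: Any, numPartitions: Int): Int = this.synchronized {
        val now = System.currentTimeMillis()
        // Re-roll the partition only when the sticky interval has elapsed
        // or the previous choice is no longer a valid partition id.
        if (current < 0 || current >= numPartitions || now - lastPickMs > stickyIntervalMs) {
          current = rand.nextInt(numPartitions)
          lastPickMs = now
        }
        current
      }
    }
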
>>>
>>> On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <[email protected]> wrote:
>>>
>>>> Joe,
>>>>
>>>> Thanks for bringing this up. I want to clarify this a bit.
>>>>
>>>> 1. Currently, the producer-side logic is that if the partitioning key is not provided (i.e., it is null), the partitioner won't be called. We did that because we want to select a random and "available" partition to send messages to, so that if some partitions are temporarily unavailable (because of broker failures), messages can still be sent to other partitions. Doing this in the partitioner is difficult since the partitioner doesn't know which partitions are currently available (the DefaultEventHandler does).
>>>>
>>>> 2. As Joel said, the common use case in production is that there are many more producers than #partitions in a topic. In this case, sticking to a partition for a few minutes is not going to cause too much imbalance across the partitions and has the benefit of reducing the # of socket connections. My feeling is that this will benefit most production users. In fact, if one uses a hardware load balancer for producing data in 0.7, it behaves in exactly the same way (a producer will stick to a broker until the reconnect interval is reached).
>>>>
>>>> 3. It is true that if one is testing a topic with more than one partition (which is not the default value), this behavior can be a bit weird. However, I think it can be mitigated by running multiple test producer instances.
>>>>
>>>> 4. Someone reported on the mailing list that all data shows up in only one partition after a few weeks. This is clearly not the expected behavior. We can take a closer look to see if this is a real issue.
>>>>
>>>> Do you think these address your concerns?
>>>>
>>>> Thanks,
>>>>
>>>> Jun
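
As a rough illustration of what Jun describes in point 1 (a sketch, not the actual DefaultEventHandler code): with a null key the partitioner is bypassed and a random partition is picked among those that currently have a leader, information the event handler has but the partitioner does not. The types and names below are invented.

    import scala.util.Random

    // Illustrative type; the real partition metadata objects differ.
    case class PartitionInfo(id: Int, leaderAvailable: Boolean)

    def selectPartition(key: Any,
                        partitions: Seq[PartitionInfo],
                        partition: (Any, Int) => Int): Int = {
      if (key == null) {
        // Null key: skip the partitioner and pick randomly among live partitions,
        // so sends keep flowing around temporarily failed brokers.
        val available = partitions.filter(_.leaderAvailable)
        val candidates = if (available.nonEmpty) available else partitions
        candidates(Random.nextInt(candidates.size)).id
      } else {
        // Keyed sends still go through the (pluggable) partitioner.
        partition(key, partitions.size)
      }
    }
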
>>>>
>>>> On Sat, Sep 14, 2013 at 11:18 AM, Joe Stein <[email protected]> wrote:
>>>>
>>>>> How about creating a new class called RandomRefreshPartitioner, copying the DefaultPartitioner code to it, and then reverting the DefaultPartitioner code? I appreciate this is a one-time burden for folks using the existing 0.8-beta1 who bumped into KAFKA-1017 in production, since they would have to switch to the RandomRefreshPartitioner, and when folks deploy to production they will have to consider this property change.
>>>>>
>>>>> I make this suggestion keeping in mind the new folks that come on board with Kafka: when everyone is in development and testing mode for the first time, their experience would match how it would work in production this way. In dev/test, when first using Kafka, they won't have so many producers per partition but would look to parallelize their consumers, IMHO.
>>>>>
>>>>> The random-broker change sounds like maybe a bigger change now, this late in the release cycle, if we can accommodate folks trying Kafka for the first time and through their development and testing along with full-blown production deploys.
>>>>>
>>>>> /*******************************************
>>>>>  Joe Stein
>>>>>  Founder, Principal Consultant
>>>>>  Big Data Open Source Security LLC
>>>>>  http://www.stealth.ly
>>>>>  Twitter: @allthingshadoop
>>>>> ********************************************/
>>>>>
>>>>> On Sep 14, 2013, at 8:17 AM, Joel Koshy <[email protected]> wrote:
>>>>>
>>>>>> Thanks for bringing this up - it is definitely an important point to discuss. The underlying issue of KAFKA-1017 was uncovered to some degree by the fact that in our deployment we did not significantly increase the total number of partitions over 0.7 - i.e., in 0.7 we had (say) four partitions per broker, and now we are using (say) eight partitions across the cluster. So with random partitioning every producer would end up connecting to nearly every broker (unlike 0.7, in which we would connect to only one broker within each reconnect interval). In a production-scale deployment that causes the high number of connections that KAFKA-1017 addresses.
>>>>>>
>>>>>> You are right that the fix of sticking to one partition over the metadata refresh interval goes against true consumer parallelism, but this would be the case only if there are few producers. If you have a sizable number of producers, on average all partitions would get uniform volumes of data.
>>>>>>
>>>>>> One tweak to KAFKA-1017 that I think is reasonable would be, instead of sticking to a random partition, stick to a random broker and send to random partitions within that broker. This would make the behavior closer to 0.7 with respect to the number of connections and random partitioning, provided the number of partitions per broker is high enough, which is why I mentioned the partition count (in our usage) in 0.7 vs 0.8 above. Thoughts?
>>>>>>
>>>>>> Joel
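
A small sketch of the tweak Joel floats here, purely illustrative (the types and names are invented): stick to one randomly chosen broker per refresh interval, but spread sends across the partitions that broker leads, so the connection count stays low while partition balance on that broker improves.

    import scala.util.Random

    // Illustrative type; the real leader metadata differs.
    case class PartitionLeader(partitionId: Int, brokerId: Int)

    class BrokerStickyChooser(refreshIntervalMs: Long = 10 * 60 * 1000L) {
      private var stickyBroker = -1
      private var lastPickMs = 0L

      def choose(partitions: Seq[PartitionLeader]): Int = this.synchronized {
        require(partitions.nonEmpty, "no partitions to choose from")
        val now = System.currentTimeMillis()
        val brokers = partitions.map(_.brokerId).distinct
        // Re-pick the sticky broker when the interval elapses or it disappears
        // from the current metadata.
        if (!brokers.contains(stickyBroker) || now - lastPickMs > refreshIntervalMs) {
          stickyBroker = brokers(Random.nextInt(brokers.size))
          lastPickMs = now
        }
        // Random among the partitions this broker leads: one connection's worth
        // of brokers, but still several partitions to spread over.
        val local = partitions.filter(_.brokerId == stickyBroker)
        local(Random.nextInt(local.size)).partitionId
      }
    }
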
>>>>>>
>>>>>> On Friday, September 13, 2013, Joe Stein wrote:
>>>>>>
>>>>>>> First, let me apologize for not realizing/noticing this until today. One reason I left my last company was not being paid to work on Kafka nor being able to afford any time for a while to work on it. Now, in my new gig (just wrapped up my first week, woo hoo), while I am still not "paid to work on Kafka" I can afford some more time for it, and maybe in 6 months I will be able to hire folks to work on Kafka (with more and more time for myself to work on it too) while we also work on client projects (especially Kafka-based ones).
>>>>>>>
>>>>>>> So, I understand about the changes that were made to fix open file handles and make the random pinning be time-based (with a very large default time). Got all that.
>>>>>>>
>>>>>>> But doesn't this completely negate what has been communicated to the community for a very long time and the expectation they have? I think it does.
>>>>>>>
>>>>>>> The expected functionality for random partitioning is that "This can be done in a round-robin fashion simply to balance load" and that the "producer" does it for you.
>>>>>>>
>>>>>>> Isn't a primary use case for partitions to parallelize consumers? If so, then the expectation would be that all consumers would be getting, in parallel and equally, in a "round robin fashion", the data that was produced for the topic... simply to balance load... with the producer handling it and with the client application not having to do anything. This randomness occurring every 10 minutes can't balance load.
>>>>>>>
>>>>>>> If users are going to work around this anyway (as I would honestly do too) by using a pseudo-semantic random key and essentially forcing real randomness to simply balance load to my consumers running in parallel, would we still end up hitting the KAFKA-1017 problem anyway? If not, then why can't we just give users the functionality and put back the 3 lines of code: 1) if(key == null) 2) random.nextInt(numPartitions) 3) else ... If we would bump into KAFKA-1017 by working around it, then we have not really solved the root-cause problem, and we are removing expected functionality for a corner case that might have other workarounds and/or code changes to solve it another way. Or am I still not getting something?
>>>>>>>
>>>>>>> Also, I was looking at testRandomPartitioner in AsyncProducerTest and I don't see how this would ever fail; the assertion is always for partitionId == 0, and it should be checking that data is going to different partitions for a topic, right?
>>>>>>>
>>>>>>> Let me know. I think this is an important discussion, and even if it ends up as telling the community that one partition is all you need and partitions become our super columns (Apache Cassandra joke, it's funny), then we manage and support it and that is just how it is. But if partitions are a good thing, and having multiple consumers scale in parallel for a single topic is also good, then we have to manage and support that.
>>>>>>>
>>>>>>> /*******************************************
>>>>>>>  Joe Stein
>>>>>>>  Founder, Principal Consultant
>>>>>>>  Big Data Open Source Security LLC
>>>>>>>  http://www.stealth.ly
>>>>>>>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>>>>> ********************************************/
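
On the AsyncProducerTest point above, the shape of the check Joe seems to be asking for would be roughly the following. This is a hypothetical snippet, not the actual test: assert that repeated null-key sends spread over more than one partition, rather than asserting partitionId == 0.

    import scala.util.Random

    // Hypothetical check: with random partitioning, repeated null-key sends
    // should not all land on the same partition.
    val numPartitions = 4
    val hits = (1 to 1000).map(_ => Random.nextInt(numPartitions)).toSet
    assert(hits.size > 1, "expected null-key sends to reach more than one partition")
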
