Hey Ismael, that sounds fair to me. I'm +1. -Jason
On Thu, Dec 8, 2016 at 8:01 AM, Ismael Juma <ism...@juma.me.uk> wrote: > Thanks Onur and Jason. I filed a JIRA to track this: > > https://issues.apache.org/jira/browse/KAFKA-4513 > > My take is that this would be good to have and one could argue that we > should not remove the old consumers until we have it. However, I think we > should still go ahead with the deprecation of the old consumers for the > next release. That will make it clear to existing users that, where > possible, they should start moving to the new consumer (everything will > still work fine). > > Thoughts? > > Ismael > > On Mon, Nov 28, 2016 at 3:07 AM, Jason Gustafson <ja...@confluent.io> > wrote: > > > Onur's suggestion or something like it sounds like it could work. Suppose > > we add some metadata in Zookeeper for consumers which support the > embedded > > KafkaConsumer. Until all members in the group have declared support, the > > consumer will continue use Zk for their partition assignments. But once > all > > members support the embedded consumer, then they will switch to receiving > > their assignments from the embedded KafkaConsumer. So basically upgrading > > to the new consumer first requires that you upgrade the old consumer to > use > > the new consumer's group assignment protocol. Once you've done that, then > > upgrading to the new consumer becomes straightforward. Does that work? > Then > > maybe you don't need to propagate any extra information over the > rebalance > > protocol. > > > > -Jason > > > > On Wed, Nov 23, 2016 at 12:35 AM, Onur Karaman < > > onurkaraman.apa...@gmail.com > > > wrote: > > > > > Some coworkers may have had issues seeing my earlier post so reposting > > > under a different email: > > > > > > So my earlier stated suboptimal migration plans and Joel's idea all > > suffer > > > from either downtime or dual partition ownership and consumption. > > > > > > But I think there's a bigger problem: they assume users are willing to > do > > > the full migration immediately. I'm not convinced that this is > realistic. > > > Some teams may be okay with this (and the earlier stated consequences > of > > > the existing approaches), but others want to "canary" new code. That > is, > > > they want to deploy a single instance of the new code to test the > waters > > > while all the other instances run old code. It's not unreasonable for > > this > > > to span days. In this world, earlier alternatives would have the canary > > > under heavy load since it is the sole new consumer in the group and it > is > > > guaranteed to own every partition the group is interested in. So the > > canary > > > is likely going to look unhealthy and the consumer can fall behind. > > > > > > Here's a not-fully-thought-out idea: > > > Suppose we roll out a ZookeeperConsumerConnector that uses an embedded > > > KafkaConsumer to passively participate in kafka-based coordination > while > > > still participating in zookeeper-based coordination. For now, the > > > ZookeeperConsumerConnectors just uses the partition assignment as > decided > > > in zookeeper. Now suppose an outside KafkaConsumer comes up. > Kafka-based > > > coordination allows arbitrary metadata to get broadcasted to the group. > > > Maybe we can somehow broadcast a flag saying a new consumer is running > > > during this migration. If the KafkaConsumers embedded in the > > > ZookeeperConsumerConnector see this flag, then they can notify the > > > ZookeeperConsumerConnector's fetchers to fetch the partitions defined > by > > > the kafka-based coordination rebalance result. The > > > ZookeeperConsumerConnector's embedded KafkaConsumer's fetchers never > get > > > used at any point in time. > > > > > > The benefits of this approach would be: > > > 1. no downtime > > > 2. minimal window of dual partition ownership > > > 3. even partition distribution upon canary arrival. > > > ZookeeperConsumerConnector instances can claim some partition > ownership, > > so > > > the canaried KafkaConsumer doesn't get overwhelmed by all of the > > > partitions. > > > > > > On Fri, Nov 18, 2016 at 12:54 PM, Onur Karaman < > > > okara...@linkedin.com.invalid> wrote: > > > > > > > So my earlier stated suboptimal migration plans and Joel's idea all > > > suffer > > > > from either downtime or dual partition ownership and consumption. > > > > > > > > But I think there's a bigger problem: they assume users are willing > to > > do > > > > the full migration immediately. I'm not convinced that this is > > realistic. > > > > Some teams may be okay with this (and the earlier stated consequences > > of > > > > the existing approaches), but others want to "canary" new code. That > > is, > > > > they want to deploy a single instance of the new code to test the > > waters > > > > while all the other instances run old code. It's not unreasonable for > > > this > > > > to span days. In this world, earlier alternatives would have the > canary > > > > under heavy load since it is the sole new consumer in the group and > it > > is > > > > guaranteed to own every partition the group is interested in. So the > > > canary > > > > is likely going to look unhealthy and the consumer can fall behind. > > > > > > > > Here's a not-fully-thought-out idea: > > > > Suppose we roll out a ZookeeperConsumerConnector that uses an > embedded > > > > KafkaConsumer to passively participate in kafka-based coordination > > while > > > > still participating in zookeeper-based coordination. For now, the > > > > ZookeeperConsumerConnectors just uses the partition assignment as > > decided > > > > in zookeeper. Now suppose an outside KafkaConsumer comes up. > > Kafka-based > > > > coordination allows arbitrary metadata to get broadcasted to the > group. > > > > Maybe we can somehow broadcast a flag saying a new consumer is > running > > > > during this migration. If the KafkaConsumers embedded in the > > > > ZookeeperConsumerConnector see this flag, then they can notify the > > > > ZookeeperConsumerConnector's fetchers to fetch the partitions defined > > by > > > > the kafka-based coordination rebalance result. The > > > > ZookeeperConsumerConnector's embedded KafkaConsumer's fetchers never > > get > > > > used at any point in time. > > > > > > > > The benefits of this approach would be: > > > > 1. no downtime > > > > 2. minimal window of dual partition ownership > > > > 3. even partition distribution upon canary arrival. > > > > ZookeeperConsumerConnector instances can claim some partition > > ownership, > > > so > > > > the canaried KafkaConsumer doesn't get overwhelmed by all of the > > > > partitions. > > > > > > > > On Thu, Nov 17, 2016 at 9:17 PM, Joel Koshy <jjkosh...@gmail.com> > > wrote: > > > > > > > > > Not sure it is worth doing, but a simple migration approach that > > avoids > > > > > *service* downtime could be as follows: > > > > > > > > > > - Add a “migration mode” to the old consumer that disables its > > > > fetchers > > > > > and disables offset commits. i.e., the consumers rebalance and > own > > > > > partitions but do basically nothing. > > > > > - So assuming the old consumer is already committing offsets to > > > Kafka, > > > > > the process would be: > > > > > - Bounce the consumer group (still on the old consumer) with: > > > > > - Migration mode on > > > > > - consumer.timeout.ms -1 > > > > > - Bounce the consumer group to switch to the new consumer > > > > > - i.e., effectively pause and resume the entire group without > real > > > > > downtime of the services. > > > > > > > > > > > > > > > > > > > > On Thu, Nov 17, 2016 at 7:30 PM, Ismael Juma <ism...@juma.me.uk> > > > wrote: > > > > > > > > > > > Thanks James. I had read your post and was planning to find it in > > > order > > > > > to > > > > > > share it here so you saved me some work. :) > > > > > > > > > > > > Ismael > > > > > > > > > > > > On Fri, Nov 18, 2016 at 3:21 AM, James Cheng < > wushuja...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > Sorry to self-plug, but I wrote a blog post that talks about > > this, > > > > with > > > > > > > respect to mirrormaker. I came to the same 3 solutions that > Onur > > > > > > described. > > > > > > > > > > > > > > https://logallthethings.com/2016/10/07/mirrormaker- > > > > > > > gotchas-when-moving-from-the-old-consumer-to-the-new-consumer/ > < > > > > > > > https://logallthethings.com/2016/10/07/mirrormaker- > > > > > > > gotchas-when-moving-from-the-old-consumer-to-the-new- > consumer/> > > > > > > > > > > > > > > -James > > > > > > > > > > > > > > > On Nov 17, 2016, at 7:37 AM, Ismael Juma <ism...@juma.me.uk> > > > > wrote: > > > > > > > > > > > > > > > > Hi Onur, > > > > > > > > > > > > > > > > It is a good point that there is currently no out of the box > > > > solution > > > > > > for > > > > > > > > migrating from the old consumer to the new consumer where > > neither > > > > > > > downtime > > > > > > > > or duplicate consumption are acceptable. As I understand, > this > > is > > > > > > > important > > > > > > > > for some of the usages at LinkedIn. Do you have any plans to > > > tackle > > > > > > this > > > > > > > > issue? > > > > > > > > > > > > > > > > Jason, any thoughts on this? > > > > > > > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > On Mon, Oct 31, 2016 at 11:03 PM, Onur Karaman < > > > > > > > > okara...@linkedin.com.invalid> wrote: > > > > > > > > > > > > > > > >> Does this make sense given that we still don't have a > graceful > > > > > > migration > > > > > > > >> plan from the old to new consumer? > > > > > > > >> > > > > > > > >> Different suboptimal migration plans that I can think of > are: > > > > > > > >> 1. shutdown all the old consumers of a group first and start > > > them > > > > > back > > > > > > > up > > > > > > > >> with the new consumer, causing downtime. > > > > > > > >> 2. have a mix of old and new consumers in the same group, > > > causing > > > > > > > duplicate > > > > > > > >> partition ownership and consumption as each rebalance > protocol > > > > > ignores > > > > > > > the > > > > > > > >> other. > > > > > > > >> 3. form a brand new group for the new consumers doing the > same > > > > work > > > > > as > > > > > > > the > > > > > > > >> old consumer group, still causing duplicate partition > > ownership > > > > and > > > > > > > >> consumption across the two groups. > > > > > > > >> > > > > > > > >> On Mon, Oct 31, 2016 at 3:42 PM, Jun Rao <j...@confluent.io> > > > > wrote: > > > > > > > >> > > > > > > > >>> Starting to deprecate the old consumer in the next release > > > seems > > > > > > like a > > > > > > > >>> good idea. > > > > > > > >>> > > > > > > > >>> Thanks, > > > > > > > >>> > > > > > > > >>> Jun > > > > > > > >>> > > > > > > > >>> On Tue, Oct 25, 2016 at 2:45 AM, Ismael Juma < > > > ism...@juma.me.uk> > > > > > > > wrote: > > > > > > > >>> > > > > > > > >>>> Hi all, > > > > > > > >>>> > > > > > > > >>>> In 0.10.1.0, we removed the beta label from the new Java > > > > consumer > > > > > > > >>>> documentation and updated the various tools so that they > can > > > use > > > > > the > > > > > > > >> new > > > > > > > >>>> consumer without having to pass the `--new-consumer` flag > > > (more > > > > > > > >>>> specifically the new consumer is used if > `bootstrap-server` > > is > > > > > set). > > > > > > > >> More > > > > > > > >>>> details of the reasoning can be found in the original > > discuss > > > > > > thread: > > > > > > > >>>> http://search-hadoop.com/m/Kafka/uyzND1e4bUP1Rjq721 > > > > > > > >>>> > > > > > > > >>>> The old consumers don't have security or > > `offsetsForTimestamp` > > > > > > > (KIP-79) > > > > > > > >>>> support and the plan is to only add features to the new > Java > > > > > > consumer. > > > > > > > >>> Even > > > > > > > >>>> so, the old consumers are a significant maintenance burden > > as > > > > they > > > > > > > >>>> duplicate protocol request/response classes (the > > > SimpleConsumer > > > > > > > exposes > > > > > > > >>>> them in the public API sadly). I experienced this first > hand > > > > most > > > > > > > >>> recently > > > > > > > >>>> while working on KIP-74. > > > > > > > >>>> > > > > > > > >>>> Given the above, I propose we deprecate the old consumers > in > > > > trunk > > > > > > to > > > > > > > >>> nudge > > > > > > > >>>> users in the right direction. Users will have the 0.10.1.0 > > > cycle > > > > > to > > > > > > > >> start > > > > > > > >>>> migrating to the new Java consumer with no build warnings. > > > Once > > > > > they > > > > > > > >>>> upgrade to the next release (i.e. 0.10.2.0), users who are > > > still > > > > > > using > > > > > > > >>> the > > > > > > > >>>> old consumers will get warnings at build time encouraging > > them > > > > to > > > > > > move > > > > > > > >> to > > > > > > > >>>> the new consumer, but everything will still work fine. > > > > > > > >>>> > > > > > > > >>>> In a future major release, the old consumers (along with > the > > > old > > > > > > > >>> producers) > > > > > > > >>>> will be removed. We will have a separate discuss/vote > thread > > > for > > > > > > that > > > > > > > >> to > > > > > > > >>>> make sure the time is right. > > > > > > > >>>> > > > > > > > >>>> Thoughts? > > > > > > > >>>> > > > > > > > >>>> Ismael > > > > > > > >>>> > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >