Hey Ismael, that sounds fair to me. I'm +1. -Jason
On Thu, Dec 8, 2016 at 8:01 AM, Ismael Juma <ism...@juma.me.uk> wrote:

Thanks Onur and Jason. I filed a JIRA to track this:

https://issues.apache.org/jira/browse/KAFKA-4513

My take is that this would be good to have, and one could argue that we should not remove the old consumers until we have it. However, I think we should still go ahead with the deprecation of the old consumers for the next release. That will make it clear to existing users that, where possible, they should start moving to the new consumer (everything will still work fine).

Thoughts?

Ismael

On Mon, Nov 28, 2016 at 3:07 AM, Jason Gustafson <ja...@confluent.io> wrote:

Onur's suggestion or something like it sounds like it could work. Suppose we add some metadata in Zookeeper for consumers which support the embedded KafkaConsumer. Until all members in the group have declared support, the consumers will continue to use Zookeeper for their partition assignments. But once all members support the embedded consumer, they will switch to receiving their assignments from the embedded KafkaConsumer. So basically, upgrading to the new consumer first requires that you upgrade the old consumer to use the new consumer's group assignment protocol. Once you've done that, upgrading to the new consumer becomes straightforward. Does that work? Then maybe you don't need to propagate any extra information over the rebalance protocol.

-Jason
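A rough sketch of the selection logic on the old-consumer side, just to make the idea concrete. Every type and method name below is hypothetical (nothing like this exists in the code base today, and the real mechanism would need to be worked out in a KIP); only TopicPartition is a real class.

import java.util.List;
import java.util.Set;
import org.apache.kafka.common.TopicPartition;

// Hypothetical metadata that each group member would publish in its ZK registration.
interface GroupMemberMetadata {
    boolean supportsEmbeddedConsumer();                // declared support flag
    Set<TopicPartition> zookeeperAssignment();         // result of the ZK-based rebalance
    Set<TopicPartition> embeddedConsumerAssignment();  // result of the embedded KafkaConsumer's rebalance
}

final class MigrationAssignmentSelector {
    // Switch to the embedded KafkaConsumer's assignment only once every member
    // of the group has declared support for it; otherwise stay on ZooKeeper.
    static Set<TopicPartition> select(GroupMemberMetadata self, List<GroupMemberMetadata> members) {
        boolean allSupport = members.stream()
                .allMatch(GroupMemberMetadata::supportsEmbeddedConsumer);
        return allSupport ? self.embeddedConsumerAssignment() : self.zookeeperAssignment();
    }
}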
On Wed, Nov 23, 2016 at 12:35 AM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:

Some coworkers may have had issues seeing my earlier post, so I'm reposting under a different email:

So my earlier stated suboptimal migration plans and Joel's idea all suffer from either downtime or dual partition ownership and consumption.

But I think there's a bigger problem: they assume users are willing to do the full migration immediately. I'm not convinced that this is realistic. Some teams may be okay with this (and the earlier stated consequences of the existing approaches), but others want to "canary" new code. That is, they want to deploy a single instance of the new code to test the waters while all the other instances run old code. It's not unreasonable for this to span days. In this world, the earlier alternatives would put the canary under heavy load, since it is the sole new consumer in the group and it is guaranteed to own every partition the group is interested in. So the canary is likely going to look unhealthy and the consumer can fall behind.

Here's a not-fully-thought-out idea: Suppose we roll out a ZookeeperConsumerConnector that uses an embedded KafkaConsumer to passively participate in kafka-based coordination while still participating in zookeeper-based coordination. For now, the ZookeeperConsumerConnectors just use the partition assignment as decided in zookeeper. Now suppose an outside KafkaConsumer comes up. Kafka-based coordination allows arbitrary metadata to get broadcast to the group. Maybe we can somehow broadcast a flag saying a new consumer is running during this migration. If the KafkaConsumers embedded in the ZookeeperConsumerConnector see this flag, then they can notify the ZookeeperConsumerConnector's fetchers to fetch the partitions defined by the kafka-based coordination rebalance result. The ZookeeperConsumerConnector's embedded KafkaConsumer's fetchers never get used at any point in time.

The benefits of this approach would be:
1. no downtime
2. minimal window of dual partition ownership
3. even partition distribution upon canary arrival. ZookeeperConsumerConnector instances can claim some partition ownership, so the canaried KafkaConsumer doesn't get overwhelmed by all of the partitions.

On Fri, Nov 18, 2016 at 12:54 PM, Onur Karaman <okara...@linkedin.com.invalid> wrote:

So my earlier stated suboptimal migration plans and Joel's idea all suffer from either downtime or dual partition ownership and consumption.

But I think there's a bigger problem: they assume users are willing to do the full migration immediately. I'm not convinced that this is realistic. Some teams may be okay with this (and the earlier stated consequences of the existing approaches), but others want to "canary" new code. That is, they want to deploy a single instance of the new code to test the waters while all the other instances run old code. It's not unreasonable for this to span days. In this world, the earlier alternatives would put the canary under heavy load, since it is the sole new consumer in the group and it is guaranteed to own every partition the group is interested in. So the canary is likely going to look unhealthy and the consumer can fall behind.

Here's a not-fully-thought-out idea: Suppose we roll out a ZookeeperConsumerConnector that uses an embedded KafkaConsumer to passively participate in kafka-based coordination while still participating in zookeeper-based coordination. For now, the ZookeeperConsumerConnectors just use the partition assignment as decided in zookeeper. Now suppose an outside KafkaConsumer comes up. Kafka-based coordination allows arbitrary metadata to get broadcast to the group. Maybe we can somehow broadcast a flag saying a new consumer is running during this migration. If the KafkaConsumers embedded in the ZookeeperConsumerConnector see this flag, then they can notify the ZookeeperConsumerConnector's fetchers to fetch the partitions defined by the kafka-based coordination rebalance result. The ZookeeperConsumerConnector's embedded KafkaConsumer's fetchers never get used at any point in time.

The benefits of this approach would be:
1. no downtime
2. minimal window of dual partition ownership
3. even partition distribution upon canary arrival. ZookeeperConsumerConnector instances can claim some partition ownership, so the canaried KafkaConsumer doesn't get overwhelmed by all of the partitions.
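To make the hand-off concrete, here is a very rough sketch of the glue that could live inside ZookeeperConsumerConnector, assuming the embedded KafkaConsumer drives a rebalance listener. The fetcher handle, the migration flag, and redirectTo are all made-up names; only ConsumerRebalanceListener and TopicPartition are real Kafka classes.

import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// Hypothetical handle onto the old connector's fetcher threads.
interface ZookeeperConsumerConnectorFetchers {
    void redirectTo(Collection<TopicPartition> partitions);
}

// Hypothetical listener registered on the embedded KafkaConsumer.
class EmbeddedConsumerCoordinator implements ConsumerRebalanceListener {
    private final ZookeeperConsumerConnectorFetchers fetchers;
    private volatile boolean migrationFlagSeen;  // set once the group metadata says a new consumer has joined

    EmbeddedConsumerCoordinator(ZookeeperConsumerConnectorFetchers fetchers) {
        this.fetchers = fetchers;
    }

    void markMigrationFlagSeen() {
        this.migrationFlagSeen = true;
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> assigned) {
        if (migrationFlagSeen) {
            // A new (non-embedded) KafkaConsumer has joined: hand the
            // Kafka-coordinated assignment to the old connector's fetchers.
            fetchers.redirectTo(assigned);
        }
        // Otherwise keep using the ZK-based assignment; the embedded
        // consumer's own fetchers are never used either way.
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> revoked) {
        // No-op in this sketch: offsets keep flowing through the ZK-based
        // path until the migration flag is observed.
    }
}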
On Thu, Nov 17, 2016 at 9:17 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

Not sure it is worth doing, but a simple migration approach that avoids *service* downtime could be as follows:

- Add a "migration mode" to the old consumer that disables its fetchers and disables offset commits. i.e., the consumers rebalance and own partitions but do basically nothing.
- So assuming the old consumer is already committing offsets to Kafka, the process would be:
  - Bounce the consumer group (still on the old consumer) with:
    - Migration mode on
    - consumer.timeout.ms -1
  - Bounce the consumer group to switch to the new consumer
- i.e., effectively pause and resume the entire group without real downtime of the services.
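For illustration, the first bounce might look something like this on the config side. The migration.mode property is made up and does not exist in the old consumer; zookeeper.connect, group.id, offsets.storage and consumer.timeout.ms are real old-consumer configs.

import java.util.Properties;

// Sketch of the config for the first bounce in this proposal.
final class MigrationModeConfig {
    static Properties firstBounce(String zkConnect, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);
        props.put("group.id", groupId);
        props.put("offsets.storage", "kafka");   // group should already be committing offsets to Kafka
        props.put("migration.mode", "true");     // hypothetical: rebalance and own partitions, but do not fetch or commit
        props.put("consumer.timeout.ms", "-1");  // block forever instead of throwing ConsumerTimeoutException while paused
        return props;
    }
}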
form a brand new group for the new consumers doing the > same > > > > work > > > > > as > > > > > > > the > > > > > > > >> old consumer group, still causing duplicate partition > > ownership > > > > and > > > > > > > >> consumption across the two groups. > > > > > > > >> > > > > > > > >> On Mon, Oct 31, 2016 at 3:42 PM, Jun Rao <j...@confluent.io> > > > > wrote: > > > > > > > >> > > > > > > > >>> Starting to deprecate the old consumer in the next release > > > seems > > > > > > like a > > > > > > > >>> good idea. > > > > > > > >>> > > > > > > > >>> Thanks, > > > > > > > >>> > > > > > > > >>> Jun > > > > > > > >>> > > > > > > > >>> On Tue, Oct 25, 2016 at 2:45 AM, Ismael Juma < > > > ism...@juma.me.uk> > > > > > > > wrote: > > > > > > > >>> > > > > > > > >>>> Hi all, > > > > > > > >>>> > > > > > > > >>>> In 0.10.1.0, we removed the beta label from the new Java > > > > consumer > > > > > > > >>>> documentation and updated the various tools so that they > can > > > use > > > > > the > > > > > > > >> new > > > > > > > >>>> consumer without having to pass the `--new-consumer` flag > > > (more > > > > > > > >>>> specifically the new consumer is used if > `bootstrap-server` > > is > > > > > set). > > > > > > > >> More > > > > > > > >>>> details of the reasoning can be found in the original > > discuss > > > > > > thread: > > > > > > > >>>> http://search-hadoop.com/m/Kafka/uyzND1e4bUP1Rjq721 > > > > > > > >>>> > > > > > > > >>>> The old consumers don't have security or > > `offsetsForTimestamp` > > > > > > > (KIP-79) > > > > > > > >>>> support and the plan is to only add features to the new > Java > > > > > > consumer. > > > > > > > >>> Even > > > > > > > >>>> so, the old consumers are a significant maintenance burden > > as > > > > they > > > > > > > >>>> duplicate protocol request/response classes (the > > > SimpleConsumer > > > > > > > exposes > > > > > > > >>>> them in the public API sadly). I experienced this first > hand > > > > most > > > > > > > >>> recently > > > > > > > >>>> while working on KIP-74. > > > > > > > >>>> > > > > > > > >>>> Given the above, I propose we deprecate the old consumers > in > > > > trunk > > > > > > to > > > > > > > >>> nudge > > > > > > > >>>> users in the right direction. Users will have the 0.10.1.0 > > > cycle > > > > > to > > > > > > > >> start > > > > > > > >>>> migrating to the new Java consumer with no build warnings. > > > Once > > > > > they > > > > > > > >>>> upgrade to the next release (i.e. 0.10.2.0), users who are > > > still > > > > > > using > > > > > > > >>> the > > > > > > > >>>> old consumers will get warnings at build time encouraging > > them > > > > to > > > > > > move > > > > > > > >> to > > > > > > > >>>> the new consumer, but everything will still work fine. > > > > > > > >>>> > > > > > > > >>>> In a future major release, the old consumers (along with > the > > > old > > > > > > > >>> producers) > > > > > > > >>>> will be removed. We will have a separate discuss/vote > thread > > > for > > > > > > that > > > > > > > >> to > > > > > > > >>>> make sure the time is right. > > > > > > > >>>> > > > > > > > >>>> Thoughts? > > > > > > > >>>> > > > > > > > >>>> Ismael > > > > > > > >>>> > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >