Regarding (2) - yes that's a good point. @Onur - I think the KIP should
explicitly call this out.
It is something that we did consider and decided against optimizing for;
i.e., we wrote it off as a minor caveat of the upgrade path: there will be
a few duplicates, but not too many, given that we expect the period of
duplicate ownership to be minimal. Although it could be addressed as you
described, doing so adds complexity to an already-rather-complex migration
path. Given that this is a transitional state (i.e., migration), we felt it
would be better and sufficient to keep it only as complex as it needs to be.

On Mon, Feb 20, 2017 at 4:45 PM, Onur Karaman <onurkaraman.apa...@gmail.com>
wrote:

> Regarding 1: We won't lose the offset from zookeeper upon partition
> transfer from OZKCC/MDZKCC to MEZKCC because MEZKCC has
> "dual.commit.enabled" set to true as well as "offsets.storage" set to
> kafka. The combination of these configs results in the consumer fetching
> offsets from both kafka and zookeeper and just picking the greater of the
> two.
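>
> A rough sketch of that reconciliation logic (the helper names below are
> hypothetical and only illustrate the behavior, not the actual
> ZookeeperConsumerConnector internals):
>
>     // MEZKCC config: dual.commit.enabled=true, offsets.storage=kafka
>     // On taking ownership of a partition, resume from the larger of the
>     // two committed offsets.
>     long zkOffset = fetchCommittedOffsetFromZookeeper(groupId, topicAndPartition);
>     long kafkaOffset = fetchCommittedOffsetFromKafka(groupId, topicAndPartition);
>     long resumeOffset = Math.max(zkOffset, kafkaOffset);
>     startFetchingFrom(topicAndPartition, resumeOffset);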
>
> On Mon, Feb 20, 2017 at 4:33 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Onur,
> >
> > Thanks for the well-written KIP! I have two questions below.
> >
> > 1) In the process of migrating from OZKCCs and MDZKCCs to MEZKCCs, we may
> > have a mix of OZKCCs, MDZKCCs and MEZKCCs. OZKCCs and MDZKCCs will only
> > commit offsets to zookeeper, while MEZKCCs will use kafka-based offset
> > storage. Would we lose an offset committed to zookeeper by an MDZKCC if
> > ownership of a partition is transferred from an MDZKCC to an MEZKCC?
> >
> > 2) Suppose every process in the group is running an MEZKCC. Each MEZKCC
> > has a zookeeper-based partition assignment and a kafka-based partition
> > assignment. Is it guaranteed that these two assignments are exactly the
> > same across processes? If not, say the zookeeper-based assignment assigns
> > p1 and p2 to process 1 and p3 to process 2, while the kafka-based
> > assignment assigns p1 and p3 to process 1 and p2 to process 2. If process
> > 1 receives the notification to switch to the kafka-based assignment
> > before process 2 does, is it possible that, for a short period of time,
> > p3 will be consumed by both processes?
> >
> > This period is probably short and I am not sure how many messages may be
> > duplicated as a result. But it seems possible to avoid this completely
> > with an idea that Becket suggested in a previous discussion: the znode
> > /consumers/<group id>/migration/mode can contain a sequence number that
> > increments on each switch. Say the znode is toggled to kafka with
> > sequence number 2. Each MEZKCC commits offsets with the number 2 in the
> > metadata for the partitions it currently owns according to the
> > zookeeper-based assignment, and then periodically fetches the committed
> > offsets and metadata for the partitions it should own according to the
> > kafka-based assignment. Each MEZKCC only starts consuming those
> > partitions once their metadata has reached sequence number 2.
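> >
> > A rough sketch of that handshake (the helper names below are hypothetical
> > and only illustrate the idea; error handling is omitted):
> >
> >     // Read the migration mode znode, e.g. {"mode": "kafka", "sequence": 2}
> >     int seq = readMigrationSequence(zkClient, groupId);
> >
> >     // Step 1: commit offsets for partitions currently owned under the
> >     // zookeeper-based assignment, embedding the sequence number as metadata.
> >     for (TopicAndPartition tp : zkAssignedPartitions) {
> >       commitOffset(tp, currentOffset(tp), Integer.toString(seq));
> >     }
> >
> >     // Step 2: before consuming a partition from the kafka-based assignment,
> >     // wait until its committed metadata carries the same sequence number.
> >     for (TopicAndPartition tp : kafkaAssignedPartitions) {
> >       while (parseSequence(fetchCommitMetadata(tp)) < seq) {
> >         sleepMs(retryBackoffMs);  // hypothetical helper: periodically re-fetch
> >       }
> >       startConsuming(tp);
> >     }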
> >
> > Thanks,
> > Dong
> >
> > On Mon, Feb 20, 2017 at 12:04 PM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> >
> > > Hey everyone.
> > >
> > > I made a KIP that provides a mechanism for migrating from
> > > ZookeeperConsumerConnector to KafkaConsumer as well as a mechanism for
> > > rolling back from KafkaConsumer to ZookeeperConsumerConnector:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-125%3A+ZookeeperConsumerConnector+to+KafkaConsumer+Migration+and+Rollback
> > >
> > > Comments are welcome.
> > >
> > > - Onur
> > >
> >
>
