Hi Scott,

There is nothing preventing a replica running a newer version from being in sync, as long as the upgrade instructions are followed (i.e. inter.broker.protocol.version has to be set correctly and, if there's a message format change, log.message.format.version as well). That's why I asked Yogesh for more details. The upgrade path he mentioned (0.10.0 -> 0.10.2) is straightforward: there is no message format change, so only inter.broker.protocol.version needs to be set.
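To make that concrete, the two-pass procedure from the upgrade docs looks roughly like this in server.properties (a sketch only; the version values assume the 0.10.0 -> 0.10.2 path discussed in this thread):

    # Pass 1: before swapping in the new binaries on each broker, pin the
    # inter-broker protocol to the version the cluster is currently on,
    # then restart the brokers one at a time
    inter.broker.protocol.version=0.10.0

    # Pass 2: once every broker is running the new binaries, bump the
    # protocol version and do a second rolling restart, one broker at a time
    inter.broker.protocol.version=0.10.2

    # log.message.format.version would only need the same staged treatment
    # if the message format changed between the releases, which it does not
    # for 0.10.0 -> 0.10.2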
Ismael

On Mon, Sep 18, 2017 at 5:50 PM, Scott Reynolds <sreyno...@twilio.com.invalid> wrote:

> Can we get some clarity on this point:
>
> > older version leader is not allowing newer version replicas to be in sync,
> > so the data pushed using this older version leader
>
> That is super scary.
>
> What protocol version is the older version leader running?
>
> Would this happen if you are skipping a protocol version bump?
>
> On Mon, Sep 18, 2017 at 9:33 AM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi Yogesh,
> >
> > Can you please clarify what you mean by "observing data loss"?
> >
> > Ismael
> >
> > On Mon, Sep 18, 2017 at 5:08 PM, Yogesh Sangvikar <yogesh.sangvi...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > Please help us find a resolution for the Kafka rolling upgrade issue below.
> > >
> > > Thanks,
> > >
> > > Yogesh
> > >
> > > On Monday, September 18, 2017 at 9:03:04 PM UTC+5:30, Yogesh Sangvikar wrote:
> > >>
> > >> Hi Team,
> > >>
> > >> Currently, we are using a Confluent 3.0.0 Kafka cluster in our production
> > >> environment, and we are planning to upgrade the cluster to Confluent 3.2.2.
> > >> We have topics with millions of records, and data is continuously being
> > >> published to them. We also use other Confluent services (Schema Registry,
> > >> Kafka Connect and Kafka REST) to process the data.
> > >>
> > >> So we can't afford a downtime upgrade of the platform.
> > >>
> > >> We have tried the rolling Kafka upgrade in our development environment as
> > >> suggested in the documentation:
> > >>
> > >> https://docs.confluent.io/3.2.2/upgrade.html
> > >> https://kafka.apache.org/documentation/#upgrade
> > >>
> > >> But we are observing data loss on topics while doing the rolling upgrade /
> > >> restart of the Kafka servers for "inter.broker.protocol.version=0.10.2".
> > >>
> > >> Based on our observations, we suspect the following root cause for the
> > >> data loss (explained for a topic partition with 3 replicas):
> > >>
> > >> - As the broker protocol version is updated from 0.10.0 to 0.10.2 in a
> > >> rolling fashion, the in-sync replicas on the older version will not
> > >> allow the updated (0.10.2) replicas to be in sync until all brokers are
> > >> updated.
> > >> - Also, we have explicitly disabled "unclean.leader.election.enable",
> > >> so only in-sync replicas can be elected as leader for a given partition.
> > >> - While updating in a rolling fashion, as mentioned above, the older
> > >> version leader does not allow the newer version replicas to be in sync,
> > >> so data pushed through this older version leader is not replicated to
> > >> the other replicas. If this leader (older version) then goes down for
> > >> its upgrade, the other updated replicas are shown as in-sync and one of
> > >> them becomes leader, but they lag behind the old leader's offsets and
> > >> only expose the data they had synced so far.
> > >> - And, once the last replica comes back up with the updated version, it
> > >> will start syncing data from the current leader.
> > >>
> > >> Please let us know your comments on our observation and suggest the
> > >> proper way to do a rolling Kafka upgrade, as we can't afford downtime.
> > >>
> > >> Thanks,
> > >> Yogesh
>
> --
> Scott Reynolds
> Principal Engineer
> MOBILE (630) 254-2474
> EMAIL sreyno...@twilio.com
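One extra safeguard worth mentioning for the restarts themselves, regardless of version: have producers wait for replication and confirm nothing is under-replicated before taking down the next broker. A rough sketch (the ZooKeeper address is a placeholder):

    # run between broker restarts; proceed only when no partitions are listed
    bin/kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions

    # producer configuration: require acknowledgement from all in-sync replicas
    acks=all

    # broker/topic configuration: with acks=all, reject writes unless at least
    # 2 replicas are in sync, so a lone leader cannot silently accept data
    min.insync.replicas=2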