Hi Scott,

There is nothing preventing a replica running a newer version from being in sync, as long as the upgrade instructions are followed (i.e. inter.broker.protocol.version has to be set correctly and, if there's a message format change, log.message.format.version as well). That's why I asked Yogesh for more details. The upgrade path he mentioned (0.10.0 -> 0.10.2) is straightforward: there is no message format change, so only inter.broker.protocol.version needs to be set.
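To make that concrete, the two-pass procedure from the upgrade docs looks roughly like this in server.properties (a sketch only; the version values assume the 0.10.0 -> 0.10.2 path discussed in this thread):

    # Pass 1: before swapping in the new binaries on each broker, pin the
    # inter-broker protocol to the version the cluster is currently on,
    # then restart the brokers one at a time
    inter.broker.protocol.version=0.10.0

    # Pass 2: once every broker is running the new binaries, bump the
    # protocol version and do a second rolling restart, one broker at a time
    inter.broker.protocol.version=0.10.2

    # log.message.format.version would only need the same staged treatment
    # if the message format changed between the releases, which it does not
    # for 0.10.0 -> 0.10.2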
Ismael

On Mon, Sep 18, 2017 at 5:50 PM, Scott Reynolds <sreyno...@twilio.com.invalid> wrote:

> Can we get some clarity on this point:
>
> > older version leader is not allowing newer version replicas to be in sync,
> > so the data pushed using this older version leader
>
> That is super scary.
>
> What protocol version is the older version leader running?
>
> Would this happen if you are skipping a protocol version bump?
>
> On Mon, Sep 18, 2017 at 9:33 AM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi Yogesh,
> >
> > Can you please clarify what you mean by "observing data loss"?
> >
> > Ismael
> >
> > On Mon, Sep 18, 2017 at 5:08 PM, Yogesh Sangvikar <yogesh.sangvi...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > Please help us find a resolution for the Kafka rolling upgrade issue below.
> > >
> > > Thanks,
> > >
> > > Yogesh
> > >
> > > On Monday, September 18, 2017 at 9:03:04 PM UTC+5:30, Yogesh Sangvikar wrote:
> > >>
> > >> Hi Team,
> > >>
> > >> Currently, we are using a Confluent 3.0.0 Kafka cluster in our production
> > >> environment, and we are planning to upgrade the cluster to Confluent 3.2.2.
> > >> We have topics with millions of records, and data is continuously being
> > >> published to them. We also use other Confluent services (Schema Registry,
> > >> Kafka Connect and Kafka REST) to process the data.
> > >>
> > >> So we can't afford a downtime upgrade of the platform.
> > >>
> > >> We have tried the rolling Kafka upgrade in our development environment as
> > >> suggested in the documentation:
> > >>
> > >> https://docs.confluent.io/3.2.2/upgrade.html
> > >> https://kafka.apache.org/documentation/#upgrade
> > >>
> > >> But we are observing data loss on topics while doing the rolling upgrade /
> > >> restart of the Kafka servers for "inter.broker.protocol.version=0.10.2".
> > >>
> > >> Based on our observations, we suspect the following root cause for the
> > >> data loss (explained for a topic partition with 3 replicas):
> > >>
> > >> - As the broker protocol version is updated from 0.10.0 to 0.10.2 in a
> > >> rolling fashion, the in-sync replicas on the older version will not
> > >> allow the updated (0.10.2) replicas to be in sync until all brokers are
> > >> updated.
> > >> - Also, we have explicitly disabled "unclean.leader.election.enable",
> > >> so only in-sync replicas can be elected as leader for a given partition.
> > >> - While updating in a rolling fashion, as mentioned above, the older
> > >> version leader does not allow the newer version replicas to be in sync,
> > >> so data pushed through this older version leader is not replicated to
> > >> the other replicas. If this leader (older version) then goes down for
> > >> its upgrade, the other updated replicas are shown as in-sync and one of
> > >> them becomes leader, but they lag behind the old leader's offsets and
> > >> only expose the data they had synced so far.
> > >> - And, once the last replica comes back up with the updated version, it
> > >> will start syncing data from the current leader.
> > >>
> > >> Please let us know your comments on our observation and suggest the
> > >> proper way to do a rolling Kafka upgrade, as we can't afford downtime.
> > >>
> > >> Thanks,
> > >> Yogesh
>
> --
> Scott Reynolds
> Principal Engineer
> MOBILE (630) 254-2474
> EMAIL sreyno...@twilio.com
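One extra safeguard worth mentioning for the restarts themselves, regardless of version: have producers wait for replication and confirm nothing is under-replicated before taking down the next broker. A rough sketch (the ZooKeeper address is a placeholder):

    # run between broker restarts; proceed only when no partitions are listed
    bin/kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions

    # producer configuration: require acknowledgement from all in-sync replicas
    acks=all

    # broker/topic configuration: with acks=all, reject writes unless at least
    # 2 replicas are in sync, so a lone leader cannot silently accept data
    min.insync.replicas=2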