Oops, I linked to the wrong ticket; this is the one we hit: https://issues.apache.org/jira/browse/KAFKA-3042

On Wed, Apr 19, 2017 at 1:45 PM, Jeff Widman <j...@netskope.com> wrote:

*As Onur explained, if ZK is down, Kafka can still work, but won't be able to react to actual broker failures until ZK is up again. So if a broker is down in that window, some of the partitions may not be ready for read or write.*

We had a production scenario where ZK had a long GC pause and Kafka temporarily lost its connection. The brokers kept serving data just fine for existing topics. However, when ZK came back, the Kafka cluster could not recover gracefully because of this issue: https://issues.apache.org/jira/browse/KAFKA-2729

IIRC, in our case the mismatched cached data was the controller generation: the generation IDs recorded in ZooKeeper for the partition leaders did not match the generation ID listed in the controller znode.

Manually forcing a controller re-election solved this because it brought all generation IDs back in sync. However, it would have been nice if the cluster had been able to do the controller re-election automatically, without waiting for manual intervention.

On Wed, Apr 19, 2017 at 1:30 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Shri,

As Onur explained, if ZK is down, Kafka can still work, but won't be able to react to actual broker failures until ZK is up again. So if a broker is down in that window, some of the partitions may not be ready for read or write.

As for the duplicates in the consumer, Hans had a good point. It would be useful to see whether the duplicates are introduced by the producer or the consumer. Perhaps you can read the log again and see if the duplicates are in the log in the first place. Note that broker retries can introduce duplicates.

Hi, Onur,

For the data loss issue that you mentioned, that should only happen with acks=1. As we discussed offline, if acks=all is used and unclean leader election is disabled, acked messages shouldn't be lost.

Thanks,

Jun

On Wed, Apr 19, 2017 at 10:19 AM, Onur Karaman <onurkaraman.apa...@gmail.com> wrote:

If this is what I think it is, it has nothing to do with acks, max.in.flight.requests.per.connection, or anything client-side, and is purely about the Kafka cluster.

Here's a simple example involving a single ZooKeeper instance, 3 brokers, a KafkaConsumer and a KafkaProducer (neither of these clients interacts with ZooKeeper).

1. start up zookeeper:
> ./bin/zookeeper-server-start.sh config/zookeeper.properties

2. start up some brokers:
> ./bin/kafka-server-start.sh config/server0.properties
> ./bin/kafka-server-start.sh config/server1.properties
> ./bin/kafka-server-start.sh config/server2.properties

3. create a topic:
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t --partitions 1 --replication-factor 3

4. start a console consumer (this needs to happen before step 5 so we can write __consumer_offsets metadata to zookeeper):
> ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9090,localhost:9091,localhost:9092 --topic t

5. kill zookeeper

6. start a console producer and produce some messages:
> ./bin/kafka-console-producer.sh --broker-list localhost:9090,localhost:9091,localhost:9092 --topic t

7. notice the size of the broker logs grow with each message you send:
> ls -l /tmp/kafka-logs*/t-0

8. notice the consumer consuming the messages being produced
Basically, ZooKeeper can be completely offline and your brokers will append to their logs and process client requests just fine, as long as they don't need to interact with ZooKeeper. Today, the only way a broker knows to stop accepting requests is when it receives instruction from the controller.

I first realized this last July when debugging a small production data loss scenario that was a result of this[1]. Maybe this is an attempt at leaning towards availability over consistency. Personally I think brokers should stop accepting requests when they disconnect from ZooKeeper.

[1] The small production data loss scenario happens when accepting requests during the small window between a broker's ZooKeeper session expiration and the moment the controller instructs the broker to stop accepting requests. During this time, the broker still thinks it leads partitions that are currently being led by another broker, effectively resulting in a window where the partition is led by two brokers. Clients can continue sending requests to the old leader, and for producers with low acknowledgement settings (like acks=1), their messages will be lost without the client knowing, as the messages are appended to the phantom leader's logs instead of the true leader's logs.
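If you want to reproduce steps 4 and 6 above programmatically instead of with the console tools, a minimal sketch with the plain Java clients looks roughly like the following (topic name and ports are taken from the example above; the group id and everything else are illustrative). The point is that neither client is ever given a ZooKeeper address, only broker bootstrap servers:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ZkDownDemo {
    static final String BOOTSTRAP = "localhost:9090,localhost:9091,localhost:9092";

    public static void main(String[] args) throws Exception {
        // Producer config: only broker addresses, no ZooKeeper anywhere.
        Properties p = new Properties();
        p.put("bootstrap.servers", BOOTSTRAP);
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Consumer config: likewise, only broker addresses.
        Properties c = new Properties();
        c.put("bootstrap.servers", BOOTSTRAP);
        c.put("group.id", "zk-down-demo"); // illustrative group id
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("t"));
            for (int i = 0; i < 10; i++) {
                // Sends keep succeeding even while ZooKeeper is down (step 6).
                producer.send(new ProducerRecord<>("t", "msg-" + i)).get();
                // And the consumer keeps receiving them (step 8).
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> r : records) {
                    System.out.println(r.offset() + ": " + r.value());
                }
            }
        }
    }
}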
On Wed, Apr 19, 2017 at 7:56 AM, Shrikant Patel <spa...@pdxinc.com> wrote:

While we were testing, our producer had the following configuration: max.in.flight.requests.per.connection=1, acks=all and retries=3.

The entire producer-side setup is below. The consumer uses manual offset commit; it commits the offset after it has successfully processed the message.

Producer settings
bootstrap.servers= {point to the F5 VS fronting the Kafka cluster}
key.serializer= {appropriate value as per your case}
value.serializer= {appropriate value as per your case}
acks=all
retries=3
ssl.key.password= {appropriate value as per your case}
ssl.keystore.location= {appropriate value as per your case}
ssl.keystore.password= {appropriate value as per your case}
ssl.truststore.location= {appropriate value as per your case}
ssl.truststore.password= {appropriate value as per your case}
batch.size=16384
client.id= {appropriate value as per your case, may help with debugging}
max.block.ms=65000
request.timeout.ms=30000
security.protocol=SSL
ssl.enabled.protocols=TLSv1.2
ssl.keystore.type=JKS
ssl.protocol=TLSv1.2
ssl.truststore.type=JKS
max.in.flight.requests.per.connection=1
metadata.fetch.timeout.ms=60000
reconnect.backoff.ms=1000
retry.backoff.ms=1000
max.request.size=1048576
linger.ms=0

Consumer settings
bootstrap.servers= {point to the F5 VS fronting the Kafka cluster}
key.deserializer= {appropriate value as per your case}
value.deserializer= {appropriate value as per your case}
group.id= {appropriate value as per your case}
ssl.key.password= {appropriate value as per your case}
ssl.keystore.location= {appropriate value as per your case}
ssl.keystore.password= {appropriate value as per your case}
ssl.truststore.location= {appropriate value as per your case}
ssl.truststore.password= {appropriate value as per your case}
enable.auto.commit=false
security.protocol=SSL
ssl.enabled.protocols=TLSv1.2
ssl.keystore.type=JKS
ssl.protocol=TLSv1.2
ssl.truststore.type=JKS
client.id= {appropriate value as per your case, may help with debugging}
reconnect.backoff.ms=1000
retry.backoff.ms=1000
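For clarity, the consume-process-commit loop is essentially the following (a simplified sketch with placeholder broker address, topic and processing, and without the SSL settings; not our exact code):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "f5-vs.example.com:9093"); // placeholder for the F5 VS
        props.put("group.id", "my-group");                        // placeholder group id
        props.put("enable.auto.commit", "false");                 // offsets are committed manually
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // business logic; commit only happens after this succeeds
                }
                if (!records.isEmpty()) {
                    consumer.commitSync(); // commit offsets only after successful processing
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.key() + " -> " + record.value());
    }
}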
Thanks,
Shri

-----Original Message-----
From: Hans Jespersen [mailto:h...@confluent.io]
Sent: Tuesday, April 18, 2017 7:57 PM
To: users@kafka.apache.org
Subject: [EXTERNAL] Re: ZK and Kafka failover testing

When you publish, is acks=0, 1 or all (-1)?
What is max.in.flight.requests.per.connection (the default is 5)?

It sounds to me like your publishers are using acks=0, so they are not actually succeeding in publishing (i.e. you are getting no acks), but they will retry over and over and can have up to 5 requests in flight, so when the broker comes back up you are getting 4 or 5 copies of the same message.

Try setting max.in.flight.requests.per.connection=1 to get rid of the duplicates. Try setting acks=all to ensure the messages are being persisted by the leader and all the available replicas in the kafka cluster.
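In config terms, that is roughly the following. This is just an illustrative snippet (broker addresses, topic and serializers are placeholders); adjust it to your own environment:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SafeProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                                // wait for the leader and the in-sync replicas
        props.put("retries", "3");                               // retry transient failures
        props.put("max.in.flight.requests.per.connection", "1"); // only one request in flight, so retries can't overlap or reorder

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Blocking on the returned future surfaces any failure to get the required acks.
            producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
        }
    }
}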
-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Tue, Apr 18, 2017 at 4:10 PM, Shrikant Patel <spa...@pdxinc.com> wrote:

Hi All,

I am seeing strange behavior between ZK and Kafka. We have 5 nodes in each of the ZK and Kafka clusters.

Kafka version - 2.11-0.10.1.1

min.insync.replicas is 3, replication.factor is 5 for all topics, and unclean.leader.election.enable is false. We have 15 partitions for each topic.

These are the steps we are following in our testing.

* My understanding is that ZK needs at least 3 out of 5 servers to be functional, and Kafka cannot be functional without ZooKeeper. In our testing, we bring down 3 ZK nodes and don't touch the Kafka nodes. Kafka is still functional; consumers/producers can still consume/publish from the Kafka cluster. We then bring down all ZK nodes, and the Kafka consumers/producers are still functional. I am not able to understand why the Kafka cluster does not fail as soon as the majority of ZK nodes are down. I do see errors in Kafka that it cannot connect to the ZK cluster.

* With all or a majority of the ZK nodes down, we bring down 1 Kafka node (out of 5, so 4 are running). At that point the consumers and producers start failing. My guess is that a new leader election cannot happen without ZK.

* Then we bring the majority of the ZK nodes back up (the 1st Kafka node is still down). Now the Kafka cluster becomes functional; consumers and producers start working again. But the consumer sees a big chunk of messages from Kafka, and many of them are duplicates. It's as if these messages were held up somewhere, but where or why I don't know. And why the duplicates? I can understand a few duplicates for messages that the consumer had not committed before the 1st node went down, but why so many duplicates, something like 4 copies of each message? I cannot understand this behavior.

Appreciate some insight into our issues. Also, if there are blogs that describe ZK and Kafka failover scenario behaviors, those would be extremely helpful.

Thanks,
Shri
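P.S. To narrow down the duplicates question, one way to check whether the extra copies are actually stored in the topic (broker-side duplicates, e.g. from producer retries) rather than only re-delivered to the consumer (re-consumption after the failover) is to re-read the topic from the beginning and count repeats. A rough sketch only; the broker address, topic and group id are placeholders, and counting by message value assumes values are unique per message:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DuplicateCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "duplicate-check");          // throwaway group id
        props.put("enable.auto.commit", "false");
        props.put("auto.offset.reset", "earliest");        // read the topic from the beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, Integer> counts = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            int idlePolls = 0;
            while (idlePolls < 5) {                         // stop after a few consecutive empty polls
                ConsumerRecords<String, String> records = consumer.poll(1000);
                idlePolls = records.isEmpty() ? idlePolls + 1 : 0;
                for (ConsumerRecord<String, String> r : records) {
                    counts.merge(r.value(), 1, Integer::sum); // count identical payloads
                }
            }
        }
        counts.forEach((value, count) -> {
            if (count > 1) {
                System.out.println(count + "x " + value);   // values stored more than once in the log
            }
        });
    }
}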