Hi,

We are running Apache Kafka 2.7.1 on AWS EC2 with ZooKeeper running in the
same VPC (private network).

Since version 2.7.0 we observe the subj. issue with the first start of the
Kafka process when we rotate the EC2 instances (for the sake of software
upgrade).
Our supervisor script notices the failure and tries to start it again in a
few seconds, which has always been successful so far.

Sample logs from the first start:

 2021-07-08 15:52:26.414 INFO starting (kafka.server.KafkaServer)
 2021-07-08 15:52:26.415 INFO Connecting to zookeeper on 10.154.206.111:2181
,10.154.192.139:2181,10.154.215.44:2181/staging-fra-2
(kafka.server.KafkaServer)
 2021-07-08 15:52:26.432 INFO [ZooKeeperClient Kafka server] Initializing a
new session to 10.154.206.111:2181,10.154.192.139:2181,10.154.215.44:2181.
(kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:26.438 INFO Client
environment:zookeeper.version=3.5.9-83df9301aa5c2a5d284a9940177808c01bc35cef,
built on 01/06/2021 20:03 GMT (org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client
environment:host.name=ip-10-154-192-11.eu-central-1.compute.internal
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:java.version=1.8.0_292
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:java.vendor=Oracle
Corporation (org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client
environment:java.home=/usr/local/openjdk-8/jre
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client
environment:java.class.path=/opt/kafka/bin/../libs/activation-1.1.1.jar:...
 2021-07-08 15:52:26.438 INFO Client
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:java.io.tmpdir=/tmp
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:java.compiler=<NA>
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:os.name=Linux
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:os.arch=amd64
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:os.version=4.15.0-1099-aws
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.438 INFO Client environment:user.name=?
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.439 INFO Client environment:user.home=?
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.439 INFO Client environment:user.dir=/
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.439 INFO Client environment:os.memory.free=6074MB
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.439 INFO Client environment:os.memory.max=6144MB
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.442 INFO Client environment:os.memory.total=6144MB
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.444 INFO Initiating client connection, connectString=
10.154.206.111:2181,10.154.192.139:2181,10.154.215.44:2181
sessionTimeout=18000
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@5158b42f
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:26.447 INFO jute.maxbuffer value is 4194304 Bytes
(org.apache.zookeeper.ClientCnxnSocket)
 2021-07-08 15:52:26.453 INFO zookeeper.request.timeout value is 0. feature
enabled= (org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:26.455 INFO [ZooKeeperClient Kafka server] Waiting until
connected. (kafka.zookeeper.ZooKeeperClient)


* 2021-07-08 15:52:26.493 INFO Opening socket connection to server
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181
<http://10.154.192.139:2181>. Will not attempt to authenticate using SASL
(unknown error) (org.apache.zookeeper.ClientCnxn) 2021-07-08 15:52:32.456
INFO [ZooKeeperClient Kafka server] Closing.
(kafka.zookeeper.ZooKeeperClient) 2021-07-08 15:52:32.459 WARN Client
session timed out, have not heard from server in 6006ms for sessionid 0x0
(org.apache.zookeeper.ClientCnxn)*
 2021-07-08 15:52:32.564 INFO Session: 0x0 closed
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:32.564 INFO EventThread shut down for session: 0x0
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:32.566 INFO [ZooKeeperClient Kafka server] Closed.
(kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:32.569 ERROR Fatal error during KafkaServer startup.
Prepare to shutdown (kafka.server.KafkaServer)
 kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for
connection while in state: CONNECTING
 #011at
kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:262)
 #011at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:119)
 #011at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1881)
 #011at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:441)
 #011at
kafka.server.KafkaServer.$anonfun$initZkClient$2(KafkaServer.scala:460)
 #011at
kafka.server.KafkaServer.$anonfun$initZkClient$2$adapted(KafkaServer.scala:458)
 #011at scala.Option.foreach(Option.scala:437)
 #011at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:458)
 #011at kafka.server.KafkaServer.startup(KafkaServer.scala:233)
 #011at
kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
 #011at kafka.Kafka$.main(Kafka.scala:82)
 #011at kafka.Kafka.main(Kafka.scala)
 2021-07-08 15:52:32.570 INFO shutting down (kafka.server.KafkaServer)
 2021-07-08 15:52:32.574 INFO App info kafka.server for -1 unregistered
(org.apache.kafka.common.utils.AppInfoParser)
 2021-07-08 15:52:32.574 INFO shut down completed (kafka.server.KafkaServer)
 2021-07-08 15:52:32.575 ERROR Exiting Kafka.
(kafka.server.KafkaServerStartable)
 2021-07-08 15:52:32.578 INFO shutting down (kafka.server.KafkaServer)

Logs from the same EC2 instance after 2nd attempt to start (successful):

 2021-07-08 15:52:46.624 INFO Initiating client connection, connectString=
10.154.206.111:2181,10.154.192.139:2181,10.154.215.44:2181
sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeper
ClientWatcher$@5158b42f (org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:46.627 INFO jute.maxbuffer value is 4194304 Bytes
(org.apache.zookeeper.ClientCnxnSocket)
 2021-07-08 15:52:46.633 INFO zookeeper.request.timeout value is 0. feature
enabled= (org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.635 INFO [ZooKeeperClient Kafka server] Waiting until
connected. (kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.638 INFO Opening socket connection to server
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181. Will
not attempt to authenticate using SASL (unknown error)
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.641 INFO Socket connection established, initiating
session, client: /10.154.192.11:50816, server:
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.645 INFO Session establishment complete on server
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181,
sessionid = 0x800000b004865c7, negotiated timeout = 18000
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.648 INFO [ZooKeeperClient Kafka server] Connected.
(kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.691 INFO Created zookeeper path /staging-fra-2
(kafka.server.KafkaServer)
 2021-07-08 15:52:46.692 INFO [ZooKeeperClient Kafka server] Closing.
(kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.795 INFO Session: 0x800000b004865c7 closed
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:46.795 INFO EventThread shut down for session:
0x800000b004865c7 (org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.797 INFO [ZooKeeperClient Kafka server] Closed.
(kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.798 INFO [ZooKeeperClient Kafka server] Initializing a
new session to 10.154.206.111:2181,10.154.192.139:2181,
10.154.215.44:2181/staging-fra-2. (kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.798 INFO Initiating client connection, connectString=
10.154.206.111:2181,10.154.192.139:2181,10.154.215.44:2181/staging-fra-2
sessionTimeout=18000
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@3a52dba3
(org.apache.zookeeper.ZooKeeper)
 2021-07-08 15:52:46.799 INFO jute.maxbuffer value is 4194304 Bytes
(org.apache.zookeeper.ClientCnxnSocket)
 2021-07-08 15:52:46.799 INFO zookeeper.request.timeout value is 0. feature
enabled= (org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.800 INFO [ZooKeeperClient Kafka server] Waiting until
connected. (kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.800 INFO Opening socket connection to server
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181. Will
not attempt to authenticate using SASL (unknown error)
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.801 INFO Socket connection established, initiating
session, client: /10.154.192.11:50818, server:
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.803 INFO Session establishment complete on server
ip-10-154-192-139.eu-central-1.compute.internal/10.154.192.139:2181,
sessionid = 0x800000b004865c8, negotiated timeout = 18000
(org.apache.zookeeper.ClientCnxn)
 2021-07-08 15:52:46.803 INFO [ZooKeeperClient Kafka server] Connected.
(kafka.zookeeper.ZooKeeperClient)
 2021-07-08 15:52:46.824 INFO [feature-zk-node-event-process-thread]:
Starting
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)


Now my questions are:

1. What could be the reason for a 6-second timeout on the first attempt to
connect?  Has anything changed significantly in this regard since version
2.6?  On a quick glance over the release notes[1] I can't spot anything
relevant.

2. Why does it "just work" within a few milliseconds on the second try?

3. We do not see anything relevant logged from ZooKeeper around the time of
failed connection attempts.  Is there a chance that DEBUG logging on
ZooKeeper will reveal something?

[1] https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html


Regards,
-- 
Alex

Reply via email to