michaeljmarshall commented on PR #266: URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1128425243
The error comes from the `bookie-init` job in the Zookeeper TLS tests. We're seeing similar errors in https://github.com/apache/pulsar-helm-chart/pull/260. It fails running this code: ``` bin/apply-config-from-env.py conf/bookkeeper.conf; /pulsar/keytool/keytool.sh toolset ${HOSTNAME}.pulsar-ci-toolset.pulsar.svc.cluster.local true; if bin/bookkeeper shell whatisinstanceid; then echo "bookkeeper cluster already initialized"; else bin/bookkeeper shell initnewcluster; fi ``` Specifically, the logs show that `bin/bookkeeper shell whatisinstanceid` connects to zookeeper, fails to find the instance id, and then `bin/bookkeeper shell initnewcluster` is run and times out while opening a connection to zookeeper. I looked at this a bit tonight, but couldn't find anything relevant. The one oddity in the bookie init logs is the last log line where it logs that it handles a new connection. That time is somewhat close to the time that the zookeeper expires an old connection. It could just be a coincidence. Bookie init logs: ``` [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:40,679+0000 [main] INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 1048575 Bytes [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:40,919+0000 [main] INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=false [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:41,161+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server pulsar-ci-zookeeper/10.244.1.17:2281. [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:41,161+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:21,359+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 40358ms for session id 0x0 [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:22,663+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] WARN org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x0. [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 40358ms for session id 0x0 [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1258) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,458+0000 [main] INFO org.apache.zookeeper.ClientCnxnSocketNetty - channel is told closing [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,677+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x0 closed [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,776+0000 [main] ERROR org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase - Failed to create zookeeper client to pulsar-ci-zookeeper:2281 [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase.waitForConnection(ZooKeeperWatcherBase.java:159) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.zookeeper.ZooKeeperClient$Builder.build(ZooKeeperClient.java:260) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.initialize(ZKMetadataDriverBase.java:207) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.zk.ZKMetadataBookieDriver.initialize(ZKMetadataBookieDriver.java:60) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:345) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:372) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.client.BookKeeperAdmin.initNewCluster(BookKeeperAdmin.java:1278) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:56) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell$InitNewCluster.runCmd(BookieShell.java:334) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:238) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2278) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2369) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,432+0000 [globalEventExecutor-1-1] WARN org.apache.zookeeper.ClientCnxnSocketNetty - future isn't success. [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] io.netty.util.concurrent.DefaultPromise$LeanCancellationException: null [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at io.netty.util.concurrent.DefaultPromise.cancel(...)(Unknown Source) ~[io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final] [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,722+0000 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x0 [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: Failed to create zookeeper client to pulsar-ci-zookeeper:2281 [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:58) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell$InitNewCluster.runCmd(BookieShell.java:334) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:238) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2278) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2369) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] Caused by: org.apache.bookkeeper.meta.exceptions.MetadataException: Failed to create zookeeper client to pulsar-ci-zookeeper:2281 [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.initialize(ZKMetadataDriverBase.java:227) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.zk.ZKMetadataBookieDriver.initialize(ZKMetadataBookieDriver.java:60) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:345) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:372) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.client.BookKeeperAdmin.initNewCluster(BookKeeperAdmin.java:1278) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:56) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] ... 4 more [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase.waitForConnection(ZooKeeperWatcherBase.java:159) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.zookeeper.ZooKeeperClient$Builder.build(ZooKeeperClient.java:260) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.initialize(ZKMetadataDriverBase.java:207) [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] ... 9 more [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:28,965+0000 [epollEventLoopGroup-2-1] INFO org.apache.zookeeper.ClientCnxnSocketNetty - SSL handler added for channel: [id: 0xbcd9c9b9] ``` Zookeeper logs during the above timeout: ``` [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:40,253+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e001f, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:44,255+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0020, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:49,717+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:49,900+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:52,254+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0022, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:52,575+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:52,643+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:54,252+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0025, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:54,582+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:54,586+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:43088 [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:58,254+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0026, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:59,714+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:00,398+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:00,413+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:43090 [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:03,414+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:08,145+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:08,252+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0028, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:08,331+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:10,202+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:10,254+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e002b, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:15,393+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:19,342+0000 [epollEventLoopGroup-7-2] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Unsuccessful handshake with session 0x0 [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:20,254+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e002e, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:20,255+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e002c, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:22,179+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:23,699+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:24,531+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:24,713+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:24,721+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:43092 [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:28,256+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0031, timeout of 30000ms exceeded [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:28,630+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509' [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:30,253+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0032, timeout of 30000ms exceeded ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org