yabinmeng opened a new issue #91:
URL: https://github.com/apache/pulsar-helm-chart/issues/91

**Describe the bug**

When using the official Helm chart to deploy a K8s-based Pulsar cluster with version 2.6.x, the **broker** pod is stuck in the "wait-bookkeeper-ready" check although the bookie pod is up and running without any issue. See the Pod list below:

```
$ kubectl -n pulsar get pod
NAME                                             READY   STATUS      RESTARTS   AGE
mytest-pulsar1-bookie-0                          1/1     Running     0          7m14s
mytest-pulsar1-bookie-init-z8gtb                 0/1     Completed   0          7m14s
mytest-pulsar1-broker-0                          0/1     Init:1/2    0          7m14s
mytest-pulsar1-grafana-7bcb854cf4-lmbmj          1/1     Running     0          7m15s
mytest-pulsar1-prometheus-6f79d5c86c-2fdvt       1/1     Running     0          7m15s
mytest-pulsar1-proxy-0                           0/1     Init:1/2    0          7m14s
mytest-pulsar1-pulsar-init-hln7v                 0/1     Completed   0          7m14s
mytest-pulsar1-pulsar-manager-6959fb64d4-tl65f   1/1     Running     0          7m15s
mytest-pulsar1-recovery-0                        1/1     Running     0          7m15s
mytest-pulsar1-toolset-0                         1/1     Running     0          7m15s
mytest-pulsar1-zookeeper-0                       1/1     Running     0          7m14s
```

It looks like it is stuck in the following check of the "wait-bookkeeper-ready" init container:

```
until bin/bookkeeper shell whatisinstanceid; do
  echo "bookkeeper cluster is not initialized yet. backoff for 3 seconds ...";
  sleep 3;
done;
```

When I manually run the command "bin/bookkeeper shell whatisinstanceid" in the "wait-bookkeeper-ready" init container, the result is as below, and it looks fine to me.

```
...
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=5.4.0-1029-gke
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/pulsar
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=899MB
20:53:30.933 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=1024MB
20:53:30.933 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=1024MB
20:53:30.940 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=mytest-pulsar1-zookeeper:2181 sessionTimeout=30000 watcher=org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase@5e4bd84a
20:53:30.948 [main] INFO org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
20:53:30.957 [main] INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
20:53:30.967 [main] INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
20:53:30.986 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server mytest-pulsar1-zookeeper/10.100.1.36:2181. Will not attempt to authenticate using SASL (unknown error)
20:53:30.994 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.100.0.168:55470, server: mytest-pulsar1-zookeeper/10.100.1.36:2181
20:53:31.007 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server mytest-pulsar1-zookeeper/10.100.1.36:2181, sessionid = 0x10000b712950011, negotiated timeout = 30000
20:53:31.011 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
20:53:31.040 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand - Metadata Service Uri: zk+null://mytest-pulsar1-zookeeper:2181/ledgers InstanceId: 0b091500-6750-479f-b419-25d957d5a4e0
20:53:31.147 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000b712950011
20:53:31.148 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x10000b712950011 closed
```

**To Reproduce**

Follow the official procedure, but specify an older Pulsar version in the values.yaml file, as below:

```
images:
  zookeeper:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  bookie:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  autorecovery:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  broker:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  proxy:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  functions:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
```

I also tested version 2.6.2, and this time both the bookie and broker Pods got stuck in the same check. The chart works with 2.7.0 without any issue, though.

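Since the `until` loop in the init container only looks at the exit status of `bin/bookkeeper shell whatisinstanceid`, log output that looks fine does not by itself show that the check passes. The following is a minimal diagnostic sketch, not part of the chart: `retry_with_status` is a hypothetical helper that mirrors the loop but prints the exit status of every attempt and stops after a bounded number of tries.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Helm chart): a bounded variant of the
# init container's `until` loop that reports the exit status of each attempt.
# Usage: retry_with_status MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
retry_with_status() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Discard the command's own output; as in the `until` loop,
    # only its exit status decides whether the check passes.
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
      echo "attempt $attempt: exit status 0, check passed"
      return 0
    fi
    echo "attempt $attempt: exit status $status, retrying in ${delay}s"
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "giving up after $max_attempts attempts"
  return 1
}
```

Assuming a shell inside the "wait-bookkeeper-ready" init container, `retry_with_status 5 3 bin/bookkeeper shell whatisinstanceid` would show whether the command returns a non-zero status even though it prints an InstanceId. The container's own output can be read with `kubectl -n pulsar logs mytest-pulsar1-broker-0 -c wait-bookkeeper-ready`.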
**Expected behavior**

All Pulsar Pods should be up and running, as demonstrated with version 2.7.0.

**Desktop (please complete the following information):**

- GKE (Ubuntu); K8s version: 1.17.14-gke.1600

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org