yabinmeng opened a new issue #91:
URL: https://github.com/apache/pulsar-helm-chart/issues/91


   **Describe the bug**
   When using the official Helm chart to deploy a Kubernetes-based Pulsar cluster with version 2.6.x, the **broker** pod stays stuck in the "wait-bookkeeper-ready" init container even though the bookie pod is up and running without any issue. See the pod list below:
   
   ```
   $ kubectl -n pulsar get pod
   NAME                                             READY   STATUS      RESTARTS   AGE
   mytest-pulsar1-bookie-0                          1/1     Running     0          7m14s
   mytest-pulsar1-bookie-init-z8gtb                 0/1     Completed   0          7m14s
   mytest-pulsar1-broker-0                          0/1     Init:1/2    0          7m14s
   mytest-pulsar1-grafana-7bcb854cf4-lmbmj          1/1     Running     0          7m15s
   mytest-pulsar1-prometheus-6f79d5c86c-2fdvt       1/1     Running     0          7m15s
   mytest-pulsar1-proxy-0                           0/1     Init:1/2    0          7m14s
   mytest-pulsar1-pulsar-init-hln7v                 0/1     Completed   0          7m14s
   mytest-pulsar1-pulsar-manager-6959fb64d4-tl65f   1/1     Running     0          7m15s
   mytest-pulsar1-recovery-0                        1/1     Running     0          7m15s
   mytest-pulsar1-toolset-0                         1/1     Running     0          7m15s
   mytest-pulsar1-zookeeper-0                       1/1     Running     0          7m14s
   ```
   It appears to be stuck in the following check in the "wait-bookkeeper-ready" init container:
   ```
   until bin/bookkeeper shell whatisinstanceid; do
     echo "bookkeeper cluster is not initialized yet. backoff for 3 seconds ...";
     sleep 3;
   done;
   ```
   When I manually run the command "bin/bookkeeper shell whatisinstanceid" in the "wait-bookkeeper-ready" init container, the output is as follows and looks fine to me:
   ```
   ...
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.version=5.4.0-1029-gke
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/pulsar
   20:53:30.932 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=899MB
   20:53:30.933 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=1024MB
   20:53:30.933 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=1024MB
   20:53:30.940 [main] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=mytest-pulsar1-zookeeper:2181 sessionTimeout=30000 watcher=org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase@5e4bd84a
   20:53:30.948 [main] INFO  org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
   20:53:30.957 [main] INFO  org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
   20:53:30.967 [main] INFO  org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
   20:53:30.986 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server mytest-pulsar1-zookeeper/10.100.1.36:2181. Will not attempt to authenticate using SASL (unknown error)
   20:53:30.994 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.100.0.168:55470, server: mytest-pulsar1-zookeeper/10.100.1.36:2181
   20:53:31.007 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server mytest-pulsar1-zookeeper/10.100.1.36:2181, sessionid = 0x10000b712950011, negotiated timeout = 30000
   20:53:31.011 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
   20:53:31.040 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand - Metadata Service Uri: zk+null://mytest-pulsar1-zookeeper:2181/ledgers InstanceId: 0b091500-6750-479f-b419-25d957d5a4e0
   20:53:31.147 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000b712950011
   20:53:31.148 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x10000b712950011 closed
   ```
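
Note that the `until` loop in the init container is driven by the exit status of `bin/bookkeeper shell whatisinstanceid`, not by what the command logs, so output that looks fine does not by itself show that the loop condition is satisfied. A minimal sketch for checking this is below. The helper name `retry_with_status` and the `MAX_TRIES` and `RETRY_DELAY` variables are hypothetical and not part of the chart, and whether the command actually returns a non-zero status on 2.6.x is an assumption that this sketch would only help confirm or rule out.

```shell
# retry_with_status CMD [ARGS...]
# Runs CMD up to MAX_TRIES times (default 5), waiting RETRY_DELAY seconds
# (default 3) between attempts, and prints the exit status of every attempt.
# Returns 0 as soon as CMD succeeds, or 1 if every attempt fails.
retry_with_status() {
  max_tries="${MAX_TRIES:-5}"
  delay="${RETRY_DELAY:-3}"
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # Discard the command's own output; only the exit status matters here,
    # which is also the only thing an `until` loop looks at.
    "$@" >/dev/null 2>&1
    status=$?
    echo "attempt $attempt: exit status $status"
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Run inside the init container as `retry_with_status bin/bookkeeper shell whatisinstanceid`. Because the attempts are bounded, it reports the status and exits instead of looping forever.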
   
   **To Reproduce**
   Follow the official installation procedure, but specify an older Pulsar version in the values.yaml file, as below:
   ```
   images:
     zookeeper:
       repository: apachepulsar/pulsar-all
       tag: 2.6.1
       pullPolicy: IfNotPresent
     bookie:
       repository: apachepulsar/pulsar-all
       tag: 2.6.1
       pullPolicy: IfNotPresent
     autorecovery:
       repository: apachepulsar/pulsar-all
       tag: 2.6.1
       pullPolicy: IfNotPresent
     broker:
       repository: apachepulsar/pulsar-all
       tag: 2.6.1
       pullPolicy: IfNotPresent
     proxy:
       repository: apachepulsar/pulsar-all
       tag: 2.6.1
       pullPolicy: IfNotPresent
     functions:
       repository: apachepulsar/pulsar-all
       tag: 2.6.1
   ```
   
   I also tested on version 2.6.2, and this time both the bookie and broker Pods got stuck in the same check.
   The chart works with 2.7.0 without any issue, though.
   
   **Expected behavior**
   All Pulsar Pods should be up and running, as they are with version 2.7.0.
   
   **Desktop (please complete the following information):**
    - GKE (Ubuntu); K8s version: 1.17.14-gke.1600
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

