2019-12-10 10:07:34 UTC - Sijie Guo: yes, most of the development happens in master.
----
2019-12-10 16:48:15 UTC - Sijie Guo: The first release candidate for 2.5.0 is out for voting. Please help review, validate and vote on the release.
----
2019-12-10 20:08:41 UTC - juraj: building from master, docker image build is failing -- the `build-wheel-file-within-docker.sh` happens to be present in the root of that image, there isn't any `/pulsar/..` folder -- any hints how to resolve this quickly?
----
2019-12-10 20:10:27 UTC - juraj:
----
2019-12-11 01:37:20 UTC - jia zhai: @jia zhai set the channel topic: Apache Pulsar 2.5.0 is out for votes! Please help review and vote the release. <https://lists.apache.org/thread.html/7a050d26e327d09803de368ce109c7d61177131551bc5a114204a61b%40%3Cdev.pulsar.apache.org%3E>
----
2019-12-11 06:32:24 UTC - Sijie Guo: ah I forgot that `-Pdocker` will invoke another docker build.
so I am not sure if you can actually do `mvn install -DskipTests -Pdocker` in a docker environment.
----
2019-12-11 07:58:31 UTC - juraj: but there is obviously an error because the image is missing /pulsar git structure which is needed to perform the 'wheel' build, but it's missing from it
----
2019-12-11 07:59:45 UTC - juraj: when i run the build outside docker, it fails much earlier, on this:
----
2019-12-11 08:01:26 UTC - juraj: i'm trying to convince my company to embrace Pulsar instead of Kafka but i'm running out of time. i've done heaps of work to push this forward but i can't do much more if the build process is so badly broken.
----
2019-12-11 08:21:51 UTC - juraj: ah, the docker within docker uses the host's docker daemon, that's why the volume sharing doesn't work. i will add an optional env var override
----
2019-12-11 08:26:23 UTC - Sijie Guo:
> but there is obviously an error because the image is missing /pulsar git structure which is needed to perform the ‘wheel’ build, but it’s missing from it
the pulsar-build image only provides the build environment. you have to mount your git repo into the docker instance. something like: docker run -i -v ${local_git_repo_dir}:/pulsar apachepulsar/pulsar-build..
> when i run the build outside docker, it fails much earlier,
I haven’t encountered this issue when running on my laptop, so I’m not sure what problem you encountered.
> i can’t do much more if the build process is so badly broken
just trying to understand more here: why not use the released pulsar images?
----
2019-12-11 08:29:32 UTC - Sijie Guo:
> the docker within docker uses the host’s docker daemon
The pulsar image contains the server jars (built by java) and the c++/python clients. The build process uses docker to build the c++ and python clients. we haven’t tested the whole build in a Docker-in-Docker environment, so I am not sure what errors you will see.
----
2019-12-11 08:30:15 UTC - juraj: the 2.4.1 has a problem where data isn't auto cleaned from the cluster (quotas/evictions) due to the issue where readers and consumers are on the same topic. 2.4.2 fixes this issue but there is a new k8s init problem with the zookeeper data init. hence i'm trying to get the hang of this, so that i can also contribute fixes.
----
2019-12-11 08:33:18 UTC - juraj: btw either way works for me - docker-in-docker or build directly from the machine -- but both fail for me currently. the build on the machine cannot exec `./manage.py` in an image in `com.spotify:dockerfile-maven-plugin:1.4.13:build`, idk why
----
2019-12-11 08:37:14 UTC - Sijie Guo:
> the machine cannot exec `./manage.py` in an image in `com.spotify:dockerfile-maven-plugin:1.4.13:build`
manage.py is part of the dashboard code (`dashboard/django/manage.py`). this file should have the execute permission.
```
[sijie@Sijies-MacBook-Pro pulsar (master)]$ ls -l dashboard/django/manage.py
-rwxr-xr-x 1 sijie staff 1597 Nov 6 06:09 dashboard/django/manage.py
```
----
2019-12-11 08:38:35 UTC - Sijie Guo: Can you check the file permission for this file on your machine?
----
2019-12-11 08:41:03 UTC - Sijie Guo: back to your original problem with 2.4.2,
> 2.4.2 fixes this issue but there is a new k8s init problem with the zookeeper data init.
have you tried to install the helm chart from a fresh state? what kind of changes did you make to the helm chart?
----
2019-12-11 08:49:51 UTC - juraj: ok, the manage.py is not getting +x during COPY, this is a known issue on Docker for Windows. i'll add an explicit +x
----
2019-12-11 08:51:15 UTC - Sijie Guo: oh ok
----
2019-12-11 08:53:19 UTC - juraj: the mods i've done on the helm chart were mostly value tweaking (also for direct broker access w/o proxy) and a switch from deployments to stateful sets, and it worked flawlessly with 2.4.1 - i can document / contribute back later.
there may be something i can do with the container init scripts which delay based on ZK being fully up -- but i currently have no idea what the ZK init problem actually is
----
2019-12-11 08:55:06 UTC - juraj: there already are changes in master for PulsarClusterMetadataSetup.main() so maybe it'll just work, maybe not
----
2019-12-11 08:56:24 UTC - juraj: to remind, this
----
2019-12-11 08:57:18 UTC - juraj: fails with this
----
2019-12-11 08:57:44 UTC - Sijie Guo: but this code hasn’t changed from 2.4.1 to 2.4.2. that’s why I don’t think it is a problem with the image. have you tried to install 2.4.1 again?
----
2019-12-11 08:58:30 UTC - juraj: is there a determinable/known condition in which that code will 100% succeed? e.g. "all ZK nodes must be fully initialized at that time"
----
2019-12-11 08:59:14 UTC - juraj: (i can try 2.4.1 again but i was running it dozens of times before and it always worked - sure this still well may be bc of my own changes)
----
2019-12-11 09:00:11 UTC - juraj: if i know what state the ZK cluster must be in at the time when the ZK-metadata-init task is run, i can focus on checking/achieving that w/ my k8s config
----
2019-12-11 09:00:53 UTC - juraj: i can start by adding a `sleep 60` at the start of the zk meta init task
----
2019-12-11 09:03:36 UTC - Sijie Guo: there is already logic in the helm chart that checks whether the zookeeper cluster is ready (<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/zookeeper-metadata.yaml#L36>).
If you want to be more precise, you can use the following logic to check: <https://github.com/apache/pulsar/blob/master/docker/pulsar/scripts/pulsar-zookeeper-ruok.sh> (replacing localhost with the actual zookeeper server)
----
2019-12-11 09:05:13 UTC - juraj: yes that's the one i meant, the first link
----
2019-12-11 09:05:15 UTC - juraj: i'll check the second now
----
2019-12-11 09:06:33 UTC - juraj: so do you think i'm possibly hitting uninitialized zookeeper server too early?
----
2019-12-11 09:07:01 UTC - Sijie Guo:
> i can try 2.4.1 again but i was running it dozens of times before and it always worked - sure this still well may be bc of my own changes
the /namespace exists error typically means either your zookeeper data was not cleaned up in your previous run or the script was run twice. I would suggest trying out 2.4.1 to see which one is the cause, because I still don’t think the code is the problem. we have to figure out what is wrong in your current setup. If 2.4.1 also fails, that means there is something wrong with your current environment. then we have to debug why /namespace exists.
----
2019-12-11 09:07:41 UTC - Sijie Guo:
> so do you think i’m possibly hitting uninitialized zookeeper server too early?
I don’t think so. the exception says ‘/namespace’ exists
----
2019-12-11 09:09:09 UTC - juraj: so what are the possible reasons that it already exists?
----
2019-12-11 09:10:36 UTC - Sijie Guo: the /namespace exists error typically means either your zookeeper data was not cleaned up in your previous run or the script was run twice.
----
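For anyone following the ZooKeeper readiness part of this thread: below is a minimal Python sketch of an `ruok` check against a remote server, in place of a fixed `sleep 60`. It is not the code from `pulsar-zookeeper-ruok.sh` or the helm chart. It only assumes ZooKeeper's standard four-letter command, where a healthy server answers `ruok` with `imok`. The function names, the default port 2181, and the retry settings are illustrative assumptions. Newer ZooKeeper versions may also need `ruok` enabled through `4lw.commands.whitelist`.

```python
import socket
import time


def zookeeper_is_ok(host, port=2181, timeout=5.0):
    """Send the `ruok` four-letter command; True only if the reply is `imok`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = b""
            while True:
                # The server closes the connection after answering,
                # so an empty read marks the end of the reply.
                chunk = conn.recv(16)
                if not chunk:
                    break
                reply += chunk
        return reply.strip() == b"imok"
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


def wait_for_zookeeper(servers, attempts=30, delay=2.0):
    """Poll until every (host, port) pair answers `imok`, or give up."""
    for _ in range(attempts):
        if all(zookeeper_is_ok(host, port) for host, port in servers):
            return True
        time.sleep(delay)
    return False
```

A metadata init job could call `wait_for_zookeeper([("zookeeper-0", 2181)])` (hostname hypothetical) before running. Note that this only shows the servers respond; as said above, it would not explain a `/namespace` node that already exists.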