2019-12-10 10:07:34 UTC - Sijie Guo: yes most of the development happens in 
master.
----
2019-12-10 16:48:15 UTC - Sijie Guo: The first release candidate for 2.5.0 is 
out for voting. Please help review, validate, and vote on the release.
----
2019-12-10 20:08:41 UTC - juraj: building from master, docker image build is 
failing -- the `build-wheel-file-within-docker.sh` happens to be present in the 
root of that image, there isn't any `/pulsar/..` folder -- any hints how to 
resolve this quickly?
----
2019-12-10 20:10:27 UTC - juraj: 
----
2019-12-11 01:37:20 UTC - jia zhai: @jia zhai set the channel topic: Apache 
Pulsar 2.5.0 is out for votes! Please help review and vote the release. 
<https://lists.apache.org/thread.html/7a050d26e327d09803de368ce109c7d61177131551bc5a114204a61b%40%3Cdev.pulsar.apache.org%3E>
----
2019-12-11 06:32:24 UTC - Sijie Guo: ah I forgot that `-Pdocker` will 
invoke another docker build.

so I am not sure if you can actually do `mvn install -DskipTests -Pdocker` in a 
docker environment.
----
2019-12-11 07:58:31 UTC - juraj: but there is obviously an error because the 
image is missing /pulsar git structure which is needed to perform the 'wheel' 
build, but it's missing from it
----
2019-12-11 07:59:45 UTC - juraj: when i run the build outside docker, it fails 
much earlier, on this:
----
2019-12-11 08:01:26 UTC - juraj: i'm trying to convince my company to embrace 
Pulsar instead of Kafka but i'm running out of time.
i've done heaps of work to push this forward but i can't do much more if the 
build process is so badly broken.
----
2019-12-11 08:21:51 UTC - juraj: ah, the docker within docker uses the host's 
docker daemon, that's why the volume sharing doesn't work, i will add an 
optional env var override
----
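The volume-sharing failure described above follows from how bind mounts behave when a containerized build talks to the host's Docker daemon: the `-v` source path is resolved on the host, not inside the build container. A minimal sketch of an optional override, assuming a hypothetical `HOST_REPO_DIR` variable (not an existing Pulsar build option) that carries the host-side path of the checkout:

```shell
# HOST_REPO_DIR is a hypothetical variable name, not an existing Pulsar build option.
# Prints the directory to use as the bind-mount source: the override if it is
# set and non-empty, otherwise the path passed as $1.
resolve_mount_source() {
  printf '%s\n' "${HOST_REPO_DIR:-$1}"
}
```

When the build runs directly on the host, the variable can stay unset and the local path is used unchanged.
----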
2019-12-11 08:26:23 UTC - Sijie Guo: > but there is obviously an error 
because the image is missing /pulsar git structure which is needed to perform 
the ‘wheel’ build, but it’s missing from it
the pulsar-build image only provides the build environment. you have to mount your 
git repo into the docker container.

something like: `docker run -i -v ${local_git_repo_dir}:/pulsar apachepulsar/pulsar-build..`

> when i run the build outside docker, it fails much earlier,
I haven’t encountered this issue when running on my laptop, so I’m not sure 
what problem you encountered.

> i can’t do much more if the build process is so badly broken
just trying to understand more here: why not use the released pulsar images?
----
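A rough sketch of the mount-the-checkout approach suggested above. The image reference is truncated in the original message, so it is taken as an argument here rather than guessed. The `-w /pulsar` working directory and running `mvn install -DskipTests` inside the container are assumptions, as is `mvn` being on the image's PATH. The function only prints the command so it can be inspected before running:

```shell
# Prints a docker command that mounts a local checkout at /pulsar and builds there.
# $1: path to the local git checkout
# $2: build image reference (elided in the original message, so supplied by the caller)
build_in_container_cmd() {
  local repo_dir="$1" image="$2"
  printf 'docker run -i -v %s:/pulsar -w /pulsar %s mvn install -DskipTests\n' \
    "$repo_dir" "$image"
}
```

Calling `build_in_container_cmd "$PWD" <image>` from the root of the checkout prints the full command line for review.
----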
2019-12-11 08:29:32 UTC - Sijie Guo: > the docker within docker uses the 
host’s docker daemon
The pulsar image contains the server jars (built by java) and c++/python 
clients. The build process uses docker to build c++ and python clients.

we haven’t tested the whole build in a Docker-in-Docker environment. I am not 
sure what errors you will see.
----
2019-12-11 08:30:15 UTC - juraj: the 2.4.1 has a problem where data isn't auto 
cleaned from the cluster (quotas/evictions) due to the issue where readers and 
consumers are on the same topic.
2.4.2 fixes this issue but there is a new k8s init problem with the zookeeper 
data init.
hence i'm trying to get the hang of this, so that i can also contribute fixes.
----
2019-12-11 08:33:18 UTC - juraj: btw either way works for me - docker-in-docker 
or build directly from the machine -- but both fail for me currently.
the build on the machine cannot exec `./manage.py` in an image in 
`com.spotify:dockerfile-maven-plugin:1.4.13:build`, idk why
----
2019-12-11 08:37:14 UTC - Sijie Guo: > the machine cannot exec 
`./manage.py` in an image in 
`com.spotify:dockerfile-maven-plugin:1.4.13:build`
manage.py is part of the dashboard code. `dashboard/django/manage.py`

this file should have the execute permission.

```
[sijie@Sijies-MacBook-Pro pulsar (master)]$ ls -l dashboard/django/manage.py
-rwxr-xr-x  1 sijie  staff  1597 Nov  6 06:09 dashboard/django/manage.py
```
----
2019-12-11 08:38:35 UTC - Sijie Guo: Can you check the file permission for 
this file on your machine?
----
2019-12-11 08:41:03 UTC - Sijie Guo: back to your original problem with 2.4.2,

> 2.4.2 fixes this issue but there is a new k8s init problem with the 
zookeeper data init.
have you tried to install the helm chart from a fresh state?

what kind of changes did you make to the helm chart?
----
2019-12-11 08:49:51 UTC - juraj: ok, the manage.py is not getting +x during 
COPY, this is a known issue on Docker for Windows
i'll add an explicit +x
----
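The explicit `+x` mentioned above could take roughly this form, assuming it is applied as a shell step after the file has been copied (in a Dockerfile, the equivalent would be a `RUN chmod +x` step after the `COPY`). The helper name is made up for illustration:

```shell
# Makes sure a file carries the execute bit, e.g. dashboard/django/manage.py.
# Returns non-zero if the file does not exist.
ensure_executable() {
  if [ ! -f "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  # Only touch the mode when the execute bit is actually absent.
  if [ ! -x "$1" ]; then
    chmod +x "$1"
  fi
}
```

For example, `ensure_executable dashboard/django/manage.py` run from the root of the checkout.
----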
2019-12-11 08:51:15 UTC - Sijie Guo: oh ok
----
2019-12-11 08:53:19 UTC - juraj: the mods i've done on the helm chart were 
mostly value tweaking (also for direct broker access w/o proxy) and a switch from 
deployments to stateful sets, and it worked flawlessly with 2.4.1 - i can 
document / contribute back later.

there may be something i can do with the container init scripts which delay 
based on ZK being fully up -- but i currently have no idea what the ZK init 
problem actually is
----
2019-12-11 08:55:06 UTC - juraj: there already are changes in master for 
PulsarClusterMetadataSetup.main() so maybe it'll just work, maybe not
----
2019-12-11 08:56:24 UTC - juraj: to remind, this
----
2019-12-11 08:57:18 UTC - juraj: fails with this
----
2019-12-11 08:57:44 UTC - Sijie Guo: but this code isn’t changed from 2.4.1 to 
2.4.2. that’s why I don’t think it is a problem of the image. have you tried to 
install 2.4.1 again?
----
2019-12-11 08:58:30 UTC - juraj: is there a determinable/known condition in 
which that code will 100% succeed?  e.g. "all ZK nodes must be fully 
initialized at that time"
----
2019-12-11 08:59:14 UTC - juraj: (i can try 2.4.1 again but i was running it 
dozens of times before and it always worked - sure this still well may be bc of 
my own changes)
----
2019-12-11 09:00:11 UTC - juraj: if i know what state the ZK cluster must be in at 
the time when the ZK-metadata-init task is run, i can focus on 
checking/achieving that w/ my k8s config
----
2019-12-11 09:00:53 UTC - juraj: i can start by adding a `sleep 60` at the 
start of the zk meta init task
----
2019-12-11 09:03:36 UTC - Sijie Guo: there is already logic in the helm chart 
that checks if the zookeeper cluster is ready 
(<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/zookeeper-metadata.yaml#L36>).

If you want to be more precise, you can use the following logic to check:
<https://github.com/apache/pulsar/blob/master/docker/pulsar/scripts/pulsar-zookeeper-ruok.sh>
 (replacing localhost with the actual zookeeper server)
----
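As an alternative to a fixed `sleep 60`, a readiness check can be polled until it passes. The sketch below is not a copy of the linked script. It assumes a netcat binary is available as `nc` (flags differ between netcat variants) and relies on ZooKeeper's `ruok` four-letter command, which answers `imok` when the server is running; newer ZooKeeper versions may need that command whitelisted. The function names are made up for illustration:

```shell
# Succeeds only if the ZooKeeper server at host:port answers "imok" to "ruok".
zk_is_ok() {
  local host="$1" port="${2:-2181}" reply
  reply="$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null || true)"
  [ "$reply" = "imok" ]
}

# Runs a command up to <attempts> times, sleeping <delay> seconds after each
# failed try. Succeeds as soon as the command succeeds.
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_until 30 2 zk_is_ok <zookeeper-host> 2181` would then wait up to roughly a minute, with the host replaced by the actual zookeeper server.
----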
2019-12-11 09:05:13 UTC - juraj: yes that's the one i meant, the first link
----
2019-12-11 09:05:15 UTC - juraj: i'll check the second now
----
2019-12-11 09:06:33 UTC - juraj: so do you think i'm possibly hitting 
uninitialized zookeeper server too early?
----
2019-12-11 09:07:01 UTC - Sijie Guo: > i can try 2.4.1 again but i was 
running it dozens of times before and it always worked - sure this still well 
may be bc of my own changes
the /namespace exists typically means either your zookeeper data is not cleaned 
up in your previous run or the script is run twice.

I would suggest trying out 2.4.1 to see which one is the cause, because 
I still don’t think the code is the problem. we have to figure out what is wrong 
in your current setup. If 2.4.1 also fails, that means there is something wrong 
with your current environment. then we have to debug why /namespace exists.
----
2019-12-11 09:07:41 UTC - Sijie Guo: > so do you think i’m possibly hitting 
uninitialized zookeeper server too early?
I don’t think so. the exception says ‘/namespace’ exists
----
2019-12-11 09:09:09 UTC - juraj: so what are the possible reasons that it 
already exists?
----
2019-12-11 09:10:36 UTC - Sijie Guo: the /namespace exists typically means 
either your zookeeper data is not cleaned up in your previous run or the script 
is run twice.
----
