Thanks Dian!

> >> Is using pyflink from the flink distribution tarball (without pip) not a supported way to use pyflink?
> You are right.

What is the reason for including opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base distribution then? Oh, a guess: to make it easier for TaskManagers to run pyflink without having pyflink installed themselves? Somehow I'd guess this wouldn't work tho; I'd assume TaskManagers would also need some python transitive dependencies, e.g. Google protobuf.

> you could remove the JAR packages located under /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip install apache-flink`

Since we're building our own Docker image, I'm going the other way around: just install pyflink, and symlink /opt/flink -> /usr/lib/python3.7/dist-packages/pyflink. So far so good, but I'm worried that something will be fishy when trying to run JVM apps via pyflink.

-Ao

On Sun, Jan 29, 2023 at 1:43 AM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Andrew,
>
> >> By pip installing apache-flink, this docker image will have the flink distro installed at /opt/flink and FLINK_HOME set to /opt/flink <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>. BUT ALSO flink lib jars will be installed at e.g. /usr/local/lib/python3.7/dist-packages/pyflink/lib! So, by following those instructions, flink is effectively installed twice into the docker image.
>
> Yes, your understanding is correct. The base image `flink:1.15.2` doesn't include PyFlink and so you need to build a custom image if you want to use PyFlink. Regarding the JAR packages which are installed twice, you could remove the JAR packages located under /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip install apache-flink`. It will use the JAR packages located under $FLINK_HOME/lib.
>
> >> Is using pyflink from the flink distribution tarball (without pip) not a supported way to use pyflink?
> You are right.
>
> Regards,
> Dian
>
>
> On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Ah, oops and my original email had a typo:
>>
>> > Some python dependencies are not included in the flink distribution tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>
>> Should read:
>>
>> > Some python dependencies ARE included in the flink distribution tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>
>> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Let me ask a related question:
>>>
>>> We are building our own base Flink docker image. We will be deploying both JVM and python apps via flink-kubernetes-operator.
>>>
>>> Is there any reason not to install Flink in this image via `pip install apache-flink` and use it for JVM apps?
>>>
>>> -Andrew Otto
>>> Wikimedia Foundation
>>>
>>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm having quite a bit of trouble running pyflink from the default flink distribution tarballs. I'd expect the python examples to work as long as python is installed, and we've got the distribution. Some python dependencies are not included in the flink distribution tarballs: cloudpickle, py4j and pyflink are in opt/python. Others are not, e.g. protobuf.
>>>>
>>>> Now that I'm looking, I see that the pyflink installation instructions <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/> are to install via pip.
>>>>
>>>> I'm doing this in Docker for use with the flink-kubernetes-operator. In the Using Flink Python on Docker <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker> instructions, there is a pip3 install apache-flink step. I find this strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.
>>>>
>>>> By pip installing apache-flink, this docker image will have the flink distro installed at /opt/flink and FLINK_HOME set to /opt/flink <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>. BUT ALSO flink lib jars will be installed at e.g. /usr/local/lib/python3.7/dist-packages/pyflink/lib! So, by following those instructions, flink is effectively installed twice into the docker image.
>>>>
>>>> Am I correct or am I missing something?
>>>>
>>>> Is using pyflink from the flink distribution tarball (without pip) not a supported way to use pyflink?
>>>>
>>>> Thanks!
>>>> -Andrew Otto
>>>> Wikimedia Foundation
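
The double installation discussed in this thread (JARs under both $FLINK_HOME/lib and the pip-installed pyflink/lib) can be checked, and cleaned up the way Dian suggests, with a small script. This is a minimal sketch, not part of Flink or PyFlink: the helper names `default_flink_lib`, `duplicate_jars` and `remove_pyflink_jars` are hypothetical, it assumes two JARs with the same file name are the same artifact, and it assumes /opt/flink as the fallback FLINK_HOME because that is where the official image puts it, per the thread.

```python
import os
from pathlib import Path
from typing import List


def default_flink_lib() -> str:
    """Return $FLINK_HOME/lib, falling back to /opt/flink/lib.

    Assumption: /opt/flink is the FLINK_HOME of the official flink image.
    """
    return os.path.join(os.environ.get("FLINK_HOME", "/opt/flink"), "lib")


def duplicate_jars(flink_lib: str, pyflink_lib: str) -> List[str]:
    """Return the sorted JAR file names present in both lib directories.

    Assumption: identical file names mean identical artifacts; contents
    are not compared.
    """
    flink_jars = {p.name for p in Path(flink_lib).glob("*.jar")}
    pyflink_jars = {p.name for p in Path(pyflink_lib).glob("*.jar")}
    return sorted(flink_jars & pyflink_jars)


def remove_pyflink_jars(pyflink_lib: str, dry_run: bool = True) -> List[str]:
    """List (dry_run=True) or delete (dry_run=False) every JAR in pyflink_lib.

    Mirrors the manual cleanup suggested in the thread; non-JAR files are
    left in place. Returns the affected file names, sorted.
    """
    affected = []
    for jar in sorted(Path(pyflink_lib).glob("*.jar")):
        if not dry_run:
            jar.unlink()
        affected.append(jar.name)
    return affected
```

For example, `duplicate_jars(default_flink_lib(), "/usr/local/lib/python3.7/dist-packages/pyflink/lib")` would list the overlapping JARs in an image built from the Docker instructions. `remove_pyflink_jars` defaults to a dry run so the list can be inspected before anything is deleted.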