>> What is the reason for including opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base distribution then? Oh, a guess: to make it easier for TaskManagers to run pyflink without having pyflink installed themselves? Somehow I'd guess this wouldn't work tho; I'd assume TaskManagers would also need some python transitive dependencies, e.g. google protobuf.
This is for historical reasons. The first version (1.9.x) did not provide Python UDF support, so it was not necessary to install PyFlink on the TaskManager nodes. Since 1.10, which supports Python UDFs, users have to install PyFlink on the TaskManager nodes because there are many transitive dependencies, e.g. Apache Beam, protobuf, pandas, etc. However, we have not removed these packages because they are still useful on the client node, which is responsible for compiling jobs (it is not necessary to install PyFlink on the client node).

>> Since we're building our own Docker image, I'm going the other way around: just install pyflink, and symlink /opt/flink -> /usr/lib/python3.7/dist-packages/pyflink. So far so good, but I'm worried that something will be fishy when trying to run JVM apps via pyflink.

Good idea! The PyFlink package contains everything needed to run JVM apps, so I think you could just try it this way.

Regards,
Dian

On Mon, Jan 30, 2023 at 9:58 PM Andrew Otto <o...@wikimedia.org> wrote:

> Thanks Dian!
>
> > >> Is using pyflink from the flink distribution tarball (without pip)
> not a supported way to use pyflink?
> > You are right.
>
> What is the reason for including
> opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base
> distribution then? Oh, a guess: to make it easier for TaskManagers to run
> pyflink without having pyflink installed themselves? Somehow I'd guess
> this wouldn't work tho; I'd assume TaskManagers would also need some python
> transitive dependencies, e.g. google protobuf.
>
> > you could remove the JAR packages located under
> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
> install apache-flink`
>
> Since we're building our own Docker image, I'm going the other way around:
> just install pyflink, and symlink /opt/flink ->
> /usr/lib/python3.7/dist-packages/pyflink. So far so good, but I'm worried
> that something will be fishy when trying to run JVM apps via pyflink.
>
> -Ao
>
> On Sun, Jan 29, 2023 at 1:43 AM Dian Fu <dian0511...@gmail.com> wrote:
>
>> Hi Andrew,
>>
>> >> By pip installing apache-flink, this docker image will have the flink
>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>> BUT ALSO flink lib jars will be installed at e.g.
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>> So, by following those instructions, flink is effectively installed twice
>> into the docker image.
>>
>> Yes, your understanding is correct. The base image `flink:1.15.2`
>> doesn't include PyFlink and so you need to build a custom image if you want
>> to use PyFlink. Regarding to the jar packages which are installed twice,
>> you could remove the JAR packages located under
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
>> install apache-flink`. It will use the JAR packages located under
>> $FLINK_HOME/lib.
>>
>> >> Is using pyflink from the flink distribution tarball (without pip) not
>> a supported way to use pyflink?
>> You are right.
>>
>> Regards,
>> Dian
>>
>>
>> On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Ah, oops and my original email had a typo:
>>> > Some python dependencies are not included in the flink distribution
>>> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>>
>>> Should read:
>>> > Some python dependencies ARE included in the flink distribution
>>> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>>
>>> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> Let me ask a related question:
>>>>
>>>> We are building our own base Flink docker image. We will be deploying
>>>> both JVM and python apps via flink-kubernetes-operator.
>>>>
>>>> Is there any reason not to install Flink in this image via `pip install
>>>> apache-flink` and use it for JVM apps?
>>>>
>>>> -Andrew Otto
>>>> Wikimedia Foundation
>>>>
>>>>
>>>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm having quite a bit of trouble running pyflink from the default
>>>>> flink distribution tarballs. I'd expect the python examples to work as
>>>>> long as python is installed, and we've got the distribution. Some python
>>>>> dependencies are not included in the flink distribution tarballs:
>>>>> cloudpickle, py4j and pyflink are in opt/python. Others are not, e.g.
>>>>> protobuf.
>>>>>
>>>>> Now that I'm looking, I see that the pyflink installation instructions
>>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
>>>>> are
>>>>> to install via pip.
>>>>>
>>>>> I'm doing this in Docker for use with the flink-kubernetes-operator.
>>>>> In the Using Flink Python on Docker
>>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
>>>>> instructions,
>>>>> there is a pip3 install apache-flink step. I find this strange, since I'd
>>>>> expect the 'FROM flink:1.15.2' part to be sufficient.
>>>>>
>>>>> By pip installing apache-flink, this docker image will have the flink
>>>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>>>> BUT ALSO flink lib jars will be installed at e.g.
>>>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>>>> So, by following those instructions, flink is effectively installed
>>>>> twice into the docker image.
>>>>>
>>>>> Am I correct or am I missing something?
>>>>>
>>>>> Is using pyflink from the flink distribution tarball (without pip) not
>>>>> a supported way to use pyflink?
>>>>>
>>>>> Thanks!
>>>>> -Andrew Otto
>>>>> Wikimedia Foundation
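
For anyone who wants to script the cleanup suggested in the thread (removing the JARs that `pip install apache-flink` places under pyflink/lib, so that the ones under $FLINK_HOME/lib are used), here is a minimal sketch. It is not from the thread or from Flink itself: the function names `bundled_jars` and `remove_bundled_jars` and the `dry_run` flag are made up for illustration, and it assumes only that the JARs are plain `*.jar` files directly under the given directory. It refuses to delete anything when $FLINK_HOME/lib has no JARs, or when both paths resolve to the same directory, which would be the case with the /opt/flink symlink approach.

```python
from pathlib import Path
from typing import List


def bundled_jars(pyflink_lib: Path) -> List[Path]:
    """Return the *.jar files directly under pyflink_lib, sorted by path."""
    if not pyflink_lib.is_dir():
        return []
    return sorted(p for p in pyflink_lib.glob("*.jar") if p.is_file())


def remove_bundled_jars(pyflink_lib: Path, flink_home: Path,
                        dry_run: bool = True) -> List[Path]:
    """List (and, unless dry_run, delete) the JARs under pyflink_lib.

    Hypothetical helper: both paths are supplied by the caller; nothing
    here queries Flink or pip.
    """
    flink_lib = flink_home / "lib"
    # Only clean up if the distribution's own JARs are really there.
    if not any(flink_lib.glob("*.jar")):
        raise FileNotFoundError("no JARs found under %s" % flink_lib)
    # If FLINK_HOME is a symlink to the pyflink package, both paths point
    # at the same directory and deleting would remove the only copy.
    if flink_lib.resolve() == pyflink_lib.resolve():
        raise ValueError("pyflink lib and FLINK_HOME/lib are the same directory")
    jars = bundled_jars(pyflink_lib)
    if not dry_run:
        for jar in jars:
            jar.unlink()
    return jars
```

In a Docker build this could be called with, for example, `Path("/usr/local/lib/python3.7/dist-packages/pyflink/lib")` and `Path("/opt/flink")` (the paths mentioned in the thread), first with `dry_run=True` to check what would be removed.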