>> What is the reason for including
opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base distribution
then?  Oh, a guess: to make it easier for TaskManagers to run
pyflink without having pyflink installed themselves?  Somehow I'd guess
this wouldn't work tho; I'd assume TaskManagers would also need some python
transitive dependencies, e.g. google protobuf.

This is for historical reasons. The first version (1.9.x) did not provide
Python UDF support, so it was not necessary to install PyFlink on the
TaskManager nodes. Since 1.10, which supports Python UDFs, users have to
install PyFlink on the TaskManager nodes because there are many transitive
dependencies, e.g. Apache Beam, protobuf, pandas, etc. However, we have not
removed these packages, as they are still useful on the client node, which
is responsible for compiling jobs (it's not necessary to install PyFlink on
the client node).
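
As a rough way to see which of these Python dependencies a given node is
missing, a sketch like the following could be run with the node's Python
interpreter. The module names listed are assumptions based on the packages
mentioned above (actual import names may differ between versions), and
`missing_modules` is a hypothetical helper, not part of PyFlink.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found by this interpreter."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package (e.g. `google` for `google.protobuf`)
            # raises ModuleNotFoundError instead of returning None.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Assumed import names for the dependencies mentioned in this thread.
CANDIDATES = ["pyflink", "py4j", "cloudpickle", "apache_beam",
              "google.protobuf", "pandas"]

if __name__ == "__main__":
    for name in missing_modules(CANDIDATES):
        print("not installed:", name)
```

This only checks that the modules can be located; it does not verify that
their versions are compatible with a given Flink release.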

>> Since we're building our own Docker image, I'm going the other way
around: just install pyflink, and symlink /opt/flink ->
/usr/lib/python3.7/dist-packages/pyflink.  So far so good, but I'm worried
that something will be fishy when trying to run JVM apps via pyflink.

Good idea! The PyFlink package contains everything needed to run JVM apps,
so I think you could just try it this way.
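
To sanity-check this layout before symlinking, something like the following
sketch could confirm where pip placed the package and whether its `lib`
directory contains JARs. `package_dir` and `list_jars` are hypothetical
helpers; the assumption that the JARs sit in a `lib` subdirectory comes from
the paths quoted in this thread.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an installed package, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    # Plain modules (non-packages) have no search locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def list_jars(directory):
    """Return the sorted names of the .jar files directly inside `directory`."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.jar"))


if __name__ == "__main__":
    home = package_dir("pyflink")
    if home is None:
        print("pyflink is not installed for this interpreter")
    else:
        print("candidate FLINK_HOME:", home)
        print("jars:", list_jars(home / "lib"))
```

An empty JAR list would suggest the symlink target is not a usable Flink
home for this installation.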

Regards,
Dian

On Mon, Jan 30, 2023 at 9:58 PM Andrew Otto <o...@wikimedia.org> wrote:

> Thanks Dian!
>
> > >> Is using pyflink from the flink distribution tarball (without pip)
> not a supported way to use pyflink?
> > You are right.
>
> What is the reason for including
> opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base
> distribution then?  Oh, a guess: to make it easier for TaskManagers to run
> pyflink without having pyflink installed themselves?  Somehow I'd guess
> this wouldn't work tho; I'd assume TaskManagers would also need some python
> transitive dependencies, e.g. google protobuf.
>
> > you could remove the JAR packages located under
> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
> install apache-flink`
>
> Since we're building our own Docker image, I'm going the other way around:
> just install pyflink, and symlink /opt/flink ->
> /usr/lib/python3.7/dist-packages/pyflink.  So far so good, but I'm worried
> that something will be fishy when trying to run JVM apps via pyflink.
>
> -Ao
>
>
>
> On Sun, Jan 29, 2023 at 1:43 AM Dian Fu <dian0511...@gmail.com> wrote:
>
>> Hi Andrew,
>>
>> >> By pip installing apache-flink, this docker image will have the flink
>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>> BUT ALSO flink lib jars will be installed at e.g.
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>> So, by following those instructions, flink is effectively installed twice
>> into the docker image.
>>
>> Yes, your understanding is correct. The base image `flink:1.15.2`
>> doesn't include PyFlink, so you need to build a custom image if you want
>> to use PyFlink. Regarding the JAR packages that are installed twice,
>> you could remove the JAR packages located under
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
>> install apache-flink`. It will then use the JAR packages located under
>> $FLINK_HOME/lib.
>>
>> >> Is using pyflink from the flink distribution tarball (without pip) not
>> a supported way to use pyflink?
>> You are right.
>>
>> Regards,
>> Dian
>>
>>
>> On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Ah, oops and my original email had a typo:
>>> > Some python dependencies are not included in the flink distribution
>>> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>>
>>> Should read:
>>> > Some python dependencies ARE included in the flink distribution
>>> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>>
>>> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> Let me ask a related question:
>>>>
>>>> We are building our own base Flink docker image.  We will be deploying
>>>> both JVM and python apps via flink-kubernetes-operator.
>>>>
>>>> Is there any reason not to install Flink in this image via `pip install
>>>> apache-flink` and use it for JVM apps?
>>>>
>>>> -Andrew Otto
>>>>  Wikimedia Foundation
>>>>
>>>>
>>>>
>>>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm having quite a bit of trouble running pyflink from the default
>>>>> flink distribution tarballs.  I'd expect the python examples to work as
>>>>> long as python is installed, and we've got the distribution.  Some python
>>>>> dependencies are not included in the flink distribution tarballs:
>>>>> cloudpickle, py4j and pyflink are in opt/python.  Others are not, e.g.
>>>>> protobuf.
>>>>>
>>>>> Now that I'm looking, I see that the pyflink installation instructions
>>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
>>>>> are to install via pip.
>>>>>
>>>>> I'm doing this in Docker for use with the flink-kubernetes-operator.
>>>>> In the Using Flink Python on Docker
>>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
>>>>> instructions, there is a pip3 install apache-flink step.  I find this
>>>>> strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.
>>>>>
>>>>> By pip installing apache-flink, this docker image will have the flink
>>>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>>>> BUT ALSO flink lib jars will be installed at e.g.
>>>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>>>> So, by following those instructions, flink is effectively installed
>>>>> twice into the docker image.
>>>>>
>>>>> Am I correct or am I missing something?
>>>>>
>>>>> Is using pyflink from the flink distribution tarball (without pip) not
>>>>> a supported way to use pyflink?
>>>>>
>>>>> Thanks!
>>>>> -Andrew Otto
>>>>>  Wikimedia Foundation
>>>>>
>>>>>
