Thanks Dian!

> >> Is using pyflink from the flink distribution tarball (without pip) not
> >> a supported way to use pyflink?
> You are right.

What is the reason for including
opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base distribution
then?  One guess: to make it easier for TaskManagers to run pyflink without
having pyflink installed themselves?  I doubt that would work, though; I'd
assume TaskManagers would also need pyflink's transitive Python
dependencies, e.g. google protobuf.
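
One way to check that guess is to put the bundled zips on sys.path and see which modules still cannot be found. This is only a minimal sketch: the directory /opt/flink/opt/python and the helper names add_zips_to_path and missing_modules are assumptions for illustration, not part of Flink.

```python
import importlib.util
import sys
from pathlib import Path


def add_zips_to_path(python_dir):
    """Prepend every *.zip found in python_dir to sys.path; return those paths."""
    zips = sorted(str(p) for p in Path(python_dir).glob("*.zip"))
    for z in zips:
        if z not in sys.path:
            sys.path.insert(0, z)
    return zips


def missing_modules(names):
    """Return the subset of names that cannot be located on the current sys.path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package (e.g. "google" for "google.protobuf")
            # raises instead of returning None.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed location of the zips inside the distribution tarball layout.
    add_zips_to_path("/opt/flink/opt/python")
    print(missing_modules(["pyflink", "py4j", "cloudpickle", "google.protobuf"]))
```

Anything this prints would still have to be installed some other way (e.g. via pip) on the machine running the Python code.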

> you could remove the JAR packages located under
> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
> install apache-flink`

Since we're building our own Docker image, I'm going the other way around:
just install pyflink, and symlink /opt/flink ->
/usr/lib/python3.7/dist-packages/pyflink.  So far so good, but I'm worried
that something will be fishy when trying to run JVM apps via pyflink.
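
A minimal sketch of that symlink step, assuming the pip-installed pyflink package directory contains a lib/ directory with the Flink JARs (as described in this thread). The helper names find_pyflink_dir and link_flink_home are hypothetical, and whether the pip-installed layout is a complete FLINK_HOME for JVM apps is exactly the open question.

```python
import importlib.util
import os
from pathlib import Path


def find_pyflink_dir():
    """Return the directory of the installed pyflink package, or None."""
    spec = importlib.util.find_spec("pyflink")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def link_flink_home(pyflink_dir, flink_home):
    """Point flink_home at pyflink_dir via a symlink and return the link path."""
    pyflink_dir = Path(pyflink_dir)
    flink_home = Path(flink_home)
    if not (pyflink_dir / "lib").is_dir():
        raise FileNotFoundError(f"no lib/ directory under {pyflink_dir}")
    if flink_home.is_symlink():
        flink_home.unlink()  # replace a stale link so the call can be repeated
    elif flink_home.exists():
        raise FileExistsError(f"{flink_home} exists and is not a symlink")
    flink_home.symlink_to(pyflink_dir, target_is_directory=True)
    return flink_home


if __name__ == "__main__":
    # Only report what was found; creating the link is left to the caller.
    print(find_pyflink_dir())
```

In an image build this would be called once as link_flink_home(find_pyflink_dir(), "/opt/flink"), after `pip install apache-flink` and before anything reads FLINK_HOME.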

-Ao



On Sun, Jan 29, 2023 at 1:43 AM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Andrew,
>
> >> By pip installing apache-flink, this docker image will have the flink
> >> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
> >> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
> >> BUT ALSO flink lib jars will be installed at e.g.
> >> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
> >> So, by following those instructions, flink is effectively installed twice
> >> into the docker image.
>
> Yes, your understanding is correct. The base image `flink:1.15.2` doesn't
> include PyFlink and so you need to build a custom image if you want to use
> PyFlink. Regarding the JAR packages that are installed twice, you could
> remove the JAR packages located under
> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
> install apache-flink`. It will use the JAR packages located under
> $FLINK_HOME/lib.
>
> >> Is using pyflink from the flink distribution tarball (without pip) not
> >> a supported way to use pyflink?
> You are right.
>
> Regards,
> Dian
>
>
> On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Ah, oops and my original email had a typo:
>> > Some python dependencies are not included in the flink distribution
>> > tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>
>> Should read:
>> > Some python dependencies ARE included in the flink distribution
>> > tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>
>> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Let me ask a related question:
>>>
>>> We are building our own base Flink docker image.  We will be deploying
>>> both JVM and python apps via flink-kubernetes-operator.
>>>
>>> Is there any reason not to install Flink in this image via `pip install
>>> apache-flink` and use it for JVM apps?
>>>
>>> -Andrew Otto
>>>  Wikimedia Foundation
>>>
>>>
>>>
>>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm having quite a bit of trouble running pyflink from the default
>>>> flink distribution tarballs.  I'd expect the python examples to work as
>>>> long as python is installed, and we've got the distribution.  Some python
>>>> dependencies are not included in the flink distribution tarballs:
>>>> cloudpickle, py4j and pyflink are in opt/python.  Others are not, e.g.
>>>> protobuf.
>>>>
>>>> Now that I'm looking, I see that the pyflink installation instructions
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
>>>> are to install via pip.
>>>>
>>>> I'm doing this in Docker for use with the flink-kubernetes-operator.
>>>> In the Using Flink Python on Docker
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
>>>> instructions, there is a pip3 install apache-flink step.  I find this
>>>> strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.
>>>>
>>>> By pip installing apache-flink, this docker image will have the flink
>>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>>> BUT ALSO flink lib jars will be installed at e.g.
>>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>>> So, by following those instructions, flink is effectively installed
>>>> twice into the docker image.
>>>>
>>>> Am I correct or am I missing something?
>>>>
>>>> Is using pyflink from the flink distribution tarball (without pip) not
>>>> a supported way to use pyflink?
>>>>
>>>> Thanks!
>>>> -Andrew Otto
>>>>  Wikimedia Foundation
>>>>
>>>>
