Deepyaman Datta created FLINK-32758:
---------------------------------------
Summary: PyFlink bounds are overly restrictive and outdated
Key: FLINK-32758
URL: https://issues.apache.org/jira/browse/FLINK-32758
Project: Flink
Issue Type: Improvement
Components: API / Python
Affects Versions: 1.17.1
Reporter: Deepyaman Datta
Hi! I am part of a team building the Flink backend for Ibis
([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink
under the hood for execution; however, PyFlink's requirements are incompatible
with several other Ibis requirements. Beyond Ibis, PyFlink's outdated and
restrictive requirements prevent it from being used alongside the most recent
releases of Python data libraries.
Some of the major libraries that we (and likely others in the Python community
who want to use PyFlink alongside other libraries) need PyFlink to be compatible with:
* PyArrow (at least >=10.0.0, but there's no reason not to also be
compatible with the latest release)
* pandas (should be compatible with the 2.x series, but probably also with 1.4.x,
released in January 2022, and 1.5.x)
* numpy (1.22 was released in December 2021)
* Newer releases of Apache Beam
* Newer releases of cython
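Written as lower bounds only, the list above might look like the following sketch. The minimums (PyArrow 10.0.0, pandas 1.4.0, NumPy 1.22.0) are assumptions read off the list, not PyFlink's actual specifiers, and the hypothetical `admits` helper compares only plain numeric versions as a simplified stand-in for a real specifier implementation such as `packaging.specifiers`.

```python
from __future__ import annotations

# Hypothetical lower-bound-only specifiers derived from the list above.
# These minimums are illustrative assumptions, not PyFlink's actual metadata.
PROPOSED_MINIMUMS = {
    "pyarrow": "10.0.0",
    "pandas": "1.4.0",
    "numpy": "1.22.0",
}


def version_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn a plain numeric version like '2.0.3' into (2, 0, 3).

    Pads short versions so '1.22' compares equal to '1.22.0'. Pre-releases,
    epochs and local versions are deliberately not handled.
    """
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def admits(package: str, version: str) -> bool:
    """True if `version` meets the minimum for `package`; there is no upper cap."""
    return version_tuple(version) >= version_tuple(PROPOSED_MINIMUMS[package])


# Versions in the ranges named in the list are accepted, and so is anything newer.
for package, version in [
    ("pyarrow", "11.0.0"),
    ("pandas", "1.4.0"),
    ("pandas", "1.5.3"),
    ("pandas", "2.0.3"),
    ("numpy", "1.22"),
]:
    print(package, version, admits(package, version))

print(admits("pyarrow", "9.0.0"))  # False: below the assumed minimum
```

Because no specifier carries an upper bound, a future pandas 3.x or PyArrow 20.x would pass the same check without any change on the PyFlink side.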
More generally, uncapped dependencies may be preferable, as they avoid the
need for frequent PyFlink releases each time newer versions of libraries are
released. A common (and great) argument against upper-bounding dependencies,
especially for libraries:
[https://iscinumpy.dev/post/bound-version-constraints/]
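The cost of a cap shows up when a resolver has to intersect two requirements on the same package. The sketch below models each specifier as a half-open range `[lower, upper)` over plain numeric versions; the `<1.4.0` cap and the `>=2.0.0` requirement are hypothetical stand-ins for a capped library and a newer downstream library, not quotes from any real package metadata.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]
# A specifier modelled as a half-open range [lower, upper); upper=None means uncapped.
Range = Tuple[Version, Optional[Version]]


def parse(version: str) -> Version:
    """Parse a plain 'major.minor.patch' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Combine two ranges as a resolver would; None means they conflict."""
    lower = max(a[0], b[0])
    caps = [upper for upper in (a[1], b[1]) if upper is not None]
    upper = min(caps) if caps else None
    if upper is not None and lower >= upper:
        return None  # no version can satisfy both requirements
    return (lower, upper)


# Hypothetical requirements on pandas from two different packages.
capped = (parse("1.3.0"), parse("1.4.0"))  # like pandas>=1.3.0,<1.4.0
uncapped = (parse("1.3.0"), None)          # like pandas>=1.3.0
needs_v2 = (parse("2.0.0"), None)          # like pandas>=2.0.0

print(intersect(capped, needs_v2))    # None
print(intersect(uncapped, needs_v2))  # ((2, 0, 0), None)
```

With the cap, the conflict can only be cleared by relaxing the cap, which in practice means waiting for a new release of the capping package; without it, the resolver can simply pick any pandas 2.x release.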
I am currently testing removing upper bounds in
[https://github.com/apache/flink/pull/23141]; so far, builds pass without issue
in
[b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
and I'm now waiting on
[c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
to see whether PyArrow resolves to >=10.0.0. Solving the proposed
dependencies results in:
{noformat}
#
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
#    pip-compile --config=pyproject.toml --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt
#
apache-beam==2.49.0
    # via -r dev/dev-requirements.txt
avro-python3==1.10.2
    # via -r dev/dev-requirements.txt
certifi==2023.7.22
    # via requests
charset-normalizer==3.2.0
    # via requests
cloudpickle==2.2.1
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
crcmod==1.7
    # via apache-beam
cython==3.0.0
    # via -r dev/dev-requirements.txt
dill==0.3.1.1
    # via apache-beam
dnspython==2.4.1
    # via pymongo
docopt==0.6.2
    # via hdfs
exceptiongroup==1.1.2
    # via pytest
fastavro==1.8.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
fasteners==0.18
    # via apache-beam
find-libpython==0.3.0
    # via pemja
grpcio==1.56.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   grpcio-tools
grpcio-tools==1.56.2
    # via -r dev/dev-requirements.txt
hdfs==2.7.0
    # via apache-beam
httplib2==0.22.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
idna==3.4
    # via requests
iniconfig==2.0.0
    # via pytest
numpy==1.24.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
    #   pyarrow
objsize==0.6.1
    # via apache-beam
orjson==3.9.2
    # via apache-beam
packaging==23.1
    # via pytest
pandas==2.0.3
    # via -r dev/dev-requirements.txt
pemja==0.3.0 ; platform_system != "Windows"
    # via -r dev/dev-requirements.txt
pluggy==1.2.0
    # via pytest
proto-plus==1.22.3
    # via apache-beam
protobuf==4.23.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   grpcio-tools
    #   proto-plus
py4j==0.10.9.7
    # via -r dev/dev-requirements.txt
pyarrow==11.0.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
pydot==1.4.2
    # via apache-beam
pymongo==4.4.1
    # via apache-beam
pyparsing==3.1.1
    # via
    #   httplib2
    #   pydot
pytest==7.4.0
    # via -r dev/dev-requirements.txt
python-dateutil==2.8.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
pytz==2023.3
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
regex==2023.6.3
    # via apache-beam
requests==2.31.0
    # via
    #   apache-beam
    #   hdfs
six==1.16.0
    # via
    #   hdfs
    #   python-dateutil
tomli==2.0.1
    # via pytest
typing-extensions==4.7.1
    # via apache-beam
tzdata==2023.3
    # via pandas
urllib3==2.0.4
    # via requests
wheel==0.41.0
    # via -r dev/dev-requirements.txt
zstandard==0.21.0
    # via apache-beam

# The following packages are considered to be unsafe in a requirements file:
# pip
# setuptools
{noformat}
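To confirm that a resolution like this one meets the goals listed earlier (for example PyArrow >=10.0.0), the pins can be read back out of the compiled file. A minimal sketch, assuming the `name==version` lines produced by pip-compile with optional `; marker` suffixes; comment and `# via` lines are skipped, the helper names `read_pins` and `at_least` are hypothetical, and only plain numeric versions are compared. The sample text is an excerpt of the output above.

```python
from __future__ import annotations


def read_pins(text: str) -> dict[str, str]:
    """Map package name -> pinned version from pip-compile style output."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, header comments and '# via' annotations
        requirement = line.split(";", 1)[0].strip()  # drop environment markers
        name, sep, version = requirement.partition("==")
        if sep:
            pins[name.strip().lower()] = version.strip()
    return pins


def at_least(version: str, minimum: str) -> bool:
    """Compare plain numeric versions component by component, e.g. '11.0.0' >= '10.0.0'."""
    return tuple(map(int, version.split("."))) >= tuple(map(int, minimum.split(".")))


COMPILED = """\
numpy==1.24.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
    #   pyarrow
pandas==2.0.3
    # via -r dev/dev-requirements.txt
pemja==0.3.0 ; platform_system != "Windows"
    # via -r dev/dev-requirements.txt
pyarrow==11.0.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
"""

pins = read_pins(COMPILED)
print(pins["pyarrow"], at_least(pins["pyarrow"], "10.0.0"))  # 11.0.0 True
```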
--
This message was sent by Atlassian Jira
(v8.20.10#820010)