Deepyaman Datta created FLINK-32758:
---------------------------------------

             Summary: PyFlink bounds are overly restrictive and outdated
                 Key: FLINK-32758
                 URL: https://issues.apache.org/jira/browse/FLINK-32758
             Project: Flink
          Issue Type: Improvement
          Components: API / Python
    Affects Versions: 1.17.1
            Reporter: Deepyaman Datta


Hi! I am part of a team building the Flink backend for Ibis ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink under the hood for execution; however, PyFlink's requirements are incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's outdated and restrictive requirements prevent it from being used alongside most recent releases of Python data libraries.

Some of the major libraries we (and likely others in the Python community interested in using PyFlink alongside other libraries) need compatibility with:
 * PyArrow (at least >=10.0.0, but there's no reason not to also be compatible with the latest release)
 * pandas (should be compatible with the 2.x series, but also probably with 1.4.x, released January 2022, and 1.5.x)
 * numpy (1.22 was released in December 2021)
 * Newer releases of Apache Beam
 * Newer releases of cython

Furthermore, uncapped dependencies could be more generally preferable, as they avoid the need for frequent PyFlink releases as newer versions of libraries are released. A common (and great) argument for not upper-bounding dependencies, especially for libraries: [https://iscinumpy.dev/post/bound-version-constraints/]

I am currently testing removing upper bounds in [https://github.com/apache/flink/pull/23141]; so far, builds pass without issue in [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], and I'm currently waiting on [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] to see if I can get PyArrow to resolve >=10.0.0.
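
As a rough illustration of the upper-bound problem described above, the sketch below checks whether a release falls inside a capped or an uncapped range. The bounds `5.0.0` and `9.0.0` are hypothetical placeholders, not values taken from PyFlink's setup files, and the helpers `parse` and `allowed` are made up for this example. It handles only plain dotted numeric versions; a real resolver follows the full PEP 440 rules.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn a plain dotted release string such as '11.0.0' into (11, 0, 0)."""
    return tuple(int(part) for part in version.split("."))


def allowed(version: str, lower: str, upper: Optional[str] = None) -> bool:
    """Return True if lower <= version < upper; skip the cap when upper is None."""
    candidate = parse(version)
    if candidate < parse(lower):
        return False
    return upper is None or candidate < parse(upper)


# Hypothetical bounds, for illustration only.
CAPPED = ("5.0.0", "9.0.0")   # lower bound plus an upper cap
UNCAPPED = ("5.0.0", None)    # lower bound only

# A release newer than the cap is rejected by the capped range
# and accepted once the cap is removed.
print(allowed("11.0.0", *CAPPED))
print(allowed("11.0.0", *UNCAPPED))
```

With a cap in place, any project that also requires the newer release cannot be installed into the same environment; dropping the cap leaves that decision to the resolver.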

Solving the proposed dependencies results in:

{{#}}
{{# This file is autogenerated by pip-compile with Python 3.8}}
{{# by the following command:}}
{{#}}
{{# pip-compile --config=pyproject.toml --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
{{#}}
{{apache-beam==2.49.0}}
{{ # via -r dev/dev-requirements.txt}}
{{avro-python3==1.10.2}}
{{ # via -r dev/dev-requirements.txt}}
{{certifi==2023.7.22}}
{{ # via requests}}
{{charset-normalizer==3.2.0}}
{{ # via requests}}
{{cloudpickle==2.2.1}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{crcmod==1.7}}
{{ # via apache-beam}}
{{cython==3.0.0}}
{{ # via -r dev/dev-requirements.txt}}
{{dill==0.3.1.1}}
{{ # via apache-beam}}
{{dnspython==2.4.1}}
{{ # via pymongo}}
{{docopt==0.6.2}}
{{ # via hdfs}}
{{exceptiongroup==1.1.2}}
{{ # via pytest}}
{{fastavro==1.8.2}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{fasteners==0.18}}
{{ # via apache-beam}}
{{find-libpython==0.3.0}}
{{ # via pemja}}
{{grpcio==1.56.2}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{ # grpcio-tools}}
{{grpcio-tools==1.56.2}}
{{ # via -r dev/dev-requirements.txt}}
{{hdfs==2.7.0}}
{{ # via apache-beam}}
{{httplib2==0.22.0}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{idna==3.4}}
{{ # via requests}}
{{iniconfig==2.0.0}}
{{ # via pytest}}
{{numpy==1.24.4}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{ # pandas}}
{{ # pyarrow}}
{{objsize==0.6.1}}
{{ # via apache-beam}}
{{orjson==3.9.2}}
{{ # via apache-beam}}
{{packaging==23.1}}
{{ # via pytest}}
{{pandas==2.0.3}}
{{ # via -r dev/dev-requirements.txt}}
{{pemja==0.3.0 ; platform_system != "Windows"}}
{{ # via -r dev/dev-requirements.txt}}
{{pluggy==1.2.0}}
{{ # via pytest}}
{{proto-plus==1.22.3}}
{{ # via apache-beam}}
{{protobuf==4.23.4}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{ # grpcio-tools}}
{{ # proto-plus}}
{{py4j==0.10.9.7}}
{{ # via -r dev/dev-requirements.txt}}
{{pyarrow==11.0.0}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{pydot==1.4.2}}
{{ # via apache-beam}}
{{pymongo==4.4.1}}
{{ # via apache-beam}}
{{pyparsing==3.1.1}}
{{ # via}}
{{ # httplib2}}
{{ # pydot}}
{{pytest==7.4.0}}
{{ # via -r dev/dev-requirements.txt}}
{{python-dateutil==2.8.2}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{ # pandas}}
{{pytz==2023.3}}
{{ # via}}
{{ # -r dev/dev-requirements.txt}}
{{ # apache-beam}}
{{ # pandas}}
{{regex==2023.6.3}}
{{ # via apache-beam}}
{{requests==2.31.0}}
{{ # via}}
{{ # apache-beam}}
{{ # hdfs}}
{{six==1.16.0}}
{{ # via}}
{{ # hdfs}}
{{ # python-dateutil}}
{{tomli==2.0.1}}
{{ # via pytest}}
{{typing-extensions==4.7.1}}
{{ # via apache-beam}}
{{tzdata==2023.3}}
{{ # via pandas}}
{{urllib3==2.0.4}}
{{ # via requests}}
{{wheel==0.41.0}}
{{ # via -r dev/dev-requirements.txt}}
{{zstandard==0.21.0}}
{{ # via apache-beam}}
{{# The following packages are considered to be unsafe in a requirements file:}}
{{# pip}}
{{# setuptools}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)