Deepyaman Datta created FLINK-32758:
---------------------------------------

             Summary: PyFlink bounds are overly restrictive and outdated
                 Key: FLINK-32758
                 URL: https://issues.apache.org/jira/browse/FLINK-32758
             Project: Flink
          Issue Type: Improvement
          Components: API / Python
    Affects Versions: 1.17.1
            Reporter: Deepyaman Datta


Hi! I am part of a team building the Flink backend for Ibis 
([https://github.com/ibis-project/ibis]). We would like to use PyFlink under 
the hood for execution; however, PyFlink's requirements are incompatible with 
several of Ibis's other requirements. Beyond Ibis, PyFlink's outdated and 
restrictive requirements prevent it from being used alongside the most recent 
releases of many Python data libraries.

Some of the major libraries that we (and likely others in the Python community 
who want to use PyFlink alongside other libraries) need compatibility with:
 * PyArrow (at least >=10.0.0, but there's no reason not to also be compatible 
with the latest release)
 * pandas (should be compatible with the 2.x series, and probably also with 
1.4.x, released January 2022, and 1.5.x)
 * numpy (1.22 was released in December 2021)
 * Newer releases of Apache Beam
 * Newer releases of Cython

Furthermore, uncapped dependencies are generally preferable, since they avoid 
the need for a new PyFlink release every time a dependency publishes a newer 
version. A common (and compelling) argument against upper-bounding 
dependencies, especially in libraries, is made here: 
[https://iscinumpy.dev/post/bound-version-constraints/]
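To make the trade-off concrete, the following minimal, stdlib-only Python sketch checks a few candidate releases against a capped range and an uncapped range. The helpers {{parse_version}} and {{satisfies}} are hypothetical: they handle only plain numeric versions and the {{>=}}, {{<=}}, {{==}}, {{<}} and {{>}} operators (real installers implement full PEP 440 semantics). The {{>=1.3.0,<1.4.0}} range is an illustrative cap, not necessarily PyFlink's exact pin.

```python
import operator
from typing import Tuple

# Operators supported by this simplified, hypothetical checker. Two-character
# operators come first so that ">=" is not mistaken for ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as "2.0.3" into (2, 0, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`.

    Illustrative simplification of PEP 440, not a real installer's logic.
    """
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        # Zero-pad both sides so that "1.4" and "1.4.0" compare as equal.
        width = max(len(v), len(bound))
        lhs = v + (0,) * (width - len(v))
        rhs = bound + (0,) * (width - len(bound))
        if not compare(lhs, rhs):
            return False
    return True


capped = ">=1.3.0,<1.4.0"  # illustrative upper-bounded range
uncapped = ">=1.3.0"       # lower bound only

for release in ("1.3.5", "1.5.3", "2.0.3"):
    print(release, satisfies(release, capped), satisfies(release, uncapped))
# 1.3.5 True True
# 1.5.3 False True
# 2.0.3 False True
```

With the cap in place, every release from 1.4 onward is rejected until the bound itself is bumped in a new release, while the lower-bound-only range keeps accepting newer versions.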

I am currently testing the removal of upper bounds in 
[https://github.com/apache/flink/pull/23141]. So far, builds pass without 
issue in 
[b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], 
and I'm waiting on 
[c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] 
to see whether PyArrow can be made to resolve to >=10.0.0. Resolving the 
proposed dependencies yields:



{{#}}
{{# This file is autogenerated by pip-compile with Python 3.8}}
{{# by the following command:}}
{{#}}
{{#    pip-compile --config=pyproject.toml --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
{{#}}
{{apache-beam==2.49.0}}
{{    # via -r dev/dev-requirements.txt}}
{{avro-python3==1.10.2}}
{{    # via -r dev/dev-requirements.txt}}
{{certifi==2023.7.22}}
{{    # via requests}}
{{charset-normalizer==3.2.0}}
{{    # via requests}}
{{cloudpickle==2.2.1}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{crcmod==1.7}}
{{    # via apache-beam}}
{{cython==3.0.0}}
{{    # via -r dev/dev-requirements.txt}}
{{dill==0.3.1.1}}
{{    # via apache-beam}}
{{dnspython==2.4.1}}
{{    # via pymongo}}
{{docopt==0.6.2}}
{{    # via hdfs}}
{{exceptiongroup==1.1.2}}
{{    # via pytest}}
{{fastavro==1.8.2}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{fasteners==0.18}}
{{    # via apache-beam}}
{{find-libpython==0.3.0}}
{{    # via pemja}}
{{grpcio==1.56.2}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   grpcio-tools}}
{{grpcio-tools==1.56.2}}
{{    # via -r dev/dev-requirements.txt}}
{{hdfs==2.7.0}}
{{    # via apache-beam}}
{{httplib2==0.22.0}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{idna==3.4}}
{{    # via requests}}
{{iniconfig==2.0.0}}
{{    # via pytest}}
{{numpy==1.24.4}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   pandas}}
{{    #   pyarrow}}
{{objsize==0.6.1}}
{{    # via apache-beam}}
{{orjson==3.9.2}}
{{    # via apache-beam}}
{{packaging==23.1}}
{{    # via pytest}}
{{pandas==2.0.3}}
{{    # via -r dev/dev-requirements.txt}}
{{pemja==0.3.0 ; platform_system != "Windows"}}
{{    # via -r dev/dev-requirements.txt}}
{{pluggy==1.2.0}}
{{    # via pytest}}
{{proto-plus==1.22.3}}
{{    # via apache-beam}}
{{protobuf==4.23.4}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   grpcio-tools}}
{{    #   proto-plus}}
{{py4j==0.10.9.7}}
{{    # via -r dev/dev-requirements.txt}}
{{pyarrow==11.0.0}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{pydot==1.4.2}}
{{    # via apache-beam}}
{{pymongo==4.4.1}}
{{    # via apache-beam}}
{{pyparsing==3.1.1}}
{{    # via}}
{{    #   httplib2}}
{{    #   pydot}}
{{pytest==7.4.0}}
{{    # via -r dev/dev-requirements.txt}}
{{python-dateutil==2.8.2}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   pandas}}
{{pytz==2023.3}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   pandas}}
{{regex==2023.6.3}}
{{    # via apache-beam}}
{{requests==2.31.0}}
{{    # via}}
{{    #   apache-beam}}
{{    #   hdfs}}
{{six==1.16.0}}
{{    # via}}
{{    #   hdfs}}
{{    #   python-dateutil}}
{{tomli==2.0.1}}
{{    # via pytest}}
{{typing-extensions==4.7.1}}
{{    # via apache-beam}}
{{tzdata==2023.3}}
{{    # via pandas}}
{{urllib3==2.0.4}}
{{    # via requests}}
{{wheel==0.41.0}}
{{    # via -r dev/dev-requirements.txt}}
{{zstandard==0.21.0}}
{{    # via apache-beam}}
{{# The following packages are considered to be unsafe in a requirements file:}}
{{# pip}}
{{# setuptools}}
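One way to confirm that this resolution meets the targets listed above is to parse the {{name==version}} pins and compare them against the desired minimums. The following is a minimal, hypothetical stdlib-only Python sketch: {{read_pins}}, {{version_at_least}} and {{check_minimums}} are illustrative names, the input is a short excerpt of the output above, only plain numeric versions are handled, and the minimums (PyArrow 10.0.0, pandas 1.4.0, NumPy 1.22.0) are taken from the issue text rather than from any Flink configuration.

```python
import re
from typing import Dict, Tuple

# A short excerpt of the compiled output above, Jira {{ }} markup included.
COMPILED = """\
{{numpy==1.24.4}}
{{    # via}}
{{pandas==2.0.3}}
{{    # via -r dev/dev-requirements.txt}}
{{pemja==0.3.0 ; platform_system != "Windows"}}
{{pyarrow==11.0.0}}
"""

# Minimum versions taken from the issue text (assumed targets for this check).
MINIMUMS = {"pyarrow": "10.0.0", "pandas": "1.4.0", "numpy": "1.22.0"}

# Matches "name==1.2.3" at the start of a line; only numeric versions are handled.
PIN_RE = re.compile(r"^([A-Za-z0-9._-]+)==([0-9]+(?:\.[0-9]+)*)")


def parse_version(text: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def version_at_least(version: str, minimum: str) -> bool:
    """Compare zero-padded numeric versions, so "1.22" equals "1.22.0"."""
    a, b = parse_version(version), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def read_pins(text: str) -> Dict[str, str]:
    """Collect name -> version from pip-compile lines, skipping comments."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("{{") and line.endswith("}}"):
            line = line[2:-2]  # drop Jira monospace markup
        line = line.split(";", 1)[0].strip()  # drop environment markers
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def check_minimums(pins: Dict[str, str], minimums: Dict[str, str]) -> Dict[str, bool]:
    """Map each required package to True if it is pinned at or above its minimum."""
    return {
        name: name in pins and version_at_least(pins[name], minimum)
        for name, minimum in minimums.items()
    }


print(check_minimums(read_pins(COMPILED), MINIMUMS))
# {'pyarrow': True, 'pandas': True, 'numpy': True}
```

On this excerpt all three packages meet their minimums; pointing {{read_pins}} at the full {{dev/compiled-requirements.txt}} would run the same check over every pin.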



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
