Dian Fu created FLINK-28114:
-------------------------------

Summary: The path of the Python client interpreter could not point to an archive file in distributed file system
Key: FLINK-28114
URL: https://issues.apache.org/jira/browse/FLINK-28114
Project: Flink
Issue Type: Bug
Components: API / Python
Reporter: Dian Fu
Fix For: 1.16.0, 1.15.1

See https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L178 for more details about this limitation.

Users could execute PyFlink jobs in YARN application mode as follows:
{code}
./bin/flink run-application -t yarn-application \
      -Djobmanager.memory.process.size=1024m \
      -Dtaskmanager.memory.process.size=1024m \
      -Dyarn.application.name=<ApplicationName> \
      -Dyarn.ship-files=/path/to/shipfiles \
      -pyarch shipfiles/venv.zip \
      -pyclientexec venv.zip/venv/bin/python3 \
      -pyexec venv.zip/venv/bin/python3 \
      -py shipfiles/word_count.py
{code}

In the above case, venv.zip will be distributed to the TMs via the Flink blob server. However, the blob server doesn't support files larger than 2GB. See https://github.com/apache/flink/blob/ea52732dc48a4f1c5be0925890cd8aa1ea2a11ed/flink-runtime/src/main/java/org/apache/flink/runtime/blob/BlobServerConnection.java#L223 for more details.

This is a very serious problem, as Python users tend to install a lot of Python libraries inside venv.zip, and some Python libraries are very large.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
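
To make the 2GB limitation described above concrete, here is a minimal sketch of a check that could be run on an archive such as venv.zip before submitting the job. It is not part of Flink: the helper names are hypothetical, and the exact threshold (the largest signed 32-bit integer) is an assumption standing in for the "2GB" figure in this report.

```python
import os

# Assumption: "2GB" is modelled as the largest signed 32-bit integer.
# The real threshold enforced by the blob server may differ.
ASSUMED_BLOB_LIMIT_BYTES = 2**31 - 1


def exceeds_blob_limit(size_bytes, limit_bytes=ASSUMED_BLOB_LIMIT_BYTES):
    """Return True if a file of size_bytes is over the assumed limit."""
    return size_bytes > limit_bytes


def check_archive(path):
    """Hypothetical pre-submission check for an archive such as venv.zip."""
    size = os.path.getsize(path)
    if exceeds_blob_limit(size):
        return f"{path}: {size} bytes, over the assumed blob server limit"
    return f"{path}: {size} bytes, within the assumed blob server limit"
```

For example, `check_archive("shipfiles/venv.zip")` would report whether the archive from the command above fits under the assumed limit.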