KaranShishodia opened a new issue, #371: URL: https://github.com/apache/flink-agents/issues/371
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/flink-agents/issues) and found nothing similar.

### Description

When running the example `workflow_single_agent_example.py` from Flink Agents in YARN Application Cluster mode, the job fails with:

```
ModuleNotFoundError: No module named 'encodings'
```

The error indicates that the embedded Python interpreter (via Pemja) cannot locate its standard-library modules (such as `encodings`) in the supplied virtual environment archive.

Important context:

- The same job runs successfully in standalone Flink cluster mode, using exactly the same Python virtual environment and code.
- A standard PyFlink job (e.g. `word_count.py`) works fine in YARN mode, so PyFlink's regular Python worker is not affected.
- Using the embedded interpreter via Flink Agents + Pemja fails, which points to a problem in how the archive-based environment is initialized under YARN.
- The failure happens during interpreter initialization inside Pemja: the filesystem encoding cannot be set because the core standard module `encodings` cannot be found.

### How to reproduce

```
./flink-1.20.3/bin/flink run-application -t yarn-application \
  -Dcontainerized.master.env.JAVA_HOME=/usr/lib/jvm/jre-11 \
  -Dcontainerized.taskmanager.env.JAVA_HOME=/usr/lib/jvm/jre-11 \
  -Djobmanager.memory.process.size=1024m \
  -Dcontainerized.taskmanager.env.PYTHONHOME=venv.tar.gz \
  -Dtaskmanager.memory.process.size=1024m \
  -Dyarn.application.name=flink-agents-workflow \
  -Dyarn.ship-files=./shipfiles \
  -pyarch shipfiles/venv.tar.gz \
  -pyclientexec venv.tar.gz/bin/python \
  -pyexec venv.tar.gz/bin/python \
  -pyfs shipfiles \
  -pym workflow_single_agent_example
```

Where:

- `venv.tar.gz` is the archived Python virtual environment (containing the interpreter, stdlib, and installed dependencies).
- The archive is shipped to the cluster via `-pyarch`, and the job tries to use the embedded interpreter inside that archive.
- No system Python is assumed on the YARN NodeManager machines; everything must come from the archive.

The failure log (in `taskmanager.log`) includes lines like:

```
sys.executable = '/.../venv.tar.gz/bin/python'
sys.prefix = 'venv.tar.gz'
sys.path = [
    'venv.tar.gz/lib/python310.zip',
    'venv.tar.gz/lib/python3.10',
    'venv.tar.gz/lib/python3.10/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
ModuleNotFoundError: No module named 'encodings'
```

According to the issue comments, removing the manual `PYTHONHOME` setting does *not* prevent the error, so this seems inherent to how Pemja and an archive-based venv are initialized under YARN.

### Version and environment

| Component | Version / Setting |
|-----------|------------------|
| Flink (Flink cluster) | **1.20.3** |
| Flink Agents version | **0.1.0** |
| Deployment mode | **YARN Application Cluster mode** (`-t yarn-application`) |
| Python version (inside virtualenv) | **Python 3.10** |
| Python virtual environment packaging | Virtual environment archived as `venv.tar.gz` (Conda-created or venv-based) and shipped to the cluster |
| YARN setup assumptions | NodeManager machines do *not* have a system-wide Python; the Python interpreter and all dependencies come from the shipped archive |

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
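
A possible way to narrow this down: the log shows `sys.prefix = 'venv.tar.gz'`, a relative path, so the interpreter can only find `encodings` if the process working directory contains the extracted archive and the archive actually holds a standard library under `lib/python3.10`. The sketch below is a hypothetical diagnostic, not part of Flink, Flink Agents, or Pemja; the function name `check_prefix` and its parameters are assumptions made for illustration. It resolves a prefix against a working directory and reports whether `encodings` is present there. Note that an environment created with `python -m venv` does not copy the standard library into the environment, so a venv-based archive could fail this check even when extraction succeeds.

```python
import os


def check_prefix(prefix, version=(3, 10), cwd=None):
    """Report whether the stdlib `encodings` package exists under `prefix`.

    `prefix` mirrors the value seen as sys.prefix / PYTHONHOME in the log.
    `cwd` is the directory a relative prefix is resolved against
    (hypothetical parameter; defaults to the current working directory).
    """
    cwd = cwd or os.getcwd()
    major, minor = version
    # A relative prefix is only meaningful relative to the process cwd.
    base = prefix if os.path.isabs(prefix) else os.path.join(cwd, prefix)
    stdlib_dir = os.path.join(base, "lib", f"python{major}.{minor}")
    marker = os.path.join(stdlib_dir, "encodings", "__init__.py")
    return {
        "prefix_is_absolute": os.path.isabs(prefix),
        "stdlib_dir": stdlib_dir,
        "has_encodings": os.path.isfile(marker),
    }


# Example: check the prefix value reported in the failure log.
print(check_prefix("venv.tar.gz"))
```

Running this from the TaskManager container's working directory (or against a locally extracted copy of the archive) would show whether the problem is a missing standard library in the archive or a relative path that does not resolve where the embedded interpreter starts.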
