KaranShishodia opened a new issue, #371:
URL: https://github.com/apache/flink-agents/issues/371

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/flink-agents/issues) and found nothing similar.
   
   ### Description
   
   When running the example `workflow_single_agent_example.py` from Flink Agents in YARN Application Cluster mode, the job fails with:
   
   ```
   ModuleNotFoundError: No module named 'encodings'
   ```
   
   
   The error indicates that the embedded Python interpreter (via Pemja) cannot locate its standard-library modules (such as `encodings`) in the supplied virtual environment archive.
   
   Important context:
   
   The same job runs successfully on a standalone Flink cluster, using exactly the same Python virtual environment and code.
   [Mail Archive](https://www.mail-archive.com/issues%40flink.apache.org/msg806809.html)
   
   A standard PyFlink job (e.g. `word_count.py`) works fine in YARN mode, so PyFlink's “normal” Python worker can use the shipped environment. Only the embedded interpreter used by Flink Agents through Pemja fails, which points to a problem in how the archive-based environment is initialized for that interpreter under YARN.
   
   
   The failure happens during interpreter initialization inside Pemja: the filesystem encoding cannot be set up because the core standard-library module `encodings` cannot be found.
   
   
   ### How to reproduce
   
   ```shell
   ./flink-1.20.3/bin/flink run-application -t yarn-application \
     -Dcontainerized.master.env.JAVA_HOME=/usr/lib/jvm/jre-11 \
     -Dcontainerized.taskmanager.env.JAVA_HOME=/usr/lib/jvm/jre-11 \
     -Djobmanager.memory.process.size=1024m \
     -Dcontainerized.taskmanager.env.PYTHONHOME=venv.tar.gz \
     -Dtaskmanager.memory.process.size=1024m \
     -Dyarn.application.name=flink-agents-workflow \
     -Dyarn.ship-files=./shipfiles \
     -pyarch shipfiles/venv.tar.gz \
     -pyclientexec venv.tar.gz/bin/python \
     -pyexec venv.tar.gz/bin/python \
     -pyfs shipfiles \
     -pym workflow_single_agent_example
   ```
   
   
   Where:
   
   - `venv.tar.gz` is the archived Python virtual environment (containing the interpreter, the standard library, and the installed dependencies).
   - The archive is shipped to the cluster via `-pyarch`, and the job tries to use the embedded interpreter inside that archive.
   - No system Python is assumed on the YARN NodeManager machines; everything must come from the archive.
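Since everything has to come from the archive, one way to rule out a packaging problem is to confirm, before submitting, that `venv.tar.gz` really contains the paths the command relies on. The following is a minimal sketch using Python's standard `tarfile` module. `REQUIRED_ENTRIES`, `missing_entries` and `check_archive` are hypothetical names; the two required paths are assumptions derived from `-pyexec venv.tar.gz/bin/python` and the `lib/python3.10` entries reported in `taskmanager.log`, and they assume the environment's contents sit at the top level of the archive.

```python
import tarfile
from typing import Iterable, List

# Hypothetical list of paths the submission depends on (assumption):
# -pyexec points at venv.tar.gz/bin/python, and the log expects the
# standard library under lib/python3.10.
REQUIRED_ENTRIES = ("bin/python", "lib/python3.10/encodings/__init__.py")


def missing_entries(member_names: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required paths that are absent from the archive member names.

    A leading './' is stripped because archives built with `tar -C venv .`
    store members as './bin/python' rather than 'bin/python'.
    """
    names = {n[2:] if n.startswith("./") else n for n in member_names}
    return [path for path in required if path not in names]


def check_archive(path: str, required: Iterable[str] = REQUIRED_ENTRIES) -> List[str]:
    """Open a gzip-compressed tarball and report which required entries are missing."""
    with tarfile.open(path, "r:gz") as tar:
        return missing_entries(tar.getnames(), required)
```

For example, `check_archive("shipfiles/venv.tar.gz")` returns an empty list when both entries are present. If the environment is wrapped in a top-level directory inside the archive (e.g. `venv/bin/python`), both entries are reported missing, and every path under `venv.tar.gz/` would need that extra component. Note also that an environment created with `python -m venv` normally does not copy the standard library into `lib/python3.10`, so the `encodings` check would fail for such an archive.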
   
   The TaskManager log (`taskmanager.log`) from the failing run includes lines like:
   
   ```
   sys.executable = '/.../venv.tar.gz/bin/python'
   sys.prefix = 'venv.tar.gz'
   sys.path = [
     'venv.tar.gz/lib/python310.zip',
     'venv.tar.gz/lib/python3.10',
     'venv.tar.gz/lib/python3.10/lib-dynload',
   ]
   Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
   ModuleNotFoundError: No module named 'encodings'
   ```
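In this log, `sys.prefix` and all `sys.path` entries are relative paths, which are resolved against the process's current working directory rather than against the location of `sys.executable`. A minimal diagnostic sketch, assuming it is run with Python 3.9+ on a TaskManager host and that the standard-library layout matches the log; `stdlib_candidates` and `diagnose_python_home` are hypothetical helpers, not part of Flink Agents, Pemja or PyFlink:

```python
import os
import zipfile
from pathlib import Path
from typing import List, Optional


def stdlib_candidates(prefix: str, version: str = "3.10") -> List[Path]:
    """Build the three stdlib locations shown in the logged sys.path.

    Assumes a POSIX layout under `prefix` (hypothetical helper):
    lib/pythonXY.zip, lib/pythonX.Y and lib/pythonX.Y/lib-dynload.
    """
    base = Path(prefix)
    compact = version.replace(".", "")  # "3.10" -> "310"
    return [
        base / "lib" / f"python{compact}.zip",
        base / "lib" / f"python{version}",
        base / "lib" / f"python{version}" / "lib-dynload",
    ]


def diagnose_python_home(prefix: str, version: str = "3.10", cwd: Optional[str] = None) -> dict:
    """Check whether the 'encodings' package is reachable from `prefix`.

    A relative `prefix` (such as 'venv.tar.gz') is joined onto `cwd`, which
    defaults to this process's current working directory.
    """
    relative = not os.path.isabs(prefix)
    base = Path(cwd or os.getcwd()) / prefix if relative else Path(prefix)
    zip_path, stdlib_dir, _ = stdlib_candidates(str(base), version)

    # 'encodings' may live either as a plain directory or inside pythonXY.zip.
    in_dir = (stdlib_dir / "encodings" / "__init__.py").is_file()
    in_zip = False
    if zip_path.is_file() and zipfile.is_zipfile(zip_path):
        with zipfile.ZipFile(zip_path) as zf:
            in_zip = "encodings/__init__.py" in zf.namelist()

    return {
        "prefix_is_relative": relative,
        "resolved_prefix": str(base),
        "encodings_found": in_dir or in_zip,
    }
```

For example, calling `diagnose_python_home("venv.tar.gz")` from the TaskManager container's working directory, and again from any other directory the process might run in, shows whether the relative `sys.prefix` resolves to a directory that actually contains `encodings`.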
   
   According to the issue comments, removing the manual `PYTHONHOME` setting does *not* prevent the error, so the problem seems inherent to how Pemja initializes an archive-based virtual environment under YARN.
   
   
   
   ### Version and environment
   
   
   
   | Component | Version / Setting |
   |-----------|-------------------|
   | Flink (cluster) | **1.20.3** |
   | Flink Agents | **0.1.0** |
   | Deployment mode | **YARN Application Cluster mode** (`-t yarn-application`) |
   | Python version (inside the virtual environment) | **Python 3.10** |
   | Python environment packaging | Virtual environment (Conda-created or venv-based) archived as `venv.tar.gz` and shipped to the cluster |
   | YARN setup assumptions | NodeManager machines do *not* have a system-wide Python; the interpreter and all dependencies come from the shipped archive |
   
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
