shaheeramjad opened a new issue, #37515:
URL: https://github.com/apache/beam/issues/37515

   When a pip subprocess fails and the command is passed as a short list (e.g. 
`[python, '-m', 'pip', 'install', 'pkg']`), the exception handler in 
`sdks/python/apache_beam/utils/processes.py` raises **IndexError** instead of 
the intended RuntimeError with traceback and pip output.
   
   ### Root cause
   
   The pip-specific branch in `call`, `check_call`, and `check_output` uses a 
**hardcoded index 6** for the "package name" when formatting the error message:
   
   ```python
   if isinstance(args, tuple) and (args[0][2] == "pip"):
     raise RuntimeError(
       "Full traceback: {}\n Pip install failed for package: {} \n"
       " Output from execution of subprocess: {}"
       .format(traceback.format_exc(), args[0][6], error.output)) from error
   ```
   
   - For `['python', '-m', 'pip', 'install', 'somepkg']` the list has only 5 
elements (indices 0–4), so **`args[0][6]` raises IndexError** while the message 
is being formatted.
   - The "friendly" pip error is therefore never raised for such commands; 
users see an IndexError instead.
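
The masking effect can be reproduced without Beam. The following is a minimal sketch that mimics the shape of the handler quoted above; `fake_check_call` is a hypothetical stand-in, not the Beam function, and the `CalledProcessError` is constructed by hand rather than produced by a real pip run.

```python
import subprocess
import traceback


def fake_check_call(*args):
  """Hypothetical stand-in that mimics the pip branch quoted above."""
  try:
    # Simulate a failed subprocess instead of actually running pip.
    raise subprocess.CalledProcessError(1, args[0], output=b"ERROR: no match")
  except subprocess.CalledProcessError as error:
    if isinstance(args, tuple) and (args[0][2] == "pip"):
      raise RuntimeError(
          "Full traceback: {}\n Pip install failed for package: {} \n"
          " Output from execution of subprocess: {}".format(
              traceback.format_exc(), args[0][6], error.output)) from error
    raise


cmd = ["python", "-m", "pip", "install", "somepkg"]  # 5 elements, indices 0-4
try:
  fake_check_call(cmd)
except IndexError as exc:
  # args[0][6] failed while the arguments to format() were being evaluated,
  # so the RuntimeError was never constructed.
  caught = exc
```

The original `CalledProcessError` is still reachable through `caught.__context__`, but the pip output it carries is not part of the message the user sees.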
   
   ### Additional problem
   
   Even when index 6 exists (e.g. stager’s `pip download -r requirements_file` 
with many args), that index may not be a package name (e.g. it can be 
`--find-links`). The message "Pip install failed for package: --find-links" is 
misleading.
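
To illustrate, here is a hypothetical long pip command in which index 6 happens to hold an option. The argument list is an assumption chosen to show the effect; it is not copied from the stager, whose actual command may differ.

```python
# Hypothetical argument list; the real stager command may differ.
cmd = [
    "python", "-m", "pip", "download",
    "--dest", "/tmp/cache",
    "--find-links", "/tmp/wheels",
    "-r", "requirements.txt",
]

# Index 6 holds an option flag here, not a package name.
reported = cmd[6]
message = "Pip install failed for package: {}".format(reported)
```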
   
   ## Steps to reproduce
   
   1. Use `apache_beam.utils.processes.check_call` (or `check_output` / `call`) 
with a short pip command that fails:
   
   ```python
   import sys

   from apache_beam.utils import processes

   # Short pip command (5 elements) that will fail (nonexistent package).
   # sys.executable avoids depending on a `python` binary being on PATH.
   cmd = [sys.executable, '-m', 'pip', 'install', 'nonexistent-package-xyz']
   processes.check_call(cmd)
   ```
   
   2. When pip fails (e.g. package not found), the code hits the pip branch and 
formats the message with `args[0][6]`.
   3. **Actual:** `IndexError: list index out of range` (index 6 does not 
exist).
   4. **Expected:** A `RuntimeError` whose message includes the full traceback 
and pip subprocess output (no IndexError).
   
   ## Expected behavior
   
   - When a pip subprocess fails, the code should always raise a 
**RuntimeError** (with `from error`) whose message includes:
     - The full traceback
     - Useful context (e.g. that it was a pip failure; package name only when 
it can be determined safely)
     - The subprocess output (`error.output`)
   - No **IndexError** should occur regardless of the length or shape of the 
command list.
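
One way to meet these expectations is sketched below. This is only an illustration under stated assumptions, not the fix adopted by the project: `is_pip_command`, `guess_package`, and `pip_failure_message` are hypothetical names, and the rule for when a package name counts as "safely determined" (exactly one non-option argument after `install`) is an assumption.

```python
import subprocess
import traceback


def is_pip_command(cmd):
  """Returns True if cmd is a list/tuple with 'pip' at index 2 (length-checked)."""
  return isinstance(cmd, (list, tuple)) and len(cmd) > 2 and cmd[2] == "pip"


def guess_package(cmd):
  """Returns a package name only for the simple `pip install <pkg>` form."""
  rest = list(cmd[3:])
  if (len(rest) == 2 and rest[0] == "install" and
      not str(rest[1]).startswith("-")):
    return str(rest[1])
  return None  # Anything more complex: do not guess.


def pip_failure_message(cmd, error):
  """Builds the error text without indexing past the end of cmd."""
  package = guess_package(cmd)
  if package is not None:
    target = "package: {}".format(package)
  else:
    target = "command: {}".format(" ".join(str(c) for c in cmd))
  return (
      "Full traceback: {}\n Pip install failed for {} \n"
      " Output from execution of subprocess: {}".format(
          traceback.format_exc(), target, error.output))


cmd = ["python", "-m", "pip", "install", "somepkg"]
try:
  try:
    # Simulated failure; a real caller would run the subprocess here.
    raise subprocess.CalledProcessError(1, cmd, output=b"ERROR: no match")
  except subprocess.CalledProcessError as error:
    if is_pip_command(cmd):
      raise RuntimeError(pip_failure_message(cmd, error)) from error
    raise
except RuntimeError as exc:
  result = exc
```

When the package cannot be identified, the sketch falls back to printing the whole command, which avoids reporting an option such as `--find-links` as if it were a package.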
   
   ## Actual behavior
   
   - For short pip commands (e.g. `pip install <pkg>`), **IndexError** is 
raised when building the error message, so the intended RuntimeError is never 
raised.
   - For some longer pip commands, the message can show a wrong "package" (e.g. 
an option like `--find-links`) because index 6 is assumed to be the package 
name.
   
   ## Affected code
   
   - **File:** `sdks/python/apache_beam/utils/processes.py`
   - **Functions:** `call`, `check_call`, `check_output` (pip branch in each, 
e.g. lines 55–59, 74–78, 93–97)
   - **Relevant line:** `.format(traceback.format_exc(), args[0][6], 
error.output)` — `args[0][6]` is unsafe.
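
Because the same branch appears in all three functions, the handling could live in one place. The sketch below assumes a hypothetical context manager, `pip_errors_as_runtime_error`, and a simplified `check_output` wrapper; neither reflects the actual structure of `processes.py`.

```python
import contextlib
import subprocess
import traceback


@contextlib.contextmanager
def pip_errors_as_runtime_error(cmd):
  """Hypothetical helper: re-raises pip failures as RuntimeError."""
  try:
    yield
  except subprocess.CalledProcessError as error:
    is_pip = (
        isinstance(cmd, (list, tuple)) and len(cmd) > 2 and cmd[2] == "pip")
    if not is_pip:
      raise  # Non-pip failures propagate unchanged.
    raise RuntimeError(
        "Full traceback: {}\n Pip command failed: {} \n"
        " Output from execution of subprocess: {}".format(
            traceback.format_exc(), cmd, error.output)) from error


def check_output(cmd, **kwargs):
  """Simplified stand-in for a wrapper such as check_output."""
  with pip_errors_as_runtime_error(cmd):
    return subprocess.check_output(cmd, **kwargs)
```

With this shape, `call` and `check_call` could wrap their subprocess invocations in the same `with` block, so a later change to the message format would only need to be made once.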

