Re: [I] [Tech Debt] Evaluate deprecating Python local_runner / local_execution_environment [flink-agents]

via GitHub Sat, 13 Jun 2026 22:47:52 -0700


weiqingy commented on issue #829:
URL: https://github.com/apache/flink-agents/issues/829#issuecomment-4700842201


   I dug into the current state to scope this out. Sharing a neutral audit so 
we can decide whether it's worth doing and at what priority — no strong opinion 
from me on the call itself.
   
   ## What the local path actually is
   
   The local vs remote choice is a single switch in 
`get_execution_environment(env=None)` (`api/execution_environment.py:132-135`): 
no env → `LocalExecutionEnvironment`, any Flink env → 
`RemoteExecutionEnvironment`.
   
   Behind `env=None` sits ~823 LOC of pure-Python runtime that re-implements 
execution semantics the production path gets from the Java operator: 
`local_runner.py` (387 LOC — the dispatch loop, keyed context, manual async 
driving, and the only `action.exec` call in the Python tree), 
`local_execution_environment.py` (191 LOC), and `local_memory_object.py` (245 
LOC).
   
   ## The maintenance tax, concretely
   
   The local path exposes a different IO/test API than remote: `from_list` / 
`to_list` are local-only (remote raises `NotImplementedError`), 
`from_datastream` / `to_table` are remote-only, and there's a bespoke 
`get_tool_request_events()` hook on the local builder with no remote 
counterpart. That divergence is exactly why #722's structured tool-invocation 
assertions had to be carried through both paths.
   
   Three capabilities are already documented local-only no-ops: long-term 
memory (`NotImplementedError`), metrics (returns `None`), and durable execution 
/ recovery (stubbed).
   
   ## Who depends on it (full audit)
   
   Two separable things ship under "the local path": the execution environment 
(`LocalExecutionEnvironment` / `LocalRunner`) and `LocalMemoryObject` (a 
245-LOC in-memory `MemoryObject` that several tests use purely as a fixture, 
independent of the runner). Worth keeping that split in mind — removing the 
runner doesn't force removing the in-memory memory object.
   
   Execution-path consumers (`env=None` → `LocalRunner`):
   
   - 3 unit tests that exist only to validate the local execution env itself 
(`test_local_execution_environment.py`, `test_local_runner_cross_language.py`, 
`test_local_runner_reconcilable.py`) — deleted, not migrated.
   - 4 harness unit tests using it as a fast in-process harness 
(`test_built_in_actions.py`, `test_runner_context_execute.py`, 
`test_get_resource_in_action.py`, `test_loader.py`).
   - 4 integration/e2e tests (`chat_model_integration_test.py`, 
`workflow_test.py`, one of three tests in `react_agent_test.py`, 
`mcp_test.py`), with `e2e_tests/test_utils.py` adapting the `LocalRunner` 
tool-capture hook.
   - 1 example (`rag_agent_example.py`) — the only example on the local path; 
all five quickstarts already use the remote form.
   - 1 production call site: `ReActAgent` calls the no-arg form as its default.
   
   `LocalMemoryObject`-only consumers (fixture use, would survive removing the 
runner): `test_local_memory_object.py` (local-specific, delete), 
`test_memory_reference.py`, `test_chat_model_action.py`.
   
   User-facing surface: 3 docs (`deployment.md`'s "Run with Test Data" section, 
`configuration.md`, `long_term_memory.md`) plus the `ReActAgent` docstring. The 
no-arg "run without Flink" path is documented public behavior, so deprecating 
it is a user-visible change, not pure internal cleanup.
   
   ## Migration feasibility
   
   The remote path already runs on an embedded pyflink MiniCluster 
(`LocalStreamEnvironment`) — there's no explicit `MiniCluster` anywhere in the 
Python tree; it's what `env=None`-less `get_execution_environment()` resolves 
to under pyflink.
   
   But migration isn't a mechanical swap: `from_list` / `to_list` have no 
MiniCluster equivalent (remote raises `NotImplementedError`), so the harness 
tests and the local example would need rewriting onto `from_datastream` / 
`from_table` + source/sink, and bounded sources are limited on the Flink path. 
It also shifts CI cost — the `ut-python` job is JVM-free today while 
`it-python` builds the dist JARs and runs Ollama; moving the harness unit tests 
onto MiniCluster pulls JVM boot + dist-JAR classload into the fast lane.
   
   On parity, it's worth being precise: Java's `AgentsExecutionEnvironment` 
exposes the same no-arg `getExecutionEnvironment()` in its API and Javadoc ("a 
local execution environment is returned for testing and development"), but it's 
implemented as 
`Class.forName("org.apache.flink.agents.runtime.env.LocalExecutionEnvironment")`
 on a class that doesn't exist in the tree (the `env/` package ships only 
`RemoteExecutionEnvironment`), so calling it throws at runtime. No Java code 
calls it — every Java test and example uses a real 
`StreamExecutionEnvironment`. So Python today has the only *working* in-process 
local runtime; deprecating it would align Python with Java's de-facto 
remote-only reality. (Separately, that dead Java no-arg stub looks like its own 
small cleanup.)
   
   ## The open question I think gates priority
   
   I couldn't find any in-repo measurement of what the local path actually 
saves — no benchmarks, no CI timing budget. The "it keeps unit tests fast" 
rationale is structural but unmeasured. Whether carrying ~823 LOC plus the 
dual-API divergence is worth it really hinges on how much wall-clock the 
JVM-free lane saves versus a MiniCluster harness, which is a number the CI 
history can probably answer better than a static read can.
   
   ## Options, roughly
   
   - Keep as-is — accept the dual-path tax, zero disruption.
   - Deprecate + migrate — warn on `env=None` in one release, remove in a later 
one; migrate the harness/integration tests + example + docs to a MiniCluster 
harness and delete the three local-specific tests. Needs a 
`from_list`/`to_list`-equivalent bounded-source test helper (or test rewrites).
   - Shrink rather than remove — keep a thinner in-process harness for fast 
unit tests but route real semantics (LTM / metrics / durable-exec / 
tool-events) through shared code so the API divergence narrows. Smaller blast 
radius, keeps fast tests, still pays some duplication.
   
   I'll leave the direction and priority to the community — happy to take any 
of these forward if there's appetite.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] [Tech Debt] Evaluate deprecating Python local_runner / local_execution_environment [flink-agents]

Reply via email to