wenjin272 commented on PR #667: URL: https://github.com/apache/flink-agents/pull/667#issuecomment-4457592067
> The one CI failure (`it-python [java-17] [python-3.12] [flink-2.1]`) is the known `test_react_agent_on_local_runner` LLM flake against Ollama `qwen3:1.7b`, not caused by this PR:
>
> ```
> FAILED flink_agents/e2e_tests/e2e_tests_integration/react_agent_test.py::test_react_agent_on_local_runner
> - assert 432596736 == 1386528
> ```
>
> The test expects `4444 × 312 = 1386528`, but the LLM made an extra, unnecessary `multiply(1386528, 312)` call and returned `432596736`. The test source has a comment right next to the assertion: _"This may be caused by the LLM response does not match the output schema, you can rerun this case."_
>
> This same failure (same exact numbers, `432596736 == 1386528`) is currently failing on `main` at `b38ae21` (the commit this PR is rebased onto) and on several other recent main-branch runs. The failure runs through the Python `local_runner`, which logs `"Local runner does not support durable execution; recovery is not available."`, so the Java `DurableExecutionManager` / `ActionExecutionOperator` paths changed by this PR are never exercised.
>
> Will re-run CI.

I believe we need to improve the stability and observability of CI in version 0.4. If you run into any unstable cases, please contact me and I will rerun them; I now have permission to rerun failed CI jobs.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
