weiqingy commented on PR #548:
URL: https://github.com/apache/flink-agents/pull/548#issuecomment-3982608733
I checked the CI failures: both are in LLM-dependent e2e tests and don’t
appear to be caused by this PR.
Test 1 (`react_agent_test`): The output 4444 = 2123 + 2321 shows that our
ResourceCache is working correctly: the chat model was resolved, and the `add`
tool was resolved and called successfully. The LLM (qwen3:1.7b) simply stopped
after one tool call instead of continuing to call `multiply(4444, 312)`. This
is LLM non-determinism.
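For reference, the intermediate value in the test log is straightforward to check by hand, which is why it isolates the failure to the model rather than the tool plumbing:

```python
# The add tool returned 4444 for add(2123, 2321), so resource resolution
# and the tool call itself both worked.
assert 2123 + 2321 == 4444

# Had the model continued as expected, the second tool call
# multiply(4444, 312) would have returned:
print(4444 * 312)  # 1386528
```

Since the correct intermediate result appears in the output, the failure point is the model declining to issue the second tool call, not the ResourceCache.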
Test 2 (`long_term_memory_test`): This runs on the Flink remote runner, where
there is exactly one FlinkRunnerContext with one ResourceCache, so the behavior
is identical to before. The failing check is `assert len(doc) == 1` after
LLM-based compaction using qwen3:8b: if the model's summarization response is
malformed, compaction produces incorrect output.
We can re-run CI to confirm flakiness — if it fails again with different
assertion values, that would further support LLM non-determinism.
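As an aside, the same "retry to distinguish flakiness from real breakage" idea can be applied locally. The sketch below is hypothetical and not part of flink-agents or its test suite; it just illustrates bounding retries of an LLM-dependent assertion, with a deterministic stand-in for the non-deterministic model:

```python
def retry_flaky(fn, attempts=3):
    # Hypothetical helper: run an LLM-dependent check up to `attempts`
    # times, returning on the first success and re-raising the last
    # AssertionError only if every attempt fails.
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except AssertionError as exc:
            last_exc = exc
    raise last_exc

# Deterministic stand-in for a flaky e2e check: fails twice, then
# succeeds, mimicking LLM non-determinism without calling a model.
calls = {"n": 0}

def llm_dependent_check():
    calls["n"] += 1
    assert calls["n"] >= 3, "simulated malformed model response"
    return "ok"

print(retry_flaky(llm_dependent_check))  # ok (after two retried failures)
```

A genuinely broken ResourceCache would fail on every attempt, while LLM non-determinism tends to pass on a retry, which is exactly what a CI re-run would tell us here.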
@wenjin272 do you have access to re-run the CI tests? It looks like admin
rights are required.