zxs1633079383 commented on issue #858:
URL: https://github.com/apache/flink-agents/issues/858#issuecomment-4881187842
Hi, I would like to work on this, and I have a first PR ready.
I took a first look at the current metric path and the related work in #860
and #861. My current understanding is that embedding token metrics should
follow the same production constraint as chat token metrics: record usage
under the request/action metric group that initiated the embedding call, not
under a mutable cached resource binding.
Proposed first PR scope:
- Add provider-neutral embedding token usage accounting for `promptTokens` and
`totalTokens` when provider usage is available.
- Cover both Python and Java embedding model paths with unit tests.
- Keep providers that do not expose usage as no-op, so embedding calls do not
fail or invent token counts.
- Align the metric dimensions with the existing chat model token metrics where
practical.
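To make the scope above concrete, here is a minimal sketch of the accounting idea in Python. It is not the flink-agents API: `InMemoryMetricGroup`, `record_embedding_usage`, and the `<model>.<counter>` naming are hypothetical stand-ins chosen for illustration. The only things taken from the proposal are the `promptTokens` / `totalTokens` counters, the no-op behavior when usage is missing, and the idea that the caller passes in the metric group of the request/action that made the embedding call.

```python
from collections import defaultdict
from typing import Mapping, Optional


class InMemoryMetricGroup:
    """Hypothetical stand-in for a request/action-scoped metric group."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.counters = defaultdict(int)

    def inc(self, counter: str, value: int) -> None:
        self.counters[counter] += value


def record_embedding_usage(
    group: InMemoryMetricGroup,
    model_name: str,
    usage: Optional[Mapping[str, int]],
) -> None:
    """Record promptTokens/totalTokens when the provider reports them."""
    if not usage:
        # Provider exposes no usage: do nothing rather than invent counts.
        return
    for key in ("promptTokens", "totalTokens"):
        value = usage.get(key)
        if value is not None:
            # Counter naming is an assumption made for this sketch.
            group.inc(f"{model_name}.{key}", value)


# The group belongs to the caller, not to a cached model binding, so two
# actions sharing one embedding model still get separate counters.
action_a = InMemoryMetricGroup("action_a")
action_b = InMemoryMetricGroup("action_b")
record_embedding_usage(
    action_a, "embed-model", {"promptTokens": 12, "totalTokens": 12}
)
record_embedding_usage(action_b, "embed-model", None)  # no usage: no-op
```

Passing the group per call is the design choice that matters here: a shared, cached model instance never needs to hold a metric group reference that could go stale.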
Because #860 and #861 touch the same metric path, I kept the first PR small
and based on current `main`, so it should be easy to review or rebase if the
chat request-scoped metric changes from #861 land first.