Re: [I] [Feature] Track token usage metrics for embedding models [flink-agents]

via GitHub Sat, 04 Jul 2026 00:49:01 -0700


zxs1633079383 commented on issue #858:
URL: https://github.com/apache/flink-agents/issues/858#issuecomment-4881187842


   Hi, I would like to work on this and have a first PR ready.
   
   I took a first look at the current metric path and the related work in #860
   and #861. My current understanding is that embedding token metrics should
   follow the same production constraint as chat token metrics: record usage
   under the request/action metric group that initiated the embedding call,
   not under a mutable cached resource binding.
   
   Proposed first PR scope:
   
   - Add provider-neutral embedding token usage accounting for `promptTokens`
     and `totalTokens` when provider usage is available.
   - Cover both Python and Java embedding model paths with unit tests.
   - Keep providers that do not expose usage as no-op, so embedding calls do not
     fail or invent token counts.
   - Align the metric dimensions with the existing chat model token metrics
     where practical.
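
A rough sketch of the accounting described in the list above, to make the no-op behavior concrete. Everything here is hypothetical and only for illustration: `EmbeddingUsage`, `InMemoryMetricGroup`, and `record_embedding_usage` are not flink-agents APIs, and the real metric group and counter names may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EmbeddingUsage:
    """Hypothetical provider-neutral usage record (not a flink-agents type)."""
    prompt_tokens: int
    total_tokens: int


@dataclass
class InMemoryMetricGroup:
    """Hypothetical stand-in for the request/action-scoped metric group."""
    counters: Dict[str, int] = field(default_factory=dict)

    def inc(self, name: str, value: int) -> None:
        # Accumulate into a named counter, creating it on first use.
        self.counters[name] = self.counters.get(name, 0) + value


def record_embedding_usage(
    group: InMemoryMetricGroup, usage: Optional[EmbeddingUsage]
) -> bool:
    """Record token usage on the caller's metric group.

    Returns False and records nothing when the provider exposed no usage,
    so no token counts are invented.
    """
    if usage is None:
        return False
    group.inc("promptTokens", usage.prompt_tokens)
    group.inc("totalTokens", usage.total_tokens)
    return True


# The group belongs to the request/action that issued the embedding call.
request_group = InMemoryMetricGroup()
record_embedding_usage(request_group, EmbeddingUsage(prompt_tokens=12, total_tokens=12))
record_embedding_usage(request_group, None)  # provider without usage: no-op
```

Passing the metric group in explicitly, rather than reading it from a cached resource, is one way to keep the counters tied to the request that initiated the call.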
   
   I kept the first PR small and based on current `main`, so it should be easy
   to review, or to rebase if the chat request-scoped metric changes from #861
   land first.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
