[I] Reduce metrics collection overhead [datafusion-comet]

via GitHub Fri, 18 Oct 2024 10:15:24 -0700


mbutrovich opened a new issue, #1024:
URL: https://github.com/apache/datafusion-comet/issues/1024


   ### What is the problem the feature request solves?
   
   Running TPC-H locally, I see >3% of on-CPU time spent in 
`comet::execution::metrics::utils::update_comet_metric`. This function appears 
to be called whenever native execution wakes up in the polling loop, typically 
to produce a batch. Starting from the root of the plan, its behavior is:
   
   - For every metric in the node:
      - JNI call to allocate a string for the metric's name
      - JNI call to update the metric
   - For every child in the node:
      - JNI call to fetch the child metric node, then a recursive call to this function on that child
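
To get a feel for how the cost described above scales with plan size, here is a minimal model that only counts JNI crossings per update pass. It is a sketch, not Comet's code: `MetricNode` and `jni_calls_per_update` are hypothetical names, and the count assumes exactly two calls per metric and one per child, as listed above.

```rust
/// Hypothetical stand-in for a native plan node's metrics (not Comet's real type).
pub struct MetricNode {
    pub metrics: Vec<(String, i64)>,
    pub children: Vec<MetricNode>,
}

/// Counts the JNI crossings made by one update pass under the scheme
/// described above: two per metric (name string + update) and one per
/// child (fetching the child metric node), applied recursively.
pub fn jni_calls_per_update(node: &MetricNode) -> usize {
    let own = 2 * node.metrics.len() + node.children.len();
    own + node
        .children
        .iter()
        .map(jni_calls_per_update)
        .sum::<usize>()
}
```

In this model a root with 3 metrics and two leaf children with 2 metrics each costs 16 crossings every time the polling loop wakes up, so the total grows with both the number of metrics and the number of polls.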
   
   ### Describe the potential solution
   
   There are a few things to explore:
   1. Does updating metrics at a coarser granularity (less often than on every 
poll) affect their correctness? If not, we could update them less frequently.
   2. Can we eliminate the overhead of repeatedly allocating strings via JNI 
for every metric? This is ~1% of the total on-CPU time alone.
   3. Can we update an entire node's metrics with a single JNI call, rather 
than a JNI call for each metric?
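
As a rough illustration of ideas 1 and 3, the sketch below shows a poll-count throttle and a helper that packs a node's values into one buffer, which a single JNI call could then carry. Everything here is an assumption for discussion: `UpdateThrottle` and `pack_values` are hypothetical names, no real JNI call is made, and a final unthrottled update at the end of execution would presumably still be needed so that the reported totals are complete.

```rust
/// Hypothetical throttle: lets one metrics update through every `every` polls.
pub struct UpdateThrottle {
    every: u32,
    seen: u32,
}

impl UpdateThrottle {
    /// `every` is clamped to at least 1, so a value of 0 means "update on every poll".
    pub fn new(every: u32) -> Self {
        Self { every: every.max(1), seen: 0 }
    }

    /// Returns true when this poll should push metrics across JNI.
    pub fn should_update(&mut self) -> bool {
        self.seen += 1;
        if self.seen >= self.every {
            self.seen = 0;
            true
        } else {
            false
        }
    }
}

/// Packs a node's metric values into one buffer, in declaration order.
/// If the JVM side already knows the name order, the names never need to
/// cross JNI again and the whole node could be updated in one call.
pub fn pack_values(metrics: &[(&str, i64)]) -> Vec<i64> {
    metrics.iter().map(|(_, v)| *v).collect()
}
```

Whether a fixed poll count is the right trigger (as opposed to elapsed time, or only updating at the end) depends on the answer to question 1.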
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

