BlakeOrth opened a new pull request, #18045:
URL: https://github.com/apache/datafusion/pull/18045

   
   ## Which issue does this PR close?
   
   This does not fully close, but is an incremental building block for:
    - https://github.com/apache/datafusion/issues/17207
   
   The full context of how this code is likely to progress can be seen in the POC for this effort:
    - https://github.com/apache/datafusion/pull/17266
   
   ## Rationale for this change
   
   For particularly large requests, whether measured by the number of objects in a table or by the size of those objects, a query may issue a large number of object store operations. In these cases, the aggregate impact of the various operations is likely the best way to understand how they affected a particular query. This PR allows users of an instrumented object store to understand and display basic summary statistics for the `RequestDetails` collected during a query.
   
   ## What changes are included in this PR?
   
    - Adds a `RequestSummary` type for the instrumented object store to display summary statistics about instrumented requests
    - Adds a generic Stats type to track the statistics for the summary
    - Adds tests for the new code
    - Adds a basic summary output to the user-facing display when profiling is enabled
    - Adds docs for new and newly exported public items
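   As a rough illustration of what a generic statistics type for the summary might look like, here is a minimal Rust sketch. The `Stats<T>` name, its `push`/`avg` methods, and the choice of `u64` for sizes and `std::time::Duration` for durations are assumptions for illustration only; the type added in this PR may be structured differently.

```rust
use std::ops::Add;
use std::time::Duration;

/// Hypothetical generic accumulator (not the PR's actual type): tracks the
/// count, minimum, maximum and sum of a stream of observations.
#[derive(Debug, Clone)]
pub struct Stats<T> {
    count: usize,
    min: Option<T>,
    max: Option<T>,
    sum: Option<T>,
}

impl<T> Stats<T>
where
    T: Copy + PartialOrd + Add<Output = T>,
{
    /// Creates an empty accumulator; all values stay `None` until `push`.
    pub fn new() -> Self {
        Self { count: 0, min: None, max: None, sum: None }
    }

    /// Folds one observation into the running statistics.
    pub fn push(&mut self, value: T) {
        self.count += 1;
        self.min = match self.min {
            Some(m) if m <= value => Some(m),
            _ => Some(value),
        };
        self.max = match self.max {
            Some(m) if m >= value => Some(m),
            _ => Some(value),
        };
        self.sum = Some(match self.sum {
            Some(s) => s + value,
            None => value,
        });
    }

    pub fn count(&self) -> usize { self.count }
    pub fn min(&self) -> Option<T> { self.min }
    pub fn max(&self) -> Option<T> { self.max }
    pub fn sum(&self) -> Option<T> { self.sum }
}

impl Stats<u64> {
    /// Mean size in bytes, or `None` if nothing was recorded.
    pub fn avg(&self) -> Option<f64> {
        self.sum.map(|s| s as f64 / self.count as f64)
    }
}

impl Stats<Duration> {
    /// Mean duration, or `None` if nothing was recorded.
    /// Assumes fewer than `u32::MAX` observations (`Duration` divides by `u32`).
    pub fn avg(&self) -> Option<Duration> {
        self.sum.map(|s| s / self.count as u32)
    }
}
```

   Keeping min, max, sum and count as running values means each request is folded in once without retaining the full list of requests; the average is derived from sum and count only when it is displayed.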
   
   ## Are these changes tested?
   
   Yes. The new functionality has tests, with the exception of the display output itself, which can be seen below:
   
   ```sql
   DataFusion CLI v50.1.0
   > \object_store_profiling enabled
   ObjectStore Profile mode set to Enabled
   > CREATE EXTERNAL TABLE hits
   STORED AS PARQUET
   LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
   0 row(s) fetched.
   Elapsed 0.268 seconds.
   
   Object Store Profiling
   Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
   2025-10-13T22:15:50.518465131+00:00 operation=Get duration=0.030742s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
   2025-10-13T22:15:50.549263341+00:00 operation=Get duration=0.033060s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
   
   Summaries:
   Get
   count: 2
   duration min: 0.030742s
   duration max: 0.033060s
   duration avg: 0.031901s
   size min: 8 B
   size max: 34322 B
   size avg: 17165 B
   size sum: 34330 B
   
   >
   ```
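   The summary numbers above can be checked by hand: for the two `Get` requests, the average duration is (0.030742 s + 0.033060 s) / 2 = 0.031901 s and the average size is (8 B + 34322 B) / 2 = 17165 B. The minimal Rust sketch below reproduces that per-operation aggregation. The `Request` and `Summary` types and the `summarize` function are hypothetical illustrations, not the PR's actual `RequestDetails` or `RequestSummary` API.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical stand-in for one logged request; the PR's `RequestDetails`
/// has its own fields, which may differ from these.
pub struct Request {
    pub operation: &'static str,
    pub duration: Duration,
    pub size: u64,
}

/// Hypothetical per-operation summary mirroring the fields printed above.
#[derive(Debug)]
pub struct Summary {
    pub count: usize,
    pub duration_min: Duration,
    pub duration_max: Duration,
    pub duration_avg: Duration,
    pub size_min: u64,
    pub size_max: u64,
    pub size_avg: u64,
    pub size_sum: u64,
}

/// Groups requests by operation name and computes one `Summary` per group.
pub fn summarize(requests: &[Request]) -> BTreeMap<&'static str, Summary> {
    let mut groups: BTreeMap<&'static str, Vec<&Request>> = BTreeMap::new();
    for r in requests {
        groups.entry(r.operation).or_default().push(r);
    }

    groups
        .into_iter()
        .map(|(op, rs)| {
            // Every group holds at least one request, so min/max cannot fail.
            let count = rs.len();
            let duration_sum: Duration = rs.iter().map(|r| r.duration).sum();
            let size_sum: u64 = rs.iter().map(|r| r.size).sum();
            let summary = Summary {
                count,
                duration_min: rs.iter().map(|r| r.duration).min().unwrap(),
                duration_max: rs.iter().map(|r| r.duration).max().unwrap(),
                duration_avg: duration_sum / count as u32,
                size_min: rs.iter().map(|r| r.size).min().unwrap(),
                size_max: rs.iter().map(|r| r.size).max().unwrap(),
                // Integer mean in bytes (an assumption of this sketch).
                size_avg: size_sum / count as u64,
                size_sum,
            };
            (op, summary)
        })
        .collect()
}
```

   Grouping with a `BTreeMap` keeps operations in a stable, sorted order, which is convenient when printing one summary block per operation.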
   
   ## Are there any user-facing changes?
   
   Yes. Like the previous PR, this changes the user-facing output, but there are no breaking API changes.
   
   cc @alamb 
   


-- 
This is an automated message from the Apache Git Service.