2010YOUY01 commented on issue #16244:
URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2938981804

   > I'd be interested in working on this, but I might need a little guidance 
since I'm new to the project.
   
   Thank you! Here is some additional info:
   
   Each operator holds a `BaselineMetrics` inside for common metrics like 
`output_rows`, and this new `output_bytes` should also belong to 
`BaselineMetrics`.
   When the operator outputs a batch, it calls `record_poll()` to update 
the `BaselineMetrics`:
   
https://github.com/apache/datafusion/blob/992d156c46f6ad4f0096c4a62b293cabef63718d/datafusion/physical-plan/src/metrics/baseline.rs#L123
   So I think the implementation would add a new field `output_bytes` 
(and any other useful structures) to `BaselineMetrics`, and then update them 
inside `record_poll()`. The tricky part would be avoiding double-counting the 
array buffers I mentioned above.
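
To make the idea concrete, here is a minimal, std-only sketch of the approach described above. Everything in it is hypothetical: `SharedBuffer`, `BatchSketch`, `BaselineMetricsSketch` and `record_output` are stand-ins invented for illustration, not the real Arrow or DataFusion types, and the real `record_poll()` has a different signature. It assumes a buffer can be identified by the address of its shared allocation, which is only one possible way to de-duplicate.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an Arrow buffer: shared, reference-counted bytes.
pub type SharedBuffer = Arc<Vec<u8>>;

/// Hypothetical stand-in for a record batch: a row count plus the buffers
/// that back its columns. Several batches may share the same buffer.
pub struct BatchSketch {
    pub num_rows: usize,
    pub buffers: Vec<SharedBuffer>,
}

/// Hypothetical, simplified metrics holder. NOT the real `BaselineMetrics`.
#[derive(Default)]
pub struct BaselineMetricsSketch {
    output_rows: AtomicUsize,
    output_bytes: AtomicUsize,
    /// Addresses of buffers that were already counted (assumption: the
    /// allocation address is a good enough identity for a live buffer).
    seen_buffers: Mutex<HashSet<usize>>,
}

impl BaselineMetricsSketch {
    /// Called once per emitted batch, at the point where `record_poll()`
    /// would update the metrics in the real operator.
    pub fn record_output(&self, batch: &BatchSketch) {
        self.output_rows.fetch_add(batch.num_rows, Ordering::Relaxed);

        let mut seen = self.seen_buffers.lock().unwrap();
        for buf in &batch.buffers {
            let addr = Arc::as_ptr(buf) as usize;
            // `insert` returns true only the first time an address is seen,
            // so a buffer shared by many batches is counted once.
            if seen.insert(addr) {
                self.output_bytes.fetch_add(buf.len(), Ordering::Relaxed);
            }
        }
    }

    pub fn output_rows(&self) -> usize {
        self.output_rows.load(Ordering::Relaxed)
    }

    pub fn output_bytes(&self) -> usize {
        self.output_bytes.load(Ordering::Relaxed)
    }
}
```

With this sketch, two batches that both reference the same 1024-byte buffer add 1024 to `output_bytes` only once, while `output_rows` still sums every batch. Two caveats a real implementation would need to handle: an address can be reused after a buffer is freed, and the set of seen addresses grows for as long as the operator runs.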
   
   To see the expected result, run an `explain analyze` query using 
`datafusion-cli`; the new metric should show up inside `metrics`:
   ```
   > explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
   +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                                                                                                                                      |
   +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=91.856373ms, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0] |
   |                   |   ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=14.702µs]                                                                           |
   |                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[]                                                   |
   |                   |                                                                                                                                                                           |
   +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.039 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

