2010YOUY01 commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2938981804
> I'd be interested in working on this, but I might need a little guidance since I'm new to the project.

Thank you! Here is some additional info.

Each operator holds a `BaselineMetrics` for common metrics like `output_rows`, and the new `output_bytes` should also belong to `BaselineMetrics`. When the operator outputs a batch, it calls `record_poll()` to update the `BaselineMetrics`:
https://github.com/apache/datafusion/blob/992d156c46f6ad4f0096c4a62b293cabef63718d/datafusion/physical-plan/src/metrics/baseline.rs#L123

So I think the implementation would be to add a new field `output_bytes` (and other useful structures) to `BaselineMetrics`, and then update them inside `record_poll()`. The tricky part would be avoiding double-counting the array buffers I mentioned above.

To see the expected result, run an `explain analyze` query using `datafusion-cli`; the new metric should show up inside `metrics`:

```
> explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=91.856373ms, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0] |
|                   | ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=14.702µs] |
|                   | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[] |
|                   | |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.039 seconds.
```
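To make the double-counting concern concrete, here is a rough, standard-library-only sketch of the idea of adding an `output_bytes` counter and updating it once per emitted batch. It is not DataFusion code: `Buffer`, `Batch`, `BaselineMetricsSketch`, `record_batch`, and `unique_buffer_bytes` are all hypothetical stand-ins, and it assumes shared buffers can be identified by the address of their allocation. It only de-duplicates buffers within a single batch; whether buffers shared across batches should also be tracked is left open.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow buffer: reference-counted bytes.
type Buffer = Arc<Vec<u8>>;

/// Hypothetical stand-in for a record batch: each column is a list of buffers.
struct Batch {
    num_rows: usize,
    columns: Vec<Vec<Buffer>>,
}

/// Simplified stand-in for `BaselineMetrics` with the proposed counter.
#[derive(Default)]
struct BaselineMetricsSketch {
    output_rows: AtomicUsize,
    output_bytes: AtomicUsize,
}

impl BaselineMetricsSketch {
    /// Called once per emitted batch, mirroring the role of `record_poll()`.
    fn record_batch(&self, batch: &Batch) {
        self.output_rows.fetch_add(batch.num_rows, Ordering::Relaxed);
        self.output_bytes
            .fetch_add(unique_buffer_bytes(batch), Ordering::Relaxed);
    }
}

/// Sum buffer sizes, counting each distinct allocation once within a batch.
fn unique_buffer_bytes(batch: &Batch) -> usize {
    let mut seen = HashSet::new();
    let mut total = 0;
    for buffer in batch.columns.iter().flatten() {
        // The address of the shared allocation identifies the buffer.
        if seen.insert(Arc::as_ptr(buffer)) {
            total += buffer.len();
        }
    }
    total
}

fn main() {
    // Two columns share one 1024-byte buffer; the second also owns 256 bytes.
    let shared: Buffer = Arc::new(vec![0u8; 1024]);
    let own: Buffer = Arc::new(vec![0u8; 256]);
    let batch = Batch {
        num_rows: 128,
        columns: vec![vec![shared.clone()], vec![shared.clone(), own]],
    };

    let metrics = BaselineMetricsSketch::default();
    metrics.record_batch(&batch);

    // Prints: output_rows=128, output_bytes=1280
    println!(
        "output_rows={}, output_bytes={}",
        metrics.output_rows.load(Ordering::Relaxed),
        metrics.output_bytes.load(Ordering::Relaxed)
    );
}
```

A naive sum over all columns would report 2304 bytes here, because the shared 1024-byte buffer would be counted twice. In the real codebase the buffers would come from the Arrow arrays inside each `RecordBatch` rather than from these toy types.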