Re: [PR] Add Python bindings for accessing ExecutionMetrics [datafusion-python]

via GitHub Thu, 19 Feb 2026 21:59:03 -0800


ShreyeshArangath commented on PR #1381:
URL: 
https://github.com/apache/datafusion-python/pull/1381#issuecomment-3931838295


   Thanks for the suggestion! Initially when I designed the change, I did 
consider moving `collect()` / `execute_*()` onto plan object. The reason I 
didn’t go that route was more about how observability fits into real usage 
patterns (from the cases that I have seen).
   
   Today, I think the users naturally treat a dataframe as the primary handle 
for a query:
   ```py
   df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
   batches = df.collect()
   ```
   
   Requiring metrics to go through ExecutionPlan would effectively change the 
model to look something like so
   ```py
   df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
   plan = df.execution_plan()
   batches = plan.collect()
   metrics = plan.collect_metrics()
   ```
   
   I thought that this would require users to restructure pipelines and thread 
a plan object through call chains purely to have access to metrics. The LoE 
required to get people to use it seemed high to me. 
   
   My goal was to make minimal changes to how users can add support for metrics 
without changing how they run queries
   ```py
   df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
   batches = df.collect()
   plan = df.execution_plan()
   metrics = plan.collect_metrics()
   ```
   
   I’m happy to switch to the plan-based approach if we prefer stronger 
alignment with the upstream API, but I leaned toward this design to make 
observability easier to adopt without disrupting current usage patterns — lmk 
what you think


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add Python bindings for accessing ExecutionMetrics [datafusion-python]

Reply via email to