Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

via GitHub Wed, 01 Oct 2025 08:44:54 -0700


debajyoti-truefoundry commented on issue #16841:
URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3356999707


   Hello, 
   
   > Currently in DataFusion that is (1) tied to the CPU work width
   
   This is essentially `datafusion.execution.target_partitions`, right?
   
   > has no way to account for the selectivity of the query in terms of rows or columns.
   
   Could you please elaborate on this? Should the result of the query (for
   example, 1 row selected out of 10000 rows spread across 100 files) matter
   here at all? Regardless of the query's selectivity, we would want to
   saturate the network bandwidth. Or is the problem particularly visible in
   selective queries, as you described? If this is off topic, I can ask on
   your PR instead.


