debajyoti-truefoundry commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3356999707
Hello,

> Currently in DataFusion that is (1) tied to the CPU work width

This is essentially `datafusion.execution.target_partitions`, right?

> has no way to account for the selectivity of the query in terms of rows or columns.

Could you please elaborate on this? Should the query's outcome (for example, selecting 1 row out of 10,000 rows spread across 100 files) matter here at all? Regardless of how selective the query is, we would want to saturate the network bandwidth. Or is the problem particularly visible in selective queries like the one you described?

If this is off-topic here, I can ask on your PR instead.
