andygrove opened a new issue, #1430: URL: https://github.com/apache/datafusion-comet/issues/1430
### What is the problem the feature request solves?

This is a follow-on issue based on discussions in https://github.com/apache/datafusion-comet/pull/1424.

When choosing the smaller side of a join to use as the build side, we currently just compare total table size, using the `sizeInBytes` that was computed in a completed query stage.

We can make some improvements to this approach:

- Calculate the resulting hash table size based on the join keys and the columns from the table that will be used in the join. We can compute the size as `rowCount * sum(estimated size of each column)`.
- In cases where the input is not a completed query stage, we can look at the `HadoopFsRelation` contained by the `LogicalRelation`. From this, we can get `sizeInBytes` and infer a row count from it and the estimated schema size.

### Describe the potential solution

_No response_

### Additional context

_No response_
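
For illustration only, the two estimates proposed in this issue can be sketched in a few lines of Python. This is not Comet or Spark code: the per-type byte sizes, the `Column` type, and every function name below are hypothetical and exist only to show the arithmetic (hash table size as `rowCount * sum(estimated size of each column)`, and row count inferred from `sizeInBytes` divided by the estimated row size of the full schema).

```python
from dataclasses import dataclass

# Hypothetical per-type size estimates in bytes (assumed for this sketch,
# not taken from Spark or Comet).
ESTIMATED_TYPE_SIZE = {"int": 4, "long": 8, "double": 8, "string": 20}


@dataclass(frozen=True)
class Column:
    name: str
    type_name: str


def estimated_row_size(columns):
    """Sum of the estimated sizes of the given columns, in bytes."""
    return sum(ESTIMATED_TYPE_SIZE[c.type_name] for c in columns)


def inferred_row_count(size_in_bytes, schema):
    """Infer a row count from a total size and the estimated size of a full row."""
    row_size = estimated_row_size(schema)
    return size_in_bytes // row_size if row_size > 0 else 0


def estimated_hash_table_size(row_count, used_columns):
    """rowCount * sum(estimated size of each column used in the join)."""
    return row_count * estimated_row_size(used_columns)


def choose_build_side(left_size, right_size):
    """Pick the side with the smaller estimated size as the build side."""
    return "left" if left_size <= right_size else "right"


# Example: the left table is larger overall, but only its key column is needed.
left_schema = [Column("id", "long"), Column("name", "string"), Column("payload", "string")]
right_schema = [Column("id", "long"), Column("flag", "int")]
left_size_in_bytes, right_size_in_bytes = 4800, 2400

left_rows = inferred_row_count(left_size_in_bytes, left_schema)
right_rows = inferred_row_count(right_size_in_bytes, right_schema)

left_hash_size = estimated_hash_table_size(left_rows, [left_schema[0]])
right_hash_size = estimated_hash_table_size(right_rows, right_schema)

by_total_size = choose_build_side(left_size_in_bytes, right_size_in_bytes)
by_hash_table_size = choose_build_side(left_hash_size, right_hash_size)
```

With these made-up numbers, comparing total table size picks the right side as the build side, while comparing the estimated hash table size picks the left side, which is the kind of difference the proposal is meant to capture.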