berkaysynnada commented on PR #15503: URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2785682160
> What do you think? I'm still thinking we should unify the API's. We've discussed the issue with our team, and got a common opinion: If I give an example from my concerns, there will be many duplications like this: ```rust fn statistics(&self) -> Result<Statistics> { let stats = Self::statistics_helper( self.schema(), self.input().statistics()?, self.predicate(), self.default_selectivity, )?; Ok(stats.project(self.projection.as_ref())) } fn statistics_by_partition(&self) -> Result<PartitionedStatistics> { let input_stats = self.input.statistics_by_partition()?; let stats: Result<Vec<Arc<Statistics>>> = input_stats .iter() .map(|stat| { let stat = Self::statistics_helper( self.schema(), stat.clone(), self.predicate(), self.default_selectivity, ) .map(|stat| stat.project(self.projection.as_ref()))?; Ok(Arc::new(stat)) }) .collect(); Ok(PartitionedStatistics::new(stats?)) } ``` There are/will be duplications of statistics() logics in each operator like this, because the calculations are the same, whether the stats are coming from the whole table or just for one partition. We can avoid the duplications and write efficient functional statistics() implementations if we adopt ```rust fn statistics(&self, partition: Option<usize>) -> Result<Statistics> ``` style. So, that alternative wins clearly for me against other alternatives. It also does not modify the other existing structs/API's, and propose an extensible way while enabling the statistics access over any partition. TLDR, updating the API as `fn statistics(&self, partition: Option<usize>) -> Result<Statistics>` has a minimal change, doesn't force us to follow an immature design path, reduce duplications, and enables partition-based stats access, that's the main goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org