Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-04-05 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016084730 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2768150465 Thanks for your review! Let's go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2770308752 For anyone else following along, this PR is part of a larger plan to optimize ORDER BY queries operating on pre-sorted inputs. See this ticket for more detail - https://github.com/

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-31 Thread via GitHub
xudong963 merged PR #15432: URL: https://github.com/apache/datafusion/pull/15432

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-31 Thread via GitHub
alamb commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2021528103 ## datafusion/core/src/datasource/statistics.rs: ## @@ -217,3 +354,183 @@ fn set_min_if_lesser( _ => {} } } + +#[cfg(test)] +mod tests { +use supe

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-30 Thread via GitHub
xudong963 commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2764431052 > Would it be possible to add some unit tests for `compute_summary_statistics`? Something like: Thanks @alamb ! I'm cooking it

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-29 Thread via GitHub
alamb commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2019764664 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1181,6 +1175,92 @@ impl ListingTable { } } +/// Processes a stream of partitioned files and return

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-28 Thread via GitHub
xudong963 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2019681642 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn ad

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
xudong963 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016725230 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1181,6 +1175,92 @@ impl ListingTable { } } +/// Processes a stream of partitioned files and re

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
xudong963 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016664497 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn ad

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
xudong963 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016651315 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn ad

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
xudong963 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r201659 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn ad

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
berkaysynnada commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016335211 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1181,6 +1175,92 @@ impl ListingTable { } } +/// Processes a stream of partitioned files an

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016078687 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016102166 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016070941 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-26 Thread via GitHub
xudong963 commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2754452649 > when we create `FileGroups` to be less error-prone (accidentally get the incorrect statistics) Sorry, I don't get it. Why would adding a new `FileGroups` make this less error-prone?

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-26 Thread via GitHub
jayzhan211 commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2754133369 Do you think it is a good idea to add another `FileGroups` struct and compute the statistics across all the files when we create `FileGroups` to be less error-prone (accidentally

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-26 Thread via GitHub
xudong963 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2013679682 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn ad