jayzhan211 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016084730
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn a
xudong963 commented on PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2768150465
Thanks for your review! Let's go
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
alamb commented on PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2770308752
For anyone else following along, this PR is part of a larger plan to
optimize ORDER BY queries operating on pre-sorted inputs. See this ticket for
more detail
- https://github.com/
xudong963 merged PR #15432:
URL: https://github.com/apache/datafusion/pull/15432
alamb commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2021528103
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -217,3 +354,183 @@ fn set_min_if_lesser(
_ => {}
}
}
+
+#[cfg(test)]
+mod tests {
+use supe
xudong963 commented on PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2764431052
> Would it be possible to add some unit tests for
`compute_summary_statistics`? Something like:
Thanks @alamb! I'm cooking it
alamb commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2019764664
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -1181,6 +1175,92 @@ impl ListingTable {
}
}
+/// Processes a stream of partitioned files and return
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2019681642
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn ad
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016725230
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -1181,6 +1175,92 @@ impl ListingTable {
}
}
+/// Processes a stream of partitioned files and re
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016664497
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn ad
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016651315
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn ad
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r201659
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn ad
berkaysynnada commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016335211
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -1181,6 +1175,92 @@ impl ListingTable {
}
}
+/// Processes a stream of partitioned files an
jayzhan211 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016078687
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn a
jayzhan211 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016102166
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn a
jayzhan211 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016070941
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn a
xudong963 commented on PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2754452649
> when we create `FileGroups` to be less error-prone (accidentally get the
incorrect statistics)
Sorry, I don't get it. How does adding a new `FileGroups` struct make this less error-prone?
jayzhan211 commented on PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2754133369
Do you think it is a good idea to add another `FileGroups` struct and
compute the statistics across all the files when we create `FileGroups` to be
less error-prone (accidentally
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2013679682
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn ad