alamb commented on code in PR #22969:
URL: https://github.com/apache/datafusion/pull/22969#discussion_r3423904048
##########
datafusion/catalog-listing/src/options.rs:
##########
@@ -38,13 +37,6 @@ pub struct ListingOptions {
/// The expected partition column names in the folder structure.
/// See [Self::with_table_partition_cols] for details
pub table_partition_cols: Vec<(String, DataType)>,
- /// Set true to try to guess statistics from the files.
Review Comment:
technically speaking I can see someone wanting to wanting to control
statistics collection on a per table (rather than a per session) basis -- I
think you could still get the same effect by changing the session config
settings before creating the table, rather than explicitly setting it on the
ListingOptions
So while this is a brekaing change I think users can use the same pattern
I think the note in the upgrade guide looks good to me
##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -738,7 +738,7 @@ impl ListingTable {
.buffer_unordered(ctx.config_options().execution.meta_fetch_concurrency);
let (file_group, inexact_stats) =
- get_files_with_limit(files, limit,
self.options.collect_stat).await?;
+ get_files_with_limit(files, limit,
ctx.config().collect_statistics()).await?;
Review Comment:
I double checked that this is just a reference (not a deep copy)
##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -747,32 +747,32 @@ impl ListingTable {
// hash repartitioning for aggregates and joins on partition columns.
let threshold =
ctx.config_options().optimizer.preserve_file_partitions;
- let (file_groups, grouped_by_partition) = if threshold > 0
- && !self.options.table_partition_cols.is_empty()
- {
- let grouped =
-
file_group.group_by_partition_values(self.options.target_partitions);
- if grouped.len() >= threshold {
- (grouped, true)
+ let (file_groups, grouped_by_partition) =
Review Comment:
it is weird that rustfmt reformattted this as it lookslike the code is
mostly the same up here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]