alamb commented on code in PR #22969:
URL: https://github.com/apache/datafusion/pull/22969#discussion_r3423904048


##########
datafusion/catalog-listing/src/options.rs:
##########
@@ -38,13 +37,6 @@ pub struct ListingOptions {
     /// The expected partition column names in the folder structure.
     /// See [Self::with_table_partition_cols] for details
     pub table_partition_cols: Vec<(String, DataType)>,
-    /// Set true to try to guess statistics from the files.

Review Comment:
   technically speaking I can see someone wanting to wanting to control 
statistics collection on a per table (rather than a per session) basis -- I 
think you could still get the same effect by changing the session config 
settings before creating the table, rather than explicitly setting it on the 
ListingOptions
   
   So while this is a brekaing change I think users can use the same pattern
   
   I think the note in the upgrade guide looks good to me



##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -738,7 +738,7 @@ impl ListingTable {
             
.buffer_unordered(ctx.config_options().execution.meta_fetch_concurrency);
 
         let (file_group, inexact_stats) =
-            get_files_with_limit(files, limit, 
self.options.collect_stat).await?;
+            get_files_with_limit(files, limit, 
ctx.config().collect_statistics()).await?;

Review Comment:
   I double checked that this is just a reference (not a deep copy)



##########
datafusion/catalog-listing/src/table.rs:
##########
@@ -747,32 +747,32 @@ impl ListingTable {
         // hash repartitioning for aggregates and joins on partition columns.
         let threshold = 
ctx.config_options().optimizer.preserve_file_partitions;
 
-        let (file_groups, grouped_by_partition) = if threshold > 0
-            && !self.options.table_partition_cols.is_empty()
-        {
-            let grouped =
-                
file_group.group_by_partition_values(self.options.target_partitions);
-            if grouped.len() >= threshold {
-                (grouped, true)
+        let (file_groups, grouped_by_partition) =

Review Comment:
   it is weird that rustfmt reformattted this as it lookslike the code is 
mostly the same up here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to