gene-bordegaray commented on PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#issuecomment-4707679807

   this is ready for another glance. I took the following approach to declaring 
the partitioning:
   
   - target_partitions is used as a scan parallelism hint. It is used when no 
declared output_partitioning is set.
   - output_partitioning overrides target_partitions when set. The listing 
table creates one file group per output partition and preserves empty groups 
when needed, validates the final group count, and then advertises the physical 
partitioning.
   
   From comments this seems to be the topics for follow-up work / discussion:
   
   1. Replace the split between output_partitioning output_partitioning: 
Option<Partitioning> and partitioned_by_file_group: bool some better internal 
structure. I was thinking unifying them with something like an internal enum: 
   ```rust
   enum FilePartitioning {
     None, 
     Declared(Partitioning), 
     DeriveFromPartitionedFileGroups,
   }
   ```
   
   2. with_target_partitions. For now, keeping it as parallelism API, but maybe 
a rename or deprecation if we want a clearer distinction between scan 
parallelism and output partitioning layout.
   
   cc: @gabotechs @NGA-TRAN @alamb @stuhood 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to