peasee opened a new issue, #17957:
URL: https://github.com/apache/datafusion/issues/17957

   ### Describe the bug
   
   When the `ListingTable` provider performs a scan, it does not prune any 
partitions when no filters are supplied.
   
   If the table location contains paths that do not match the partition scheme, 
they are returned from the scan, which can cause query errors due to missing 
partition values. For example, I encountered this while reading a Delta Lake 
table that contained a `_delta_log` directory. The `_delta_log` directory was 
not pruned:
   
   ```console
   DataSourceExec: file_groups={1 group: [[peasee-hive-test/_delta_log/0000.checkpoint.parquet, peasee-hive-test/pid=1/data.parquet, peasee-hive-test/pid=2/data.parquet]]}
   ```
   
   This results in an `Invalid partitioning found on disk` error when the query 
is executed and retrieves the partitioning column.
   
   ### To Reproduce
   
   Set up a hive-partitioned object store with a table partition column. Add an 
extra folder that is not a valid partition directory (for example, 
`_delta_log`), and perform a scan with no filters.
   
   ### Expected behavior
   
   The `ListingTable` provider should prune paths that do not match the 
partition scheme even when no filters are defined. This already seems to be 
implied by a note on `ListingOptions`:
   
   ```
   Files that don't follow this partitioning scheme will be
   ignored.
   ```
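
   To illustrate the expected behavior, here is a minimal standalone sketch of 
the kind of path check that would drop `_delta_log` from the listing. It uses 
only the Rust standard library; `parse_hive_partitions` and `prune_files` are 
hypothetical names for this illustration, not DataFusion APIs, and this is not 
necessarily how the actual fix is implemented.

```rust
/// Hypothetical helper (not part of DataFusion): extract the partition values
/// from `file_path`, or return `None` when the path does not follow the
/// hive-style `col=value` layout for `partition_cols` under `table_root`.
fn parse_hive_partitions<'a>(
    table_root: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // Paths outside the table root never match.
    let relative = file_path.strip_prefix(table_root)?.trim_start_matches('/');
    let mut segments: Vec<&'a str> = relative.split('/').collect();
    // The last segment is the file name, not a partition directory.
    segments.pop()?;
    if segments.len() < partition_cols.len() {
        return None;
    }
    let mut values = Vec::with_capacity(partition_cols.len());
    for (col, segment) in partition_cols.iter().zip(segments) {
        // Each leading directory must be `<expected column>=<value>`.
        let (key, value) = segment.split_once('=')?;
        if key != *col {
            return None;
        }
        values.push(value);
    }
    Some(values)
}

/// Hypothetical helper: keep only the files whose paths match the scheme,
/// regardless of whether any query filters were supplied.
fn prune_files<'a>(
    table_root: &str,
    files: &[&'a str],
    partition_cols: &[&str],
) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|f| parse_hive_partitions(table_root, f, partition_cols).is_some())
        .collect()
}
```

   With the three files from the plan above and a single partition column 
`pid`, this sketch keeps the two `pid=...` files and drops the `_delta_log` 
checkpoint.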
   
   ### Additional context
   
   I have already created a fix which I will be raising shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]