peasee opened a new issue, #17957:
URL: https://github.com/apache/datafusion/issues/17957
### Describe the bug
When the `ListingTable` provider performs a scan with no filters supplied, it
does not prune any partitions.
If the table location contains paths that do not match the partition scheme,
they are returned from the scan, which can cause query errors due to missing
partition values. For example, I encountered this while reading a Delta Lake
table that contained a `_delta_log` directory. The `_delta_log` directory was
not pruned:
```console
DataSourceExec: file_groups={1 group:
[[peasee-hive-test/_delta_log/0000.checkpoint.parquet,
peasee-hive-test/pid=1/data.parquet, peasee-hive-test/pid=2/data.parquet]]}
```
This results in an `Invalid partitioning found on disk` error when the query
is executed and retrieves the partition column.
### To Reproduce
Set up a Hive-partitioned object store with a table partition column. Add an
extra folder whose name is not a valid partition key (for example, one that is
not of the form `column=value`), and perform a scan with no filters.
### Expected behavior
The `ListingTable` provider should prune paths that do not match the partition
scheme even when no filters are defined. This already seems to be implied by a
note on `ListingOptions`:
```
Files that don't follow this partitioning scheme will be
ignored.
```
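To make the expected behavior concrete, here is a minimal, standard-library-only sketch of the pruning I have in mind. It is not DataFusion's implementation: `partition_values` is a hypothetical helper written for illustration, and it assumes partition directories appear directly under the table root in `column=value` form, in the order of the table's partition columns.

```rust
/// Hypothetical helper (not a DataFusion API): extract Hive-style partition
/// values from `file_path`, relative to `table_root`. Returns `None` when the
/// path does not follow the `col=value/...` scheme for `partition_cols`.
fn partition_values<'a>(
    table_root: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // Path of the file relative to the table root, without a leading slash.
    let relative = file_path.strip_prefix(table_root)?.trim_start_matches('/');
    let mut segments: Vec<&'a str> = relative.split('/').collect();
    // Drop the trailing file name; only directory segments carry partitions.
    segments.pop()?;

    let mut values = Vec::with_capacity(partition_cols.len());
    for (i, col) in partition_cols.iter().enumerate() {
        // Each partition column must match the directory at the same depth.
        let segment = *segments.get(i)?;
        let (key, value) = segment.split_once('=')?;
        if key != *col {
            return None;
        }
        values.push(value);
    }
    Some(values)
}

fn main() {
    let files = [
        "peasee-hive-test/_delta_log/0000.checkpoint.parquet",
        "peasee-hive-test/pid=1/data.parquet",
        "peasee-hive-test/pid=2/data.parquet",
    ];
    // Keep only files whose path matches the partition scheme, even though
    // no query filters are involved.
    let kept: Vec<&str> = files
        .iter()
        .copied()
        .filter(|f| partition_values("peasee-hive-test", f, &["pid"]).is_some())
        .collect();
    println!("{kept:?}");
}
```

With the file list from the plan above, this keeps the two `pid=...` files and drops the `_delta_log` checkpoint, which is the result I would expect from an unfiltered scan.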
### Additional context
I have already created a fix which I will be raising shortly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]