crm26 opened a new pull request, #21367:
URL: https://github.com/apache/datafusion/pull/21367
## Summary
Adds four inline SQL table functions for ad-hoc file querying:
```sql
SELECT * FROM read_parquet('/path/to/*.parquet');
SELECT * FROM read_csv('/data/file.csv');
SELECT * FROM read_json('/data/file.json');
SELECT * FROM read_avro('/data/file.avro');
```
Closes #3773
## Design
Each function is a thin `TableFunctionImpl` wrapper (~60 lines) over
`ListingTable`:
1. Extract path string from `Expr::Literal`
2. Construct `ListingOptions` with the format's `FileFormat`
3. Infer schema via blocking bridge
4. Return `ListingTable` as `TableProvider`
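
For illustration, step 1 and its error paths (no args, wrong type) could look roughly like the sketch below. This is not the PR's code: `Expr` here is a simplified, hypothetical stand-in for DataFusion's expression type, `extract_path` is a hypothetical helper name, and errors are plain strings rather than DataFusion's error type. Only the first positional argument is examined; handling of further arguments is not modeled.

```rust
/// Simplified, hypothetical stand-in for DataFusion's `Expr`.
/// It models only the two cases this sketch needs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A string literal such as '/data/file.csv'.
    Utf8Literal(String),
    /// Any non-string literal; used here to show the wrong-type error.
    Int64Literal(i64),
}

/// Hypothetical helper: pull the path out of the first positional argument.
/// Returns an error when no argument is given or when the first argument
/// is not a string literal.
fn extract_path(fn_name: &str, args: &[Expr]) -> Result<String, String> {
    match args.first() {
        Some(Expr::Utf8Literal(path)) => Ok(path.clone()),
        Some(other) => Err(format!(
            "{fn_name} expects a string literal path, got {other:?}"
        )),
        None => Err(format!("{fn_name} requires a path argument")),
    }
}
```

Calling `extract_path("read_csv", &[])` returns an `Err`, which mirrors the "no args" error path listed under Tests; a non-string first argument yields the "wrong type" error.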
Since the SQL planner wraps UDTF output as `LogicalPlan::TableScan`, the
existing optimizer rules for table scans apply automatically:
- **Filter pushdown** — verified via EXPLAIN test
- **Projection pushdown** — verified via EXPLAIN test
- **Partition pruning** — inherited from `ListingTable`
## Async bridge
`call_with_args` is a sync fn but `infer_schema` is async. The bridge runs
`Handle::block_on` on a helper thread spawned via `std::thread::scope` rather
than using `block_in_place`, which panics on a current-thread runtime, so it
works on both multi-thread and current-thread Tokio runtimes. Tested with a
single-threaded runtime.
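
The shape of this bridge can be sketched with the standard library alone. In the sketch below, `toy_block_on` is a deliberately minimal stand-in for Tokio's `Handle::block_on` (it is not Tokio, and it has no I/O driver), `infer_schema` is a hypothetical async function returning column names, and `infer_schema_blocking` is a hypothetical name for the sync wrapper. The point is only the pattern: the sync caller spawns a scoped thread, that thread blocks on the future, and the result is joined back.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `toy_block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy stand-in for `Handle::block_on`: polls the future on the current
/// thread and parks between polls until it completes.
fn toy_block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async schema inference; returns made-up column names.
async fn infer_schema(path: &str) -> Result<Vec<String>, String> {
    if path.is_empty() {
        return Err("empty path".to_string());
    }
    Ok(vec!["id".to_string(), "name".to_string()])
}

/// Sync entry point: run the async work on a scoped helper thread so the
/// calling thread never blocks on the future directly.
fn infer_schema_blocking(path: &str) -> Result<Vec<String>, String> {
    thread::scope(|s| {
        s.spawn(|| toy_block_on(infer_schema(path)))
            .join()
            // A panic in the helper thread is surfaced as an ordinary error.
            .unwrap_or_else(|_| Err("schema inference thread panicked".to_string()))
    })
}
```

Because the thread is scoped, the closure can borrow `path` without cloning it, and the scope guarantees the helper thread has finished before `infer_schema_blocking` returns.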
## Feature gating
- `read_parquet` — requires `parquet` feature (default on)
- `read_avro` — requires `avro` feature (default off)
- `read_csv` / `read_json` — always available (no heavy optional
dependencies)
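
As a rough illustration of this gating, the sketch below lists which functions a given build would expose. It assumes Cargo features named `parquet` and `avro` as described above; `available_table_functions` is a hypothetical helper and not part of the PR, which presumably gates the function definitions and their registration instead.

```rust
/// Hypothetical helper: names of the table functions compiled into this build.
#[allow(unused_mut)]
fn available_table_functions() -> Vec<&'static str> {
    // Always compiled: no heavy optional dependencies.
    let mut names = vec!["read_csv", "read_json"];

    // Present only when the `parquet` feature is enabled.
    #[cfg(feature = "parquet")]
    names.push("read_parquet");

    // Present only when the `avro` feature is enabled.
    #[cfg(feature = "avro")]
    names.push("read_avro");

    names
}
```

With neither feature enabled, the list contains only `read_csv` and `read_json`.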
## Limitations (v1)
- Positional arguments only — no named args like `has_header => true`
- No user-supplied schema override
- No explicit Hive partition column specification
- S3 paths require a registered object store
These can be addressed in follow-on PRs.
## Tests
16 tests covering: basic read, filtered read, projection, aggregation, glob
multi-file, error paths (no args, wrong type), filter/projection pushdown
verification, and single-threaded runtime safety.