karlovnv opened a new issue, #10433:
URL: https://github.com/apache/datafusion/issues/10433
### Is your feature request related to a problem or challenge?
Suppose we have a huge data source consisting of many record batches.
Currently it is impossible to get the most recent N rows without a full scan:
``` sql
SELECT * FROM Events
ORDER BY event_time DESC
LIMIT 1000
```
The query above performs a full scan from the first row, even though the
TableProvider may know that it only needs to return the last record batches
(or the latest Parquet files in a folder).
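To illustrate the saving, here is a minimal std-only sketch. It does not use the DataFusion API: `latest_n` and the `Vec<i64>` stand-in for a record batch are hypothetical, and it assumes batches are stored in ascending, non-overlapping `event_time` order.

``` rust
/// Hypothetical helper: returns the `n` newest event times in descending
/// order, plus the number of batches that had to be read.
/// Assumption: `batches` is ordered oldest-first, each batch is sorted
/// ascending, and batches do not overlap in time.
fn latest_n(batches: &[Vec<i64>], n: usize) -> (Vec<i64>, usize) {
    let mut out = Vec::with_capacity(n);
    let mut batches_read = 0;
    // Walk batches newest-first and stop as soon as the limit is satisfied.
    for batch in batches.iter().rev() {
        if out.len() >= n {
            break;
        }
        batches_read += 1;
        // Rows inside a batch are ascending, so reverse them for DESC order.
        out.extend(batch.iter().rev().take(n - out.len()).copied());
    }
    (out, batches_read)
}

fn main() {
    let batches = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    let (rows, batches_read) = latest_n(&batches, 4);
    println!(
        "{:?} read from {} of {} batches",
        rows,
        batches_read,
        batches.len()
    );
}
```

With the ordering known, only the trailing batches are touched. Without it, the engine has to read every batch before it can sort and apply the limit.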
### Describe the solution you'd like
Currently, TableProvider::scan takes filters and a limit:
``` rust
async fn scan(
    &self,
    state: &SessionState,
    projection: Option<&Vec<usize>>,
    // filters and limit can be used here to inject some push-down
    // operations if needed
    filters: &[Expr],
    limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>> {
```
Let's add a sort expression as well, so that the ordering can be pushed down or at least taken into account by the provider:
``` rust
async fn scan(
    &self,
    state: &SessionState,
    projection: Option<&Vec<usize>>,
    // filters and limit can be used here to inject some push-down
    // operations if needed
    filters: &[Expr],
    // sort expressions requested by the query
    sorting: &[Expr],
    limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>> {
```
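As a rough sketch of how a provider might act on the proposed `sorting` argument, the std-only example below compares the requested ordering with an ordering the provider already knows about. Everything here is hypothetical and not part of DataFusion: `SortKey` is a simplified stand-in for a sort `Expr`, and `SortPushdown` and `plan_sort` are names invented for illustration.

``` rust
/// Hypothetical, simplified stand-in for a sort expression.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// Hypothetical answer a provider could derive from the `sorting` argument.
#[derive(Debug, PartialEq)]
enum SortPushdown {
    /// Data is already stored in the requested order.
    Forward,
    /// Data is stored in the exact opposite order: read it back to front.
    Reverse,
    /// The provider cannot help; the engine must sort.
    Unsupported,
}

/// Compare the requested ordering with the ordering the storage is known
/// to have. Assumption: `native` lists the physical sort keys of the data.
fn plan_sort(native: &[SortKey], requested: &[SortKey]) -> SortPushdown {
    // Nothing to push down, or more keys than the known ordering covers.
    if requested.is_empty() || requested.len() > native.len() {
        return SortPushdown::Unsupported;
    }
    // The requested keys must be a prefix of the native ordering.
    if native.iter().zip(requested).all(|(n, r)| n == r) {
        return SortPushdown::Forward;
    }
    // Same columns with every direction flipped: scan in reverse.
    if native
        .iter()
        .zip(requested)
        .all(|(n, r)| n.column == r.column && n.descending != r.descending)
    {
        return SortPushdown::Reverse;
    }
    SortPushdown::Unsupported
}

fn key(column: &str, descending: bool) -> SortKey {
    SortKey { column: column.to_string(), descending }
}

fn main() {
    // Files are written in ascending event_time order.
    let native = vec![key("event_time", false)];
    // ORDER BY event_time DESC
    let requested = vec![key("event_time", true)];
    println!("{:?}", plan_sort(&native, &requested));
}
```

In the `Reverse` case, combined with `limit`, a provider could open only the newest files or batches. In the `Unsupported` case it would ignore `sorting` and leave the sort to the engine, as happens today.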
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]