Hi Marko,

Indeed, most databases (especially distributed ones) support time travel/stale
reads, so this is an important feature, IMHO.

Just for my understanding: the proposal assumes that writes will result in
a new table version, correct?
I ask because some databases provide stale read support where the table
schema version itself does not change; instead, the records are append-only
and each has a timestamp associated with it (typically in an 'internal'
column).
Perhaps the solution can be extended so that a column in the table
structure can be tagged as a commit timestamp tracker; it could then be
used to provide stale reads based on a timestamp as well.

Something like "AS OF TIMESTAMP" support, basically.
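
To make the idea concrete, here is a rough sketch of what a timestamp-based
stale read over append-only records could look like. It is not DataFusion
code: the `Record` type, the `commit_ts` field and the `read_as_of` function
are hypothetical names I made up for illustration, and it assumes that each
row has a key and that the newest row at or before the requested timestamp
wins.

```rust
use std::collections::HashMap;

/// One row of a hypothetical append-only table. `commit_ts` stands in for
/// the column tagged as the commit timestamp tracker (an assumption).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    key: String,
    value: i64,
    commit_ts: u64,
}

/// Returns, for each key, the most recent record committed at or before
/// `as_of`. Records committed after `as_of` are ignored entirely.
fn read_as_of(log: &[Record], as_of: u64) -> HashMap<String, Record> {
    let mut snapshot: HashMap<String, Record> = HashMap::new();
    for rec in log.iter().filter(|r| r.commit_ts <= as_of) {
        // Keep `rec` if the key is unseen or `rec` is at least as new as
        // the record currently held for that key.
        let is_newer = snapshot
            .get(&rec.key)
            .map_or(true, |held| held.commit_ts <= rec.commit_ts);
        if is_newer {
            snapshot.insert(rec.key.clone(), rec.clone());
        }
    }
    snapshot
}
```

With rows for key "a" committed at timestamps 10 and 20, a read as of 15
would see only the first one, while the table version never changes.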

Hope it makes sense.

Thanks,
akshara

On Fri, Aug 18, 2023 at 8:35 AM Marko Grujic <mark...@gmail.com> wrote:

> Hi all!
>
> I'm wondering what people think of a possibility to extend DataFusion so as
> to accommodate time-travel querying? This would work well with the new
> table formats, particularly Iceberg and Delta Lake, where table versioning
> is at the core of the protocol.
>
> You can see some details in the issue I raised below[1], but the TLDR of
> the work I see is:
> 1. extend sqlparser-rs to be aware of the `AS OF` clause (or something else
> people prefer)
> 2. capture that information inside `TableFactor::Table` expression
> <https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/query.rs#L650-L664>
> 3. then in DataFusion itself while building `SessionContextProvider` and
> pre-populating the tables for a given query keep track of both the table
> version and table name specified
> 4. this would also mean a breaking change in the `SchemaProvider::table`
> along the lines of
> ```rust
> async fn table(&self, name: &str, version: Option<TableVersion>) ->
> Option<Arc<dyn TableProvider>>
> ```
> which would allow the provider implementation to be version-aware
>
> I'd be glad to commence work on this if there's consensus on the addition
> of such a feature to DataFusion.
>
> Cheers,
> Marko
>
> [1] https://github.com/apache/arrow-datafusion/issues/7292
>
