Hi Marko, Indeed most databases do support time travel/stale reads (specially distributed databases) , hence an important feature,IMHO.
Just for my understanding - the proposal assumes that writes will result in a new table version correct? Asking since, some databases provide stale read support - but the table schema version itself does not change, rather the records are appendonly and have a timestamp associated with it ( typically in an 'internal' column). Perhaps the solution can be extended to have the facility to specify/tag , in the table structure, a column as a commit timestamp tracker , then it can be used to provide stale reads based on a timestamp as well. Something like an "AS OF TIMESTAMP" support, basically. Hope it makes sense. Thanks, akshara On Fri, Aug 18, 2023 at 8:35 AM Marko Grujic <mark...@gmail.com> wrote: > Hi all! > > I'm wondering what people think of a possibility to extend DataFusion so as > to accommodate time-travel querying? This would work well with the new > table formats, particularly Iceberg and Delta Lake, where table versioning > is at the core of the protocol. > > You can see some details in the issue I raised below[1], but the TLDR of > the work I see is: > 1. extend sqlparser-rs to be aware of the `AS OF` clause (or something else > people prefer) > 2. capture that information inside `TableFactor::Table > < > https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/query.rs#L650-L664 > >` > expression > 3. then in DataFusion itself while building `SessionContextProvider` and > pre-populating the tables for a given query keep track of both the table > version and table name specified > 4. this would also mean a breaking change in the `SchemaProvider::table` > along the lines of > ```rust > async fn table(&self, name: &str, version: Option<TableVersion>) -> > Option<Arc<dyn TableProvider>> > ``` > which would allow the provider implementation to be version-aware > > I'd be glad to commence work on this if there's consensus on the addition > of such a feature to DataFusion. > > Cheers, > Marko > > [1] https://github.com/apache/arrow-datafusion/issues/7292 >