Noted, I was calling it a snapshot since that's what the prototype
constructs to pass down to the file-planning API, but that's just an
implementation detail.
We could add an appendsBetween(s1, s2) API, but I wanted to keep the
original scan API separate from the incremental scan as the original scan is
Thanks for working on this!
I think being able to plan an incremental scan is a good idea overall.
But we should avoid calling the incremental data a “snapshot”.
A snapshot is the table state at some point in time, and I think it would
be confusing if we started adding new meanings.
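To make the distinction concrete, here is a minimal sketch of what "the appends between two snapshots" could mean, as opposed to a snapshot, which is the full table state. It is a toy model, not Iceberg's API: the Snap class, the file names, and the snapshot IDs 1 and 2 are made up for illustration, and the method name is borrowed from the appendsBetween(s1, s2) idea mentioned in this thread. It assumes each snapshot records its parent and the files it appended.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical and are not Iceberg's API.
final class IncrementalAppends {

    // A snapshot reduced to what matters here: its parent and the files it appended.
    static final class Snap {
        final long id;
        final Long parentId;            // null for the first snapshot
        final List<String> addedFiles;

        Snap(long id, Long parentId, List<String> addedFiles) {
            this.id = id;
            this.parentId = parentId;
            this.addedFiles = addedFiles;
        }
    }

    // Files appended after fromId (exclusive) up to toId (inclusive), oldest first.
    static List<String> appendsBetween(Map<Long, Snap> snapshots, long fromId, long toId) {
        Deque<Snap> chain = new ArrayDeque<>();
        Long current = toId;
        // Walk parent pointers back from toId until we reach fromId.
        while (current != null && current != fromId) {
            Snap snap = snapshots.get(current);
            if (snap == null) {
                throw new IllegalArgumentException("Unknown snapshot: " + current);
            }
            chain.push(snap);           // push adds at the head, so the oldest ends up first
            current = snap.parentId;
        }
        if (current == null) {
            throw new IllegalArgumentException(fromId + " is not an ancestor of " + toId);
        }
        List<String> files = new ArrayList<>();
        for (Snap snap : chain) {
            files.addAll(snap.addedFiles);
        }
        return files;
    }

    public static void main(String[] args) {
        long s1 = 8924558786060583479L;
        long s2 = 6536733823181975045L;
        // Hypothetical history: 1 -> s1 -> 2 -> s2, one appended file per snapshot.
        Map<Long, Snap> snapshots = Map.of(
            1L, new Snap(1L, null, List.of("a.parquet")),
            s1, new Snap(s1, 1L, List.of("b.parquet")),
            2L, new Snap(2L, s1, List.of("c.parquet")),
            s2, new Snap(s2, 2L, List.of("d.parquet")));
        System.out.println(appendsBetween(snapshots, s1, s2)); // prints [c.parquet, d.parquet]
    }
}
```

The result is a list of files rather than a table state, which is why a name other than "snapshot" seems less confusing for it. The sketch only handles appends; deletes or overwrites between the two snapshots would need separate treatment.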
Thanks Ryan,
> Iceberg can give you the data files that were added or deleted in a
> snapshot, but there isn't a good way to take those and actually read them
> as a DataFrame or select that data from a table in SQL. I'd think that's
> a good first step
One approach which I'm currently prototyping
Like having a system table to store the in-use snapshot? Isn't incremental
processing much like incremental pulling in Hudi?
-Li
From: Ryan Blue
Reply-To: "dev@iceberg.apache.org", "rb...@netflix.com"
Date: Thursday, July 18, 2019, 3:55 AM
To: RD
Cc: Iceberg Dev List
Subject: Re: Getting d
I think it would be helpful to have a pattern for incremental processing.
Iceberg can give you the data files that were added or deleted in a
snapshot, but there isn't a good way to take those and actually read them
as a DataFrame or select that data from a table in SQL. I'd think that's a
good fir
Hi Iceberg devs,
We are starting work on a somewhat similar project. The idea is that
users can ask for incremental data since the last snapshot they processed,
i.e., the delta that was added since the last snapshot. Do you guys think
this can be a general feature that can we benefici
You can do this using time-travel. First, read the table at each snapshot.
This creates a temporary view for each snapshot:
// create temp views for each snapshot
spark.read.format("iceberg").option("snapshot-id",
8924558786060583479L).load("db.table").createOrReplaceTempView("s1")
spark.read.
Hi,
Could you please advise how we can get the delta data changes (diff) between
2 snapshots?
Is there any way to provide 2 snapshot IDs (8924558786060583479,
6536733823181975045) and get the records that were added after
8924558786060583479?