Re: [DISCUSS] Incremental Replication for Iceberg Tables

Xinli shang Sat, 18 Oct 2025 01:43:46 -0700

Hi Maninder,

Thanks for your reply and comments! The synchronous replication will be
performed after the transaction rather than in parallel with it. Could you
shed some light on 'replication to be done by the engine'?
For an asynchronous replication system, I assume you mean a
separate replication system. That has advantages but also challenges. We
briefly discussed this in the Google doc, so we can move the discussion there.


On Wed, Oct 1, 2025 at 11:47 AM Maninder Parmar <
[email protected]> wrote:

> Thanks for the proposal Xinli! I have a few thoughts about this approach:
>
> 1. Doing commit-time synchronous replication that involves copying data
> files will severely limit transaction throughput as well as
> reliability. Even if we want to attempt synchronous replication, it would
> be better for file (both data and metadata) replication to be done by the
> engine.
> 2. In general, it might be more performant and easier to design an
> asynchronous replication system that provides snapshot isolation
> guarantees for reads on the replica.
>
> Regards,
> Maninder
>
> On Wed, Oct 1, 2025 at 9:47 AM Manu Zhang <[email protected]> wrote:
>
>> Thanks for the proposal Xinli. I have also thought through Iceberg table
>> replication before and have some doubts over this approach.
>>
>> 1. Will synchronous replication be useful, given that underlying
>> distributed file systems like S3 already provide high durability? On the
>> other hand, a cross-datacenter network hiccup would fail the commit. It
>> might require an on-call engineer to disable the option so that commits can
>> succeed if the network issue lasts for a while. IMO, replication for
>> disaster recovery should be transparent and have no impact on users'
>> applications.
>>
>> 2. What about commits from rewrite_data_files? Will it replicate the
>> entire table if all files of a table have been rewritten? In this case,
>> there are actually no "changes" to the table, and I think only "changes"
>> need to be replicated.
>>
>> 3. Metadata replication is not easy to get right. We've seen such issues
>> [1] with rewrite_table_path, where not updating sizes in manifest lists
>> could lead to correctness problems. How about creating new metadata files
>> for replicated data files?
>>
>> [1] https://github.com/apache/iceberg/issues/13719
>>
>> Best,
>> Manu
>>
>> On Wed, Oct 1, 2025 at 6:43 AM Chao Sun <[email protected]> wrote:
>>
>>> Thanks for the proposal Xinli! It sounds very useful and I also just
>>> left some comments.
>>>
>>> On Mon, Sep 29, 2025 at 8:42 PM Gang Wu <[email protected]> wrote:
>>>
>>>> This thread was accidentally in my spam folder.
>>>>
>>>> I have left some comments regarding the implications on the Iceberg
>>>> REST catalog side.
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>> On Tue, Sep 30, 2025 at 5:44 AM Huaxin Gao <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the proposal. I think it's in the right direction. I left
>>>>> some comments and will take another look when time allows.
>>>>>
>>>>> Huaxin
>>>>>
>>>>> On 2025/09/27 17:27:29 Xinli shang wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > I’d like to propose adding *native incremental replication* to Iceberg
>>>>> > tables.
>>>>> >
>>>>> > *Motivation:* Many production deployments require cross–data center backup
>>>>> > and data locality. Today this is usually handled by external services,
>>>>> > which adds operational overhead and introduces failure modes outside
>>>>> > Iceberg’s transactional boundary. Integrating replication into the commit
>>>>> > workflow would simplify operations and improve consistency.
>>>>> >
>>>>> > *Proposal:* An optional replication phase in the commit process would
>>>>> > automatically copy data files and metadata to one or more targets (e.g.,
>>>>> > S3, HDFS, GCS, Azure). Replication is configured via table properties and
>>>>> > supports both synchronous (immediate consistency, higher latency) and
>>>>> > asynchronous (background retries, eventual consistency) modes. This
>>>>> > provides built-in disaster recovery, data locality optimization, and
>>>>> > cross-region analytics without external tools.
>>>>> >
>>>>> > Full draft proposal with design details is here:
>>>>> > 👉 Incremental Iceberg Replication Proposal
>>>>> > <https://docs.google.com/document/d/1yrVLs0CQyIHs9WbBVx_EK6ad419Adsl9xHozpmQEMrs/edit?tab=t.0#heading=h.aa5ph23raz9l>
>>>>> >
>>>>> > Thanks,
>>>>> > Xinli
>>>>> >
>>>>>
>>>>

-- 
Xinli Shang
