We met on Wednesday and created the channel #cdc-read on Iceberg Slack. A summary of the meeting discussion points is there.
Thanks,
Walaa.

On Tue, Nov 21, 2023 at 8:06 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi:
>
> Is there any update on this topic?
>
> On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi folks,
>>
>> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join
>> if you are interested.
>>
>> Sync-up for Iceberg CDC View on MOR
>> Wednesday, November 15 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/zef-grqu-cqy
>>
>> Yufei
>>
>>
>> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> I was thinking about the possible implementations of a streaming read of
>>> MOR tables from Flink.
>>> I was checking the Spark code, and found that the feature is also
>>> missing from Spark. As Yufei mentioned, the building blocks are there, but
>>> the feature is not implemented yet.
>>> It would be good to implement the DeletedRowsScanTask and related
>>> features, so this feature would be available for both Spark and Flink
>>> engines.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Fri, Nov 3, 2023 at 18:47 Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi Pucheng,
>>>>
>>>> In short, we can reuse the front-end infrastructure, including the
>>>> changelog view procedure and iterators. We need some work on the reader
>>>> side. It is not trivial, but some essential building blocks, like the
>>>> `_deleted` metadata column, are there already.
>>>>
>>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>>> column for both pos deletes and eq deletes. In particular, instead of
>>>> emitting equality deletes directly as CDC deleted rows, we resolve the eq
>>>> deletes to actual deleted rows and emit them as CDC delete rows. For
>>>> example, an eq delete may delete two data rows. We will emit the 2 actual
>>>> deleted rows. We change the design so that we emit all deleted (pos and
>>>> eq) rows together in the same format.
>>>>
>>>> The downside is that it is expensive for certain use cases. For
>>>> example, it has to scan all data files to resolve global eq deletes. We can
>>>> try to solve this by providing an option to emit eq delete rows directly
>>>> in the future. Please refer to
>>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>>> for more details.
>>>>
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang <py...@pinterest.com.invalid>
>>>> wrote:
>>>>
>>>>> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>>>>>
>>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> Hi community,
>>>>>>
>>>>>> I wonder if anyone is interested in having a MOR CDC view feature? My
>>>>>> organization is interested in using Flink upsert (MOR) into the Iceberg
>>>>>> table, but currently the MOR CDC view is not supported.
>>>>>>
>>>>>> If we were to support it, do you know how much work it will be? How
>>>>>> difficult will that be? Any pointers will be greatly appreciated.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best,
>>>>>> Pucheng
>>>>>>
>>>>>
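
For readers skimming the thread, here is a rough, engine-independent sketch of the idea in Yufei's Nov 3 message: equality deletes are resolved to the data rows they actually remove, and those rows are emitted in the same format as rows removed by position deletes. Everything in it is an assumption made for illustration. The `Row` type, the `resolve_deleted_rows` function, and the in-memory sets are hypothetical and are not Iceberg, Spark, or Flink APIs. It also ignores sequence numbers and partition scoping, which a real reader would have to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a data row; not an Iceberg type."""
    file: str   # data file the row lives in
    pos: int    # row position within that file
    id: int     # column used by the equality delete
    value: str


def resolve_deleted_rows(data_rows, pos_deletes, eq_delete_ids):
    """Return ("DELETE", row) for every data row removed by either delete kind.

    pos_deletes: set of (file, pos) pairs.
    eq_delete_ids: set of id values named by equality deletes.
    """
    changelog = []
    for row in data_rows:
        removed_by_pos = (row.file, row.pos) in pos_deletes
        removed_by_eq = row.id in eq_delete_ids
        if removed_by_pos or removed_by_eq:
            # Both delete kinds produce the same output shape.
            changelog.append(("DELETE", row))
    return changelog


data_rows = [
    Row("a.parquet", 0, 1, "x"),
    Row("a.parquet", 1, 2, "y"),
    Row("b.parquet", 0, 1, "z"),
    Row("b.parquet", 1, 3, "w"),
]
pos_deletes = {("b.parquet", 1)}
eq_delete_ids = {1}  # one eq delete that matches two data rows

changelog = resolve_deleted_rows(data_rows, pos_deletes, eq_delete_ids)
for op, row in changelog:
    print(op, row)
```

Note that the loop visits every data row to resolve the equality delete, which mirrors the cost concern raised in the same message about scanning all data files for global eq deletes.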