Hi, is there any update on this topic?

On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks,
>
> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join if
> you are interested.
>
> Sync-up for Iceberg CDC View on MOR
> Wednesday, November 15 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/zef-grqu-cqy
>
> Yufei
>
>
> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I was thinking about possible implementations of a streaming read of
>> MOR tables from Flink.
>> I checked the Spark code and found that the feature is also missing
>> from Spark. As Yufei mentioned, the building blocks are there, but the
>> feature is not implemented yet.
>> It would be good to implement the DeletedRowsScanTask and related
>> features, so that this feature would be available for both the Spark and
>> Flink engines.
>>
>> Thanks,
>> Peter
>>
>> Yufei Gu <flyrain...@gmail.com> wrote (on Fri, Nov 3, 2023, 18:47):
>>
>>> Hi Pucheng,
>>>
>>> In short, we can reuse the front-end infrastructure, including the
>>> changelog view procedure and iterators. The reader side needs some work,
>>> and it is not trivial, but some essential building blocks, like the
>>> `_deleted` metadata column, are there already.
>>>
>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>> column for both pos deletes and eq deletes. In particular, instead of
>>> emitting equality deletes directly as CDC deleted rows, we resolve the eq
>>> deletes to the actual deleted rows and emit those as CDC delete rows. For
>>> example, an eq delete may delete two data rows. We will emit the 2 actual
>>> deleted rows. We change the design so that we emit all deleted (pos and
>>> eq) rows together in the same format.
>>>
>>> The downside is that it is expensive for certain use cases. For example,
>>> it has to scan all data files to resolve global eq deletes. We can try to
>>> solve this by providing an option to emit eq delete rows directly in the
>>> future. Please refer to
>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>> for more details.
>>>
>>>
>>> Yufei
>>>
>>>
>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang <py...@pinterest.com.invalid>
>>> wrote:
>>>
>>>> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>>>>
>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>> wrote:
>>>>
>>>>> Hi community,
>>>>>
>>>>> I wonder if anyone is interested in having a MOR CDC view feature? My
>>>>> organization is interested in using Flink upsert (MOR) into the Iceberg
>>>>> table, but currently the MOR CDC view is not supported.
>>>>>
>>>>> If we were to support it, do you know how much work it would be? How
>>>>> difficult would it be? Any pointers will be greatly appreciated.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Best,
>>>>> Pucheng
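
For readers following the quoted thread, here is a rough sketch of the idea Yufei describes: resolve pos deletes and eq deletes to the actual deleted data rows and emit them all in one format, flagged by a `_deleted` column. This is plain Python over in-memory rows, not Iceberg code. The function names `resolve_deletes` and `cdc_delete_rows`, the row layout, and the sample data are all hypothetical, and the sketch ignores sequence numbers, partition scoping, and null handling, which a real reader would have to respect.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
DataRow = Tuple[str, int, Row]  # (file path, position in file, row values)


def resolve_deletes(
    data_rows: Iterable[DataRow],
    pos_deletes: Iterable[Tuple[str, int]],
    eq_deletes: Iterable[Row],
) -> Iterator[Row]:
    """Yield every data row with a `_deleted` flag (hypothetical helper).

    A row is deleted if a pos delete names its (file, position), or if
    some eq delete matches it on every column that eq delete specifies.
    """
    pos_set = set(pos_deletes)
    eq_list = list(eq_deletes)
    for path, pos, row in data_rows:
        by_pos = (path, pos) in pos_set
        by_eq = any(
            all(row.get(col) == val for col, val in eq.items())
            for eq in eq_list
        )
        yield {**row, "_deleted": by_pos or by_eq}


def cdc_delete_rows(
    data_rows: Iterable[DataRow],
    pos_deletes: Iterable[Tuple[str, int]],
    eq_deletes: Iterable[Row],
) -> List[Row]:
    """Return only the resolved deleted rows, all in the same format."""
    resolved = resolve_deletes(data_rows, pos_deletes, eq_deletes)
    return [row for row in resolved if row["_deleted"]]


# Made-up sample data: one eq delete on id=2 removes two data rows,
# and one pos delete removes the row at position 3.
data = [
    ("a.parquet", 0, {"id": 1, "name": "a"}),
    ("a.parquet", 1, {"id": 2, "name": "b"}),
    ("a.parquet", 2, {"id": 2, "name": "c"}),
    ("a.parquet", 3, {"id": 3, "name": "d"}),
]
deleted = cdc_delete_rows(
    data,
    pos_deletes=[("a.parquet", 3)],
    eq_deletes=[{"id": 2}],
)
for row in deleted:
    print(row)
```

Note that the sketch has to visit every data row to find what an eq delete matches, which is the cost mentioned in the thread for global eq deletes.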