Thanks for sharing.

On Tue, Nov 21, 2023 at 21:52 Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> We met on Wednesday and created the channel #cdc-read on Iceberg Slack. A
> summary of the meeting discussion points is there.
>
> Thanks,
> Walaa.
>
> On Tue, Nov 21, 2023 at 8:06 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
>> Hi:
>>
>> Is there any update on this topic?
>>
>> On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join
>>> if you are interested.
>>>
>>> Sync-up for Iceberg CDC View on MOR
>>> Wednesday, November 15 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/zef-grqu-cqy
>>>
>>> Yufei
>>>
>>>
>>> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I was thinking about the possible implementations of a streaming read
>>>> of MOR tables from Flink.
>>>> I was checking the Spark code, and found that the feature is also
>>>> missing from Spark. As Yufei mentioned, the building blocks are there,
>>>> but the feature is not implemented yet.
>>>> It would be good to implement the DeletedRowsScanTask and related
>>>> features, so this feature would be available for both Spark and Flink
>>>> engines.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> Yufei Gu <flyrain...@gmail.com> wrote (on Fri, Nov 3, 2023, 18:47):
>>>>
>>>>> Hi Pucheng,
>>>>>
>>>>> In short, we can reuse front-end infrastructure, including the
>>>>> changelog view procedure and iterators. We need some work on the
>>>>> reader side. It is not trivial, but some essential building blocks,
>>>>> like the `_deleted` metadata column, are there already.
>>>>>
>>>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>>>> column for both pos deletes and eq deletes. Specifically, instead of
>>>>> emitting equality deletes directly as CDC delete rows, we resolve the
>>>>> eq deletes to the actual deleted rows and emit those as CDC delete
>>>>> rows. For example, an eq delete may delete two data rows. We will emit
>>>>> the 2 actual deleted rows. We change the design so that we emit all
>>>>> deleted (pos and eq) rows together in the same format.
>>>>>
>>>>> The downside is that it is expensive for certain use cases. For
>>>>> example, it has to scan all data files to resolve global eq deletes.
>>>>> We can try to solve this by providing an option to emit eq delete rows
>>>>> directly in the future. Please refer to
>>>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>>>> for more details.
>>>>>
>>>>>
>>>>> Yufei
>>>>>
>>>>>
>>>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang
>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>
>>>>>> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>>>>>>
>>>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi community,
>>>>>>>
>>>>>>> I wonder if anyone is interested in having a MOR CDC view feature?
>>>>>>> My organization is interested in using Flink upsert (MOR) into the
>>>>>>> Iceberg table, but currently the MOR CDC view is not supported.
>>>>>>>
>>>>>>> If we were to support it, do you know how much work it would be? How
>>>>>>> difficult would that be? Any pointers will be greatly appreciated.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Best,
>>>>>>> Pucheng
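
For readers following the thread, the eq-delete resolution Yufei describes can be sketched roughly as below. This is a minimal Python illustration, not Iceberg code: the function name `resolve_eq_deletes`, the dictionary layout, and the `seq` and `key` fields are all hypothetical. It assumes an equality delete applies only to rows whose data sequence number is strictly lower than the delete's, and it ignores partitioning entirely.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical in-memory stand-ins for data files and equality delete files.
Row = Dict[str, Any]


def resolve_eq_deletes(data_rows: List[Row], eq_deletes: List[Row]) -> Iterator[Row]:
    """Yield each data row removed by an equality delete, flagged as deleted."""
    for row in data_rows:  # full scan of the data: this is the expensive part
        for delete in eq_deletes:
            # Assumption: a delete only affects rows written before it.
            older = row["seq"] < delete["seq"]
            matches = all(
                row["values"].get(col) == val for col, val in delete["key"].items()
            )
            if older and matches:
                # Emit the actual row (not the delete key) as the CDC delete row.
                yield {**row["values"], "_deleted": True}
                break  # emit each deleted data row once


rows = [
    {"seq": 1, "values": {"id": 1, "name": "a"}},
    {"seq": 2, "values": {"id": 1, "name": "b"}},
    {"seq": 1, "values": {"id": 2, "name": "c"}},
    {"seq": 4, "values": {"id": 1, "name": "d"}},  # written after the delete
]
deletes = [{"seq": 3, "key": {"id": 1}}]

deleted = list(resolve_eq_deletes(rows, deletes))
```

Here a single eq delete on `id = 1` resolves to the two older rows with that id, matching the "one eq delete, two emitted rows" example in the thread. The loop over every data row is also why resolving global eq deletes is costly.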