A slight side topic: are the Slack channels archived anywhere for offline consumption? (Apologies if I missed it on the community page.)
Thanks,
Micah

On Tue, Nov 21, 2023 at 6:07 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Thanks for sharing.
>
> On Tue, Nov 21, 2023 at 21:52 Walaa Eldin Moustafa <wa.moust...@gmail.com>
> wrote:
>
>> We met on Wednesday and created the channel #cdc-read on Iceberg Slack. A
>> summary of the meeting discussion points is there.
>>
>> Thanks,
>> Walaa.
>>
>> On Tue, Nov 21, 2023 at 8:06 AM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>
>>> Hi:
>>>
>>> Is there any update on this topic?
>>>
>>> On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join
>>>> if you are interested.
>>>>
>>>> Sync-up for Iceberg CDC View on MOR
>>>> Wednesday, November 15 · 9:00 – 10:00am
>>>> Time zone: America/Los_Angeles
>>>> Google Meet joining info
>>>> Video call link: https://meet.google.com/zef-grqu-cqy
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I was thinking about the possible implementations of a streaming read
>>>>> of MOR tables from Flink.
>>>>> I was checking the Spark code, and found that the feature is also
>>>>> missing from Spark. As Yufei mentioned, the building blocks are there, but
>>>>> the feature is not implemented yet.
>>>>> It would be good to implement the DeletedRowsScanTask and related
>>>>> features, so this feature would be available for both Spark and Flink
>>>>> engines.
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> Yufei Gu <flyrain...@gmail.com> wrote (on Fri, Nov 3, 2023 at 18:47):
>>>>>
>>>>>> Hi Pucheng,
>>>>>>
>>>>>> In short, we can reuse front-end infrastructure, including the
>>>>>> changelog view procedure and iterators. The reader side needs some
>>>>>> work, and it is not trivial, but some essential building blocks, like
>>>>>> the `_deleted` metadata column, are there already.
>>>>>>
>>>>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>>>>> column for both pos deletes and eq deletes. In particular, instead of
>>>>>> emitting equality deletes directly as CDC delete rows, we resolve the
>>>>>> eq deletes to actual deleted rows and emit those as CDC delete rows.
>>>>>> For example, an eq delete may delete two data rows. We will emit the 2
>>>>>> actual deleted rows. We change the design so that we emit all deleted
>>>>>> (pos and eq) rows together in the same format.
>>>>>>
>>>>>> The downside is that it is expensive for certain use cases. For
>>>>>> example, it has to scan all data files to resolve global eq deletes. We
>>>>>> can try to solve this by providing an option to emit eq delete rows
>>>>>> directly in the future. Please refer to
>>>>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>>>>> for more details.
>>>>>>
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang
>>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>>
>>>>>>> Feature request ticket:
>>>>>>> https://github.com/apache/iceberg/issues/8975
>>>>>>>
>>>>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi community,
>>>>>>>>
>>>>>>>> I wonder if anyone is interested in having a MOR CDC view feature?
>>>>>>>> My organization is interested in using Flink upsert (MOR) into the
>>>>>>>> Iceberg table, but currently the MOR CDC view is not supported.
>>>>>>>>
>>>>>>>> If we were to support it, do you know how much work it will be? How
>>>>>>>> difficult will that be? Any pointers will be greatly appreciated.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Pucheng
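
The approach Yufei describes in the thread above (resolve eq deletes to the data rows they actually remove, then emit pos-deleted and eq-deleted rows in one format) can be illustrated with a toy sketch. This is plain Python over in-memory rows, not Iceberg code: the function names, the row layout, and the use of list positions as a stand-in for pos deletes are all assumptions made for illustration, and the sketch ignores sequence numbers, partitions, and file boundaries. Only the `_deleted` column name comes from the thread.

```python
from typing import Dict, Iterable, List, Sequence, Set

Row = Dict[str, object]


def mark_deleted(
    data_rows: Sequence[Row],
    eq_deletes: Iterable[Row],
    eq_fields: Sequence[str],
    pos_deletes: Set[int] = frozenset(),
) -> List[Row]:
    """Return every data row with a boolean `_deleted` flag attached.

    Hypothetical helper: a row is flagged when its position is listed in
    `pos_deletes` or when its values on `eq_fields` match an eq delete.
    """
    # Build the set of deleted keys once so each data row is a cheap lookup.
    delete_keys = {tuple(d[f] for f in eq_fields) for d in eq_deletes}
    marked = []
    for pos, row in enumerate(data_rows):
        key = tuple(row[f] for f in eq_fields)
        deleted = pos in pos_deletes or key in delete_keys
        marked.append({**row, "_deleted": deleted})
    return marked


def cdc_delete_rows(
    data_rows: Sequence[Row],
    eq_deletes: Iterable[Row],
    eq_fields: Sequence[str],
    pos_deletes: Set[int] = frozenset(),
) -> List[Row]:
    """Emit the actual deleted rows (pos and eq) together in one format."""
    marked = mark_deleted(data_rows, eq_deletes, eq_fields, pos_deletes)
    return [row for row in marked if row["_deleted"]]


if __name__ == "__main__":
    data = [
        {"id": 1, "v": "a"},
        {"id": 2, "v": "b"},
        {"id": 1, "v": "c"},
        {"id": 3, "v": "d"},
    ]
    # One eq delete on id=1 removes two data rows; position 3 is a pos delete.
    for row in cdc_delete_rows(data, [{"id": 1}], ["id"], pos_deletes={3}):
        print(row)
```

Note that every data row has to be visited to resolve the eq delete, which is the cost the thread points out for global eq deletes.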