Slightly off topic: are the Slack channels archived anywhere for offline
consumption (apologies if I missed it on the community page)?

Thanks,
Micah

On Tue, Nov 21, 2023 at 6:07 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Thanks for sharing.
>
> On Tue, Nov 21, 2023 at 21:52 Walaa Eldin Moustafa <wa.moust...@gmail.com>
> wrote:
>
>> We met on Wednesday and created the channel #cdc-read on Iceberg Slack. A
>> summary of the meeting discussion points is there.
>>
>> Thanks,
>> Walaa.
>>
>> On Tue, Nov 21, 2023 at 8:06 AM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>
>>> Hi:
>>>
>>> Is there any update on this topic?
>>>
>>> On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join
>>>> if you are interested.
>>>>
>>>> Sync-up for Iceberg CDC View on MOR
>>>> Wednesday, November 15 · 9:00 – 10:00am
>>>> Time zone: America/Los_Angeles
>>>> Google Meet joining info
>>>> Video call link: https://meet.google.com/zef-grqu-cqy
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I was thinking about the possible implementations of a streaming read
>>>>> of MOR tables from Flink.
>>>>> I was checking the Spark code, and found that the feature is also
>>>>> missing from Spark. As Yufei mentioned, the building blocks are there, but
>>>>> the feature is not implemented yet.
>>>>> It would be good to implement the DeletedRowsScanTask and related
>>>>> features, so this feature would be available for both Spark and Flink
>>>>> engines.
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> Yufei Gu <flyrain...@gmail.com> wrote (on Fri, Nov 3, 2023,
>>>>> 18:47):
>>>>>
>>>>>> Hi Pucheng,
>>>>>>
>>>>>> In short, we can reuse the front-end infrastructure, including the
>>>>>> changelog view procedure and iterators. The reader side needs some
>>>>>> work, and it is not trivial, but some essential building blocks, like
>>>>>> the `_deleted` metadata column, are there already.
>>>>>>
>>>>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>>>>> column for both pos deletes and eq deletes. In particular, instead of
>>>>>> emitting equality deletes directly as CDC deleted rows, we resolve the
>>>>>> eq deletes to actual deleted rows and emit them as CDC delete rows. For
>>>>>> example, an eq delete may delete two data rows. We will emit the 2
>>>>>> actual deleted rows. We change the design so that we emit all deleted
>>>>>> (pos and eq) rows together in the same format.
>>>>>>
>>>>>> The downside is that it is expensive for certain use cases. For
>>>>>> example, it has to scan all data files to resolve global eq deletes. We
>>>>>> can try to solve this by providing an option to emit eq delete rows
>>>>>> directly in the future. Please refer to
>>>>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>>>>> for more details.
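
To make the resolution step above concrete, here is a minimal in-memory sketch in Python. It is not Iceberg code: `DataRow` and `resolve_deletes` are hypothetical names invented for illustration, rows are plain dicts rather than Parquet records, and sequence-number scoping of deletes is ignored. It only shows the idea of turning pos deletes and eq deletes into the actual deleted rows, emitted in one common format.

```python
from dataclasses import dataclass


@dataclass
class DataRow:
    """Hypothetical stand-in for a row stored in a data file."""
    file: str     # data file the row lives in
    pos: int      # row position inside that file
    values: dict  # column name -> value


def resolve_deletes(rows, pos_deletes, eq_deletes):
    """Return the actual deleted rows, all in the same format.

    pos_deletes: set of (file, pos) pairs
    eq_deletes:  list of dicts mapping column name -> value to match
    """
    deleted = []
    # Every row is checked: this full scan is the cost mentioned above
    # for resolving global eq deletes.
    for row in rows:
        by_pos = (row.file, row.pos) in pos_deletes
        by_eq = any(
            all(row.values.get(col) == val for col, val in pred.items())
            for pred in eq_deletes
        )
        if by_pos or by_eq:
            # Emit the full row image, flagged the way a `_deleted`
            # metadata column would flag it.
            deleted.append({**row.values, "_deleted": True})
    return deleted


rows = [
    DataRow("a.parquet", 0, {"id": 1, "name": "x"}),
    DataRow("a.parquet", 1, {"id": 2, "name": "y"}),
    DataRow("b.parquet", 0, {"id": 2, "name": "z"}),
    DataRow("b.parquet", 1, {"id": 3, "name": "w"}),
]
pos_deletes = {("a.parquet", 0)}  # one position delete
eq_deletes = [{"id": 2}]          # one equality delete matching two rows

changes = resolve_deletes(rows, pos_deletes, eq_deletes)
for change in changes:
    print(change)
```

In this toy data the single equality delete on `id = 2` resolves to two deleted rows, one in each file, and they come out in the same shape as the row removed by the position delete.
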
>>>>>>
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang
>>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>>
>>>>>>> Feature request ticket:
>>>>>>> https://github.com/apache/iceberg/issues/8975
>>>>>>>
>>>>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi community,
>>>>>>>>
>>>>>>>> I wonder if anyone is interested in having a MOR CDC view feature?
>>>>>>>> My organization is interested in using Flink upsert (MOR) into the
>>>>>>>> Iceberg table, but currently the MOR CDC view is not supported.
>>>>>>>>
>>>>>>>> If we were to support it, do you know how much work it would be? How
>>>>>>>> difficult would that be? Any pointers will be greatly appreciated.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Pucheng
>>>>>>>>
>>>>>>>
