We met on Wednesday and created the channel #cdc-read on Iceberg Slack. A
summary of the meeting discussion points is there.

Thanks,
Walaa.

On Tue, Nov 21, 2023 at 8:06 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi:
>
> Is there any update on this topic?
>
> On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi folks,
>>
>> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join
>> if you are interested.
>>
>> Sync-up for Iceberg CDC View on MOR
>> Wednesday, November 15 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/zef-grqu-cqy
>>
>> Yufei
>>
>>
>> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> I was thinking about the possible implementations of a streaming read of
>>> MOR tables from Flink.
>>> I was checking the Spark code, and found that the feature is also
>>> missing from Spark. As Yufei mentioned, the building blocks are there, but
>>> the feature is not implemented yet.
>>> It would be good to implement the DeletedRowsScanTask and related
>>> features, so this feature would be available for both Spark and Flink
>>> engines.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Fri, Nov 3, 2023 at 6:47 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi Pucheng,
>>>>
>>>> In short, we can reuse the front-end infrastructure, including the
>>>> changelog view procedure and iterators. The reader side needs some work,
>>>> and it is not trivial, but some essential building blocks, like the
>>>> `_deleted` metadata column, are already there.
>>>>
>>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>>> column for both pos deletes and eq deletes. In particular, instead of
>>>> emitting equality deletes directly as CDC delete rows, we resolve the eq
>>>> deletes to the actual deleted rows and emit those as CDC delete rows. For
>>>> example, an eq delete may delete two data rows, in which case we emit the
>>>> 2 actual deleted rows. We change the design so that we emit all deleted
>>>> rows (pos and eq) together in the same format.
>>>>
>>>> The downside is that this approach is expensive for certain use cases.
>>>> For example, it has to scan all data files to resolve global eq deletes.
>>>> We can try to solve this in the future by providing an option to emit eq
>>>> delete rows directly. Please refer to
>>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>>> for more details.
>>>>
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang <py...@pinterest.com.invalid>
>>>> wrote:
>>>>
>>>>> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>>>>>
>>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> Hi community,
>>>>>>
>>>>>> I wonder if anyone is interested in having a MOR CDC view feature? My
>>>>>> organization is interested in using Flink upsert (MOR) into the Iceberg
>>>>>> table, but currently the MOR CDC view is not supported.
>>>>>>
>>>>>> If we were to support it, do you know how much work it will be? How
>>>>>> difficult will that be? Any pointers will be greatly appreciated.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best,
>>>>>> Pucheng
>>>>>>
>>>>>
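
To make the eq-delete resolution described in Yufei's reply concrete, here is a
minimal Python sketch. It is not Iceberg code: the function name
`resolve_eq_deletes`, the dict-based row representation, and the in-memory scan
are all assumptions made for illustration. It also ignores sequence numbers
and pos deletes, which a real reader would have to handle.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def resolve_eq_deletes(
    data_rows: Sequence[Row],
    eq_deletes: Sequence[Row],
    eq_fields: Sequence[str],
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: split data rows into (live, deleted).

    Each eq delete is matched against the data rows on `eq_fields`.
    Matching rows are returned as full rows tagged `_deleted=True`,
    rather than emitting the eq delete record itself.
    """
    # Build the set of deleted keys once so each data row is checked in O(1).
    delete_keys = {tuple(d[f] for f in eq_fields) for d in eq_deletes}

    live: List[Row] = []
    deleted: List[Row] = []
    for row in data_rows:
        key = tuple(row[f] for f in eq_fields)
        if key in delete_keys:
            deleted.append({**row, "_deleted": True})
        else:
            live.append({**row, "_deleted": False})
    return live, deleted


if __name__ == "__main__":
    rows = [
        {"id": 1, "v": "a"},
        {"id": 1, "v": "b"},
        {"id": 2, "v": "c"},
    ]
    # One eq delete on id=1 resolves to the two actual rows it removes.
    live, deleted = resolve_eq_deletes(rows, [{"id": 1}], ["id"])
    for r in deleted:
        print("DELETE", r)
```

The loop over every data row also shows where the cost mentioned in the thread
comes from: resolving a global eq delete means scanning all candidate data
files.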
