Hi,

Is there any update on this topic?

On Tue, Nov 14, 2023 at 07:25 Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks,
>
> We will discuss it this Wednesday (11/15) at 9 am PST. Feel free to join if
> you are interested.
>
> Sync-up for Iceberg CDC View on MOR
> Wednesday, November 15 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/zef-grqu-cqy
>
> Yufei
>
>
> On Mon, Nov 6, 2023 at 4:39 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I was thinking about the possible implementations of a streaming read of
>> MOR tables from Flink.
>> I was checking the Spark code, and found that the feature is also missing
>> from Spark. As Yufei mentioned, the building blocks are there, but the
>> feature is not implemented yet.
>> It would be good to implement the DeletedRowsScanTask and related
>> features, so this feature would be available for both Spark and Flink
>> engines.
>>
>> Thanks,
>> Peter
>>
>> Yufei Gu <flyrain...@gmail.com> wrote (on Fri, Nov 3, 2023,
>> 18:47):
>>
>>> Hi Pucheng,
>>>
>>> In short, we can reuse the front-end infrastructure, including the
>>> changelog view procedure and iterators. The reader side needs some work,
>>> and it is not trivial, but some essential building blocks, like the
>>> `_deleted` metadata column, are already there.
>>>
>>> To get row-level deletes, we will leverage the `_deleted` metadata
>>> column for both pos deletes and eq deletes. Specifically, instead of
>>> emitting equality deletes directly as CDC delete rows, we resolve the eq
>>> deletes to the actual deleted rows and emit those as CDC delete rows. For
>>> example, if an eq delete deletes two data rows, we will emit those 2
>>> actual deleted rows. We change the design so that we emit all deleted
>>> rows (pos and eq) together in the same format.
>>>
>>> The downside is that this is expensive for certain use cases. For example,
>>> it has to scan all data files to resolve global eq deletes. In the future,
>>> we can try to address this by providing an option to emit eq delete rows
>>> directly. Please refer to
>>> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709
>>> for more details.
>>>
>>>
>>> Yufei
>>>
>>>
>>> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang <py...@pinterest.com.invalid>
>>> wrote:
>>>
>>>> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>>>>
>>>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com>
>>>> wrote:
>>>>
>>>>> Hi community,
>>>>>
>>>>> I wonder if anyone is interested in having a MOR CDC view feature? My
>>>>> organization is interested in using Flink upsert (MOR) into the Iceberg
>>>>> table, but currently the MOR CDC view is not supported.
>>>>>
>>>>> If we were to support it, do you know how much work it would be, and
>>>>> how difficult? Any pointers would be greatly appreciated.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Best,
>>>>> Pucheng
>>>>>
>>>>
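
For anyone catching up on the thread, here is a rough sketch of the idea in
Yufei's Nov 3 mail: resolve pos deletes and eq deletes to the actual deleted
rows and emit them all in one format, marked with `_deleted`. This is a toy
in-memory illustration in Python, not Iceberg code. The function name
`resolve_deleted_rows` and the row/delete representations are hypothetical,
and it ignores sequence numbers, partitions, and file layout entirely.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def resolve_deleted_rows(
    data_rows: List[Row],
    pos_deletes: Set[int],
    eq_deletes: List[Row],
) -> List[Row]:
    """Return the actual data rows removed by either kind of delete.

    Hypothetical model, for illustration only:
      - pos_deletes holds row positions within data_rows
      - each eq delete is a dict of column -> value that a row must match
    """
    deleted: List[Row] = []
    for pos, row in enumerate(data_rows):
        hit_pos = pos in pos_deletes
        # An eq delete matches when every one of its columns equals the row's value.
        hit_eq = any(
            all(col in row and row[col] == val for col, val in eq.items())
            for eq in eq_deletes
        )
        if hit_pos or hit_eq:
            # Emit the full data row, flagged the same way for both delete kinds.
            deleted.append({**row, "_deleted": True})
    return deleted


rows = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
    {"id": 3, "v": "d"},
]
# One eq delete on id=1 resolves to two data rows; one pos delete hits row 3.
cdc_deletes = resolve_deleted_rows(rows, pos_deletes={3}, eq_deletes=[{"id": 1}])
for r in cdc_deletes:
    print(r)
```

The loop over every data row is also where the cost mentioned in the thread
comes from: resolving a global eq delete means checking it against all data
files, not just the ones touched by the commit.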
