Hi Team,

I was thinking about the possible implementations of a streaming read of
MOR tables from Flink.
I checked the Spark code and found that the feature is missing from Spark
as well. As Yufei mentioned, the building blocks are there, but the
feature is not implemented yet.
It would be good to implement the DeletedRowsScanTask and related features,
so that the feature becomes available in both the Spark and Flink engines.
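
Below is a rough sketch of the resolution step Yufei describes in the quoted
message: position and equality deletes are resolved against the data rows,
and every deleted row is emitted once, in one format, flagged with
`_deleted`. This is a toy in-memory model, not Iceberg's reader API and not
a model of DeletedRowsScanTask. The `DataFile` class, the `deleted_rows`
helper and the dict-based rows are hypothetical. The `_change_type` key is
only named after the changelog view's change-type column. Equality-delete
sequence-number rules are ignored. It assumes the delete files passed in are
the ones added by the snapshot being read.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file: a path plus its rows in file order."""
    path: str
    rows: list = field(default_factory=list)


def deleted_rows(data_files, pos_deletes, eq_deletes, eq_fields):
    """Resolve position and equality deletes into the actual deleted rows.

    pos_deletes: set of (file_path, row_position) pairs.
    eq_deletes: list of dicts giving values for the columns in eq_fields.
    Every data file is scanned, which is the cost of resolving global
    equality deletes mentioned in the thread.
    """
    eq_keys = {tuple(d[c] for c in eq_fields) for d in eq_deletes}
    changes = []
    for data_file in data_files:
        for pos, row in enumerate(data_file.rows):
            key = tuple(row[c] for c in eq_fields)
            # A row hit by both a position and an equality delete is emitted once.
            if (data_file.path, pos) in pos_deletes or key in eq_keys:
                changes.append({**row, "_deleted": True, "_change_type": "DELETE"})
    return changes


files = [
    DataFile("a.parquet", [{"id": 1, "cat": "x"}, {"id": 2, "cat": "y"}]),
    DataFile("b.parquet", [{"id": 3, "cat": "x"}, {"id": 4, "cat": "z"}]),
]
# One equality delete (cat = "x") matches two rows; one position delete
# removes row 1 of b.parquet.
changes = deleted_rows(
    files,
    pos_deletes={("b.parquet", 1)},
    eq_deletes=[{"cat": "x"}],
    eq_fields=["cat"],
)
print([c["id"] for c in changes])  # [1, 3, 4]
```

The single equality delete on `cat` yields two full CDC rows (ids 1 and 3)
instead of one key-only row. This is the behavior Yufei describes, and it is
also why every data file has to be read.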

Thanks,
Peter

On Fri, Nov 3, 2023 at 18:47, Yufei Gu <flyrain...@gmail.com> wrote:

> Hi Pucheng,
>
> In short, we can reuse the front-end infrastructure, including the changelog
> view procedure and iterators. The reader side needs some work, and it is
> not trivial, but some essential building blocks, like the `_deleted`
> metadata column, are already there.
>
> To get row-level deletes, we will leverage the `_deleted` metadata column
> for both position deletes and equality deletes. Specifically, instead of
> emitting equality deletes directly as CDC delete rows, we resolve them to
> the actual deleted rows and emit those as CDC delete rows. For example, if
> an equality delete removes two data rows, we emit the two actual deleted
> rows. With this design change, all deleted rows (from both position and
> equality deletes) are emitted together in the same format.
>
> The downside is that this is expensive for certain use cases. For example,
> it has to scan all data files to resolve global equality deletes. In the
> future, we could address this by providing an option to emit equality
> delete rows directly. Please refer to
> https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709 for
> more details.
>
>
> Yufei
>
>
> On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang <py...@pinterest.com.invalid>
> wrote:
>
>> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>>
>> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com> wrote:
>>
>>> Hi community,
>>>
>>> I wonder if anyone is interested in having a MOR CDC view feature. My
>>> organization is interested in using Flink upserts (MOR) into Iceberg
>>> tables, but the MOR CDC view is currently not supported.
>>>
>>> If we were to support it, do you know how much work it would be, and
>>> how difficult it would be? Any pointers would be greatly appreciated.
>>>
>>> Thanks!
>>>
>>> Best,
>>> Pucheng
>>>
>>
