Hi Pucheng,

In short, we can reuse the front-end infrastructure, including the changelog
view procedure and iterators. The reader side needs some work, and it is not
trivial, but some essential building blocks, such as the `_deleted` metadata
column, are already there.

To get row-level deletes, we will leverage the `_deleted` metadata column
for both pos deletes and eq deletes. Specifically, instead of emitting
equality deletes directly as CDC delete rows, we resolve the eq deletes to
the actual deleted rows and emit those as CDC delete rows. For example, an eq
delete may delete two data rows; we will emit the two actual deleted rows. We
change the design so that all deleted rows (pos and eq) are emitted together
in the same format.
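
To make the idea concrete, here is a rough Python sketch of resolving both
delete types to actual rows. This is not Iceberg code: `Row`,
`resolve_deletes`, and the sample data are made up for illustration, and the
sketch ignores sequence numbers and partition scoping, which a real reader
would have to honor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical stand-in for a data row plus its file/position metadata.
    file: str
    pos: int
    id: int
    data: str


def resolve_deletes(rows, pos_deletes, eq_deletes):
    """Return the rows removed by either delete type, all in one format.

    pos_deletes: set of (file, pos) pairs.
    eq_deletes:  list of dicts mapping column name -> value to match.
    """
    out = []
    for row in rows:
        by_pos = (row.file, row.pos) in pos_deletes
        by_eq = any(
            all(getattr(row, col) == val for col, val in d.items())
            for d in eq_deletes
        )
        if by_pos or by_eq:
            # Emit the actual deleted row, flagged like the _deleted column.
            out.append({"id": row.id, "data": row.data, "_deleted": True})
    return out


rows = [
    Row("f1", 0, 1, "a"),
    Row("f1", 1, 2, "b"),
    Row("f2", 0, 1, "c"),
    Row("f2", 1, 3, "d"),
]
deleted = resolve_deletes(rows, {("f1", 1)}, [{"id": 1}])
```

Here the single eq delete on `id = 1` resolves to two actual rows (one in
each file), and they come out alongside the pos-deleted row in the same
shape. Note that the loop visits every row, which is where the scan cost of
resolving eq deletes comes from.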

The downside is that this is expensive for certain use cases. For example, it
has to scan all data files to resolve global eq deletes. We can try to
solve this in the future by providing an option to emit eq delete rows
directly. Please refer to
https://github.com/apache/iceberg/issues/3941#issuecomment-1081273709 for
more details.


Yufei


On Thu, Nov 2, 2023 at 9:17 PM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> Feature request ticket: https://github.com/apache/iceberg/issues/8975
>
> On Thu, Nov 2, 2023 at 9:16 PM Pucheng Yang <py...@pinterest.com> wrote:
>
>> Hi community,
>>
>> I wonder if anyone is interested in having a MOR CDC view feature? My
>> organization is interested in using Flink upsert (MOR) into the Iceberg
>> table, but currently the MOR CDC view is not supported.
>>
>> If we were to support it, do you know how much work it will be? How
>> difficult will that be? Any pointers will be greatly appreciated.
>>
>> Thanks!
>>
>> Best,
>> Pucheng
>>
>
