Hi All!

I would like to rekindle this conversation, as it seems to have stalled a few 
months ago.

We at Apple are very much looking forward to this feature and would love to 
start working on adding CDC read support to the Flink connector once the 
first version of this has been merged. This would unblock a wide set of 
Flink/FlinkSQL use cases that are currently not possible to fully implement.

Based on the PR discussions, there seems to be a general consensus that the 
current implementation would work well, even if there is a performance hit for 
the equality delete cases.

@Anton:
We are looking forward to all performance improvement ideas from you, but in 
the meantime maybe the best course of action would be to move ahead with the 
current version so we can unblock adoption on the Flink connector side.

What do you think?

Cheers,
Gyula

On 2025/02/15 01:39:47 Wing Yew Poon wrote:
> Ok Anton. Please let me know.
> 
> 
> On Thu, Feb 13, 2025 at 9:28 PM Anton Okolnychyi <aokolnyc...@gmail.com>
> wrote:
> 
> > Hey Wing Yew, I am planning to focus on this after we get partition stats
> > readers/writers into main. I actually have ideas on how to implement
> > changelog scans for V2 tables efficiently.
> >
> > - Anton
> >
> > On Mon, Feb 10, 2025 at 21:11 Wing Yew Poon <wyp...@cloudera.com.invalid>
> > wrote:
> >
> >> Hi Anton,
> >>
> >> Thank you for looking at https://github.com/apache/iceberg/pull/10935. I
> >> think we are in agreement on the behavior, but you have concerns about the
> >> performance of the scan, which I agree are justified. It has been some
> >> months now. Do you have any suggestions for improving the performance? How
> >> can we move forward with this? Can we get a working implementation in first
> >> and optimize it later?
> >>
> >> - Wing Yew
> >>
> >>
> >> On Sat, Oct 5, 2024 at 10:53 PM Anton Okolnychyi <aokolnyc...@gmail.com>
> >> wrote:
> >>
> >>> I will take a look next week!
> >>>
> >>> On Saturday, Oct 5, 2024, Péter Váry <peter.vary.apa...@gmail.com> wrote:
> >>>
> >>>> Hi Team,
> >>>>
> >>>> Gentle reminder that the PR for the changelog planning (
> >>>> https://github.com/apache/iceberg/pull/10935) is still waiting for
> >>>> expert reviews.
> >>>>
> >>>> Thanks, Peter
> >>>>
> >>>> On Tue, Oct 1, 2024, 06:46 Yufei Gu <flyrain...@gmail.com> wrote:
> >>>>
> >>>>> Thanks, Peter and Wing Yew Poon, for tackling these! I’ve been eager
> >>>>> to review, but this week has been hectic. I plan to check out PR #10935
> >>>>> next week, though I’d be happy if someone beats me to it.
> >>>>>
> >>>>> Yufei
> >>>>>
> >>>>>
> >>>>> On Mon, Sep 30, 2024 at 3:02 AM Péter Váry <
> >>>>> peter.vary.apa...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Team,
> >>>>>>
> >>>>>> The Changelog scan Java API interfaces were created a long time ago
> >>>>>> by Anton, but they have not been implemented yet. There is a
> >>>>>> Spark-specific SQL implementation for the feature, but the feature is
> >>>>>> not available on the Java API.
> >>>>>>
> >>>>>> The Flink CDC streaming read is one of the most frequently requested
> >>>>>> features [1] [2]. Flink needs the Java API to provide streaming reads
> >>>>>> for tables with deletes.
> >>>>>>
> >>>>>> Wing Yew Poon implemented the Java API [3]. I did my best reviewing
> >>>>>> the PR, but I am not an expert on this part of the code. I would like
> >>>>>> to ask some of the planning experts (or anyone else, for that matter)
> >>>>>> to take a look and validate it too.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Peter
> >>>>>>
> >>>>>> [1] - https://github.com/apache/iceberg/issues/5623
> >>>>>> [2] -
> >>>>>> https://github.com/apache/iceberg/issues/5803#issuecomment-1259759074
> >>>>>> [3] - https://github.com/apache/iceberg/pull/10935
> >>>>>>
> >>>>>
> 
