Re: Re: FLIP-510: Drop ChangelogNormalize for operations which don't need it

Dawid Wysakowicz Fri, 28 Feb 2025 03:16:34 -0800

Hey Xuyang,
Ad. 1
Yes, you're right, but we already do that for determining if we need
UPDATE_BEFORE or not. FlinkChangelogModeInferenceProgram already deals with
that.
Ad. 2
Unfortunately it is. This is also the only reason I need a FLIP. We can
determine internally for every internal operator if we can work with
partial deletes or if we need full deletes. The only missing information is
if the external sink can consume deletes by key and if a source produces
full deletes or deletes by key. Unfortunately this is information that
comes from a connector implementation and thus needs to be provided via a
public API.
Ad. 3
With ChangelogMode#kinds -> to some degree yes. We theoretically could
split RowKind#DELETE to RowKind#DELETE_BY_KEY and RowKind#FULL_DELETE.
However, that change would 1) be much more involved 2) we would need to
encode that information in every single message, which I think is not
necessary. I don't think it has much to do with PK.
Ad.4
I don't think so. PK information is part of Schema not about the kind of
messages. We don't have PK information for UPDATE_BEFORE/UPDATE_AFTER and
they also apply per key. If the name containing `DELETE_BY_KEY` is
confusing I am happy to rename it to e.g. PARTIAL_DELETE, therefore I'd add
`supportsPartialDeletes`


Best,
Dawid

On Fri, 28 Feb 2025 at 04:43, Xuyang <[email protected]> wrote:

> Hi Dawid.
>
>
>
>
> Big +1 for this FLIP. After reading through it, I have a few questions and
> would appreciate your responses:
>
> 1. IIUC, we only need to provide additional information in the
> `FlinkChangelogModeInferenceProgram` to enable the
>
> inference program to determine whether it is safe to remove
> `ChangelogNormalize`. My first instinct is that we need to
>
> know if all subsequent output-side nodes consuming Upsert Keys include the
> Upsert Keys provided by the input-side operator (source).
>
> If this condition is met, we can safely eliminate `ChangelogNormalize`.
> Perhaps, I have missed some important points, so please feel
>
> free to correct me if necessary.
>
> 2. The introduction of `supportsDeleteByKey` in ChangelogMode seems to
> exist solely as auxiliary information for the
>
> `FlinkChangelogModeInferenceProgram`. If that's the case, it doesn't seem
> necessary to expose it in the public API, does it?
>
> 3. If the purpose of introducing `supportsDeleteByKey` in ChangelogMode is
> to facilitate support for `#fromChangelogStream`
>
> and `#toChangelogStream`, it appears that `supportsDeleteByKey` might
> overlap with ChangelogMode#kinds and Schema#PK
>
> to some extent, right?
>
> 4. Regarding supportsDeleteByKey, as part of a complete ChangelogMode
> entity, should we also store the specific key information?
>
>
>
>
>
>
>
> --
>
>     Best！
>     Xuyang
>
>
>
>
>
> 在 2025-02-28 04:27:19，"Martijn Visser" <[email protected]> 写道：
> >Hi Dawid,
> >
> >Thanks for the FLIP, looks like a good improvement for me that will bring
> a
> >lot of benefits. +1
> >
> >Best regards,
> >
> >Martijn
> >
> >On Tue, Feb 25, 2025 at 6:51 AM Sergey Nuyanzin <[email protected]>
> wrote:
> >
> >> +1 for such improvement
> >>
> >> On Mon, Feb 24, 2025 at 12:01 PM Dawid Wysakowicz
> >> <[email protected]> wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > I would like to initiate a discussion for the FLIP-510[1] below, which
> >> aims
> >> > on optimising certain use cases in SQL which at the moment add
> >> > ChangelogNormalize, but don't necessarily need to do it.
> >> >
> >> > Looking forward to hearing from you.
> >> >
> >> > [1] https://cwiki.apache.org/confluence/x/7o5EF
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Sergey
> >>
>

Re: Re: FLIP-510: Drop ChangelogNormalize for operations which don't need it

Reply via email to