Hey Xuyang,

Ad. 1 Yes, you're right, but we already do that when determining whether we need UPDATE_BEFORE or not. FlinkChangelogModeInferenceProgram already deals with that.

Ad. 2 Unfortunately it is necessary. This is also the only reason I need a FLIP. We can determine internally for every internal operator whether it can work with partial deletes or needs full deletes. The only missing information is whether the external sink can consume deletes by key, and whether a source produces full deletes or deletes by key. Unfortunately this information comes from the connector implementation and thus needs to be provided via a public API.

Ad. 3 With ChangelogMode#kinds, to some degree yes. We theoretically could split RowKind#DELETE into RowKind#DELETE_BY_KEY and RowKind#FULL_DELETE. However, that change would 1) be much more involved and 2) require us to encode that information in every single message, which I think is not necessary. I don't think it has much to do with the PK.

Ad. 4 I don't think so. PK information is part of the Schema, not of the kind of messages. We don't have PK information for UPDATE_BEFORE/UPDATE_AFTER either, and they also apply per key. If a name containing `DELETE_BY_KEY` is confusing, I am happy to rename it to e.g. PARTIAL_DELETE, in which case I'd add `supportsPartialDeletes`.
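To make the difference between a full delete and a delete by key concrete, here is a minimal sketch in plain Java. None of these types are Flink classes: `PartialDeleteSketch`, `Message`, `applyByKey`, `canConsumeAsFullDeletes` and `needsNormalization` are hypothetical names used only for illustration, and the rule in `needsNormalization` is a deliberate simplification of what the planner would really have to check.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; none of these types are Flink classes.
final class PartialDeleteSketch {

    enum Kind { INSERT, DELETE }

    // One changelog message for a table (id PRIMARY KEY, name).
    // A delete by key carries only the key, so name is null in that case.
    static final class Message {
        final Kind kind;
        final int id;
        final String name;

        Message(Kind kind, int id, String name) {
            this.kind = kind;
            this.id = id;
            this.name = name;
        }

        boolean isFullRow() {
            return name != null;
        }
    }

    // A consumer that applies changes per key: a DELETE only needs the key.
    static Map<Integer, String> applyByKey(List<Message> changelog) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (Message m : changelog) {
            if (m.kind == Kind.INSERT) {
                table.put(m.id, m.name);
            } else {
                table.remove(m.id);
            }
        }
        return table;
    }

    // A consumer that retracts whole rows: every DELETE must carry all columns.
    static boolean canConsumeAsFullDeletes(List<Message> changelog) {
        for (Message m : changelog) {
            if (m.kind == Kind.DELETE && !m.isFullRow()) {
                return false;
            }
        }
        return true;
    }

    // Simplified assumption: normalization is only needed when the source emits
    // deletes by key and the consumer cannot accept them.
    static boolean needsNormalization(boolean sourceEmitsDeleteByKey,
                                      boolean sinkAcceptsDeleteByKey) {
        return sourceEmitsDeleteByKey && !sinkAcceptsDeleteByKey;
    }

    public static void main(String[] args) {
        List<Message> changelog = Arrays.asList(
                new Message(Kind.INSERT, 1, "a"),
                new Message(Kind.INSERT, 2, "b"),
                new Message(Kind.DELETE, 1, null)); // delete by key: no "name"

        System.out.println(applyByKey(changelog));
        System.out.println(canConsumeAsFullDeletes(changelog));
        System.out.println(needsNormalization(true, false));
    }
}
```

The per-message `isFullRow` check is only there to show why the distinction matters. In the proposal the capability is declared once per connector rather than encoded in every message, which corresponds to the two boolean arguments of `needsNormalization`.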

Best,
Dawid

On Fri, 28 Feb 2025 at 04:43, Xuyang <xyzhong...@163.com> wrote:

> Hi Dawid.
>
> Big +1 for this FLIP. After reading through it, I have a few questions and
> would appreciate your responses:
>
> 1. IIUC, we only need to provide additional information in the
> `FlinkChangelogModeInferenceProgram` to enable the inference program to
> determine whether it is safe to remove `ChangelogNormalize`. My first
> instinct is that we need to know if all subsequent output-side nodes
> consuming Upsert Keys include the Upsert Keys provided by the input-side
> operator (source). If this condition is met, we can safely eliminate
> `ChangelogNormalize`. Perhaps I have missed some important points, so
> please feel free to correct me if necessary.
>
> 2. The introduction of `supportsDeleteByKey` in ChangelogMode seems to
> exist solely as auxiliary information for the
> `FlinkChangelogModeInferenceProgram`. If that's the case, it doesn't seem
> necessary to expose it in the public API, does it?
>
> 3. If the purpose of introducing `supportsDeleteByKey` in ChangelogMode is
> to facilitate support for `#fromChangelogStream` and `#toChangelogStream`,
> it appears that `supportsDeleteByKey` might overlap with
> ChangelogMode#kinds and Schema#PK to some extent, right?
>
> 4. Regarding supportsDeleteByKey, as part of a complete ChangelogMode
> entity, should we also store the specific key information?
>
> --
> Best!
> Xuyang
>
> At 2025-02-28 04:27:19, "Martijn Visser" <martijnvis...@apache.org> wrote:
> >Hi Dawid,
> >
> >Thanks for the FLIP, looks like a good improvement to me that will bring a
> >lot of benefits. +1
> >
> >Best regards,
> >
> >Martijn
> >
> >On Tue, Feb 25, 2025 at 6:51 AM Sergey Nuyanzin <snuyan...@gmail.com> wrote:
> >
> >> +1 for such improvement
> >>
> >> On Mon, Feb 24, 2025 at 12:01 PM Dawid Wysakowicz
> >> <wysakowicz.da...@gmail.com> wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > I would like to initiate a discussion for the FLIP-510[1] below, which
> >> > aims at optimising certain use cases in SQL which at the moment add
> >> > ChangelogNormalize, but don't necessarily need to do it.
> >> >
> >> > Looking forward to hearing from you.
> >> >
> >> > [1] https://cwiki.apache.org/confluence/x/7o5EF
> >>
> >> --
> >> Best regards,
> >> Sergey