I have no other questions. +1 for it.
--

Best!
Xuyang


At 2025-03-07 19:37:09, "Dawid Wysakowicz" <dwysakow...@apache.org> wrote:
>> From my understanding, for a sink, if its schema includes a primary key,
>> we can assume it has the ability to process delete messages (with '-D')
>> and perform deletions by key (PK). If it does not include a PK, we would
>> implicitly treat it as a log-structured table that supports full row
>> deletions.
>
>I am afraid this assumption goes too far. PK is information about column
>uniqueness and that's it. It does not tell us what is required to perform
>a DELETE operation. I agree the assumption would most often hold, but I am
>afraid it is not guaranteed. E.g. in log-based systems one may just want
>to have the full information encoded in the DELETE messages (e.g. in a
>Debezium message).
>
>The same holds for sources. Even though, theoretically, if there is a PK,
>deletes could contain only the key information, the source may just as
>well produce DELETEs with all fields set.
>
>> Given that you mentioned `PARTIAL_DELETE`, should I interpret this as
>> referring to a scenario similar to wide tables, where if the sink has a
>> PK, some columns are deleted (set to null or through other operations)
>> while others remain unchanged?
>
>No. The effect is the same: the ROW is deleted/disappears. The difference
>is what is required to perform the deletion. In some cases it may be
>enough to have the PK to perform the deletion, and then we don't need the
>information about the other columns, but there may be systems that require
>all columns to be set.
>
>By the way, since the flag applies to both sources and sinks to tell what
>the expected format of produced/consumed DELETE records is, I renamed the
>flag in the FLIP:
>supportsDeleteByKey -> deletesByKeyOnly.
>
>Let me know if there are other questions. If there are none, I'd like to
>start a vote in the upcoming days.
>
>Best,
>Dawid
>
>
>On Mon, 3 Mar 2025 at 07:29, Xuyang <xyzhong...@163.com> wrote:
>
>> Hi, Dawid.
>>
>> Thanks for your response. I believe I've identified a key point, but I'm
>> a bit unclear about the following statement of yours. Could you please
>> provide an example for clarification?
>>
>> ```
>> The only missing information is if the external sink can consume deletes
>> by key and if a source produces full deletes or deletes by key.
>> ```
>>
>> From my understanding, for a sink, if its schema includes a primary key,
>> we can assume it has the ability to process delete messages (with '-D')
>> and perform deletions by key (PK). If it does not include a PK, we would
>> implicitly treat it as a log-structured table that supports full row
>> deletions.
>>
>> Given that you mentioned `PARTIAL_DELETE`, should I interpret this as
>> referring to a scenario similar to wide tables, where if the sink has a
>> PK, some columns are deleted (set to null or through other operations)
>> while others remain unchanged?
>>
>> Looking forward to your reply.
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>> At 2025-02-28 19:16:12, "Dawid Wysakowicz" <wysakowicz.da...@gmail.com>
>> wrote:
>> >Hey Xuyang,
>> >
>> >Ad. 1
>> >Yes, you're right, but we already do that for determining if we need
>> >UPDATE_BEFORE or not. FlinkChangelogModeInferenceProgram already deals
>> >with that.
>> >
>> >Ad. 2
>> >Unfortunately it is. This is also the only reason I need a FLIP. We can
>> >determine internally for every internal operator if we can work with
>> >partial deletes or if we need full deletes.
>> >The only missing information is if the external sink can consume deletes
>> >by key and if a source produces full deletes or deletes by key.
>> >Unfortunately this is information that comes from a connector
>> >implementation and thus needs to be provided via a public API.
>> >
>> >Ad. 3
>> >With ChangelogMode#kinds -> to some degree, yes. We theoretically could
>> >split RowKind#DELETE into RowKind#DELETE_BY_KEY and RowKind#FULL_DELETE.
>> >However, that change would 1) be much more involved and 2) require us to
>> >encode that information in every single message, which I think is not
>> >necessary. I don't think it has much to do with PK.
>> >
>> >Ad. 4
>> >I don't think so. PK information is part of the Schema, not of the kind
>> >of messages. We don't have PK information for UPDATE_BEFORE/UPDATE_AFTER
>> >and they also apply per key. If the name containing `DELETE_BY_KEY` is
>> >confusing I am happy to rename it to e.g. PARTIAL_DELETE, therefore I'd
>> >add `supportsPartialDeletes`.
>> >
>> >Best,
>> >Dawid
>> >
>> >On Fri, 28 Feb 2025 at 04:43, Xuyang <xyzhong...@163.com> wrote:
>> >
>> >> Hi Dawid.
>> >>
>> >> Big +1 for this FLIP. After reading through it, I have a few questions
>> >> and would appreciate your responses:
>> >>
>> >> 1. IIUC, we only need to provide additional information in the
>> >> `FlinkChangelogModeInferenceProgram` to enable the inference program
>> >> to determine whether it is safe to remove `ChangelogNormalize`. My
>> >> first instinct is that we need to know whether all subsequent
>> >> output-side nodes consuming Upsert Keys include the Upsert Keys
>> >> provided by the input-side operator (source). If this condition is
>> >> met, we can safely eliminate `ChangelogNormalize`. Perhaps I have
>> >> missed some important points, so please feel free to correct me if
>> >> necessary.
>> >>
>> >> 2. The introduction of `supportsDeleteByKey` in ChangelogMode seems to
>> >> exist solely as auxiliary information for the
>> >> `FlinkChangelogModeInferenceProgram`. If that's the case, it doesn't
>> >> seem necessary to expose it in the public API, does it?
>> >>
>> >> 3. If the purpose of introducing `supportsDeleteByKey` in
>> >> ChangelogMode is to facilitate support for `#fromChangelogStream` and
>> >> `#toChangelogStream`, it appears that `supportsDeleteByKey` might
>> >> overlap with ChangelogMode#kinds and Schema#PK to some extent, right?
>> >>
>> >> 4. Regarding `supportsDeleteByKey`, as part of a complete
>> >> ChangelogMode entity, should we also store the specific key
>> >> information?
>> >>
>> >> --
>> >>
>> >> Best!
>> >> Xuyang
>> >>
>> >>
>> >> At 2025-02-28 04:27:19, "Martijn Visser" <martijnvis...@apache.org>
>> >> wrote:
>> >> >Hi Dawid,
>> >> >
>> >> >Thanks for the FLIP, looks like a good improvement to me that will
>> >> >bring a lot of benefits. +1
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >On Tue, Feb 25, 2025 at 6:51 AM Sergey Nuyanzin <snuyan...@gmail.com>
>> >> >wrote:
>> >> >
>> >> >> +1 for such improvement
>> >> >>
>> >> >> On Mon, Feb 24, 2025 at 12:01 PM Dawid Wysakowicz
>> >> >> <wysakowicz.da...@gmail.com> wrote:
>> >> >> >
>> >> >> > Hi everyone,
>> >> >> >
>> >> >> > I would like to initiate a discussion for FLIP-510[1] below,
>> >> >> > which aims at optimising certain use cases in SQL which at the
>> >> >> > moment add ChangelogNormalize, but don't necessarily need to do
>> >> >> > it.
>> >> >> >
>> >> >> > Looking forward to hearing from you.
>> >> >> >
>> >> >> > [1] https://cwiki.apache.org/confluence/x/7o5EF
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Best regards,
>> >> >> Sergey
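
To make the discussion above concrete, here is a minimal Java sketch of how a connector could describe its DELETE semantics through ChangelogMode. ChangelogMode, its builder, addContainedKind(...) and RowKind are existing Flink Table API classes; the deletesByKeyOnly(...) builder method is only an assumption that mirrors the renamed flag mentioned in the thread and may differ from the final FLIP-510 API.

```java
// Sketch only: deletesByKeyOnly(...) is hypothetical, named after the flag
// discussed in this thread; the final FLIP-510 API may look different.
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.types.RowKind;

public final class ChangelogModeSketch {

    /** A source/sink whose DELETE records carry only the (primary) key columns. */
    static ChangelogMode upsertWithKeyOnlyDeletes() {
        return ChangelogMode.newBuilder()
                .addContainedKind(RowKind.INSERT)
                .addContainedKind(RowKind.UPDATE_AFTER)
                .addContainedKind(RowKind.DELETE)
                .deletesByKeyOnly(true) // hypothetical builder method
                .build();
    }

    /**
     * A source/sink that produces or requires full rows in DELETE records,
     * e.g. a log-based system emitting complete Debezium-style messages.
     */
    static ChangelogMode upsertWithFullDeletes() {
        return ChangelogMode.newBuilder()
                .addContainedKind(RowKind.INSERT)
                .addContainedKind(RowKind.UPDATE_AFTER)
                .addContainedKind(RowKind.DELETE)
                .build(); // without the flag, DELETEs are expected to carry all fields
    }
}
```

With such a hint, the planner could, as discussed in the thread, skip ChangelogNormalize when the downstream operators can work with key-only deletes, instead of materializing full rows just to emit full DELETE records.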