[DISCUSS] - Ongoing development of Flink Statefun

2025-03-09 Thread frans.king
Hi all,

I wanted to ask around and see if anyone has opinions on the ongoing
development of Flink Statefun. I had a look at upgrading it to Flink 1.19,
which seems doable, but I also noticed at least one usage of a deprecated
API (DataSet, in the state bootstrapping).

I'd be happy to take on the uplift.

Thanks,
Frans

 

 



Re: FLIP-510: Drop ChangelogNormalize for operations which don't need it

2025-03-09 Thread Xuyang
I have no other questions. +1 for it.




--

Best!
Xuyang





At 2025-03-07 19:37:09, "Dawid Wysakowicz"  wrote:
>>
>> From my understanding, for a sink, if its schema includes a primary key,
>> we can assume it has
>> the ability to process delete messages (with '-D') and perform deletions
>> by key (PK). If it does not
>> include a PK, we would implicitly treat it as a log-structured table that
>> supports full row deletions.
>
>
>I am afraid this assumption goes too far. A PK is information about
>column uniqueness and that's it. It does not tell us what is required to
>perform a DELETE operation. I agree the assumption would most often hold,
>but I am afraid it is not guaranteed. E.g. in a log-based system one may
>just want to have full information encoded in the DELETE messages (e.g. in
>a Debezium message).
>
>The same holds for sources. Even though, theoretically, deletes could
>contain only the key information if there is a PK, the source may just as
>well produce DELETEs with all fields set.
>
>> Given that you mentioned `PARTIAL_DELETE`, should I interpret this as
>> referring to a scenario
>> similar to wide tables, where if the sink has a PK, some columns are
>> deleted (set to null or through other operations) while others remain
>> unchanged?
>
>
>No. The effect is the same: the row is deleted/disappears. The
>difference is what is required to perform the deletion. In some cases it
>may be enough to have the PK to perform the deletion, and then we don't need
>the information about other columns, but there may be systems that require
>all columns to be set.
>
>By the way, since the flag applies to both sources and sinks and tells what
>the expected format of DELETE records produced/consumed is, I renamed the
>flag in the FLIP:
>supportsDeleteByKey -> deletesByKeyOnly.
>
>Let me know if there are other questions. If there are none, I'd like to
>start a vote in the upcoming days.
>
>Best,
>Dawid
>
>
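As a rough illustration of the distinction drawn above (the effect of a DELETE is always that the row disappears; what differs is how much of the row a consumer needs in order to perform it), here is a minimal, self-contained Java sketch. All names (`DeleteFormatSketch`, `Delete`, `KeyedSink`, `LogSink`) are hypothetical stand-ins and not Flink APIs, and treating a null non-key field as "not set" is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are Flink classes.
final class DeleteFormatSketch {

    /** A DELETE record. Assumption: a null non-key field means "not set". */
    static final class Delete {
        final Integer id;   // primary key column
        final String name;  // non-key column, null in a key-only delete

        Delete(Integer id, String name) {
            this.id = id;
            this.name = name;
        }

        boolean isFull() {
            return name != null;
        }
    }

    /** Key-value style sink: the primary key alone is enough to delete a row. */
    static final class KeyedSink {
        final Map<Integer, String> rows = new LinkedHashMap<>();

        void upsert(int id, String name) {
            rows.put(id, name);
        }

        void delete(Delete d) {
            rows.remove(d.id); // non-key fields are ignored
        }
    }

    /** Log style sink: it forwards the delete and needs every column set. */
    static final class LogSink {
        final List<String> log = new ArrayList<>();

        void delete(Delete d) {
            if (!d.isFull()) {
                throw new IllegalArgumentException("full row required for DELETE");
            }
            log.add("-D(" + d.id + ", " + d.name + ")");
        }
    }
}
```

Both sinks may declare the same primary key, yet only `KeyedSink` can consume a key-only delete, which is why the schema alone does not answer the question.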
>On Mon, 3 Mar 2025 at 07:29, Xuyang  wrote:
>
>> Hi, Dawid.
>>
>> Thanks for your response. I believe I've identified a key point, but I'm a
>> bit unclear about the following statement. Could you please provide an
>> example for clarification?
>>
>> ```
>> The only missing information is if the external sink can consume deletes
>> by key and if a source produces full deletes or deletes by key.
>> ```
>>
>> From my understanding, for a sink, if its schema includes a primary key,
>> we can assume it has the ability to process delete messages (with '-D')
>> and perform deletions by key (PK). If it does not include a PK, we would
>> implicitly treat it as a log-structured table that supports full row
>> deletions.
>>
>> Given that you mentioned `PARTIAL_DELETE`, should I interpret this as
>> referring to a scenario similar to wide tables, where if the sink has a PK,
>> some columns are deleted (set to null or through other operations) while
>> others remain unchanged?
>>
>> Looking forward to your reply.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>>
>>
>>
>> At 2025-02-28 19:16:12, "Dawid Wysakowicz" wrote:
>> >Hey Xuyang,
>> >Ad. 1
>> >Yes, you're right, but we already do that for determining if we need
>> >UPDATE_BEFORE or not. FlinkChangelogModeInferenceProgram already deals
>> >with that.
>> >Ad. 2
>> >Unfortunately it is. This is also the only reason I need a FLIP. We can
>> >determine internally for every internal operator if we can work with
>> >partial deletes or if we need full deletes. The only missing information
>> >is if the external sink can consume deletes by key and if a source
>> >produces full deletes or deletes by key. Unfortunately this is
>> >information that comes from a connector implementation and thus needs to
>> >be provided via a public API.
>> >Ad. 3
>> >With ChangelogMode#kinds -> to some degree yes. We theoretically could
>> >split RowKind#DELETE into RowKind#DELETE_BY_KEY and RowKind#FULL_DELETE.
>> >However, that change would 1) be much more involved and 2) require us to
>> >encode that information in every single message, which I think is not
>> >necessary. I don't think it has much to do with PK.
>> >Ad. 4
>> >I don't think so. PK information is part of the Schema, not about the kind
>> >of messages. We don't have PK information for UPDATE_BEFORE/UPDATE_AFTER
>> >and they also apply per key. If the name containing `DELETE_BY_KEY` is
>> >confusing I am happy to rename it to e.g. PARTIAL_DELETE, in which case
>> >I'd add `supportsPartialDeletes`.
>> >
>> >Best,
>> >Dawid
>> >
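To make the trade-off in "Ad. 3" above more concrete, the following Java sketch contrasts the two encodings: splitting the delete kind so that every message carries the flavour, versus keeping a single DELETE kind and declaring the delete format once per changelog. Everything here is a hypothetical model for illustration (`ChangelogFlagSketch`, `SplitKind`, `Kind`, `Mode`, the `deletesCarryKeyOnly` flag, and the `mustRestoreFullDeletes` rule are assumptions); it is not Flink's `ChangelogMode` or `RowKind`, and it is not the planner's actual inference logic.

```java
import java.util.EnumSet;

// Hypothetical model for illustration only; none of these types are Flink classes.
final class ChangelogFlagSketch {

    // Option A from the thread: split DELETE, so every message carries the flavour.
    enum SplitKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE_BY_KEY, FULL_DELETE }

    // Option B: keep a single DELETE kind ...
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // ... and declare the delete format once per changelog instead of per message.
    static final class Mode {
        final EnumSet<Kind> kinds;
        final boolean deletesCarryKeyOnly; // hypothetical flag name

        Mode(EnumSet<Kind> kinds, boolean deletesCarryKeyOnly) {
            this.kinds = EnumSet.copyOf(kinds);
            this.deletesCarryKeyOnly = deletesCarryKeyOnly;
        }
    }

    // Assumed, simplified rule: full rows only need to be restored when the
    // producer emits key-only deletes and the consumer requires full deletes.
    static boolean mustRestoreFullDeletes(Mode producer, boolean consumerNeedsFullDeletes) {
        return producer.kinds.contains(Kind.DELETE)
                && producer.deletesCarryKeyOnly
                && consumerNeedsFullDeletes;
    }
}
```

With option B the decision can be made once from the declared mode, without inspecting individual messages.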
>> >On Fri, 28 Feb 2025 at 04:43, Xuyang  wrote:
>> >
>> >> Hi Dawid.
>> >>
>> >>
>> >>
>> >>
>> >> Big +1 for this FLIP. After reading through it, I have a few questions
>> >> and would appreciate your responses:
>> >>
>> >> 1. IIUC, we only need to provide additional information in the
>> >> `FlinkChangelogModeInferenceProgram` to enable the
>> >>
>> >> inference program to determine whether it is safe to remove
>> >> `ChangelogNorma

Re: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-09 Thread xiangyu feng
Hi devs,

All comments in the discussion thread[1] have been resolved. I would like
to proceed with the vote.

[1] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m

Regards,
Xiangyu Feng

On Mon, 10 Mar 2025 at 12:01, Leonard Xu wrote:

> +1 (binding)
>
> Best,
> Leonard
>
> > On 25 Feb 2025, at 10:12, weijie guo wrote:
> >
> > +1(binding)
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > On Sun, 23 Feb 2025 at 16:36, Zhanghao Chen wrote:
> >
> >> +1 (non-binding)
> >>
> >> Thanks for driving this. It's a nice usability improvement for
> >> performing partial updates on data lakes.
> >>
> >>
> >> Best,
> >> Zhanghao Chen
> >> 
> >> From: xiangyu feng 
> >> Sent: Sunday, February 23, 2025 10:44
> >> To: dev@flink.apache.org 
> >> Subject: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner
> >>
> >> Hi all,
> >>
> >> I would like to start the vote for FLIP-506: Support Reuse Multiple
> >> Table Sinks in Planner[1].
> >> This FLIP was discussed in this thread [2].
> >>
> >> The vote will be open for at least 72 hours unless there is an
> >> objection or insufficient votes.
> >>
> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> >> [2] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> >>
> >> Regards,
> >> Xiangyu Feng
> >>
>
>


Re: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-09 Thread Lincoln Lee
+1 (binding)


Best,
Lincoln Lee


On Mon, 10 Mar 2025 at 13:33, xiangyu feng wrote:

> Hi devs,
>
> All comments in the discussion thread[1] have been resolved. I would like
> to proceed with the vote.
>
> [1] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
>
> Regards,
> Xiangyu Feng
>
> On Mon, 10 Mar 2025 at 12:01, Leonard Xu wrote:
>
> > +1 (binding)
> >
> > Best,
> > Leonard
> >
> > > On 25 Feb 2025, at 10:12, weijie guo wrote:
> > >
> > > +1(binding)
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > On Sun, 23 Feb 2025 at 16:36, Zhanghao Chen wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Thanks for driving this. It's a nice usability improvement for
> > >> performing partial updates on data lakes.
> > >>
> > >>
> > >> Best,
> > >> Zhanghao Chen
> > >> 
> > >> From: xiangyu feng 
> > >> Sent: Sunday, February 23, 2025 10:44
> > >> To: dev@flink.apache.org 
> > >> Subject: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner
> > >>
> > >> Hi all,
> > >>
> > >> I would like to start the vote for FLIP-506: Support Reuse Multiple
> > >> Table Sinks in Planner[1].
> > >> This FLIP was discussed in this thread [2].
> > >>
> > >> The vote will be open for at least 72 hours unless there is an
> > >> objection or insufficient votes.
> > >>
> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > >> [2] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > >>
> > >> Regards,
> > >> Xiangyu Feng
> > >>
> >
> >
>


Re: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-09 Thread Leonard Xu
+1 (binding)

Best,
Leonard

> On 25 Feb 2025, at 10:12, weijie guo wrote:
> 
> +1(binding)
> 
> Best regards,
> 
> Weijie
> 
> 
> On Sun, 23 Feb 2025 at 16:36, Zhanghao Chen wrote:
>
>> +1 (non-binding)
>>
>> Thanks for driving this. It's a nice usability improvement for performing
>> partial updates on data lakes.
>> 
>> 
>> Best,
>> Zhanghao Chen
>> 
>> From: xiangyu feng 
>> Sent: Sunday, February 23, 2025 10:44
>> To: dev@flink.apache.org 
>> Subject: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner
>> 
>> Hi all,
>> 
>> I would like to start the vote for FLIP-506: Support Reuse Multiple Table
>> Sinks in Planner[1].
>> This FLIP was discussed in this thread [2].
>> 
>> The vote will be open for at least 72 hours unless there is an objection or
>> insufficient votes.
>> 
>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>> [2] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
>> 
>> Regards,
>> Xiangyu Feng
>>