[jira] [Created] (FLINK-37451) Document correction - 1.10.0

2025-03-10 Thread Shalini M (Jira)
Shalini M created FLINK-37451:
-

 Summary: Document correction - 1.10.0
 Key: FLINK-37451
 URL: https://issues.apache.org/jira/browse/FLINK-37451
 Project: Flink
  Issue Type: Improvement
Reporter: Shalini M








[VOTE] FLIP-509 Pluggable Batching For Async Sink

2025-03-10 Thread Poorvank Bhatia
Hi all,

I would like to start a vote on FLIP-509: Add Pluggable Batching for Async Sink.
This FLIP introduces customizable, pluggable components for batching and
buffering, giving sink implementers better flexibility.

The discussion thread can be found here.
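
To make the idea concrete, here is a purely illustrative sketch of what a pluggable
batch-formation strategy for the Async Sink could look like. The interface and class
names below are assumptions for illustration only, not the interfaces proposed in the
FLIP:

import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only -- NOT the interfaces proposed in FLIP-509.
// Idea: let sink implementers plug in the logic that decides when buffered
// request entries form a batch, instead of relying solely on fixed
// size/byte/time triggers.
interface BatchFormationStrategy<RequestEntryT> {
    // True when the buffered entries should be flushed as one batch.
    boolean shouldFlush(Queue<RequestEntryT> buffer, long bufferedBytes);

    // Drains the next batch from the buffer.
    List<RequestEntryT> nextBatch(Queue<RequestEntryT> buffer);
}

// A simple count-based strategy: flush whenever N entries are buffered.
final class CountBasedBatching<T> implements BatchFormationStrategy<T> {
    private final int maxBatchSize;

    CountBasedBatching(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    @Override
    public boolean shouldFlush(Queue<T> buffer, long bufferedBytes) {
        return buffer.size() >= maxBatchSize;
    }

    @Override
    public List<T> nextBatch(Queue<T> buffer) {
        List<T> batch = new ArrayList<>(maxBatchSize);
        while (!buffer.isEmpty() && batch.size() < maxBatchSize) {
            batch.add(buffer.poll());
        }
        return batch;
    }
}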


[jira] [Created] (FLINK-37449) Postgres don't commit lsn when taskmanager failover

2025-03-10 Thread Xin Gong (Jira)
Xin Gong created FLINK-37449:


 Summary: Postgres don't commit lsn when taskmanager failover
 Key: FLINK-37449
 URL: https://issues.apache.org/jira/browse/FLINK-37449
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.2.1, cdc-3.3.0, cdc-3.1.1, cdc-3.2.0, cdc-3.1.0
Reporter: Xin Gong
 Fix For: cdc-3.3.1


Postgres CDC doesn't commit the LSN when the task has entered the incremental phase and 
a failover happens. As a result, the WAL data can't be cleaned up.

Because PostgresSourceEnumerator#receiveOffsetCommitAck is already true when the task 
fails over, PostgresSourceReader#isCommitOffset always returns false afterwards, so the 
offset commit in notifyCheckpointComplete is never executed.
{code:java}
@Override
public void notifyCheckpointComplete(long checkpointId) throws Exception {
    this.minHeap.add(checkpointId);
    if (this.minHeap.size() <= this.lsnCommitCheckpointsDelay) {
        LOG.info("Pending checkpoints '{}'.", this.minHeap);
        return;
    }
    final long checkpointIdToCommit = this.minHeap.poll();
    LOG.info(
            "Pending checkpoints '{}', to be committed checkpoint id '{}'.",
            this.minHeap,
            checkpointIdToCommit);

    // After all snapshot splits are finished, update stream split's metadata and reset start
    // offset, which maybe smaller than before.
    // In case that new start-offset of stream split has been recycled, don't commit offset
    // during new table added phase.
    if (isCommitOffset()) {
        super.notifyCheckpointComplete(checkpointIdToCommit);
    }
} {code}
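For illustration, below is a self-contained sketch (not the connector's actual code) of 
the delayed-commit pattern shown above. It shows why a permanently false isCommitOffset() 
means the polled checkpoint's LSN is silently dropped, so the replication slot never 
advances and WAL accumulates on the Postgres server.
{code:java}
import java.util.PriorityQueue;

// Sketch only: completed checkpoint ids go into a min-heap, and only once more than
// lsnCommitCheckpointsDelay checkpoints are pending is the oldest one committed.
public class DelayedLsnCommitSketch {
    private final PriorityQueue<Long> minHeap = new PriorityQueue<>();
    private final int lsnCommitCheckpointsDelay;
    private boolean commitOffsetEnabled = true; // stands in for isCommitOffset()

    public DelayedLsnCommitSketch(int lsnCommitCheckpointsDelay) {
        this.lsnCommitCheckpointsDelay = lsnCommitCheckpointsDelay;
    }

    public void notifyCheckpointComplete(long checkpointId) {
        minHeap.add(checkpointId);
        if (minHeap.size() <= lsnCommitCheckpointsDelay) {
            return; // not enough completed checkpoints yet
        }
        long checkpointIdToCommit = minHeap.poll();
        if (commitOffsetEnabled) {
            System.out.println("Committing LSN for checkpoint " + checkpointIdToCommit);
        }
        // else: this checkpoint's LSN is dropped -- the failure mode described above
    }

    public static void main(String[] args) {
        DelayedLsnCommitSketch sketch = new DelayedLsnCommitSketch(2);
        for (long id = 1; id <= 5; id++) {
            sketch.notifyCheckpointComplete(id); // first commit happens at checkpoint 3
        }
    }
} {code}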





Re: [VOTE] FLIP-507: Add Model DDL methods in TABLE API

2025-03-10 Thread Leonard Xu
Sorry for jumping into the thread late, but I think the current status of this FLIP is 
not ready, at least for me.

(1) Could you finish your proposed API according to the Flink API bylaws? For example, 
the code piece should be: 
Builder {
    SELF option(String key, String value);
    SELF setComment(@Nullable String comment);
=> 
/** Builder for {@link ModelDescriptor}. **/
@PublicEvolving
Builder {
    /** Defines the option of {@link ModelDescriptor}. **/
    SELF option(String key, String value);
    /** Defines the comment of {@link ModelDescriptor}. **/
    SELF setComment(@Nullable String comment);
(2) TableEnvironment is a public API, so any changes (such as adding public 
methods in this case) must be clearly documented. You may refer to [1] as an 
example. In the [Public Interfaces] section of this FLIP, only TableEnvironment 
is listed. However, the subsequent [Proposed Changes] section appears to 
conflate TableEnvironment with ModelDescriptor. Clarifications are needed:
Which package should ModelDescriptor belong to?
Is ModelDescriptor intended to be an inner class of TableEnvironment?

Lastly, this is a useful FLIP and I generally agree with the motivation and 
design, but it is not clear and standardized enough.
I will not directly -1 to cancel the existing voting process, but I hope to 
continue voting after the above comments (1) and (2) are addressed. WDYT?

Best,
Leonard
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760466




> On Mar 11, 2025, at 01:19, Yash Anand  wrote:
> 
> Hi Timo,
> 
> Thanks for pointing that out. I have added the full API of
> the ModelDescriptor in the FLIP.
> 
> Thanks,
> Yash Anand
> 
> 
> On Mon, Mar 10, 2025 at 11:27 AM Timo Walther  wrote:
> 
>> Hi Yash,
>> 
>> could you provide the full API of the ModelDescriptor in the FLIP?
>> 
>> Thanks,
>> Timo
>> 
>> 
>> On 10.03.25 16:55, Mingge Deng wrote:
>>> Thanks Yash!
>>> 
>>> +1 (binding)
>>> 
>>> Best,
>>> Mingge
>>> 
>>> 
>>> On Mon, Mar 10, 2025 at 8:51 AM Dawid Wysakowicz >> 
>>> wrote:
>>> 
 +1 (binding)
 Best,
 Dawid
 
 On Wed, 19 Feb 2025 at 18:37, Hao Li  wrote:
 
> +1 (non-binding)
> 
> Thanks Yash,
> Hao
> 
> On Tue, Feb 18, 2025 at 10:46 AM Yash Anand
>>  
> wrote:
> 
>> Hi Everyone,
>> 
>> I'd like to start a vote on FLIP-507: Add Model DDL methods in TABLE
 API
>> [1] which has been discussed in this thread [2].
>> 
>> The vote will be open for at least 72 hours unless there is an
 objection
> or
>> not enough votes.
>> 
>> [1]
>> 
>> 
> 
 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API
>> [2] https://lists.apache.org/thread/w9dt6y1w0yns5j3g4685tstjdg5flvy9
>> 
> 
 
>>> 
>> 
>> 



Re: Re: Re: Re: [DISCUSS] Early Fire Support for Flink SQL Interval Join

2025-03-10 Thread Weiqing Yang
Hi Becket,

Thanks for the feedback! I’ve updated the design doc with a new
“Granularity of the Early Fire Hint” section: link.

Here’s a quick summary:

   - EARLY_FIRE is scoped to a single operator. If multiple operators in
   one query can support early firing, Flink applies the hint to the first
   matching operator and logs a warning to highlight the ambiguity. Subsequent
   operators will not receive the hint.
   - Split queries for different early-fire settings. If you need separate
   early-fire configurations - or just want to avoid ambiguity - splitting the
   SQL into multiple statements or views ensures each operator has its own
   distinct EARLY_FIRE hint (see the sketch below).
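
To illustrate the second point, here is a minimal sketch of splitting one query into
per-join views, each carrying its own hint. It assumes the Orders/Shipments/Payments
tables are already registered, and EARLY_FIRE is the hint proposed in this FLIP (it does
not exist in released Flink):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EarlyFireSplitExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Each view contains exactly one interval join, so each EARLY_FIRE hint
        // (as proposed in this FLIP) unambiguously targets a single operator.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW orders_with_shipments AS "
                        + "SELECT /*+ EARLY_FIRE('delay'='5s', 'time_mode'='rowtime') */ o.id, s.ship_time "
                        + "FROM Orders o LEFT JOIN Shipments s ON o.id = s.order_id "
                        + "AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time");

        tEnv.executeSql(
                "CREATE TEMPORARY VIEW orders_with_payments AS "
                        + "SELECT /*+ EARLY_FIRE('delay'='30s', 'time_mode'='rowtime') */ o.id, p.pay_time "
                        + "FROM Orders o LEFT JOIN Payments p ON o.id = p.order_id "
                        + "AND o.order_time BETWEEN p.pay_time - INTERVAL '1' HOUR AND p.pay_time");
    }
}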

Please let me know if you have any other questions or suggestions.

Best,
Weiqing

On Fri, Feb 28, 2025 at 5:01 PM Becket Qin  wrote:

> Thanks for updating the FLIP, Weiqing.
>
> Another question, what is the granularity of the early fire hint? Will the
> hint be applied to all the outer joins in the same block? What if in the
> query block there are also aggregations? For example:
>
> SELECT window_start, window_end, buyer, count(*) /*+
> EARLY_FIRE('delay'='5s'
> , 'time_mode'='rowtime|proctime') */
> FROM Orders o LEFT JOIN Shipments s
> WHERE o.id = s.order_id
> AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time
> GROUP BY o.window_start, o.window_end, buyer;
>
> What will be the behavior? Will both the aggregation and interval join? Or
> will it only be applied to the join?
> It would be useful to define the behavior here.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Feb 27, 2025 at 2:31 PM Weiqing Yang 
> wrote:
>
> > Thanks for the feedback, Becket. I’ve heard similar concerns from other
> > reviewers, and I now agree that the `interval` option is likely not
> widely
> > used. In the FLIP, it was intended specifically for Interval Join, not
> as a
> > general early-fire mechanism. We can consider making it more generic in a
> > future update. Since it’s optional, I’ll go ahead and remove it for
> > clarity.
> >
> >  Appreciate your input!
> >
> > Best,
> > Weiqing
> >
> > On Wed, Feb 19, 2025 at 9:46 AM Becket Qin  wrote:
> >
> > > Sorry for the late reply. I have a question regarding the configuration
> > of
> > > interval. I am wondering in which scenario the interval config will
> > > actually be used. Is the interval configuration only useful for
> windowed
> > > aggregation?
> > >
> > > For the join operations, early fire seems only applicable to the outer
> > join
> > > before there is a match. Once there is a match, the output of the join
> > > operation should be just event driven. For example, consider the
> > following
> > > scenario:
> > > 1. left outer join, and there is no match from the right side before
> the
> > > initial delay
> > > 2. a record [left, null] was emitted due to early fire
> > > 3. one of the following two cases must happen
> > > a. there is still no match event from the right side before the
> > > specific interval passes. In this case, we are not supposed to emit
> > another
> > > [left, null].
> > > b. if there is a match, a record of [left, right] should be emitted
> > > immediately regardless of the interval. If this happens, the interval
> > will
> > > be ignored from this point on.
> > >
> > > That said, if the configuration proposed in this FLIP is intended not
> > only
> > > for join, but for general purpose early fire, then the interval config
> > > makes sense.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Jan 27, 2025 at 1:56 PM Weiqing Yang  >
> > > wrote:
> > >
> > > > Thank you all for reviewing. As there are no objections, I’ll move
> > > forward
> > > > with a vote.
> > > >
> > > > Best regards,
> > > > Weiqing
> > > >
> > > > On Mon, Jan 27, 2025 at 9:30 AM Venkatakrishnan Sowrirajan <
> > > > vsowr...@asu.edu>
> > > > wrote:
> > > >
> > > > > Same here, I don't have any more questions. Thanks!
> > > > >
> > > > > On Sat, Jan 25, 2025, 10:10 PM Xingcan Cui 
> > wrote:
> > > > >
> > > > > > Hi Weiqing,
> > > > > >
> > > > > > I don't have any more questions. The doc looks good to me.
> > > > > >
> > > > > > Thanks,
> > > > > > Xingcan
> > > > > >
> > > > > > On Wed, Jan 22, 2025 at 8:46 PM Venkatakrishnan Sowrirajan <
> > > > > > vsowr...@asu.edu>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Weiqing,
> > > > > > >
> > > > > > > Thanks, that makes sense! Looks like I missed it.
> > > > > > >
> > > > > > > Regards
> > > > > > > Venkata krishnan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 21, 2025 at 10:55 PM Weiqing Yang <
> > > > > yangweiqing...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Venkata,
> > > > > > > >
> > > > > > > > * > where only one earlyFire is fired The DELAY *
> > > > > > > >
> > > > > > > > The DELAY option mentioned in the Public Interfaces section
> > 

[jira] [Created] (FLINK-37450) Document correction

2025-03-10 Thread Shalini M (Jira)
Shalini M created FLINK-37450:
-

 Summary: Document correction
 Key: FLINK-37450
 URL: https://issues.apache.org/jira/browse/FLINK-37450
 Project: Flink
  Issue Type: Improvement
Reporter: Shalini M








[jira] [Created] (FLINK-37452) Add support for writing to Paimon Append only table.

2025-03-10 Thread Yanquan Lv (Jira)
Yanquan Lv created FLINK-37452:
--

 Summary: Add support for writing to Paimon Append only table.
 Key: FLINK-37452
 URL: https://issues.apache.org/jira/browse/FLINK-37452
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.4.0
Reporter: Yanquan Lv
 Fix For: cdc-3.4.0


[Append table|https://paimon.apache.org/docs/0.8/append-table/append-table] is 
a common table type in Paimon; however, we don't clearly support and test this 
type as a pipeline data sink.

We should add support for this kind of table as a sink.





Re: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-10 Thread xiangyu feng
Thank you all for the votes! I will close the voting thread and summarize
the result in a separate email.

Regards,
Xiangyu Feng

On Mon, Mar 10, 2025 at 16:56, Rui Fan <1996fan...@gmail.com> wrote:

> +1(binding)
>
> Best,
> Rui
>
> On Mon, Mar 10, 2025 at 4:07 PM Ron Liu  wrote:
>
> > +1(binding)
> >
> > Best,
> > Ron
> >
> > On Mon, Mar 10, 2025 at 15:44, Jingsong Li  wrote:
> >
> > > +1
> > >
> > > On Mon, Mar 10, 2025 at 1:56 PM Lincoln Lee 
> > > wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > On Mon, Mar 10, 2025 at 13:33, xiangyu feng  wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > All comments in the discussion thread[1] have been resolved. I
> would
> > > like
> > > > > to proceed this voting process.
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > > > >
> > > > > Regards,
> > > > > Xiangyu Feng
> > > > >
> > > > > On Mon, Mar 10, 2025 at 12:01, Leonard Xu  wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Leonard
> > > > > >
> > > > > > > On Feb 25, 2025, at 10:12, weijie guo  wrote:
> > > > > > >
> > > > > > > +1(binding)
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Weijie
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Feb 23, 2025 at 16:36, Zhanghao Chen  wrote:
> > > > > > >
> > > > > > >> +1 (non-binding)
> > > > > > >>
> > > > > > >> Thanks for driving this. It's a nice useability improvement
> for
> > > > > > performing
> > > > > > >> partial-updates on datalakes.
> > > > > > >>
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Zhanghao Chen
> > > > > > >> 
> > > > > > >> From: xiangyu feng 
> > > > > > >> Sent: Sunday, February 23, 2025 10:44
> > > > > > >> To: dev@flink.apache.org 
> > > > > > >> Subject: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks
> in
> > > > > Planner
> > > > > > >>
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> I would like to start the vote for FLIP-506: Support Reuse
> > > Multiple
> > > > > > Table
> > > > > > >> Sinks in Planner[1].
> > > > > > >> This FLIP was discussed in this thread [2].
> > > > > > >>
> > > > > > >> The vote will be open for at least 72 hours unless there is an
> > > > > > objection or
> > > > > > >> insufficient votes.
> > > > > > >>
> > > > > > >> [1]
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > > > > >> [2]
> > > https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Xiangyu Feng
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > >
> >
>


[RESULT][VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-10 Thread xiangyu feng
Hi all,

Thanks for your review and the votes!

I am happy to announce that FLIP-506: Support Reuse Multiple Table Sinks in
Planner[1] has been accepted.

There are 6 binding votes and 1 non-binding vote [2]:

- Zhanghao Chen (non-binding)
- Weijie Guo (binding)
- Leonard Xu (binding)
- Lincoln Lee (binding)
- Jingsong Li (binding)
- Ron Liu (binding)
- Rui Fan (binding)

There is no disapproving vote.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
[2] https://lists.apache.org/thread/rgh2wolddgvljnqvqv74l92pp7t3w2y9

Regards,
Xiangyu Feng


Re: [VOTE] FLIP-507: Add Model DDL methods in TABLE API

2025-03-10 Thread Yash Anand
Hi Leonard,

Thank you for your input. I will update the FLIP accordingly to make it
clearer and more standardized.

Thanks,
Yash Anand



On Mon, Mar 10, 2025 at 11:08 PM Leonard Xu  wrote:

> Sorry for jumping the thread late, but I think current status of this FLIP
> is not ready, at least for me
>
> (1) Could you finish your proposed API according Flink API bylaws?  For
> example the code piece should be:
> Builder {
> SELF option(String key, String value
> SELF setComment(@Nullable String comment);
> =>
> /** Builder for {@link ModelDescriptor}. **/
> @PublicEvolving
> Builder {
> /** Defines the option of {@link ModelDescriptor}. **/
> SELF option(String key, String value);
> /** Defines the comment of {@link ModelDescriptor}. **/
> SELF setComment(@Nullable String comment);
> (2) TableEnvironment is a public API, so any changes (such as adding
> public methods in this case) must be clearly documented. You may refer to
> [1] as an example. In the [Public Interfaces] section of this FLIP, only
> TableEnvironment is listed. However, the subsequent [Proposed Changes]
> section appears to conflate TableEnvironment with ModelDescriptor.
> Clarifications are needed:
> Which package should ModelDescriptor belong to?
> Is ModelDescriptor intended to be an inner class of TableEnvironment?
>
> At last, this is a useful FLIP and I generally agree with the motivation
> and design, but it is not clear and standardized enough.
> I will not directly -1 to cancel existing voting process, but I hope to
> continue voting after the addressed above(1)(2) comments. WDYT?
>
> Best,
> Leonard
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760466
>
>
>
>
> > On Mar 11, 2025, at 01:19, Yash Anand  wrote:
> >
> > Hi Timo,
> >
> > Thanks for pointing that out. I have added the full API of
> > the ModelDescriptor in the FLIP.
> >
> > Thanks,
> > Yash Anand
> >
> >
> > On Mon, Mar 10, 2025 at 11:27 AM Timo Walther 
> wrote:
> >
> >> Hi Yash,
> >>
> >> could you provide the full API of the ModelDescriptor in the FLIP?
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >> On 10.03.25 16:55, Mingge Deng wrote:
> >>> Thanks Yash!
> >>>
> >>> +1 (binding)
> >>>
> >>> Best,
> >>> Mingge
> >>>
> >>>
> >>> On Mon, Mar 10, 2025 at 8:51 AM Dawid Wysakowicz <
> dwysakow...@apache.org
> >>>
> >>> wrote:
> >>>
>  +1 (binding)
>  Best,
>  Dawid
> 
>  On Wed, 19 Feb 2025 at 18:37, Hao Li 
> wrote:
> 
> > +1 (non-binding)
> >
> > Thanks Yash,
> > Hao
> >
> > On Tue, Feb 18, 2025 at 10:46 AM Yash Anand
> >>  >
> > wrote:
> >
> >> Hi Everyone,
> >>
> >> I'd like to start a vote on FLIP-507: Add Model DDL methods in TABLE
>  API
> >> [1] which has been discussed in this thread [2].
> >>
> >> The vote will be open for at least 72 hours unless there is an
>  objection
> > or
> >> not enough votes.
> >>
> >> [1]
> >>
> >>
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API
> >> [2]
> https://lists.apache.org/thread/w9dt6y1w0yns5j3g4685tstjdg5flvy9
> >>
> >
> 
> >>>
> >>
> >>
>
>


[jira] [Created] (FLINK-37443) Add returns() method to DataStream V2 API for specifying output types with lambda expressions

2025-03-10 Thread Nil Madhab (Jira)
Nil Madhab created FLINK-37443:
--

 Summary:  Add returns() method to DataStream V2 API for specifying 
output types with lambda expressions
 Key: FLINK-37443
 URL: https://issues.apache.org/jira/browse/FLINK-37443
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.20.1
 Environment: * Apache Flink version: 1.20.1
 * Java version: OpenJDK 21
 * API: DataStream V2 API
Reporter: Nil Madhab


While following the official 
[tutorial|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream-v2/overview/]
 of the DataStream V2 API, I encountered an issue.

When using the DataStream V2 API with lambda expressions for 
{{{}OneInputStreamProcessFunction{}}}, Java's type erasure prevents Flink from 
automatically determining the output type, resulting in the following exception:

 
{code:java}
Exception in thread "main" 
org.apache.flink.api.common.functions.InvalidTypesException: The return type of 
function 'process(NonKeyedPartitionStreamImpl.java:74)' could not be determined 
automatically, due to type erasure.{code}
The error suggests implementing {{ResultTypeQueryable}} interface or using an 
anonymous class as workarounds. In the traditional DataStream API, users could 
simply call {{.returns(TypeInformation)}} to explicitly specify the output type.
h3. Example that fails:
{code:java}
NonKeyedPartitionStream<Integer> parsed = input.process(
    (OneInputStreamProcessFunction<String, Integer>) (record, output, ctx) -> 
        output.collect(Integer.parseInt(record))
); {code}
h3. Example that works (but is more verbose):
{code:java}
NonKeyedPartitionStream<Integer> parsed = input.process(
    new OneInputStreamProcessFunction<String, Integer>() {
      @Override
      public void processRecord(String record, Collector<Integer> output,
          PartitionedContext ctx) throws Exception {
        output.collect(Integer.parseInt(record));
      }
    }
); {code}
h3. Requested Enhancement

Add a {{.returns(TypeInformation)}} or {{.returns(Class)}} method to the 
{{NonKeyedPartitionStream}} class in the DataStream V2 API to allow for the 
specification of output types when using lambda expressions with process 
functions.

Until the issue is fixed, the documentation can use the anonymous class to prevent 
confusion for people new to Flink.
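
Until such a method exists, a possible lambda-style workaround along the lines suggested 
by the exception message is to wrap the logic in a small class that implements 
{{ResultTypeQueryable}}. The sketch below mirrors the signatures shown in the examples 
above; the DataStream V2 package names are assumptions and may differ between versions.
{code:java}
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.datastream.api.common.Collector;
import org.apache.flink.datastream.api.context.PartitionedContext;
import org.apache.flink.datastream.api.function.OneInputStreamProcessFunction;

// Keeps a concise function body but exposes the produced type explicitly,
// so type extraction does not have to fight erasure.
public class ParseIntFunction
        implements OneInputStreamProcessFunction<String, Integer>, ResultTypeQueryable<Integer> {

    @Override
    public void processRecord(String record, Collector<Integer> output, PartitionedContext ctx)
            throws Exception {
        output.collect(Integer.parseInt(record));
    }

    @Override
    public TypeInformation<Integer> getProducedType() {
        return Types.INT;
    }
}

// Usage: NonKeyedPartitionStream<Integer> parsed = input.process(new ParseIntFunction());
{code}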





Re: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-10 Thread Ron Liu
+1(binding)

Best,
Ron

On Mon, Mar 10, 2025 at 15:44, Jingsong Li  wrote:

> +1
>
> On Mon, Mar 10, 2025 at 1:56 PM Lincoln Lee 
> wrote:
> >
> > +1 (binding)
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > On Mon, Mar 10, 2025 at 13:33, xiangyu feng  wrote:
> >
> > > Hi devs,
> > >
> > > All comments in the discussion thread[1] have been resolved. I would
> like
> > > to proceed this voting process.
> > >
> > > [1] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > >
> > > Regards,
> > > Xiangyu Feng
> > >
> > > On Mon, Mar 10, 2025 at 12:01, Leonard Xu  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > On Feb 25, 2025, at 10:12, weijie guo  wrote:
> > > > >
> > > > > +1(binding)
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > On Sun, Feb 23, 2025 at 16:36, Zhanghao Chen  wrote:
> > > > >
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> Thanks for driving this. It's a nice useability improvement for
> > > > performing
> > > > >> partial-updates on datalakes.
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Zhanghao Chen
> > > > >> 
> > > > >> From: xiangyu feng 
> > > > >> Sent: Sunday, February 23, 2025 10:44
> > > > >> To: dev@flink.apache.org 
> > > > >> Subject: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in
> > > Planner
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > >> I would like to start the vote for FLIP-506: Support Reuse
> Multiple
> > > > Table
> > > > >> Sinks in Planner[1].
> > > > >> This FLIP was discussed in this thread [2].
> > > > >>
> > > > >> The vote will be open for at least 72 hours unless there is an
> > > > objection or
> > > > >> insufficient votes.
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > > >> [2]
> https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > > > >>
> > > > >> Regards,
> > > > >> Xiangyu Feng
> > > > >>
> > > >
> > > >
> > >
>


[VOTE] Release 2.0.0, release candidate #2

2025-03-10 Thread Xintong Song
Hi everyone,

Some blocker issues were reported while building RC1, after the git tag had
already been pushed and could not be modified. Therefore, we are skipping RC1
and starting a vote on RC2.

Please review and vote on the release candidate #2 for the version 2.0.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

   - JIRA release notes [1], and the pull request adding release note for
   users [2],
   - the official Apache source release and binary convenience releases to
   be deployed to dist.apache.org [3], which are signed with the key with
   fingerprint F8E419AA0B60C28879E876859DFF40967ABFC5A4 [4],
   - all artifacts to be deployed to the Maven Central Repository [5],
   - source code tag "release-2.0.0-rc2" [6],
   - website pull request listing the new release and adding announcement
   blog post [7].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Best,

Xintong


[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12335344

[2] https://github.com/apache/flink/pull/26266

[3] https://dist.apache.org/repos/dist/dev/flink/flink-2.0.0-rc2

[4] https://dist.apache.org/repos/dist/release/flink/KEYS

[5] https://repository.apache.org/content/repositories/orgapacheflink-1792

[6] https://github.com/apache/flink/releases/tag/release-2.0.0-rc2

[7] https://github.com/apache/flink-web/pull/777


Re: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in Planner

2025-03-10 Thread Rui Fan
+1(binding)

Best,
Rui

On Mon, Mar 10, 2025 at 4:07 PM Ron Liu  wrote:

> +1(binding)
>
> Best,
> Ron
>
> On Mon, Mar 10, 2025 at 15:44, Jingsong Li  wrote:
>
> > +1
> >
> > On Mon, Mar 10, 2025 at 1:56 PM Lincoln Lee 
> > wrote:
> > >
> > > +1 (binding)
> > >
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > On Mon, Mar 10, 2025 at 13:33, xiangyu feng  wrote:
> > >
> > > > Hi devs,
> > > >
> > > > All comments in the discussion thread[1] have been resolved. I would
> > like
> > > > to proceed this voting process.
> > > >
> > > > [1] https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > > >
> > > > Regards,
> > > > Xiangyu Feng
> > > >
> > > > On Mon, Mar 10, 2025 at 12:01, Leonard Xu  wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > > > On Feb 25, 2025, at 10:12, weijie guo  wrote:
> > > > > >
> > > > > > +1(binding)
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Weijie
> > > > > >
> > > > > >
> > > > > > On Sun, Feb 23, 2025 at 16:36, Zhanghao Chen  wrote:
> > > > > >
> > > > > >> +1 (non-binding)
> > > > > >>
> > > > > >> Thanks for driving this. It's a nice useability improvement for
> > > > > performing
> > > > > >> partial-updates on datalakes.
> > > > > >>
> > > > > >>
> > > > > >> Best,
> > > > > >> Zhanghao Chen
> > > > > >> 
> > > > > >> From: xiangyu feng 
> > > > > >> Sent: Sunday, February 23, 2025 10:44
> > > > > >> To: dev@flink.apache.org 
> > > > > >> Subject: [VOTE] FLIP-506: Support Reuse Multiple Table Sinks in
> > > > Planner
> > > > > >>
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I would like to start the vote for FLIP-506: Support Reuse
> > Multiple
> > > > > Table
> > > > > >> Sinks in Planner[1].
> > > > > >> This FLIP was discussed in this thread [2].
> > > > > >>
> > > > > >> The vote will be open for at least 72 hours unless there is an
> > > > > objection or
> > > > > >> insufficient votes.
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > > > >> [2]
> > https://lists.apache.org/thread/r1wo9sf3d1725fhwzrttvv56k4rc782m
> > > > > >>
> > > > > >> Regards,
> > > > > >> Xiangyu Feng
> > > > > >>
> > > > >
> > > > >
> > > >
> >
>


[jira] [Created] (FLINK-37445) Migrate InputFormatSourceFunction to the Source V2 API

2025-03-10 Thread Poorvank Bhatia (Jira)
Poorvank Bhatia created FLINK-37445:
---

 Summary: Migrate InputFormatSourceFunction to the Source V2 API
 Key: FLINK-37445
 URL: https://issues.apache.org/jira/browse/FLINK-37445
 Project: Flink
  Issue Type: Technical Debt
Reporter: Poorvank Bhatia


InputFormatSourceFunction: 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/streaming/api/functions/source/legacy/InputFormatSourceFunction.java]
 needs to be moved to the new Source V2 API.
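
For orientation, here is a shape-only sketch of the migration target: the unified Source 
API splits the legacy SourceFunction responsibilities into split enumeration (JobManager 
side) and split reading (TaskManager side). {{InputFormatSource}} and {{InputFormatSplit}} 
below are placeholder names for this sketch, not existing Flink classes, and every stub 
marks work the migration has to do.
{code:java}
import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

/** Placeholder split type; in a real migration it would wrap an InputSplit. */
class InputFormatSplit implements SourceSplit {
    @Override
    public String splitId() {
        return "0";
    }
}

/** Skeleton of a unified Source that would replace InputFormatSourceFunction. */
public class InputFormatSource<OUT> implements Source<OUT, InputFormatSplit, Void> {

    @Override
    public Boundedness getBoundedness() {
        return Boundedness.BOUNDED; // InputFormats describe finite input
    }

    @Override
    public SplitEnumerator<InputFormatSplit, Void> createEnumerator(
            SplitEnumeratorContext<InputFormatSplit> context) {
        throw new UnsupportedOperationException("assign InputSplits to readers here");
    }

    @Override
    public SplitEnumerator<InputFormatSplit, Void> restoreEnumerator(
            SplitEnumeratorContext<InputFormatSplit> context, Void checkpoint) {
        throw new UnsupportedOperationException("restore unassigned splits here");
    }

    @Override
    public SimpleVersionedSerializer<InputFormatSplit> getSplitSerializer() {
        throw new UnsupportedOperationException("serialize splits for checkpoints here");
    }

    @Override
    public SimpleVersionedSerializer<Void> getEnumeratorCheckpointSerializer() {
        throw new UnsupportedOperationException("serialize enumerator state here");
    }

    @Override
    public SourceReader<OUT, InputFormatSplit> createReader(SourceReaderContext context) {
        throw new UnsupportedOperationException("read records via the wrapped InputFormat here");
    }
} {code}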





Re: [DISCUSS] FLIP-512: Add meta information to SQL state connector

2025-03-10 Thread Gabor Somogyi
Hi Shengkai,

Please see my comments inline.

BR,
G


On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang  wrote:

> Hi, Gabor. Thanks for your the FLIP. I have some questions about the FLIP:
>
> 1. State TTL for Value Columns
> How can users retrieve the state TTL (Time-to-Live) for each value column?
> From my understanding of the current design, it seems that this
> functionality is not supported. Could you clarify if there are plans to
> address this limitation?
>

Since the state processor API does not yet expose this information, this
would require several steps.
First, support needs to be added to the state processor API, which can then be
exposed on the SQL API.
This is definitely a useful future improvement and can be handled
in a separate Jira.


> 2. Metadata Table vs. Metadata Column
> The metadata information described in the FLIP appears to be intended to
> describe the state files stored at a specific location. To me, this concept
> aligns more closely with system tables like pg_tables in PostgreSQL [1] or
> the INFORMATION_SCHEMA in MySQL [2].
>

Adding a new `savepoint-metadata` connector is a possibility where we
can create such functionality.
I'm not against that, I just want common agreement that we would
like to move in that direction.
(As a side note, not just PG but Spark also has a similar approach, and I
basically like the idea.)
If we go in that direction, savepoint metadata can be exposed in a way
that one row represents an operator with its values, something like this:

operatorName              | operatorUid                   | operatorHash                     | parallelism | maxParallelism | subtaskStatesCount | coordinatorStateSizeInBytes | totalStatesSizeInBytes
--------------------------|-------------------------------|----------------------------------|-------------|----------------|--------------------|-----------------------------|-----------------------
Source: datagen-source    | datagen-source-uid            | 47aee94394d6ea26e2d544bef0a37bb5 | 2           | 128            | 2                  | 16                          | 546
long-udf-with-master-hook | long-udf-with-master-hook-uid | 6ed3f40bff3c8dfcdfcb95128a1018f1 | 2           | 128            | 2                  | 0                           | 0
value-process             | value-process-uid             | ca4f5fe9a637b656f09ea78b3e7a15b9 | 2           | 128            | 2                  | 0                           | 40726

This table can then be joined with the tables created by the already existing
`savepoint` connector, based on the UID hash (which is unique and always
exists).
This would mean that the existing table would need only a single metadata
column, which is the UID hash (a rough sketch follows below).
WDYT?
@zakelly, plz share your thoughts too.
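
To make this concrete, here is a rough sketch of how such a join could look from the
SQL API, assuming a hypothetical `savepoint-metadata` connector. The connector and
option names below are placeholders for illustration, not the options defined by
FLIP-496/FLIP-512:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SavepointMetadataJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Hypothetical metadata table: one row per operator in the savepoint.
        tEnv.executeSql(
                "CREATE TABLE savepoint_meta ("
                        + " operatorName STRING, operatorUid STRING, operatorHash STRING,"
                        + " parallelism INT, maxParallelism INT, subtaskStatesCount INT,"
                        + " coordinatorStateSizeInBytes BIGINT, totalStatesSizeInBytes BIGINT"
                        + ") WITH ('connector' = 'savepoint-metadata', 'path' = '/tmp/savepoint-1')");

        // Existing state table, extended with a single metadata column: the UID hash.
        tEnv.executeSql(
                "CREATE TABLE source_state ("
                        + " k STRING, v BIGINT, uidHash STRING METADATA"
                        + ") WITH ('connector' = 'savepoint', 'path' = '/tmp/savepoint-1',"
                        + " 'operator.uid' = 'datagen-source-uid')");

        // Join state rows with the per-operator metadata on the UID hash.
        tEnv.executeSql(
                        "SELECT m.operatorName, m.totalStatesSizeInBytes, s.k, s.v "
                                + "FROM source_state AS s "
                                + "JOIN savepoint_meta AS m ON s.uidHash = m.operatorHash")
                .print();
    }
}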


> If we opt to use metadata columns, every record in the table would end up
> having identical values for these columns (please correct me if I’m
> mistaken). On the other hand, the state connector requires users to specify
> an operator UID or operator UID hash, after which it outputs user-defined
> values in its records. This approach feels somewhat redundant to me.
>

If we add a new `savepoint-metadata` connector, then this can be
addressed.
On the other hand, UID and UID hash have an either-or relationship from
the config perspective,
so when a user provides the UID they may still be interested in the hash
for further calculations
(the whole Flink internals depend on the hash). Printing out the
human-readable UID
is an explicit requirement from the user side because hashes are not human
readable.


> 3. Handling LIST and MAP States in the State Connector
> I have concerns about how the current design handles LIST and MAP states.
> Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
> which implies that it attempts to load entire MAP or LIST states into
> memory.
>
> However, in many real-world scenarios, these states can grow very large.
> Typically, the state API addresses this by providing an iterator to
> traverse elements within the state incrementally. I’m unsure whether I’ve
> missed something in FLIP-496 or FLIP-512, but it seems that the current
> design might struggle with scalability in such cases.
>

You see it right, the current implementation keeps state for a sing