Re: [ANNOUNCE] Apache Flink Kafka Connectors 3.0.2 released

2023-12-04 Thread Martijn Visser
Hi Gordon,

Thanks for the release! I've pushed one hotfix [1], to make sure that
the Flink documentation shows the correct version number for the Flink
version it's compatible with.

Best regards,

Martijn

[1] 
https://github.com/apache/flink-connector-kafka/commit/6c3d3d06689336f2fd37bfa5a3b17a5377f07887

On Sat, Dec 2, 2023 at 1:57 AM Tzu-Li (Gordon) Tai  wrote:
>
> The Apache Flink community is very happy to announce the release of Apache 
> Flink Kafka Connectors 3.0.2. This release is compatible with the Apache 
> Flink 1.17.x and 1.18.x release series.
>
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353768
>
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
>
> Regards,
> Gordon


Re: [VOTE] FLIP-364: Improve the restart-strategy

2023-12-04 Thread Rui Fan
Thanks to everyone who participated in this discussion and vote.

I am closing this vote thread now; the results will be announced in a separate email.

Best,
Rui

On Sun, Dec 3, 2023 at 1:44 PM Jing Ge  wrote:

> +1(binding)
> Thanks!
>
> Best regards,
> Jing
>
> On Fri, Dec 1, 2023 at 9:29 AM Matthias Pohl  .invalid>
> wrote:
>
> > +1 (binding)
> >
> > On Fri, Dec 1, 2023 at 3:40 AM Zhu Zhu  wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks,
> > > Zhu
> > >
> > > Zhanghao Chen  于2023年11月30日周四 23:31写道:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > 
> > > > From: Rui Fan <1996fan...@gmail.com>
> > > > Sent: Monday, November 13, 2023 11:01
> > > > To: dev 
> > > > Subject: [VOTE] FLIP-364: Improve the restart-strategy
> > > >
> > > > Hi everyone,
> > > >
> > > > Thank you to everyone for the feedback on FLIP-364: Improve the
> > > > restart-strategy[1]
> > > > which has been discussed in this thread [2].
> > > >
> > > > I would like to start a vote for it. The vote will be open for at
> least
> > > 72
> > > > hours unless there is an objection or not enough votes.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/uJqzDw
> > > > [2] https://lists.apache.org/thread/5cgrft73kgkzkgjozf9zfk0w2oj7rjym
> > > >
> > > > Best,
> > > > Rui
> > > >
> > >
> >
>


[RESULT][VOTE] FLIP-364: Improve the exponential-delay restart-strategy

2023-12-04 Thread Rui Fan
Dear developers,

FLIP-364 Improve the exponential-delay restart-strategy[1] has been
accepted and voted through this thread [2].

The proposal received 6 approving votes, 5 of which are binding, and
there were no disapproving votes.

- Maximilian Michels (binding)
- Zhanghao Chen
- Zhu Zhu (binding)
- Matthias Pohl (binding)
- Jing Ge (binding)
- Rui Fan (binding)

Thanks to all participants for the discussion, voting, and providing
valuable feedback.

[1] https://cwiki.apache.org/confluence/x/uJqzDw
[2] https://lists.apache.org/thread/xo03tzw6d02w1vbcj5y9ccpqyc7bqrh9

Best,
Rui


[jira] [Created] (FLINK-33735) FLIP-364: Improve the exponential-delay restart-strategy

2023-12-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-33735:
---

 Summary: FLIP-364: Improve the exponential-delay restart-strategy
 Key: FLINK-33735
 URL: https://issues.apache.org/jira/browse/FLINK-33735
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.19.0


This is an umbrella Jira of [FLIP-364: Improve the exponential-delay 
restart-strategy.|https://cwiki.apache.org/confluence/x/uJqzDw]

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33736) Update default value of exponential-delay.max-backoff and exponential-delay.backoff-multiplier

2023-12-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-33736:
---

 Summary: Update default value of exponential-delay.max-backoff and 
exponential-delay.backoff-multiplier
 Key: FLINK-33736
 URL: https://issues.apache.org/jira/browse/FLINK-33736
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.19.0


Update default value of exponential-delay.max-backoff from 5min to 1min.

Update default value of exponential-delay.backoff-multiplier from 2.0 to 1.2.
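For readers who want to try the proposed values before they land, a minimal sketch (key names follow the Flink restart-strategy documentation; the values below are the proposed new defaults, not the current ones):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExponentialDelayDefaults {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Enable the exponential-delay restart strategy.
        conf.setString("restart-strategy", "exponential-delay");
        // Proposed new defaults: max-backoff 1 min (was 5 min),
        // backoff-multiplier 1.2 (was 2.0).
        conf.setString("restart-strategy.exponential-delay.max-backoff", "1 min");
        conf.setString("restart-strategy.exponential-delay.backoff-multiplier", "1.2");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... build and execute the job as usual.
    }
}
{code}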



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33737) Merge multiple Exceptions into one attempt for exponential-delay restart-strategy

2023-12-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-33737:
---

 Summary: Merge multiple Exceptions into one attempt for 
exponential-delay restart-strategy
 Key: FLINK-33737
 URL: https://issues.apache.org/jira/browse/FLINK-33737
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33738) Make exponential-delay restart-strategy the default restart strategy

2023-12-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-33738:
---

 Summary: Make exponential-delay restart-strategy the default 
restart strategy
 Key: FLINK-33738
 URL: https://issues.apache.org/jira/browse/FLINK-33738
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33739) Document FLIP-364: Improve the exponential-delay restart-strategy

2023-12-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-33739:
---

 Summary: Document FLIP-364: Improve the exponential-delay 
restart-strategy
 Key: FLINK-33739
 URL: https://issues.apache.org/jira/browse/FLINK-33739
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-395: Deprecate Global Aggregator Manager

2023-12-04 Thread Zhanghao Chen
Hi Benchao,

I think part of the reason is that a general global coordination mechanism is
complex and hence subject to internal changes in the future. Instead of
directly exposing the full mechanism to users, it might be better to expose
a well-defined subset of the feature set.

I'm also ccing the email to Piotr and David for their suggestions on this.

Best,
Zhanghao Chen

From: Benchao Li 
Sent: Monday, November 27, 2023 13:03
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP-395: Deprecate Global Aggregator Manager

+1 for the idea.

Currently OperatorCoordinator is still marked as @Internal, shouldn't
it be a public API already?

Besides, GlobalAggregatorManager supports coordination between
different operators, but OperatorCoordinator only supports
coordination within one operator. And the CoordinatorStore introduced in
FLINK-24439 opens the door for multiple operators. Should it also
be made a public API?

Weihua Hu  于2023年11月27日周一 11:05写道:
>
> Thanks Zhanghao for driving this FLIP.
>
> +1 for this.
>
> Best,
> Weihua
>
>
> On Mon, Nov 20, 2023 at 5:49 PM Zhanghao Chen 
> wrote:
>
> > Hi all,
> >
> > I'd like to start a discussion of FLIP-395: Deprecate Global Aggregator
> > Manager [1].
> >
> > Global Aggregate Manager was introduced in [2] to support event time
> > synchronization across sources and more generally, coordination of parallel
> > tasks. AFAIK, this was only used in the Kinesis source for an early version
> > of watermark alignment. Operator Coordinator, introduced in FLIP-27,
> > provides a more powerful and elegant solution for that need and is part of
> > the new source API standard. FLIP-217 further provides a complete solution
> > for watermark alignment of source splits on top of the Operator Coordinator
> > mechanism. Furthermore, Global Aggregate Manager manages state in JobMaster
> > object, causing problems for adaptive parallelism changes [3].
> >
> > Therefore, I propose to deprecate the use of Global Aggregate Manager,
> > which can improve the maintainability of the Flink codebase without
> > compromising its functionality.
> >
> > Looking forward to your feedbacks, thanks.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-395%3A+Deprecate+Global+Aggregator+Manager
> > [2] https://issues.apache.org/jira/browse/FLINK-10886
> > [3] https://issues.apache.org/jira/browse/FLINK-31245
> >
> > Best,
> > Zhanghao Chen
> >



--

Best,
Benchao Li


Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-04 Thread Gyula Fóra
Hi All!

Based on the discussion above, I feel that the most reasonable approach
from both the developers' and the users' perspective at this point is what
Becket lists as Option 1:

Revert the naming change to the backward compatible version and accept that
the names are not perfect (treat it as legacy).

On a different note, I agree that the current sink v2 interface is very
difficult to evolve and structuring the interfaces the way they are now is
not a good design in the long run.
For new functionality or changes we can make easily, we should switch to
the decorative/mixin interface approach used successfully in the source and
table interfaces. Let's try to do this as much as possible within the v2
and compatibility boundaries and we should only introduce a v3 if we really
must.
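
To make the comparison concrete, here is a hypothetical sketch of the mixin style (the interface and method names are illustrative only, not the actual Sink V2 API):

```java
// Optional capabilities live in separate interfaces that an implementation
// opts into; the runtime discovers them via instanceof instead of generics.
interface Sink<InputT> {
    void write(InputT element) throws Exception;
}

interface SupportsWriterState {
    java.util.List<byte[]> snapshotState(long checkpointId) throws Exception;
}

interface SupportsTwoPhaseCommit {
    java.util.Collection<String> prepareCommit() throws Exception;
}

// One sink mixes in both capabilities without any diamond in the type hierarchy.
class StatefulCommittingSink implements Sink<String>, SupportsWriterState, SupportsTwoPhaseCommit {
    @Override
    public void write(String element) { /* buffer the element */ }

    @Override
    public java.util.List<byte[]> snapshotState(long checkpointId) {
        return java.util.List.of(); // serialize buffered state here
    }

    @Override
    public java.util.Collection<String> prepareCommit() {
        return java.util.List.of(); // hand committables to the committer here
    }
}
```

The point is that optional capabilities compose freely, so adding a new one never forces a new node into an inheritance diamond.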

So from my side, +1 to reverting the naming to keep backward compatibility.

Cheers,
Gyula


On Fri, Dec 1, 2023 at 10:43 AM Péter Váry 
wrote:

> Thanks Becket for your reply!
>
> *On Option 1:*
> - I personally consider API inconsistencies more important, since they will
> remain with us "forever", but this is up to the community. I can implement
> whichever solution we decide upon.
>
> *Option 2:*
> - I don't think this specific issue merits a rewrite, but if we decide to
> change our approach, then it's a different story.
>
> *Evolvability:*
> This discussion reminds me of a similar discussion on FLIP-372 [1], where
> we are trying to decide if we should use mixin interfaces, or use interface
> inheritance.
> With the mixin approach, we have a more flexible interface, but we can't
> check the generic types of the interfaces/classes on compile time, or even
> when we create the DAG. The first issue happens when we call the method and
> fail.
> The issue here is similar:
> - *StatefulSink* needs a writer with a method to `*snapshotState*`
> - *TwoPhaseCommittingSink* needs a writer with `*prepareCommit*`
> - If there is a Sink which is stateful and needs to commit, then it needs
> both of these methods.
>
> If we use the mixin solution here, we lose the possibility to check the
> types at compile time. We could do the type check at runtime using `
> *instanceof*`, so we are better off than with the FLIP-372 example above,
> but we still lose an important capability. I personally prefer the mixin
> approach, but that would mean we rewrite the Sink API again - likely a
> SinkV3. Are we ready to move down that path?
>
> Thanks,
> Peter
>
> [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd
>
> On Thu, Nov 30, 2023, 14:53 Becket Qin  wrote:
>
> > Hi folks,
> >
> > Sorry for replying late on the thread.
> >
> > For this particular FLIP, I see two solutions:
> >
> > Option 1:
> > 1. On top of the the current status, rename
> > *org.apache.flink.api.connector.sink2.InitContext *to
> > *CommonInitContext (*should
> > probably be package private*)*.
> > 2. Change the name *WriterInitContext* back to *InitContext, *and revert
> > the deprecation. We can change the parameter name to writerContext if we
> > want to.
> > Admittedly, this does not have full symmetric naming of the InitContexts
> -
> > we will have CommonInitContext / InitContext / CommitterInitContext
> instead
> > of InitContext / WriterInitContext / CommitterInitContext. However, the
> > naming seems clear without much confusion. Personally, I can live with
> > that, treating the class InitContext as a non-ideal legacy class name
> > without much material harm.
> >
> > Option 2:
> > Theoretically speaking, if we really want to reach the perfect state
> while
> > being backwards compatible, we can create a brand new set of Sink
> > interfaces and deprecate the old ones. But I feel this is an overkill
> here.
> >
> > The solution to this particular issue aside, the evolvability of the
> > current interface hierarchy seems a more fundamental issue and worries me
> > more. I haven't completely thought it through, but there are two
> noticeable
> > differences between the interface design principles between Source and
> > Sink.
> > 1. Source uses decorative interfaces. For example, we have a
> > SupportsFilterPushdown interface, instead of a subclass of
> > FilterableSource. This seems to provide better flexibility.
> > 2. Source tends to have a more coarse-grained interface. For example,
> > SourceReader always has the methods of snapshotState(),
> > notifyCheckpointComplete(). Even if they may not be always required, we
> do
> > not separate them into different interfaces.
> > My hunch is that if we follow a similar approach to Source, the
> > evolvability
> > might be better. If we want to do this, we'd better do it before 2.0.
> > What do you think?
> >
> > Process wise,
> > - I agree that if there is a change to the passed FLIP during
> > implementation, it should be brought back to the mailing list.
> > - There might be value for the connector nightly build to depend on the
> > latest snapshot of the same Flink major version. It helps catching
> > unexpected brea

[jira] [Created] (FLINK-33740) Introduce a flip to list the supported sql patterns

2023-12-04 Thread xuyang (Jira)
xuyang created FLINK-33740:
--

 Summary: Introduce a flip to list the supported sql patterns
 Key: FLINK-33740
 URL: https://issues.apache.org/jira/browse/FLINK-33740
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: xuyang


Introduce a FLIP that aligns SQL usage and lists all the patterns we do and
do not support.

See more details in https://issues.apache.org/jira/browse/FLINK-33490



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-395: Migration to GitHub Actions

2023-12-04 Thread Chesnay Schepler

We could limit the (first) trial run to branches.

PRs wouldn't be affected (avoiding a bunch of concerns about maybe 
blocking PRs and misleading people into thinking that CI is green), we'd 
have a better handle on how much capacity we are consuming, but 
contributors would still get the new setup (which for some is better 
than none).

We'd also side-step any potential security issue for the time being.

On 01/12/2023 05:10, Yangze Guo wrote:

Thanks for the efforts, @Matthias. +1 to start a trial on Github
Actions and migrate the CI if we can prove its computation capacity
and stability.

I share the same concern as Xintong that we do not explicitly state
the effect of this trial on the contribution procedure. I think you
can elaborate more on this in the migration plan section. Here is my
thought about it:
I prefer to enable the CI workflow based on GitHub Actions for each PR
because it helps us understand its stability and performance under
certain pressures. However, I am not inclined to make "passing the CI
via GitHub Actions" a necessity in the code contribution process;
instead, we can encourage contributors to report unstable cases under a
specific ticket umbrella when they encounter them.

Best,
Yangze Guo

On Thu, Nov 30, 2023 at 12:10 AM Matthias Pohl
 wrote:

With regards to Alex' concerns on hardware disparity: I did a bit more
digging on that one. I added my findings in a hardware section to FLIP-396
[1]. It appears that the hardware is more or less the same between the
different hosts. Apache INFRA's runners have more disk space (1TB in
comparison to 14GB), though.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+during+Flink+1.19+Cycle+to+test+migrating+to+GitHub+Actions#FLIP396:TrialduringFlink1.19CycletotestmigratingtoGitHubActions-HardwareSpecifications

On Wed, Nov 29, 2023 at 4:01 PM Matthias Pohl 
wrote:


Thanks for your feedback Alex. I responded to your comments below:

This is mentioned in the "Limitations of GitHub Actions in the past"

section of the FLIP. Does this also apply to the Apache INFRA setup or
can we expect contributors' runs executed there too?


Workflow runs on Flink forks (independent of PRs that would merge to
Apache Flink's core repo) will be executed with runners provided by GitHub
with their own limitations. Secrets are not set in these runs (similar to
what we have right now with PR runs).

If we allow the PR CI to run on Apache INFRA-hosted ephemeral runners, we
might have the same freedom because of their ephemeral nature (the VMs are
discarded after each run, leaving no state behind).

We only have to start thinking about self-hosted customized runners if we
decide/need to have dedicated VMs for Flink's CI (similar to what we have
right now with Azure CI and Alibaba's VMs). This might happen if the
waiting times for acquiring a runner are too long. In that case, we might
give a certain group of people (e.g. committers) or certain types of events
(for PRs,  nightly builds, PR merges) the ability to use the self-hosted
runners.

As you mentioned in the FLIP, there are some timeout-related test

discrepancies between different setups. Similar discrepancies could
manifest themselves between the Github runners and the Apache INFRA
runners. It would be great if we should have a uniform setup, where if
tests pass in the individual CI, they also pass in the main runner and vice
versa.


I agree. So far, what we've seen is that the timeout instability is coming
from too optimistic timeout configurations in some tests (they eventually
also fail in Azure CI; but the GitHub-provided runners seem to be more
sensitive in this regard). Fixing the tests if such a flakiness is observed
should bring us to a stage where the test behavior is matching between
different runners.

We had a similar issue in the Azure CI setup: Certain tests were more
stable on the Alibaba machines than on Azure VMs. That is why we introduced
a dedicated stage for Azure CI VMs as part of the nightly runs (see
FLINK-18370 [1]). We could do the same for GitHub Actions if necessary.

Currently we have such memory limits-related issues in individual vs main

Azure CI pipelines.


I'm not sure I understand what you mean by memory limit-related issues.
The GitHub-provided runners do not seem to run into memory-related issues.
We have to see whether this also applies to Apache INFRA-provided runners.
My hope is that they have even better hardware than what GitHub offers. But
GitHub-provided runners seem to be a good fallback to rely on (see the
workflows I shared in my previous response to Xintong's message).

[1] https://issues.apache.org/jira/browse/FLINK-18370

On Wed, Nov 29, 2023 at 3:17 PM Matthias Pohl 
wrote:


Thanks for your comments, Xintong. See my answers below.



I think it would be helpful if we can, in the end, migrate the CI to an
ASF-managed GitHub Actions setup, as long as it provides us similar
computation capacity and stability.


The current test runs in my Flink fork (using the GitHub-provided
runn

Re: [VOTE] FLIP-379: Dynamic source parallelism inference for batch jobs

2023-12-04 Thread Etienne Chauchot

Correct,

I forgot that in the bylaws, a committer vote is binding for FLIPs. Thanks
for the reminder.


Best

Etienne

Le 30/11/2023 à 10:43, Leonard Xu a écrit :

+1(binding)

Btw, @Etienne, IIRC, your vote should be a binding one.


Best,
Leonard


2023年11月30日 下午5:03,Etienne Chauchot  写道:

+1 (non-binding)

Etienne

Le 30/11/2023 à 09:13, Rui Fan a écrit :

+1(binding)

Best,
Rui

On Thu, Nov 30, 2023 at 3:56 PM Lijie Wang   wrote:


+1 (binding)

Best,
Lijie

Zhu Zhu   于2023年11月30日周四 13:13写道:


+1

Thanks,
Zhu

Xia Sun   于2023年11月30日周四 11:41写道:


Hi everyone,

I'd like to start a vote on FLIP-379: Dynamic source parallelism

inference

for batch jobs[1] which has been discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an

objection

or

not enough votes.


[1]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs

[2]https://lists.apache.org/thread/ocftkqy5d2x4n58wzprgm5qqrzzkbmb8


Best Regards,
Xia

Re: [VOTE] FLIP-379: Dynamic source parallelism inference for batch jobs

2023-12-04 Thread Jingsong Li
+1 binding

On Mon, Dec 4, 2023 at 10:33 PM Etienne Chauchot  wrote:
>
> Correct,
>
> I forgot that in the bylaws, committer vote is binding for FLIPs thanks
> for the reminder.
>
> Best
>
> Etienne
>
> Le 30/11/2023 à 10:43, Leonard Xu a écrit :
> > +1(binding)
> >
> > Btw, @Etienne, IIRC, your vote should be a binding one.
> >
> >
> > Best,
> > Leonard
> >
> >> 2023年11月30日 下午5:03,Etienne Chauchot  写道:
> >>
> >> +1 (non-binding)
> >>
> >> Etienne
> >>
> >> Le 30/11/2023 à 09:13, Rui Fan a écrit :
> >>> +1(binding)
> >>>
> >>> Best,
> >>> Rui
> >>>
> >>> On Thu, Nov 30, 2023 at 3:56 PM Lijie Wang   
> >>> wrote:
> >>>
>  +1 (binding)
> 
>  Best,
>  Lijie
> 
>  Zhu Zhu   于2023年11月30日周四 13:13写道:
> 
> > +1
> >
> > Thanks,
> > Zhu
> >
> > Xia Sun   于2023年11月30日周四 11:41写道:
> >
> >> Hi everyone,
> >>
> >> I'd like to start a vote on FLIP-379: Dynamic source parallelism
> > inference
> >> for batch jobs[1] which has been discussed in this thread [2].
> >>
> >> The vote will be open for at least 72 hours unless there is an
>  objection
> > or
> >> not enough votes.
> >>
> >>
> >> [1]
> >>
> >>
>  https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> >> [2]https://lists.apache.org/thread/ocftkqy5d2x4n58wzprgm5qqrzzkbmb8
> >>
> >>
> >> Best Regards,
> >> Xia


Re: [DISCUSS] Contribute Flink Doris Connector to the Flink community

2023-12-04 Thread Leonard Xu
Hey, wudi

I’ve added permission for you, hope to see your FLIP.

Best,
Leonard



> 2023年12月1日 上午10:09,wudi <676366...@qq.com.INVALID> 写道:
> 
> Thank you everyone. But I encountered a problem when creating the FLIP: I have
> no permission to create pages in the Flink Improvement Proposals [1] space. I
> may need a PMC member to help me add permissions. My Jira account is Di Wu and
> the email is d...@apache.org. Thanks.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> 
> 
> 
> --
> 
> Brs,
> 
> di.wu
> 
> 
>> 2023年11月27日 下午1:22,Jing Ge  写道:
>> 
>> That sounds great! +1
>> 
>> Best regards
>> Jing
>> 
>> On Mon, Nov 27, 2023 at 3:38 AM Leonard Xu  wrote:
>> 
>>> Thanks wudi for kicking off the discussion,
>>> 
>>> +1 for the idea from my side.
>>> 
>>> A FLIP like Yun posted is required if no other objections.
>>> 
>>> Best,
>>> Leonard
>>> 
 2023年11月26日 下午6:22,wudi <676366...@qq.com.INVALID> 写道:
 
 Hi all,
 
 At present, Flink Connector and Flink's repository have been
>>> decoupled[1].
 At the same time, the Flink-Doris-Connector[3] has been maintained based
>>> on the Apache Doris[2] community.
 I think the Flink Doris Connector can be migrated to the Flink community
>>> because it is part of Flink Connectors and can also expand the ecosystem
>>> of Flink Connectors.
 
 I volunteer to move this forward if I can.
 
 [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
 [2] https://doris.apache.org/
 [3] https://github.com/apache/doris-flink-connector
 
 --
 
 Brs,
 di.wu
>>> 
>>> 
> 



Re: Doc about cleaning savePoints and checkPoints

2023-12-04 Thread Rodrigo Meneses
Thanks for providing that information!


On Fri, Dec 1, 2023 at 12:58 AM Zakelly Lan  wrote:

> Hi, Rodrigo
>
> It appears that the configurations you mentioned in your first question are
> related to the flink kubernetes operator. Are you using the flink
> kubernetes operator?
>
> In regards to the cleaning behavior when users restore a job from a
> savepoint or retained checkpoint, you can find detailed information in:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-savepoint-restore-mode
> (See "execution.savepoint-restore-mode").  Hope this helps.
>
>
> Best,
> Zakelly
>
> On Fri, Dec 1, 2023 at 3:34 AM Rodrigo Meneses  wrote:
>
> > Hi Flink Community,
> >
> > I'm searching for docs about how the cleaning of checkpoints and
> savepoints
> > actually work.
> >
> > I'm interested particularly in the cases when the user has `NATIVE`
> format
> > (incremental savepoint). Somehow, when using NATIVE format, the number of
> > savepoints kept does not match the savepoint parameters, like:
> > ```
> >   ["kubernetes.operator.savepoint.history.max.age"] = "7d"
> >   ["kubernetes.operator.savepoint.history.max.count"] = "14"
> > ```
> >
> > Also, I would like to understand better when the checkpoints are cleaned.
> > According to
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/
> > the checkpoints are cleaned when a program is cancelled. What happens if
> a
> > user suspends and then restores the job? Or when a user upgrades the job?
> > Are the checkpoints also cleaned in this situation?
> >
> > Thanks so much for you time
> > -Rodrigo
> >
>
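
A minimal sketch of the restore-mode setting Zakelly points to above (key names follow the linked documentation; the savepoint path is a placeholder):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestoreFromSavepoint {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Placeholder path: point this at the savepoint or retained checkpoint to restore.
        conf.setString("execution.savepoint.path", "s3://my-bucket/savepoints/savepoint-xxxx");
        // NO_CLAIM (default): Flink does not take ownership of the snapshot and
        // will not delete it. CLAIM: Flink owns the snapshot and may clean it up
        // once it is no longer needed for recovery.
        conf.setString("execution.savepoint-restore-mode", "NO_CLAIM");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... define and execute the job.
    }
}
```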


Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread Feng Jin
Hi xuyang,

Thank you for initiating this proposal.

I'm glad to see that TVF's functionality can be fully supported.

Regarding the early fire, late fire, and allow lateness features, how will
they be provided to users? The documentation doesn't seem to provide a
detailed description of this part.

Since this FLIP will also involve a lot of feature development, I am more
than willing to help, including development and code review.

Best,
Feng

On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:

> Hi all.
> I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> Window Aggregation.
>
>
> Although the current Flink SQL Window Aggregation documentation[1]
> indicates that the legacy Group Window Aggregation
> syntax has been deprecated, the new Window TVF Aggregation syntax has not
> fully covered all of the features of the legacy one.
>
>
> Compared to Group Window Aggregation, Window TVF Aggregation has several
> advantages, such as two-stage optimization,
> support for standard GROUPING SET syntax, and so on. However, it needs to
> supplement and enrich the following features.
>
>
> 1. Support for SESSION Window TVF Aggregation
> 2. Support for consuming CDC stream
> 3. Support for HOP window size with non-integer step length
> 4. Support for configurations such as early fire, late fire and allow
> lateness
> (which are internal experimental configurations in Group Window
> Aggregation and not public to users yet.)
> 5. Unification of the Window TVF Aggregation operator in runtime at the
> implementation layer
> (In the long term, the cost to maintain the operators about Window TVF
> Aggregation and Group Window Aggregation is too expensive.)
>
>
> This flip aims to continue the unfinished work in FLIP-145[2], which is to
> fully enable the capabilities of Window TVF Aggregation
>  and officially deprecate the legacy syntax Group Window Aggregation, to
> prepare for the removal of the legacy one in Flink 2.0.
>
>
> I have already done some preliminary POC to validate the feasibility of
> the related work in this flip as follows.
> 1. POC for SESSION Window TVF Aggregation [3]
> 2. POC for CUMULATE in Group Window Aggregation operator [4]
> 3. POC for consuming CDC stream in Window Aggregation operator [5]
>
>
> Looking forward to your feedback and thoughts!
>
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> [4]
> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> [5]
> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
>
>
>
> --
>
> Best!
> Xuyang


Re: [DISCUSS] Proposing an LTS Release for the 1.x Line

2023-12-04 Thread Alexander Fedulov
Hi everyone,

As we progress with the 1.19 release, which might potentially (although not
likely) be the last in the 1.x line, I'd like to revive our discussion on
the
LTS support matter. There is a general consensus that due to breaking API
changes in 2.0, extending bug fixes support by designating an LTS release
is
something we want to do.

To summarize, the approaches we've considered are:

1. Time-based: The last release of the 1.x line gets a clear end-of-life date
(2 years).
2. Release-based: The last release of the 1.x line gets support for 4 minor
releases in the 2.x line. The exact time is unknown, but we assume it to be
roughly 2 years.
3. LTS-to-LTS release: The last release of the 1.x line is supported until the
last release in the 2.x line is designated as LTS.

We need to strike a balance between being user-friendly and nudging people
to
upgrade. From that perspective, option 1 is my favorite - we all know that
having a clear deadline works wonders in motivating action. At the same
time,
I appreciate that we might not want to introduce new kinds of procedures,
so
option 2 would be my second choice, provided we also formulate it like "4
minor releases, at least 2 years" (in case the minor release cadence
accelerates for some reason). I find option 3 a bit problematic since it
gives no incentive to upgrade, and people who do not follow Flink
development
closely might be caught by surprise when we decide to bump the major
version
again.

I'd like to open a vote to stamp the official decision, but first, I would
like to understand if we can reach consensus on one of the options here, or
if
we'll need to push that to a vote by presenting multiple options.

Looking forward to hearing your thoughts on this matter.

P.S.: 1.x and 2.x are just examples in this case; the decision also
translates
into a procedure for future major releases.

Best,
Alex

On Thu, 27 Jul 2023 at 17:32, Jing Ge  wrote:

> Hi Konstantin,
>
> What you said makes sense. Dropping modules is an efficient option to
> accelerate Flink evolution with acceptable function regressions. We should
> do it constantly and strategically. On the other hand, we should point out
> the core modules that must be backward compatible when a new major version
> is released.
>
> Best regards,
> Jing
>
> On Wed, Jul 26, 2023 at 10:52 PM Matthias Pohl
>  wrote:
>
> > >
> > > @Mathias, I am not quite sure about the 3 versions description. Are you
> > > concerned that 1.x and 2.x LTS releases could overlap, if 3.0 comes
> > early?
> >
> > Yes. Maybe, that's only a theoretical scenario. It wouldn't work if we go
> > with your suggestion to use "proper time" rather than release cycles to
> > define the length of a support period (which sounds reasonable). My
> concern
> > was that we get into a situation where we need to support four versions
> of
> > Flink.
> >
> > On Wed, Jul 26, 2023 at 4:21 PM Alexander Fedulov <
> > alexander.fedu...@gmail.com> wrote:
> >
> > > The question is if we want to tie the release cycle of 2.x to how much
> > time
> > > we give our users to migrate. And "time" is a critical word here. I can
> > see
> > > us
> > > potentially wanting to iterate on the 2.x line more rapidly, because of
> > all
> > > of the
> > > major changes, until the cycles get settled to a typical cadence again.
> > >
> > > This means that users won't know how much time they would have to
> > actually
> > > migrate off of 1.x. And I can see this knowledge being critical for
> > > companies'
> > > quarterly/yearly plannings, so transparency here is key.
> > >
> > > That's why I think it makes sense to deviate from the typical N minor
> > > releases
> > > rule and set an explicit time period. We usually have a minor release
> > every
> > > four
> > > months, so my proposition would be to designate a 1.5 years period as
> > > a generous approximation of a 4 releases cycle.
> > >
> > > I also agree with limiting support to bugfixes - Flink is at the level
> of
> > > maturity where
> > > I believe nothing so critical will be  missing in the last 1.x release
> > that
> > > we'd need to
> > > backport if from 2.x. In the end, we want to encourage users to
> migrate.
> > >
> > > @Mathias, I am not quite sure about the 3 versions description. Are you
> > > concerned that 1.x and 2.x LTS releases could overlap, if 3.0 comes
> > early?
> > >
> > > Best,
> > > Alex
> > >
> > > On Wed, 26 Jul 2023 at 14:47, Matthias Pohl  > > .invalid>
> > > wrote:
> > >
> > > > I think making the last minor release before a major release an LTS
> > > release
> > > > with extended support makes sense. I cannot think of a reason against
> > the
> > > > four minor release cycles suggested by Marton. Only providing bug
> fixes
> > > and
> > > > not allowing features to be backported sounds reasonable to keep the
> > > > maintenance costs low.
> > > >
> > > > And maybe we can make this a general convention for last minor
> releases
> > > for
> > > > > all major releases, rather than only discuss i

Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread David Anderson
The current situation (where we have both the legacy windows and the
TVF-based windows) is confusing for users, and I'd like to see us move
forward as rapidly as possible.
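
For readers following along, a minimal sketch of the two syntaxes in question (assuming a registered table `Orders(user STRING, amount DOUBLE, ts TIMESTAMP(3))` with a watermark on `ts`; illustrative only):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WindowSyntaxComparison {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Legacy group window aggregation (the syntax being deprecated):
        tEnv.executeSql(
                "SELECT `user`, TUMBLE_START(ts, INTERVAL '10' MINUTE), SUM(amount) "
                        + "FROM Orders "
                        + "GROUP BY `user`, TUMBLE(ts, INTERVAL '10' MINUTE)");

        // Equivalent window TVF aggregation (the target syntax):
        tEnv.executeSql(
                "SELECT `user`, window_start, window_end, SUM(amount) "
                        + "FROM TABLE(TUMBLE(TABLE Orders, DESCRIPTOR(ts), INTERVAL '10' MINUTE)) "
                        + "GROUP BY `user`, window_start, window_end");
    }
}
```

The TVF form is the one the FLIP standardizes on; the GROUP BY TUMBLE(...) form is the legacy syntax being deprecated.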

Since the early fire, late fire, and allowed lateness features were never
documented or exposed to users, I don't feel that we need to provide
replacements for these internal, experimental features before officially
deprecating the legacy group window aggregations, and I'd rather not wait.

However, I'd be delighted to see a proposal for what that might look like.

Best,
David

On Mon, Dec 4, 2023 at 12:45 PM Feng Jin  wrote:

> Hi xuyang,
>
> Thank you for initiating this proposal.
>
> I'm glad to see that TVF's functionality can be fully supported.
>
> Regarding the early fire, late fire, and allow lateness features, how will
> they be provided to users? The documentation doesn't seem to provide a
> detailed description of this part.
>
> Since this FLIP will also involve a lot of feature development, I am more
> than willing to help, including development and code review.
>
> Best,
> Feng
>
> On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:
>
> > Hi all.
> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> > Window Aggregation.
> >
> >
> > Although the current Flink SQL Window Aggregation documentation[1]
> > indicates that the legacy Group Window Aggregation
> > syntax has been deprecated, the new Window TVF Aggregation syntax has not
> > fully covered all of the features of the legacy one.
> >
> >
> > Compared to Group Window Aggergation, Window TVF Aggergation has several
> > advantages, such as two-stage optimization,
> > support for standard GROUPING SET syntax, and so on. However, it needs to
> > supplement and enrich the following features.
> >
> >
> > 1. Support for SESSION Window TVF Aggregation
> > 2. Support for consuming CDC stream
> > 3. Support for HOP window size with non-integer step length
> > 4. Support for configurations such as early fire, late fire and allow
> > lateness
> > (which are internal experimental configurations in Group Window
> > Aggregation and not public to users yet.)
> > 5. Unification of the Window TVF Aggregation operator in runtime at the
> > implementation layer
> > (In the long term, the cost to maintain the operators about Window TVF
> > Aggregation and Group Window Aggregation is too expensive.)
> >
> >
> > This flip aims to continue the unfinished work in FLIP-145[2], which is
> to
> > fully enable the capabilities of Window TVF Aggregation
> >  and officially deprecate the legacy syntax Group Window Aggregation, to
> > prepare for the removal of the legacy one in Flink 2.0.
> >
> >
> > I have already done some preliminary POC to validate the feasibility of
> > the related work in this flip as follows.
> > 1. POC for SESSION Window TVF Aggregation [3]
> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >
> >
> > Looking forward to your feedback and thoughts!
> >
> >
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
> >
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> > [4]
> >
> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> > [5]
> >
> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
> >
> >
> >
> > --
> >
> > Best!
> > Xuyang
>


Re:Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread Xuyang
Hi, Feng and David.


Thank you very much for sharing your thoughts.


This FLIP does not include officially exposing these experimental options to
users, so there is no detailed description of that part.
However, given that some advanced users may have already added these experimental
options to actual production jobs, the handling of these options when using the
window TVF syntax has been added to this FLIP.


Overall, the behavior when using these experimental parameters is no different
from before, and I think we should keep compatibility for jobs that already use
these experimental options.
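
For concreteness, these are the kinds of experimental emit options being referred to (key names as they appear internally in the planner; they are undocumented and subject to change, so treat this sketch as illustrative):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ExperimentalEmitOptions {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Undocumented, experimental planner options for early firing and
        // lateness of group window aggregations (names subject to change):
        tEnv.getConfig().set("table.exec.emit.early-fire.enabled", "true");
        tEnv.getConfig().set("table.exec.emit.early-fire.delay", "1 min");
        tEnv.getConfig().set("table.exec.emit.allow-lateness", "10 min");
        // ... define the windowed query as usual.
    }
}
```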


Looking forward to your thoughts.




--

Best!
Xuyang





At 2023-12-05 09:17:49, "David Anderson"  wrote:
>The current situation (where we have both the legacy windows and the
>TVF-based windows) is confusing for users, and I'd like to see us move
>forward as rapidly as possible.
>
>Since the early fire, late fire, and allowed lateness features were never
>documented or exposed to users, I don't feel that we need to provide
>replacements for these internal, experimental features before officially
>deprecating the legacy group window aggregations, and I'd rather not wait.
>
>However, I'd be delighted to see a proposal for what that might look like.
>
>Best,
>David
>
>On Mon, Dec 4, 2023 at 12:45 PM Feng Jin  wrote:
>
>> Hi xuyang,
>>
>> Thank you for initiating this proposal.
>>
>> I'm glad to see that TVF's functionality can be fully supported.
>>
>> Regarding the early fire, late fire, and allow lateness features, how will
>> they be provided to users? The documentation doesn't seem to provide a
>> detailed description of this part.
>>
>> Since this FLIP will also involve a lot of feature development, I am more
>> than willing to help, including development and code review.
>>
>> Best,
>> Feng
>>
>> On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:
>>
>> > Hi all.
>> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
>> > Window Aggregation.
>> >
>> >
>> > Although the current Flink SQL Window Aggregation documentation[1]
>> > indicates that the legacy Group Window Aggregation
>> > syntax has been deprecated, the new Window TVF Aggregation syntax has not
>> > fully covered all of the features of the legacy one.
>> >
>> >
>> > Compared to Group Window Aggergation, Window TVF Aggergation has several
>> > advantages, such as two-stage optimization,
>> > support for standard GROUPING SET syntax, and so on. However, it needs to
>> > supplement and enrich the following features.
>> >
>> >
>> > 1. Support for SESSION Window TVF Aggregation
>> > 2. Support for consuming CDC stream
>> > 3. Support for HOP window size with non-integer step length
>> > 4. Support for configurations such as early fire, late fire and allow
>> > lateness
>> > (which are internal experimental configurations in Group Window
>> > Aggregation and not public to users yet.)
>> > 5. Unification of the Window TVF Aggregation operator in runtime at the
>> > implementation layer
>> > (In the long term, the cost to maintain the operators about Window TVF
>> > Aggregation and Group Window Aggregation is too expensive.)
>> >
>> >
>> > This flip aims to continue the unfinished work in FLIP-145[2], which is
>> to
>> > fully enable the capabilities of Window TVF Aggregation
>> >  and officially deprecate the legacy syntax Group Window Aggregation, to
>> > prepare for the removal of the legacy one in Flink 2.0.
>> >
>> >
>> > I have already done some preliminary POC to validate the feasibility of
>> > the related work in this flip as follows.
>> > 1. POC for SESSION Window TVF Aggregation [3]
>> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
>> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
>> >
>> >
>> > Looking forward to your feedback and thoughts!
>> >
>> >
>> >
>> > [1]
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
>> >
>> > [2]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
>> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
>> > [4]
>> >
>> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
>> > [5]
>> >
>> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
>> >
>> >
>> >
>> > --
>> >
>> > Best!
>> > Xuyang
>>


[jira] [Created] (FLINK-33741) introduce stat dump period and statsLevel configuration

2023-12-04 Thread xiaogang zhou (Jira)
xiaogang zhou created FLINK-33741:
-

 Summary: introduce stat dump period and statsLevel configuration
 Key: FLINK-33741
 URL: https://issues.apache.org/jira/browse/FLINK-33741
 Project: Flink
  Issue Type: New Feature
Reporter: xiaogang zhou


I'd like to introduce two RocksDB statistics-related configuration options.

Then we can customize the statistics, for example:
{code:java}
// Enable RocksDB statistics with a configurable stats level and dump period.
// STATISTIC_DUMP_PERIOD is the new option proposed by this ticket;
// internalGetOption(...) refers to Flink's internal option lookup for
// RocksDBConfigurableOptions.
Statistics s = new Statistics();
s.setStatsLevel(StatsLevel.EXCEPT_TIME_FOR_MUTEX);
currentOptions
    .setStatsDumpPeriodSec(internalGetOption(RocksDBConfigurableOptions.STATISTIC_DUMP_PERIOD))
    .setStatistics(s); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread Shengkai Fang
Hi, Xuyang. Thanks for your great work. Big +1!

After reading the FLIP, I have some questions. Please read the content
below.

1. Support SESSION Window TVF Aggregation
a. Could you give an example about the pass-through column? A session
window may contain multiple rows, which value is selected by the window
operator?

2. TVF support CDC
   a. What's the behavior if all the data in the window have been removed?
   b. Could you explain in more detail how the window processes CDC? For
example, what's the behavior if the window only gets DELETE data from
the upstream Operator?
   c. The subtitle is not correct here. It seems we only support Window TVF
Aggregation to consume the CDC stream.

3. Late fire/Early fire

   It's better if we can introduce a syntax like the `emit` keyword to set
the emit strategy for every window.
   ```
   SESSION(data [PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap,
emit='')
   ```

   The drawback of the current implementation is that it is ambiguous to
users. It is
up to the operator implementation to determine whether it works. But it's
ok if we just align the behavior with the legacy window operator.

5. Unify into the Window TVF Aggregation operators in runtime at the
implementation layer

I think more work should be mentioned in the FLIP. What's the behavior if
the input schema contains a column named `window_start`? In the current
design, `window_start` is a reserved keyword in the window TVF syntax, but
it is legal in the legacy implementation.

In the FLIP, you mention that it should introduce an option to fall back
to the legacy behavior. Could you tell us the name of that option?
BTW, I think we should unify the implementation once the window TVF can do all
the work that the legacy operator can do, and then there is no need to introduce
a fallback option.

If we remove the legacy window operator in the future, how will users upgrade
their jobs? Do you have any plan to support state migration from the legacy
window to the window TVF?

Best,
Shengkai




Xuyang  于2023年12月5日周二 11:10写道:

> Hi, Feng and David.
>
>
> Thank you very much to share your thoughts.
>
>
> This flip does not include the official exposure of these experimental
> conf to users. Thus there is not adetailed description of this part.
> However, in view that some technical users may have added these
> experimental conf in actual production jobs, the processing
> of these conf while using window tvf syntax has been added to this flip.
>
>
> Overall, the behavior of using these experimental parameters is no
> different from before, and I think we should provide the compatibility
> about using these experimental conf.
>
>
> Look for your thoughs.
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2023-12-05 09:17:49, "David Anderson"  wrote:
> >The current situation (where we have both the legacy windows and the
> >TVF-based windows) is confusing for users, and I'd like to see us move
> >forward as rapidly as possible.
> >
> >Since the early fire, late fire, and allowed lateness features were never
> >documented or exposed to users, I don't feel that we need to provide
> >replacements for these internal, experimental features before officially
> >deprecating the legacy group window aggregations, and I'd rather not wait.
> >
> >However, I'd be delighted to see a proposal for what that might look like.
> >
> >Best,
> >David
> >
> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin  wrote:
> >
> >> Hi xuyang,
> >>
> >> Thank you for initiating this proposal.
> >>
> >> I'm glad to see that TVF's functionality can be fully supported.
> >>
> >> Regarding the early fire, late fire, and allow lateness features, how
> will
> >> they be provided to users? The documentation doesn't seem to provide a
> >> detailed description of this part.
> >>
> >> Since this FLIP will also involve a lot of feature development, I am
> more
> >> than willing to help, including development and code review.
> >>
> >> Best,
> >> Feng
> >>
> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:
> >>
> >> > Hi all.
> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> >> > Window Aggregation.
> >> >
> >> >
> >> > Although the current Flink SQL Window Aggregation documentation[1]
> >> > indicates that the legacy Group Window Aggregation
> >> > syntax has been deprecated, the new Window TVF Aggregation syntax has
> not
> >> > fully covered all of the features of the legacy one.
> >> >
> >> >
> >> > Compared to Group Window Aggergation, Window TVF Aggergation has
> several
> >> > advantages, such as two-stage optimization,
> >> > support for standard GROUPING SET syntax, and so on. However, it
> needs to
> >> > supplement and enrich the following features.
> >> >
> >> >
> >> > 1. Support for SESSION Window TVF Aggregation
> >> > 2. Support for consuming CDC stream
> >> > 3. Support for HOP window size with non-integer step length
> >> > 4. Support for configurations such as early fire, late fire and allow
> >> > lateness
> >> 

Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread liu ron
Hi, xuyang

Thanks for starting this FLIP discussion. Currently there are two types of
window aggregation in Flink SQL, namely the legacy group window aggregation and
the window TVF aggregation. These two types of window aggregation are not fully
aligned in behavior, which brings a lot of confusion to users, so there is a
need to unify and align them. I think the final ideal state should be that
there is only one window TVF aggregation, which supports TUMBLE, HOP, CUMULATE
and SESSION windows and supports consuming CDC data streams, with support for
configuring early fire and late fire as well.

This FLIP is a continuation of FLIP-145, and it also lets the legacy group
window aggregation migrate smoothly to the new window TVF aggregation, which
is very useful, especially for the support of CDC streams, a pain point that
users often report. Big +1 for this FLIP.

Best,
Ron

Xuyang  于2023年12月5日周二 11:11写道:

> Hi, Feng and David.
>
>
> Thank you very much to share your thoughts.
>
>
> This flip does not include the official exposure of these experimental
> conf to users. Thus there is not adetailed description of this part.
> However, in view that some technical users may have added these
> experimental conf in actual production jobs, the processing
> of these conf while using window tvf syntax has been added to this flip.
>
>
> Overall, the behavior of using these experimental parameters is no
> different from before, and I think we should provide the compatibility
> about using these experimental conf.
>
>
> Look for your thoughs.
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2023-12-05 09:17:49, "David Anderson"  wrote:
> >The current situation (where we have both the legacy windows and the
> >TVF-based windows) is confusing for users, and I'd like to see us move
> >forward as rapidly as possible.
> >
> >Since the early fire, late fire, and allowed lateness features were never
> >documented or exposed to users, I don't feel that we need to provide
> >replacements for these internal, experimental features before officially
> >deprecating the legacy group window aggregations, and I'd rather not wait.
> >
> >However, I'd be delighted to see a proposal for what that might look like.
> >
> >Best,
> >David
> >
> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin  wrote:
> >
> >> Hi xuyang,
> >>
> >> Thank you for initiating this proposal.
> >>
> >> I'm glad to see that TVF's functionality can be fully supported.
> >>
> >> Regarding the early fire, late fire, and allow lateness features, how
> will
> >> they be provided to users? The documentation doesn't seem to provide a
> >> detailed description of this part.
> >>
> >> Since this FLIP will also involve a lot of feature development, I am
> more
> >> than willing to help, including development and code review.
> >>
> >> Best,
> >> Feng
> >>
> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:
> >>
> >> > Hi all.
> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> >> > Window Aggregation.
> >> >
> >> >
> >> > Although the current Flink SQL Window Aggregation documentation[1]
> >> > indicates that the legacy Group Window Aggregation
> >> > syntax has been deprecated, the new Window TVF Aggregation syntax has
> not
> >> > fully covered all of the features of the legacy one.
> >> >
> >> >
> >> > Compared to Group Window Aggergation, Window TVF Aggergation has
> several
> >> > advantages, such as two-stage optimization,
> >> > support for standard GROUPING SET syntax, and so on. However, it
> needs to
> >> > supplement and enrich the following features.
> >> >
> >> >
> >> > 1. Support for SESSION Window TVF Aggregation
> >> > 2. Support for consuming CDC stream
> >> > 3. Support for HOP window size with non-integer step length
> >> > 4. Support for configurations such as early fire, late fire and allow
> >> > lateness
> >> > (which are internal experimental configurations in Group Window
> >> > Aggregation and not public to users yet.)
> >> > 5. Unification of the Window TVF Aggregation operator in runtime at
> the
> >> > implementation layer
> >> > (In the long term, the cost to maintain the operators about Window TVF
> >> > Aggregation and Group Window Aggregation is too expensive.)
> >> >
> >> >
> >> > This flip aims to continue the unfinished work in FLIP-145[2], which
> is
> >> to
> >> > fully enable the capabilities of Window TVF Aggregation
> >> >  and officially deprecate the legacy syntax Group Window Aggregation,
> to
> >> > prepare for the removal of the legacy one in Flink 2.0.
> >> >
> >> >
> >> > I have already done some preliminary POC to validate the feasibility
> of
> >> > the related work in this flip as follows.
> >> > 1. POC for SESSION Window TVF Aggregation [3]
> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >> >
> >> >
> >> > Looking forward to your feedback and thoughts!
> >> >
> >> >
> >> >
> >> > [1]
> >> >
> >>
> https:/

[jira] [Created] (FLINK-33742) Hybrid Shuffle should work well with Adaptive Query Execution

2023-12-04 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33742:
-

 Summary: Hybrid Shuffle should work well with Adaptive Query 
Execution
 Key: FLINK-33742
 URL: https://issues.apache.org/jira/browse/FLINK-33742
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination, Runtime / Network
Affects Versions: 1.19.0
Reporter: Yuxin Tan


At present, Hybrid Shuffle and Adaptive Query Execution (AQE), which includes 
features such as Dynamic Partition Pruning (DPP), Runtime Filter, and Adaptive 
Batch Scheduler, are not fully compatible. While they can be used at the same 
time, the activation of AQE inhibits the key capability of Hybrid 
Shuffle to perform simultaneous reading and writing. This limitation arises 
because AQE dictates that downstream tasks may only initiate once upstream 
tasks have finished, a requirement that is inconsistent with the simultaneous 
read-write process facilitated by Hybrid Shuffle.

To harness the full potential of Hybrid Shuffle and AQE, it is essential to 
refine their integration. By doing so, we can capitalize on each feature's 
distinct advantages and enhance overall system performance.
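
As a point of reference, a minimal sketch (keys and values per the Flink batch execution documentation; treat as illustrative) of how Hybrid Shuffle is enabled for a batch job today, which is the mode whose interplay with AQE this ticket targets:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridShuffleJob {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Run in batch mode and let exchanges use Hybrid Shuffle, which allows
        // downstream tasks to start consuming while upstream tasks are still writing.
        conf.setString("execution.runtime-mode", "BATCH");
        conf.setString("execution.batch-shuffle-mode", "ALL_EXCHANGES_HYBRID_FULL");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... define the batch job; AQE features such as the Adaptive Batch
        // Scheduler are configured separately.
    }
}
{code}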



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Release flink-connector-parent v1.01

2023-12-04 Thread Péter Váry
Hi Etienne,

Which branch would you cut the release from?

I find the flink-connector-parent branches confusing.

If I merge a PR to the ci_utils branch, would it immediately change the CI
workflow of all of the connectors?

If I merge something to the release_utils branch, would it immediately
change the release process of all of the connectors?

I would like to add the possibility of creating Python packages for the
connectors [1]. This would consist of some common code, which should reside
in flink-connector-parent, like:
- scripts for running Python tests - test infra. I expect that this would
evolve over time
- CI workflow - this would be slower moving, but might change if the
infra is changing
- release scripts - this would be slow moving, but might change too.

I think we should have a release for all of the above components, so the
connectors could move forward at their own pace.

What do you think?

Thanks,
Péter

[1] https://issues.apache.org/jira/browse/FLINK-33528

On Thu, Nov 30, 2023, 16:55 Etienne Chauchot  wrote:

> Thanks Sergey for your vote. Indeed I have listed only the PRs merged
> since last release but there are these 2 open PRs that could be worth
> reviewing/merging before release.
>
> https://github.com/apache/flink-connector-shared-utils/pull/25
>
> https://github.com/apache/flink-connector-shared-utils/pull/20
>
> Best
>
> Etienne
>
>
> Le 30/11/2023 à 11:12, Sergey Nuyanzin a écrit :
> > thanks for volunteering Etienne
> >
> > +1 for releasing
> > however there is one more PR to enable custom JVM flags for connectors,
> > in a similar way to how it is done in the Flink main repo for modules.
> > It will simplify Java 17 support a bit.
> >
> > could we have this as well in the coming release?
> >
> >
> >
> > On Wed, Nov 29, 2023 at 11:40 AM Etienne Chauchot
> > wrote:
> >
> >> Hi all,
> >>
> >> I would like to discuss making a v1.0.1 release of
> flink-connector-parent.
> >>
> >> Since last release, there were only 2 changes:
> >>
> >> -https://github.com/apache/flink-connector-shared-utils/pull/19
> >> (spotless addition)
> >>
> >> -https://github.com/apache/flink-connector-shared-utils/pull/26
> >> (surefire configuration)
> >>
> >> The new release would bring the ability to skip some tests in the
> >> connectors and, among other things, to skip the archunit tests. It is
> >> important for connectors to skip the archunit tests when tested against a
> >> version of Flink that changes the archunit rules, leading to a change of
> >> the violation store. As there is only one violation store and the
> >> connector needs to be tested against the last 2 minor Flink versions, only
> >> the Flink version the connector was built against needs to run the archunit
> >> tests and have them reflected in the violation store.
> >>
> >>
> >> I volunteer to make the release. As it would be my first ASF release, I
> >> might require the guidance of one of the PMC members.
> >>
> >>
> >> Best
> >>
> >> Etienne
> >>
> >>
> >>
> >>
> >>


[jira] [Created] (FLINK-33743) Support consuming multiple subpartitions on a single channel

2023-12-04 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33743:
-

 Summary: Support consuming multiple subpartitions on a single 
channel
 Key: FLINK-33743
 URL: https://issues.apache.org/jira/browse/FLINK-33743
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Yuxin Tan


At present, a downstream channel is limited to consuming data from a single 
subpartition, a constraint that can lead to increased memory consumption. 
Addressing this issue is also a critical step in ensuring that Hybrid Shuffle 
functions effectively with Adaptive Query Execution (AQE). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33744) Hybrid shuffle avoids restarting the whole job when failover

2023-12-04 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33744:
-

 Summary: Hybrid shuffle avoids restarting the whole job when 
failover
 Key: FLINK-33744
 URL: https://issues.apache.org/jira/browse/FLINK-33744
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Affects Versions: 1.19.0
Reporter: Yuxin Tan


If Hybrid Shuffle is enabled, the whole job will be restarted when a failover 
occurs. This is a critical issue for large-scale jobs. We should improve the 
logic to avoid restarting the whole job on failover.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33745) Dynamically choose hybrid shuffle or AQE in a job level

2023-12-04 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33745:
-

 Summary: Dynamically choose hybrid shuffle or AQE in a job level
 Key: FLINK-33745
 URL: https://issues.apache.org/jira/browse/FLINK-33745
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Affects Versions: 1.19.0
Reporter: Yuxin Tan


To enhance the initial integration of Hybrid Shuffle with Adaptive Query 
Execution (AQE), we could implement a coarse-grained mode selection strategy. 
For instance, we could opt for either Hybrid Shuffle or AQE at the granularity 
of an entire job. This approach would allow us to better align the two 
features in the early stages of adoption.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33746) More precise dynamic selection of Hybrid Shuffle or AQE

2023-12-04 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33746:
-

 Summary: More precise dynamic selection of Hybrid Shuffle or AQE
 Key: FLINK-33746
 URL: https://issues.apache.org/jira/browse/FLINK-33746
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination, Runtime / Network
Affects Versions: 1.19.0
Reporter: Yuxin Tan


We can even adopt more precise and intelligent strategies to select between 
Hybrid Shuffle and AQE. For instance, the choice could be made based on the 
edge type between tasks, or we could leverage historical job performance data 
and other metrics to inform our decision. Such tailored strategies would enable 
us to utilize each feature where it is most beneficial.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33747) Remove Sink V1 API in 2.0

2023-12-04 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-33747:
--

 Summary: Remove Sink V1 API in 2.0
 Key: FLINK-33747
 URL: https://issues.apache.org/jira/browse/FLINK-33747
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Common
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-383: Support Job Recovery for Batch Jobs

2023-12-04 Thread Lijie Wang
Thanks for raising this valuable point, Xintong.

Supporting external shuffle services makes sense to me. In order to recover
the internal state of the ShuffleMaster after a JM restart, we will add the
following 3 methods to ShuffleMaster:

boolean supportsBatchSnapshot();
void snapshotState(CompletableFuture snapshotFuture);
void restoreState(byte[] snapshotData);

We will provide empty implementations by default. If an external service
wants to support Job Recovery, it needs to override these methods. Before
the job starts running, we will check whether the shuffle master supports
taking snapshots (via the supportsBatchSnapshot method). If it is not
supported, we will disable Job Recovery for the job.

The default Netty/TM shuffle is stateless, so we only need to override the
"supportsBatchSnapshot" method to let it return true ("snapshotState" and
"restoreState" keep empty implementations).

You can find more details in the FLIP's "JobEvent" section.
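
To make the contract concrete, here is a minimal sketch of the default (no-op)
behavior, written as a standalone interface for illustration and assuming a
plain byte[] snapshot payload (the exact types and their placement on
ShuffleMaster are described in the FLIP):

import java.util.concurrent.CompletableFuture;

/** Illustrative sketch only; in the FLIP these defaults live on ShuffleMaster. */
public interface BatchSnapshotableShuffleMaster {

    /** Whether this shuffle master can snapshot/restore its state for job recovery. */
    default boolean supportsBatchSnapshot() {
        return false;
    }

    /** Asynchronously snapshot the internal state and complete the future when done. */
    default void snapshotState(CompletableFuture<byte[]> snapshotFuture) {
        // Stateless implementations can complete immediately with an empty snapshot.
        snapshotFuture.complete(new byte[0]);
    }

    /** Restore the internal state from a previously taken snapshot. */
    default void restoreState(byte[] snapshotData) {
        // No-op for stateless shuffle services such as the default Netty/TM shuffle.
    }
}

An external shuffle service with internal state (e.g. Celeborn's
LifecycleManager, as Xintong described) would override all three methods.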

Best,
Lijie

Xintong Song  wrote on Mon, Dec 4, 2023 at 15:34:

> Thanks for the proposal, Lijie and Zhu.
>
> I have been having offline discussions with the Apache Celeborn folks
> regarding integrating Apache Celeborn into Flink's Hybrid Shuffle mode. One
> thing coming from those discussions that might relate to this FLIP is that
> Celeborn maintains some internal states inside its LifecycleManager (think
> of this as a component resident in Flink's Shuffle Master), which would
> also need persistence and recovery in order for the partitions to be reused
> after a JM crash. Given that Flink supports pluggable shuffle services,
> there could be other custom shuffle services with similar demands. I wonder
> if it makes sense to also add interfaces that take snapshots from Shuffle
> Master once a while, and provide such snapshots to Shuffle Master upon
> recovery?
>
> Best,
>
> Xintong
>
>
>
> On Thu, Nov 30, 2023 at 5:48 PM Lijie Wang 
> wrote:
>
> > Hi Guowei,
> >
> > Thanks for your feedback.
> >
> > >> As far as I know, there are multiple job managers on standby in some
> > >> scenarios. In this case, is your design still effective?
> > I think it's still effective. There will only be one leader. After becoming
> > the leader, the startup process of the JobMaster is the same as when only
> > one jobmanager restarts, so I think the current process should also be
> > applicable to the multi-jobmanager situation. We will also do some tests to
> > cover this case.
> >
> > >> How do you rule out that there might still be some states in the memory
> > >> of the original operator coordinator?
> > The current restore process is the same as how streaming jobs restore from a
> > checkpoint (calling the same methods) after failover, which is widely used
> > in production, so I think there is no problem.
> >
> > >> Additionally, using NO_CHECKPOINT seems a bit odd. Why not use a normal
> > >> checkpoint ID greater than 0 and record it in the event store?
> > We use -1 (NO_CHECKPOINT) to distinguish it from a normal checkpoint; -1
> > indicates that this is a snapshot for the no-checkpoint/batch scenarios.
> >
> > Besides, considering that currently some operator coordinators may not
> > support taking snapshots in the no-checkpoint/batch scenarios (or don't
> > support passing -1 as a checkpoint id), we think it is better to let the
> > developer explicitly specify whether it supports snapshots in the batch
> > scenario. Therefore, we intend to introduce the "SupportsBatchSnapshot"
> > interface for the split enumerator and the "supportsBatchSnapshot" method
> > for the operator coordinator. You can find more details in the FLIP's
> > "Introduce SupportsBatchSnapshot interface" and "JobEvent" sections.
> >
> > Looking forward to your further feedback.
> >
> > Best,
> > Lijie
> >
> > Guowei Ma  wrote on Sun, Nov 19, 2023 at 10:47:
> >
> > > Hi,
> > >
> > >
> > > This is a very good proposal; as far as I know, it can help with some very
> > > critical production operations in certain scenarios. I have two minor
> > > issues:
> > >
> > > 1. As far as I know, there are multiple job managers on standby in some
> > > scenarios. In this case, is your design still effective? I'm unsure if you
> > > have conducted any tests. For instance, standby job managers might take
> > > over these failed jobs more quickly.
> > > 2. Regarding the part about the operator coordinator, how can you ensure
> > > that the checkpoint mechanism can restore the state of the operator
> > > coordinator? For example:
> > > How do you rule out that there might still be some states in the memory of
> > > the original operator coordinator? After all, the implementation was done
> > > under the assumption of scenarios where the job manager doesn't fail.
> > > Additionally, using NO_CHECKPOINT seems a bit odd. Why not use a normal
> > > checkpoint ID greater than 0 and record it in the event store?
> > > If the issues raised in point 2 cannot be resolved in the short term, would
> > > it be possible to consider not supporting failover with a source job
> > > manager?
> > >
>

[jira] [Created] (FLINK-33748) Remove legacy TableSource/TableSink API in 2.0

2023-12-04 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-33748:
--

 Summary: Remove legacy TableSource/TableSink API in 2.0
 Key: FLINK-33748
 URL: https://issues.apache.org/jira/browse/FLINK-33748
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / API
Reporter: Weijie Guo
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Subscribe Apache Flink development email.

2023-12-04 Thread aaron ai
Subscribe Apache Flink development email.


Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-04 Thread Becket Qin
I am with Gyula about fixing the current SinkV2 API.

A SinkV3 seems unnecessary because we are not changing the fundamental
design of the API. Hopefully we can modify the interface structure a little
bit to make it similar to the Source while still keeping backwards
compatibility.
For example, one approach is:

- Add snapshotState(int checkpointId) and precommit() methods to the
SinkWriter with default implementations doing nothing. Deprecate
StatefulSinkWriter and PrecommittingSinkWriter.
- Add two mixin interfaces of SupportsStatefulWrite and
SupportsTwoPhaseCommit. Deprecate the StatefulSink and
TwoPhaseCommittingSink.
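
For illustration, a rough sketch of the shape this could take (generics and
the existing write/flush/close methods are omitted, and none of these names
are final API):

// Sketch only: default no-op hooks on the writer, plus marker-style mixins on the sink.
interface SketchSinkWriter<InputT> {
    // ... existing write / flush / close methods ...

    /** Default no-op, so existing writers keep compiling and behaving as before. */
    default void snapshotState(int checkpointId) throws Exception {}

    /** Default no-op pre-commit hook for two-phase-commit writers. */
    default void precommit() throws Exception {}
}

/** Mixin a Sink implements to declare that its writers keep state. */
interface SupportsStatefulWrite {}

/** Mixin a Sink implements to declare two-phase-commit semantics. */
interface SupportsTwoPhaseCommit {}

/** Example: a sink opting into both capabilities. */
class StatefulTwoPhaseCommitSink implements SupportsStatefulWrite, SupportsTwoPhaseCommit {
    // createWriter(...) would return a SketchSinkWriter overriding snapshotState()/precommit()
}

At runtime the framework would check these mixins (e.g. via instanceof) to
wire up state snapshots and committers - the trade-off discussed further
down-thread.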

Thanks,

Jiangjie (Becket) Qin

On Mon, Dec 4, 2023 at 7:25 PM Gyula Fóra  wrote:

> Hi All!
>
> Based on the discussion above, I feel that the most reasonable approach
> from both developers and users perspective at this point is what Becket
> lists as Option 1:
>
> Revert the naming change to the backward compatible version and accept that
> the names are not perfect (treat it as legacy).
>
> On a different note, I agree that the current sink v2 interface is very
> difficult to evolve and structuring the interfaces the way they are now is
> not a good design in the long run.
> For new functionality or changes we can make easily, we should switch to
> the decorative/mixin interface approach used successfully in the source and
> table interfaces. Let's try to do this as much as possible within the v2
> and compatibility boundaries and we should only introduce a v3 if we really
> must.
>
> So from my side, +1 to reverting the naming to keep backward compatibility.
>
> Cheers,
> Gyula
>
>
> On Fri, Dec 1, 2023 at 10:43 AM Péter Váry 
> wrote:
>
> > Thanks Becket for your reply!
> >
> > *On Option 1:*
> > - I personally consider API inconsistencies more important, since they will
> > remain with us "forever", but this is up to the community. I can implement
> > whichever solution we decide upon.
> >
> > *Option 2:*
> > - I don't think this specific issue merits a rewrite, but if we decide to
> > change our approach, then it's a different story.
> >
> > *Evolvability:*
> > This discussion reminds me of a similar discussion on FLIP-372 [1], where
> > we are trying to decide if we should use mixin interfaces, or use interface
> > inheritance.
> > With the mixin approach, we have a more flexible interface, but we can't
> > check the generic types of the interfaces/classes at compile time, or even
> > when we create the DAG. The first issue happens when we call the method and
> > fail.
> > The issue here is similar:
> > - *StatefulSink* needs a writer with a method to `snapshotState`
> > - *TwoPhaseCommittingSink* needs a writer with `prepareCommit`
> > - If there is a Sink which is stateful and needs to commit, then it needs
> > both of these methods.
> >
> > If we use the mixin solution here, we lose the possibility to check the
> > types at compile time. We could do the type check at runtime using
> > `instanceof`, so we are better off than with the FLIP-372 example above,
> > but we still lose an important capability. I personally prefer the mixin
> > approach, but that would mean we rewrite the Sink API again - likely a
> > SinkV3. Are we ready to move down that path?
> >
> > Thanks,
> > Peter
> >
> > [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd
> >
> > On Thu, Nov 30, 2023, 14:53 Becket Qin  wrote:
> >
> > > Hi folks,
> > >
> > > Sorry for replying late on the thread.
> > >
> > > For this particular FLIP, I see two solutions:
> > >
> > > Option 1:
> > > 1. On top of the current status, rename
> > > *org.apache.flink.api.connector.sink2.InitContext* to
> > > *CommonInitContext* (should probably be package private).
> > > 2. Change the name *WriterInitContext* back to *InitContext*, and revert
> > > the deprecation. We can change the parameter name to writerContext if we
> > > want to.
> > > Admittedly, this does not have full symmetric naming of the InitContexts -
> > > we will have CommonInitContext / InitContext / CommitterInitContext instead
> > > of InitContext / WriterInitContext / CommitterInitContext. However, the
> > > naming seems clear without much confusion. Personally, I can live with
> > > that, treating the class InitContext as a non-ideal legacy class name
> > > without much material harm.
> > >
> > > Option 2:
> > > Theoretically speaking, if we really want to reach the perfect state while
> > > being backwards compatible, we can create a brand new set of Sink
> > > interfaces and deprecate the old ones. But I feel this is an overkill here.
> > >
> > > The solution to this particular issue aside, the evolvability of the
> > > current interface hierarchy seems a more fundamental issue and worries me
> > > more. I haven't completely thought it through, but there are two noticeable
> > > differences in the interface design principles between Source and
> > > Sink.
> > > 1. Source uses deco

Re: [DISCUSS] FLIP-335: Removing Flink's Time classes as part of Flink 2.0

2023-12-04 Thread weijie guo
Thanks for driving this FLIP, Matthias. I'm +1 for this.


Best regards,

Weijie


Matthias Pohl  wrote on Wed, Jul 19, 2023 at 22:32:

> The overall Scala-related plan for this FLIP is to ignore the Scala API
> because of FLIP-265. The CEP Java/Scala version
> parity (through the PatternScalaAPICompletenessTest) requires us to touch
> the Scala API, though, because we want to offer an alternative to the
> deprecated API in Flink 1.x. I wanted to point that out in that paragraph.
>
> The alternative would have been to add an exclusion for the newly added
> method. That sounded like a worse option. Deprecating the Scala API should
> be independent from the parity of Java and Scala API in Flink 1.x.
>
> I rewrote this paragraph in the FLIP. I hope it helps.
>
> Matthias
>
> On Mon, Jul 17, 2023 at 11:23 AM Chesnay Schepler 
> wrote:
>
> > I don't understand this bit:
> >
> > "One minor Scala change is necessary, though: We need to touch the Scala
> > implementation of the Pattern class (in flink-cep). Pattern requires a
> > new method which needs to be implemented in the Scala Pattern class as
> > well to comply with PatternScalaAPICompletenessTest."
> >
> > FLIP-265 states that *all* Scala APIs will be removed, which should
> > also cover CEP.
> > On 13/07/2023 12:08, Matthias Pohl wrote:
> > > The 2.0 feature list includes the removal of Flink's Time classes in favor
> > > of the JDK's java.time.Duration class. There was already a discussion about
> > > it in [1] and FLINK-14068 [2] was created as a consequence of this
> > > discussion.
> > >
> > > I started working on marking the APIs as deprecated in FLINK-32570 [3]
> > > where Chesnay raised a fair point that there isn't a FLIP, yet, to
> > > formalize this public API change. Therefore, I went ahead and created
> > > FLIP-335 [4] to have this change properly documented.
> > >
> > > I'm not 100% sure whether there are better ways of checking whether we're
> > > covering everything Public API-related. There are even classes which I
> > > think might be user-facing but are not labeled accordingly (e.g.
> > > flink-cep). But I don't have the proper knowledge in these parts of the
> > > code. Therefore, I would propose marking these methods as deprecated,
> > > anyway, to be on the safe side.
> > >
> > > I'm open to any suggestions on improving the Test Plan of this change.
> > >
> > > I'm looking forward to feedback on this FLIP.
> > >
> > > Best,
> > > Matthias
> > >
> > > [1] https://lists.apache.org/thread/76yywnwf3lk8qn4dby0vz7yoqx7f7pkj
> > > [2] https://issues.apache.org/jira/browse/FLINK-14068
> > > [3] https://issues.apache.org/jira/browse/FLINK-32570
> > > [4]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-335%3A+Removing+Flink%27s+Time+classes
> > >
> >
>


Re: [DISCUSS] FLIP-383: Support Job Recovery for Batch Jobs

2023-12-04 Thread Paul Lam
Hi Lijie,

Recovery for batch jobs is no doubt a long-awaited feature. Thanks for
the proposal!

I’m concerned about the multi-job scenario. In session mode, users could
use web submission to upload and run jars that may produce multiple 
Flink jobs. However, these jobs may not be submitted at once and run in 
parallel. Instead, they could depend on other jobs, like in a DAG. The
scheduling of the jobs is controlled by the user's main method.

IIUC, in the FLIP, the main method is lost after the recovery, and only 
submitted jobs would be recovered. Is that right?

Best,
Paul Lam

> On Nov 2, 2023, at 18:00, Lijie Wang  wrote:
> 
> Hi devs,
> 
> Zhu Zhu and I would like to start a discussion about FLIP-383: Support Job
> Recovery for Batch Jobs[1]
> 
> Currently, when Flink’s job manager crashes or gets killed, possibly due to
> unexpected errors or planned node decommission, it will cause one of the
> following two situations:
> 1. Failed, if the job does not enable HA.
> 2. Restart, if the job enables HA. If it’s a streaming job, the job will be
> resumed from the last successful checkpoint. If it’s a batch job, it has to
> run from the beginning, and all previous progress will be lost.
> 
> In view of this, we think a JM crash may cause a great regression for batch
> jobs, especially long-running batch jobs. This FLIP is mainly to solve this
> problem so that batch jobs can recover most job progress after JM crashes.
> In this FLIP, our goal is to let most finished tasks not need to be re-run.
> 
> You can find more details in the FLIP-383[1]. Looking forward to your
> feedback.
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs
> 
> Best,
> Lijie



Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-04 Thread weijie guo
Thanks Martijn for driving this!

I'm +1 to reverting the breaking change.

> For new functionality or changes we can make easily, we should switch to
the decorative/mixin interface approach used successfully in the source and
table interfaces.

I like the idea of switching to mixin interfaces.

Best regards,

Weijie


Becket Qin  wrote on Tue, Dec 5, 2023 at 14:50:

> I am with Gyula about fixing the current SinkV2 API.
>
> A SinkV3 seems unnecessary because we are not changing the fundamental
> design of the API. Hopefully we can modify the interface structure a little
> bit to make it similar to the Source while still keeping backwards
> compatibility.
> For example, one approach is:
>
> - Add snapshotState(int checkpointId) and precommit() methods to the
> SinkWriter with default implementations doing nothing. Deprecate
> StatefulSinkWriter and PrecommittingSinkWriter.
> - Add two mixin interfaces of SupportsStatefulWrite and
> SupportsTwoPhaseCommit. Deprecate the StatefulSink and
> TwoPhaseCommittingSink.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Dec 4, 2023 at 7:25 PM Gyula Fóra  wrote:
>
> > Hi All!
> >
> > Based on the discussion above, I feel that the most reasonable approach
> > from both developers and users perspective at this point is what Becket
> > lists as Option 1:
> >
> > Revert the naming change to the backward compatible version and accept
> that
> > the names are not perfect (treat it as legacy).
> >
> > On a different note, I agree that the current sink v2 interface is very
> > difficult to evolve and structuring the interfaces the way they are now
> is
> > not a good design in the long run.
> > For new functionality or changes we can make easily, we should switch to
> > the decorative/mixin interface approach used successfully in the source
> and
> > table interfaces. Let's try to do this as much as possible within the v2
> > and compatibility boundaries and we should only introduce a v3 if we
> really
> > must.
> >
> > So from my side, +1 to reverting the naming to keep backward
> compatibility.
> >
> > Cheers,
> > Gyula
> >
> >
> > On Fri, Dec 1, 2023 at 10:43 AM Péter Váry 
> > wrote:
> >
> > > Thanks Becket for your reply!
> > >
> > > *On Option 1:*
> > > - I personally consider API inconsistencies more important, since they
> > will
> > > remain with us "forever", but this is up to the community. I can
> > implement
> > > whichever solution we decide upon.
> > >
> > > *Option 2:*
> > > - I don't think this specific issue merits a rewrite, but if we decide
> to
> > > change our approach, then it's a different story.
> > >
> > > *Evolvability:*
> > > This discussion reminds me of a similar discussion on FLIP-372 [1],
> where
> > > we are trying to decide if we should use mixin interfaces, or use
> > interface
> > > inheritance.
> > > With the mixin approach, we have a more flexible interface, but we
> can't
> > > check the generic types of the interfaces/classes on compile time, or
> > even
> > > when we create the DAG. The first issue happens when we call the method
> > and
> > > fail.
> > > The issue here is similar:
> > > - *StatefulSink* needs a writer with a method to `*snapshotState*`
> > > - *TwoPhaseCommittingSink* needs a writer with `*prepareCommit*`
> > > - If there is a Sink which is stateful and needs to commit, then it
> needs
> > > both of these methods.
> > >
> > > If we use the mixin solution here, we lose the possibility to check the
> > > types in compile time. We could do the type check in runtime using `
> > > *instanceof*`, so we are better off than with the FLIP-372 example
> above,
> > > but still lose any important possibility. I personally prefer the mixin
> > > approach, but that would mean we rewrite the Sink API again - likely a
> > > SinkV3. Are we ready to move down that path?
> > >
> > > Thanks,
> > > Peter
> > >
> > > [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd
> > >
> > > On Thu, Nov 30, 2023, 14:53 Becket Qin  wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Sorry for replying late on the thread.
> > > >
> > > > For this particular FLIP, I see two solutions:
> > > >
> > > > Option 1:
> > > > 1. On top of the current status, rename
> > > > *org.apache.flink.api.connector.sink2.InitContext* to
> > > > *CommonInitContext* (should probably be package private).
> > > > 2. Change the name *WriterInitContext* back to *InitContext*, and revert
> > > > the deprecation. We can change the parameter name to writerContext if we
> > > > want to.
> > > > Admittedly, this does not have full symmetric naming of the InitContexts -
> > > > we will have CommonInitContext / InitContext / CommitterInitContext instead
> > > > of InitContext / WriterInitContext / CommitterInitContext. However, the
> > > > naming seems clear without much confusion. Personally, I can live with
> > > > that, treating the class InitContext as a non-ideal legacy class name
> > > > without much material harm.
> > > >
> > > > Opti

[jira] [Created] (FLINK-33749) Remove deprecated getter method in Configuration.

2023-12-04 Thread Junrui Li (Jira)
Junrui Li created FLINK-33749:
-

 Summary: Remove deprecated getter method in Configuration.
 Key: FLINK-33749
 URL: https://issues.apache.org/jira/browse/FLINK-33749
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Junrui Li


Currently, the Configuration class has several getter methods for retrieving 
config values based on a String key, such as getString(String key, String 
defaultValue), all of which have been deprecated.

We should remove these getter methods in Flink 2.0.
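
For reference, a minimal sketch of the migration path, using the
ConfigOption-based accessor that remains supported (the option below is
defined inline purely for illustration; in practice the predefined options in
classes like TaskManagerOptions should be reused):

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

public class ConfigurationGetterMigration {

    // Illustrative option definition.
    private static final ConfigOption<String> HOST =
            ConfigOptions.key("taskmanager.host")
                    .stringType()
                    .defaultValue("localhost");

    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Before (deprecated, to be removed in 2.0):
        // String host = conf.getString("taskmanager.host", "localhost");

        // After (typed ConfigOption accessor):
        String host = conf.get(HOST);
        System.out.println(host);
    }
}

The other String-key getters (getInteger, getBoolean, getLong, ...) migrate in
the same way.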



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33750) Remove deprecated config options.

2023-12-04 Thread Junrui Li (Jira)
Junrui Li created FLINK-33750:
-

 Summary: Remove deprecated config options.
 Key: FLINK-33750
 URL: https://issues.apache.org/jira/browse/FLINK-33750
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Junrui Li


Remove deprecated config options in Flink 2.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)