Re: [VOTE] FLIP-471: Fixing watermark idleness timeout accounting

2024-07-31 Thread Stefan Richter

+1 (binding)

Best,
Stefan



> On 31. Jul 2024, at 04:56, Zakelly Lan  wrote:
> 
> +1 (binding)
> 
> 
> Best,
> Zakelly
> 
> On Wed, Jul 31, 2024 at 12:07 AM Piotr Nowojski 
> wrote:
> 
>> Hi all!
>> 
>> I would like to open the vote for FLIP-471 [1]. It has been discussed here
>> [2].
>> 
>> The vote will remain open for at least 72 hours.
>> 
>> Best,
>> Piotrek
>> 
>> [1] https://cwiki.apache.org/confluence/x/oQvOEg
>> [2] https://lists.apache.org/thread/byj1l2236rfx3mcl3v4374rcbkq4rf85
>> 



Re: [VOTE] Release 1.20.0, release candidate #2

2024-07-31 Thread Qingsheng Ren
+1 (binding)

- Built from source
- Reviewed web PR and release note
- Verified checksum and signature
- Checked GitHub release tag
- Tested submitting SQL job with SQL client reading and writing Kafka

Best,
Qingsheng

On Tue, Jul 30, 2024 at 2:26 PM Xintong Song  wrote:

> +1 (binding)
>
> - reviewed flink-web PR
> - verified checksum and signature
> - verified source archives don't contain binaries
> - built from source
> - tried example jobs on a standalone cluster, and everything looks fine
>
> Best,
>
> Xintong
>
>
>
> On Tue, Jul 30, 2024 at 12:13 AM Jing Ge 
> wrote:
>
> > Thanks Weijie!
> >
> > +1 (binding)
> >
> > - verified signatures
> > - verified checksums
> > - checked Github release tag
> > - reviewed the PRs
> > - checked the repo
> > - started a local cluster, tried with WordCount, everything was fine.
> >
> > Best regards,
> > Jing
> >
> >
> > On Mon, Jul 29, 2024 at 1:47 PM Samrat Deb 
> wrote:
> >
> > > Thank you Weijie for driving 1.20 release
> > >
> > > +1 (non-binding)
> > >
> > > - Verified checksums and sha512
> > > - Verified signatures
> > > - Verified Github release tags
> > > - Built from source
> > > - Started the Flink cluster locally and ran a few jobs (StateMachine and
> > > WordCount)
> > >
> > > Bests,
> > > Samrat
> > >
> > >
> > > On Mon, Jul 29, 2024 at 3:15 PM Ahmed Hamdy 
> > wrote:
> > >
> > > > Thanks Weijie for driving
> > > >
> > > > +1 (non-binding)
> > > >
> > > > - Verified checksums
> > > > - Verified signature matches Rui Fan's
> > > > - Verified tag exists on Github
> > > > - Built from source
> > > > - Verified no binaries in source archive
> > > > - Reviewed release notes PR (some nits)
> > > >
> > > > Best Regards
> > > > Ahmed Hamdy
> > > >
> > > >
> > > > On Thu, 25 Jul 2024 at 12:21, weijie guo 
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > >
> > > > > Please review and vote on the release candidate #2 for the version
> > > > 1.20.0,
> > > > >
> > > > > as follows:
> > > > >
> > > > >
> > > > > [ ] +1, Approve the release
> > > > >
> > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > >
> > > > >
> > > > > The complete staging area is available for your review, which
> > includes:
> > > > >
> > > > > * JIRA release notes [1], and the pull request adding release note
> > for
> > > > > users [2]
> > > > >
> > > > > * the official Apache source release and binary convenience
> releases
> > to
> > > > be
> > > > >
> > > > > deployed to dist.apache.org [3], which are signed with the key
> with
> > > > >
> > > > > fingerprint B2D64016B940A7E0B9B72E0D7D0528B28037D8BC  [4],
> > > > >
> > > > > * all artifacts to be deployed to the Maven Central Repository [5],
> > > > >
> > > > > * source code tag "release-1.20.0-rc2" [6],
> > > > >
> > > > > * website pull request listing the new release and adding
> > announcement
> > > > blog
> > > > >
> > > > > post [7].
> > > > >
> > > > >
> > > > > The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > > >
> > > > > approval, with at least 3 PMC affirmative votes.
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354210
> > > > >
> > > > > [2] https://github.com/apache/flink/pull/25091
> > > > >
> > > > > [3] https://dist.apache.org/repos/dist/dev/flink/flink-1.20.0-rc2/
> > > > >
> > > > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > >
> > > > > [5]
> > > > >
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1752/
> > > > >
> > > > > [6]
> https://github.com/apache/flink/releases/tag/release-1.20.0-rc2
> > > > >
> > > > > [7] https://github.com/apache/flink-web/pull/751
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Robert, Rui, Ufuk, Weijie
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-35937) Helm RBAC cleanup

2024-07-31 Thread Tim (Jira)
Tim created FLINK-35937:
---

 Summary: Helm RBAC cleanup
 Key: FLINK-35937
 URL: https://issues.apache.org/jira/browse/FLINK-35937
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: 1.19.0
 Environment: Kubernetes 1.29.4
Reporter: Tim


This is a follow-up ticket to 
[https://issues.apache.org/jira/projects/FLINK/issues/FLINK-35310]

In further research and testing with Kyverno I found that some apiGroups 
appear to be invalid, and I removed them with this PR.

The "extensions" apiGroup does not exist on our recent cluster 
(Kubernetes 1.29.4). I'm not certain, but this might be related to these 
deprecation notices: 
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#deployment-v116

The same holds for the "finalizers" resources: they no longer seem to exist 
and lead to problems with our deployment, so I removed them as well.

To complete the verb list, I also added "deleteCollections" where applicable.

Please take a look and check whether this makes sense to you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Remove deprecated dataset based API from State Processor API

2024-07-31 Thread Gabor Somogyi
Hi Devs,

There is common agreement that the DataSet API is going to be removed in 2.0.

We're planning to invest quite some time in the State Processor API [1] in
the near future. Along the way we've realized that there are several
deprecated APIs in this area which are based on the DataSet API (for
example Savepoint, ExistingSavepoint, NewSavepoint, etc.) [2].

Based on the DataSet removal agreement these are going to be removed
anyway, but to ease our efforts it would be good to remove them now. The
question is whether it's fine to do this now, or whether there are reasons
not to.

Here are the collected facts about the APIs to help form an opinion:
* Example intended to remove class: Savepoint (others share the same
characteristics)
* API marker: PublicEvolving
* Marked deprecated in: FLINK-24912 [3]
* Deprecation date: 22/12/2021
* Deprecation in Flink version: 1.15.0
* Eligible for removal according to our deprecation process [4]: 1.16.0+
* New replacement APIs: SavepointReader, SavepointWriter (migration sketch below)
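
For reference, a minimal migration sketch (based on the 1.20 State Processor
API docs; the savepoint path, operator uid, and state name below are made up):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadSavepointExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Old, DataSet-based (deprecated):
        //   ExistingSavepoint sp = Savepoint.load(batchEnv, path, backend);
        // New, DataStream-based:
        SavepointReader savepoint =
                SavepointReader.read(env, "hdfs://path/to/savepoint", new HashMapStateBackend());

        // Read an operator's list state; uid and state name are made up.
        DataStream<Integer> listState =
                savepoint.readListState("my-operator-uid", "my-list-state", Types.INT);

        listState.print();
        env.execute("read-savepoint");
    }
}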

Please share your thoughts on this.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/libs/state_processor_api/
[2]
https://github.com/apache/flink/blob/82b628d4730eef32b2f7a022e3b73cb18f950e6e/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/Savepoint.java#L47-L49
[3] https://issues.apache.org/jira/browse/FLINK-24912
[4]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process

BR,
G


Re: [DISCUSS] FLIP-XXX Amazon SQS Source Connector

2024-07-31 Thread Saurabh Singh
Hi Ahmed,

Thank you very much for the detailed, valuable review. Please find our
responses below:


   - In the FLIP you mention the split is going to be 1 SQS queue; does
   this mean we would support reading from multiple queues? It is also not
   clear from the implementation of `addSplitsBack` whether we are planning to
   support multiple SQS queues or not.

*Our current implementation assumes that each source reads from a single
SQS queue. If you need to read from multiple SQS queues, you can define
multiple sources accordingly. We believe this approach is clearer and more
organized compared to having a single source switch between multiple
queues. This design choice is based on weighing the benefits, but we can
support multiple queues per source if the need arises.*

   - Regarding client creation, there has been some effort in the common
   `aws-util` module, like createAwsSyncClient; we should reuse that for
   `SqsClient` creation.

   *Thank you for bringing this to our attention. Yes, we will utilize the
   existing createClient methods available in the libraries. Our goal is to
   avoid any code duplication on our end.*


   - On the same point for clients, is there a reason the FLIP suggests
   async clients? Sync clients have proven more stable, and the source
   threading model already guarantees no blocking by sync clients.

*We were not aware of this, and we have been using async clients for our
in-house use cases. However, since we already have sync clients in the
aws-util that ensure no blocking, we are in a good position. We will use
these sync clients during our development and testing efforts, and we will
share the results and keep the community updated.*

   - On mentioning threading, the FLIP doesn’t mention the fetcher manager.
   Is it going to be `SingleThreadFetcherManager`? Would it be better to make
   the source reader extend `SingleThreadMultiplexSourceReaderBase`, or are we
   going to implement a simpler version?

*Yes, we are considering extending SingleThreadMultiplexSourceReaderBase
for the reader. We have included the implementation snippet in the FLIP for
reference.*

   - The FLIP doesn’t mention schema deserialization or the recordEmitter
   implementation. Are we going to use `deserializationSchema` or some sort of
   string-to-element converter? It is also not clear from the builder example
   provided.

*Yes, we plan to use the deserializationSchema and recordEmitter
implementations. We have included sample code for these in the FLIP for
reference.*
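
For illustration, a rough builder sketch in the spirit of the FLIP's examples
(`SqsSource` and its setters are the FLIP's proposed names, not a released
API):

// Hypothetical usage of the proposed connector; names follow the FLIP draft.
SqsSource<String> source =
        SqsSource.<String>builder()
                .setSqsUrl("https://sqs.us-east-1.amazonaws.com/23145433/sqs-test")
                .setDeserializationSchema(new SimpleStringSchema())
                .build();

DataStream<String> messages =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "sqs-source");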

   - Are the values mentioned in getSqsClientProperties recommended
   defaults? If so we should highlight that.

*The defaults are not decided yet. These are just illustrative sample values.*

   - Most importantly, I am a bit skeptical about enforcing exactly-once
   semantics with side effects, especially with the dependency on checkpointing
   configuration. Could we add a flag to disable it, and disable it by default
   if checkpointing is not enabled?

*During our initial design phase, we intended to enforce exactly-once
semantics via checkpoints. However, you raise a valid point, and we will
make this a configurable feature for users. They can choose to disable
exactly-once semantics, accepting some duplicate processing (at-least-once)
as a trade-off. We have updated the FLIP to include support for this
feature.*

   - I am not 100% convinced we should block this FLIP on the FLIP-438
   implementation, but I echo the fact that there might be some reusable code
   between the 2 submodules that we should make use of.

   *Yes we echo the same. We fully understand the concern and are closely
   examining the SQS Sink implementation. We will ensure there is no
   duplication of work or submodules. If any issues arise, we will address
   them promptly.*


Thank you for your valuable feedback on the FLIP. Your input is helping us
refine and improve it significantly.

[Main Proposal doc] -
https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit#heading=h.ci1rrcgbsvkl

Regards
Saurabh & Abhi


On Sun, Jul 28, 2024 at 3:04 AM Ahmed Hamdy  wrote:

> Hi Saurabh
> I think this is going to be a valuable addition which is needed.
> I have a couple of comments
> - In the FLIP you mention the split is going to be 1 sqs Queue, does this
> mean we would support reading from multiple queues? The builder example
> seems to support a single queue
> SqsSource.builder.setSqsUrl("
> https://sqs.us-east-1.amazonaws.com/23145433/sqs-test")
> - This is also not clear in the implementation of `addSplitsBack` whether
> we are planning to support multiple SQS queues or not.
> - Regarding client creation, there has been some effort in the common
> `aws-util` module, like createAwsSyncClient; we should reuse that for
> `SqsClient` creation.
> - On the same point for clients, is there a reason the FLIP suggests async
> clients? Sync clients have proven more stable and the source threading
> model already guarantees no blocking by sync clients.

Re: [VOTE] FLIP-471: Fixing watermark idleness timeout accounting

2024-07-31 Thread Timo Walther

+1 (binding)

Thanks for fixing this critical bug.

Regards,
Timo

On 31.07.24 09:51, Stefan Richter wrote:


+1 (binding)

Best,
Stefan




On 31. Jul 2024, at 04:56, Zakelly Lan  wrote:

+1 (binding)


Best,
Zakelly

On Wed, Jul 31, 2024 at 12:07 AM Piotr Nowojski 
wrote:


Hi all!

I would like to open the vote for FLIP-471 [1]. It has been discussed here
[2].

The vote will remain open for at least 72 hours.

Best,
Piotrek

[1] https://cwiki.apache.org/confluence/x/oQvOEg
[2] https://lists.apache.org/thread/byj1l2236rfx3mcl3v4374rcbkq4rf85








Re: [VOTE] Release 1.20.0, release candidate #2

2024-07-31 Thread Leonard Xu
+1 (binding)

- checked GitHub release tag
- verified signatures and checksums
- built from source code successfully
- reviewed the web PR, left a minor comment
- checked release notes; minor: there are some issues that need their Fix 
Version updated [1]
- started SQL Client, used the MySQL CDC connector to read a changelog from a 
database; the result is as expected

Best,
Leonard
[1] https://issues.apache.org/jira/projects/FLINK/versions/12354210


> On 31 Jul 2024, at 16:24, Qingsheng Ren wrote:
> 
> +1 (binding)
> 
> - Built from source
> - Reviewed web PR and release note
> - Verified checksum and signature
> - Checked GitHub release tag
> - Tested submitting SQL job with SQL client reading and writing Kafka
> 
> Best,
> Qingsheng
> 
> On Tue, Jul 30, 2024 at 2:26 PM Xintong Song  wrote:
> 
>> +1 (binding)
>> 
>> - reviewed flink-web PR
>> - verified checksum and signature
>> - verified source archives don't contain binaries
>> - built from source
>> - tried example jobs on a standalone cluster, and everything looks fine
>> 
>> Best,
>> 
>> Xintong
>> 
>> 
>> 
>> On Tue, Jul 30, 2024 at 12:13 AM Jing Ge 
>> wrote:
>> 
>>> Thanks Weijie!
>>> 
>>> +1 (binding)
>>> 
>>> - verified signatures
>>> - verified checksums
>>> - checked Github release tag
>>> - reviewed the PRs
>>> - checked the repo
>>> - started a local cluster, tried with WordCount, everything was fine.
>>> 
>>> Best regards,
>>> Jing
>>> 
>>> 
>>> On Mon, Jul 29, 2024 at 1:47 PM Samrat Deb 
>> wrote:
>>> 
 Thank you Weijie for driving 1.20 release
 
 +1 (non-binding)
 
 - Verified checksums and sha512
 - Verified signatures
 - Verified Github release tags
 - Built from source
 - Started the Flink cluster locally and ran a few jobs (StateMachine and
 WordCount)
 
 Bests,
 Samrat
 
 
 On Mon, Jul 29, 2024 at 3:15 PM Ahmed Hamdy 
>>> wrote:
 
> Thanks Weijie for driving
> 
> +1 (non-binding)
> 
> - Verified checksums
> - Verified signature matches Rui Fan's
> - Verified tag exists on Github
> - Built from source
> - Verified no binaries in source archive
> - Reviewed release notes PR (some nits)
> 
> Best Regards
> Ahmed Hamdy
> 
> 
> On Thu, 25 Jul 2024 at 12:21, weijie guo 
> wrote:
> 
>> Hi everyone,
>> 
>> 
>> Please review and vote on the release candidate #2 for the version
> 1.20.0,
>> 
>> as follows:
>> 
>> 
>> [ ] +1, Approve the release
>> 
>> [ ] -1, Do not approve the release (please provide specific
>> comments)
>> 
>> 
>> The complete staging area is available for your review, which
>>> includes:
>> 
>> * JIRA release notes [1], and the pull request adding release note
>>> for
>> users [2]
>> 
>> * the official Apache source release and binary convenience
>> releases
>>> to
> be
>> 
>> deployed to dist.apache.org [3], which are signed with the key
>> with
>> 
>> fingerprint B2D64016B940A7E0B9B72E0D7D0528B28037D8BC  [4],
>> 
>> * all artifacts to be deployed to the Maven Central Repository [5],
>> 
>> * source code tag "release-1.20.0-rc2" [6],
>> 
>> * website pull request listing the new release and adding
>>> announcement
> blog
>> 
>> post [7].
>> 
>> 
>> The vote will be open for at least 72 hours. It is adopted by
>>> majority
>> 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> 
>> [1]
>> 
>> 
> 
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354210
>> 
>> [2] https://github.com/apache/flink/pull/25091
>> 
>> [3] https://dist.apache.org/repos/dist/dev/flink/flink-1.20.0-rc2/
>> 
>> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
>> 
>> [5]
>> 
 
>> https://repository.apache.org/content/repositories/orgapacheflink-1752/
>> 
>> [6]
>> https://github.com/apache/flink/releases/tag/release-1.20.0-rc2
>> 
>> [7] https://github.com/apache/flink-web/pull/751
>> 
>> 
>> Best,
>> 
>> Robert, Rui, Ufuk, Weijie
>> 
> 
 
>>> 
>> 



[jira] [Created] (FLINK-35938) Avoid committing the same datafile again in Paimon Sink.

2024-07-31 Thread LvYanquan (Jira)
LvYanquan created FLINK-35938:
-

 Summary: Avoid committing the same datafile again in Paimon Sink.
 Key: FLINK-35938
 URL: https://issues.apache.org/jira/browse/FLINK-35938
 Project: Flink
  Issue Type: Technical Debt
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.2.0
 Attachments: image-2024-07-31-19-45-14-153.png

[Flink will re-commit 
committables|https://github.com/apache/flink/blob/82b628d4730eef32b2f7a022e3b73cb18f950e6e/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/connector/sink2/GlobalCommitterOperator.java#L148]
 when a job restarts from a failure. This may cause the same datafile to be added 
twice in the current PaimonCommitter.




!image-2024-07-31-19-45-14-153.png!
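
One possible direction (purely illustrative; the class and method names below 
are hypothetical, not the actual PaimonCommitter API) is to track 
already-committed datafiles and filter them out before re-committing:
{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: make retried commits idempotent by skipping
// datafiles that are already part of a committed snapshot.
final class DedupingCommitHelper {
    // In practice this would be rebuilt from the table's committed snapshots.
    private final Set<String> committedFiles = new HashSet<>();

    List<String> filterAlreadyCommitted(List<String> dataFiles) {
        return dataFiles.stream()
                .filter(file -> !committedFiles.contains(file))
                .collect(Collectors.toList());
    }

    void markCommitted(List<String> dataFiles) {
        committedFiles.addAll(dataFiles);
    }
}
{code}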



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] Release 1.20.0, release candidate #2

2024-07-31 Thread weijie guo
Hi everyone,


I'm happy to announce that we have unanimously approved this release.


There are 7 approving votes, 5 of which are binding:


- Rui Fan (binding)

- Ahmed Hamdy (non-binding)

- Samrat Deb (non-binding)

- Jing Ge (binding)

- Xintong Song (binding)

- Qingsheng Ren (binding)

- Leonard Xu (binding)


There are no disapproving votes.


Thank you for verifying the release candidate. We will now proceed

to finalize the release and announce it once everything is published.



Best,

Robert, Rui, Ufuk, Weijie


[jira] [Created] (FLINK-35939) Do not set empty config values via ConfigUtils#encodeCollectionToConfig

2024-07-31 Thread Ferenc Csaky (Jira)
Ferenc Csaky created FLINK-35939:


 Summary: Do not set empty config values via 
ConfigUtils#encodeCollectionToConfig
 Key: FLINK-35939
 URL: https://issues.apache.org/jira/browse/FLINK-35939
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.19.1
Reporter: Ferenc Csaky
 Fix For: 2.0.0


The {{ConfigUtils#encodeCollectionToConfig}} function only skips setting a given 
{{ConfigOption}} value if that value is null. If the passed collection is 
empty, it will set that empty collection.

I think this behavior is less logical; it would cause fewer undesired 
situations if we only set a value when it is not empty AND not null.

Furthermore, the method's 
[javadoc|https://github.com/apache/flink/blob/82b628d4730eef32b2f7a022e3b73cb18f950e6e/flink-core/src/main/java/org/apache/flink/configuration/ConfigUtils.java#L73]
 describes the logic I just mentioned, which conflicts with the 
actual implementation and tests, which set an empty collection.
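
For illustration, a minimal sketch of the proposed guard (the signature is 
simplified here; the real method also takes a transformer function):
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.Configuration;

public final class ConfigUtilsSketch {

    // Simplified sketch of the proposed behavior, not the actual signature.
    public static <T> void encodeCollectionToConfig(
            Configuration config, ConfigOption<List<T>> option, Collection<T> values) {
        // Proposed: skip empty collections as well as null, matching the javadoc.
        if (values == null || values.isEmpty()) {
            return;
        }
        config.set(option, new ArrayList<>(values));
    }
}
{code}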



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-471: Fixing watermark idleness timeout accounting

2024-07-31 Thread Rui Fan
+1(binding)

Best,
Rui

On Wed, Jul 31, 2024 at 17:47 Timo Walther wrote:

> +1 (binding)
>
> Thanks for fixing this critical bug.
>
> Regards,
> Timo
>
> On 31.07.24 09:51, Stefan Richter wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Stefan
> >
> >
> >
> >> On 31. Jul 2024, at 04:56, Zakelly Lan  wrote:
> >>
> >> +1 (binding)
> >>
> >>
> >> Best,
> >> Zakelly
> >>
> >> On Wed, Jul 31, 2024 at 12:07 AM Piotr Nowojski 
> >> wrote:
> >>
> >>> Hi all!
> >>>
> >>> I would like to open the vote for FLIP-471 [1]. It has been discussed
> here
> >>> [2].
> >>>
> >>> The vote will remain open for at least 72 hours.
> >>>
> >>> Best,
> >>> Piotrek
> >>>
> >>> [1]
> https://cwiki.apache.org/confluence/x/oQvOEg
> >>> [2]
> https://lists.apache.org/thread/byj1l2236rfx3mcl3v4374rcbkq4rf85
> >>>
> >
> >
>
>


Re: [DISCUSS] Remove deprecated dataset based API from State Processor API

2024-07-31 Thread Gyula Fóra
Hi!
Thanks Gabor for looking into this.

+1 for removing the DataSet based APIs from the state processor in the next
Flink version, I don't think we should wait until 2.0.
This will also reduce the 2.0 scope overall which is always good :)

Cheers,
Gyula

On Wed, Jul 31, 2024 at 10:30 AM Gabor Somogyi 
wrote:

> Hi Devs,
>
> There is common agreement that the DataSet API is going to be removed in
> 2.0.
>
> We're planning to invest quite some time in the State Processor API [1] in
> the near future. Along the way we've realized that there are several
> deprecated APIs in this area which are based on the DataSet API (for
> example Savepoint, ExistingSavepoint, NewSavepoint, etc.) [2].
>
> Based on the DataSet removal agreement these are going to be removed
> anyway, but to ease our efforts it would be good to remove them now. The
> question is whether it's fine to do this now, or whether there are reasons
> not to.
>
> Here are the collected facts about the APIs to help form an opinion:
> * Example intended to remove class: Savepoint (others share the same
> characteristics)
> * API marker: PublicEvolving
> * Marked deprecated in: FLINK-24912 [3]
> * Deprecation date: 22/12/2021
> * Deprecation in Flink version: 1.15.0
> * Eligible for removal according to our deprecation process [4]: 1.16.0+
> * New replacement APIs: SavepointReader, SavepointWriter
>
> Please share your thoughts on this.
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/libs/state_processor_api/
> [2]
>
> https://github.com/apache/flink/blob/82b628d4730eef32b2f7a022e3b73cb18f950e6e/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/Savepoint.java#L47-L49
> [3] https://issues.apache.org/jira/browse/FLINK-24912
> [4]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process
>
> BR,
> G
>


[jira] [Created] (FLINK-35940) Upgrade log4j dependencies to 2.23.1

2024-07-31 Thread Daniel Burrell (Jira)
Daniel Burrell created FLINK-35940:
--

 Summary: Upgrade log4j dependencies to 2.23.1
 Key: FLINK-35940
 URL: https://issues.apache.org/jira/browse/FLINK-35940
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.19.1
 Environment: N/A
Reporter: Daniel Burrell


There is a need to upgrade log4j dependencies as the current version has a 
vulnerability.

I propose upgrading to 2.23.1 which is the latest 2.x version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35941) Add CompiledPlan annotations to BatchExecLimit

2024-07-31 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-35941:
--

 Summary: Add CompiledPlan annotations to BatchExecLimit
 Key: FLINK-35941
 URL: https://issues.apache.org/jira/browse/FLINK-35941
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes


In addition to the annotations, implement the BatchCompiledPlan test for this 
operator.

Additionally, tests for the TableSource operator will be pulled into this work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35942) Add CompiledPlan annotations to BatchExecSort

2024-07-31 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-35942:
--

 Summary: Add CompiledPlan annotations to BatchExecSort
 Key: FLINK-35942
 URL: https://issues.apache.org/jira/browse/FLINK-35942
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes


In addition to the annotations, implement the BatchCompiledPlan test for this 
operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35943) Add CompiledPlan annotations to BatchExecHashJoin and BatchExecNestedLoopJoin

2024-07-31 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-35943:
--

 Summary: Add CompiledPlan annotations to BatchExecHashJoin and 
BatchExecNestedLoopJoin
 Key: FLINK-35943
 URL: https://issues.apache.org/jira/browse/FLINK-35943
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes


In addition to the annotations, implement the BatchCompiledPlan test for these 
two operators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35944) Add CompiledPlan annotations to BatchExecUnion

2024-07-31 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-35944:
--

 Summary: Add CompiledPlan annotations to BatchExecUnion
 Key: FLINK-35944
 URL: https://issues.apache.org/jira/browse/FLINK-35944
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes


In addition to the annotations, implement the BatchCompiledPlan test for this 
operator.

For this operator, the BatchExecHashAggregate operator must be annotated as 
well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35945) Using Flink to dock with Kafka data sources without enabling checkpoints cannot browse consumer group information in Kafka

2024-07-31 Thread Jira
任铭睿 created FLINK-35945:
---

 Summary: Using Flink to dock with Kafka data sources without 
enabling checkpoints cannot browse consumer group information in Kafka
 Key: FLINK-35945
 URL: https://issues.apache.org/jira/browse/FLINK-35945
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Reporter: 任铭睿


When Flink consumes Kafka data sources without checkpointing enabled, consumer 
group information cannot be browsed in Kafka.
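
By default the KafkaSource only commits offsets back to Kafka when a checkpoint 
completes, so without checkpointing the consumer group never becomes visible. A 
sketch of the commonly suggested workaround (broker, topic, and group id below 
are placeholders):
{code:java}
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

// Sketch: without checkpointing, let the underlying consumer auto-commit
// so the group becomes visible to Kafka's consumer-group tooling.
KafkaSource<String> source =
        KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")   // placeholder
                .setTopics("input-topic")             // placeholder
                .setGroupId("my-consumer-group")      // placeholder
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .setProperty("enable.auto.commit", "true")
                .setProperty("auto.commit.interval.ms", "5000")
                .build();
{code}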



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-470: Support Adaptive Broadcast Join

2024-07-31 Thread Xia Sun
Hi Lincoln,

Thanks for your detailed explanation. I understand your concern.
Introducing configuration with redundant semantics can indeed confuse
users, and the engine should minimize user exposure to these details. On
that premise, while still ensuring that users can choose to enable the
broadcast hash join optimization at either compile time or runtime,
I think we can introduce a new configuration
`table.optimizer.adaptive-broadcast-join.strategy` and reuse the existing
configuration `table.optimizer.join.broadcast-threshold` as a unified
threshold for the broadcast hash join optimization. The
`table.optimizer.adaptive-broadcast-join.strategy` configuration would be
an enumeration type with three options:

AUTO: Flink will autonomously select the optimal timing for the
optimization.
RUNTIME_ONLY: The broadcast hash join optimization will only be performed
at runtime.
NONE: The broadcast hash join optimization will only be performed at the
compile phase (no runtime adaptation).
AUTO will be the default option.

I have also updated this information in the FLIP, PTAL.
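
To make it concrete, a sketch of how the options would be set (the strategy
key and its enum values are only this proposal, not a released API; the
threshold option already exists):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

// Proposed in this thread -- not part of any release yet:
tEnv.getConfig().getConfiguration()
        .setString("table.optimizer.adaptive-broadcast-join.strategy", "RUNTIME_ONLY");

// Existing option, reused as the unified broadcast size threshold (in bytes):
tEnv.getConfig().getConfiguration()
        .setString("table.optimizer.join.broadcast-threshold", "10485760");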

Best,
Xia

On Tue, Jul 30, 2024 at 23:39 Lincoln Lee wrote:

> Thanks Xia for your explanation!
>
> I can understand your concern. But considering the design of this FLIP,
> which already covers runtime de-optimization of inaccurate compile-time
> optimizations, is it necessary to make the user manually turn off
> 'table.optimizer.join.broadcast-threshold' and then set the new
> 'table.optimizer.adaptive.join.broadcast-threshold' as well? Another option
> is that users only need to focus on the existing broadcast size threshold
> and accept the reality that 100% accurate optimization cannot be done
> at compile time, adopting the new capability of dynamic optimization at
> runtime; ultimately, users will trust that Flink will always optimize
> accurately. From this point of view, I would prefer a generic parameter
> 'table.optimizer.adaptive-optimization.enabled', which would allow for
> more dynamic optimization in the future, not limited to broadcast join
> scenarios, and would not continually bring in more new options. WDYT?
>
>
> Best,
> Lincoln Lee
>
>
> On Tue, Jul 30, 2024 at 11:27 Xia Sun wrote:
>
> > Hi Lincoln,
> >
> > Thank you for your input and participation in the discussion!
> >
> > Compared to introducing the 'table.optimizer.adaptive-join.enabled'
> option,
> > introducing the "table.optimizer.adaptive.join.broadcast-threshold" can
> > also cover the need to disable static broadcast optimization while only
> > enabling dynamic broadcast optimization. From this perspective,
> introducing
> > a new threshold configuration might be more appropriate. What do you
> think?
> >
> > Best,
> > Xia
> >
> > On Mon, Jul 29, 2024 at 23:12 Lincoln Lee wrote:
> >
> > > +1 for this useful optimization!
> > >
> > > I have a question about the new option: do we really need two broadcast
> > > join thresholds? IIUC, this adaptive broadcast join is a complement to
> > > compile-time optimization, so there is no need for the user to configure
> > > two different thresholds (besides the "off" value represented by -1); we
> > > just want to control the adaptive optimization itself. Should we provide
> > > a configuration option like 'table.optimizer.adaptive-join.enabled' or a
> > > more general 'table.optimizer.adaptive-optimization.enabled' for such
> > > related optimizations?
> > >
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > On Fri, Jul 26, 2024 at 11:59 Ron Liu wrote:
> > >
> > > > Hi, Xia
> > > >
> > > > Thanks for your reply. It looks good to me.
> > > >
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > On Fri, Jul 26, 2024 at 10:49 Xia Sun wrote:
> > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for your feedback!
> > > > >
> > > > > -> creation of the join operators until runtime
> > > > >
> > > > >
> > > > > That means when creating the AdaptiveJoinOperatorFactory, we will
> not
> > > > > immediately create the JoinOperator. Instead, we only pass in the
> > > > necessary
> > > > > parameters for creating the JoinOperator. The appropriate
> > JoinOperator
> > > > will
> > > > > be created during the StreamGraphOptimizationStrategy optimization
> > > phase.
> > > > >
> > > > > You mentioned that the runtime's visibility into the table planner
> is
> > > > > indeed an issue. It includes two aspects,
> > > > > (1) we plan to place both implementations of the
> > > > > AdaptiveBroadcastJoinOptimizationStrategy and
> > > AdaptiveJoinOperatorFactory
> > > > > in the table layer. During the runtime phase, we will obtain the
> > > > > AdaptiveBroadcastJoinOptimizationStrategy through class loading.
> > > > Therefore,
> > > > > the flink-runtime does not need to be aware of the table layer's
> > > > > implementation.
> > > > > (2) Since the dynamic codegen in the AdaptiveJoinOperatorFactory
> > needs
> > > to
> > > > > be aware of the table planner, we will consider placing the
> > > > > AdaptiveJoinOperatorFactory in the table planner module as well.
> > > > >
> > > > >
> > > > >  -> When did you 

[jira] [Created] (FLINK-35946) Finalize release 1.20.0

2024-07-31 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35946:
--

 Summary: Finalize release 1.20.0
 Key: FLINK-35946
 URL: https://issues.apache.org/jira/browse/FLINK-35946
 Project: Flink
  Issue Type: New Feature
Reporter: Weijie Guo
Assignee: lincoln lee
 Fix For: 1.20.0


Once the release candidate has been reviewed and approved by the community, the 
release should be finalized. This involves the final deployment of the release 
candidate to the release repositories, merging of the website changes, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35947) CLONE - Deploy Python artifacts to PyPI

2024-07-31 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35947:
--

 Summary: CLONE - Deploy Python artifacts to PyPI
 Key: FLINK-35947
 URL: https://issues.apache.org/jira/browse/FLINK-35947
 Project: Flink
  Issue Type: Sub-task
Reporter: Weijie Guo
Assignee: lincoln lee


The release manager should create a PyPI account and ask the PMC to add this 
account to the pyflink collaborator list with the Maintainer role (the PyPI admin 
account info can be found here. NOTE, only visible to PMC members) to deploy the 
Python artifacts to PyPI. The artifacts can be uploaded using 
twine ([https://pypi.org/project/twine/]). To install twine, just run:
{code:bash}
pip install --upgrade twine==1.12.0
{code}
Download the Python artifacts from dist.apache.org and upload them to pypi.org:
{code:bash}
svn checkout 
https://dist.apache.org/repos/dist/dev/flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
cd flink-${RELEASE_VERSION}-rc${RC_NUM}
 
cd python
 
#uploads wheels
for f in *.whl; do twine upload --repository-url 
https://upload.pypi.org/legacy/ $f $f.asc; done
 
#upload source packages
twine upload --repository-url https://upload.pypi.org/legacy/ 
apache-flink-libraries-${RELEASE_VERSION}.tar.gz 
apache-flink-libraries-${RELEASE_VERSION}.tar.gz.asc
 
twine upload --repository-url https://upload.pypi.org/legacy/ 
apache-flink-${RELEASE_VERSION}.tar.gz 
apache-flink-${RELEASE_VERSION}.tar.gz.asc
{code}
If the upload failed or was incorrect for some reason (e.g. a network transmission 
problem), you need to delete the previously uploaded release package of the same 
version (if it exists), rename the artifact to 
{{apache-flink-${RELEASE_VERSION}.post0.tar.gz}}, and then re-upload.

(!) Note: re-uploading to pypi.org must be avoided as much as possible because 
it will cause some irreparable problems. If that happens, users cannot install 
the apache-flink package by explicitly specifying the package version, i.e. the 
following command "pip install apache-flink==${RELEASE_VERSION}" will fail. 
Instead they have to run "pip install apache-flink" or "pip install 
apache-flink==${RELEASE_VERSION}.post0" to install the apache-flink package.

 

h3. Expectations
 * Python artifacts released and indexed in the 
[PyPI|https://pypi.org/project/apache-flink/] Repository



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35949) CLONE - Create Git tag and mark version as released in Jira

2024-07-31 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35949:
--

 Summary: CLONE - Create Git tag and mark version as released in 
Jira
 Key: FLINK-35949
 URL: https://issues.apache.org/jira/browse/FLINK-35949
 Project: Flink
  Issue Type: Sub-task
Reporter: Weijie Guo
Assignee: lincoln lee


Create and push a new Git tag for the released version by copying the tag for 
the final release candidate, as follows:
{code:bash}
$ git tag -s "release-${RELEASE_VERSION}" refs/tags/${TAG}^{} -m "Release Flink 
${RELEASE_VERSION}"
$ git push  refs/tags/release-${RELEASE_VERSION}
{code}
In JIRA, inside [version 
management|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions],
 hover over the current release and a settings menu will appear. Click Release, 
and select today’s date.

(Note: Only PMC members have access to the project administration. If you do 
not have access, ask on the mailing list for assistance.)

If PRs have been merged to the release branch after the last release 
candidate was tagged, make sure that the corresponding Jira tickets have the 
correct Fix Version set.

 

h3. Expectations
 * Release tagged in the source code repository
 * Release version finalized in JIRA. (Note: Not all committers have 
administrator access to JIRA. If you end up getting permissions errors ask on 
the mailing list for assistance)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35948) CLONE - Deploy artifacts to Maven Central Repository

2024-07-31 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35948:
--

 Summary: CLONE - Deploy artifacts to Maven Central Repository
 Key: FLINK-35948
 URL: https://issues.apache.org/jira/browse/FLINK-35948
 Project: Flink
  Issue Type: Sub-task
Reporter: Weijie Guo
Assignee: lincoln lee


Use the [Apache Nexus repository|https://repository.apache.org/] to release the 
staged binary artifacts to the Maven Central repository. In the Staging 
Repositories section, find the relevant release candidate orgapacheflink-XXX 
entry and click Release. Drop all other release candidates that are not being 
released.
h3. Deploy source and binary releases to dist.apache.org

Copy the source and binary releases from the dev repository to the release 
repository at [dist.apache.org|http://dist.apache.org/] using Subversion.
{code:bash}
$ svn move -m "Release Flink ${RELEASE_VERSION}" 
https://dist.apache.org/repos/dist/dev/flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
 https://dist.apache.org/repos/dist/release/flink/flink-${RELEASE_VERSION}
{code}
(Note: Only PMC members have access to the release repository. If you do not 
have access, ask on the mailing list for assistance.)
h3. Remove old release candidates from [dist.apache.org|http://dist.apache.org/]

Remove the old release candidates from 
[https://dist.apache.org/repos/dist/dev/flink] using Subversion.
{code:bash}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
$ cd flink
$ svn remove flink-${RELEASE_VERSION}-rc*
$ svn commit -m "Remove old release candidates for Apache Flink 
${RELEASE_VERSION}"
{code}
 

h3. Expectations
 * Maven artifacts released and indexed in the [Maven Central 
Repository|https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.flink%22]
 (usually takes about a day to show up)
 * Source & binary distributions available in the release repository of 
[https://dist.apache.org/repos/dist/release/flink/]
 * Dev repository [https://dist.apache.org/repos/dist/dev/flink/] is empty
 * Website contains links to new release binaries and sources in download page
 * (for minor version updates) the front page references the correct new major 
release version and directs to the correct link



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35950) CLONE - Publish the Dockerfiles for the new release

2024-07-31 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35950:
--

 Summary: CLONE - Publish the Dockerfiles for the new release
 Key: FLINK-35950
 URL: https://issues.apache.org/jira/browse/FLINK-35950
 Project: Flink
  Issue Type: Sub-task
Reporter: Weijie Guo
Assignee: lincoln lee


Note: the official Dockerfiles fetch the binary distribution of the target 
Flink version from an Apache mirror. After publishing the binary release 
artifacts, mirrors can take some hours to start serving the new artifacts, so 
you may want to wait to do this step until you are ready to continue with the 
"Promote the release" steps in the follow-up Jira.

Follow the [release instructions in the flink-docker 
repo|https://github.com/apache/flink-docker#release-workflow] to build the new 
Dockerfiles and send an updated manifest to Docker Hub so the new images are 
built and published.

 

h3. Expectations

 * Dockerfiles in [flink-docker|https://github.com/apache/flink-docker] updated 
for the new Flink release and pull request opened on the Docker official-images 
with an updated manifest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)