Re: [SUMMARY] Flink 1.17 release sync 23rd of January, 2023

2023-01-26 Thread Martijn Visser
Hi Lijie,

You're right, that's not clearly phrased. Ideally we would like to get it
fixed before the 26th, because that gives us a couple of days to monitor
the benchmarks to see if the regression is actually fixed. If a contributor
is actively working on a ticket, we could consider delaying the branch cut
(not the feature freeze) until the issue is resolved. That is of
course unless the fix would take more than a week after the
31st; in that case, it makes more sense to revert the commit that introduced the
regression.

Let me know what you think.

Best regards,

Martijn

Op do 26 jan. 2023 om 01:11 schreef Lijie Wang :

> Hi Martijn,
>
> I'm working on FLINK-30624,  and it may take a while to be resolved. Do you
> mean we should resolve it before the 26th? I used to think the deadline was
> the 31st(the date of feature freeze).
>
> Best,
> Lijie
>
> Martijn Visser  于2023年1月25日周三 18:07写道:
>
> > Hi everyone,
> >
> > A summary of the release sync of yesterday:
> >
> > - We still have 3 performance regressions (
> > https://issues.apache.org/jira/browse/FLINK-30623,
> > https://issues.apache.org/jira/browse/FLINK-30625,
> > https://issues.apache.org/jira/browse/FLINK-30624) that are being worked
> > on
> > but need to be completed before the release branch cut on the 31st. If we
> > can't merge PRs to resolve this (at latest on Thursday the 26th) we will
> > revert the commits that introduced the regressions.
> > - There are 3 release blockers from a test perspective:
> > https://issues.apache.org/jira/browse/FLINK-29405,
> > https://issues.apache.org/jira/browse/FLINK-30727 and
https://issues.apache.org/jira/browse/FLINK-29427. Please make sure that
if you are assigned to one of these tickets, you have marked the ticket as "In
Progress".
> > - The feature freeze starts on Thursday the 31st of January and the
> release
> > branch will be cut as soon as the blockers have been resolved. When the
> > release branch has been cut, the release testing will start.
> >
> > Best regards,
> >
> > Qingsheng, Leonard, Matthias and Martijn
> >
>


[jira] [Created] (FLINK-30794) Disable dependency convergence check for nightly connector builds

2023-01-26 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-30794:
--

 Summary: Disable dependency convergence check for nightly connector builds
 Key: FLINK-30794
 URL: https://issues.apache.org/jira/browse/FLINK-30794
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Common
Reporter: Martijn Visser
Assignee: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Confluent Avro and Debezium formats - default schema name can be incompatible with registered schema name

2023-01-26 Thread Fruzsina Nagy
Hi everyone,

I have come across the issue below while experimenting with the Confluent
Schema Registry and the avro-confluent and debezium-avro-confluent formats.
Please let me know your thoughts on it. Should this issue be addressed?

Thanks in advance,
Fruzsina
The use case

Create a new topic on Confluent Cloud
Create a value schema with the name “sampleRecord”:
{
  "type": "record",
  "namespace": "com.mycorp.mynamespace",
  "name": "sampleRecord",
…}
Create a table with the “avro-confluent” format:
CREATE TABLE `newtesttopic` (
 `my_field1` INT NOT NULL,
 `my_field2` DOUBLE NOT NULL,
 `my_field3` VARCHAR(2147483647) NOT NULL
) WITH (
 'connector' = 'kafka',
 'topic' = 'newtesttopic',
 'scan.startup.mode' = 'latest-offset',
 'properties.bootstrap.servers' = 'bootstrapServers',
 'properties.sasl.jaas.config' = 'saslJaasConfig',
 'properties.sasl.mechanism' = 'PLAIN',
 'properties.security.protocol' = 'SASL_SSL',
 'format' = 'avro-confluent',
 'avro-confluent.url' = 'confluentSchemaRegUrl',
 'avro-confluent.basic-auth.credentials-source' = 'USER_INFO',
 'avro-confluent.basic-auth.user-info' = 'user:pw')

Insert data into the “newtesttopic” topic.
The following error is thrown:
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
Schema being registered is incompatible with an earlier schema for subject 
"newtesttopic-value", details: [Incompatibility{type:NAME_MISMATCH, 
location:/name, message:expected: com.mycorp.mynamespace.sampleRecord, 
reader:{"type":"record","name":"record",...}, 
writer:{"type":"record","name":"sampleRecord",...}
This error can of course be avoided if we don’t register a schema for our topic
on the Confluent Cloud site before inserting data into the Kafka table, and
instead let Flink register it for us with the name “record”.
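Conceptually, the registry rejects the write because the full name of the schema Flink tries to register must match the full name of the already-registered schema. A minimal, hedged sketch of that check (simplified schema dicts, not the actual Confluent client code):

```python
def full_name(schema: dict) -> str:
    """Return the Avro full name: namespace.name when a namespace is set."""
    ns = schema.get("namespace")
    return f"{ns}.{schema['name']}" if ns else schema["name"]

def check_name_compatibility(registered: dict, candidate: dict) -> list:
    """Mimic the NAME_MISMATCH check: record schemas must agree on full name."""
    issues = []
    if registered["type"] == "record" and candidate["type"] == "record":
        if full_name(registered) != full_name(candidate):
            issues.append({
                "type": "NAME_MISMATCH",
                "expected": full_name(registered),
                "found": full_name(candidate),
            })
    return issues

registered = {"type": "record", "namespace": "com.mycorp.mynamespace",
              "name": "sampleRecord", "fields": []}
# Flink's default conversion names the top-level record "record":
candidate = {"type": "record", "name": "record", "fields": []}

print(check_name_compatibility(registered, candidate))
# -> [{'type': 'NAME_MISMATCH', 'expected': 'com.mycorp.mynamespace.sampleRecord', 'found': 'record'}]
```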

The cause of the error

I found that the error is caused by the
EncodingFormat<SerializationSchema<RowData>> created by
RegistryAvroFormatFactory.createEncodingFormat: when creating an
AvroRowDataSerializationSchema, it uses
AvroSchemaConverter.convertToSchema(LogicalType schema),
which names the schema “record” by default.

But the registered schema is named “sampleRecord” in the above example, so the 
Confluent Schema Registry client doesn’t accept it.
The problem

To resolve this, I added a new “schema-name” option to the “avro-confluent” and
“debezium-avro-confluent” formats. But as I was testing the
“debezium-avro-confluent” format, it turned out that this solution doesn’t
solve the problem when named schemas (record, enum, and
fixed types) are nested in the schema of the topic.

For example:
In the case of “debezium-avro-confluent”, the schema created is a union of null
and a Debezium-specific record schema (before, after, op). If I use the above
option to provide a specific name for the schema, I get an
org.apache.avro.UnresolvedUnionException, because
AvroRowDataSerializationSchema converts the RowType to a record schema with the
name “record”, which will not be found in the union if the Debezium-specific
record has a different name.
The union type is problematic because, in the general case, if we define a union
schema [schema1, schema2], meaning that the schema is either schema1 or schema2,
we must somehow determine which schema we are converting the RowType to.

In the case of nested named schemas, Flink creates a name based on the record
name and the field name. The Schema Registry client will also throw an error in
this case if the registered names don’t match.
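To illustrate the nested naming scheme: a hedged sketch of how a converter might derive nested record names from the parent record name plus the field name (the `<parent>_<field>` pattern and function name here are illustrative assumptions, not the exact Flink implementation):

```python
def convert_row_to_avro(row_fields: dict, row_name: str = "record") -> dict:
    """Convert a simplified ROW type to an Avro record schema.
    Nested rows get derived names: <parent>_<field> (illustrative pattern)."""
    fields = []
    for field_name, field_type in row_fields.items():
        if isinstance(field_type, dict):  # nested ROW type
            nested = convert_row_to_avro(field_type, f"{row_name}_{field_name}")
            fields.append({"name": field_name, "type": nested})
        else:
            fields.append({"name": field_name, "type": field_type})
    return {"type": "record", "name": row_name, "fields": fields}

schema = convert_row_to_avro({"id": "int", "address": {"city": "string"}})
print(schema["name"])                        # record
print(schema["fields"][1]["type"]["name"])   # record_address
```

Any externally registered schema whose nested records are named differently (e.g. "Address" instead of "record_address") would then fail the registry's name check, which is why a single top-level schema-name option is not enough.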

Possible solutions

1. Look up the names of the schemas in the field comments, e.g. if there is a
field of type ROW with the comment “avro-name = recordname”, we can use this
name when converting the LogicalType to an Avro schema. For the schema of the
topic/table itself, there could be a schema-name option, or the name could be
defined in the table comment.
2. Use further table options to define the schema names, e.g.
‘avro-confluent.schema-name.record.nested_record’ = ‘nested_record_name’ (where
record and nested_record are field names). In this case, the schema-name option
is suffixed with the path to the named schema.
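For the second option, the format factory would need to parse the suffixed option keys into a mapping from field path to schema name. A hedged sketch (the option-key layout follows the proposal above; the parsing logic itself is an assumption):

```python
def parse_schema_name_options(options: dict,
                              prefix: str = "avro-confluent.schema-name") -> dict:
    """Map a field path (tuple of field names) to the Avro schema name to use.
    The empty path () addresses the top-level record schema."""
    names = {}
    for key, value in options.items():
        if key == prefix:
            names[()] = value                              # top-level schema name
        elif key.startswith(prefix + "."):
            path = tuple(key[len(prefix) + 1:].split("."))  # path to nested schema
            names[path] = value
    return names

options = {
    "format": "avro-confluent",
    "avro-confluent.schema-name": "sampleRecord",
    "avro-confluent.schema-name.record.nested_record": "nested_record_name",
}
print(parse_schema_name_options(options))
# {(): 'sampleRecord', ('record', 'nested_record'): 'nested_record_name'}
```

The converter could then consult this map instead of generating default names, falling back to the current behavior when no entry exists for a path.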



Re: [VOTE] FLIP-285: Refactoring LeaderElection to make Flink support multi-component leader election out-of-the-box

2023-01-26 Thread Chesnay Schepler

+1

On 25/01/2023 10:33, Matthias Pohl wrote:

Hi everyone,
After the discussion thread [1] on FLIP-285 [2] didn't bring up any new
items, I want to start voting on FLIP-285. This FLIP will not only align
the leader election code base again through FLINK-26522 [3]. I also plan to
improve the test coverage for the leader election as part of this change
(covered in FLINK-30338 [4]).

The vote will remain open until at least Jan 30th (at least 72 hours)
unless there are some objections or insufficient votes.

Best,
Matthias

[1] https://lists.apache.org/thread/qrl881wykob3jnmzsof5ho8b9fgkklpt
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box
[3] https://issues.apache.org/jira/browse/FLINK-26522
[4] https://issues.apache.org/jira/browse/FLINK-30338





Re: Reworking the Rescale API

2023-01-26 Thread Maximilian Michels
Hey ConradJam,

Thank you for your thoughtful response. It would be great to start writing
a FLIP for the Rescale API. If you want to take a stab, please go ahead,
I'd be happy to review. I'm sure Gyula or others will also chime in.

I want to answer your questions so we are aligned:

● Does scaling work on YARN, or just k8s?
>

I think it should work for both YARN and K8s. We would have to make changes
to the drivers (AbstractResourceManagerDriver) which is implemented for
both K8s and YARN. The outlined approach for rescaling does not require
integrating with those systems, just maybe updating how the driver is used,
so we should be able to make it work across both YARN and K8s.

● Rescaling supports Standalone mode?
>

Yes, I think it should and easily can. We do use a different type of
resource manager (StandaloneResourceManager, not ActiveResourceManager) but
I think the logic will sit on a higher level where the ResourceManager
implementation is not relevant.

● Can we simplify the recovery steps?
>

For the first version, I would prefer the simple approach of (1) acquiring
the required slots for rescaling, then (2) trigger a stop with savepoint
(3) resubmit the job with updated parallelisms. What you have in mind is a
bit more involved but certainly a great optimization, especially when only
a fraction of the job state needs to be repartitioned.
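The three-step flow above can be sketched as a small orchestration function; the `Job` type and `rescale` function below are illustrative assumptions, not actual Flink APIs:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Job:
    job_id: str
    parallelism: int
    savepoint: Optional[str] = None

def rescale(job: Job, new_parallelism: int, free_slots: int) -> Job:
    """Hedged sketch of the proposed rescale flow:
    1. acquire the slots needed for the new parallelism,
    2. stop the job with a savepoint,
    3. resubmit the job with the updated parallelism from that savepoint.
    Failing the slot pre-check leaves the job untouched, which is why
    no rollback mechanism is required."""
    extra_slots = max(0, new_parallelism - job.parallelism)
    if extra_slots > free_slots:                      # step 1: resource pre-check
        raise RuntimeError("not enough free slots; job left untouched")
    savepoint = f"savepoint-{job.job_id}"             # step 2: stop-with-savepoint
    return Job(job.job_id, new_parallelism, savepoint)  # step 3: resubmit

job = rescale(Job("job-1", parallelism=2), new_parallelism=4, free_slots=8)
print(job.parallelism, job.savepoint)  # 4 savepoint-job-1
```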

Of course, there are many details, such as
> ● At some point we may not be able to use this kind of hot update, and
> still need to restart the job, when this happens, we should prevent users
> from using rescaling requests
>

I'm curious to learn more about "hot updates". How would we support this in
Flink? Would we have to support dynamically repartitioning tasks? I don't
think Flink supports this yet. For now, restarting the job may be the best
we can do.

● After rescaling is submitted, when we fail, there should be a rollback
> mechanism to roll back to the previous degree of parallelism.
>

This should not be necessary if all the requirements for rescaling, e.g.
enough task slots, are satisfied by the Rescale API. I'm not even sure
rolling back is an option because we can't guarantee that a rollback would
always work.

Thanks,
Max

On Tue, Jan 24, 2023 at 6:34 AM ConradJam  wrote:

> Hello max
>
> Thanks for driving it, I think there is no problem with your previous
> suggestion of [1] FLINK-30773. Here I just put forward some supplements and
> doubts.I have some suggestions and insights for this
>
>  I have experienced the autoscaling of Flink K8S Operator for a part of the
> time. The current method is to stop the job and modify the parallelism,
> which will interrupt the business for a long time. I think the purpose of
> modifying Rescaling Api is to better fit cloud native and reduce the impact
> of job scaling downtime.
>
> I have tried scaling with less time, and I call this step "hot update
> parallelism" (if there is an available Slots, there is no need to re-deploy
> the JobManager Or TaskManager on K8S)
>
> Around this topic, I raised the *following questions*:
> ● Does scaling work on YARN, or just k8s?
>○ I think we can support running on K8S for the first version, and Yarn
> can be considered later
> ● Rescaling supports Standalone mode?
>○ I think it can be supported. The essence is just to modify the
> parallelism of job vertices. As for the tuning strategy, it should be
> determined by the external system or K8S Operator
> ● Can we simplify the recovery steps?
>○ As far as I know, the traditional way to adjust the parallelism is to
> stop a job and do a Savepoint, and then run the job with the adjusted
> parallelism. If we hide this step in the *JobManager*, it will be an
> important means to reduce the delay.
>
>   Of course, there are many details, such as
> ● At some point we may not be able to use this kind of hot update, and
> still need to restart the job, when this happens, we should prevent users
> from using rescaling requests
> ● After rescaling is submitted, when we fail, there should be a rollback
> mechanism to roll back to the previous degree of parallelism.
>
> more and more ~
>
>   By the way, because the content may be more, I did not expand more ideas
> and descriptions here. This proposal modifies the original Rescaling API.
> I would also like to hear if  *@gyula* has some new ideas on this as it was
> also involved in the development of FLIP-271
> I am willing to write a FLIP for this purpose to improve and write some
> ideas with dev Community and then submit it. What do you think about
> starting a discussion for the community?
>
>
>1. https://issues.apache.org/jira/browse/FLINK-30773
>
> Best~
>
> Maximilian Michels  于2023年1月24日周二 01:08写道:
>
> > Hi,
> >
> > The current rescale API appears to be a work in progress. A couple years
> > ago, we disabled access to the API [1].
> >
> > I'm looking into this problem as part of working on autoscaling [2] where
> > we currently require a full restart of th

Re: [VOTE] Release 1.16.1, release candidate #1

2023-01-26 Thread Sergey Nuyanzin
Thanks Martijn.
+1 (non-binding)

- Verified checksums and signatures
- Built from sources
- Compared downloaded with mentioned git tag
- Verified version in poms
- Ran several queries in batch and streaming (standalone cluster)
- Verified NOTICE and LICENSE files


On Mon, Jan 23, 2023 at 2:06 PM Matthias Pohl
 wrote:

> Thanks Martijn for pushing this one.
> +1 (non-binding)
>
> * Downloaded artifacts & built Flink from sources
> * Verified SHA512 checksums & GPG signatures
> * Compared checkout with provided sources
> * Verified pom file versions
> * Went over NOTICE file/pom files changes without finding anything
> suspicious
> * Deployed standalone session cluster and ran WordCount example in batch
> and streaming: Nothing suspicious in log files found
> * went over flink-web PR
>
> On Thu, Jan 19, 2023 at 5:08 PM Martijn Visser 
> wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> 1.16.1,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint A5F3BCE4CBE993573EC5966A65321B8382B219AF [3] (,
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.16.1-rc1" [5],
> > * website pull request listing the new release and adding announcement
> blog
> > post [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > NOTE: The maven artifacts have been signed by Chesnay with the key with
> > fingerprint C2EED7B111D464BA
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352344
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.16.1-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1580
> > [5] https://github.com/apache/flink/releases/tag/release-1.16.1-rc1
> > [6] https://github.com/apache/flink-web/pull/603
> >
>


-- 
Best regards,
Sergey


Re: [SUMMARY] Flink 1.17 release sync 23rd of January, 2023

2023-01-26 Thread Lijie Wang
Hi Martijn,

Thanks for your detailed explanation. Honestly, FLINK-30624 is unlikely
to be completed before the 26th, but it's likely to be completed before
the 31st. Considering that there will be a period of time between feature
freeze and release, I think we have enough time to see if it is actually
fixed.

In summary, I would prefer to extend the "revert regression commits" to
31st.

Best,
Lijie

Martijn Visser  于2023年1月26日周四 16:22写道:

> Hi Lijie,
>
> You're right, that's not clearly phrased. Ideally we would like to get it
> fixed before the 26th, because that gives us a couple of days to monitor
> the benchmarks to see if the regression is actually fixed. If a contributor
> is actively working on a ticket, we could consider delaying the branch cut
> (not the feature freeze) over until the issue is resolved. That is of
> course except the fix would take something like more than a week after the
> 31st, then it makes more sense to revert the commit that introduced the
> regression.
>
> Let me know what you think.
>
> Best regards,
>
> Martijn
>
> Op do 26 jan. 2023 om 01:11 schreef Lijie Wang :
>
> > Hi Martijn,
> >
> > I'm working on FLINK-30624,  and it may take a while to be resolved. Do
> you
> > mean we should resolve it before the 26th? I used to think the deadline
> was
> > the 31st(the date of feature freeze).
> >
> > Best,
> > Lijie
> >
> > Martijn Visser  于2023年1月25日周三 18:07写道:
> >
> > > Hi everyone,
> > >
> > > A summary of the release sync of yesterday:
> > >
> > > - We still have 3 performance regressions (
> > > https://issues.apache.org/jira/browse/FLINK-30623,
> > > https://issues.apache.org/jira/browse/FLINK-30625,
> > > https://issues.apache.org/jira/browse/FLINK-30624) that are being
> worked
> > > on
> > > but need to be completed before the release branch cut on the 31st. If
> we
> > > can't merge PRs to resolve this (at latest on Thursday the 26th) we
> will
> > > revert the commits that introduced the regressions.
> > > - There are 3 release blockers from a test perspective:
> > > https://issues.apache.org/jira/browse/FLINK-29405,
> > > https://issues.apache.org/jira/browse/FLINK-30727 and
> > > https://issues.apache.org/jira/browse/FLINK-29427. Please make sure
> that
> > > if
> > > you are assigned to this ticket, that you have marked the ticket as "In
> > > Progress".
> > > - The feature freeze starts on Thursday the 31st of January and the
> > release
> > > branch will be cut as soon as the blockers have been resolved. When the
> > > release branch has been cut, the release testing will start.
> > >
> > > Best regards,
> > >
> > > Qingsheng, Leonard, Matthias and Martijn
> > >
> >
>


Re: [VOTE] Release 1.16.1, release candidate #1

2023-01-26 Thread Konstantin Knauf
Thanks, Martijn.

+1 (binding)

- Verified checksums and signatures for binaries & sources - OK
- checked pom changes for newly introduced dependencies - OK
- went over flink-web PR - Looks good except for Matthias' remarks

Am Do., 26. Jan. 2023 um 12:34 Uhr schrieb Sergey Nuyanzin <
snuyan...@gmail.com>:

> Thanks Martijn.
> +1 (non-binding)
>
> - Verified checksums and signatures
> - Built from sources
> - Compared downloaded with mentioned git tag
> - Verified version in poms
> - Ran several queries in batch and streaming (standalone cluster)
> - Verified NOTICE and LICENSE files
>
>
> On Mon, Jan 23, 2023 at 2:06 PM Matthias Pohl
>  wrote:
>
> > Thanks Martijn for pushing this one.
> > +1 (non-binding)
> >
> > * Downloaded artifacts & built Flink from sources
> > * Verified SHA512 checksums & GPG signatures
> > * Compared checkout with provided sources
> > * Verified pom file versions
> > * Went over NOTICE file/pom files changes without finding anything
> > suspicious
> > * Deployed standalone session cluster and ran WordCount example in batch
> > and streaming: Nothing suspicious in log files found
> > * went over flink-web PR
> >
> > On Thu, Jan 19, 2023 at 5:08 PM Martijn Visser  >
> > wrote:
> >
> > > Hi everyone,
> > > Please review and vote on the release candidate #1 for the version
> > 1.16.1,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release and binary convenience releases to
> > be
> > > deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint A5F3BCE4CBE993573EC5966A65321B8382B219AF [3] (,
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-1.16.1-rc1" [5],
> > > * website pull request listing the new release and adding announcement
> > blog
> > > post [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Release Manager
> > >
> > > NOTE: The maven artifacts have been signed by Chesnay with the key with
> > > fingerprint C2EED7B111D464BA
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352344
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.16.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1580
> > > [5] https://github.com/apache/flink/releases/tag/release-1.16.1-rc1
> > > [6] https://github.com/apache/flink-web/pull/603
> > >
> >
>
>
> --
> Best regards,
> Sergey
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [SUMMARY] Flink 1.17 release sync 23rd of January, 2023

2023-01-26 Thread Martijn Visser
Hi Lijie,

> In summary, I would prefer to extend the "revert regression commits" to
31st.

Sounds good to me.

Thanks,

Martijn

Op do 26 jan. 2023 om 12:48 schreef Lijie Wang :

> Hi Martijn,
>
> Thanks for your detailed explanation. Honestly, the FLINK-30624 is unlikely
> to be completed before the 26th, but it 's likely to be completed before
> the 31st. Considering that there will be a period of time between feature
> freeze and release, I think we have enough time to see if it is actually
> fixed.
>
> In summary, I would prefer to extend the "revert regression commits" to
> 31st.
>
> Best,
> Lijie
>
> Martijn Visser  于2023年1月26日周四 16:22写道:
>
> > Hi Lijie,
> >
> > You're right, that's not clearly phrased. Ideally we would like to get it
> > fixed before the 26th, because that gives us a couple of days to monitor
> > the benchmarks to see if the regression is actually fixed. If a
> contributor
> > is actively working on a ticket, we could consider delaying the branch
> cut
> > (not the feature freeze) over until the issue is resolved. That is of
> > course except the fix would take something like more than a week after
> the
> > 31st, then it makes more sense to revert the commit that introduced the
> > regression.
> >
> > Let me know what you think.
> >
> > Best regards,
> >
> > Martijn
> >
> > Op do 26 jan. 2023 om 01:11 schreef Lijie Wang  >:
> >
> > > Hi Martijn,
> > >
> > > I'm working on FLINK-30624,  and it may take a while to be resolved. Do
> > you
> > > mean we should resolve it before the 26th? I used to think the deadline
> > was
> > > the 31st(the date of feature freeze).
> > >
> > > Best,
> > > Lijie
> > >
> > > Martijn Visser  于2023年1月25日周三 18:07写道:
> > >
> > > > Hi everyone,
> > > >
> > > > A summary of the release sync of yesterday:
> > > >
> > > > - We still have 3 performance regressions (
> > > > https://issues.apache.org/jira/browse/FLINK-30623,
> > > > https://issues.apache.org/jira/browse/FLINK-30625,
> > > > https://issues.apache.org/jira/browse/FLINK-30624) that are being
> > worked
> > > > on
> > > > but need to be completed before the release branch cut on the 31st.
> If
> > we
> > > > can't merge PRs to resolve this (at latest on Thursday the 26th) we
> > will
> > > > revert the commits that introduced the regressions.
> > > > - There are 3 release blockers from a test perspective:
> > > > https://issues.apache.org/jira/browse/FLINK-29405,
> > > > https://issues.apache.org/jira/browse/FLINK-30727 and
> > > > https://issues.apache.org/jira/browse/FLINK-29427. Please make sure
> > that
> > > > if
> > > > you are assigned to this ticket, that you have marked the ticket as
> "In
> > > > Progress".
> > > > - The feature freeze starts on Thursday the 31st of January and the
> > > release
> > > > branch will be cut as soon as the blockers have been resolved. When
> the
> > > > release branch has been cut, the release testing will start.
> > > >
> > > > Best regards,
> > > >
> > > > Qingsheng, Leonard, Matthias and Martijn
> > > >
> > >
> >
>


[jira] [Created] (FLINK-30795) StreamingWithStateTestBase does not work correctly on Windows

2023-01-26 Thread JinxinTang (Jira)
JinxinTang created FLINK-30795:
--

 Summary: StreamingWithStateTestBase does not work correctly on Windows
 Key: FLINK-30795
 URL: https://issues.apache.org/jira/browse/FLINK-30795
 Project: Flink
  Issue Type: Bug
  Components: Tests
 Environment: A Windows path such as

"file://C:\Users\xx\AppData\Local\Temp\junit373749850266957074\junit7014045318909690439"

will throw

new IllegalArgumentException("Cannot use the root directory for checkpoints.");
Reporter: JinxinTang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release 1.16.1, release candidate #1

2023-01-26 Thread Dawid Wysakowicz

+1 (binding)

 * verified checksums & signatures
 * checked differences of pom.xml and NOTICE files with 1.16.0 release.
   looks good
 * checked source release contains no binaries
 * built from sources
 * run StateMachineExample on a local cluster
 * checked the web PR

Best,

Dawid

On 19/01/2023 17:07, Martijn Visser wrote:

Hi everyone,
Please review and vote on the release candidate #1 for the version 1.16.1,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.16.1-rc1" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

NOTE: The maven artifacts have been signed by Chesnay with the key with
fingerprint C2EED7B111D464BA

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352344
[2]https://dist.apache.org/repos/dist/dev/flink/flink-1.16.1-rc1/
[3]https://dist.apache.org/repos/dist/release/flink/KEYS
[4]https://repository.apache.org/content/repositories/orgapacheflink-1580
[5]https://github.com/apache/flink/releases/tag/release-1.16.1-rc1
[6]https://github.com/apache/flink-web/pull/603





[jira] [Created] (FLINK-30796) Make Slf4jReporter less noisy when no/few metrics exist

2023-01-26 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30796:


 Summary: Make Slf4jReporter less noisy when no/few metrics exist
 Key: FLINK-30796
 URL: https://issues.apache.org/jira/browse/FLINK-30796
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Reworking the Rescale API

2023-01-26 Thread Konstantin Knauf
Hi Max,

it seems to me we are now running into some of the potential duplication of
efforts across the standard and adaptive scheduler that Chesnay had
mentioned on the original ticket. The need to do a full restart
of the job for rescaling, as well as waiting for resources to be available
before doing a rescaling operation, were some of the main motivations behind
introducing the adaptive scheduler. In the adaptive scheduler we can
further do things like triggering a rescaling operation only when
a checkpoint has completed, to minimize reprocessing. For jobs with small
state size, the downtime during rescaling can already be well under 1 second today.

Chesnay and David Moravek are currently in the process of drafting two
FLIPs that will extend the support of the adaptive scheduler to session
mode and will allow clients to change the desired/min/max parallelism of
the vertices of a Job during its runtime via the REST API. We currently
plan to publish a draft of these FLIPs next week for discussion. Would you
consider moving to the adaptive scheduler for the kubernetes operator
provided these FLIPs make it? I think, it has the potential to simplify the
logic required for rescaling on the operator side quite a bit.

Best,

Konstantin


Am Do., 26. Jan. 2023 um 12:16 Uhr schrieb Maximilian Michels <
m...@apache.org>:

> Hey ConradJam,
>
> Thank you for your thoughtful response. It would be great to start writing
> a FLIP for the Rescale API. If you want to take a stab, please go ahead,
> I'd be happy to review. I'm sure Gyula or others will also chime in.
>
> I want to answer your question so we are aligned:
>
> ● Does scaling work on YARN, or just k8s?
> >
>
> I think it should work for both YARN and K8s. We would have to make changes
> to the drivers (AbstractResourceManagerDriver) which is implemented for
> both K8s and YARN. The outlined approach for rescaling does not require
> integrating with those systems, just maybe updating how the driver is used,
> so we should be able to make it work across both YARN and K8s.
>
> ● Rescaling supports Standalone mode?
> >
>
> Yes, I think it should and easily can. We do use a different type of
> resource manager (StandaloneResourceManager, not ActiveResourceManager) but
> I think the logic will sit on a higher level where the ResourceManager
> implementation is not relevant.
>
> ● Can we simplify the recovery steps?
> >
>
> For the first version, I would prefer the simple approach of (1) acquiring
> the required slots for rescaling, then (2) trigger a stop with savepoint
> (3) resubmit the job with updated parallelisms. What you have in mind is a
> bit more involved but certainly a great optimization, especially when only
> a fraction of the job state needs to be repartitioned.
>
> Of course, there are many details, such as
> > ● At some point we may not be able to use this kind of hot update, and
> > still need to restart the job, when this happens, we should prevent users
> > from using rescaling requests
> >
>
> I'm curious to learn more about "hot updates". How would we support this in
> Flink? Would we have to support dynamically repartitioning tasks? I don't
> think Flink supports this yet. For now, restarting the job may be the best
> we can do.
>
> ● After rescaling is submitted, when we fail, there should be a rollback
> > mechanism to roll back to the previous degree of parallelism.
> >
>
> This should not be necessary if all the requirements for rescaling, e.g.
> enough task slots, are satisfied by the Rescale API. I'm not even sure
> rolling back is an option because we can't guarantee that a rollback would
> always work.
>
> Thanks,
> Max
>
> On Tue, Jan 24, 2023 at 6:34 AM ConradJam  wrote:
>
> > Hello max
> >
> > Thanks for driving it, I think there is no problem with your previous
> > suggestion of [1] FLINK-30773. Here I just put forward some supplements
> and
> > doubts.I have some suggestions and insights for this
> >
> >  I have experienced the autoscaling of Flink K8S Operator for a part of
> the
> > time. The current method is to stop the job and modify the parallelism,
> > which will interrupt the business for a long time. I think the purpose of
> > modifying Rescaling Api is to better fit cloud native and reduce the
> impact
> > of job scaling downtime.
> >
> > I have tried scaling with less time, and I call this step "hot update
> > parallelism" (if there are available slots, there is no need to
> > re-deploy the JobManager or TaskManager on K8s)
> >
> > Around this topic, I raised the *following questions*:
> > ● Does scaling work on YARN, or just k8s?
> >○ I think we can support running on K8S for the first version, and
> Yarn
> > can be considered later
> > ● Rescaling supports Standalone mode?
> >○ I think it can be supported. The essence is just to modify the
> > parallelism of job vertices. As for the tuning strategy, it should be
> > determined by the external system or K8S Operator
> > ● Can we simplify the recovery steps?

Re: Reworking the Rescale API

2023-01-26 Thread Gyula Fóra
Hi Konstantin!

I think the Adaptive Scheduler still will not support Kubernetes Native
integration and can only be used in standalone mode. This means that the
operator needs to manage all resources externally, and compute exactly how
many new slots are needed during rescaling, etc.

I think whatever scaling API we build, it should work for both standalone
and native integration as much as possible. It's not a duplicated effort to
add it to the standard scheduler as long as the adaptive scheduler does not
support active resource management.

Also it seems this will not reduce complexity on the operator side, which
can already do scaling actions by executing an upgrade.

And a side note: the operator supports both native and standalone
integration (both standard and adaptive scheduler this way) but the bigger
problem is actually computing the required number of slots and required new
resources which is much harder than simply using active resource management.

Cheers,
Gyula

On Thu, Jan 26, 2023 at 2:57 PM Konstantin Knauf  wrote:

> Hi Max,
>
> it seems to me we are now running into some of the potential duplication of
> efforts across the standard and adaptive scheduler that Chesnay had
> mentioned on the original ticket. The issue of having to do a full restart
> of the Job for rescaling, as well as waiting for resources to be available
> before doing a rescaling operation, were some of the main motivations behind
> introducing the adaptive scheduler. In the adaptive scheduler we can
> further do things like only triggering a rescaling operation exactly when
> a checkpoint was completed, to minimize reprocessing. For Jobs with small
> state size, the downtime during rescaling can already be << 1 second today.
>
> Chesnay and David Moravek are currently in the process of drafting two
> FLIPs that will extend the support of the adaptive scheduler to session
> mode and will allow clients to change the desired/min/max parallelism of
> the vertices of a Job during its runtime via the REST API. We currently
> plan to publish a draft of these FLIPs next week for discussion. Would you
> consider moving to the adaptive scheduler for the kubernetes operator
> provided these FLIPs make it? I think, it has the potential to simplify the
> logic required for rescaling on the operator side quite a bit.
>
> Best,
>
> Konstantin
>
>
> Am Do., 26. Jan. 2023 um 12:16 Uhr schrieb Maximilian Michels <
> m...@apache.org>:
>
>> Hey ConradJam,
>>
>> Thank you for your thoughtful response. It would be great to start writing
>> a FLIP for the Rescale API. If you want to take a stab, please go ahead,
>> I'd be happy to review. I'm sure Gyula or others will also chime in.
>>
>> I want to answer your question so we are aligned:
>>
>> ● Does scaling work on YARN, or just k8s?
>> >
>>
>> I think it should work for both YARN and K8s. We would have to make
>> changes
>> to the drivers (AbstractResourceManagerDriver) which is implemented for
>> both K8s and YARN. The outlined approach for rescaling does not require
>> integrating with those systems, just maybe updating how the driver is
>> used,
>> so we should be able to make it work across both YARN and K8s.
>>
>> ● Rescaling supports Standalone mode?
>> >
>>
>> Yes, I think it should and easily can. We do use a different type of
>> resource manager (StandaloneResourceManager, not ActiveResourceManager)
>> but
>> I think the logic will sit on a higher level where the ResourceManager
>> implementation is not relevant.
>>
>> ● Can we simplify the recovery steps?
>> >
>>
>> For the first version, I would prefer the simple approach of (1) acquiring
>> the required slots for rescaling, then (2) trigger a stop with savepoint
>> (3) resubmit the job with updated parallelisms. What you have in mind is a
>> bit more involved but certainly a great optimization, especially when only
>> a fraction of the job state needs to be repartitioned.
>>
>> Of course, there are many details, such as
>> > ● At some point we may not be able to use this kind of hot update, and
>> > still need to restart the job, when this happens, we should prevent
>> users
>> > from using rescaling requests
>> >
>>
>> I'm curious to learn more about "hot updates". How would we support this
>> in
>> Flink? Would we have to support dynamically repartitioning tasks? I don't
>> think Flink supports this yet. For now, restarting the job may be the best
>> we can do.
>>
>> ● After rescaling is submitted, when we fail, there should be a rollback
>> > mechanism to roll back to the previous degree of parallelism.
>> >
>>
>> This should not be necessary if all the requirements for rescaling, e.g.
>> enough task slots, are satisfied by the Rescale API. I'm not even sure
>> rolling back is an option because we can't guarantee that a rollback would
>> always work.
>>
>> Thanks,
>> Max
>>
>> On Tue, Jan 24, 2023 at 6:34 AM ConradJam  wrote:
>>
>> > Hello max
>> >
>> > Thanks for driving it, I think there is no problem with your previous
>> > su

Re: [VOTE] Release flink-connector-gcp-pubsub v3.0.0, release candidate #1

2023-01-26 Thread Konstantin Knauf
+1 (binding)

* checked Maven and source artifact signatures and checksums - OK
* no binaries or packaged dependencies - OK
* checked website changes - Approved.

Am Fr., 20. Jan. 2023 um 15:39 Uhr schrieb Martijn Visser <
martijnvis...@apache.org>:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 3.0.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint
> A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v3.0.0-rc1 [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352589
> [2]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-gcp-pubsub-3.0.0-rc1
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1581/
> [5]
>
> https://github.com/apache/flink-connector-gcp-pubsub/releases/tag/v3.0.0-rc1
> [6] https://github.com/apache/flink-web/pull/604
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [SUMMARY] Flink 1.17 release sync 23rd of January, 2023

2023-01-26 Thread Rui Fan
Hi Martijn,

I was working on FLINK-30623[1], it has been merged today.
And the performance of Unaligned checkpoint[2][3] has been
recovered.

[1] https://issues.apache.org/jira/browse/FLINK-30623
[2]
http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=checkpointSingleInput.UNALIGNED&extr=on&quarts=on&equid=off&env=2&revs=200
[3]
http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=checkpointSingleInput.UNALIGNED_1&extr=on&quarts=on&equid=off&env=2&revs=200

Best
Rui Fan

On Thu, Jan 26, 2023 at 8:57 PM Martijn Visser 
wrote:

> Hi Lijie,
>
> > In summary, I would prefer to extend the "revert regression commits" to
> 31st.
>
> Sounds good to me.
>
> Thanks,
>
> Martijn
>
> Op do 26 jan. 2023 om 12:48 schreef Lijie Wang :
>
> > Hi Martijn,
> >
> > Thanks for your detailed explanation. Honestly, the FLINK-30624 is
> unlikely
> > to be completed before the 26th, but it 's likely to be completed before
> > the 31st. Considering that there will be a period of time between feature
> > freeze and release, I think we have enough time to see if it is actually
> > fixed.
> >
> > In summary, I would prefer to extend the "revert regression commits" to
> > 31st.
> >
> > Best,
> > Lijie
> >
> > Martijn Visser  于2023年1月26日周四 16:22写道:
> >
> > > Hi Lijie,
> > >
> > > You're right, that's not clearly phrased. Ideally we would like to get
> it
> > > fixed before the 26th, because that gives us a couple of days to
> monitor
> > > the benchmarks to see if the regression is actually fixed. If a
> > contributor
> > > is actively working on a ticket, we could consider delaying the branch
> > cut
> > > (not the feature freeze) over until the issue is resolved. That is of
> > > course except the fix would take something like more than a week after
> > the
> > > 31st, then it makes more sense to revert the commit that introduced the
> > > regression.
> > >
> > > Let me know what you think.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > Op do 26 jan. 2023 om 01:11 schreef Lijie Wang <
> wangdachui9...@gmail.com
> > >:
> > >
> > > > Hi Martijn,
> > > >
> > > > I'm working on FLINK-30624,  and it may take a while to be resolved.
> Do
> > > you
> > > > mean we should resolve it before the 26th? I used to think the
> deadline
> > > was
> > > > the 31st(the date of feature freeze).
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > Martijn Visser  于2023年1月25日周三 18:07写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > A summary of the release sync of yesterday:
> > > > >
> > > > > - We still have 3 performance regressions (
> > > > > https://issues.apache.org/jira/browse/FLINK-30623,
> > > > > https://issues.apache.org/jira/browse/FLINK-30625,
> > > > > https://issues.apache.org/jira/browse/FLINK-30624) that are being
> > > worked
> > > > > on
> > > > > but need to be completed before the release branch cut on the 31st.
> > If
> > > we
> > > > > can't merge PRs to resolve this (at latest on Thursday the 26th) we
> > > will
> > > > > revert the commits that introduced the regressions.
> > > > > - There are 3 release blockers from a test perspective:
> > > > > https://issues.apache.org/jira/browse/FLINK-29405,
> > > > > https://issues.apache.org/jira/browse/FLINK-30727 and
> > > > > https://issues.apache.org/jira/browse/FLINK-29427. Please make
> sure
> > > that
> > > > > if
> > > > > you are assigned to this ticket, that you have marked the ticket as
> > "In
> > > > > Progress".
> > > > > - The feature freeze starts on Thursday the 31st of January and the
> > > > release
> > > > > branch will be cut as soon as the blockers have been resolved. When
> > the
> > > > > release branch has been cut, the release testing will start.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Qingsheng, Leonard, Matthias and Martijn
> > > > >
> > > >
> > >
> >
>


Re: Reworking the Rescale API

2023-01-26 Thread Konstantin Knauf
Hi Gyula,

if the adaptive scheduler supported active resource managers, would there
be any other blocker to migrate to it? I don't know much about the
implementation-side here, but conceptually, once we have session mode
support and each Job in a session cluster declares its desired
parallelism (!= infinity), there shouldn't be a big gap to supporting active
resource managers. Am I missing something, Chesnay?

Regarding the complexity, I was referring to the procedure that Max
outlines in his ticket around checking if slots are available or not and then
triggering scaling operations. The adaptive scheduler already does this and
is more responsive in that regard than an external process would be in my
understanding.

Cheers,

Konstantin



Am Do., 26. Jan. 2023 um 15:05 Uhr schrieb Gyula Fóra :

> Hi Konstantin!
>
> I think the Adaptive Scheduler still will not support Kubernetes Native
> integration and can only be used in standalone mode. This means that the
> operator needs to manage all resources externally, and compute exactly how
> much new slots are needed during rescaling etc.
>
> I think whatever scaling API we build, it should work for both standalone
> and native integration as much as possible. It's not a duplicated effort to
> add it to the standard scheduler as long as the adaptive scheduler does not
> support active resource management.
>
> Also it seems this will not reduce complexity on the operator side, which
> can already do scaling actions by executing an upgrade.
>
> And a side note: the operator supports both native and standalone
> integration (both standard and adaptive scheduler this way) but the bigger
> problem is actually computing the required number of slots and required new
> resources which is much harder than simply using active resource management.
>
> Cheers,
> Gyula
>
> On Thu, Jan 26, 2023 at 2:57 PM Konstantin Knauf 
> wrote:
>
>> Hi Max,
>>
>> it seems to me we are now running in some of the potential duplication of
>> efforts across the standard and adaptive scheduler that Chesnay had
>> mentioned on the original ticket. The issue of having to do a full restart
>> of the Job for rescaling as well as waiting for resources to be available
>> before doing a rescaling operation were some of the main motivations behind
>> introducing the adaptive scheduler. In the adaptive scheduler we can
>> further do things like only to trigger a rescaling operations exactly when
>> a checkpoint was completed to minimize reprocessing. For Jobs with small
>> state size, the downtime during rescaling can already be << 1 second today.
>>
>> Chesnay and David Moravek are currently in the process of drafting two
>> FLIPs that will extend the support of the adaptive scheduler to session
>> mode and will allow clients to change the desired/min/max parallelism of
>> the vertices of a Job during its runtime via the REST API. We currently
>> plan to publish a draft of these FLIPs next week for discussion. Would you
>> consider moving to the adaptive scheduler for the kubernetes operator
>> provided these FLIPs make it? I think, it has the potential to simplify the
>> logic required for rescaling on the operator side quite a bit.
>>
>> Best,
>>
>> Konstantin
>>
>>
>> Am Do., 26. Jan. 2023 um 12:16 Uhr schrieb Maximilian Michels <
>> m...@apache.org>:
>>
>>> Hey ConradJam,
>>>
>>> Thank you for your thoughtful response. It would be great to start
>>> writing
>>> a FLIP for the Rescale API. If you want to take a stab, please go ahead,
>>> I'd be happy to review. I'm sure Gyula or others will also chime in.
>>>
>>> I want to answer your question so we are aligned:
>>>
>>> ● Does scaling work on YARN, or just k8s?
>>> >
>>>
>>> I think it should work for both YARN and K8s. We would have to make
>>> changes
>>> to the drivers (AbstractResourceManagerDriver) which is implemented for
>>> both K8s and YARN. The outlined approach for rescaling does not require
>>> integrating with those systems, just maybe updating how the driver is
>>> used,
>>> so we should be able to make it work across both YARN and K8s.
>>>
>>> ● Rescaling supports Standalone mode?
>>> >
>>>
>>> Yes, I think it should and easily can. We do use a different type of
>>> resource manager (StandaloneResourceManager, not ActiveResourceManager)
>>> but
>>> I think the logic will sit on a higher level where the ResourceManager
>>> implementation is not relevant.
>>>
>>> ● Can we simplify the recovery steps?
>>> >
>>>
>>> For the first version, I would prefer the simple approach of (1)
>>> acquiring
>>> the required slots for rescaling, then (2) trigger a stop with savepoint
>>> (3) resubmit the job with updated parallelisms. What you have in mind is
>>> a
>>> bit more involved but certainly a great optimization, especially when
>>> only
>>> a fraction of the job state needs to be repartitioned.
>>>
>>> Of course, there are many details, such as
>>> > ● At some point we may not be able to use this kind of hot update, and
>>> > still need to resta

Re: Reworking the Rescale API

2023-01-26 Thread Gyula Fóra
If the adaptive scheduler supported all execution modes like Native
Application, Session, etc., including active resource management, then I
think we could use it all the time. I would love to use one scheduler
instead of having two options.

Currently however there is a huge gap in functionality between
active/passive resource management and from my experience, the active
(native) integration is much more convenient for Kubernetes environments.

Gyula

On Thu, Jan 26, 2023 at 3:13 PM Konstantin Knauf  wrote:

> Hi Gyula,
>
> if the adaptive scheduler supported active resource managers, would there
> be any other blocker to migrate to it? I don't know much about the
> implementation-side here, but conceptually once we have session mode
> support and each Jobs in a session clusters declaris their desired
> parallelism (!=infinity) there shouldn't be a big gap to support active
> resource managers. Am I missing something, Chesnay?
>
> Regarding the complexity, I was referring to the procedure that Max
> outlines in his ticket around check if slots are available or not and then
> triggering scaling operations. The adaptive scheduler already does this and
> is more responsive in that regard than an external process would be in my
> understanding.
>
> Cheers,
>
> Konstantin
>
>
>
> Am Do., 26. Jan. 2023 um 15:05 Uhr schrieb Gyula Fóra <
> gyula.f...@gmail.com>:
>
>> Hi Konstantin!
>>
>> I think the Adaptive Scheduler still will not support Kubernetes Native
>> integration and can only be used in standalone mode. This means that the
>> operator needs to manage all resources externally, and compute exactly how
>> much new slots are needed during rescaling etc.
>>
>> I think whatever scaling API we build, it should work for both standalone
>> and native integration as much as possible. It's not a duplicated effort to
>> add it to the standard scheduler as long as the adaptive scheduler does not
>> support active resource management.
>>
>> Also it seems this will not reduce complexity on the operator side, which
>> can already do scaling actions by executing an upgrade.
>>
>> And a side note: the operator supports both native and standalone
>> integration (both standard and adaptive scheduler this way) but the bigger
>> problem is actually computing the required number of slots and required new
>> resources which is much harder than simply using active resource management.
>>
>> Cheers,
>> Gyula
>>
>> On Thu, Jan 26, 2023 at 2:57 PM Konstantin Knauf 
>> wrote:
>>
>>> Hi Max,
>>>
>>> it seems to me we are now running in some of the potential duplication
>>> of efforts across the standard and adaptive scheduler that Chesnay had
>>> mentioned on the original ticket. The issue of having to do a full restart
>>> of the Job for rescaling as well as waiting for resources to be available
>>> before doing a rescaling operation were some of the main motivations behind
>>> introducing the adaptive scheduler. In the adaptive scheduler we can
>>> further do things like only to trigger a rescaling operations exactly when
>>> a checkpoint was completed to minimize reprocessing. For Jobs with small
>>> state size, the downtime during rescaling can already be << 1 second today.
>>>
>>> Chesnay and David Moravek are currently in the process of drafting two
>>> FLIPs that will extend the support of the adaptive scheduler to session
>>> mode and will allow clients to change the desired/min/max parallelism of
>>> the vertices of a Job during its runtime via the REST API. We currently
>>> plan to publish a draft of these FLIPs next week for discussion. Would you
>>> consider moving to the adaptive scheduler for the kubernetes operator
>>> provided these FLIPs make it? I think, it has the potential to simplify the
>>> logic required for rescaling on the operator side quite a bit.
>>>
>>> Best,
>>>
>>> Konstantin
>>>
>>>
>>> Am Do., 26. Jan. 2023 um 12:16 Uhr schrieb Maximilian Michels <
>>> m...@apache.org>:
>>>
 Hey ConradJam,

 Thank you for your thoughtful response. It would be great to start
 writing
 a FLIP for the Rescale API. If you want to take a stab, please go ahead,
 I'd be happy to review. I'm sure Gyula or others will also chime in.

 I want to answer your question so we are aligned:

 ● Does scaling work on YARN, or just k8s?
 >

 I think it should work for both YARN and K8s. We would have to make
 changes
 to the drivers (AbstractResourceManagerDriver) which is implemented for
 both K8s and YARN. The outlined approach for rescaling does not require
 integrating with those systems, just maybe updating how the driver is
 used,
 so we should be able to make it work across both YARN and K8s.

 ● Rescaling supports Standalone mode?
 >

 Yes, I think it should and easily can. We do use a different type of
 resource manager (StandaloneResourceManager, not ActiveResourceManager)
 but
 I think the logic will sit on a higher

Re: Reworking the Rescale API

2023-01-26 Thread Maximilian Michels
Thanks for the replies! I don't mind which scheduler handles the
implementation, as long as autoscaling via the Flink operator works
with it.

I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the available resources. The standard scheduler's goal is
to acquire / create as many resources as needed to run the provided
Flink job. From this perspective, the standard scheduler is a better
fit for the Flink Kubernetes operator because it alters only the job
(the operator parallelisms) and leaves the resource management to
Flink.
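That contrast can be captured in a toy model (hypothetical Python, not Flink code — the function names and the flat slot model are made up purely for illustration):

```python
# Toy model (hypothetical, not Flink's implementation) of the two directions
# described above: the standard scheduler derives resource requests from the
# job's configured parallelism, while the adaptive scheduler derives the
# parallelism it runs at from the resources actually available.

def standard_scheduler_slot_request(configured_parallelism: int) -> int:
    # Standard scheduler: ask the resource manager for enough slots
    # to run the job exactly as configured.
    return configured_parallelism

def adaptive_scheduler_parallelism(desired: int, available_slots: int) -> int:
    # Adaptive scheduler: run at whatever the available slots allow,
    # capped by the desired parallelism.
    return min(desired, available_slots)

# A parallelism-16 job: the standard scheduler requests 16 slots,
# while the adaptive scheduler given only 10 slots runs the job at 10.
assert standard_scheduler_slot_request(16) == 16
assert adaptive_scheduler_parallelism(16, available_slots=10) == 10
```

In this sketch, the operator's current approach corresponds to the first function: it only changes the configured parallelism and leaves acquiring the matching resources to Flink.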

On the other hand, there are obvious advantages of the adaptive
scheduler, like knowing upfront which resources are available, support
for overriding the parallelism of job vertices, and knowing about
Flink runtime events.

We may be able to adapt the adaptive scheduler to fit the goal of the
Kubernetes operator and provide this functionality through the Rescale
API. How much work that would be, I'm not completely sure yet. After
all, both schedulers share the same super class, so it should be
doable if we add support beyond its current "reactiveness".

>Chesnay and David Moravek are currently in the process of drafting two FLIPs 
>that will extend the support of the adaptive scheduler to session mode and 
>will allow clients to change the desired/min/max parallelism of the vertices 
>of a Job during its runtime via the REST API

It would be good to create a discussion on the mailing list around
these FLIPs you mentioned. This may have already happened, but I
couldn't find anything. As Gyula also mentioned, the session mode is
only one side of the story. It does sound interesting that the
upcoming FLIP may introduce per-vertex parallelism overrides. Maybe
you or the FLIP authors have a good idea how to solve the conflicting
goals between standard and adaptive scheduler (as described above)?

-Max


On Thu, Jan 26, 2023 at 3:19 PM Gyula Fóra  wrote:
>
> If the adaptive scheduler would support all execution modes like Native
> Applications, Sessions etc including active resource management then I
> think we could use that all the time. I would love to use one scheduler
> instead of having 2 options.
>
> Currently however there is a huge gap in functionality between
> active/passive resource management and from my experience, the active
> (native) integration is much more convenient for Kubernetes environments.
>
> Gyula
>
> On Thu, Jan 26, 2023 at 3:13 PM Konstantin Knauf  wrote:
>
> > Hi Gyula,
> >
> > if the adaptive scheduler supported active resource managers, would there
> > be any other blocker to migrate to it? I don't know much about the
> > implementation-side here, but conceptually once we have session mode
> > support and each Jobs in a session clusters declaris their desired
> > parallelism (!=infinity) there shouldn't be a big gap to support active
> > resource managers. Am I missing something, Chesnay?
> >
> > Regarding the complexity, I was referring to the procedure that Max
> > outlines in his ticket around check if slots are available or not and then
> > triggering scaling operations. The adaptive scheduler already does this and
> > is more responsive in that regard than an external process would be in my
> > understanding.
> >
> > Cheers,
> >
> > Konstantin
> >
> >
> >
> > Am Do., 26. Jan. 2023 um 15:05 Uhr schrieb Gyula Fóra <
> > gyula.f...@gmail.com>:
> >
> >> Hi Konstantin!
> >>
> >> I think the Adaptive Scheduler still will not support Kubernetes Native
> >> integration and can only be used in standalone mode. This means that the
> >> operator needs to manage all resources externally, and compute exactly how
> >> much new slots are needed during rescaling etc.
> >>
> >> I think whatever scaling API we build, it should work for both standalone
> >> and native integration as much as possible. It's not a duplicated effort to
> >> add it to the standard scheduler as long as the adaptive scheduler does not
> >> support active resource management.
> >>
> >> Also it seems this will not reduce complexity on the operator side, which
> >> can already do scaling actions by executing an upgrade.
> >>
> >> And a side note: the operator supports both native and standalone
> >> integration (both standard and adaptive scheduler this way) but the bigger
> >> problem is actually computing the required number of slots and required new
> >> resources which is much harder than simply using active resource 
> >> management.
> >>
> >> Cheers,
> >> Gyula
> >>
> >> On Thu, Jan 26, 2023 at 2:57 PM Konstantin Knauf 
> >> wrote:
> >>
> >>> Hi Max,
> >>>
> >>> it seems to me we are now running in some of the potential duplication
> >>> of efforts across the standard and adaptive scheduler that Chesnay had
> >>> mentioned on the original ticket. The issue of having to do a full restart
> >>> of the Job for rescaling as well as waiting for resources to be available
> >>> before doing a rescaling opera

Re: Reworking the Rescale API

2023-01-26 Thread Chesnay Schepler

On 26/01/2023 16:18, Maximilian Michels wrote:

I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the available resources.


This is really a misconception that we just have to stomp out.

This statement only applies to /reactive mode/, a /special mode/ the 
adaptive scheduler (AS) can run in, where active resource management is 
not supported since requesting infinite resources from k8s doesn't 
really make sense.


The AS itself can work perfectly fine with active resource management, 
and has no effect on how the RM talks to k8s. It can just keep the job 
running in cases where less than desired (==user-provided parallelism) 
resources are provided by k8s (possibly temporarily).


On 26/01/2023 16:18, Maximilian Michels wrote:

After
all, both schedulers share the same super class


Apart from implementing the same interface the implementations of the 
adaptive and default schedulers are separate.


Re: Reworking the Rescale API

2023-01-26 Thread Maximilian Michels
Thanks for the explanation. If not for the "reactive mode", what is
the advantage of the adaptive scheduler? What other modes does it
support?

>Apart from implementing the same interface the implementations of the adaptive 
>and default schedulers are separate.

Last time I looked they implemented the same interface and the same
base class. Of course, their behavior is quite different.

I'm still very interested in learning about the future FLIPs
mentioned. Based on the replies, I'm assuming that they will support
the changes required for
https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
the basis for implementing them.

-Max

On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler  wrote:
>
> On 26/01/2023 16:18, Maximilian Michels wrote:
>
> I see slightly different goals for the standard and the adaptive
> scheduler. The adaptive scheduler's goal is to adapt the Flink job
> according to the available resources.
>
> This is really a misconception that we just have to stomp out.
>
> This statement only applies to reactive mode, a special mode in which the 
> adaptive scheduler (AS) can run in where active resource management is not 
> supported since requesting infinite resources from k8s doesn't really make 
> sense.
>
> The AS itself can work perfectly fine with active resource management, and 
> has no effect on how the RM talks to k8s. It can just keep the job running in 
> cases where less than desired (==user-provided parallelism) resources are 
> provided by k8s (possibly temporarily).
>
> On 26/01/2023 16:18, Maximilian Michels wrote:
>
> After
> all, both schedulers share the same super class
>
> Apart from implementing the same interface the implementations of the 
> adaptive and default schedulers are separate.


[VOTE] Externalize AWS formats to flink-connector-aws

2023-01-26 Thread Teoh, Hong
Hi all,

As discussed in the discussion thread [1], I would like to propose we 
externalize flink-avro-glue-schema-registry and flink-json-glue-schema-registry 
formats to the flink-connector-aws repository [2].

Motivation:
1. We can unify and upgrade the AWS SDK versions more easily.
2. We can now move flink-connector-aws-base to the external repository as well.

Voting Schema:
Consensus, committers have binding votes, open for at least 72 hours.

Thanks,
Hong


[1] https://lists.apache.org/thread/03l99yz62mq3ngj8cvg8stk4foym65jq
[2] https://github.com/apache/flink-connector-aws/


Re: Reworking the Rescale API

2023-01-26 Thread Chesnay Schepler

There's the default and reactive mode; nothing else.
At its core they are the same thing; reactive mode just cranks up the 
desired parallelism to infinity and enforces certain assumptions (e.g., 
no active resource management).


The advantage is that the adaptive scheduler can run jobs while 
sufficient resources are not available, and scale things up again once they 
are available.
This is its core functionality, but we always intended to extend it 
such that users can modify the parallelism at runtime as well.
And since the AS can already rescale jobs (and was purpose-built with 
that functionality in mind), this is just a matter of exposing an API 
for it. Everything else is already there.
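That core behavior can be summed up in a minimal sketch (hypothetical code, not the scheduler's actual implementation):

```python
import math

def adaptive_parallelism(desired: float, available_slots: int) -> int:
    # Sketch: the adaptive scheduler runs the job at
    # min(desired parallelism, what the available slots permit).
    # Reactive mode is the special case where desired is "infinity".
    return int(min(desired, available_slots))

# Default adaptive mode, with a user-provided desired parallelism of 8:
assert adaptive_parallelism(8, available_slots=5) == 5    # runs degraded
assert adaptive_parallelism(8, available_slots=12) == 8   # scales back up
# Reactive mode cranks the desired parallelism up to infinity,
# so the job always uses every slot it can get:
assert adaptive_parallelism(math.inf, available_slots=12) == 12
```

Under this model, rescaling at runtime is just re-evaluating the same decision with a new desired value, which is why exposing an API for it is mostly a matter of plumbing.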


As a concrete use-case, let's say you have an SLA that says jobs must 
not be down longer than X seconds, and a TM just crashed.
If you can absolutely guarantee that your k8s cluster can provision a 
new TM within X seconds, no matter what cruel reality has in store for 
you, than you /may/ not need it.

If you can't, well then here's a use-case for you.

> Last time I looked they implemented the same interface and the same 
base class. Of course, their behavior is quite different.


They never shared a base class since day 1. Are you maybe mixing up the 
AdaptiveScheduler and AdaptiveBatchScheduler?


As for FLINK-30773, I think that should be covered.

On 26/01/2023 17:10, Maximilian Michels wrote:

Thanks for the explanation. If not for the "reactive mode", what is
the advantage of the adaptive scheduler? What other modes does it
support?


Apart from implementing the same interface the implementations of the adaptive 
and default schedulers are separate.

Last time I looked they implemented the same interface and the same
base class. Of course, their behavior is quite different.

I'm still very interested in learning about the future FLIPs
mentioned. Based on the replies, I'm assuming that they will support
the changes required for
https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
the basis for implementing them.

-Max

On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler  wrote:

On 26/01/2023 16:18, Maximilian Michels wrote:

I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the available resources.

This is really a misconception that we just have to stomp out.

This statement only applies to reactive mode, a special mode that the 
adaptive scheduler (AS) can run in, where active resource management is not 
supported, since requesting infinite resources from k8s doesn't really make 
sense.

The AS itself works perfectly fine with active resource management, and has 
no effect on how the RM talks to k8s. It can simply keep the job running in 
cases where k8s provides (possibly temporarily) fewer resources than desired 
(== the user-provided parallelism).

On 26/01/2023 16:18, Maximilian Michels wrote:

After
all, both schedulers share the same super class

Apart from implementing the same interface the implementations of the adaptive 
and default schedulers are separate.




Re: Reworking the Rescale API

2023-01-26 Thread Gyula Fóra
Chesnay,

Seems like you are suggesting that the Adaptive scheduler does everything
the standard scheduler does and more.

I am clearly not an expert on this topic but can you please explain why the
AdaptiveScheduler is not the default scheduler?
If it can do everything, why do we even have 2 schedulers? Why not simply
drop the "old" one?

That would probably clear up all confusions then :)

Gyula

On Thu, Jan 26, 2023 at 6:23 PM Chesnay Schepler  wrote:

> There's the default and reactive mode; nothing else.
> At its core they are the same thing; reactive mode just cranks up the
> desired parallelism to infinity and enforces certain assumptions (e.g.,
> no active resource management).
>
> The advantage is that the adaptive scheduler can run jobs even when
> sufficient resources are not available, and scale things up again once
> they are available.
> This is its core functionality, but we always intended to extend it
> such that users can modify the parallelism at runtime as well.
> And since the AS can already rescale jobs (and was purpose-built with
> that functionality in mind), this is just a matter of exposing an API
> for it. Everything else is already there.
>
> As a concrete use-case, let's say you have an SLA that says jobs must
> not be down longer than X seconds, and a TM just crashed.
> If you can absolutely guarantee that your k8s cluster can provision a
> new TM within X seconds, no matter what cruel reality has in store for
> you, then you /may/ not need it.
> If you can't, well then here's a use-case for you.
>
>  > Last time I looked they implemented the same interface and the same
> base class. Of course, their behavior is quite different.
>
> They never shared a base class since day 1. Are you maybe mixing up the
> AdaptiveScheduler and AdaptiveBatchScheduler?
>
> As for FLINK-30773, I think that should be covered.
>
> On 26/01/2023 17:10, Maximilian Michels wrote:
> > Thanks for the explanation. If not for the "reactive mode", what is
> > the advantage of the adaptive scheduler? What other modes does it
> > support?
> >
> >> Apart from implementing the same interface the implementations of the
> adaptive and default schedulers are separate.
> > Last time I looked they implemented the same interface and the same
> > base class. Of course, their behavior is quite different.
> >
> > I'm still very interested in learning about the future FLIPs
> > mentioned. Based on the replies, I'm assuming that they will support
> > the changes required for
> > https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
> > the basis for implementing them.
> >
> > -Max
> >
> > On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler
> wrote:
> >> On 26/01/2023 16:18, Maximilian Michels wrote:
> >>
> >> I see slightly different goals for the standard and the adaptive
> >> scheduler. The adaptive scheduler's goal is to adapt the Flink job
> >> according to the available resources.
> >>
> >> This is really a misconception that we just have to stomp out.
> >>
> >> This statement only applies to reactive mode, a special mode that the
> adaptive scheduler (AS) can run in, where active resource management is
> not supported, since requesting infinite resources from k8s doesn't
> really make sense.
> >>
> >> The AS itself can work perfectly fine with active resource management,
> and has no effect on how the RM talks to k8s. It can just keep the job
> running in cases where less than desired (==user-provided parallelism)
> resources are provided by k8s (possibly temporarily).
> >>
> >> On 26/01/2023 16:18, Maximilian Michels wrote:
> >>
> >> After
> >> all, both schedulers share the same super class
> >>
> >> Apart from implementing the same interface the implementations of the
> adaptive and default schedulers are separate.
>
>


Re: Reworking the Rescale API

2023-01-26 Thread David Morávek
Hi Gyula,


> can you please explain why the AdaptiveScheduler is not the default
> scheduler?


There are still some smaller bits missing. As far as I know, the missing
parts are:

1) Local recovery (reusing the already downloaded state files after restart
/ rescale)
2) Support for fine-grained resource management
3) Support for the session cluster (Chesnay will be submitting a FLIP for
this soon)

We're looking into addressing all of these limitations in the short term.

Personally, I'd love to start a discussion about making the
AdaptiveScheduler the default one after those limitations are fixed.
Being able to eventually deprecate and remove the DefaultScheduler would
simplify the code-base by a lot, since there are many adapters between the
new and old interfaces (e.g. SlotPool-related interfaces).

Best,
D.

On Thu, Jan 26, 2023 at 6:27 PM Gyula Fóra  wrote:

> Chesnay,
>
> Seems like you are suggesting that the Adaptive scheduler does everything
> the standard scheduler does and more.
>
> I am clearly not an expert on this topic but can you please explain why the
> AdaptiveScheduler is not the default scheduler?
> If it can do everything, why do we even have 2 schedulers? Why not simply
> drop the "old" one?
>
> That would probably clear up all confusions then :)
>
> Gyula
>
> On Thu, Jan 26, 2023 at 6:23 PM Chesnay Schepler 
> wrote:
>
> > There's the default and reactive mode; nothing else.
> > At its core they are the same thing; reactive mode just cranks up the
> > desired parallelism to infinity and enforces certain assumptions (e.g.,
> > no active resource management).
> >
> > The advantage is that the adaptive scheduler can run jobs even when
> > sufficient resources are not available, and scale things up again once
> > they are available.
> > This is its core functionality, but we always intended to extend it
> > such that users can modify the parallelism at runtime as well.
> > And since the AS can already rescale jobs (and was purpose-built with
> > that functionality in mind), this is just a matter of exposing an API
> > for it. Everything else is already there.
> >
> > As a concrete use-case, let's say you have an SLA that says jobs must
> > not be down longer than X seconds, and a TM just crashed.
> > If you can absolutely guarantee that your k8s cluster can provision a
> > new TM within X seconds, no matter what cruel reality has in store for
> > you, then you /may/ not need it.
> > If you can't, well then here's a use-case for you.
> >
> >  > Last time I looked they implemented the same interface and the same
> > base class. Of course, their behavior is quite different.
> >
> > They never shared a base class since day 1. Are you maybe mixing up the
> > AdaptiveScheduler and AdaptiveBatchScheduler?
> >
> > As for FLINK-30773, I think that should be covered.
> >
> > On 26/01/2023 17:10, Maximilian Michels wrote:
> > > Thanks for the explanation. If not for the "reactive mode", what is
> > > the advantage of the adaptive scheduler? What other modes does it
> > > support?
> > >
> > >> Apart from implementing the same interface the implementations of the
> > adaptive and default schedulers are separate.
> > > Last time I looked they implemented the same interface and the same
> > > base class. Of course, their behavior is quite different.
> > >
> > > I'm still very interested in learning about the future FLIPs
> > > mentioned. Based on the replies, I'm assuming that they will support
> > > the changes required for
> > > https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
> > > the basis for implementing them.
> > >
> > > -Max
> > >
> > > On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler
> > wrote:
> > >> On 26/01/2023 16:18, Maximilian Michels wrote:
> > >>
> > >> I see slightly different goals for the standard and the adaptive
> > >> scheduler. The adaptive scheduler's goal is to adapt the Flink job
> > >> according to the available resources.
> > >>
> > >> This is really a misconception that we just have to stomp out.
> > >>
> > >> This statement only applies to reactive mode, a special mode that the
> > adaptive scheduler (AS) can run in, where active resource management is
> > not supported, since requesting infinite resources from k8s doesn't
> > really make sense.
> > >>
> > >> The AS itself can work perfectly fine with active resource management,
> > and has no effect on how the RM talks to k8s. It can just keep the job
> > running in cases where less than desired (==user-provided parallelism)
> > resources are provided by k8s (possibly temporarily).
> > >>
> > >> On 26/01/2023 16:18, Maximilian Michels wrote:
> > >>
> > >> After
> > >> all, both schedulers share the same super class
> > >>
> > >> Apart from implementing the same interface the implementations of the
> > adaptive and default schedulers are separate.
> >
> >
>


Re: [VOTE] Release flink-connector-gcp-pubsub v3.0.0, release candidate #1

2023-01-26 Thread Sergey Nuyanzin
+1 (non-binding)

* verified hashes and signatures
* verified versions in pom files
* verified LICENSE and NOTICE files
* compared sources against git tag
* built from sources
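For anyone reproducing the "verified hashes and signatures" step, a minimal 
sketch of what the checksum part boils down to (the artifact name here is 
made up; real release artifacts ship with matching .sha512 and .asc files):

```shell
# Illustrative only — stand-in for a real release artifact.
echo "example artifact" > artifact-src.tgz

# Release managers publish this checksum file alongside the artifact:
sha512sum artifact-src.tgz > artifact-src.tgz.sha512

# The actual verification step; prints "artifact-src.tgz: OK" on success.
sha512sum -c artifact-src.tgz.sha512

# Signatures are checked analogously (not runnable without the real files):
#   gpg --import KEYS
#   gpg --verify artifact-src.tgz.asc artifact-src.tgz
```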


On Thu, Jan 26, 2023 at 3:08 PM Konstantin Knauf  wrote:

> +1 (binding)
>
> * checked Maven and source artifact signatures and checksums - OK
> * no binaries or packaged dependencies - OK
> * checked website changes - Approved.
>
> Am Fr., 20. Jan. 2023 um 15:39 Uhr schrieb Martijn Visser <
> martijnvis...@apache.org>:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version 3.0.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > which are signed with the key with fingerprint
> > A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.0.0-rc1 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352589
> > [2]
> >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-gcp-pubsub-3.0.0-rc1
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1581/
> > [5]
> >
> >
> https://github.com/apache/flink-connector-gcp-pubsub/releases/tag/v3.0.0-rc1
> > [6] https://github.com/apache/flink-web/pull/604
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>


-- 
Best regards,
Sergey


[jira] [Created] (FLINK-30797) Bump json5 from 1.0.1 to 1.0.2 in

2023-01-26 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-30797:
--

 Summary: Bump json5 from 1.0.1 to 1.0.2 in
 Key: FLINK-30797
 URL: https://issues.apache.org/jira/browse/FLINK-30797
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Web Frontend
Reporter: Martijn Visser
Assignee: Martijn Visser
 Fix For: 1.17.0


Dependabot has created https://github.com/apache/flink/pull/21617

This is the corresponding Jira ticket



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30798) Make OutputFormat support speculative execution for batch jobs

2023-01-26 Thread Biao Liu (Jira)
Biao Liu created FLINK-30798:


 Summary: Make OutputFormat support speculative execution for batch 
jobs
 Key: FLINK-30798
 URL: https://issues.apache.org/jira/browse/FLINK-30798
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Biao Liu
 Fix For: 1.17.0


This issue would make OutputFormat-based sinks run with speculative execution 
for batch jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30799) Make SinkFunction support speculative execution for batch jobs

2023-01-26 Thread Biao Liu (Jira)
Biao Liu created FLINK-30799:


 Summary: Make SinkFunction support speculative execution for batch 
jobs
 Key: FLINK-30799
 URL: https://issues.apache.org/jira/browse/FLINK-30799
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Biao Liu
 Fix For: 1.17.0


This ticket would make SinkFunction-based sinks run with speculative 
execution for batch jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release 1.16.1, release candidate #1

2023-01-26 Thread Gabor Somogyi
+1 (non-binding)

* Verified versions in the poms
* Built from source
* Verified checksums and signatures
* Started basic workloads with kubernetes operator
* Verified NOTICE and LICENSE files

G


On Thu, Jan 26, 2023 at 2:28 PM Dawid Wysakowicz 
wrote:

> +1 (binding)
>
>- verified checksums & signatures
>- checked differences of pom.xml and NOTICE files with 1.16.0 release.
>looks good
>- checked source release contains no binaries
>- built from sources
>- run StateMachineExample on a local cluster
>- checked the web PR
>
> Best,
>
> Dawid
> On 19/01/2023 17:07, Martijn Visser wrote:
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 1.16.1,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.16.1-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> NOTE: The maven artifacts have been signed by Chesnay with the key with
> fingerprint C2EED7B111D464BA
>
> [1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352344
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.16.1-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1580
> [5] https://github.com/apache/flink/releases/tag/release-1.16.1-rc1
> [6] https://github.com/apache/flink-web/pull/603
>
>